Dear Tiago,

Thank you very much for creating and maintaining this awesome package! I really enjoy using it.

I have a question regarding finding the NestedBlockState with minimum description length.

I currently have a very large bipartite network, g, with approximately 300k edges and 20k nodes. My procedure is:

1. Find the initial state using an agglomerative heuristic:

state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)

2. Equilibrate the Markov chain at beta = 1:

bs = state.get_bs()
bs += [np.zeros(1)] * (nL - len(bs))
state = state.copy(bs=bs, sampling=True)
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6, mcmc_args=dict(niter=10))
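To make step 2 self-contained, the padding just extends the hierarchy with trivial single-block levels up to nL (the sizes and nL below are placeholder values, standing in for what state.get_bs() returns):

```python
import numpy as np

nL = 10  # placeholder: target number of hierarchy levels
# stand-in for state.get_bs(): one partition array per level
bs = [np.zeros(100, dtype=int), np.zeros(1, dtype=int)]

# pad with trivial single-block levels so the hierarchy has nL levels,
# giving the nested state a fixed maximum depth during sampling
bs += [np.zeros(1, dtype=int)] * (nL - len(bs))
```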

3. Equilibrate the Markov chain at beta = numpy.inf:

gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6, mcmc_args=dict(beta=numpy.inf, niter=10))

4. Find the state with minimum description length:

gt.mcmc_equilibrate(state, wait=200, epsilon=1e-6, mcmc_args=dict(beta=numpy.inf, niter=10), callback=finding_minState)

where finding_minState is a function that checks whether the current state has the minimum description length found so far.
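Concretely, my finding_minState is along these lines (a sketch with my own helper names; the mutable min_result dict is just so the callback can record the best state seen so far):

```python
import numpy as np

# running minimum of the description length, updated by the callback
min_result = {"dl": np.inf, "bs": None}

def finding_minState(state):
    # mcmc_equilibrate passes the current state to the callback
    dl = state.entropy()  # description length of the current state
    if dl < min_result["dl"]:
        min_result["dl"] = dl
        min_result["bs"] = state.get_bs()
```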

5. To save the state to pickle files, I use the following to construct a new NestedBlockState:

minState = gt.NestedBlockState(g, bs=state.get_bs(), state_args=state.state_args)

where state_args is some partition prior imposed on the graph.
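I then pickle the result directly. Sketched here only for the bs hierarchy, with placeholder arrays standing in for state.get_bs(), the saving step looks like:

```python
import pickle
import numpy as np

# placeholder for state.get_bs(): one partition array per level
bs = [np.zeros(5, dtype=int), np.zeros(1, dtype=int)]

with open("min_bs.pkl", "wb") as f:
    pickle.dump(bs, f)

with open("min_bs.pkl", "rb") as f:
    bs_loaded = pickle.load(f)
```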

In the above, to speed up the inference, I chose epsilon to be 1e-6 or even 1e-2. I know I could also speed up the program by passing multiflip=True, but I run the program on the school server, which currently only has graph-tool 2.27 available.

I have run the above code multiple times to study the computational cost and convergence behavior. For each run, I also calculate the average description length over the MCMC sweeps.
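The per-sweep description lengths are collected with a second callback (again with placeholder names of my own):

```python
import numpy as np

dls = []  # description length recorded at each callback invocation

def collect_dl(state):
    dls.append(state.entropy())

# passed as callback=collect_dl to gt.mcmc_equilibrate(...);
# afterwards the average over sweeps is np.mean(dls)
```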

My question is: is the above approach correct to find the state with minimum description length?

Looking forward to your reply.

Best,

Terry