Finding the ground state

Dear Tiago,

Thank you very much for creating and maintaining this awesome package! I
really enjoy using it.

I have a question regarding finding the NestedBlockState with minimum
description length.

I currently have a very large bipartite network with approximately 300k
edges and 20k nodes, say, g.

1. find the initial state using an agglomerative heuristic.
state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)

2. equilibrate the Markov chain at beta = 1
bs = state.get_bs()
bs += [np.zeros(1)] * (nL - len(bs))
state = state.copy(bs=bs, sampling=True)
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6,
                    mcmc_args=dict(niter=10))

3. equilibrate the Markov chain at beta = numpy.inf
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6,
                    mcmc_args=dict(beta=numpy.inf, niter=10))

4. find the state with minimum description length
gt.mcmc_equilibrate(state, wait=200, epsilon=1e-6,
                    mcmc_args=dict(beta=numpy.inf, niter=10),
                    callback=finding_minState)
where finding_minState is a function that checks whether the current state
has the lowest description length found so far.

5. to save the state to pickle files, I use the following to construct a new
NestedBlockState
minState = gt.NestedBlockState(g, bs=state.get_bs(),
                               state_args=state.state_args)
where state_args is some partition prior imposed on the graph.
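
For reference, a minimal sketch of what a callback like finding_minState in
step 4 could look like. It assumes only that the state object exposes
entropy() and get_bs(), as NestedBlockState does; the class name
MinStateTracker and its attributes are hypothetical and not part of
graph-tool.

```python
import copy


class MinStateTracker:
    """Remember the lowest description length seen and its partition.

    Hypothetical helper: it relies only on state.entropy() and
    state.get_bs() being available on the object it is called with.
    """

    def __init__(self):
        self.min_dl = float("inf")  # best description length so far
        self.min_bs = None          # hierarchical partition that produced it

    def __call__(self, state):
        dl = state.entropy()
        if dl < self.min_dl:
            self.min_dl = dl
            # Deep-copy so that later sweeps cannot alter the stored partition.
            self.min_bs = copy.deepcopy(state.get_bs())
        # Returning None means nothing is appended to the list that
        # mcmc_equilibrate returns.
        return None
```

An instance could be passed as callback=tracker in step 4, and
tracker.min_bs could then take the place of state.get_bs() in step 5, so
that the saved state is the best one visited rather than the last one.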

In the above, to speed up the inference, I chose epsilon to be 1e-6 or even
1e-2. I know I can also speed up the program by passing multiflip=True, but
I run the program on the school server, which currently only has version
2.27 available.

I ran the above code multiple times to study the computational cost and
convergence behavior. For each run, I also calculated the average
description length over the MCMC sweeps.
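
The averaging can be sketched as a second callback that keeps a running
mean, so that no list of values has to be stored. This again assumes only
that the state exposes entropy(); the name RunningMeanDL is hypothetical.

```python
class RunningMeanDL:
    """Running mean of the description length over callback invocations.

    Hypothetical helper: it relies only on state.entropy().
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def __call__(self, state):
        dl = state.entropy()
        self.count += 1
        # Incremental update of the mean; avoids storing every value.
        self.mean += (dl - self.mean) / self.count
        return None
```

Since the callback is invoked once per iteration of mcmc_equilibrate, with
niter=10 the mean would be taken over every tenth sweep rather than over
every sweep.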

My question is:

Is the above approach a correct way to find the state with minimum
description length?

Looking forward to your reply.

Best,
Terry

It looks right, but I don't think in many cases going to beta=1 and then
cooling to beta=inf is going to improve much over the initial result
from minimize_nested_blockmodel_dl()... You should do some
experimentation first.

Dear Tiago,

Thank you very much for your reply.

Two follow-up questions would be:

If I have some partition prior, for example bipartiteness, then at this
step
state = state.copy(bs=bs, sampling=True)
do I need to change it, before performing the equilibration step, to
state_copy = state.copy(bs=bs, sampling=True, state_args=state.state_args)
so that nodes are not moved into groups they are not allowed to join?

From the documentation, my answer would be no. But in my experiments, if I
do not pass this parameter, the results after equilibration do not make
sense: one side of the bipartite network has much larger groups, even at
the second-highest possible level.
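
One way to check directly whether the constraint was lost is to test
whether any lowest-level group contains nodes from both sides. A minimal
sketch, assuming blocks holds one group label per node (with graph-tool,
something like state.get_levels()[0].get_blocks().a should give this) and
node_types is a hypothetical sequence marking each node's side of the
bipartite network:

```python
def mixed_blocks(blocks, node_types):
    """Return the sorted labels of groups that contain more than one node type.

    blocks:     one group label per node (lowest hierarchy level)
    node_types: one type label per node, e.g. 0/1 for the two sides
    """
    types_in_block = {}
    for b, t in zip(blocks, node_types):
        types_in_block.setdefault(b, set()).add(t)
    return sorted(b for b, ts in types_in_block.items() if len(ts) > 1)
```

An empty result means that every group is confined to one side; any label
returned identifies a group in which the two sides were merged.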

Additionally, can I call minimize_nested_blockmodel_dl() again and pass
bs=state_copy.get_bs() to the function to get the final state, instead of
writing a callback as I mentioned in the previous email?

Best,
Terry