Dear Tiago,

Thank you very much for creating and maintaining this awesome package! I really enjoy using it.

I have a question regarding finding the NestedBlockState with minimum description length.

I currently have a very large bipartite network, g, with approximately 300k edges and 20k nodes. My procedure is:

1. Find the initial state using an agglomerative heuristic:

state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)

2. Equilibrate the Markov chain at beta = 1:

bs = state.get_bs()
bs += [np.zeros(1)] * (nL - len(bs))
state = state.copy(bs=bs, sampling=True)
gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6, mcmc_args=dict(niter=10))
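To make step 2 self-contained, the padding just extends the hierarchy with trivial single-block levels up to nL (the sizes and nL below are placeholder values, standing in for what state.get_bs() returns):

```python
import numpy as np

nL = 10  # placeholder: target number of hierarchy levels
# stand-in for state.get_bs(): one partition array per level
bs = [np.zeros(100, dtype=int), np.zeros(1, dtype=int)]

# pad with trivial single-block levels so the hierarchy has nL levels,
# giving the nested state a fixed maximum depth during sampling
bs += [np.zeros(1, dtype=int)] * (nL - len(bs))
```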

3. Equilibrate the Markov chain at beta = numpy.inf:

gt.mcmc_equilibrate(state, wait=1000, epsilon=1e-6, mcmc_args=dict(beta=numpy.inf, niter=10))

4. Find the state with minimum description length:

gt.mcmc_equilibrate(state, wait=200, epsilon=1e-6, mcmc_args=dict(beta=numpy.inf, niter=10), callback=finding_minState)

where finding_minState is a function that checks whether the current state has the minimum description length found so far.
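Concretely, my finding_minState is along these lines (a sketch with my own helper names; the mutable min_result dict is just so the callback can record the best state seen so far):

```python
import numpy as np

# running minimum of the description length, updated by the callback
min_result = {"dl": np.inf, "bs": None}

def finding_minState(state):
    # mcmc_equilibrate passes the current state to the callback
    dl = state.entropy()  # description length of the current state
    if dl < min_result["dl"]:
        min_result["dl"] = dl
        min_result["bs"] = state.get_bs()
```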

5. To save the state to pickle files, I use the following to construct a new NestedBlockState:

minState = gt.NestedBlockState(g, bs=state.get_bs(), state_args=state.state_args)

where state_args is some partition prior imposed on the graph.
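I then pickle the result directly. Sketched here only for the bs hierarchy, with placeholder arrays standing in for state.get_bs(), the saving step looks like:

```python
import pickle
import numpy as np

# placeholder for state.get_bs(): one partition array per level
bs = [np.zeros(5, dtype=int), np.zeros(1, dtype=int)]

with open("min_bs.pkl", "wb") as f:
    pickle.dump(bs, f)

with open("min_bs.pkl", "rb") as f:
    bs_loaded = pickle.load(f)
```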

In the above, to speed up the inference, I chose epsilon to be 1e-6 or even 1e-2. I know I could also speed up the program by passing multiflip=True, but I run the program on the school server, which currently only has graph-tool 2.27 available.

I have run the above code multiple times to study the computational cost and convergence behavior. For each run, I also calculate the average description length over the MCMC sweeps.
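The per-sweep description lengths are collected with a second callback (again with placeholder names of my own):

```python
import numpy as np

dls = []  # description length recorded at each callback invocation

def collect_dl(state):
    dls.append(state.entropy())

# passed as callback=collect_dl to gt.mcmc_equilibrate(...);
# afterwards the average over sweeps is np.mean(dls)
```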

My question is: is the above approach correct to find the state with minimum description length?

Looking forward to your reply.

Best,

Terry