Understanding the default parameters used by minimize_blockmodel_dl()

Hi Tiago,

If I understood minimize_blockmodel_dl() correctly, the success of its
inference comes from the following KEYs:
KEY-1. Agglomerative membership initialization
KEY-2. Smarter MCMC proposal moves + successful MCMC
KEY-3. Correct expression of the SBM entropy

For (1), the tunable parameters are:
a) shrink_args[“niter”]: attempted block merges for each block node
b) mcmc_multilevel_args[“r”]: the greediness of the agglomeration.

For (2), the tunable parameters are:
c) mcmc_args[“c”]: controls how the proposed move respects the current block
d) mcmc_args[“d”]: the probability to move to a new and vacant group
e) mcmc_args[“beta”]: the inverse temperature
f) mcmc_args[“niter”]: the number of sweeps to perform. During each sweep, a
move attempt is made for each node
g) mcmc_equilibrate_args[“wait”]: the number of iterations to wait for a
record-breaking event
h) mcmc_equilibrate_args[“nbreaks”]: the number of iteration intervals (of
size `wait`) without record-breaking events necessary to stop the algorithm.
i) anneal_args[“niter”]: the number of steps (in logspace) from the starting
temperature to the final one.

And for (3), the tunable parameters are hidden in the
mcmc_args.entropy_args, whose default settings are clear to me. They
implement the flat prior as detailed in Phys. Rev. E 95, 012317.

I have the following 4 questions.
FIRST - If I just run minimize_blockmodel_dl(g), where g is the graph, what
are the default numbers used by (a)-(i)? I found that many parameters do not
match the default numbers as stated in the documentation. For example, b) is
set to 1.3, where the documentation says 2. And g) is set to 1, where the
documentation says 1000. With `verbose` turned on, reverse engineering
helps. But I would like to hear your response.

SECOND - To prevent bad merges in (KEY-1.), we also allow individual node
moves between each sweep. This means (KEY-1) + (KEY-2) is applied
alternatively. Between each merge (from B’ to B), what is the annealing
scheme used? Is it abrupt cooling all the way through? Or does it let the
MCMC equilibrates first, followed by the abrupt cooling?

THIRD - Followed from the SECOND, It seems that the latter implementation is
best but computationally expensive. However, for the last stage of the merge
algorithm (when we reached the desired B), the latter [equilibration +
cooling] algorithm must be applied. How can we customize the arguments in
minimize_blockmodel_dl() such that we only do abrupt merges between stages
except for the last?

FOURTH - Bisection is an important ingredient to the efficiency of
successful optimization. What is the default formula to determine the
desired block number B to be merged from B0 = N (number of nodes)? In other
words, how do we determine the first rightmost state to bracket the search?
It seems that either N^0.5 or E^0.5 (the resolution limit via the flat prior
for edge count) does not match what I observed.

I hope my questions were not annoying. Thank you again for designing the
many detailed auxiliary functions. They helped me a lot to catch bugs by
these “reverse engineerings”. You are so sweet.