Dear Tiago,

I am trying to understand how each value of NestedBlockState.level_entropy() contributes to the entropy of a hierarchical SBM (hSBM). In particular, I am focusing on the “prior for the edge counts”. In PRE 95, 012317 (2017), the multigraph entropy is used for _all levels_ of the hSBM, because of the dominance of parallel edges at the higher levels. Nevertheless, for each level (l = k, 1 <= k < L), we assume its connectivity pattern is generated by some non-degree-corrected SBM at the level above (l = k+1), which renders the sum of the entropy terms over all levels smaller than that of a flat prior, thereby avoiding underfitting.

I have two questions, which I will pose along the way.

To see which arguments BlockState.entropy() uses, I added

print("eargs.###: {}".format(eargs.###))

before the return line of the entropy() function in the graph_tool/inference/blockmodel.py file, where ### can be _degree_dl_, _edges_dl_, _multigraph_, etc.

Taking the “lesmis” dataset as an example, if we run minimize_nested_blockmodel_dl() on it, we (may) obtain two levels, i.e. [(77, 6), (6, 1)]. Now, running this command,

nested_state.level_entropy(0)

prints:

eargs.dense: False
eargs.edges_dl: False
eargs.multigraph: True

Out[•]: 630.133156768878

And this command,

nested_state.level_entropy(1)

prints:

eargs.dense: True
eargs.edges_dl: True
eargs.multigraph: True

Out[•]: 71.0082133080805

These are the two level entropies that sum up to nested_state.entropy().

Here is my 1st question: why is `edges_dl` excluded at every level except the highest?

I expected `edges_dl` at the lowest level to be nonzero, but _less_ than `69.21645383885243`, i.e. the exact negative logarithm of Eq. (40) of the PRE paper, evaluated with B=6 and E=254 (the number of edges of the “lesmis” dataset). Am I thinking about it the right way? <— this is my 2nd question.
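For reference, that number can be reproduced with a short stdlib-only computation, assuming I read Eq. (40) as the uniform prior over edge-count matrices, ((B(B+1)/2, E))^(-1), where ((n, m)) = C(n + m - 1, m) is the multiset coefficient (the function names below are mine, just for illustration):

```python
from math import lgamma

def ln_binom(n, m):
    # Natural log of the binomial coefficient C(n, m).
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)

def flat_edges_dl(B, E):
    # -ln of Eq. (40), assuming it is the uniform prior
    # ((B(B+1)/2, E))^(-1) over edge-count matrices, with
    # ((n, m)) = C(n + m - 1, m) the multiset coefficient.
    n_pairs = B * (B + 1) // 2  # number of (unordered) block pairs
    return ln_binom(n_pairs + E - 1, E)

print(flat_edges_dl(6, 254))  # ≈ 69.21645383885243
```

So the quoted value is indeed the flat edges_dl for B=6, E=254, which is what my 2nd question compares against.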

Sincere thanks,

Tzu-Chi