Tune the size of detected communities?

Peter_Straka · November 16, 2016, 1:13am

Hello,

I have a large bipartite network with ~10^4 vertices of type I and ~10^6
vertices of type II. I fitted a nested blockmodel, hoping to identify
communities among the type I vertices. Unfortunately, the detected
communities (at the lowest level of the hierarchy) have a median size about
twice as large as the empirical evidence suggests, which reminds me of the
resolution limit problem.

Is there a way to tune the sizes of the communities at the lowest level?
I'm thinking of the following:

   - Could forcing an extra hierarchy level help?
   - Could adding another (non-nested) simple blockmodel at the lowest
     level help?
   - Is it possible to reduce the penalty on the description length?

Any ideas would be greatly appreciated... many thanks in advance!

Peter

Dr Peter Straka
Research Fellow (DECRA)
School of Physical Engineering and Mathematical Sciences | UNSW Canberra
Google Scholar <https://scholar.google.com.au/citations?user=o80TaWgAAAAJ>
E: p.straka(a)unsw.edu.au
skype: straka.ps


tiago · November 17, 2016, 8:24am

Although it is possible to do what you want, I think it should be
discouraged. You say that the groups found are larger than what the
"empirical evidence suggests". However, the inference approach implemented
attempts precisely to gauge the empirical evidence. It tries to avoid
overfitting the data, where random fluctuations are mistaken for structure
(like finding communities in completely random graphs). By forcing the
number of groups to a higher value you risk overfitting, and you would be
defeating the purpose of the algorithm.

Note that the algorithm implemented is stochastic in nature. Usually one
needs to run it several times, in particular if the modular structure is hard
to detect. It may be that the algorithm will find a partition that you judge
more reasonable (and that has a lower description length) if you try many
times.
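
The "run it several times and keep the fit with the lowest description
length" advice above can be sketched roughly as follows. The helper
best_of_runs and its parameters fit, score and n_runs are hypothetical names
used only for illustration; they are not part of graph-tool. The graph-tool
usage in the trailing comment assumes that g is an already loaded graph and
that state.entropy() returns the description length of the fitted state.

```python
def best_of_runs(fit, score, n_runs=10):
    """Call fit() n_runs times and keep the result with the lowest score.

    fit   -- callable with no arguments that returns one fitted model
    score -- callable that maps a fitted model to a number (lower is better)
    Returns a (best_model, best_score) tuple.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    best_model, best_score = None, float("inf")
    for _ in range(n_runs):
        candidate = fit()           # one independent stochastic run
        s = score(candidate)        # e.g. the description length
        if s < best_score:          # keep only the best fit seen so far
            best_model, best_score = candidate, s
    return best_model, best_score


# Possible use with graph-tool (assumption: g is an existing Graph):
#
#   import graph_tool.all as gt
#   state, dl = best_of_runs(
#       fit=lambda: gt.minimize_nested_blockmodel_dl(g),
#       score=lambda s: s.entropy(),
#       n_runs=20,
#   )
```

The number of runs is a judgment call; harder-to-detect structure generally
calls for more of them.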

If it doesn't, then the algorithm is telling you something about the
structure of the network. It may be that the partition you judge to be more
reasonable cannot be found in the data with sufficient statistical
significance.

However, if you absolutely insist on doing this (despite the consequences),
the minimum number of groups can be specified as:

    # B is the minimum number of groups to allow
    state = minimize_nested_blockmodel_dl(g, B_min=B)

Best,
Tiago