Questions about inference algorithms

Dear Dr Peixoto,

I would like to ask some questions I have about inference algorithms
for the identification of large-scale network structure via the
statistical inference of generative models.

The minimize_blockmodel_dl algorithm takes an hour to finish on my
network with 21,000 nodes (as does the hierarchical version), and it
takes two and a half days with overlap. However, I have been running a
hierarchical analysis with overlap, and it has been going for 14 days
now. So my first question is: is this time normal, or could there be a
problem? Do you know how long it usually takes?

Secondly, I have repeated some of these analyses with exactly the same
options but I get different solutions (similar but different), so I
wonder if the algorithm is heuristic (I thought it was exact).

My last question regards bipartite analysis. I have two types of nodes
in my network and I wonder if there is any analytical difference when
running these algorithms with the bipartite option (clabel=True, with
different labels for each group of nodes) or without it, because it
seems that the program “knows” my network is bipartite in any case. If
there are differences between bipartite and “unipartite” analyses
(clabel=False), is it possible to compare their description lengths for
model selection?

Thank you very much for your help!

Best regards,

Andrea

The minimize_blockmodel_dl algorithm takes an hour to finish on my
network with 21,000 nodes (as does the hierarchical version), and it
takes two and a half days with overlap. However, I have been running a
hierarchical analysis with overlap, and it has been going for 14 days
now. So my first question is: is this time normal, or could there be a
problem? Do you know how long it usually takes?

The overlapping model is indeed much more time consuming than the
non-overlapping one. This is because in the overlapping case we need to
partition the _half-edges_ instead of the nodes, as we do in the
non-overlapping case. The number of half-edges is twice the number of
edges, which is typically one or two orders of magnitude larger than the
number of nodes. Because of this, and the fact that group mixtures are
more expensive to represent algorithmically, the overlapping model tends
to be slower.
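
For reference, you can check these numbers for your own network (a
trivial sketch, assuming 'g' is your graph-tool Graph object):

   print(g.num_vertices())    # units partitioned by the non-overlapping model
   print(2 * g.num_edges())   # half-edges partitioned by the overlapping model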

So, the time difference you are observing is expected, although it is
difficult to say how long it will in fact take, since the running time
depends on the number of edges, not nodes.

In my experience, I find that there are comparatively few networks that
are better modelled by the overlapping model (see "Model Selection and
Hypothesis Testing for Large-Scale Network Models with Overlapping
Groups", Phys. Rev. X 5, 011033 (2015)). Usually, the non-overlapping
model provides a shorter description length, indicating that it has
better explanatory power. Because of this, I generally tend to work
with the non-overlapping model, unless there is a very specific reason
not to.

Secondly, I have repeated some of these analyses with exactly the same
options but I get different solutions (similar but different), so I
wonder if the algorithm is heuristic (I thought it was exact).

No, it is not exact. This type of inference problem is NP-hard, hence
exact algorithms are not feasible. In the default parametrization, the
algorithm implemented is an agglomerative heuristic. The algorithm is
probabilistic, so it will indeed return a slightly different fit every
time. Thus the typical practice is to perform several fits and choose
the one with the smallest description length.
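
For instance (a minimal sketch, assuming 'g' is your graph and that ten
fits are affordable):

   import graph_tool.all as gt

   # Run several independent fits and keep the one with the smallest
   # description length.
   states = [gt.minimize_blockmodel_dl(g) for i in range(10)]
   best = min(states, key=lambda s: s.entropy())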

If more precision is desired, the algorithm can also be configured via
the appropriate parameters to run a unity-temperature MCMC, Gibbs
sampling or simulated annealing instead. These, of course, are
numerically more expensive.
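
For example, a unity-temperature MCMC refinement could look like this
(a sketch, assuming you start from an agglomerative fit; the number of
sweeps is up to you):

   import graph_tool.all as gt

   state = gt.minimize_blockmodel_dl(g)
   for i in range(1000):
       state.mcmc_sweep(beta=1)   # beta=1, i.e. unity temperature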

My last question regards bipartite analysis. I have two types of nodes
in my network and I wonder if there is any analytical difference when
running these algorithms with the bipartite option (clabel=True, with
different labels for each group of nodes) or without it, because it
seems that the program “knows” my network is bipartite in any case. If
there are differences between bipartite and “unipartite” analyses
(clabel=False), is it possible to compare their description lengths for
model selection?

The clabel parameter should not be Boolean; it needs to be a property
map that marks the partition constraints for each node. Namely, vertices
with different label values will not be clustered in the same
group. This does not affect the computation of the description length;
it simply guides the algorithm so that violating moves are not
attempted.

Indeed, as you noticed, the algorithm will always learn that you have a
bipartite network if that is the case, hence the 'clabel' parameter does
not need to be used for that purpose. The clabel parameter is useful
only in situations where there are other types of forbidden partitions
that the algorithm would run into otherwise.

Note that if the agglomerative heuristic is turned off, and MCMC is
used instead, the algorithm may in fact put nodes belonging to different
partitions of a bipartite network in the same group. The probability of
this happening will depend on how "strongly" bipartite the network
is. In this case, the clabel parameter is useful to forbid those
non-bipartite divisions, if they are undesired.

Note that there is also a 'pclabel' parameter. This is almost identical
to 'clabel', with the only difference that it _is_ in fact used to
compute the description length. Namely, it assumes that this division is
known a priori, and hence the description length for the partition will
include only subdivisions thereof. This is indeed useful, for example,
when you know from context that your network is bipartite, and this is
not a fact that you wish to learn. This generalizes also to other types
of known divisions. I recommend using this, if this is the case.
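
Concretely, a minimal sketch would be the following, using the current
API where state parameters are passed via state_args (assuming 'g' is
your graph and a hypothetical vertex property 'kind' marks your two
node types; 'clabel' is passed in exactly the same way if you only want
the move constraint):

   import graph_tool.all as gt

   pclabel = g.new_vertex_property("int")
   for v in g.vertices():
       pclabel[v] = kind[v]   # hypothetical 0/1 property marking node types

   state = gt.minimize_blockmodel_dl(g, state_args=dict(pclabel=pclabel))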

Best,
Tiago

Thank you very much! Your answer has been really helpful; now I
understand this much better. I'll think about the options you
mentioned.

Thanks again,

Andrea

Hi again,

I have recently noticed the update of graph-tool and now I am a little
bit confused about some of the changes. Sorry, I know my questions are
very basic. I am not familiar with this language and I have some
difficulties getting results.

I am running the inference algorithms to get the best model using
different options for model selection. I want to set pclabel in the
inference algorithms because I know a priori that my network is
bipartite, and then I want to get the description length. Before the
update I did it this way:

vprop_double = g.new_vertex_property("int") # g is my network
for i in range(0, 11772):
     vprop_double[g.vertex(i)] = 1
for i in range(11772, 214221):
     vprop_double[g.vertex(i)] = 2

state = gt.minimize_blockmodel_dl(g, pclabel=True)

state.entropy(dl=True) # I am not sure this is the right way to get the
description length.

But now I have some problems. First of all, minimize_blockmodel_dl
doesn't have a pclabel argument, so I don't know how to indicate it in
the inference algorithm. I have tried this:

state.pclabel = vprop_double

But I get the same result from "state.entropy(dl=True)" as before.
Also, I get the same result from "state.entropy(dl=True)" and
"state.entropy()", and I don't understand why either.

And finally, with NestedBlockState objects I don't know how to get the
description length, because entropy doesn't have a "dl" argument. In
these objects, are entropy and description length the same?

In conclusion, I don't know how to set pclabel and get the description
length in hierarchical models, and I am not sure if I am getting it
correctly in non-hierarchical ones.

Sorry again for my basic questions, but I can't go on because of these
problems.

Thank you very much!

Best regards,

Andrea

vprop_double = g.new_vertex_property("int") # g is my network
for i in range(0, 11772):
     vprop_double[g.vertex(i)] = 1
for i in range(11772, 214221):
     vprop_double[g.vertex(i)] = 2

state = gt.minimize_blockmodel_dl(g, pclabel=True)

You should not pass "pclabel=True" here... the pclabel parameter
expects the property map itself, not a Boolean value. Furthermore, you
should not pass it to the minimize_blockmodel_dl() function
directly. Instead, you should do

   state = gt.minimize_blockmodel_dl(g, state_args=dict(pclabel=vprop_double))

state.entropy(dl=True) # I am not sure this is the right way to get the description length.

It is, but if you look at the documentation, you will see that the
default is "dl = True", hence calling only state.entropy() is
sufficient.

But now I have some problems. First of all, minimize_blockmodel_dl
doesn't have a pclabel argument, so I don't know how to indicate it in
the inference algorithm. I have tried this:

state.pclabel = vprop_double

This is incorrect. Never try to modify the states directly like
this. You have to do it like I described above.

And finally, with NestedBlockState objects I don't know how to get the
description length, because entropy doesn't have a "dl" argument. In
these objects, are entropy and description length the same?

Yes. Since the NestedBlockState corresponds to the regular BlockState
plus a hierarchical prior, its full entropy always corresponds to the
description length.
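
In other words (a minimal sketch, assuming 'g' is your graph):

   import graph_tool.all as gt

   nstate = gt.minimize_nested_blockmodel_dl(g)
   dl = nstate.entropy()   # full description length; no "dl" argument needed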

Best,
Tiago

Thank you very much! I was wrong, I meant "state =
gt.minimize_blockmodel_dl(g, pclabel=vprop_double)"; it was a mistake I
made while writing the mail. So the key was the use of "state_args"; I
had tried it but with a different notation, and obviously it didn't
work. Now I can go on!

Thanks again,

Andrea

Dear Mr Peixoto,

I have just run a few analyses with the new version of your package
and my results change completely between v2.13 and v2.16.

minimize_nested_blockmodel_dl is the one that shows the most relevant
changes, and it is very difficult to get a biological explanation of
the new results, mainly at the upper hierarchical levels. I would like
to know the particular changes in these two analyses to better
understand my results. Is it possible to change any parameter to run
this function in a way similar to v2.13? I used to run this function
this way:

state = minimize_nested_blockmodel_dl(g, pclabel=vprop_double,
overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)

And I have run the new version of the function this way:

state = minimize_nested_blockmodel_dl(g,
state_args=dict(pclabel=vprop_double), overlap=False,
nonoverlap_init=False, deg_corr=True, layers=False)

Thank you very much,

Andrea

I have just run a few analyses with the new version of your package
and my results change completely between v2.13 and v2.16.

minimize_nested_blockmodel_dl is the one that shows the most relevant
changes, and it is very difficult to get a biological explanation of
the new results, mainly at the upper hierarchical levels.

There is nothing fundamental that changed between versions, only a
re-organization of the code.

I would like to know the particular changes in these two analyses to
better understand my results. Is it possible to change any parameter to
run this function in a way similar to v2.13? I used to run this
function this way:

state = minimize_nested_blockmodel_dl(g, pclabel=vprop_double, overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)

In that version, the pclabel option did not exist. Note also that
passing all the other options is redundant, since you are using their
default values.

And I have run the new version of the function this way:

state = minimize_nested_blockmodel_dl(g, state_args=dict(pclabel=vprop_double), overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)

The difference here is that pclabel is being used. In the previous
version it was being ignored.

Maybe you should try without the pclabel option, to see if you get
similar results to the ones you got previously. This may help you
understand the difference.
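
I.e. something like this (a sketch; 'vprop_double' is your bipartite
label property from before):

   import graph_tool.all as gt

   # Without pclabel -- effectively what v2.13 was doing:
   state_old = gt.minimize_nested_blockmodel_dl(g)

   # With the bipartite division assumed known a priori:
   state_new = gt.minimize_nested_blockmodel_dl(
       g, state_args=dict(pclabel=vprop_double))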

Best,
Tiago

Thank you very much, your answers have been really helpful. I am now
on the last step, model selection, and I would like to be sure that I’m
doing it right. I obtain the posterior odds ratio to compare two
partitions in this way: e^-(dl1-dl2), with dl1 and dl2 the higher and
lower description lengths, respectively. I have obtained the
description length using ‘state.entropy()’ for nested models and
‘state.entropy(dl=True)’ for non-nested ones.

I have doubts about this because small differences in description
length lead to values much lower than 0.01, so in most cases the
evidence supporting one of the models is decisive. I only get values
higher than 0.01 if the difference in description length is lower than
5 units. With my data (24,000 nodes and 5,000,000 edges) I always
obtain decisive support, either when I compare different models or when
I compare different runs of the same model. I wonder if this is right.

Thanks again,

Andrea

Thank you very much, your answers have been really helpful. I am now
on the last step, model selection, and I would like to be sure that I’m
doing it right. I obtain the posterior odds ratio to compare two
partitions in this way: e^-(dl1-dl2), with dl1 and dl2 the higher and
lower description lengths, respectively. I have obtained the
description length using ‘state.entropy()’ for nested models and
‘state.entropy(dl=True)’ for non-nested ones.

This is correct. Note that in current versions of graph-tool you can
also just call state.entropy() for non-nested models, since we have
dl=True per default.
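
In code, the computation you describe amounts to the following (a
sketch, where 'state1' and 'state2' are hypothetical names for the two
fits being compared):

   from math import exp

   dl1 = state1.entropy()      # larger description length
   dl2 = state2.entropy()      # smaller description length
   ratio = exp(-(dl1 - dl2))   # posterior odds ratio; << 1 is decisive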

I have doubts about this because small differences in description
length lead to values much lower than 0.01, so in most cases the
evidence supporting one of the models is decisive. I only get values
higher than 0.01 if the difference in description length is lower than
5 units. With my data (24,000 nodes and 5,000,000 edges) I always
obtain decisive support, either when I compare different models or when
I compare different runs of the same model. I wonder if this is right.

This is indeed expected if you have lots of data (i.e. large
networks). For sufficient data, the evidence for the better model should
always become decisive, as long as the models being compared are indeed
distinguishable. 5 million edges is quite a bit, and indeed I would
expect the posterior odds to be quite small in this situation.

You just have to make sure that you have found the best fit (i.e. the
smallest description length) for each model you are comparing, by
running the algorithm as many times as you can.

Best,
Tiago

Ok, thanks! I'll run the algorithm more times then to try to find the best
fit.

Best regards,

Andrea
