Problem with the partition into communities

Dear list,
I am new to graph-tool, which looks pretty nice.
However, I have the following problem: the partition into communities gives
one big community and two very small ones, whose vertices are almost not
connected among themselves. I wonder if I am doing something wrong. My
network is directed and unweighted. My algorithm is just:
import graph_tool.all as gt

g2 = gt.load_graph("test.graphml")
state = gt.minimize_blockmodel_dl(g2, deg_corr=False)
b = state.get_blocks()

for v in g2.vertices():
    print(v, b[v])
My network has only 50 vertices; is that a limitation?

Thanks in advance,

Yérali.

I don't understand what seems to be the problem. What makes you think that
the partition the algorithm finds is wrong?

It would help if you were more specific about your data, and what you were
expecting.

Best,
Tiago

Thanks for your reply.

The input is test2.graphml:
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/test2.graphml>

and the final partition (using the commands explained before) can be seen
in:
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027136/dib.jpg>

I was expecting each community to have connected nodes, but that is not the
case in the blue one, for example.

Thanks a lot,

Yérali.

I don't see anything wrong.

The approach is based on fitting the stochastic block model, which accepts
any type of mixing pattern, not only the assortative one you are expecting.

If you use the non-degree-corrected model (deg_corr=False), the model
expects that nodes inside the same group will typically receive the same
number of edges, and thus the inference procedure will often just split the
nodes according to degree, if there are no structures that are more salient.
Indeed, this is why the degree-corrected model (deg_corr=True) is often
preferred.

Best,
Tiago

Ok, Thanks for your answer.

In that case no communities are detected (the whole system is in the same
community). Maybe the system is too small for the SBM, while partitions of
the same network can be found using other methods.


Thanks again,

Yérali.

No, I don't think this is the correct conclusion. If the method finds only
one community, it means that there is no statistically significant community
structure in the network!

Indeed, a very powerful feature of this approach is that it can separate
structure from noise, and it will only yield partitions that are
substantiated by evidence.

Certainly you will be able to find some algorithm that splits your network,
but I would advise against using such partition, since it is probably
meaningless.

Best,
Tiago

Dear Tiago,

Good point to take into account. Thank you very much.

All the best,

Yérali.

Hi Yerali,
for assortative communities you may want to find a partition with maximum
modularity, which is a bit different from minimizing the blockmodel
description length.
Peter


I don't think this is a good idea for the reason that I have mentioned:
Maximizing modularity ignores the statistical significance of the partition.

Because of this, by maximizing modularity one can find partitions of fully
random graphs with very high modularity, but which are completely meaningless:

https://journals.aps.org/pre/abstract/10.1103/PhysRevE.70.025101

This is not a problem with the SBM inference approach implemented in
graph-tool, which is meant to solve this problem in a principled manner.

So when the algorithm finds only one community in your data, it is telling
you something important: Your graph cannot be distinguished from a fully
random one with the same size and density (or degree sequence). It is a bad
idea (and bad practice) to ignore this, and use some heuristic just because
it finds some partition. This is a recipe for overfitting.

Best,
Tiago

Hi, thank you very much for both answers. This is how science works.

My problem with the SBM is that maximizing internal density is not a desired
condition for the separation into blocks. The pattern is very clear in the
partition of my figure (without degree correction), where the nodes of the
same colors are connected with the other colors in an equivalent way, but are
far from being a community. For example, in causality-based networks it makes
sense to know the set of vertices where some flow can be trapped. The same
applies in the case of information flow. I think the network in my figure can
be suitably partitioned into communities without this being simply noise.

Maybe this is a good example supporting the claim that there is not (yet) a
perfect method to detect communities in all kinds of networks. In this
sense, I would rather think that a partitioning does not have a unique and
"right" solution.

Best,

Yérali.

> My problem with the SBM is that maximizing internal density is not a
> desired condition for the separation into blocks. The pattern is very
> clear in the partition of my figure (without degree correction), where the
> nodes of the same colors are connected with the other colors in an
> equivalent way, but are far from being a community.

That is because this is the most significant mixing pattern present in the data.

One of the most central features of the SBM is that it admits _any_ mixing
pattern, not only assortative ones. For example, it works beautifully well
on networks with bipartite or multipartite structures.

However, if the network does possess assortative structures, the method
_will_ find them.

I'd recommend taking a closer look into the SBM literature.

> For example, in causality-based networks it makes sense to know the set of
> vertices where some flow can be trapped. The same applies in the case of
> information flow.

And if these assortative modules exist, they will be picked up by the SBM as
well. But see my point below.

> I think the network in my figure can be suitably partitioned into
> communities without this being simply noise.

And how do you make this assessment?

If you randomize your network while keeping the degree sequence intact, you
will still be able to partition it into communities (which also trap random
walks, have high modularity, etc.). The point is that this structure is the
result of mere statistical fluctuations, not meaningful properties of the
generative mechanism behind the network.

It is _very_ easy to find structure in random networks. You have to be
careful. I recommend taking the SBM result seriously.

> Maybe this is a good example supporting the claim that there is not (yet)
> a perfect method to detect communities in all kinds of networks. In this
> sense, I would rather think that a partitioning does not have a unique and
> "right" solution.

This is beside my point. Although I agree that there is more than one way to
represent a network (with the SBM being only one of them), methods that do
not distinguish structure from noise yield spurious results, and should be
avoided.

Best,
Tiago

Dear Tiago,
Thanks for your answer.
Sorry for the delay in my reply, but I was learning how to add a property
from external data.

I just wanted to answer:

> And how do you make this assessment?

So, here is the separation into communities using partition stability, as an
example:
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027150/dib_ps.jpg>
I think the colors represent well sets of densely connected nodes, which is
important at least in my current case study, where links represent
causality.

Best,

Yérali.

Dear Yerali,

I think you are not really interested in the "communities" but just the
dense subgraphs. As Tiago already very nicely explained, these two are
quite different things. The existence of dense subgraphs might just be a
result of statistical fluctuations in the underlying model or they may
simply arise because of the particular form of a degree sequence. As he
also warned, if you make the mistake of treating these as "real" communities
in the underlying generative model, you may infer very wrong conclusions
about the model that may not be valid about a different realization of the
same model. On the other hand if you really care _only_ about a given
realization, it is perhaps fine to ignore the underlying generative model.
You might like to have a look at this paper, which explains these things:

http://link.springer.com/article/10.1007/s41109-017-0023-6

Regards
Snehal Shekatkar


This is a perfect example of what I am trying to convey to you.

What you are seeing are basically three groups of nodes with low degree
around nodes with high in-degree, in addition to some smaller groups. Any
random graph with the same degree sequence will admit similar partitions,
although they carry no meaning from a generative point of view. You can
check this by randomizing your graph with random_rewire() and then obtaining
the partition again with the same method. You will probably see a similar
division. In other words, these are not statistically significant
communities; they arise out of random fluctuations.

Best,
Tiago

Hello again,

Thanks for your important answer. This is indeed a very good example. Maybe
I should have clarified from the beginning that, in my current study, I am
interested in single realizations of the network. Since the links of my
system are based on causality, the importance here lies in the flow across
the links. And, as in the case of networks for information flow, time-scale
separation works perfectly as a separator into communities. So, *I am still
interested in the same partition even if it also appears in any random graph
with the same degree sequence*. In this sense, the colors represent well the
sets of densely connected nodes in which I am interested. Sorry for this
long thread; maybe for you everything was trivial from the beginning, but
for me it has been a very didactic and now intuitive lesson on how community
detection depends on the particularities of the problem of interest.

Thanks and Best Regards,

Yérali.