Community detection: how do 'ghost' edges appear?

Hi Tiago and everyone,

I'm trying the community detection algorithm following the example
from the docs.

Brifely, with a graph g, I'm running the following:

# Plot the full graph
graph_draw(g, size=(15,15), output=output_folder+"/full_graph.png")
# Detect communities
spins = community_structure(g, n_iter=1000, n_spins=N) ### note the N parameter
ng = condensation_graph(g, spins)
size = ng[0].new_vertex_property("double")
size.a = log(ng[1].a+1)
# Plot the community graph
graph_draw(ng[0], vsize=size, vcolor=size, splines=True,
eprops={"len":20, "penwidth":10}, vprops={"penwidth":10},
output=output_folder+"/community_graph.png", size=(10,10))

I notice that when n_spins is small (say, n_spins=6), the resulting
community graph shows links that do not appear on the full graph.
See, for example, this pair of plots:

http://www.ebi.ac.uk/~spivakov/testX/full_graph.png
http://www.ebi.ac.uk/~spivakov/testX/community_graph.png

I'm a little confused as to how this is possible - am I right that
with this algorithm communities are non-overlapping?

With larger n_spins (such as n_spins=50), the ghost link disappears,
but lots of very small communities appear that segment the data too
much:

http://www.ebi.ac.uk/~spivakov/testX/community_graph50.png

More generally, my problem is that I'd like to use this algorithm in a
tool that would need to have a good enough guess of n_spins without
using a trial-and-error approach. It doesn't need to be the best
possible community detection ever, though. So I would also very much
appreciate your thoughts on estimating the most optimal n_spins from
the data.

Thanks a lot!

Best wishes,
Mikhail

Hi,

I notice that when n_spins is small (say, n_spins=6), the resulting
community graph shows links that do not appear on the full graph.
See, for example, this pair of plots:

http://www.ebi.ac.uk/~spivakov/testX/full_graph.png
http://www.ebi.ac.uk/~spivakov/testX/community_graph.png

I'm a little confused as to how this is possible - am I right that
with this algorithm communities are non-overlapping?

It seems to me the community algorithm simply did not find the best
partition. This can happen, since the simulated annealing may converge
to a non-optimal solution. You have to make the cooling rate slower or
start from different initial conditions.

From the image you sent, it is not possible to assess visually the

quality of the result. You should plot the first graph, and make the
vertex colors correspond to the community values.

More generally, my problem is that I'd like to use this algorithm in a
tool that would need to have a good enough guess of n_spins without
using a trial-and-error approach. It doesn't need to be the best
possible community detection ever, though. So I would also very much
appreciate your thoughts on estimating the most optimal n_spins from
the data.

This is a hard problem to solve in general. It is usually the case that
you should have some insight on the total number of communities, and
AFAIK there is no known reliable method which will give you this
information easily. If your networks are not too large, you can try
increasing n_spins values, and stop if the modularity no longer changes
appreciably.

Cheers,
Tiago