Please advise on interpreting SBM results

Hi everybody,

I'm not an expert on graph theory, so forgive me if I’m misunderstanding
something. I have a dataset (V=2.5k; E=55k) representing biological entities
and edges linking them based on a similarity measure. This dataset is very
heterogeneous, with a giant component just shy of 2k nodes and, at the same
time, about 200 singletons. To ease the process I’ve filtered out the
connected components with fewer than 4 nodes, leaving only 2.2k nodes. Upon
inspection the graph seems to reveal many quasi-cliques, even in the giant
component. Some of these “putative clusters” are mostly isolated while
others have a lot of links outward, but usually each one has some unique
biological properties.
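
For reference, the component filtering described above can be sketched
roughly as follows. This is not graph-tool code: it uses only the Python
standard library, and the function name `filter_small_components`, the
edge-list input and the `min_size` parameter are illustrative assumptions.

```python
from collections import defaultdict, deque

def filter_small_components(edges, nodes, min_size=4):
    """Return the nodes lying in connected components with >= min_size nodes."""
    # Build an undirected adjacency map from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, kept = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects the component containing `start`.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        # Keep the component only if it is large enough.
        if len(comp) >= min_size:
            kept |= comp
    return kept
```

Singletons are handled by passing every node in `nodes`, including those
that never appear in `edges`.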

My goal is to apply a more disciplined approach and, ideally, get to define
the different communities found. The big communities can be found easily
with any algorithm, but graph-tool has proved really useful as it has also
detected a community of hub nodes that are instances wrongly entered into
the dataset. However, I get some blocks with mixed results. In fact, they
are formed by mostly unconnected “sub-communities”, some of them even coming
from different components of the original graph, with nothing in common
except for their connectivity pattern. As these sub-communities have very
few members (around a dozen nodes at most), I’m assuming that I’m hitting
the resolution threshold even for nSBM. Is that correct? If that is the
case, is there some way to improve the analysis?

Best,

It's wrong to think that different components should always belong to
different groups.

Think of a completely random Erdős–Rényi graph with an average degree
close to one, such that the network is formed by many components. The
correct SBM inference in this case is a model with a single group,
despite the many components. The reason for this is that this division
into components happens by chance, and the nodes that end up together
have no special affinity. If the generative process is run again, the
same nodes will not necessarily belong to the same component.
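
The point about components arising by chance can be checked with a small
simulation. The sketch below is not graph-tool code: it uses only the
Python standard library, and the function names (`er_components`,
`together_pairs`), the graph size and the seeds are illustrative
assumptions. It samples the same Erdős–Rényi model twice and measures how
many node pairs that share a component in the first sample also share one
in the second.

```python
import random
from itertools import combinations

def er_components(n, avg_degree, seed):
    """Sample G(n, p) with p = avg_degree / (n - 1); return a component label per node."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # edge (u, v) is present
                parent[find(u)] = find(v)  # merge the two components
    return [find(x) for x in range(n)]

def together_pairs(labels):
    """Return the set of node pairs (u, v), u < v, that share a component."""
    members = {}
    for node, label in enumerate(labels):
        members.setdefault(label, []).append(node)
    return {pair for nodes in members.values() for pair in combinations(nodes, 2)}

if __name__ == "__main__":
    run_a = er_components(300, 1.0, seed=1)
    run_b = er_components(300, 1.0, seed=2)
    pairs_a, pairs_b = together_pairs(run_a), together_pairs(run_b)
    print("components in run A:", len(set(run_a)))
    print("components in run B:", len(set(run_b)))
    print("fraction of A's pairs also together in B:",
          len(pairs_a & pairs_b) / len(pairs_a))
```

If component membership carried real information about the nodes, the
overlap between the two runs would be high. Under this model it should be
small, which is why a single group remains the appropriate description.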

You should view your results in the same way: nodes end up being grouped
together unless there is clear evidence pointing to the contrary.

Best,
Tiago