Hi everybody,
I'm not an expert on graph theory, so forgive me if I’m misunderstanding
something. I have a dataset (V=2.5k; E=55k) representing biological entities
and edges linking them based on a similarity measure. This dataset is very
heterogeneous: it has a giant component of just under 2k nodes and, at the same
time, about 200 singletons. To ease the process, I’ve filtered out the connected
components with fewer than 4 nodes, leaving only 2.2k nodes. Upon inspection,
the graph seems to reveal many quasi-cliques even in the giant component.
Some of these “putative clusters” are mostly isolated, while others have many
outward links, but each one usually has some unique biological
properties.
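A minimal plain-Python sketch of the filtering step (dropping connected components with fewer than 4 nodes). It assumes the graph is available as a node list plus an undirected edge list; `filter_small_components` and the toy data are hypothetical illustrations. Within graph-tool itself, `graph_tool.topology.label_components` returns per-vertex component labels and a component-size histogram, which could drive a vertex filter instead.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    comps = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects its component
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def filter_small_components(nodes, edges, min_size=4):
    """Keep only nodes in components with at least min_size nodes,
    plus the edges among them (original node order is preserved)."""
    keep = set()
    for comp in connected_components(nodes, edges):
        if len(comp) >= min_size:
            keep |= comp
    kept_nodes = [n for n in nodes if n in keep]
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return kept_nodes, kept_edges


# Toy graph: a 4-node path, a 2-node pair and a singleton
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
kept_nodes, kept_edges = filter_small_components(nodes, edges)
print(kept_nodes)  # ['a', 'b', 'c', 'd']
```

With `min_size=4`, only the 4-node path survives; the pair and the singleton are removed together with their edges.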
My goal is to apply a more disciplined approach and, ideally, get to define
the different communities found. The big communities can be found easily
with any algorithm, but graph-tool has proven really useful, as it also
detected a community of hub nodes that are instances wrongly entered into the
dataset. However, I get some blocks with mixed results. In fact, they are
formed by mostly unconnected “sub-communities”, some of them even coming
from different components of the original graph, with nothing in common
except for their connectivity pattern. As these sub-communities have very
few members (around a dozen nodes at most), I’m assuming that I’m hitting
the resolution limit even with the nested SBM (nSBM). Is that correct? If so,
is there some way to improve the analysis?
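One way to quantify the “mixed” blocks is to count, for each block, how many connected pieces its induced subgraph falls into when only intra-block edges are kept. Note that an SBM block groups nodes with similar connection patterns, so it is not required to be internally connected. The sketch below is a hypothetical helper in plain Python using union-find; it assumes a `blocks` dict mapping every node to a block label (for example, labels taken from the lowest level of a nested fit) and an undirected edge list. How the labels are extracted from graph-tool is not shown and is left as an assumption.

```python
from collections import defaultdict


def block_fragmentation(edges, blocks):
    """For each block label, count the connected components of the subgraph
    induced by that block's nodes, using only intra-block edges.
    Assumes every edge endpoint appears as a key in `blocks`."""
    parent = {n: n for n in blocks}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if blocks[u] == blocks[v]:  # ignore edges that cross between blocks
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    roots_per_block = defaultdict(set)
    for n, b in blocks.items():
        roots_per_block[b].add(find(n))
    return {b: len(roots) for b, roots in roots_per_block.items()}


# Toy example: block 0 is internally connected; block 1 splits into two pieces
blocks = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1, "g": 1}
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g"), ("c", "d")]
print(block_fragmentation(edges, blocks))  # {0: 1, 1: 2}
```

Blocks with a count well above 1 are the ones made of disconnected sub-communities, which makes it easier to list exactly which blocks need a closer look.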
Best,