NSBM vs optimization of modularity

dawe · February 19, 2020, 6:28pm

I’m testing and implementing NSBM for a specific research field (single-cell biology) in which graph partitioning is the standard way to identify groups of cells, and this is done with Louvain (and more recently Leiden) approaches. We have already collected some evidence that NSBM works well and overcomes the issue of setting the resolution (as a matter of fact, one of the FAQs is “where should I set the threshold?”). Now, given a partition obtained with those algorithms, would it make sense to compare it to NSBM using the entropy? I could create a PropertyMap from the Leiden partition and pass it to a BlockState; would that work?
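For readers who want to try what is being asked here, the following is a minimal sketch rather than a definitive recipe. It assumes graph-tool is installed, that `g` is a graph-tool `Graph`, and that `labels` (a hypothetical name) holds one Leiden cluster label per vertex, in vertex-index order. The helper `contiguous_labels` and the function name `partition_description_length` are illustrative and not part of graph-tool.

```python
def contiguous_labels(labels):
    """Map arbitrary cluster labels to 0..B-1 in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(x, len(mapping)) for x in labels]


def partition_description_length(g, labels, deg_corr=True):
    """Description length (in nats) of a fixed partition under a flat SBM.

    Assumption: `labels[i]` is the cluster of the vertex with index i.
    """
    # Imported here so the helper above can be used without graph-tool.
    import graph_tool.all as gt

    # Vertex property map holding the externally computed partition.
    b = g.new_vp("int", vals=contiguous_labels(labels))
    # Wrap the partition in a BlockState without running any inference.
    state = gt.BlockState(g, b=b, deg_corr=deg_corr)
    return state.entropy()
```

The returned value can then be set against, for example, `gt.minimize_blockmodel_dl(g).entropy()`. Note that a flat `BlockState` and a nested state use different priors, so comparing like with like (flat against flat) is probably the safer choice.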

tiago · February 20, 2020, 10:30am

If you do this kind of comparison, the NSBM approach will virtually
always produce results with a smaller description length, since it is
designed to minimize this quantity.

It's true that the description length is a meaningful quantity on its
own, and it can be used to do model selection, i.e. two models can be
compared by how much they compress the data, and the one with the
shortest description length should be selected.
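The comparison described above can be written compactly. As a sketch, assuming the usual definition in which the description length of a network A under a partition b is the negative log of the joint probability, two partitions of the same network can be compared through their posterior odds ratio (the symbols Σ and Λ are notation chosen here, not taken from this thread):

```latex
\Sigma = -\ln P(A \mid b) - \ln P(b) = -\ln P(A, b),
\qquad
\Lambda = \frac{P(b_1 \mid A)}{P(b_2 \mid A)}
        = \frac{P(A, b_1)}{P(A, b_2)}
        = e^{-(\Sigma_1 - \Sigma_2)} .
```

With description lengths measured in nats, a difference of just a few units already corresponds to a strong preference for the partition with the smaller Σ.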

However, I'm afraid that results obtained from modularity maximization
cannot be properly regarded as an inference outcome, so they cannot
really be meaningfully compared to anything else, which itself is
another problem with the method.

If you want to determine the usefulness of modularity maximization in a
particular context, there is a simple test you can do: Just randomize
the data, keeping the same degree sequence, and run the algorithm again.
Almost always it will find many "communities" which are completely
meaningless as explanations of the data. This should be enough to
convince serious researchers that this approach is unreliable.
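The randomization step of this test can be sketched in plain Python, under the assumption that the graph is undirected and simple and is given as a list of node pairs. The function names and the toy graph below are hypothetical; the technique is the standard double-edge swap, which keeps every node's degree fixed.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps=None, seed=0):
    """Randomize an undirected simple graph by double-edge swaps.

    Each accepted swap replaces (a, b), (c, d) with (a, d), (c, b), so
    every node keeps its degree. Swaps that would create a self-loop or
    a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    if n_swaps is None:
        n_swaps = 10 * len(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # try both orientations of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:  # shared endpoint: skip
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:  # would duplicate an edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Degree of every node appearing in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def clique(nodes):
    """All pairs among `nodes`."""
    nodes = list(nodes)
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Toy data: two 5-cliques joined by a single edge.
edges = clique(range(5)) + clique(range(5, 10)) + [(4, 5)]
shuffled = degree_preserving_shuffle(edges, seed=42)
```

Running the community detection method of interest on `shuffled` and on `edges` then shows whether it still reports groups once the structure has been destroyed. graph-tool users can likely get the same randomization from its `random_rewire` function instead of this hand-written version.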

Best,
Tiago

dawe · February 20, 2020, 10:41am

> If you do this kind of comparison, the NSBM approach will virtually
> always produce results with a smaller description length, since it is
> designed to minimize this quantity.

I've tested this myself and it is indeed the case (by many orders of magnitude).
An anecdotal observation on my data: the modularity found by Leiden and the modularity at an NSBM hierarchy level with a comparable number of communities are not that different.

> If you want to determine the usefulness of modularity maximization in a
> particular context, there is a simple test you can do: Just randomize
> the data, keeping the same degree sequence, and run the algorithm again.
> Almost always it will find many "communities" which are completely
> meaningless as explanations of the data. This should be enough to
> convince serious researchers that this approach is unreliable.

I need a DOI for this conversation, as I would like to cite the last sentence when I write the paper.
Thank you

d