Hi Jean,


I fitted a hierarchical SBM to a Twitter follow network (actually quite sparse compared to other kinds of networks) with ca. 250k accounts. It took about "11 days for one run, with 60 vCPUs and a peak usage of ca. 400 GB RAM" (my PhD thesis, https://eprints.qut.edu.au/125543/, p. 251). Whether it is theoretically useful depends on your goals and discipline; for my case, feel free to read the referenced chapter and decide for yourself.

I also did runs on smaller networks with ca. 100k-150k nodes. Running time was still in the same ballpark, or even longer, I guess because the network structure was more complex.

That amount of RAM is pretty much mandatory. Swap memory won't help you much, as it slows the algorithm down to a degree that makes the running time infeasible. The number of CPUs could probably be lower: much of the algorithm seemed to run serially, and those serial parts made up most of the calculation time.
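
If the serial bottleneck holds for your workload too, you can cap graph-tool's thread count instead of paying for idle cores. A minimal sketch (the thread count is just an example value):

import graph_tool as gt

# graph-tool parallelises some routines via OpenMP, but the MCMC parts
# are largely serial, so more threads quickly hit diminishing returns
gt.openmp_set_num_threads(16)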

I had a VM with 60 cores and 900 GB RAM, provided by my university via QRIScloud (https://www.qriscloud.org.au/). On a Google VM this would have been pretty costly. I adjusted epsilon to trade accuracy for speed:

import graph_tool as gt
import graph_tool.inference  # makes gt.inference available

# `core` is the graph to fit; a larger epsilon loosens the convergence
# criterion of the MCMC equilibration, trading accuracy for speed
state = gt.inference.minimize_nested_blockmodel_dl(
    core, verbose=True, mcmc_equilibrate_args={'epsilon': 1e-2})
(https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb)
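
As far as I understand, that epsilon is forwarded to gt.mcmc_equilibrate() as its convergence tolerance, so if you have time left after the initial fit you can keep refining the returned state with a stricter value. A rough sketch (parameter values are only examples; depending on the graph-tool version you may first need state = state.copy(sampling=True)):

import graph_tool.inference as gti

# continue equilibrating the fitted state with a tighter tolerance;
# smaller epsilon = stricter convergence = more accurate but slower
gti.mcmc_equilibrate(state, epsilon=1e-4, wait=10,
                     mcmc_args=dict(niter=10))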

If I hadn't had the computing resources for free, I wouldn't have done it.

If you're looking for a more efficient alternative, I'd recommend testing Infomap (https://www.mapequation.org/index.html), which also works by minimising an entropy-based quantity (even though it's more flow-oriented). Just yesterday I ran a (non-hierarchical) community detection on a 181k-node network in a matter of seconds on last year's MacBook Pro. I don't know how long a hierarchical structure takes, but it can do that too, so it's worth a try.
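
If you try it, there's also a Python package these days (pip install infomap). I haven't benchmarked it myself, so take this as an untested sketch:

from infomap import Infomap

# flags mirror the command-line tool; --two-level forces a flat
# (non-hierarchical) partition, leave it out for the full hierarchy
im = Infomap("--directed --two-level")
for source, target in edges:  # `edges` stands in for your edge list
    im.add_link(source, target)
im.run()
print(im.num_top_modules, "modules found")
for node_id, module_id in im.modules:
    print(node_id, module_id)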

Also efficient, but with all the drawbacks that Tiago Peixoto elaborates on in his papers (which I also refer to in the chapter linked above), and not hierarchical, is the parallelised modularity maximisation (Parallel Louvain Method, PLM) in NetworKit (https://networkit.github.io/). Despite its theoretical and statistical drawbacks, it delivers good heuristic evidence for communities in networks imho. But that depends a lot on what you want to do.
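
A minimal PLM sketch in NetworKit (file name and edge-list format are just examples):

import networkit as nk

# read an edge list and run the Parallel Louvain Method;
# refine=True adds a local-move refinement phase after each pass
G = nk.readGraph("network.edgelist", nk.Format.EdgeListSpaceZero)
plm = nk.community.PLM(G, refine=True)
plm.run()
partition = plm.getPartition()
print(partition.numberOfSubsets(), "communities")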


Cheers,


Felix



Dr. Felix Victor Münch
Postdoctoral Researcher
Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI), Hamburg


On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko <Haiko.Lietz@gesis.org> wrote:
Hi Jean,

the answer also depends on how complicated the desired SBM is. A layered model takes longer than an unlayered one.
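
For illustration, a layered fit looks roughly like this (a sketch only, assuming an edge property map g.ep.layer holding the layer labels; exact keyword arguments depend on the graph-tool version):

import graph_tool.all as gt

# each layer gets its own set of edge counts, which multiplies the
# bookkeeping compared to a plain nested SBM
state = gt.minimize_nested_blockmodel_dl(
    g, state_args=dict(base_type=gt.LayeredBlockState,
                       state_args=dict(ec=g.ep.layer, layers=True)))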

Modeling a graph with 100k nodes should take very long. But I'd also be interested in a more informed answer...

Haiko




From: graph-tool [graph-tool-bounces@skewed.de] on behalf of Jean Christophe Cazes [cazes.jean.christophe@gmail.com]
Sent: Friday, 14 June 2019 09:59
To: graph-tool@skewed.de
Subject: [graph-tool] [SBM on Dense Graphs]

Hello, I intend to use graph_tool for a big network, 100k+ nodes and very dense.

The dataset I'm working with at the moment is a ~40-50 GB CSV containing vertices and edges as transactions.

Is it computationally realistic to try an SBM on such a graph, and would it be theoretically useful?

If it isn't computationally feasible, how big can my subgraph be for it to become feasible?

Note: I will rent a Google Cloud Platform VM to do so.

Thank you

_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool