Wow, I'll try it and report my results and computing time if you want :)
Thank you for the answers and for graph_tool!

On Fri, Jun 14, 2019 at 13:46, Tiago de Paula Peixoto <tiago@skewed.de> wrote:
Hi,

The important quantity is the number of edges E, not the number of nodes.

The functions minimize_nested_blockmodel_dl() and minimize_blockmodel_dl()
have a complexity of O(E log² N), but the nested version takes longer.
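
As a rough illustration of what that bound implies (the numbers below are
hypothetical, not from any benchmark):

    # Relative growth of the O(E log^2 N) bound between two graph sizes.
    from math import log2

    def cost(E, N):
        return E * log2(N) ** 2

    # 100x the edges and 100x the nodes -> only ~2x extra from the log
    # factor, i.e. roughly 200x the work in total, not 10000x.
    print(cost(1e8, 1e7) / cost(1e6, 1e5))  # ~196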

As was already mentioned, the nested model has a large memory footprint,
since several state copies need to be kept.

There is an alternative approach that takes far less memory, which is to
instantiate a NestedBlockState object and just call multiflip_mcmc_sweep()
until equilibration. This is very competitive with the functions above, but
tends to finish much quicker and to use less memory.

This is covered in the HOWTO:

    https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#id15

(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code
above. I had forgotten to make this change in the last release...)
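
A minimal sketch of this approach (the dataset and the sweep count below are
placeholders; adapt them to your own graph and check convergence yourself):

    import numpy as np
    import graph_tool.all as gt

    # A small built-in dataset stands in for your own graph.
    g = gt.collection.data["celegansneural"]

    state = gt.NestedBlockState(g)

    # Zero-temperature sweeps (beta=np.inf): only moves that decrease the
    # description length are accepted, so this acts as a minimizer.
    for i in range(100):  # illustrative; stop when the entropy plateaus
        state.multiflip_mcmc_sweep(beta=np.inf, niter=10)

    print(state.entropy())  # final description length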

Best,
Tiago


On 14.06.19 at 12:06, Felix Victor Münch wrote:
> No worries. I'd like to add that, as Haiko suggests, the non-hierarchical
> SBM is much quicker and needs orders of magnitude less RAM. So it might also
> run on a recently bought laptop in very acceptable time, even at your
> network scale. I'm not sure how network density affects this, though.
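>
> A minimal sketch of the non-hierarchical fit (the dataset below is a toy
> stand-in for your own graph):
>
>     import graph_tool.all as gt
>
>     g = gt.collection.data["celegansneural"]  # replace with your graph
>     state = gt.minimize_blockmodel_dl(g)      # flat SBM, far lighter than nested
>     print(state.entropy())                    # description length of the fit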
>
> Cheers,
>
> Felix
>
>
>
>     On Friday, Jun 14, 2019 at 11:59 AM, Jean Christophe Cazes
>     <cazes.jean.christophe@gmail.com> wrote:
>     Thank you so much for those quick answers!
>
>     On Fri, Jun 14, 2019 at 11:57, Felix Victor Münch
>     <felix.v.muench@gmail.com> wrote:
>
>         Hi Jean,
>
>
>         I ran a hierarchical SBM on a Twitter follow network (actually
>         pretty sparse in comparison to other kinds of networks) with ca.
>         250k accounts. It took about "11 days for one run, with 60 vCPUs
>         and a peak usage of ca. 400 GB RAM" (my PhD
>         thesis, https://eprints.qut.edu.au/125543/, p. 251). How useful it
>         is theoretically depends on your goals and discipline. As for my
>         case, feel free to read the referenced chapter and decide for yourself.
>
>         I also did runs on smaller networks with ca. 100k-150k nodes.
>         Running time was still in the same ballpark, or even longer …
>         I guess because the network structure was more complex.
>
>         That amount of RAM is pretty much mandatory. Swap memory won't help
>         you much, as it slows the algorithm down to the point that the
>         running time becomes infeasible. The number of CPUs could be lower,
>         I guess, because much of the algorithm seemed to run serially, and
>         those parts made up most of the calculation time.
>
>         I had a university/QRIScloud-provided VM
>         (https://www.qriscloud.org.au/) with 60 cores and 900 GB RAM. On a
>         Google VM this would have been pretty costly. I adjusted epsilon
>         for less accuracy and greater speed:
>
>             # A larger epsilon stops equilibration earlier: faster, less accurate.
>             state = gt.inference.minimize_nested_blockmodel_dl(
>                 core, verbose=True,
>                 mcmc_equilibrate_args={'epsilon': 1e-2},
>             )
>
>         (https://github.com/FlxVctr/PhD-code/blob/master/1000%2B%20nested%20SBM.ipynb)
>
>         If I hadn't had the computing resources for free, I wouldn't
>         have done it.
>
>         I'd recommend testing Infomap
>         (https://www.mapequation.org/index.html) if you're looking for a
>         more efficient alternative that also works by entropy minimization
>         (even though it's more flow-oriented). Just yesterday I ran a
>         community detection (non-hierarchical) on a 181k-node network in a
>         matter of seconds on last year's MacBook Pro. I don't know how long
>         it takes for a hierarchical structure, but it can do this, so it's
>         worth a try.
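>
>         A minimal sketch with the infomap Python package (the API shown is
>         the current one and may differ from older releases; the edge list
>         is a toy stand-in):
>
>             from infomap import Infomap
>
>             # Two-level (non-hierarchical) map equation optimisation.
>             im = Infomap("--two-level")
>             for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
>                 im.add_link(u, v)
>             im.run()
>
>             print(im.num_top_modules, "modules found")
>             for node_id, module_id in im.modules:
>                 print(node_id, module_id)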
>
>         Also efficient, but not hierarchical, and with all the drawbacks
>         that Tiago Peixoto elaborates on in his papers (which I also refer
>         to in the chapter linked above), is the parallelised modularity
>         maximisation (Parallelised Louvain Method, PLM) in NetworKit
>         (https://networkit.github.io/). Despite its theoretical and
>         statistical drawbacks, it delivers good heuristic evidence for
>         communities in networks, imho. But that depends a lot on what you
>         want to do.
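>
>         A minimal sketch, assuming NetworKit is installed (the random graph
>         below is just a stand-in; use nk.readGraph() for real data):
>
>             import networkit as nk
>
>             # A random graph stands in for your own network.
>             G = nk.generators.ErdosRenyiGenerator(1000, 0.01).generate()
>
>             # Parallelised Louvain Method; refine=True trades speed for quality.
>             plm = nk.community.PLM(G, refine=True)
>             partition = nk.community.detectCommunities(G, algo=plm)
>
>             print(partition.numberOfSubsets(), "communities")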
>
>
>         Cheers,
>
>
>         Felix
>
>
>
>         *Dr. Felix Victor Münch*
>         Postdoc Researcher
>         Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI),
>         Hamburg
>         https://leibniz-hbi.de/
>         https://felixvictor.net
>
>
>             On Friday, Jun 14, 2019 at 10:16 AM, Lietz Haiko
>             <Haiko.Lietz@gesis.org> wrote:
>             Hi Jean,
>
>             the answer also depends on how complicated the desired SBM is.
>             A layered model takes longer than an unlayered one.
>
>             Modeling a graph with 100k nodes should take a very long time.
>             But I'd also be interested in a more informed answer...
>
>             Haiko
>
>
>
>             ----------------------------------------------------------------------------
>             *From:* graph-tool [graph-tool-bounces@skewed.de] on behalf of
>             Jean Christophe Cazes [cazes.jean.christophe@gmail.com]
>             *Sent:* Friday, June 14, 2019 09:59
>             *To:* graph-tool@skewed.de
>             *Subject:* [graph-tool] [SBM on Dense Graphs]
>
>             Hello, I intend to use graph_tool for a big network: 100k+
>             nodes and very dense.
>
>             The dataset I'm working with at the moment is a ~40-50 GB CSV
>             containing vertices and edges as transactions.
>
>             Is it realistic to try an SBM on such a graph computationally,
>             and would it be theoretically useful?
>
>             If it isn't computationally feasible, how big can my subgraph
>             be for this to become feasible?
>
>             Note: I will rent a Google Cloud Platform VM to do so.
>
>             Thank you
>


--
Tiago de Paula Peixoto <tiago@skewed.de>

_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool