Speeding up minimize_nested_blockmodel_dl on large networks?

Hi, I am having some trouble reproducing the performance of
minimize_nested_blockmodel_dl on large networks as in the publication
"Hierarchical block structures and high-resolution model selection in large
networks"? Namely, for large graphs, even with verbose=True, no output is
generated, but the CPU usage stays at 100% for hours to days. My network
contains 1.5 million nodes and a sorted adjacency list per node such that I
can choose the top K edges per node as a parameter. I have tried sampling
down to 200,000 nodes and K=5, but minimize_nested_blockmodel_dl does not
seem to be proceeding with computation. CPU is being used at 100% while
running , and there is plenty of free memory.

Are there options that I can use in minimize_nested_blockmodel_dl to improve
performance? Other than sampling and limiting K, are there other strategies
that I can try? I compiled using the parallel computing option and am using
AWS EC2 instances. The largest network that I have been able to get a result
for within 24 compute hours on C3 sized EC2 instances has been 100,000
nodes, K=5. (undirected, average degree about 8)

There are several options in the algorithm which affect performance. The
most important ones are 'epsilon', 'nsweeps', 'nmerge_sweeps' and 'r'
(see the documentation for their meanings). The default values should be
OK for large networks, except for 'epsilon', which has a default value
of '0'. You should try a value of 1e-3, or even 1e-2, depending on the
size of your network.

I recommend you experiment with the algorithm with small networks first
(~10000 nodes, something you can easily try on a laptop) to get a
feeling on how the parameters affect the performance and quality of the
results.

I would also recommend trying minimize_blockmodel_dl(), since it is more
verbose than minimize_nested_blockmodel_dl(), and will give you a better
idea of how the algorithm is progressing. If you use 'verbose=True' you
should be able to see how fast each sweep of the network is performed.

I have no experience with AWS EC2, but I was able to obtain results for
very large networks with a regular dedicated computer cluster, as I
describe in the paper. However, for the networks with many millions of
edges, it does take a while, maybe even a day or two.

Best,
Tiago

This worked! Thanks.