minimize_nested_blockmodel_dl speed and memory tweaks

G'day!

I'm struggling with a network of millions of nodes and billions of edges
(the follow network of Australian Twitter users), trying to apply
gt.inference.minimize_nested_blockmodel_dl().

I'm reducing the size by analysing k-cores and increasing k step-wise to
test the limits. With 19k nodes and 12m edges I'm already exceeding the 64GB
of RAM my machine currently has, so given the algorithm's O(N ln^2 N)
scaling I don't expect the whole network to fit, but I'd like to push the
limit as far as possible.
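For reference, the reduction step looks roughly like this (the filename is
just a placeholder; filtering by core number >= k gives the k-core):

```python
import graph_tool.all as gt

g = gt.load_graph("twitter_follow.gt")  # placeholder filename

# core number of every vertex; the k-core is the set of
# vertices with core number >= k
kcore = gt.kcore_decomposition(g)

k = 20  # increased step-wise
mask = g.new_vertex_property("bool")
mask.a = kcore.a >= k
core = gt.GraphView(g, vfilt=mask)
print(core.num_vertices(), core.num_edges())
```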

The only parameter I've set so far is `mcmc_equilibrate_args={'epsilon': 1e-3}`.
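So the call is roughly the following (keyword names follow the graph-tool
version I'm on, which accepts mcmc_equilibrate_args; they may differ in newer
releases):

```python
state = gt.minimize_nested_blockmodel_dl(
    core,  # the k-core GraphView from above
    mcmc_equilibrate_args={'epsilon': 1e-3},
)
print(state.entropy())  # description length of the fit
```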

I have four questions:

    1. I've tried mcmc_args={'sequential': False, 'parallel': True}, but
this did not seem to have much impact on speed with my smaller k-cores.
Could it make a bigger difference on larger ones?

    2. What does the warning about parallel sampling here
(https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference.BlockState.mcmc_sweep)
mean in the worst case, and when is it likely to actually apply?

    3. I am considering either trying to fund a machine with more
(expensive) RAM or simply using swap space on freely available disk. What
impact on speed would you expect? From the behaviour of the algorithm so far
it doesn't look I/O bound, but rather limited by available memory and by the
sequential parts of the algorithm that run on a single core. Is swapping a
viable option, or should I rent a bigger machine straight away?

    4. Is there a rule of thumb for epsilon? And what is the impact if it is
too large?

Thanks for any help (and perhaps also hints on how to reduce the network
size with something more sophisticated than k-cores, while minimising the
impact on the results of the nested SBM).

Cheers,

Felix

    1. I've tried mcmc_args={'sequential': False, 'parallel': True}, but
this did not seem to have much impact on speed with my smaller k-cores.
Could it make a bigger difference on larger ones?

It should give you a 2-3x speedup, but not much more, as part of the
algorithm (the actual node updates) has to be serial.
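
This is essentially Amdahl's law: the serial part bounds the overall speedup
no matter how many cores you add. A quick illustration (the 40% serial
fraction below is an assumed figure for illustration only, not a measured
value):

```python
# Amdahl's law: speedup is bounded by 1/serial_fraction,
# regardless of the number of cores.
def amdahl_speedup(serial_fraction, n_cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

for cores in (2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.4, cores), 2))
# with ~40% serial work the speedup saturates near 1/0.4 = 2.5x
```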

    2. What does the warning about parallel sampling here
(https://graph-tool.skewed.de/static/doc/inference.html#graph_tool.inference.BlockState.mcmc_sweep)
mean in the worst case, and when is it likely to actually apply?

I don't know; this is why the warning is there. I generally advise against
using the parallel code, since it does not make things that much faster to
begin with. I'm even considering removing it in a future version.

    3. I am considering either trying to fund a machine with more
(expensive) RAM or simply using swap space on freely available disk. What
impact on speed would you expect? From the behaviour of the algorithm so far
it doesn't look I/O bound, but rather limited by available memory and by the
sequential parts of the algorithm that run on a single core. Is swapping a
viable option, or should I rent a bigger machine straight away?

The memory access during the algorithm is random, so using swap will
probably be prohibitively slow.
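
A back-of-the-envelope comparison of random-access latencies shows why (the
figures below are typical order-of-magnitude values, not measurements of
your setup):

```python
# typical order-of-magnitude random-access latencies (assumed values)
ram_ns = 100           # DRAM random access: ~100 ns
ssd_ns = 100_000       # SSD random 4K read: ~100 us
hdd_ns = 10_000_000    # spinning-disk seek: ~10 ms

print("swap on SSD: ~%dx slower per random access" % (ssd_ns // ram_ns))
print("swap on HDD: ~%dx slower per random access" % (hdd_ns // ram_ns))
```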

    4. Is there a rule of thumb for epsilon? And what is the impact if it is
too large?

If it is too large, your estimate of the ground state (the best hierarchical
partition) will be worse. Simply put, epsilon gives you a quality/speed
trade-off.
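
A quick way to get a feeling for this is to compare runs with different
epsilon values on a small graph. A minimal sketch, assuming the same
older-API keyword you are already passing (it may differ in newer graph-tool
releases):

```python
import time
import graph_tool.all as gt

g = gt.collection.data["polblogs"]  # small test graph from gt.collection

for eps in (1e-1, 1e-3):
    start = time.time()
    state = gt.minimize_nested_blockmodel_dl(
        g, mcmc_equilibrate_args={'epsilon': eps})
    print("epsilon=%g  DL=%.1f  time=%.1fs"
          % (eps, state.entropy(), time.time() - start))
# a larger epsilon typically finishes sooner but yields a larger
# description length, i.e. a worse estimate of the ground state
```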

Best,
Tiago