Memory usage

I was trying to fit a nested degree-corrected SBM on my laptop, but I kept
running out of RAM: usage climbed to 14-15 GB before the process stopped with
"Killed". The graph is directed, with 20k nodes and 2.8M edges.

Is that a typical amount of memory usage, or did I do something wrong? If
it's normal, is there any way for me to work around it so that I can run it
on my computer?


This thread might answer your question:

http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/SBM-on-Dense-Graphs-td4027902.html#a4027907

Cheers,

Dr. Felix Victor Münch, Postdoc Researcher
Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI), Hamburg
https://leibniz-hbi.de/
https://felixvictor.net/


Thanks, I got it set up. The memory usage is now about 2 GB!

I was testing with a smaller graph and noticed that the sweeps are much
slower with multiflip. Is this a fundamental trade-off between speed and
memory usage, or is there some setting to make it faster?

Thanks


Hi,

I see. To initialize the NestedBlockState I'm doing this:

import numpy as np
from graph_tool.all import NestedBlockState
bs = [np.zeros(1)] * 10
state = NestedBlockState(g=g, bs=bs, sampling=True, deg_corr=True)

Is there a better way to initialize it so that this process is faster?
Running "minimize_nested_blockmodel_dl" once is seemingly out of the
question because of its memory usage.
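
For reference, a minimal sketch of continuing from the state above with greedy merge-split sweeps; this assumes the NestedBlockState method multiflip_mcmc_sweep() with beta=np.inf for straight minimization rather than sampling, and the pass counts are purely illustrative:

# continuing from the `state` defined above; only a single copy of the
# state is kept in memory during these sweeps
for i in range(100):                      # number of passes: illustrative, tune as needed
    state.multiflip_mcmc_sweep(niter=10, beta=np.inf)

print(state.entropy())                    # description length of the current fit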

Also, I notice that only one of my CPU cores is being utilized. Is there no
multicore low-memory way of doing this?

Thanks!


In the linked thread (http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/SBM-on-Dense-Graphs-td4027902.html#a4027907) it is written:

(Remember to replace mcmc_sweep() with multiflip_mcmc_sweep() in the code
above. I had forgotten to make this change the last release...)

Does this mean that one should also set "multiflip=True" in mcmc_equilibrate?

d

Yes.
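
For example, a minimal sketch, assuming graph_tool.all is imported as gt and state is the NestedBlockState built as in the earlier messages (the niter value is only illustrative):

import graph_tool.all as gt

# equilibrate using merge-split (multiflip) moves instead of single-node moves
gt.mcmc_equilibrate(state, multiflip=True, mcmc_args=dict(niter=10))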

Dear community,

I have been following this thread as I have a similar predicament: trying to run the nested block model on a graph with 1 million nodes and 12 million edges.
Since I have read about the computational cost from others, I have got access to a server with 500 GB RAM and ~120 CPU threads.
Based on this graph's size and your prior experience, which approach would you aim for:

1) minimize_nested_blockmodel_dl(g)

or

2) prime an empty object and MCMC-equilibrate with multiflip:
bs = [np.zeros(1)] * 6
state = gt.NestedBlockState(g=g, bs=bs, sampling=True)
gt.mcmc_equilibrate(state, multiflip=True)

I am very grateful for the guidance!
(I wonder how long this will take to run…?)

James

I had success with the latter method, although it seems the hardware you
have is a bit of an overkill (I did a ~3M-edge graph on a laptop with 16 GB
of RAM, with memory to spare). I'm not sure whether it's possible to utilize
multithreading for this, but if not, those threads you have won't help much.
Still hoping to get an answer about multithreading from someone more
knowledgeable.


For the first version I needed 400-500GB RAM, so that wouldn't be overkill. The threads however seem pretty much useless, indeed.

However, as the latter version seems so much more memory efficient, I am wondering why it is not implemented as the default, in the form of a function call with the same prominence in the documentation. I would love to hear Tiago or anyone else chime in on this. What's the advantage of the boiler-plate minimize_nested_blockmodel? Is there any, apart from fewer lines of code?

Cheers,

Felix


For the first version I needed 400-500GB RAM, so that wouldn't be
overkill. The threads however seem pretty much useless, indeed.

500 GB for a network with 3M edges seems absurd. You should be able to
do it with an order of magnitude less memory, at least.

However, as the latter version seems so much more memory
efficient, I am wondering why it is not implemented as the default, in
the form of a function call with the same prominence in the documentation.
I would love to hear Tiago or anyone else chime in on this. What's the
advantage of the boiler-plate minimize_nested_blockmodel? Is there any,
apart from fewer lines of code?

The algorithms are not the same, and incur different trade-offs.

The one in minimize_nested_blockmodel_dl() attempts to bracket the model
complexity with a one-dimensional bisection search, and requires more
memory because it needs to keep several copies of the global state
during the search.

On the other hand, multiflip_mcmc_sweep() implements merge-split moves,
and keeps only a single state around, hence the improved memory usage.

Although it depends on the network, minimize_nested_blockmodel_dl()
tends to find better estimates of the ground state (i.e. solutions with
the smallest description length) in a shorter time, at the cost of higher
memory usage.
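
(As a concrete illustration of that trade-off, on a graph where both routes fit in memory one can simply compare the description lengths they reach; the following is only a sketch, assuming g is the graph at hand and using illustrative sweep counts:)

import numpy as np
import graph_tool.all as gt

# Route 1: bisection search (more memory, often a better optimum)
state1 = gt.minimize_nested_blockmodel_dl(g)

# Route 2: a single state optimized with greedy merge-split sweeps
bs = [np.zeros(1)] * 10
state2 = gt.NestedBlockState(g=g, bs=bs, sampling=True, deg_corr=True)
for i in range(100):                      # illustrative number of passes
    state2.multiflip_mcmc_sweep(niter=10, beta=np.inf)

# smaller value = shorter description length = better fit under MDL
print(state1.entropy(), state2.entropy())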

The next version of the library will include an improved version of
multiflip_mcmc_sweep() that I am preparing, and this alternative will
gain prominence in the documentation.

Best,
Tiago

Can multiflip_mcmc_sweep() be run in parallel, now or in the future
(on CPU or GPU)?


No, MCMC is inherently difficult to parallelize.
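
(For what it's worth, and assuming graph-tool's top-level OpenMP helpers openmp_enabled(), openmp_get_num_threads() and openmp_set_num_threads() are available as in recent releases, you can at least inspect or cap what the OpenMP-enabled parts of the library do with your cores; the MCMC sweeps themselves remain essentially serial, as stated above:)

import graph_tool as gt

print(gt.openmp_enabled())           # was the library compiled with OpenMP support?
print(gt.openmp_get_num_threads())   # threads available to OpenMP-enabled routines
gt.openmp_set_num_threads(4)         # optionally cap the number of threads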

On 27.02.20 at 10:19, Felix Victor Münch wrote:

For the first version I needed 400-500GB RAM, so that wouldn't be
overkill. The threads however seem pretty much useless, indeed.

500 GB for a network with 3M edges seems absurd. You should be able to
do it with an order of magnitude less memory, at least.

Yes, you’re absolutely right, sorry. It was > 60 million edges in my case. My bad for not reading closely enough (and not having provided that information in this thread before).

The next version of the library will include an improved version of
multiflip_mcmc_sweep() that I am preparing, and this alternative will
gain prominence in the documentation.

Looking forward to testing it. Thanks for the explanation!