memory handling when using graph-tool's MCMCs and multiprocessing on large networks

hi,

i was wondering if anyone had any tips on conserving memory when running graph-tool MCMCs on large networks in parallel.

i have a large network (2GB) and about 260GB of memory, and was surprised to get a MemoryError when i ran 10 MCMC chains in parallel using multiprocessing. my impression was that MCMC is relatively light on memory.

cheers,
-sam

As usual, it is not possible to say anything concrete without a minimal
complete working example that shows the problem.

thanks for your reply.

here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

import graph_tool.all as gt
import multiprocessing as mp

g = gt.load_graph("large_graph", fmt="graphml")

N_iter = 10
N_core = 10

def fit_sbm():
    # each worker fits an SBM to the module-level graph g
    state = gt.minimize_blockmodel_dl(g)
    b = state.get_blocks()
    return b

def _parallel_sbm(niter=N_iter):
    with mp.Pool(N_core) as pool:
        future_res = [pool.apply_async(fit_sbm) for m in range(niter)]
        res = [f.get() for f in future_res]
    return res

def parallel_fit_sbm(niter=N_iter):
    return _parallel_sbm(niter)

results = parallel_fit_sbm()

It is clear to you that if you run 10 processes in parallel you will
need 10 times more memory, right?

BTW, your example is not self-contained, because I cannot run it without
the 'large_graph' file.

Maybe you can reproduce the problem for a smaller graph, or show how
much memory you are actually using for a single process. It would also
be important to tell us what version of graph-tool you are running.

Best,
Tiago

thanks for your reply

``It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?``

this is clear to me. however 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be well within bounds

ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all each process needs is its own vertex properties. i haven't figured out how to do that yet, and was hoping someone might have tips on how to do it in graph-tool. there are ways to do this with other objects (data frames, lists), but i'm not sure how to approach it with graph-tool graphs

``Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.``

will do this soon

``this is clear to me. however 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be well within bounds``

The inference algorithm has internal data structures that take more
memory than the original graph. For instance, it needs to keep track of
the number of edges between groups, the degree distribution inside each
group, several partitions during the bisection search, etc. This is all
done with a trade-off in favor of overall speed, rather than memory usage.

``ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all it needs to do is make vertex properties. i haven't figured that out yet. i was hoping someone might have tips on how to do this in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs``

Graph-tool graphs are not any different than any other data structure in
Python in this regard.

Parallelization with the multiprocessing library is implemented by
spawning multiple processes (via fork()), each with its own copy of the
data. Most systems have a copy-on-write implementation, where initially
the cloned processes point to the same memory. But as soon as this is
modified, the memory usage is duplicated. Sometimes in Python it can
happen that a read will trigger a copy because of the reference
counting, but this should not happen with Graph objects.

But I think in your case the memory use that you are experiencing is
simply due to the 10 different BlockState objects being created,
together with the memory required by the minimize_blockmodel_dl()
algorithm itself.

I don't think there is very much you can do, except use a model that
requires less internal book-keeping, e.g. PPBlockState.

Best,
Tiago