thanks for your reply
``It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?``
This is clear to me. However, 2 GB × 10 ≈ 20 GB, and the machine has 260 GB of memory, so unless the algorithm creates 10 copies of the graph within each iteration, it should be within bounds.
The inference algorithm has internal data structures that take more
memory than the original graph. For instance, it needs to keep track of
the number of edges between groups, the degree distribution inside each
group, several partitions during the bisection search, etc. This is all
done with a trade-off in favor of overall speed, rather than memory usage.
Ideally the parallelization would not make a copy of the entire graph. Since the edges are fixed, all it needs to do is create separate vertex properties. I haven't figured that out yet; I was hoping someone might have tips on how to do this in graph-tool. There are ways to do this with other objects (data frames, lists), but I am not sure how to approach it with graph-tool graphs.
Graph-tool graphs are not any different than any other data structure in
Python in this regard.
Parallelization with the multiprocessing library is implemented by
spawning multiple processes (via fork()), each with its own copy of the
data. Most systems have a copy-on-write implementation, where initially
the cloned processes point to the same memory. But as soon as this is
modified, the memory usage is duplicated. Sometimes in Python it can
happen that a read will trigger a copy because of the reference
counting, but this should not happen with Graph objects.
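The copy-on-write behavior described above can be seen with a stdlib-only sketch (no graph-tool needed): a large object defined before the fork() is inherited by the workers and read without being pickled or re-sent. The names here (BIG_DATA, work) are illustrative, not part of any library API.

```python
import multiprocessing as mp

# Large read-only object created at module level, before the fork().
# With the "fork" start method, the children initially share these
# memory pages with the parent; only writing to them triggers a copy.
BIG_DATA = list(range(1_000_000))

def work(i):
    # Read-only access to the inherited global: BIG_DATA is not
    # pickled and sent to the worker, it is already in its memory.
    return BIG_DATA[i] * 2

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # POSIX only; not available on Windows
    with ctx.Pool(4) as pool:
        results = pool.map(work, range(10))
    print(results)
```

The same pattern applies to a Graph object: build it once in the parent, and let the forked workers read it. Each worker's own BlockState, however, is new memory that cannot be shared.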
But I think in your case the memory use that you are experiencing is
simply due to the 10 different BlockState objects being created,
together with the memory usage required by the minimize_blockmodel_dl()
algorithm.
I don't think there is very much you can do, except use a model that
requires less internal book-keeping, e.g. PPBlockState.
Best,
Tiago