Memory requirement estimate for a large graph

Dear Tiago,

Is it possible to get an estimate of the memory requirements of a graph
in graph-tool?
I know that graph-tool is built upon C++ and Boost, and the adjacency list
is stored via a hash-map.
Apart from the cost of storing vertex and edge indices as `unsigned
long`, what is the memory overhead of the structures used to store the
graph?

For example, for a network of 1M vertices and 100M links without
attributes, how much real memory should I plan to use, excluding
temporaries?

Sorry if this question has been asked before, but I could not find it in
the previous mailing list posts.

Regards,
Carlo


> Dear Tiago,
>
> Is it possible to get an estimate of the memory requirements of a graph
> in graph-tool?

Yes, it is, and I should put this in the documentation somewhere.

> I know that graph-tool is built upon C++ and Boost, and the adjacency
> list is stored via a hash-map.

Not quite; we use an adjacency list based on std::vector<>.

> Apart from the cost of storing vertex and edge indices as `unsigned
> long`, what is the memory overhead of the structures used to store the
> graph?

We use a vector-based bidirectional adjacency list, so each edge appears
twice: once in the source's out-list and once in the target's in-list.
Each entry consists of two size_t (uint64_t) values, the target/source
vertex and the edge index, so we need 2 * 16 = 32 bytes per edge.

For each vertex we need a std::vector<>, which is 24 bytes, plus a
uint64_t to separate the in-/out-lists, so we also need 24 + 8 = 32 bytes
per node.

Therefore we need in total:

    (N + E) * 32 bytes
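
As a quick sanity check, the same estimate can be written as a small
Python helper (just a sketch of the formula above; the function name is
mine, not something graph-tool provides):

    def estimate_graph_memory(N, E):
        # Per edge: two adjacency entries (one in the source's out-list,
        # one in the target's in-list), each holding two uint64_t values,
        # i.e. 2 * 16 = 32 bytes.
        # Per vertex: a std::vector<> header (24 bytes) plus a uint64_t
        # separating the in-/out-lists, i.e. 24 + 8 = 32 bytes.
        return (N + E) * 32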

> For example, for a network of 1M vertices and 100M links without
> attributes, how much real memory should I plan to use, excluding
> temporaries?

That would be:

    (1M + 100M) * 32 bytes = 3,232,000,000 bytes ≈ 3.01 GiB
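
This matches what the hypothetical helper sketched above returns:

    >>> estimate_graph_memory(1_000_000, 100_000_000)
    3232000000
    >>> round(_ / 2**30, 2)  # bytes -> GiB
    3.01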

In practice you will need a little more, since std::vector<> tends to
over-allocate, i.e. reserve more capacity than it actually uses.

Best,
Tiago

Many thanks, Tiago, for the quick answer.

I've tried to measure the RES memory usage (using the `htop` command) for
such a graph with 1M nodes and 100M links, but the result is almost twice
that size, a total of 6.2 GB.
Is there a reason why I get that figure?
I am on macOS 10.15.6 (Catalina).


Hard to say without a minimal working example. Try saving the network to
disk, and loading it again from a newly started interpreter.

I tried on my machine, and I got 4.5 GB. As I explained, std::vector<>
may over-allocate.
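
In case it helps, a minimal sketch of that kind of check (the file name
"graph.gt" is hypothetical, and psutil is just one convenient way to read
the resident set size; it is not required by graph-tool):

    import psutil                      # third-party, assumed to be installed
    from graph_tool.all import load_graph

    # Load a graph previously written with g.save("graph.gt") in another
    # session, so that no construction temporaries are counted here.
    g = load_graph("graph.gt")
    print(g.num_vertices(), g.num_edges())

    rss = psutil.Process().memory_info().rss
    print("resident set size: %.2f GiB" % (rss / 2**30))

Run from a freshly started interpreter, as suggested above, this keeps
the measurement comparable to the (N + E) * 32 estimate.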