Speeding up graph copy?

Zouhair · January 20, 2020, 6:43pm

Hi,

I'm using graph-tool in my project, and my use case is as follows:
1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
edges)
2) attach some properties to the vertices
3) for about ~5000 sets of *new* candidate vertices (each set has about
2-30 vertices)
  3.a - add new vertices to graph
  3.b - add edges as necessary between new vertices and old ones
  3.c - update the properties
  3.d - run an algorithm which depends on properties and edges, save result
to a list
  3.e - remove new vertices from graph and restore properties (i.e. undo
3.a-c)

Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d runs
fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing is 3.a
and 3.e (about 80% of runtime is spent there)

I've read through the docs and tried different approaches:

i- make a copy of the graph using g.copy() or Graph(g): this avoids steps
3.e, but copying the graph is much slower
ii - use a GraphView: adding vertices to the graph view adds them to the
original graph, so still need to do step 3.e
iii- use GraphView but only filter on the original vertices in the graph:
this avoids having to delete the vertices, but making GraphView with a
filter function is still slow (doing `gt.GraphView(g, lambda v: v <
n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
original graph keeps growing...
iv - do this using multiprocessing with the hope that pickling the object
avoids having to copy the graph, but it it appears that a different process
modifying the graph ends up affecting the graph that another process
receives ( because multiprocessing doesn't pickle the underlying C++
object/memory?)

In all cases, it's surprising that the bulk of the runtime is in making
graph copies as opposed to running the algorithm. The cleanest solution is
to take a copy of the graph and work on it to avoid having to later delete
vertices and restore properties, so I'm hoping that there is a way to speed
up graph copies?
I'm hoping for some input on whether there is a better way to achieve this
using graph-tool, or some pythonic magic to help speed up the copy
operation.

Thanks!

P.S. sorry if this is a repost - my initial post was rejected by nabble,
trying again directly.

attachment.html (2.36 KB)

tiago · January 21, 2020, 2:01pm

Hi,

I'm using graph-tool in my project, and my use case is as follows:
1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
edges)
2) attach some properties to the vertices
3) for about ~5000 sets of *new* candidate vertices (each set has about
2-30 vertices)
3.a - add new vertices to graph
3.b - add edges as necessary between new vertices and old ones
3.c - update the properties
3.d - run an algorithm which depends on properties and edges, save
result to a list
3.e - remove new vertices from graph and restore properties (i.e. undo
3.a-c)

Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
is 3.a and 3.e (about 80% of runtime is spent there)

How are you doing steps 3.a and 3.e exactly?

Are you calling g.add_vertex() and g.remove_vertex() repeatedly?

iii- use GraphView but only filter on the original vertices in the
graph: this avoids having to delete the vertices, but making GraphView
with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
< n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
original graph keeps growing...

This can be improved by passing a boolean-valued vertex property map
instead of a lambda function. In this way the GraphView creation becomes
O(1) instead of O(N).

Best,
Tiago