joblib parallel for with graph-tool filtering?

When using graph-tool together with joblib, do we need to pass graph.copy()
in the "Parallel" call when doing vertex filtering with .set_vertex_filter()?
graph.copy() makes memory usage extreme on large graphs (2M vertices, 4M
edges), but in my head it guards against any concurrency problems. Or is
passing 'graph' without '.copy()' okay?

What is the best way to run parallel graph searches and filtering (a
different vertex per task) with graph-tool and joblib, or without joblib?

The best approach is to create a different GraphView object for each
filter, instead of setting the filter on the main graph. Read about
GraphViews here:

    https://graph-tool.skewed.de/static/doc/quickstart.html#graph-views

Best,
Tiago

Hi, I have the same question. Upon running code attempting to use GraphViews,
I get an error during pickling:

error: 'i' format requires -2147483648 <= number <= 2147483647

More specifically, it looks like a line inside joblib is unhappy:
CustomizablePickler(buffer, self._reducers).dump(obj)

And this takes us to a struct packing line: header = struct.pack("!i", n)
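The limit itself is easy to reproduce with the standard library: "!i" packs
a signed 32-bit big-endian integer, so a header in that format cannot frame
any pickled payload larger than 2**31 - 1 bytes (about 2 GiB):

```python
import struct

# '!i' is a signed 32-bit big-endian integer, so the largest payload
# length this header can encode is 2**31 - 1 bytes (~2 GiB).
struct.pack("!i", 2**31 - 1)  # largest representable length: fine

try:
    struct.pack("!i", 2**31)  # one past the limit
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647
```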

So, if I had to guess, I'd suspect joblib is trying to pickle the whole
graph rather than the GraphView reference, or something like this. Was
either of you able to get code to successfully parallelize using GraphViews
to avoid copying?
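One way to test that guess without joblib in the loop (a generic sketch;
with graph-tool you would substitute the GraphView itself) is to pickle the
argument by hand and look at the size joblib would have to frame:

```python
import pickle

# Stand-in for the object handed to joblib.delayed; substitute the
# GraphView and compare its pickled size against the base graph's.
obj = list(range(100_000))

payload = pickle.dumps(obj)
# If this length exceeds 2**31 - 1, the '!i' header above must overflow.
print(len(payload))
```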

This is on Python 3.6.5 with graph_tool version '2.26 (commit , )' and
joblib version '0.11'.

Below is a minimal breaking example, if it helps. I am also happy to provide
other information such as tracebacks.

def toy_func(g):
    return g.vertex_properties['skim'][0][1]

vmr = [0, 1]
g = load_graph(path) # 22,000 vertex directed graph (a road network)
skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
g.properties['skim'] = skim_table
p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))

> Hi, I have the same question. Upon running code attempting to use GraphViews,
> I get an error during pickling:
>
> error: 'i' format requires -2147483648 <= number <= 2147483647
>
> More specifically, it looks like a line inside joblib is unhappy:
> CustomizablePickler(buffer, self._reducers).dump(obj)
>
> And this takes us to a struct packing line: header = struct.pack("!i", n)
>
> So, if I had to guess, I'd suspect joblib is trying to pickle the whole
> graph rather than the GraphView reference, or something like this. Was
> either of you able to get code to successfully parallelize using GraphViews
> to avoid copying?

It is impossible to say anything, without a minimal and self-contained
example that shows the problem.

> Below is a minimal breaking example, if it helps. I am also happy to provide
> other information such as tracebacks.
>
> def toy_func(g):
>     return g.vertex_properties['skim'][0][1]
>
> vmr = [0, 1]
> g = load_graph(path) # 22,000 vertex directed graph (a road network)
> skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
> g.properties['skim'] = skim_table
> p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))

That is not a complete minimal example; the function 'p' is undefined and
there are other errors. Please provide one that actually runs, and does not
depend on external data.

Best,
Tiago