Parallelization issues with all_shortest_paths

Greetings, Tiago!

I am trying to use the multiprocess module in Python to accelerate my
simulations. Specifically, I use the following code to compute all shortest
paths for a set of node pairs in parallel:

    self.valid_paths[v] = pool.map(lambda x: gt.all_shortest_paths(self.g, source=v_index, target=x), [self.g.vertex_index[c] for c in self.Cloudlet])

where self.Cloudlet is a predefined node list.

However, Python reports a pickling error:

        self.valid_paths[v] = pool.map(lambda x: gt.all_shortest_paths(self.g,source = v_index, target = x),[self.g.vertex_index[c] for c in self.Cloudlet])
      File "/home/percy/anaconda2/lib/python2.7/site-packages/multiprocess/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/home/percy/anaconda2/lib/python2.7/site-packages/multiprocess/pool.py", line 567, in get
        raise self._value
    RuntimeError: Pickling of "graph_tool.libgraph_tool_core.Vertex" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)

Is there anything I am missing? My OS is 16.04, I use Python 2.7.14, and
graph-tool comes from Ostrokach's Anaconda channel.

Thanks in advance for your reply.

Best,
Percy


Please provide a minimal _self-contained_ (i.e. complete) example that
shows the problem, not a snippet. Otherwise it is difficult to understand
the problem.


Vertex objects cannot be pickled. I assume that 'self.Cloudlet' stores a
list of Vertex objects. It should be changed to store a list of ints instead.
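A minimal, stdlib-only sketch of why the ints matter. OpaqueVertex is a
hypothetical stand-in for a Boost.Python object such as
graph_tool.libgraph_tool_core.Vertex: like it, it refuses to be pickled. A
list of such handles cannot be sent to worker processes, while the matching
list of integer indices can. In graph-tool itself the conversion would be
g.vertex_index[v] (or int(v)), and g.vertex(i) turns an index back into a
vertex inside the worker.

```python
import pickle


class OpaqueVertex:
    """Hypothetical stand-in for an unpicklable extension object (e.g. a graph-tool Vertex)."""

    def __init__(self, index):
        self.index = index

    def __reduce__(self):
        # Mimic the Boost.Python behaviour: refuse to be pickled.
        raise RuntimeError('Pickling of "OpaqueVertex" instances is not enabled')


def can_pickle(obj):
    """Return True if obj survives a pickle round trip, False if pickling fails."""
    try:
        return pickle.loads(pickle.dumps(obj)) == obj
    except (RuntimeError, TypeError, pickle.PicklingError):
        return False


cloudlet = [OpaqueVertex(i) for i in (1, 2, 3)]
# Plain ints, analogous to [g.vertex_index[c] for c in Cloudlet] in graph-tool.
cloudlet_indices = [v.index for v in cloudlet]

print(can_pickle(cloudlet))          # False: the handles cannot be sent to workers
print(can_pickle(cloudlet_indices))  # True: ints pickle without trouble
```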


Please say what version of graph-tool you are using.

Hello, Tiago

Thanks for the quick reply.

Here is a small example that reproduces the problem:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import multiprocess as mp
import graph_tool.all as gt
import numpy as np

g = gt.price_network(20, m = 2, directed = False)
valid_paths = g.new_vertex_property("object")
g.vertex_properties["valid_paths"] = valid_paths

Cloudlet = []
Gateway = []
for v in g.vertices():
    if np.random.rand() > 0.5:
        Cloudlet.append(v)
    else:
        Gateway.append(v)

pool = mp.Pool(processes=4)
for v in Gateway:
    valid_paths[v] = pool.map(lambda x: gt.all_shortest_paths(g=g, source=g.vertex_index[v], target=x), [g.vertex_index[c] for c in Cloudlet])

Now Python reports:

    MaybeEncodingError: Error sending result: '[<graph_tool.libgraph_tool_core.CoroGenerator object at 0x7f344b5f4f80>]'. Reason: 'RuntimeError('Pickling of "graph_tool.libgraph_tool_core.CoroGenerator" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)',)'

I am using graph-tool 2.25.

Best regards,
Boxi

Hello, Tiago!


I know Vertex objects cannot be pickled. However, I think I have already
converted the Vertex objects into a list of ints before passing them to the
map function, i.e. "[g.vertex_index[c] for c in Cloudlet]". Indeed, printing
[g.vertex_index[c] for c in Cloudlet] shows something like [1, 2, 3, 4, 5]
in the terminal.

Is there anything I misunderstand?

As the error says, the iterator objects returned by all_shortest_paths()
cannot be pickled. The values returned by the function fed to pool.map()
must be picklable. Hence you need to convert the iterator to a list (or
something else that is picklable) before returning it.
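A stdlib-only sketch of the same rule: pool.map() sends each return value
back to the parent process by pickling it (multiprocess uses dill for this,
and dill, like pickle, cannot serialize generators). toy_shortest_paths is a
hypothetical generator standing in for the iterator returned by
all_shortest_paths(); it does not compute real paths.

```python
import pickle


def toy_shortest_paths(source, target):
    """Hypothetical stand-in for gt.all_shortest_paths(): lazily yields paths."""
    # Two made-up paths of equal length between source and target.
    for middle in (source + 1, source + 2):
        yield [source, middle, target]


def paths_as_generator(source, target):
    # Returns a generator object; a process pool cannot send this back.
    return toy_shortest_paths(source, target)


def paths_as_list(source, target):
    # Consumes the generator inside the worker and returns picklable lists of ints.
    return [list(p) for p in toy_shortest_paths(source, target)]


def picklable(obj):
    """Return True if pickle.dumps() accepts obj."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(picklable(paths_as_generator(0, 9)))  # False
print(picklable(paths_as_list(0, 9)))       # True
print(paths_as_list(0, 9))                  # [[0, 1, 9], [0, 2, 9]]
```

In the thread's script the equivalent change is to return
list(gt.all_shortest_paths(g, source=source, target=target)) from the
function given to pool.map().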

Hello, Tiago

Thanks for the advice. Following your suggestion, I convert the iterator
into a Python list in the global function find_multi_path(g, source, target),
and now everything works fine:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 9 09:47:40 2018
"""
import multiprocess as mp
import graph_tool.all as gt
import numpy as np

g = gt.price_network(20, m = 2, directed = False)
valid_paths = g.new_vertex_property("object")
g.vertex_properties["valid_paths"] = valid_paths

def find_multi_path(g, source, target):
    # Consume the path iterator here: only picklable values (a list)
    # can be returned to pool.map().
    res = []
    #distance = gt.shortest_distance(g=graph,source=g.vertex(source_index),target=g.vertex(dest_index),weights=delay)
    #paths = gt.all_paths(self.g,source=source_index,target=dest_index, cutoff = 5)
    #growth with the O(V!)
    paths = gt.all_shortest_paths(g, source=source, target=target)
    for p in paths:
        res.append(p)
    return res

Cloudlet = []
Gateway = []
for v in g.vertices():
    if np.random.rand() > 0.5:
        Cloudlet.append(v)
    else:
        Gateway.append(v)

pool = mp.Pool(processes=4)
for v in Gateway:
    valid_paths[v] = pool.map(lambda x: find_multi_path(g=g, source=g.vertex_index[v], target=x), [g.vertex_index[c] for c in Cloudlet])

*Nevertheless, the pickling problem comes back if I encapsulate the graph
object in a class.* Here is an example:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 9 09:47:40 2018
"""
import multiprocess as mp
import graph_tool.all as gt
import numpy as np

def find_multi_path(g, source, target):
    # Consume the path iterator here: only picklable values (a list)
    # can be returned to pool.map().
    res = []
    #distance = gt.shortest_distance(g=graph,source=g.vertex(source_index),target=g.vertex(dest_index),weights=delay)
    #paths = gt.all_paths(self.g,source=source_index,target=dest_index, cutoff = 5)
    #growth with the O(V!)
    paths = gt.all_shortest_paths(g, source=source, target=target)
    for p in paths:
        res.append(p)
    return res

class Network:
    def __init__(self):
        self.g = gt.price_network(20, m = 2, directed = False)
        self.Cloudlet = []
        self.Gateway = []
        self.valid_paths = self.g.new_vertex_property("object")
        self.g.vertex_properties["valid_paths"] = self.valid_paths
        
    def test(self):
        for v in self.g.vertices():
            if np.random.rand() > 0.5:
                self.Cloudlet.append(v)
            else:
                self.Gateway.append(v)
        
        pool = mp.Pool(processes=4)
        for v in self.Gateway:
            self.valid_paths[v] = pool.map(lambda x: find_multi_path(g=self.g, source=self.g.vertex_index[v], target=x), [self.g.vertex_index[c] for c in self.Cloudlet])
            
n = Network()
n.test()

Python again reports:

    RuntimeError: Pickling of "graph_tool.libgraph_tool_core.Vertex" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)

Wish you all the best,
Percy

After careful debugging, I came up with a workaround: store vertex indices
(ints) instead of Vertex objects in the class members. It seems that class
members behave differently from normal functions. The following example
works for small networks.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 9 09:47:40 2018
"""
import multiprocess as mp
import graph_tool.all as gt
import numpy as np

def find_multi_path(g, source, target):
    # Consume the path iterator here: only picklable values (a list)
    # can be returned to pool.map().
    res = []
    #distance = gt.shortest_distance(g=graph,source=g.vertex(source_index),target=g.vertex(dest_index),weights=delay)
    #paths = gt.all_paths(self.g,source=source_index,target=dest_index, cutoff = 5)
    #growth with the O(V!)
    paths = gt.all_shortest_paths(g, source=source, target=target)
    for p in paths:
        res.append(p)
    return res

class Network:
    def __init__(self):
        self.g = gt.price_network(20, m = 2, directed = False)
        self.Cloudlet = []
        self.Gateway = []
        self.valid_paths = self.g.new_vertex_property("object")
        self.g.vertex_properties["valid_paths"] = self.valid_paths
        
    def test(self):
        for v in self.g.vertices():
            if np.random.rand() > 0.5:
                self.Cloudlet.append(self.g.vertex_index[v])
            else:
                self.Gateway.append(self.g.vertex_index[v])
        
        pool = mp.Pool(processes=4)
        for v in self.Gateway:
            self.valid_paths[v] = pool.map(lambda x: find_multi_path(g=self.g, source=v, target=x), self.Cloudlet)
            
n = Network()
n.test()
    
Is there an explanation for this behavior?

Best,
Percy

It is always the same issue: the values returned (and the function passed to
pool.map(), with everything it refers to) must be picklable. If you
encapsulate things in a class, its members must also be picklable.
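A stdlib-only sketch of this last point. Serializing a bound method (or,
with dill, a lambda that refers to self) also serializes the instance it
belongs to, so a single unpicklable attribute is enough to break pool.map().
Here a threading.Lock plays the role of the unpicklable graph-tool objects,
and both classes are hypothetical. There are two ways out: keep only
picklable data in the instance (ints instead of Vertex objects, as in the
working version above), or define __getstate__/__setstate__ so that the
offending attribute is left out when pickling and rebuilt afterwards.

```python
import pickle
import threading


class BrokenNetwork:
    def __init__(self):
        self.cloudlet = [1, 2, 3]
        self.lock = threading.Lock()  # unpicklable member, like a list of Vertex objects

    def work(self, target):
        return [(c, target) for c in self.cloudlet]


class FixedNetwork(BrokenNetwork):
    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable attribute.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes and recreate the dropped one.
        self.__dict__.update(state)
        self.lock = threading.Lock()


def try_pickle(obj):
    """Return an unpickled copy of obj, or None if pickling fails."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except (TypeError, pickle.PicklingError):
        return None


broken = BrokenNetwork()
fixed = FixedNetwork()

# Pickling a bound method drags the whole instance (including the lock) along.
print(try_pickle(broken.work))  # None
method_copy = try_pickle(fixed.work)
print(method_copy(7))           # [(1, 7), (2, 7), (3, 7)]
```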