Performance with property map

Flavien_Lambert · July 16, 2014, 8:48am

Hi everyone, to compute some functions I needed to loop over a bunch of
edges. I realized that the call to property maps seems slow compare to a
dictionary. I am a bit surprised since I was told - I am not an expert in
python - that a query in a dictionary was already. So I was wondering if I
made a mistake in using graph_tool. Following is an example of comparison.
Best,
F.

In [3]:

_network = gt.load_graph(_dataFolder + 'networkLTA-2.0-scc.xml')

In [4]:

_network.list_properties()

destination (vertex) (type: long double)
_graphml_vertex_id (vertex) (type: string)
origin (vertex) (type: long double)
_graphml_edge_id (edge) (type: string)
speed (edge) (type: long double)
name (edge) (type: string)
time (edge) (type: long double)

In [5]:

_edgeIds = _network.edge_properties['_graphml_edge_id']

_times = _network.edge_properties["time"]

_speeds = _network.edge_properties["speed"]

_origin = _network.vertex_properties['origin']

_destination = _network.vertex_properties['destination']

In [8]:

_edges = [_e for _e in _network.edges()]

%time for _e in _network.edges() : a = _speeds[_e]

_speedDict = {}

for _e in _edges :

_speedDict[_network.edge_index[_e]] = _speeds[_e]

_indexes = [_network.edge_index[_e] for _e in _network.edges()]

%time for _n in _indexes : a = _speedDict[_n]

CPU times: user 102 ms, sys: 5 ms, total: 107 ms
Wall time: 103 ms
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 1.94 ms

attachment.html (18.5 KB)

user5 · July 16, 2014, 2:02pm

You know you can get the edges, vertices, and property maps as numpy arrays
by using the .a method. You should be able to do whatever you need much
faster with arrays than dictionaries.

attachment.html (18.1 KB)

user5 · July 16, 2014, 2:24pm

Also, I can't check right now, but I'd guess the time difference is because
integers are much more efficient to hash and compare than long strings.
Just a barely educated guess.

-elliot

attachment.html (18.7 KB)

Flavien_Lambert · July 16, 2014, 2:49pm

Hi Elliott, I know that the get_array is very efficient but the thing is I
have to know exactly what are the edges I am dealing with.

To be more precise, I run loops over the shortest paths which I computed
before and stored in a file. Therefore, each iteration makes access to a
tiny fraction of the network and I must keep track of the edges involved.
That is why I was giving the example of single access instead of global one
though .a.

Best,
F.

attachment.html (19.3 KB)

Gareth_Simons · July 16, 2014, 3:05pm

Flavian,

I’m not exactly sure what you’re trying to do, but it seems you want to select only a handful of vertices based on shortest distance calculations?

You could try something like this, which is IMHO very fast:

# You already have your shortest paths, so you can probably skip this step, but it is possible to
# search all shortest paths from a source vertex to all vertices within a certain distance, or to
# search the shortest path from and to a specific vertex using the built-in shortest_distance function:
v_localDist = shortest_distance(g, source=v, weights=e_dist, max_dist=dist)

# You can then use the built-in masks functionality to filter out only relevant nodes.

# First create the mask property map:
v_mask = g.new_vertex_property('bool')

# Then assign the property map values from another property map array, using .a / get_array() to access arrays:
v_mask.a = v_localDist.a <= dist # set the array values to 1 (boolean property map) where distance parameter is met

# Then activate the filter
g.set_vertex_filter(v_mask) # activate mask based on currently selected vertices

# And deactivate when done
g.set_vertex_filter(None)

I use this method to iterate calculations on local vertex clusters (based on distance) within some fairly big graphs.
Gareth

attachment.html (20.6 KB)

user5 · July 16, 2014, 3:19pm

Are the indices in the array not the indices of the edges? I'm not sure,
but I think you could be accessing edges based on indices you precomputed
rather than edge objects.

attachment.html (20.1 KB)

Flavien_Lambert · July 16, 2014, 3:44pm

That is a good question. I was wondering if so.
On the same topic, I did not find the function for getting the edge from
the edge index. There is one for vertices like vertex(n) but for edges?

attachment.html (21 KB)

Flavien_Lambert · July 16, 2014, 3:46pm

Thanks Gareth, I will try to dig into this method!

attachment.html (20.6 KB)

user5 · July 16, 2014, 3:59pm

I'm new to graph tool and still not sure about how to use it
idiomatically. So, why would one go through the process of creating a
property map, activating and deactivating a filter rather than just
creating a numpy mask and masking an array of edges. (I guess edges is an
iterator so masking it isn't as super simple and efficient as with an array)

Flavian, I dont see one either. Look at Graph.edges.__doc__

attachment.html (22.1 KB)

tiago · July 18, 2014, 1:39pm

On the same topic, I did not find the function for getting the edge
from the edge index. There is one for vertices like vertex(n) but for
edges?

There is no such function, and this is related to the speed difference
you are seeing. The edges are not stored in one big vector, and thus
cannot be addressed simply by its index. Instead, they are stored in
different vectors across all nodes (i.e. and adjacency list), and thus
one needs to iterate first through the vectors.

Now lets look at your code:

_edges = [_e for _e in _network.edges()]
%time for _e in _network.edges() : a = _speeds[_e]

_speedDict = {}
for _e in _edges :
_speedDict[_network.edge_index[_e]] = _speeds[_e]
_indexes = [_network.edge_index[_e] for _e in _network.edges()]
%time for _n in _indexes : a = _speedDict[_n]

What makes the second loop slower has little to do with property maps vs
dicts, but instead with the loop over the Graph.edges() iterator and the
loop ofter the list you created in the second part. Whenever you loop
over the edges, not only is the loop slightly slower because it is not a
simple list, but a "list of lists", but also (more importantly) because
it has to *create* edge descriptor objects at each iteration! In the
second loop you created these objects and stored them in a list, and
then looped over this list, which does not involve object creation, and
is therefore much faster.

Best,
Tiago

Flavien_Lambert · July 19, 2014, 7:19am

Thanks a lot for the explanation! So is there a best way to have access to
the properties?

attachment.html (2.55 KB)

user5 · July 19, 2014, 5:01pm

Well, if you build a list of edge objects before hand, you can get and set
properties from maps quickly all day, or until you create or destroy edges
is the graph and have to rebuild the list. If building that list takes a
long time and you're changing edges often, you could use property map masks
to keep track of new and deleted edges and add to and subtract from the
list.

Elliot

attachment.html (3.53 KB)

user5 · July 19, 2014, 5:11pm

That's assuming you really need to loop through the edges, and that getting
edges from masked graphs won't do.

attachment.html (3.98 KB)