Speed up graph_tool.topology.shortest_distance

Hi!

I use graph_tool.topology.shortest_distance for an all pairs shortest path calculation, what is the main run-time footprint of my algorithm and way to large.

How would you speed it up / tackle this?

I tried to sub-sample with manual source, target pairs but that's terribly inefficient since it does not use the internal bookkeeping.

Nice would be graph_tool.topology.shortest_distance(G, U, V), where U and V are lists of same length with sources / targets but that's not implemented.

Thank a lot in advance!

Stephan

attachment.html (1.36 KB)

Hi!

I use graph_tool.topology.shortest_distance for an all pairs shortest
path calculation, what is the main run-time footprint of my algorithm
and way to large.

How would you speed it up / tackle this?

If I knew of a general faster way to solve the all-pairs shortest path
problem, I would have implemented it.

I tried to sub-sample with manual source, target pairs but that's
terribly inefficient since it does not use the internal bookkeeping.

Nice would be graph_tool.topology.shortest_distance(G, U, V), where U
and V are lists of same length with sources / targets but that's not
implemented.

It is possible to pass a single source but multiple targets.

Subsampling is usually a good technique to reduce the computation time,
but it is hard to know what is applicable to you.

Best,
Tiago

It is possible to pass also a single source and no target, in which case
the function computes the distances to all other vertices.

By sub-sampling the source vertices you can avoid the "overhead" you
were referring to.

On my test graph with 4201 nodes and 9683 edges I already tried a quick test with both:

I need ~ 70 random (source, target) node pairs to approximate the real mean of all pairs shortest path with an error of about 1 %.

This is about 40 % of the run-time of all pairs shortest path.

Using single source all targets shortest path I need to sample at least 6 random sources to approximate the real mean of all pairs shortest path with an error of about 1 %.

This is about 45-50 % of the run-time of all pairs shortest path.

So, although single source all targets shortest path is waay more efficient in it's computation than shortest_path manually it apparently is not a good candidate for subsampling and effectively doing worse.

attachment.html (2.43 KB)

I would not try to generalize much from this, since it is likely to
depend substantially on the underlying graph.

Best,
Tiago