Getting shortest distances for a subset of origins - are cython-like speedups worth looking into?

ignacioRh · March 4, 2021, 3:53pm

Hey,

Thanks a lot for the program. I'm using it on an academic project which
would have died long ago without it.

I'm now running some large simulations and need to speed up how I use the
shortest_distance, which is in a python for loop

My use case is the following:
I have a large graph, and I need to compute the shortest distance between an
origin subset of the nodes [A,B], to a target subset [X,Y]. So I want the
distances AX, AY, BX, BY. It is infeasible to compute all pairs distances on
the graph, its too large

I can pass a list to shortest_distance as a target, but I can't pass a list
as an origin. Hence I use list comprehension to achieve the result
distances = [shortest_distances(origin, target= [X,Y] for origin in [A,B]]

This is the main bottleneck in my code.

I'm wondering if I can somehow put the for loop for the origins in some type
of cpython extension, would that result in a speed up gain?
So if put the underlying libgraph_tool_topology.get_dists() call into some
cython-type code, do you think that would speed up this for loop?

I'm fairly new to this type of work, and just want to see if this is a good
idea to pursue, and if so, what sort of pitfalls I should keep in mind.

Thanks!

tiago · March 16, 2021, 7:24am

Hey,

Thanks a lot for the program. I'm using it on an academic project which
would have died long ago without it.

I'm now running some large simulations and need to speed up how I use the
shortest_distance, which is in a python for loop

My use case is the following:
I have a large graph, and I need to compute the shortest distance between an
origin subset of the nodes [A,B], to a target subset [X,Y]. So I want the
distances AX, AY, BX, BY. It is infeasible to compute all pairs distances on
the graph, its too large

I can pass a list to shortest_distance as a target, but I can't pass a list
as an origin. Hence I use list comprehension to achieve the result
distances = [shortest_distances(origin, target= [X,Y] for origin in [A,B]]

This is the main bottleneck in my code.

I don't understand why this would be. If your network is very large,
surely most of the time is spent in the inner loop, rather than the
outer one.

I'm wondering if I can somehow put the for loop for the origins in some type
of cpython extension, would that result in a speed up gain?
So if put the underlying libgraph_tool_topology.get_dists() call into some
cython-type code, do you think that would speed up this for loop?

If the bottleneck is really there, this could help, but not by much. But
again, I think it's unlikely, unless your graph is quite small.

Best,
Tiago