Hello,
I have recently been working on optimizing a graph-tool implementation of Yen’s k-shortest path algorithm. During that process, I ran into a few performance bottlenecks in the shortest_path()
and shortest_distance()
functions that I was able to address by modifying the graph-tool code locally. I’m hoping to get some or all of these changes into graph-tool so that (a) they will be useful to others and (b) I can use the graph-tool functions directly in place of my modified versions without reducing performance. I would be happy to add an issue and/or PR for these if my GitLab account can be approved.
Thanks,
Kai
Specs:
graph-tool version: 2.75
Operating system: MacOS Sonoma 14.5
MWE: Test cases are specified for each suggested change. Profiling results were generated using line_profiler.
Suggested changes:
shortest_path()
:
-
src/graph_tool/topology/__init__.py#L2247-2248
:vlist
andelist
are built frompred_map
usinginsert(0, ...)
, which has O(n) complexity (O(n^2) to build the entire list). Building the lists in reverse order withappend(...)
and then returningvlist[::-1], elist[::-1]
is an O(n) alternative.Test case:
g = gt.generation.circular_graph(5000000, 4, directed=True) gt.topology.shortest_path(g, 0, 1000000)
Before:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2248 250000 15656690.0 62.6 43.8 elist.insert(0, pe) 2249 250000 15532121.0 62.1 43.4 vlist.insert(0, p) 2250 250000 33053.0 0.1 0.1 v = p 2251 1 0.0 0.0 0.0 return vlist, elist
After:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2248 250000 42988.0 0.2 1.1 elist.append(pe) 2249 250000 38469.0 0.2 1.0 vlist.append(p) 2250 250000 23343.0 0.1 0.6 v = p 2251 1 2498.0 2498.0 0.1 return vlist[::-1], elist[::-1]
-
src/graph_tool/topology/__init__.py#L2237-2246
: The logic for determining the lowest-weight edge fromp
tov
is relatively expensive for long paths. For unweighted graphs, it’s faster to skip that logic and find the edge withg.edge(p, v)
.Test case:
g = gt.generation.circular_graph(5000000, 4, directed=True) gt.topology.shortest_path(g, 0, 1000000)
Before:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2237 1750000 708892.0 0.4 18.5 for e in v.in_edges() if g.is_directed() else v.out_edges(): 2238 1750000 1016039.0 0.6 26.5 s = e.source() if g.is_directed() else e.target() 2239 1750000 1272609.0 0.7 33.3 if s == p: 2240 250000 25599.0 0.1 0.7 if weights is not None: 2241 if weights[e] < min_w: 2242 min_w = weights[e] 2243 pe = e 2244 else: 2245 250000 23036.0 0.1 0.6 pe = e 2246 250000 28138.0 0.1 0.7 break
After:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2237 250000 25019.0 0.1 1.4 if weights is None: 2238 250000 1049546.0 4.2 57.0 pe = g.edge(p, v) 2239 else: 2240 for e in v.in_edges() if g.is_directed() else v.out_edges(): 2241 s = e.source() if g.is_directed() else e.target() 2242 if s == p: 2243 if weights[e] < min_w: 2244 min_w = weights[e] 2245 pe = e
shortest_distance()
:
-
src/graph_tool/topology/__init__.py#L2095-2098
: Values indist_map
are set to maximum usingdist_map.fa
. Setting the values withdist_map.a
is significantly faster on filtered graphs because it circumvents the filtering logic. My assumption here is that setting filtered values to maximum does not affect the behavior ofget_dists()
– please let me know if there are cases where that assumption is incorrect.Test case:
g = gt.generation.circular_graph(5000000, 4, directed=True) vfilt = g.new_vertex_property("bool") vfilt.a = True gv = gt.GraphView(g, vfilt=vfilt) gt.topology.shortest_path(gv, 0, 1000)
Before:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2095 1 1279.0 1279.0 3.2 if numpy.issubdtype(dist_map.a.dtype, numpy.integer): 2096 1 27670.0 27670.0 68.7 dist_map.fa = numpy.iinfo(dist_map.a.dtype).max
After:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2095 1 1196.0 1196.0 8.5 if numpy.issubdtype(dist_map.a.dtype, numpy.integer): 2096 1 429.0 429.0 3.0 dist_map.a = numpy.iinfo(dist_map.a.dtype).max
-
src/graph_tool/topology/__init__.py#L2126
: Theif pred_map:
check on this line determines whetherpmap
is returned in addition todist_map
. If aVertexPropertyMap
is passed as thepred_map
argument toshortest_distance()
, the conditional will result in a call tolen(pred_map)
to determine ifpred_map
is truthy or falsy.VertexPropertyMap.__len__()
callsg.num_vertices()
, which is O(V) for filtered graphs. Thelen()
call can be avoided by modifying the conditional to handle thebool
andVertexPropertyMap
cases separately.Test case:
g = gt.generation.circular_graph(5000000, 4, directed=True) gv = gt.GraphView(g, vfilt=g.new_vertex_property("bool")) pred_map = gv.copy_property(gv.vertex_index, value_type="int64_t") gt.topology.shortest_distance(g, 0, 1000, pred_map=pred_map)
Before:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2126 1 2403.0 2403.0 30.3 if pred_map: 2127 ret = (dist_map, pmap)
After:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 2126 1 0.0 0.0 0.0 if pred_map is True or isinstance(pred_map, VertexPropertyMap): 2127 1 0.0 0.0 0.0 ret = (dist_map, pmap)