Shortest_path()/shortest_distance() performance improvements

Hello,

I have recently been working on optimizing a graph-tool implementation of Yen’s k-shortest path algorithm. During that process, I ran into a few performance bottlenecks in the shortest_path() and shortest_distance() functions that I was able to address by modifying the graph-tool code locally. I’m hoping to get some or all of these changes into graph-tool so that (a) they will be useful to others and (b) I can use the graph-tool functions directly in place of my modified versions without reducing performance. I would be happy to add an issue and/or PR for these if my GitLab account can be approved.

Thanks,
Kai

Specs:

graph-tool version: 2.75
Operating system: MacOS Sonoma 14.5
MWE: Test cases are specified for each suggested change. Profiling results were generated using line_profiler.

Suggested changes:

shortest_path():

  • src/graph_tool/topology/__init__.py#L2247-2248: vlist and elist are built from pred_map using insert(0, ...), which has O(n) complexity (O(n^2) to build the entire list). Building the lists in reverse order with append(...) and then returning vlist[::-1], elist[::-1] is an O(n) alternative.

    Test case:

    g = gt.generation.circular_graph(5000000, 4, directed=True)
    gt.topology.shortest_path(g, 0, 1000000)
    

    Before:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2248    250000   15656690.0     62.6     43.8          elist.insert(0, pe)
      2249    250000   15532121.0     62.1     43.4          vlist.insert(0, p)
      2250    250000      33053.0      0.1      0.1          v = p
      2251         1          0.0      0.0      0.0      return vlist, elist
    

    After:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2248    250000      42988.0      0.2      1.1          elist.append(pe)
      2249    250000      38469.0      0.2      1.0          vlist.append(p)
      2250    250000      23343.0      0.1      0.6          v = p
      2251         1       2498.0   2498.0      0.1      return vlist[::-1], elist[::-1]
    
  • src/graph_tool/topology/__init__.py#L2237-2246: The logic for determining the lowest-weight edge from p to v is relatively expensive for long paths. For unweighted graphs, it’s faster to skip that logic and find the edge with g.edge(p, v).

    Test case:

    g = gt.generation.circular_graph(5000000, 4, directed=True)
    gt.topology.shortest_path(g, 0, 1000000)
    

    Before:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2237   1750000     708892.0      0.4     18.5          for e in v.in_edges() if g.is_directed() else v.out_edges():
      2238   1750000    1016039.0      0.6     26.5              s = e.source() if g.is_directed() else e.target()
      2239   1750000    1272609.0      0.7     33.3              if s == p:
      2240    250000      25599.0      0.1      0.7                  if weights is not None:
      2241                                                               if weights[e] < min_w:
      2242                                                                   min_w = weights[e]
      2243                                                                   pe = e
      2244                                                           else:
      2245    250000      23036.0      0.1      0.6                      pe = e
      2246    250000      28138.0      0.1      0.7                      break
    

    After:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2237    250000      25019.0      0.1      1.4          if weights is None:
      2238    250000    1049546.0      4.2     57.0              pe = g.edge(p, v)
      2239                                                   else:
      2240                                                       for e in v.in_edges() if g.is_directed() else v.out_edges():
      2241                                                           s = e.source() if g.is_directed() else e.target()
      2242                                                           if s == p:
      2243                                                               if weights[e] < min_w:
      2244                                                                   min_w = weights[e]
      2245                                                                   pe = e
    

shortest_distance():

  • src/graph_tool/topology/__init__.py#L2095-2098: Values in dist_map are set to maximum using dist_map.fa. Setting the values with dist_map.a is significantly faster on filtered graphs because it circumvents the filtering logic. My assumption here is that setting filtered values to maximum does not affect the behavior of get_dists() – please let me know if there are cases where that assumption is incorrect.

    Test case:

    g = gt.generation.circular_graph(5000000, 4, directed=True)
    vfilt = g.new_vertex_property("bool")
    vfilt.a = True
    gv = gt.GraphView(g, vfilt=vfilt)
    gt.topology.shortest_path(gv, 0, 1000)
    

    Before:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2095         1       1279.0   1279.0      3.2          if numpy.issubdtype(dist_map.a.dtype, numpy.integer):
      2096         1      27670.0  27670.0     68.7              dist_map.fa = numpy.iinfo(dist_map.a.dtype).max
    

    After:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2095         1       1196.0   1196.0      8.5          if numpy.issubdtype(dist_map.a.dtype, numpy.integer):
      2096         1        429.0    429.0      3.0              dist_map.a = numpy.iinfo(dist_map.a.dtype).max
    
  • src/graph_tool/topology/__init__.py#L2126: The if pred_map: check on this line determines whether pmap is returned in addition to dist_map. If a VertexPropertyMap is passed as the pred_map argument to shortest_distance(), the conditional will result in a call to len(pred_map) to determine if pred_map is truthy or falsy. VertexPropertyMap.__len__() calls g.num_vertices(), which is O(V) for filtered graphs. The len() call can be avoided by modifying the conditional to handle the bool and VertexPropertyMap cases separately.

    Test case:

    g = gt.generation.circular_graph(5000000, 4, directed=True)
    gv = gt.GraphView(g, vfilt=g.new_vertex_property("bool"))
    pred_map = gv.copy_property(gv.vertex_index, value_type="int64_t")
    gt.topology.shortest_distance(g, 0, 1000, pred_map=pred_map)
    

    Before:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2126         1       2403.0   2403.0     30.3          if pred_map:
      2127                                                       ret = (dist_map, pmap)
    

    After:

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      2126         1          0.0      0.0      0.0          if pred_map is True or isinstance(pred_map, VertexPropertyMap):
      2127         1          0.0      0.0      0.0              ret = (dist_map, pmap)
    

Your account has been approved! (I don’t know why it got stuck)

Please submit the PR from there so I can review it.

Many thanks.

1 Like