Running all_shortest_path on large graph

Hi,

I have a large graph (800K vertices and 6e6 edges) and I am running the
all_shortest_paths function with edge weights. The problem is that due to
the size of the graph, there are probably millions of paths that qualify and
the function fails with a "MemoryError" code.

Is there a way to limit the number of shortest paths returned, or even
better a version that returns an iterator and compute the paths as needed?

Thanks

The function all_shortest_paths() *DOES* return an iterator to all the
paths! It does *not* store them all in memory. You must be doing some list
conversion...

I understand that it returns an iterator, but I assume it still does some
preparatory work internally that potentially allocates memory.
I don't do any list conversion. I simply call the function without even
assigning the output to any local variable.

The error message I get is as follow:

Traceback (most recent call last):
  File
"/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py",
line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-07c125e74096>", line 1, in <module>
    all_shortest_paths(g, v_start, v_dest, ep["sp_weight"], epsilon=10)
  File "/usr/lib/python3/dist-packages/graph_tool/topology/__init__.py",
line 1978, in all_shortest_paths
    _prop("v", g, all_preds_map))
MemoryError

attachment.html (2.56 KB)

As usual, without a minimal and self-contained (i.e. complete) example that
shows the problem, it is not possible to say anything. An error message
without any context is not very useful.

Tiago,

I start to believe that the error is not related to the size of the graph
but rather to the epsilon parameter that is passed to the function
all_shortest_paths().
Based on the documentation the epsilon parameter is defined as "Maximum
relative difference between distances to be considered equal".
I want to compute not only the best path but also paths that are "fairly"
close to the best one, so I set epsilon=5 to have the function return
multiple "best paths".
This messes up the function it either creates a segmentation error, or it
crashes the whole virtual machine.

The following code creates a 5x5 graph where each edge that does not lie to
the periphery of the square is connected to all neighboring edges with
a weight of 1.
The best path from vertex "3_1" to "1_2" is obvious the diagonal path which
has a weight of 2. I was expecting that by setting epsilon=5, other paths
would also be returned, for example (3_1, 3_2, 2_2, 1_2) that has weight of
4. Instead, the function crashes and returns the following error:
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File
"/home/labuser/anaconda3/envs/py35/lib/python3.5/site-packages/graph_tool/topology/__init__.py",
line 1978, in all_shortest_paths
    _prop("v", g, all_preds_map))
MemoryError

I probably misuse the epsilon parameter, but is there a way to achieve
what I described above?
The source code that recreates the problem can be found at the end of the
email.

I am using version 2.26 on an Ubuntu 17.10 virtual machine.

Thanks in advance for the help,
Vaggelis

!image.png|498x511

        import gi
        import numpy as np

        gi.require_version('Gtk', '3.0')
        from graph_tool.all import *
        import graph_tool as gt
        from graph_tool.topology import *
        import tqdm
        import math

        g = gt.Graph(directed=True)
        dict_v_name_to_v = {}
        g.vertex_properties["name"] = g.new_vertex_property("string")
        g.vertex_properties["pos"] = g.new_vertex_property("vector<float>")

        g.edge_properties["name"] = g.new_edge_property("string") # name
of edge, if any comes from csf
        g.edge_properties["text"] = g.new_edge_property("string") # name
of edge, if any comes from csf
        g.edge_properties["elev_gain"] = g.new_edge_property("double") #
name of edge, if any comes from csf

        vp = g.vertex_properties
        ep = g.edge_properties

        N_LON=5
        N_LAT=5

        # Vertices
        print("\r\nCreating Vertices")
        for i in range(0, N_LAT):
            for j in range(0, N_LON):
                v = g.add_vertex()
                v_name = str(i) + '_' + str(j)
                dict_v_name_to_v[v_name] = v
                vp["name"][v] = v_name
                vp['pos'][v] = [float(j), float(i)]

        # Edges
        print("\r\nCreating Edges")
        for j in range(1, N_LON - 1): # long
            jm1 = j - 1
            jp1 = j + 1
            for i in range(1, N_LAT - 1): # lat
                ip1 = i + 1
                im1 = i - 1
                im1_s, i_s, ip1_s = [str(el) for el in (im1, i, ip1)]
                jm1_s, j_s, jp1_s = [str(el) for el in (jm1, j, jp1)]
                v_i_jm1 = dict_v_name_to_v['_'.join([i_s, jm1_s])]
                v_i_j = dict_v_name_to_v['_'.join([i_s, j_s])]
                v_i_jp1 = dict_v_name_to_v['_'.join([i_s, jp1_s])]
                v_ip1_jm1 = dict_v_name_to_v['_'.join([ip1_s, jm1_s])]
                v_ip1_j = dict_v_name_to_v['_'.join([ip1_s, j_s])]
                v_ip1_jp1 = dict_v_name_to_v['_'.join([ip1_s, jp1_s])]
                v_im1_jm1 = dict_v_name_to_v['_'.join([im1_s, jm1_s])]
                v_im1_j = dict_v_name_to_v['_'.join([im1_s, j_s])]
                v_im1_jp1 = dict_v_name_to_v['_'.join([im1_s, jp1_s])]

                e_i_jm1 = g.add_edge(v_i_j, v_i_jm1)
                e_i_jp1 = g.add_edge(v_i_j, v_i_jp1)
                e_ip1_jm1 = g.add_edge(v_i_j, v_ip1_jm1)
                e_ip1_j = g.add_edge(v_i_j, v_ip1_j)
                e_ip1_jp1 = g.add_edge(v_i_j, v_ip1_jp1)
                e_im1_jm1 = g.add_edge(v_i_j, v_im1_jm1)
                e_im1_j = g.add_edge(v_i_j, v_im1_j)
                e_im1_jp1 = g.add_edge(v_i_j, v_im1_jp1)

        ep["elev_gain"].a = 1

        for e in g.edges():
            ep["text"][e] = str(ep["elev_gain"][e]) # name of edge, if any
comes from csf

        graph_draw(g,
                   pos=g.vertex_properties['pos'],
                   vertex_text=g.vertex_properties["name"],
                   edge_text=ep["text"],
                   update_layout=False)

        v_start = dict_v_name_to_v['3_1']
        v_dest = dict_v_name_to_v['1_3']

attachment.html (10.8 KB)

Sorry for taking so long to reply. Indeed, you are not supposed to use the
epsilon parameters in this way... If it is too large, it adds loops to the
shortest path tree, creating an infinite loop in the algorithm. The only way
to achieve what you want is to use all_paths() instead, and filter the
length range you want.

Best,
Tiago