Shortest_distance complexity when used with max_dist

<disclaimer> I hope this question isn't too dumb; if it is, please accept
my apologies. </disclaimer>

Hi list,

The graph that I work with has ≈ 30M vertices and ≈ 60M edges. The graph
has positive weights; it represents a road network. I need to compute the
shortest path from a node to any other node within a range.

The documentation for shortest_distance
<http://graph-tool.skewed.de/static/doc/topology.html#graph_tool.topology.shortest_distance>
states that the complexity (with source and weights specified) is O(V log V).

My question is: "What happens when the *max_dist* parameter is specified?"
I observed that the execution time of *shortest_distance* increases linearly
with the number of vertices: it goes from 19ms with a sub-sample of 1M
vertices and 2M edges to 320ms with the full graph. How can this be
explained?

Thanks for your help,
François.

My question is: "What happens when the *max_dist* parameter is
specified?"

It does not really change the (worst-time) complexity per se, but it
should run faster, since the search is stopped sooner. If max_dist is
much smaller than typical distances, it can indeed be much faster.
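The early-stopping behaviour can be sketched with a toy pure-Python Dijkstra (an illustration only; the function and graph here are made up for the sketch, not graph-tool's actual C++ implementation):

```python
import heapq

def dijkstra(adj, source, max_dist=float("inf")):
    """Dijkstra with an early-exit cutoff: vertices farther than
    max_dist are never expanded, so the search stops sooner."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        if d > max_dist:
            break  # everything left in the heap is farther away
        for w, weight in adj.get(v, []):
            nd = d + weight
            if nd <= max_dist and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

# A small weighted path graph: 0 -1- 1 -1- 2 -1- 3
adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)],
       2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
print(dijkstra(adj, 0, max_dist=1.5))  # vertices 2 and 3 are never reached
```

With a small max_dist the search touches only the neighbourhood of the source, which is why the cutoff pays off when max_dist is much smaller than typical distances.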

I observed that the execution time of *shortest_distance* increases
linearly with the number of vertices. The execution time goes from
19ms with a sub-sample of 1M vertices and 2M edges to 320ms with the
full graph. How can this be explained?

With only two points it is not possible to distinguish between O(V),
O(V log V), O(V^2), or anything else.

Best,
Tiago

Tiago Peixoto wrote

It does not really change the (worst-time) complexity per se, but it
should run faster, since the search is stopped sooner. If max_dist is
much smaller than typical distances, it can indeed be much faster.

Yes, this is the situation I am dealing with.

Tiago Peixoto wrote

With only two points it is not possible to distinguish between O(V),
O(V log V), O(V ^ 2) or anything else.

Of course I wouldn't generalize from this example. But I can't explain
why the execution time differs so much between the sub-sample and the
full graph. I chose the same source vertex, and the graph around this
source is the same in the sub-sample and in the full graph.

Best,
François.

I'm not sure I understand exactly what the problem is. Could you explain
in detail exactly what you are doing, and why you think the
performance is unexpected?

Best,
Tiago

Consider two graphs G=(V,E) and H=(V',E') such that:

   1. V' ⊂ V and E' ⊂ E;
   2. |E| = 66M;
   3. |E'| = 2M.

Run:

   1. X = shortest_distance(G, G.vertex(0), max_dist=x,
   weights=G.ep['weights'])
   2. Y = shortest_distance(H, H.vertex(0), max_dist=x,
   weights=H.ep['weights'])

I verify that X = Y.

The execution took 320ms for (1) and 16ms for (2). However, shouldn't the
number of visited vertices be identical? I expected to obtain comparable
execution times.

Best,
François

2015-03-07 17:45 GMT+01:00 Tiago Peixoto [via Main discussion list for the
graph-tool project] <ml-node+s982480n4026025h32(a)n3.nabble.com>:


This is because of the initialization of the algorithm. The algorithm
needs to create and initialize at least two vectors of size |V| to store
the distances and book-keeping, regardless of how many vertices are in
fact visited. Therefore the algorithm will never be faster than O(|V|),
independently of "max_dist".
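The O(|V|) floor can be seen directly with numpy, as a rough illustration (assuming the internal vectors behave like plain arrays, which is an analogy, not graph-tool's actual code path): the per-call setup alone scales with the number of vertices, no matter how few of them the search visits.

```python
import timeit
import numpy as np

for n in (10**5, 10**6, 10**7):
    # Simulate the per-call setup: a distance vector filled with
    # "infinity" and a book-keeping vector, both of size |V|.
    t = timeit.timeit(
        lambda: (np.full(n, np.inf), np.zeros(n, dtype=np.int64)),
        number=10)
    print(f"|V| = {n:>8}: {t / 10 * 1e3:.2f} ms per initialization")
```

The printed times grow roughly linearly with n, which matches the observation that the full graph is slower even when max_dist restricts the search to the same small region.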

Here is more or less what you are describing:

In [1]: g = lattice([10, 10])

In [2]: %time d = shortest_distance(g, g.vertex(0), max_dist=2)
CPU times: user 3.33 ms, sys: 0 ns, total: 3.33 ms
Wall time: 2.43 ms

In [3]: g = lattice([1000, 1000])

In [4]: %time d = shortest_distance(g, g.vertex(0), max_dist=2)
CPU times: user 50 ms, sys: 0 ns, total: 50 ms
Wall time: 28.6 ms

Note that the second graph is ten thousand times larger than the first
one, but the algorithm was only a little more than ten times slower.

Now if we do it for the larger graph without max dist, we have:

In [5]: %time d = shortest_distance(g, g.vertex(0))
CPU times: user 270 ms, sys: 0 ns, total: 270 ms
Wall time: 260 ms

Which is around 10 times slower... So things are indeed faster if
max_dist is specified, but the algorithm will still depend on the total
size of the graph.

Best,
Tiago

Thanks for these clarifications.

I guess that most of the time is spent in *copy_property* [link to git
<https://git.skewed.de/count0/graph-tool/blob/master/src/graph/graph_properties_copy.cc#L33>].
Indeed, when I provide a pre-initialized distance vector through the dist_map
parameter, the execution of shortest_distance is only 10ms faster.

If I understood correctly, *copy_property* is used to build a vector
initialized with a single value. If so, one would obtain the same result
in Python with numpy:

import numpy as np

x = np.empty(array_size, dtype=float)
x.fill(value)

I did try to time this numpy-style initialization (although I'm not sure it
corresponds exactly to *copy_property*):

python -m timeit -s "import numpy as np" -n 100 "np.empty(int(33e6),
dtype=int).fill(1)"
100 loops, best of 3: 34.9 msec per loop

These 34.9ms have to be compared to the ~300ms (back-of-envelope estimate)
that the *copy_property* function takes on my 33M-vertex graph.

Am I right about the way the copy_property function works? Could it
be improved?

Best,
F.

2015-03-08 20:11 GMT+01:00 Tiago Peixoto [via Main discussion list for the
graph-tool project] <ml-node+s982480n4026027h94(a)n3.nabble.com>:



PropertyMap.copy() is slower than a simple numpy initialization because
it needs to deal with possible conversions between the values (such as
converting from string to double). However, it is possible to include a
specialization that avoids this conversion when the types are the
same. I have now included this modification in the git version, which
significantly improves the time it takes to copy a property without
conversion.
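The same-type specialization Tiago describes can be mimicked in numpy terms (an analogy only, not the actual C++ code): copying between identical dtypes is a bulk, memcpy-style operation, while copying across dtypes forces an element-wise cast.

```python
import numpy as np

src = np.arange(10, dtype=np.float64)

# Same dtype: one contiguous block copy, no per-element conversion.
same = np.empty_like(src)
same[:] = src

# Different dtype: every element must be cast (here float64 -> int64),
# which is the kind of per-value conversion the specialization avoids.
converted = np.empty(src.shape, dtype=np.int64)
converted[:] = src
```

The results are the same values either way; the difference is purely in how much work is done per element during the copy.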

Best,
Tiago

Hello,

First, thanks for pushing this update so quickly, that's awesome. I
tested your fix; it seems the vector initialization was not the only
thing to blame, since with the fix the execution time is still around 300ms.

I checked that your fix is used, and timed the execution of the
*shortest_distance* function up to the call to
*libgraph_tool_topology.get_dists*.

In the "sub-sample" graph:

   - the initialization takes on average 3ms;
   - the call to libgraph_tool_topology.get_dists takes on average 11ms.

In the full graph:

   - the initialization takes on average 100ms;
   - the call to libgraph_tool_topology.get_dists takes on average 230ms.

Surprisingly (or not?), when I change the pmap initialization [link to
source
<https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1248>]
to:

vals = numpy.arange(g.num_vertices(), dtype='int')
pmap = g.new_vertex_property("int64_t", vals=vals)

The numpy call takes 36ms and the creation of the new vertex property
takes 103ms.

Best,
François.


Hello,

First, thanks for pushing this update so quickly, that's
awesome. I tested your fix; it seems the vector initialization
was not the only thing to blame, since with the fix the execution
time is still around 300ms.

I checked that your fix is used, and timed the execution of the *shortest_distance* function up to the call to *libgraph_tool_topology.get_dists*.

In the "sub-sample" graph:

  * the initialization takes on average 3ms;
  * the call to libgraph_tool_topology.get_dists takes on average 11ms.

In the full graph:

  * the initialization takes on average 100ms;
  * the call to libgraph_tool_topology.get_dists takes on average 230ms.

Do you have openmp enabled? If yes, it is best to disable it with
openmp_set_num_threads(1) before running the code.

Note, however, that you will always get slower code for a larger
graph... A 10x slow-down in each function is expected if your graph is
10x larger.

Note also that the distance initialization is done inside get_dists().

Surprisingly (or not?), when I change the pmap initialization [link to source <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1248>] to:

    vals = numpy.arange(g.num_vertices(), dtype='int')
    pmap = g.new_vertex_property("int64_t", vals=vals)

The numpy call takes 36ms and the creation of the new vertex property takes 103ms.

These are very tight loops, so things are very sensitive to cache hits,
memory re-allocation issues and so on. For instance, here I get:

    In [8]: g = lattice([1000, 10000])

    In [9]: g
    Out[9]: <Graph object, undirected, with 10000000 vertices and 19989000 edges at 0x7fad583720b8>

    In [10]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
    Wall time: 19.6 ms

    In [11]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 13.3 ms, sys: 960 ms, total: 973 ms
    Wall time: 976 ms

    In [12]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
    Wall time: 32.6 ms

    In [13]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 3.33 ms, sys: 950 ms, total: 953 ms
    Wall time: 954 ms

    In [14]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 10 ms, total: 30 ms
    Wall time: 30 ms

    In [15]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 43.3 ms, total: 63.3 ms
    Wall time: 62.1 ms

    In [16]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 23.3 ms, sys: 6.67 ms, total: 30 ms
    Wall time: 28.9 ms

    In [17]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 13.3 ms, sys: 173 ms, total: 187 ms
    Wall time: 186 ms

    In [18]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
    Wall time: 30.7 ms

    In [19]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 6.67 ms, sys: 853 ms, total: 860 ms
    Wall time: 858 ms

Notice how the timing fluctuates wildly from ~30ms to ~900ms!

With the property map creation, things are similar:

    In [20]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 50 ms, sys: 0 ns, total: 50 ms
    Wall time: 48.2 ms

    In [21]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 23.3 ms, sys: 20 ms, total: 43.3 ms
    Wall time: 43.5 ms

    In [22]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 40 ms, sys: 3.33 ms, total: 43.3 ms
    Wall time: 42.3 ms

    In [23]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 30 ms, sys: 490 ms, total: 520 ms
    Wall time: 519 ms

    In [24]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 50 ms, sys: 6.67 ms, total: 56.7 ms
    Wall time: 56.6 ms

    In [25]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 26.7 ms, sys: 1.12 s, total: 1.15 s
    Wall time: 1.14 s

    In [26]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 43.3 ms, sys: 13.3 ms, total: 56.7 ms
    Wall time: 54.6 ms

    In [27]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 20 ms, sys: 1.11 s, total: 1.13 s
    Wall time: 1.12 s

These jumps in time are probably just memory re-allocations.

The actual initialization itself is basically the same when using a
numpy array or a property map:

    In [33]: %time pmap.a[:] = 10
    CPU times: user 30 ms, sys: 0 ns, total: 30 ms
    Wall time: 20.7 ms

    In [34]: %time pmap.a[:] = 10
    CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
    Wall time: 18.9 ms

    In [35]: %time pmap.a[:] = 10
    CPU times: user 20 ms, sys: 0 ns, total: 20 ms
    Wall time: 20.1 ms

    In [36]: %time vals[:] = 10
    CPU times: user 16.7 ms, sys: 3.33 ms, total: 20 ms
    Wall time: 18.3 ms

    In [37]: %time vals[:] = 10
    CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
    Wall time: 21.7 ms

    In [38]: %time vals[:] = 10
    CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
    Wall time: 15 ms

Therefore I'm not really sure there is any issue with the initialization.

Best,
Tiago

Your point on the initialization is perfectly clear, thanks.

Could libgraph_tool_topology.get_dists suffer from the same memory
re-allocation problems? If not, what could be the reasons for the
execution time increase (approx. 23x) observed for
libgraph_tool_topology.get_dists?

I'll try to disable openmp.

Best,
François.

2015-03-10 16:33 GMT+01:00 Tiago Peixoto [via Main discussion list for the
graph-tool project] <ml-node+s982480n4026032h49(a)n3.nabble.com>:

attachment.html (11.9 KB)

No, the vectors are allocated before the function is called. I would
rather wait for you to try without openmp to be sure it is not
interfering.

Best,
Tiago

Hello,

I've timed shortest_distance on one hundred nodes in both graphs, with
graph_tool.openmp_set_num_threads(1). The results are comparable to my
previous measurements, with 11ms on average for the small graph and 320ms
on average for the full graph.

Best,
f


Hello,

Any further ideas on this?

Best,
f.

There is nothing left in the code other than the initialization of
the distances, so I assume that is where the difference comes from. I am
not sure how one could improve it.

Best,
Tiago

Hello,

I'm exhuming this rather old thread because I came across a related thread
on the boost mailing list
<https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk>.

If I understand correctly, the problem is the same one I'm dealing with:
the initialization of the distance/predecessor maps takes more time than
the actual Dijkstra search, because only a very small region of the graph
is visited. I guess this is particularly true when the distance/predecessor
maps do not fit in the CPU cache.

Could you look through the proposed solution? If you confirm that the
solution is suitable, I'll implement it in the bfs/dijkstra visitors.

Best,
f.

I think the simplest solution would be to let the user specify the
distance and predecessor maps, as we have now, together with an "init"
option, that defaults to True. If init=False, the distance and
predecessor maps are assumed to be already initialized.

Best,
Tiago