Hello Tiago, 

Thanks a lot for swift update. I timed the 2.22 against commit bf174831 (compiled with CXXFLAGS=-O3). That does not give an accurate drill 

On my typical ~7e6 vertices graph, for a typical search that reach about 1e5 vertices, the improvement is about 20%. We have 83ms (average over 100 runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly, the bf174831 version is more stable with std = 4ms versus 6ms for the 2.22 release.

What I understand from the boost thread is that we could have only initialization for the whole lifespan of the program. If I get it right, their proposal would, prior to a search, to "re-set" the values from distances vector and predecessors vector only for the reached vertices of the last search. I think it worth a try because, from what I timed the python side of initialization could cost around 18ms (5ms for distance map, 13ms for predecessor map) see timeit snipet below.

import timeit
# pred map
timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.arange(7e6)', number=1000)/1000
# distances
timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.inf)', number=1000)/1000

I did the above timings on a modern work station, to switch to a cloud provider induces a non negligible overhead (+63% for the distances / +469% for the pred map).

What do you think ?

F.



Le mer. 28 juin 2017 à 18:58, Tiago de Paula Peixoto <tiago@skewed.de> a écrit :
On 28.06.2017 15:00, François Kawala wrote:
>
> I try to dig up an old idea that we've discussed previously (the thread is
> here
> <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>).
>
> summary: when one runs shortest distance on a graph with several million
> vertices the distance initialization cost is high with respect to the actual
> Dijsktra cost. We want to circumvent the initialization [see L326 of
> src/graph/topology/graph_distance.cc
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>]
>
> The proposed approach (as per this boost thread
> <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is to
> use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
> initialization is done "on the fly" by the `examine_edge` function. More
> precisely, the Dijkstra visitor maintains a set of discovered vertices. This
> set is read in the `examine_edge` function to set the distance to infinity
> when a the target vertex does not belong to the set of discovered vertices.
> The `examine_edge` function is also responsible to update the  set of
> discovered vertices.

I've pushed a commit:

   https://git.skewed.de/count0/graph-tool/commit/bf174831

that replaces the search functions by no_init ones. The initialization is
still done, but only once via a single numpy call in the Python side of things.

The speed of the numpy call should be comparable to what we get with
allocation, which is unavoidable.

Please see if the performance improvements you observe with this is acceptable.

Best,
Tiago


--
Tiago de Paula Peixoto <tiago@skewed.de>

_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool