The proposed approach (as per this boost thread
<https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk>) is to use
`dijkstra_shortest_paths_no_color_map_no_init`. The distance
initialization is done "on the fly" by the `examine_edge` function. More
precisely, the Dijkstra visitor maintains a set of discovered vertices.
This set is read in the `examine_edge` function to set the distance to
infinity when the target vertex does not belong to the set of discovered
vertices. The `examine_edge` function is also responsible for updating the
set of discovered vertices.
- define a new Dijkstra visitor named `djk_max_visitor_no_init` (and
maybe a second one to handle the multiple-targets scenario)
- update the `do_djk_search` function to use
`dijkstra_shortest_paths_no_color_map_no_init`
- update the Python `shortest_distance` (add the init-or-not switch)
How does that sound? Am I missing something?
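To make the `examine_edge` trick concrete, here is a generic Python sketch of the same idea (illustrative only, not graph-tool or Boost code): a vertex absent from the `dist` dict plays the role of an undiscovered vertex whose distance is implicitly infinity, so no O(V) initialization pass is needed.

```python
import heapq
import math

def dijkstra_lazy(adj, source):
    """Dijkstra without an upfront distance-initialization pass.

    `adj` maps each vertex to a list of (neighbor, weight) pairs.
    A vertex missing from `dist` is treated as "distance = infinity",
    which is what the examine_edge trick emulates with its set of
    discovered vertices.
    """
    dist = {source: 0.0}               # only discovered vertices are stored
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):  # stale priority-queue entry
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):  # implicit "init to infinity"
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# toy graph: 0 -> 1 (weight 1), 1 -> 2 (weight 2), 0 -> 2 (weight 5)
adj = {0: [(1, 1.0), (2, 5.0)], 1: [(2, 2.0)]}
print(dijkstra_lazy(adj, 0))  # distances only for reached vertices
```

Only reached vertices end up in the returned dict, which is the same observable effect as the visitor-based lazy initialization.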
I have two side questions:
- what is the "standard" development environment? I tried to use CLion
<https://www.jetbrains.com/clion/> but I cannot figure out how to
configure it properly so that all the boost dependencies are correctly
resolved.
- how do you debug during development?
summary: when one runs shortest_distance on a graph with several million
vertices, the distance initialization cost is high with respect to the
actual Dijkstra cost. We want to circumvent the initialization [see L326 of
src/graph/topology/graph_distance.cc on the master branch of the graph-tool
GitLab repository].
> The proposed approach (as per this boost thread) is to use
> `dijkstra_shortest_paths_no_color_map_no_init`. [...]
>
> * define a new Dijkstra visitor named `djk_max_visitor_no_init` (and maybe
>   a second one to handle the multiple-targets scenario)
> * update the `do_djk_search` function to use
>   `dijkstra_shortest_paths_no_color_map_no_init`
> * update the Python `shortest_distance` (add the init-or-not switch)
>
> How does that sound? Am I missing something?
I don't see the drawback of always using the no_init version, with a single
visitor that always does implicit initialization. The init=True version
would be redundant, but at almost no extra cost. If the extra cost is
important, the user should be using init=False anyways...
> what is the "standard" development environment? I tried to use CLion, but
> I cannot figure out how to configure it properly so that all the boost
> dependencies are correctly resolved.
There is no such thing... I just use a text editor (emacs).
I have pushed a commit that replaces the search functions with no_init
ones. The initialization is still done, but only once, via a single numpy
call on the Python side. The speed of the numpy call should be comparable
to what we get with allocation, which is unavoidable.
Please see if the performance improvement you observe with this is
acceptable.
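For illustration, a one-call-per-map numpy initialization could look like the sketch below (the variable names and the "each vertex is its own predecessor" convention are assumptions, not necessarily graph-tool's actual code):

```python
import numpy as np

N = 7_000_000  # number of vertices, matching the graph size discussed here

# distance map: everything starts unreached (infinite distance)
dist = np.full(N, np.inf)

# predecessor map: initialize each vertex as its own predecessor
pred = np.arange(N, dtype=np.int64)
```

Each map is filled with a single vectorized call, so the Python-side cost is dominated by memory bandwidth, comparable to the allocation itself.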
On my typical ~7e6-vertex graph, for a typical search that reaches about
1e5 vertices, the improvement is about 20%. We have 83ms (average over 100
runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly,
the bf174831 version is more stable, with std = 4ms versus 6ms for the 2.22
release.
What I understand from the boost thread
<https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk> is that
we could have only one initialization for the whole lifespan of the
program. If I get it right, their proposal would be, prior to a search, to
"reset" the values of the distances vector and predecessors vector only for
the vertices reached by the last search. I think it is worth a try because,
from what I timed, the Python side of initialization could cost around 18ms
(5ms for the distance map, 13ms for the predecessor map); see the timeit
snippet below.
import timeit
# pred map
timeit.timeit(setup='import numpy as np; x = np.zeros(int(7e6), dtype="int64")',
              stmt='x[:] = np.arange(7e6)', number=1000) / 1000
# distances
timeit.timeit(setup='import numpy as np; x = np.zeros(int(7e6), dtype="float64")',
              stmt='x[:] = np.inf', number=1000) / 1000
I did the above timings on a modern workstation; switching to a cloud
provider induces a non-negligible overhead (+63% for the distances / +469%
for the pred map).
> On my typical ~7e6-vertex graph, for a typical search that reaches about
> 1e5 vertices, the improvement is about 20%. We have 83ms (average over 100
> runs) for the 2.22 release versus 69ms for commit bf174831. [...]
I wonder how much time is spent in the actual search vs. initialization. If
you use something like cProfile you can see how much time is spent in the
actual C++ function vs. the rest. This would give us an idea of how much
actual improvement can be done.
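A self-contained sketch of this kind of profiling is shown below, with a dummy stand-in for the C++ search call (all function names here are illustrative, not graph-tool's):

```python
import cProfile
import io
import pstats
import numpy as np

def init_dist(n):
    # Python-side initialization, analogous to resetting the distance map
    return np.full(n, np.inf)

def cpp_search_stub(dist):
    # stand-in for the actual C++ search function
    return float(dist[:100].sum())

def one_run(n=1_000_000):
    dist = init_dist(n)
    return cpp_search_stub(dist)

pr = cProfile.Profile()
pr.enable()
for _ in range(10):
    one_run()
pr.disable()

# print only the lines for the two functions of interest,
# sorted by cumulative time
out = io.StringIO()
stats = pstats.Stats(pr, stream=out).sort_stats("cumulative")
stats.print_stats("init_dist|cpp_search_stub")
print(out.getvalue())
```

The cumulative-time column then shows directly what fraction of each run is initialization versus search.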
> What I understand from the boost thread is that we could have only one
> initialization for the whole lifespan of the program. If I get it right,
> their proposal would be, prior to a search, to "reset" the values of the
> distances vector and predecessors vector only for the vertices reached by
> the last search. [...]
In order for this to work, the function would need to keep and return a list
of reached vertices. This is not difficult to do, but I wonder how much
it would actually improve...
About the run time in the cloud computing scenario, my measurements might
be useless. A cloud instance is a VM that runs on I don't know what
hardware and might be shared with I don't know who. So even if the CPU was
idle inside my VM, I don't know how the CPU was used by other VMs. That
being said, for our particular cloud service provider, we have observed (on
a daily basis for several months) that performance is below what we have on
our modern workstations.
Just to be clear: I'm neither complaining about graph-tool's performance,
nor thinking about improvements that would be cloud-computing driven. I
just wanted to make clear that a modest improvement (as measured on a
modern workstation) might be very meaningful in another environment.
In order to profile the initialization, I moved the initialization code
into two functions, init_pmap and init_dist, and did 100 runs of
shortest_distance with identical parameters; cProfile gives:
23484 function calls (23331 primitive calls) in 6.733 seconds
Ordered by: cumulative time
> > we could have only ***one*** initialization for the whole lifespan of
> > the program. If I get it right, their proposal would be, prior to a
> > search, to "reset" the values of the distances vector and predecessors
> > vector only for the vertices reached by the last search. [...]
>
> In order for this to work, the function would need to keep and return a
> list of reached vertices. This is not difficult to do, but I wonder how
> much it would actually improve...
To elaborate on this, a quick and dirty profiling (with the timeit module)
gives 0.27 ms to reset 10e5 random items to np.inf in a numpy array with
7e6 items.
I'm confident that this improvement is worth doing, and I can try to do
it.
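Assuming the search returned a `reached` array, the selective reset could look like the numpy sketch below (sizes mirror the figures above; the reached set is simulated here):

```python
import numpy as np

N = 7_000_000
dist = np.full(N, np.inf)
pred = np.arange(N, dtype=np.int64)

# simulate the set of vertices touched by the previous search
rng = np.random.default_rng(0)
reached = rng.choice(N, size=100_000, replace=False)
dist[reached] = 1.0  # pretend the search wrote distances...
pred[reached] = 0    # ...and predecessors for these vertices

# reset only the touched entries instead of reinitializing the whole maps
dist[reached] = np.inf
pred[reached] = reached
```

Fancy indexing over ~1e5 entries touches about 1.4% of a 7e6-element map, which is why it is so much cheaper than a full reinitialization.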
Once initialized, the distance map, the predecessor map, and the "last
reached index" would be internal property maps, correct? Would it be
relevant to do the "reset" in the C++ part?
I cProfiled the old version versus your last commit. The graph has
7 953 158 vertices; I sample 100 vertices and run shortest_distance from
each one. I specify no target and set the max_dist parameter. After each
call to shortest_distance, I use the reached array to reset the predecessor
map. On average each search reaches 120177 vertices.
> I cProfiled old versus your last commit. [...] After each call to
> shortest_distance, I use the reached array to reset the predecessor map.
> On average each search reaches 120177 vertices.
Did you do the same with the predecessor map as well?
We could have a do_initialization parameter; I think it would be more
explicit, see my proposal (GitLab link).
I don't like this very much. The list of reached vertices might be useful
even if the user is not interested in the no_init optimization, so I think
it is better decoupled. Furthermore, you are ignoring the predecessor map,
which will give a cryptic error if the user chooses do_initialization=False
and forgets to pass pred_map, and will ignore the value passed by the user
otherwise.
> I cProfiled old versus your last commit. The graph has 7 953 158 vertices,
> I sample 100 vertices and do shortest_distance from each one. I specify no
> target, and set the max_dist parameter. After each call to
> shortest_distance, I use the reached array to reset the predecessor map.
> On average each search reaches 120177 vertices.
>
> * 2.22 : 9.101 seconds
> * commit 26404ef4 : 4.536 seconds
> * commit 26404ef4 + no dist map init : 4.141 seconds
Well, the last improvement is only about 9%... We're clearly seeing
diminishing returns. As I had expected, numpy initialization is pretty
fast.
Yes, numpy does its job quite well, still 9% is non-negligible to me.
> Did you do the same with the predecessor map as well?
Yep (see above).
> > We could have a do_initialization parameter, I think it would be more
> > explicit, see my proposal (GitLab link).
>
> I don't like this very much. The list of reached vertices might be useful
> even if the user is not interested in the no_init optimization, so I think
> it is better decoupled. Furthermore, you are ignoring the predecessor map,
> [...]
I agree. We could split the function into "shortest_distance" and
"shortest_distance_no_init". But if I'm the sole user of this function, it
would make more sense to implement it in my own project.
Another related question: why is it necessary to copy the reached array?
Because of ownership issues. The function works with a std::vector<>
internally, because numpy's ndarrays cannot be dynamically resized. We
could return a wrapped std::vector (called Vector_size_t internally in
graph-tool), but these are not documented. So instead, the std::vector's
contents are copied into an ndarray.
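From the Python side, the effect of this copy can be illustrated with a toy stand-in (this is not graph-tool code; the list merely simulates the internal std::vector):

```python
import numpy as np

internal_vector = [3, 1, 4, 1, 5]    # stand-in for the internal std::vector
reached = np.array(internal_vector)  # np.array() copies: the ndarray owns its buffer

internal_vector.append(9)            # mutating the "C++ side" afterwards...
print(reached)                       # ...does not affect the returned copy
```

Because the ndarray owns its own buffer, it stays valid even after the internal vector is reused or freed by the next search.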