do_djk_search without distance initialization

Francois_Kawala · June 28, 2017, 2:00pm

Hello,

I try to dig up an old idea that we've discussed previously (the thread is
here
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>
).

summary: when one runs shortest distance on a graph with several million
vertices the distance initialization cost is high with respect to the
actual Dijsktra cost. We want to circumvent the initialization [see L326 of
src/graph/topology/graph_distance.cc
<https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>
]

The proposed approach (as per this boost thread
<https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk>\) is to use
the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
initialization is done "on the fly" by the `examine_edge` function. More
precisely, the Dijkstra visitor maintains a set of discovered vertices.
This set is read in the `examine_edge` function to set the distance to
infinity when a the target vertex does not belong to the set of discovered
vertices. The `examine_edge` function is also responsible to update the set
of discovered vertices.

As proposed in the initial thread, we would like to add a init-or-not
switch to the shortest_distance parameters (see L1568 of
/src/graph_tool/topology/__init__.py
<https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1568>
).

I plan to :

   - define a new Dijkstra visitor named `djk_max_visitor_no_init` (and
   maybe a second one to handle multiple targets scenario)
   - update the `do_djk_search` function to use
   `dijkstra_shortest_paths_no_color_map_no_init`
   - update the python shortest_distance (add the init-or-not switch)

How does it sounds ? Am I missing something ?

I have two side questions :

   - what is the "standard" development environment ? I tried to use CLION
   <https://www.jetbrains.com/clion/> but I cannot figure out how to
   configure it properly so that all the boost dependencies are correctly
   resolved.
   - how do you debug during development ?

Best regards,
François.

attachment.html (2.53 KB)

tiago · June 28, 2017, 2:26pm

Hello,

I try to dig up an old idea that we've discussed previously (the thread is
here
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>\).

summary: when one runs shortest distance on a graph with several million
vertices the distance initialization cost is high with respect to the actual
Dijsktra cost. We want to circumvent the initialization [see L326 of
src/graph/topology/graph_distance.cc
<src/graph/topology/graph_distance.cc · master · Tiago Peixoto / graph-tool · GitLab]

The proposed approach (as per this boost thread
<Redirecting to Google Groups) is to use
the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
initialization is done "on the fly" by the `examine_edge` function. More
precisely, the Dijkstra visitor maintains a set of discovered vertices. This
set is read in the `examine_edge` function to set the distance to infinity
when a the target vertex does not belong to the set of discovered vertices.
The `examine_edge` function is also responsible to update the set of
discovered vertices.

As proposed in the initial thread, we would like to add a init-or-not switch
to the shortest_distance parameters (see L1568 of
/src/graph_tool/topology/__init__.py
<src/graph_tool/topology/__init__.py · master · Tiago Peixoto / graph-tool · GitLab).

I plan to :

  * define a new Dijkstra visitor named `djk_max_visitor_no_init` (and maybe
    a second one to handle multiple targets scenario)
  * update the `do_djk_search` function to use
    `dijkstra_shortest_paths_no_color_map_no_init`
  * update the python shortest_distance (add the init-or-not switch)

How does it sounds ? Am I missing something ?

I don't see the drawback of always using the no_init version, with a single
visitor that always does implicit initialization. The init=True version
would be redundant, but at almost no extra cost. If the extra cost is
important, the user should be using init=False anyways...

I have two side questions :

  * what is the "standard" development environment ? I tried to use CLION
    <https://www.jetbrains.com/clion/> but I cannot figure out how to
    configure it properly so that all the boost dependencies are correctly
    resolved.

There is no such thing... I just use a text editor (emacs).

* how do you debug during development ?

GDB, valgrind, etc.

Best,
Tiago

tiago · June 28, 2017, 4:56pm

I've pushed a commit:

https://git.skewed.de/count0/graph-tool/commit/bf174831

that replaces the search functions by no_init ones. The initialization is
still done, but only once via a single numpy call in the Python side of things.

The speed of the numpy call should be comparable to what we get with
allocation, which is unavoidable.

Please see if the performance improvements you observe with this is acceptable.

Best,
Tiago

Francois_Kawala · June 29, 2017, 10:58am

Hello Tiago,

Thanks a lot for swift update. I timed the 2.22 against commit bf174831
<https://git.skewed.de/count0/graph-tool/commit/bf174831> (compiled with
CXXFLAGS=-O3). That does not give an accurate drill

On my typical ~7e6 vertices graph, for a typical search that reach about
1e5 vertices, the improvement is about 20%. We have 83ms (average over 100
runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly,
the bf174831 version is more stable with std = 4ms versus 6ms for the 2.22
release.

What I understand from the boost thread
<https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk> is that
we could have only initialization for the whole lifespan of the program. If
I get it right, their proposal would, prior to a search, to "re-set" the
values from distances vector and predecessors vector only for the reached
vertices of the last search. I think it worth a try because, from what I
timed the python side of initialization could cost around 18ms (5ms for
distance map, 13ms for predecessor map) see timeit snipet below.

import timeit
# pred map
timeit.timeit(setup='import numpy as np;
x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.arange(7e6)',
number=1000)/1000
# distances
timeit.timeit(setup='import numpy as np;
x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.inf)',
number=1000)/1000

I did the above timings on a modern work station, to switch to a cloud
provider induces a non negligible overhead (+63% for the distances / +469%
for the pred map).

What do you think ?

F.

attachment.html (5.06 KB)

tiago · June 29, 2017, 11:14am

On my typical ~7e6 vertices graph, for a typical search that reach about 1e5
vertices, the improvement is about 20%. We have 83ms (average over 100 runs)
for the 2.22 release versus 69ms for commit bf174831. Interestingly, the
bf174831 version is more stable with std = 4ms versus 6ms for the 2.22 release.

I wonder how much time is spent in the actual search vs. initialization. If
you use something like cProfile you can see how much time is spent in the
actual C++ function vs. the rest. This would give us an idea of how much
actual improvement can be done.

What I understand from the boost thread
<Redirecting to Google Groups; is that
we could have only initialization for the whole lifespan of the program. If
I get it right, their proposal would, prior to a search, to "re-set" the
values from distances vector and predecessors vector only for the reached
vertices of the last search. I think it worth a try because, from what I
timed the python side of initialization could cost around 18ms (5ms for
distance map, 13ms for predecessor map) see timeit snipet below.

In order for this to work, the function would need to keep and return a list
of reached vertices. This is is not difficult to do, but I wonder how much
it would actually improve...

Best,
Tiago

tiago · June 29, 2017, 11:16am

This I don't understand at all. Why would the behavior in a "cloud"
environment be any different?

Francois_Kawala · June 29, 2017, 12:32pm

About the run time in cloud computing scenario my measure might be useless.
A cloud instance is a VM that runs on I don't know what hardware and might
be shared with I don't know who. So even if inside my VM the CPU was idle,
I don't now how the CPU was used by other VMs. That being said, for our
particular cloud service provider we observe (on a daily basis for several
months) that the performances are below what we have on our modern
workstations.

Just to be clear : I'm neither complaining about graph tool performances,
nor thinking about improvements that would be cloud computing driven. I
just wanted to make clear that modest improvement (as measured on a modern
work station) might be very meaningful in another environment.

In order to profile the initialization I moved the initialization code in
two functions init_pmap and init_dist, and did 100 runs of
shortest_distance with
identical parameters, cProfiles gives :

23484 function calls (23331 primitive calls) in 6.733 seconds
Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
100 3.723 0.037 6.731 0.067
graph_tool/topology/__init__.py:1512(shortest_distance)
100 0.001 0.000 1.833 0.018
graph_tool/topology/__init__.py:1507(init_pmap)
100 0.000 0.000 1.833 0.018
graph_tool/decorators.py:1(copy_property)
200/100 0.003 0.000 1.832 0.018
graph_tool/decorators.py:126(wrapper)
100 1.275 0.013 1.827 0.018
graph_tool/__init__.py:2348(copy_property)
100 0.001 0.000 1.163 0.012
graph_tool/topology/__init__.py:1500(init_dist)
100 0.000 0.000 0.645 0.006
graph_tool/__init__.py:661(<lambda>)
100 0.643 0.006 0.644 0.006
graph_tool/__init__.py:614(__get_set_f_array)
700 0.001 0.000 0.540 0.001 graph_tool/__init__.py:166(_prop)
700 0.003 0.000 0.538 0.001
graph_tool/__init__.py:392(_get_any)
700 0.534 0.001 0.534 0.001
graph_tool/__init__.py:814(reserve)
200 0.000 0.000 0.518 0.003
graph_tool/__init__.py:559(get_array)
200 0.515 0.003 0.517 0.003
graph_tool/__init__.py:581(_get_data)
200 0.004 0.000 0.011 0.000
graph_tool/__init__.py:2289(new_vertex_property)
200 0.002 0.000 0.010 0.000
graph_tool/__init__.py:3071(__init__)
600 0.002 0.000 0.008 0.000
graph_tool/__init__.py:377(__init__)
200 0.002 0.000 0.007 0.000
graph_tool/__init__.py:1531(__init__)
100 0.000 0.000 0.006 0.000
graph_tool/__init__.py:2274(new_property)
800 0.002 0.000 0.004 0.000
graph_tool/__init__.py:202(_type_alias)
599 0.001 0.000 0.003 0.000
graph_tool/__init__.py:437(__del__)
600 0.001 0.000 0.002 0.000
graph_tool/__init__.py:264(_converter)
599 0.002 0.000 0.002 0.000
graph_tool/__init__.py:432(__unregister_map)
200 0.001 0.000 0.002 0.000
graph_tool/__init__.py:909(__new__)

The complete cprofile output is attached to this mail.

attachment.html (5.75 KB)

profile.bin (7.01 KB)

Francois_Kawala · June 29, 2017, 4:22pm

> we could have only ***one*** initialization for the whole lifespan of
the program. If

> I get it right, their proposal would, prior to a search, to "re-set" the
> values from distances vector and predecessors vector only for the reached
> vertices of the last search. I think it worth a try because, from what I
> timed the python side of initialization could cost around 18ms (5ms for
> distance map, 13ms for predecessor map) see timeit snipet below.

In order for this to work, the function would need to keep and return a
list
of reached vertices. This is is not difficult to do, but I wonder how much
it would actually improve...

To elaborate on this, a quick an dirty profiling (with the timeit module)
gives 0.27 ms to reset 10e5 random items to np.inf in a numpy array with
7e6 items.

I'm confident that this improvement worth to be done and I can try to do
it.

Once initialized, the distance map, the predecessor map, and the "last
reached index" would be internal property maps, correct ? Would it be
relevant to do the "re-set" in the CPP part ?

François

attachment.html (3.11 KB)

Francois_Kawala · June 30, 2017, 4:44pm

I realize that I'm not familiar enough with boost to do this change.

From what I get, I'll add three private members : _distance and

_predecessor they would be initialized as follows :

_distance(get(vertex_distance, *_mg)),
_predecessor(get(predecessor, *_mg)),

I don't know if the _distance member should be filled with zeros ?

The last private member would be _reach a std::vector<size_t>

From that I'll declare two functions :

set_reached to update the reached vertices
reset_distance to reset the _distance, _predecessor and _reach to
their default values.

Does that sounds right ? If so, I'll guess that I'll have to make
the do_djk_search function to call the set_reached and reset_distance
functions.

Am I missing something ?

Sorry for this stuttering approach.

François.

attachment.html (5.08 KB)

tiago · June 30, 2017, 7:57pm

I've just pushed a commit that implements what you want; you can look inside
for the details.

In short, you can now do multiple searches as follows:

dist_map = g.new_vp("double", numpy.inf) # must be infinity
pred_map = g.vertex_index.copy() # must be identity

for source in sources:

# the following does not initialize dist_map and pred_map, and
# returns an array of reached vertices

        dist_map, pred_map, reached = \
            shortest_distance(g, source, weights=w, pred_map=pred_map,
                              dist_map=dist_map, return_reached=True)

# reset property maps for next search; this is O(len(reached))

dist_map.a[reached] = numpy.inf
pred_map.a[reached] = reached

Please tell me if this brings the improvements you were seeking.

Best,
Tiago

Francois_Kawala · July 3, 2017, 10:16am

I cProfiled old versus your last commit. The graph has 7 953 158 vertices,
I sample 100 vertices and do shortest_distance from each one. I specify no
target, and set the max_dist parameter. After each call to
shortest_distance, I use the reached array to reset the predecessor map. In
average each search reaches 120177 vertices.

   - 2.22 : 9.101 seconds
   - commit 26404ef4 : 4.536 seconds
   - commit 26404ef4 + no dist map init : 4.141 seconds

That more than twice faster, great news !

About the third configuration (no dist map init), I removed the distance
map initialization (graph_tool/topology/__init__.py L1779
<https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1779>\)
and used the reached array to reset dist_map to infinity.

We could have a do_initialization parameter, I think it would be more
explicit, see my proposal
<https://git.skewed.de/francois/graph-tool/commit/946826546a80f1f080681a4eaaa5fb2590b5abd4>
.

François.

attachment.html (4.05 KB)

tiago · July 3, 2017, 10:59am

I cProfiled old versus your last commit. The graph has 7 953 158 vertices, I
sample 100 vertices and do shortest_distance from each one. I specify no
target, and set the max_dist parameter. After each call to
shortest_distance, I use the reached array to reset the predecessor map. In
average each search reaches 120177 vertices.

  * 2.22 : 9.101 seconds
  * commit 26404ef4 : 4.536 seconds
  * commit 26404ef4 + no dist map init : 4.141 seconds

Well, the last improvement is only about 9%... We're clearly seeing
diminishing returns. As I had expected, numpy initialization is pretty fast.

About the third configuration (no dist map init), I removed the distance map
initialization (graph_tool/topology/__init__.py L1779
<src/graph_tool/topology/__init__.py · master · Tiago Peixoto / graph-tool · GitLab)
and used the reached array to reset dist_map to infinity.

Did you do the same with the predecessor map as well?

We could have a do_initialization parameter, I think it would be more
explicit, see my proposal
<Sign in · GitLab.

I don't like this very much. The list of reached vertices might be useful
even if the user is not interested in the no_init optimization, so I think
it is better decoupled. Furthermore, you are ignoring the predecessor map,
which will give a cryptic error if the user chooses do_initialization=False
and forgets to pass pred_map, and will ignore the value passed by the user
otherwise.

Francois_Kawala · July 3, 2017, 12:06pm

> I cProfiled old versus your last commit. The graph has 7 953 158
vertices, I
> sample 100 vertices and do shortest_distance from each one. I specify no
> target, and set the max_dist parameter. After each call to
> shortest_distance, I use the reached array to reset the predecessor map.
In
> average each search reaches 120177 vertices.
>
> * 2.22 : 9.101 seconds
> * commit 26404ef4 : 4.536 seconds
> * commit 26404ef4 + no dist map init : 4.141 seconds

Well, the last improvement is only about 9%... We're clearly seeing
diminishing returns. As I had expected, numpy initialization is pretty
fast.

Yes, numpy does its job quite well, still 9% is non-negligible to me.

> About the third configuration (no dist map init), I removed the distance
map
> initialization (graph_tool/topology/__init__.py L1779
> <
src/graph_tool/topology/__init__.py · master · Tiago Peixoto / graph-tool · GitLab
>)
> and used the reached array to reset dist_map to infinity.

Did you do the same with the predecessor map as well?

Yep (see above).

> We could have a do_initialization parameter, I think it would be more
> explicit, see my proposal
> <
Sign in · GitLab
>.

I don't like this very much. The list of reached vertices might be useful
even if the user is not interested in the no_init optimization, so I think
it is better decoupled. Furthermore, you are ignoring the predecessor map,
which will give a cryptic error if the user chooses do_initialization=False
and forgets to pass pred_map, and will ignore the value passed by the user
otherwise.

I agree. We could split the function into "shortest_distance" and
"shortest_distance_no_init". But, if I'm the sole user of this function
it'll be much more sensible that I implement it my projet.

Another related question, why it is necessary to copy the reached array ?

Regards,
François

attachment.html (4.22 KB)

tiago · July 3, 2017, 12:17pm

Because ownership issues. The function works with a std::vector<>
internally, because numpy's ndarrays cannot be dynamically increased. We
could return a wrapped std::vector (called Vector_size_t internally in
graph-tool), but these are not documented. So instead, the std::vector's
contents are copied as a ndarray.