Your point on the initialization is perfectly clear, thanks.

Could libgraph_tool_topology.get_dists suffer from the same memory re-allocation problems? If not, what could explain the execution time increase (approx. 21x, from 11ms to 230ms) observed for libgraph_tool_topology.get_dists?

I'll try disabling OpenMP.

Best,
François.

2015-03-10 16:33 GMT+01:00 Tiago Peixoto [via Main discussion list for the graph-tool project] <[hidden email]>:
On 10.03.2015 15:39, François wrote:

> Hello,
>
> Firstly, thanks for pushing this update out so quickly, that's
> awesome. I tested your fix; it seems that the vector initialization
> was not the only thing to blame. With the fix, the execution time is
> still around 300ms.
>
> I checked that your fix is used, and "timed" the execution of the *shortest_distance* function up to the call to *libgraph_tool_topology.get_dists*.
>
> In the "sub-sample" graph:
>
>   * the initialization takes on average 3ms;
>   * the call to *libgraph_tool_topology.get_dists* takes on average 11ms.
>
> In the full graph:
>
>   * the initialization takes on average 100ms;
>   * the call to *libgraph_tool_topology.get_dists* takes on average 230ms.
Do you have OpenMP enabled? If so, it is best to disable it with
openmp_set_num_threads(1) before running the code.
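
For example, a minimal sketch (assuming openmp_enabled() and
openmp_set_num_threads() are available in your graph-tool version; the
graph and the max_dist value are just placeholders):

    import graph_tool.all as gt

    # If graph-tool was compiled with OpenMP, force single-threaded
    # execution, so thread start-up and scheduling overhead do not
    # distort the timings of short runs.
    if gt.openmp_enabled():
        gt.openmp_set_num_threads(1)

    g = gt.lattice([1000, 1000])
    # Time this call with and without the thread setting to compare.
    dist = gt.shortest_distance(g, source=g.vertex(0), max_dist=10)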

Note, however, that a larger graph will always take longer... A 10x
slow-down in each function is expected if your graph is 10x larger.

Note also that the distance initialization is done inside get_dists().

> Surprisingly (or not?), when I change the pmap initialization [link to source <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1248>] to:
>
>     vals = numpy.arange(g.num_vertices(), dtype='int')
>     pmap = g.new_vertex_property("int64_t", vals=vals)
>
> The numpy call takes 36ms and the creation of the new vertex property takes 103ms.

These are very tight loops, so the timings are quite sensitive to cache
hits, memory re-allocation issues, and so on. For instance, here I get:

    In [8]: g = lattice([1000, 10000])

    In [9]: g
    Out[9]: <Graph object, undirected, with 10000000 vertices and 19989000 edges at 0x7fad583720b8>

    In [10]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
    Wall time: 19.6 ms

    In [11]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 13.3 ms, sys: 960 ms, total: 973 ms
    Wall time: 976 ms

    In [12]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
    Wall time: 32.6 ms

    In [13]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 3.33 ms, sys: 950 ms, total: 953 ms
    Wall time: 954 ms

    In [14]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 10 ms, total: 30 ms
    Wall time: 30 ms

    In [15]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 43.3 ms, total: 63.3 ms
    Wall time: 62.1 ms

    In [16]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 23.3 ms, sys: 6.67 ms, total: 30 ms
    Wall time: 28.9 ms

    In [17]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 13.3 ms, sys: 173 ms, total: 187 ms
    Wall time: 186 ms

    In [18]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 20 ms, sys: 13.3 ms, total: 33.3 ms
    Wall time: 30.7 ms

    In [19]: %time vals = numpy.arange(g.num_vertices(), dtype='int')
    CPU times: user 6.67 ms, sys: 853 ms, total: 860 ms
    Wall time: 858 ms

Notice how the timing fluctuates wildly from ~30ms to ~900ms!

With the property map creation, things are similar:

    In [20]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 50 ms, sys: 0 ns, total: 50 ms
    Wall time: 48.2 ms

    In [21]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 23.3 ms, sys: 20 ms, total: 43.3 ms
    Wall time: 43.5 ms

    In [22]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 40 ms, sys: 3.33 ms, total: 43.3 ms
    Wall time: 42.3 ms

    In [23]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 30 ms, sys: 490 ms, total: 520 ms
    Wall time: 519 ms

    In [24]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 50 ms, sys: 6.67 ms, total: 56.7 ms
    Wall time: 56.6 ms

    In [25]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 26.7 ms, sys: 1.12 s, total: 1.15 s
    Wall time: 1.14 s

    In [26]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 43.3 ms, sys: 13.3 ms, total: 56.7 ms
    Wall time: 54.6 ms

    In [27]: %time pmap = g.new_vertex_property("int64_t", vals=vals)
    CPU times: user 20 ms, sys: 1.11 s, total: 1.13 s
    Wall time: 1.12 s

These jumps in time are probably just memory re-allocations.
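
If you want to separate the typical cost from these re-allocation
spikes, one option is to repeat the measurement and look at the spread,
e.g. with Python's timeit (a sketch, not part of the session above):

    import timeit
    import numpy
    from graph_tool.all import lattice

    g = lattice([1000, 10000])

    # Run the allocation many times; the minimum approximates the
    # steady-state cost, while the maximum exposes the sporadic
    # re-allocation spikes seen above.
    times = timeit.repeat(
        lambda: numpy.arange(g.num_vertices(), dtype='int'),
        repeat=20, number=1)
    print("min %.0f ms, max %.0f ms"
          % (min(times) * 1e3, max(times) * 1e3))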

The actual initialization itself is basically the same when using a
numpy array or a property map:

    In [33]: %time pmap.a[:] = 10
    CPU times: user 30 ms, sys: 0 ns, total: 30 ms
    Wall time: 20.7 ms

    In [34]: %time pmap.a[:] = 10
    CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
    Wall time: 18.9 ms

    In [35]: %time pmap.a[:] = 10
    CPU times: user 20 ms, sys: 0 ns, total: 20 ms
    Wall time: 20.1 ms

    In [36]: %time vals[:] = 10
    CPU times: user 16.7 ms, sys: 3.33 ms, total: 20 ms
    Wall time: 18.3 ms

    In [37]: %time vals[:] = 10
    CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
    Wall time: 21.7 ms

    In [38]: %time vals[:] = 10
    CPU times: user 16.7 ms, sys: 0 ns, total: 16.7 ms
    Wall time: 15 ms
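
This is expected: pmap.a is simply a numpy view onto the property map's
internal storage, so both assignments move the same amount of memory. A
quick self-contained sketch to illustrate (not part of the session
above):

    import numpy
    import graph_tool.all as gt

    g = gt.lattice([100, 100])
    pmap = g.new_vertex_property("int64_t")

    # pmap.a is a numpy array view (a PropertyArray, subclassing
    # numpy.ndarray) over the property map's storage, so writing
    # through it costs the same as writing to a plain numpy array.
    assert isinstance(pmap.a, numpy.ndarray)
    pmap.a[:] = 10
    assert (pmap.get_array() == 10).all()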

Therefore I'm not really sure there is any issue with the initialization.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


--
François Kawala

