Average path length - shortest_distance=2147483647

P-M · August 24, 2016, 3:40pm

I am trying to find the average shortest path length for my network. I am
currently computing dist = gt.shortest_distance(g) and then averaging the
results. However, for a lot of distances I get the value "2147483647", which
seems impossibly large for my network (I only have 718 vertices and 979 edges
in my graph), and it also crops up in the example results in the
documentation. Does this value have a special significance? Does it mean there
is no path between the given vertices?

Also, since there is no mention of it in the documentation, I presume that
this algorithm has not been parallelised?

Giuseppe_Profiti · August 24, 2016, 4:29pm

The value 2147483647 is the largest signed 32-bit integer (2^31 - 1). I
think it is used as infinity (since there is no greater value), meaning
that there is no path between the two vertices. You can also see the value
in the example in the documentation:

https://graph-tool.skewed.de/static/doc/topology.html#graph_tool.topology.shortest_distance
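A minimal sketch of how the average could be restricted to pairs that are actually connected, so the sentinel value does not inflate it. It assumes the distances have already been pulled into an N x N NumPy array (for example with PropertyMap.get_2d_array()); the helper name `mean_finite_distance` and the toy matrix are hypothetical, for illustration only:

```python
import numpy as np

# Sentinel for "no path": the largest signed 32-bit integer, 2**31 - 1
UNREACHABLE = np.iinfo(np.int32).max


def mean_finite_distance(d):
    """Hypothetical helper: average over ordered pairs (i, j), i != j,
    that are connected by a path.

    `d` is assumed to be an N x N integer array of shortest distances.
    """
    d = np.asarray(d)
    off_diag = ~np.eye(d.shape[0], dtype=bool)  # skip the zero self-distances
    reachable = off_diag & (d != UNREACHABLE)   # skip the "no path" sentinels
    return d[reachable].mean()


# Toy example: vertices 0-1-2 form a path, vertex 3 is isolated
INF = UNREACHABLE
d = np.array([[0, 1, 2, INF],
              [1, 0, 1, INF],
              [2, 1, 0, INF],
              [INF, INF, INF, 0]], dtype=np.int32)
print(mean_finite_distance(d))  # average of 1, 2, 1, 1, 2, 1, i.e. 8 / 6
```

Taking the plain mean of all off-diagonal entries of the same toy matrix would instead be dominated by the six sentinel entries, which is the kind of inflation described in the question.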

Best,
Giuseppe


tiago · September 5, 2016, 9:42am


This is correct. This value is used to mark that there are no paths
between the nodes.

Best,
Tiago

haiko.lietz · March 3, 2023, 10:17pm

Dear all,

a quick follow-up to this thread: when I store the shortest path lengths for all node pairs in a connected undirected graph g like this

```python
dist = gt.shortest_distance(g)
```

how can I obtain the average shortest path length without (a) looping through dist and (b) counting the zeros on the diagonal?

There must be something faster than a for loop…

Best wishes

Haiko

tiago · March 4, 2023, 12:34pm

You can get a 2D array from a vector-valued property map with get_2d_array(). The following two computations are equivalent:

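```python
import numpy as np
import graph_tool.all as gt

g = gt.extract_largest_component(gt.GraphView(gt.collection.data["polblogs"], directed=False), prune=True)
dist = gt.shortest_distance(g)  # all-pairs distances as a vector-valued vertex property map
N = g.num_vertices()
# Per-vertex sums; the zero self-distance adds nothing, so divide by N - 1
print("average distance:", np.mean([dist[v].a.sum() / (N - 1) for v in g.vertices()]))
# Same result from the full N x N array
d = dist.get_2d_array(np.arange(N))
print("average distance:", d.sum() / (N * (N - 1)))
```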

But note that the first computation is actually faster than obtaining the 2D array and then summing all the elements, despite the python loop!
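A short derivation of why the two computations agree, and why the zeros on the diagonal need no special handling. It writes d_vu for the distance from v to u and assumes (as with extract_largest_component() in the example) that every pair is connected, so all d_vu are finite:

```latex
\bar{d}
  = \frac{1}{N}\sum_{v}\frac{1}{N-1}\sum_{u} d_{vu}
  = \frac{1}{N(N-1)}\sum_{v}\sum_{u} d_{vu}
  = \frac{1}{N(N-1)}\sum_{v \neq u} d_{vu},
  \qquad \text{since } d_{vv} = 0 .
```

The first expression is the per-vertex loop, the second is the sum over the whole 2D array, and the last is the average over distinct pairs, so summing everything and dividing by N(N-1) already excludes the diagonal.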