Performance question

I have a graph with 1034 vertices and 53498 edges. I'm manually computing
the preferential attachement index for the vertices, and other indices. I'm
aware that graph-tool has that implemented but I'm doing it for personal
stuff. However I noticed that my computations are very slow. It took 2.7
minutes to compute that for the mentioned graph. I'm not sure if it's my
algorithm that is slow or the something is wrong with graph-tool. I would
be very thankful if someone could have a little look into my code.

def pa(graph):

    """

        Calculates Preferential Attachment index.

        Returns S the similarity matrix.

    """

    A = gts.adjacency(graph)

    S = np.zeros(A.shape)

    for i in xrange(S.shape[0]):

        for j in xrange(S.shape[0]):

            i_degree = graph.vertex(i).out_degree()

            j_degree = graph.vertex(j).out_degree()

            factor = i_degree * j_degree

            S[i,j] = factor

    return S

attachment.html (1.69 KB)

Maybe graph.vertex(i).out_degree() is slow itself? If so, should I store
all the degrees in a matrix then?

attachment.html (1.75 KB)

Hi,

There is a propertymap that contains the out degrees of each vertex, it
might be much faster to access it, i.e.:

`i_degree = graph.degree_property_map('out')[i]`

I guess I would also iterate over the edges rather than the indices of
the adjacency matrix...

G.

attachment.html (3.23 KB)

Indeed out_degree() is very very slow. I just replaced it with a constant
and it finished the computation in 0.5 seconds compared to the 2.7
minutes!!! But I guess what you mentioned is also slow since 'graph.
degree_property_map('out')[i]' since assumes 'i' is a vertex object and not
the index of the vertex. In that case I need to call graph.vertex(index)
and get the vertex and then call degree_property_map('out')[vertex]. I
guess this is still very slow!

attachment.html (4.4 KB)

This is not the best way to do this. You have only 53498 nonzero entries
in your matrix, which has 1034^2 = 1069156 entries in total. In the loop
above you spend the vast majority of the time setting the zero entries
of the matrix... The best way is to iterate through the edges. Note that
you do not need the adjacency matrix at all.

    N = g.num_vertices()
    S = np.zeros((N, N))
    d = g.degree_property_map()

    for e in g.edges():
        s, t = tuple(e)
        S[int(s), int(t)] = d[s] * d[t]

    return S

Using graph-tool will not magically make things faster; you still need
to use the best algorithms. And furthermore, if your computationally
intensive loops are implemented in pure Python, graph-tool will provide
no advantage at all when compared to networkx, or other pure python
modules.

Best,
Tiago

That should have been:

    d = g.degree_property_map("out")

This is not a good idea, since "g.degree_property_map('out')" will
create an entire property map anew (which is O(N)) each time you want a
single degree (which should be O(1)). It is best to compute the property
map before the loop, and just look it up when you need it.

    d = graph.degree_property_map('out') # this is O(N)
    ...

    k = d[v] # this is O(1)

Best,
Tiago

You can also access scalar property maps with indexes instead of
descriptors. The following two are equivalent:

    d = g.degree_property_map("out")
    k = d[g.vertex(0)] # degree of vertex 0
    k = d.a[0] # degree of vertex 0

Best,
Tiago

Sure, that escaped me!

Thanks for pointing this out (and now I might need to look back at my
code to see if I don't do that kind of thing ;-))

Best,

Guillaume

attachment.html (1.69 KB)