Eigenvector centrality runtime/complexity

Philipp_Singer · July 27, 2016, 1:14pm

Hi all,

I have run into a runtime/complexity question where I am wondering if
someone can give me insights into why this might happen.

I have conducted the following steps:

1.) Create some random network
2.) Calculate eigenvector centrality
3.) Sample 10% random nodes
4.) Filter graph by nodes
5.) Calculate eigenvector centrality on sample

I have observed that for the sample, the eigenvector centrality
calculation takes much longer, in some cases (dependent e.g., on block
structure) it takes way longer (like 30 times longer).

I am now trying to figure out why this is the case. I assume it has
something to do with the convergence which might be probably because
links are missing in the sample. If I do the same for e.g., PageRank the
difference is not that drastic (it still takes longer in the sample).

Does anyone have an idea what is going on here?

Thanks,
Philipp

solstag · July 27, 2016, 3:01pm

HI Philipp,

Can you provide minimal sample code that demonstrates your observations?

That usually improves tenfold the chance that anyone will be able to help.

Cheers,
ale

attachment.html (1.87 KB)

Philipp_Singer · July 27, 2016, 3:06pm

Sure, sorry for that:

attachment.html (4.11 KB)

tiago · July 27, 2016, 3:16pm

If you want to check if the difference is due to graph filtering, you
can compare with:

u = gt.Graph(g_sample, prune=True)
gt.eigenvector(u)

Best,
Tiago

Philipp_Singer · July 27, 2016, 3:35pm

Thanks for the hint, I tried that but it is the same runtime as on the
filtered graph.

Best, Philipp

attachment.html (1.63 KB)

solstag · July 27, 2016, 3:43pm

Hi Philipp,

Not with the sample you provided.

Trying here, time decreases below that for the original graph.

(I had to change the sample to increase n and sample size, otherwise it was
just too fast to capture)

In [1]: %paste

import random
import graph_tool.all as gt
n=100000
g = gt.price_network(n,10,directed=False)
%time gt.eigenvector(g)
def node_sample(g, g_vertices, n_samples):
return random.sample(g_vertices, n_samples)
sample = node_sample(g, [x for x in g.vertices()], int(n/10))
vfilt = g.new_vertex_property('bool')
for s in sample:
vfilt[s] = True
g_sample = gt.GraphView(g, vfilt=vfilt)
%time gt.eigenvector(g_sample)
u = gt.Graph(g_sample, prune=True)
%time gt.eigenvector(u)

## -- End pasted text --
CPU times: user 747 ms, sys: 0 ns, total: 747 ms
Wall time: 748 ms
CPU times: user 12.2 s, sys: 0 ns, total: 12.2 s
Wall time: 12.2 s
CPU times: user 335 ms, sys: 0 ns, total: 335 ms
Wall time: 335 ms

Cheers
.~´

attachment.html (3.18 KB)

tiago · July 27, 2016, 3:43pm

The time difference is because of the convergence of the algorithm.

As mentioned in the documentation, the convergence speed of
eigenvector() is a function of the spectral gap of the graph. By
sub-sampling the network, and thus removing most of the edges, the
spectral gap changes significantly.

Best,
Tiago

Philipp_Singer · July 27, 2016, 4:33pm

Okay, that is really odd. Your results indicate that it is really slow
on the filtered graph, but fast on the pruned graph.

For me, the results are always the same for the filtered and pruned.

What I found out though is that my anaconda graph tool is much faster
for the sample, different to the global graph tool installation. I am
also running it with multiple threads. But anyways, the sample should
seldomly be that much slower than the full dataset.

Strange, I'll try to find out more.

Best,

Philipp

attachment.html (5.59 KB)