Computing PageRank over the Largest Connected Component

Hello to everyone,

I am trying to compute the PageRank of the nodes in the largest connected component of the ‘cond-mat-2003’ graph, but I seem unable to restrict the computation to that component alone.

First, I extract the largest connected component:

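   from graph_tool import GraphView, centrality, collection, topology

   g = collection.data["cond-mat-2003"]
   # Boolean vertex property map: True for vertices in the largest component
   l = topology.label_largest_component(g)
   g = GraphView(g, vfilt=l)  # or, in place: g.set_vertex_filter(l)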

Then I compute the PageRank:

   PR = centrality.pagerank(g, damping=0.99, epsilon=1e-06)

However, the length of PR.a equals the number of nodes in the whole graph and, even more alarming, none of its entries is zero.
This makes me suspect that the PageRank is computed over the whole original graph. Is this true?

Thanks a lot,
Panos

> The length of the PR.a is equal to the number of nodes of the whole graph and even more alarming, none of its entries is equal to zero.

This is always the case: the length of the .a array is always the number
of vertices in the unfiltered graph. If you want only the values for the
vertices kept by the filter, you need to use the .fa attribute:

    PR.fa

> This makes me suspect that PR is computed over the whole initial
> graph. Is this true?

No. The entire property map is _initialized_ at the beginning, but the
values for the vertices that remain after filtering are computed
correctly (with the other vertices removed).

I'll commit a modification to git soon where the initialization is done
properly on filtered graphs. (Note that this does not affect the actual
results for the filtered vertices.)
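
To make the distinction concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not graph-tool's implementation: the function name pagerank_on_kept, the toy graph, and the initial value 1/n are all assumptions chosen for illustration. The result list has one entry per vertex of the whole graph (like .a), every entry is initialized, and only the entries of the kept vertices are actually computed, with the other vertices ignored. Selecting the kept entries plays the role of .fa.

```python
# Illustrative sketch only; names and the initial value are hypothetical,
# and this is not how graph-tool implements PageRank internally.

def pagerank_on_kept(adj, keep, damping=0.85, epsilon=1e-10, max_iter=1000):
    """Power-iteration PageRank on the subgraph induced by `keep`.

    adj  : neighbour lists of an undirected graph (all vertices)
    keep : booleans, True for vertices that survive the filter
    Returns a list with one entry per vertex of the WHOLE graph.
    """
    n_total = len(adj)
    kept = [v for v in range(n_total) if keep[v]]
    n = len(kept)
    # Every entry is initialized, including those of removed vertices.
    rank = [1.0 / n] * n_total
    for _ in range(max_iter):
        new = list(rank)
        delta = 0.0
        for v in kept:
            s = 0.0
            for u in adj[v]:
                if keep[u]:  # removed vertices contribute nothing
                    deg_u = sum(1 for w in adj[u] if keep[w])
                    s += rank[u] / deg_u
            new[v] = (1.0 - damping) / n + damping * s
            delta += abs(new[v] - rank[v])
        rank = new
        if delta < epsilon:
            break
    return rank


# Toy graph: vertices 0-3 form the largest component, 4-5 a smaller one.
adj = [[1, 2], [0, 2], [1, 3, 0], [2], [5], [4]]
keep = [True, True, True, True, False, False]

full = pagerank_on_kept(adj, keep)                     # analogue of PR.a
filtered = [x for x, k in zip(full, keep) if k]        # analogue of PR.fa

# The same component as a standalone graph, for comparison.
standalone = pagerank_on_kept(adj[:4], [True] * 4)
```

In this sketch full has six entries, none of them zero, yet the four kept entries sum to one and match the values obtained on the standalone component. The two leftover entries are just the untouched initial value.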

Best,
Tiago