Second largest component

Hi all,

I continue with the probably silly questions..

I have a graph that is relatively large (1.6 million vertices, 2.4 million
arcs). I extract the largest strong component, but it is still quite large
(~400,000 vertices) and while the "analysis" algos work smoothly, when it
comes to representing it, it's very sparsely connected, so essentially a
huge cloud of points.

So I am trying to extract the second largest strong component, which is way
smaller. I do the following:

comp, hist = label_components(g)
idx = argsort(hist)
idx_second_largest = idx[-2]
idx_vertices_second_largest = find(comp.a == idx_second_largest)
u = GraphView(g, vfilt=idx_vertices_second_largest)

To make things easier, I'll paraphrase the example from the doc.

g2=random_graph(100, lambda: (1,1))
comp,hist=label_components(g2)
print hist

[11 28 41 17 3]

So my second largest strong component should be the one with elements
labelled 1.

print comp.a

[0 1 1 2 3 2 2 0 0 2 1 3 3 0 3 1 4 0 2 2 1 3 2 1 2 2 2 2 2 1 2 1 3 1 2 2 1
2 3 2 2 0 2 2 4 2 3 2 2 3 1 1 2 1 1 2 1 1 2 4 2 1 0 1 2 1 1 3 2 2 2 1 3 3
1 2 1 1 0 0 2 2 2 3 3 3 2 2 2 2 1 2 3 0 1 1 2 1 0 3]

Using the code above gives, finally

print idx_vertices_second_largest

[ 1 2 10 15 20 23 29 31 33 36 50 51 53 54 56 57 61 63 65 66 71 74 76 77 90
94 95 97]

But when I use this as u = GraphView(g2,
vfilt=idx_vertices_second_largest), I get an error that ends with

File "/usr/local/lib/python2.7/dist-packages/graph_tool/__init__.py", line
527, in __get_set_f_array
    a[:] = v
ValueError: operands could not be broadcast together with shapes (100) (4)

So as far as picking up the indices, I seem to be right, but after that,
problem.. Where am I being silly? (Coming from classic C, my first reaction
is "what should I typecast to?".)

Thanks!

attachment.html (2.64 KB)

You're not passing the correct information as the vertex filter. The
vfilt parameter from GraphView expects either a function or a vertex
property map or an array. The vertex property map must be of type
"boolean", where the vertices with value 'True' are kept in the
graph. Similarly, if you pass an array, it should also be boolean (or
integer) valued, and it should have the same size as the number of
vertices. You cannot pass a list of vertices you want to be kept...
However, if you pass a function, it will be called on each vertex, and
the result determines if the vertex is filtered. Thus, you have
essentially two options:

    u = GraphView(g2, vfilt=lambda v: v in idx_vertices_second_largest)

or

    comp, hist = label_components(g)
    idx = argsort(hist)
    idx = idx[-2]
    filter = comp.a == idx
    u = GraphView(g2, vfilt=filter)

The second option should be the fastest.

Cheers,
Tiago