Hi,
Could you suggest what is the most efficient way for:
1) extracting vertex clusters based on values in vertex property map?
- e.g., for iterating through all connected components in a
graph, based on result of label_components
You can improve the performance by avoiding repeated python function
calls for each vertex. One way to do this is to use array
expressions. For instance, instead of doing:
gv1 = gt.GraphView(g, vfilt = lambda v: v_prop[v] == cl_num)
you could do:
gv1 = gt.GraphView(g, vfilt = v_prop.a == cl_num)
The second option should be significantly faster for larger graphs.
2) saving extracted subgraphs to a file?
Code that I am using is in [1] but for a large graph (~1M edges, 12k
clusters) it is taking days. Is there a more optimal way?
The problem with this approach is that since you have a number C of
clusters, and the creation / iteration of each filtered graph is O(N +
E), where N and E correspond to the size of the unfiltered graph, the
whole thing becomes O(C(N + E)), which can be very slow if C is in the
order of tens of thousands, as is your case. In this case the best
approach would be not to create filtered graphs, but rather, to create
your cluster graphs individually by hand, i.e.
clusters = defaultdict(list)
for v in g.vertices():
clusters[v_prop[v]].append(v)
for i, (c, vs) in enumerate(clusters.items()):
cg = Graph()
vmap = defaultdict(lambda: cg.add_vertex())
for v in vs:
u = vmap[v]
for w in v.out_neighbours():
cg.add_edge(u, vmap[w])
cg.save("cluster-%d.xml.gz" % i)
This approach is O(N + E), which would be much faster than the previous
one if C is sufficiently large, even though it is pure python.
Can the function topology.mark_subgraph be of help here?
- the name seems relevant but I do not know how it is used.
This returns a property map which can be used to mask a subgraph, if it
is represented as a graph object. I does in essence the opposite of what
you want to achieve.
Cheers,
Tiago