How are you doing steps 3.a and 3.e exactly?

Are you calling g.add_vertex() and g.remove_vertex() repeatedly?

Yes I am. I assumed that's not the best thing but compared to copying the graph, adding then removing the vertices was performing better.

I investigated a different approach whereas I allow additional vertices to exist as long as I keep track of the original number of vertices (n_orig):
For a set of candidate vertices of size m
- if graph size is not large enough, add new vertices
- create a graph view that only keeps (n_orig + m)
- add edges as needed (this relies on vertex properties)
- do graph computation
- when done, remove edges (but not vertices)

When finished with all candidate sets, remove the extra vertices.

Now I am only spending ~25% of the time in resizing the graph (around 20% in remove_edges, despite having set_fast_edge_removal(True)). That's still more than I'd like, but it's an improvement from the 80% I had with adding / removing vertices. If you have other suggestions on how to tackle this that'd be great (for example, I tried without remove_edges, but the edges that get added to the graphview end up showing up in the original graph. Is there a way to avoid that or to filter such that only the original edges are kept in the graphview?) 


as a minimal working example:
```
import graph_tool.all as gt
import numpy as np

g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)

g.set_fast_edge_removal(True)
n_orig = g.num_vertices()

def foo(g, m, skipEdgeRemoval = False):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig+m)

    if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()
       
    filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0

   
    gv = gt.GraphView(g, filt)
    #print(f'filt = {filt}')
   
    new_vtx_ids = np.arange(n, n+m)
    print(f'indices of new vertices = {new_vtx_ids}')
         
    #Check that all edges are expected
    for e in list(gv.edges()):
        if [e.source(), e.target()] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in original graph')
   
   
    #Add a bunch of random edges connecting to new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s,t])
        #also updates some vertex properties, but that's easy/fast to undo, not done in this MWE


    gv.add_edge_list(edge_list)
   
    print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')
       
    #computation on new graph goes here ...
    pass
   
   
    #print('Cleanup ...')
    #If cleanup is not done, this fails as the edges that were added to the graphview
    #end up being added to the original graph as well :s
    if not skipEdgeRemoval:
        for e in edge_list:
            gv.remove_edge(gv.edge(e[0], e[1]))
       
       
foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
#above is all good
foo(g, 3, True) #skip cleanup
foo(g, 4) #fails
```


 
Message: 3
Date: Tue, 21 Jan 2020 15:01:53 +0100
From: Tiago de Paula Peixoto <tiago@skewed.de>
To: Main discussion list for the graph-tool project
        <graph-tool@skewed.de>
Subject: Re: [graph-tool] Speeding up graph copy?
Message-ID: <1eb5aeab-0a3d-d3fa-84e3-60734afdd360@skewed.de>
Content-Type: text/plain; charset="utf-8"

Am 20.01.20 um 19:43 schrieb Zouhair:
> Hi,
>
> I'm using graph-tool in my project, and my use case is as follows:
> 1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
> edges)
> 2) attach some properties to the vertices
> 3) for about ~5000 sets of *new* candidate vertices (each set has about
> 2-30 vertices)
> ? 3.a - add new vertices to graph
> ? 3.b - add edges as necessary between new vertices and old ones
> ? 3.c - update the properties
> ? 3.d - run an algorithm which depends on properties and edges, save
> result to a list
> ? 3.e - remove new vertices from graph and restore properties (i.e. undo
> 3.a-c)
>
> Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
> runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
> is 3.a and 3.e (about 80% of runtime is spent there)
 

> iii- use GraphView but only filter on the original vertices in the
> graph: this avoids having to delete the vertices, but making GraphView
> with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
> < n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
> original graph keeps growing...

This can be improved by passing a boolean-valued vertex property map
instead of a lambda function. In this way the GraphView creation becomes
O(1) instead of O(N).

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.skewed.de/pipermail/graph-tool/attachments/20200121/4ead384b/attachment.asc>

------------------------------

Subject: Digest Footer

_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool


------------------------------

End of graph-tool Digest, Vol 144, Issue 15
*******************************************