Hi all,

I haven't been on the list very long, but I see this question keeps coming up, so I thought I'd put some numbers out so people know. graph-tool is super fast at the C++ work it does, and super convenient for picking through the data at the Python level too, but it doesn't make Python fast.

So, here are some examples of working with a mask in Python. Notice that NumPy is not fast just because it is NumPy: looping over an array's rows in Python is still a Python loop. I chose an Nx3 array because NumPy masking is extra slow with shapes other than Nx1.

================

import numpy as np
a_generator = ((x, x+2, x*x) for x in range(1000))
a_list = [(x, x+2, x*x) for x in range(1000)]
a_array = np.array(a_list)
mask = np.ones(len(a_array), dtype=bool)
mask[::3] = False

def tupled():
    for x,b in zip(a_generator,mask):
        if b:
            c,d,e =x

def looped():
    for x,b in zip(a_list,mask):
        if b:
            c,d,e =x

def masked():
    for x in a_array[mask]:
        c,d,e = x

#IPython magic function: timeit

%timeit tupled()
1000000 loops, best of 3: 510 ns per loop

%timeit looped()
10000 loops, best of 3: 76.6 µs per loop

%timeit masked()
1000 loops, best of 3: 445 µs per loop

================

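Notice the nanoseconds vs. microseconds, but don't read the generator number as a win: a generator can only be consumed once, so after the very first call tupled() has nothing left to iterate over and timeit is measuring an empty loop. The comparison that holds up is looped() vs. masked(): the plain list is about 6x faster than looping over the masked NumPy array (76.6 µs vs. 445 µs).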
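
One way to sanity-check numbers like these is to have each function return how many items it actually processed, and to build a fresh generator on every call. The following is only a sketch of that idea: make_generator, count_tupled and masked_tolist are names I made up for illustration, and the timings will depend on your machine, so I'm not quoting any.

```python
import timeit
import numpy as np

N = 1000
a_list = [(x, x + 2, x * x) for x in range(N)]
a_array = np.array(a_list)
mask = np.ones(len(a_array), dtype=bool)
mask[::3] = False

def make_generator():
    # A fresh generator per call, so every timed run sees all N items.
    return ((x, x + 2, x * x) for x in range(N))

def count_tupled(items):
    # Same loop as tupled()/looped(), but it reports how much work it did.
    n = 0
    for x, b in zip(items, mask):
        if b:
            c, d, e = x
            n += 1
    return n

shared = make_generator()
first = count_tupled(shared)   # consumes the generator
second = count_tupled(shared)  # nothing left to iterate over
print(first, second)           # 666 0

def tupled_fresh():
    return count_tupled(make_generator())

def looped():
    return count_tupled(a_list)

def masked_rows():
    # Iterating the masked array yields one small ndarray per row.
    n = 0
    for x in a_array[mask]:
        c, d, e = x
        n += 1
    return n

def masked_tolist():
    # Mask in NumPy, then convert once to plain Python lists before looping.
    n = 0
    for c, d, e in a_array[mask].tolist():
        n += 1
    return n

for fn in (tupled_fresh, looped, masked_rows, masked_tolist):
    seconds = timeit.timeit(fn, number=100)
    print(fn.__name__, fn(), "items,", seconds / 100, "s per call")
```

Every variant should report 666 items; if one reports 0, its timing is measuring an empty loop.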

Now, here are some graph-tool-specific examples. I left out the mask, but you can clearly create and manipulate one as you see fit.

=================

>>>graph
    <Graph object, directed, reversed, with 32183 vertices and 199381 edges at 0x7f62842ee9d0>

>>>def graph_loop0():
    for i in range(10): #Only 10 times because this is soooo slooow
        for e in graph.edges():
            v1,v2 = e.source(),e.target()

>>>def graph_loop1():
    edges = [[e.source(),e.target()] for e in graph.edges()]
    for i in range(1000):
        for e in edges:
            v1,v2 = e[0],e[1]

>>>def graph_loop2():
    edges = [[e.source(),e.target()] for e in graph.edges()]
    gen = (e for e in edges)
    for x in range(1000):
        for e in gen:
            v1,v2 = e[0],e[1]

>>>from time import time

>>>a=time(); graph_loop0(); print(time()-a)
23.1095559597

>>>a=time(); graph_loop1(); print(time()-a)
15.721350193

>>>a=time(); graph_loop2(); print(time()-a)
2.80044198036

=======================

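loop1 and loop2 are asked to do 100x more passes than loop0 (1000 vs. 10). loop1 really does them: that is about 2.3 s per pass for loop0 against about 16 ms per pass for loop1 (list building included), so roughly 150x faster. The loop2 number is not comparable, though: the generator is used up after the first pass, so the other 999 passes do nothing, and the 2.8 s is presumably mostly the cost of building the edge list once.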
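
To see how many edges a nested loop like graph_loop2 really visits, you can count them. This is a small sketch with a made-up five-edge list standing in for the edges copied out of the graph; the function names are hypothetical too.

```python
# Stand-in for the [source, target] pairs copied out of the graph.
edges = [(i, i + 1) for i in range(5)]

def passes_with_shared_generator(n_passes):
    # One generator shared by all passes, as in graph_loop2.
    gen = (e for e in edges)
    visited = 0
    for _ in range(n_passes):
        for v1, v2 in gen:
            visited += 1
    return visited

def passes_with_list(n_passes):
    # The list can be iterated again on every pass, as in graph_loop1.
    visited = 0
    for _ in range(n_passes):
        for v1, v2 in edges:
            visited += 1
    return visited

print(passes_with_shared_generator(1000))  # 5
print(passes_with_list(1000))              # 5000
```

If you do want a generator per pass, create it inside the outer loop so each pass gets a new one.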

If you're just looping through once, it makes sense to use the convenience graph-tool provides. But if you are implementing a graph algorithm in Python, grab what you need from the graph-tool graph into a list, or whatever Python object makes sense for what you're doing, and loop over that instead.
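
As a rough illustration of "grab what you need", here is a sketch that turns a plain edge list into an adjacency dict and runs a breadth-first search on it. The edge_list below is made up; with a real graph I'd assume you could fill it with something like [(int(e.source()), int(e.target())) for e in graph.edges()]. build_adjacency and bfs_order are just names I picked.

```python
from collections import deque

# Hypothetical stand-in for (source, target) index pairs taken from the graph.
edge_list = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

def build_adjacency(edges):
    # Map each vertex index to the list of its out-neighbours.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency

def bfs_order(adjacency, start):
    # Plain breadth-first search over the Python-side adjacency dict.
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

adjacency = build_adjacency(edge_list)
print(bfs_order(adjacency, 0))  # [0, 1, 2, 3, 4]
```

The extraction pays the slow graph iteration once; after that the algorithm only touches ordinary Python ints, lists and dicts.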

-Elliot