Building graph from adjacency matrix

Pietro_Battiston · March 20, 2014, 10:30am

Hello,

I tried graph-tool after some experience with networkx, in search for
better efficiency (attracted by [1]) ... but unfortunately, it seems
like it all depends on how data is stored!

The (several) networks I play with are dense weighted networks of 1435
nodes. They are defined in terms of their adjacency matrices, which are
pandas dataframes stored in a HDF5 store.

In order to get the relative network with networkx, I would do
dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(),
create_using=nx.DiGraph())

which took around 18 seconds. In graph-tool, the best alternative I
found, based on [2] and [3], is:

# Since the bottleneck is given by the calls to g.add_edge(),
# numba actually slows down things a bit.
#from numba import autojit

tiago · March 20, 2014, 11:52am

Hello,

I tried graph-tool after some experience with networkx, in search for
better efficiency (attracted by [1]) ... but unfortunately, it seems
like it all depends on how data is stored!

This is true. If your code essentially depends on python loops,
graph-tool will not provide any advantage.

The (several) networks I play with are dense weighted networks of 1435
nodes. They are defined in terms of their adjacency matrices, which are
pandas dataframes stored in a HDF5 store.

In order to get the relative network with networkx, I would do
dg = nx.from_numpy_matrix(hstore['my_df'].as_matrix(),
                          create_using=nx.DiGraph())

which took around 18 seconds. In graph-tool, the best alternative I
found, based on [2] and [3], is:

# Since the bottleneck is given by the calls to g.add_edge(),
# numba actually slows down things a bit.
#from numba import autojit
#
#@autojit
def graph_from_adj_matrix(m):
    g = Graph()

    n = m.shape[0]

    edges = list(g.add_vertex(n=n))

    c = Counter(every=10, until=n)
    for i in xrange(n):
        for j in xrange(n):
            if m[i,j]:
                g.add_edge(edges[i], edges[j])
        c.count()

    return g

g = graph_from_adj_matrix(hstore['my_df'].as_matrix())

... which takes around 200 seconds (and is still not even weighted!).

Is there anything trivial (or simply new) I'm missing (apart from the
possibility, mentioned in [3], of writing my code in C++)?

The ability of inserting many edges without calling add_edge() several
times is indeed lacking...

Because of this, I have now just added a Graph.add_edge_list() to the
git version, which takes a list of edges to be added, which can be a
numpy array. If you have a full adjacency matrix instead of an edge
list, you can do simply:

g.add_edge_list(transpose(nonzero(a)))

This should be much faster than the Python loop above.

Best,
Tiago

Pietro_Battiston · March 20, 2014, 12:10pm

This is really great, I will try the git version as soon as possible.

Thank you very much.

Pietro

Pietro_Battiston · March 20, 2014, 5:25pm

Do you think it would make sense then to have something analogous for
PropertyMaps?

Like:

pm.set_from_list(a)

in place of:

for v in g.vertices():
pm[v] = a[i]

if pm is a property for vertices, and

for e in g.edges():
pm[e] = a[i]

if it is for edges?

Pietro

Guillaume · March 20, 2014, 5:29pm

That's all ready simpler than that, as you can assign values of the
propertymap like that:

pm.a = a

Pietro_Battiston · March 20, 2014, 9:07pm

Great! Thanks

Pietro

Pietro_Battiston · April 3, 2014, 8:44am

I think there is some problem with add_edge_list.

The following code:

tiago · April 3, 2014, 6:35pm

I think there is some problem with add_edge_list.

The following code:

---------------------------------------------------------

from scipy import array
from graph_tool import Graph
from graph_tool.draw import graph_draw

def blocks_star(transpose=False):
    g = Graph()
    members = array([0,1,2])
    edges_list = array([[0,1,2], [2,0,1]]).transpose()
    edges_list_2 = array([[0,2],[1,0],[2,1]])
    print edges_list
    print edges_list_2
    print all(edges_list == edges_list_2)
    if transpose:
        g.add_edge_list(edges_list)
    else:
        g.add_edge_list(edges_list_2)
    graph_draw(g)

---------------------------------------------------------

gives me a different result if called with True or False as argument
(with True, the resulting graph is disconnected), while the edges lists
are clearly identical. I guess this has to do with the fact that
transposing, for numpy, is just a change of view, not a relocation of
elements... but I see your code is based on boost, which I have no
experience with. I am using package python-graph-tool 2.2.31-1 from your
Debian sid repositories.

This is a bug. The strides of the numpy array are not propagated
properly to the C++ side of things. I have fixed this now in the git
version.

By the way: why "add_edge_list" rather than "add_edges_list"? True,
"add_vertex" can add multiple vertices, but then there is "clear_edges".
Not really crucial issue, but... later it would be too late to raise it!

The only grammatically correct variations I see are "add edge list" or
"add list of edges". The phrase "add edges list" does not seem right.

Best,
Tiago

Pietro_Battiston · April 5, 2014, 6:49am

> I think there is some problem with add_edge_list.
>
> The following code:
>
> ---------------------------------------------------------
>
> from scipy import array
> from graph_tool import Graph
> from graph_tool.draw import graph_draw
>
> def blocks_star(transpose=False):
> g = Graph()
> members = array([0,1,2])
> edges_list = array([[0,1,2], [2,0,1]]).transpose()
> edges_list_2 = array([[0,2],[1,0],[2,1]])
> print edges_list
> print edges_list_2
> print all(edges_list == edges_list_2)
> if transpose:
> g.add_edge_list(edges_list)
> else:
> g.add_edge_list(edges_list_2)
> graph_draw(g)
>
> ---------------------------------------------------------
>
> gives me a different result if called with True or False as argument
> (with True, the resulting graph is disconnected), while the edges lists
> are clearly identical. I guess this has to do with the fact that
> transposing, for numpy, is just a change of view, not a relocation of
> elements... but I see your code is based on boost, which I have no
> experience with. I am using package python-graph-tool 2.2.31-1 from your
> Debian sid repositories.

This is a bug. The strides of the numpy array are not propagated
properly to the C++ side of things. I have fixed this now in the git
version.

Great!

> By the way: why "add_edge_list" rather than "add_edges_list"? True,
> "add_vertex" can add multiple vertices, but then there is "clear_edges".
> Not really crucial issue, but... later it would be too late to raise it!

The only grammatically correct variations I see are "add edge list" or
"add list of edges". The phrase "add edges list" does not seem right.

It seems that you are perfectly right.

Pietro