About generating strength-preserving random directed graphs


I have been trying to use graph-tool (in Mac OSX) to generate random graphs by preserving in- and out-strength of nodes. I think I figured out a way to do that in graph-tool by allowing parallel_edges and setting a degree sampler function. It worked well on a very small graph but stuck for ~10 hours on a graph of 66 nodes of ~1700 edges (probably because the total edge weights are too large?). Could you let me know if this is the correct way to do so in graph-tool? Any way that can make it more efficient? Thanks!

The system did not let me upload documents, so I am listing the first few rows of the edgelist:


You are now allowed to upload files.

Please do not ignore the instructions that appear when you make a new post: without a minimal working example that shows the problem you’re having, there’s very little we can do to help.

sample-graph.csv (17.0 KB)

Thanks! A sample graph file is attached. The code I wrote is as follows.

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
ep_weight = g.new_ep("int")
g.add_edge_list(dfod.to_numpy(), hashed=False, eprops=[ep_weight])
g.edge_properties["weight"] = ep_weight

vp_deg_in_w = g.degree_property_map('in', g.edge_properties["weight"])
vp_deg_out_w = g.degree_property_map('out', g.edge_properties["weight"])

g = random_graph(g.num_vertices(), lambda k: (vp_deg_in_w[k], vp_deg_out_w[k]), model="configuration",  directed=True, parallel_edges=True, self_loops=True,
 edge_sweep=True, n_iter=1, random=True)

The way the current code works is inefficient for such super dense multigraphs (you want 5M edges between 66 nodes), because it first tries to build a seed graph in the same way you would do it for a simple graph. For sparse graphs it makes no difference, but for multigraphs it is quadratic on the edge multiplicities.

You’re better off just starting with your initial multigraph as a seed, and calling random_rewire() on it:

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
for u, v, w in dfod.to_numpy():
    g.add_edge_list([(u, v)] * w, hashed=False)

random_rewire(g, self_loops=True, parallel_edges=True)

Very helpful – thanks for your help.