About generating strength-preserving random directed graphs

Hi,

I have been trying to use graph-tool (in Mac OSX) to generate random graphs by preserving in- and out-strength of nodes. I think I figured out a way to do that in graph-tool by allowing parallel_edges and setting a degree sampler function. It worked well on a very small graph but stuck for ~10 hours on a graph of 66 nodes of ~1700 edges (probably because the total edge weights are too large?). Could you let me know if this is the correct way to do so in graph-tool? Any way that can make it more efficient? Thanks!

The system did not let me upload documents, so I am listing the first few rows of the edgelist:

source,target,weight
0,1,12306
0,2,3603
0,3,88263
0,4,92890
0,5,62422
0,6,10054
0,7,20353
0,8,9841
0,9,21
0,10,20429
0,11,5405
0,12,309

You are now allowed to upload files.

Please do not ignore the instructions that appear when you make a new post: without a minimal working example that shows the problem you’re having, there’s very little we can do to help.

sample-graph.csv (17.0 KB)

Thanks! A sample graph file is attached. The code I wrote is as follows.

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
ep_weight = g.new_ep("int")
g.add_edge_list(dfod.to_numpy(), hashed=False, eprops=[ep_weight])
g.edge_properties["weight"] = ep_weight

vp_deg_in_w = g.degree_property_map('in', g.edge_properties["weight"])
vp_deg_out_w = g.degree_property_map('out', g.edge_properties["weight"])

g = random_graph(g.num_vertices(), lambda k: (vp_deg_in_w[k], vp_deg_out_w[k]), model="configuration",  directed=True, parallel_edges=True, self_loops=True,
 edge_sweep=True, n_iter=1, random=True)

The way the current code works is inefficient for such super dense multigraphs (you want 5M edges between 66 nodes), because it first tries to build a seed graph in the same way you would do it for a simple graph. For sparse graphs it makes no difference, but for multigraphs it is quadratic on the edge multiplicities.

You’re better off just starting with your initial multigraph as a seed, and calling random_rewire() on it:

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
for u, v, w in dfod.to_numpy():
    g.add_edge_list([(u, v)] * w, hashed=False)

random_rewire(g, self_loops=True, parallel_edges=True)

Very helpful – thanks for your help.