About generating strength-preserving random directed graphs

drwho · February 10, 2023, 2:16pm

Hi,

I have been trying to use graph-tool (in Mac OSX) to generate random graphs by preserving in- and out-strength of nodes. I think I figured out a way to do that in graph-tool by allowing parallel_edges and setting a degree sampler function. It worked well on a very small graph but stuck for ~10 hours on a graph of 66 nodes of ~1700 edges (probably because the total edge weights are too large?). Could you let me know if this is the correct way to do so in graph-tool? Any way that can make it more efficient? Thanks!

The system did not let me upload documents, so I am listing the first few rows of the edgelist:

source,target,weight
0,1,12306
0,2,3603
0,3,88263
0,4,92890
0,5,62422
0,6,10054
0,7,20353
0,8,9841
0,9,21
0,10,20429
0,11,5405
0,12,309
…

tiago · February 11, 2023, 12:56pm

You are now allowed to upload files.

Please do not ignore the instructions that appear when you make a new post: without a minimal working example that shows the problem you’re having, there’s very little we can do to help.

drwho · February 12, 2023, 2:40pm

sample-graph.csv (17.0 KB)

Thanks! A sample graph file is attached. The code I wrote is as follows.

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
ep_weight = g.new_ep("int")
g.add_edge_list(dfod.to_numpy(), hashed=False, eprops=[ep_weight])
g.edge_properties["weight"] = ep_weight

vp_deg_in_w = g.degree_property_map('in', g.edge_properties["weight"])
vp_deg_out_w = g.degree_property_map('out', g.edge_properties["weight"])

g = random_graph(g.num_vertices(), lambda k: (vp_deg_in_w[k], vp_deg_out_w[k]), model="configuration",  directed=True, parallel_edges=True, self_loops=True,
 edge_sweep=True, n_iter=1, random=True)

tiago · February 19, 2023, 10:15am

The way the current code works is inefficient for such super dense multigraphs (you want 5M edges between 66 nodes), because it first tries to build a seed graph in the same way you would do it for a simple graph. For sparse graphs it makes no difference, but for multigraphs it is quadratic on the edge multiplicities.

You’re better off just starting with your initial multigraph as a seed, and calling random_rewire() on it:

import pandas as pd
from graph_tool.all import *

dfod = pd.read_csv('sample-graph.csv')

g = Graph(directed=True)
for u, v, w in dfod.to_numpy():
    g.add_edge_list([(u, v)] * w, hashed=False)

random_rewire(g, self_loops=True, parallel_edges=True)

drwho · February 19, 2023, 11:19pm

Very helpful – thanks for your help.