SBM on dense graphs v2

This is a follow-up to this 4 year old question: SBM on dense graphs. Some useful hints were given on how to run SBM inference on dense large graphs (N \sim 10^5).

I have a similar problem: a complete undirected graph of 75k vertices and 2.8b edges with correlation weights \in [-1,1] much like the “Vote correlations in congress” example in section IIIB in the Nonparametric weighted stochastic block models paper. My goal is to fit a nested weighted SBM to the data (which is essentially a Gram matrix).

After some preliminary experiments I am looking into to sparsify the graph such that O(E) \sim O(N) but I am wondering if there are new recommendations/caveats 4 years later.

Other questions:

  • I am content with finding a good local maximum of the posterior, not so much sampling. Can this be parallelized other than running independent runs in parallel? HPCs with large RAM always come with many cores.
  • Is the suggestion to keep calling multiflip_mcmc_sweep() until convergence in the mentioned post still a good approach today?
  • Given the SBM ansatz, likely better to sample nodes (leading to complete subnetworks but with much information) or edges (leading to very sparse subnetworks but with little information)?

Many thanks.

Maybe take a look at this recent paper: [2405.01015] Network reconstruction via the minimum description length principle

1 Like

Hi Diogo, thanks for the reply! This is perfectly what I need to sparsify the network… And it works on the scale that I have in mind, and it can use the nested SBM prior. Many many thanks.