This is a follow-up to this 4 year old question: SBM on dense graphs. Some useful hints were given on how to run SBM inference on dense large graphs (N \sim 10^5).
I have a similar problem: a complete undirected graph of 75k vertices and 2.8b edges with correlation weights \in [-1,1] much like the “Vote correlations in congress” example in section IIIB in the Nonparametric weighted stochastic block models paper. My goal is to fit a nested weighted SBM to the data (which is essentially a Gram matrix).
After some preliminary experiments I am looking into to sparsify the graph such that O(E) \sim O(N) but I am wondering if there are new recommendations/caveats 4 years later.
Other questions:
- I am content with finding a good local maximum of the posterior, not so much sampling. Can this be parallelized other than running independent runs in parallel? HPCs with large RAM always come with many cores.
- Is the suggestion to keep calling multiflip_mcmc_sweep() until convergence in the mentioned post still a good approach today?
- Given the SBM ansatz, likely better to sample nodes (leading to complete subnetworks but with much information) or edges (leading to very sparse subnetworks but with little information)?
Many thanks.