I’ve developed a Python library that uses graph-tool to analyze some biological data, and it works perfectly; not because of my library, but because of the SBM model under the hood (thanks!).
The methods available to generate the data are continuously improving and, most importantly, their cost decreases over time. The direct consequence is that the size of the datasets increases as well. When I first approached the problem, I typically worked with graphs of at most 20k nodes. Nowadays we are already in the realm of 10^5 or even 10^6 nodes. I may be OK with the fact that it may take days (or weeks) to analyze such networks, but it would be better if I could speed things up (some collaborators have also asked for hints on that).
A while ago, I was thinking I could infer the structure on (multiple) random subsamples of the original network, project the structure onto the original one, and possibly take the consensus over multiple solutions.
To perform the projection I would go like this: given G (the original) and S (the subsample), I can find the solution B for S. Then I would create a BlockState P for G in which all nodes belong to a single group, except for the ones in S, which keep their assignment from B. I would then use P.virtual_vertex_move for every “unassigned” node, testing each group in B and applying the “best” move.
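In code, the projection would look roughly like this (completely untested; the function name and the sampling fraction are just placeholders, and I’m assuming minimize_blockmodel_dl, virtual_vertex_move and move_vertex behave the way I expect):

```python
import numpy as np
import graph_tool.all as gt

def project_subsample_solution(g, sample_frac=0.1, seed=42):
    """Infer the SBM on a random subsample of g, then greedily
    project the resulting partition onto the full graph."""
    rng = np.random.default_rng(seed)
    n = g.num_vertices()

    # 1. Random node subsample S of the original network G.
    sampled = rng.choice(n, size=int(sample_frac * n), replace=False)
    vfilt = g.new_vertex_property("bool")  # defaults to False
    vfilt.a[sampled] = True
    s = gt.GraphView(g, vfilt=vfilt)

    # 2. Solution B for S (inferred on the subsample only).
    state_s = gt.minimize_blockmodel_dl(s)
    b_s = state_s.get_blocks()

    # Relabel the groups found in S to the contiguous range 0..B-1.
    uniq, relab = np.unique(b_s.a[sampled], return_inverse=True)
    B = len(uniq)

    # 3. BlockState P on G: nodes in S keep their assignment from B,
    #    everybody else starts in a single extra "unassigned" group.
    b_full = g.new_vertex_property("int")
    b_full.a[:] = B
    b_full.a[sampled] = relab
    p = gt.BlockState(g, b=b_full)

    # 4. For each unassigned node, evaluate the entropy change of a
    #    virtual move to every group in B and apply the best one.
    for v in g.vertices():
        if not vfilt[v]:
            dS = [p.virtual_vertex_move(v, r) for r in range(B)]
            p.move_vertex(v, int(np.argmin(dS)))
    return p
```

I suppose the order in which the unassigned nodes are visited matters, since each accepted move changes the state seen by the next one.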
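For the consensus over multiple subsamples, I imagine something along these lines (again untested; I’m assuming PartitionModeState is the right tool to align the label permutations and extract a consensus):

```python
# Collect partitions from several independent subsample runs.
bs = [project_subsample_solution(g, seed=k).get_blocks().a.copy()
      for k in range(10)]

# PartitionModeState aligns the group labels across the partitions;
# get_max() then returns a maximum-overlap consensus partition of G.
pmode = gt.PartitionModeState(bs, converge=True)
b_consensus = pmode.get_max(g)
```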
Of course I won’t be able to find small groups in the original network and the final solution will necessarily be “coarse”, but in my case that may be OK for the majority of settings (which raises the question: why on earth would you generate 10x more data? But that’s another story). Having said that, do you think there’s something inherently wrong in what I just wrote? I never had time to test such an approach.