overlap/mixed-membership sbm help

Eli_Draizen · November 11, 2021, 2:03am

Hi everyone,

I was wondering if it would be possible to provide some more examples of
how to run a nested mixed membership SBM with edge weights. The new version
seems to have removed the "overlap=True" option for state_args in the
minimize_* functions.

Is this the correct way to do it now?

import numpy as np
import graph_tool as gta
import graph_tool.inference  # make sure the inference submodule is loaded

g = ...        # build graph
e_score = ...  # edge property map with the edge weights

state_args = dict(
    base_type=gta.inference.overlap_blockmodel.OverlapBlockState,
    B=2*g.num_edges(),  # B_max
    deg_corr=True,      # passed only once (a duplicate keyword is a SyntaxError)
    recs=[e_score],
    rec_types=["real-normal"])
state = gta.inference.minimize_nested_blockmodel_dl(
    g,
    state_args=state_args,
    multilevel_mcmc_args=dict(verbose=True))

# improve solution with merge-split
state = state.copy(bs=state.get_bs() + [np.zeros(1)] * 4, sampling=True)

for i in range(100):
    if i % 10 == 0:
        print(".", end="")
    ret = state.multiflip_mcmc_sweep(niter=10, beta=np.inf, verbose=True)

I am currently running this for a fully connected bipartite graph with 3454
nodes and 55008 edges. I understand it would take longer than the
non-overlapping version, but do you have any suggestions on how to speed it
up? The non-overlapping version takes about 15 minutes, while the
overlapping version is still running after 1 day.
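For a sense of scale (a back-of-the-envelope sketch, not a benchmark): a complete bipartite graph with N nodes and E edges has sides of sizes n1 and n2 with n1 + n2 = N and n1 * n2 = E, so the numbers above correspond to K(16, 3438). Assuming, as I understand graph-tool's OverlapBlockState, that the overlapping model assigns groups to half-edges (one per edge endpoint) rather than to nodes, which is presumably also why B_max is set to 2*g.num_edges(), each sweep has roughly 32 times as many objects to move as in the non-overlapping fit.

```python
import math

def bipartite_sides(n_nodes, n_edges):
    """Sizes (n1, n2) of a complete bipartite graph with the given
    number of nodes and edges, using n1 + n2 = N and n1 * n2 = E."""
    disc = n_nodes * n_nodes - 4 * n_edges  # discriminant of x^2 - N*x + E
    root = math.isqrt(disc)
    if root * root != disc or (n_nodes - root) % 2:
        raise ValueError("counts do not match a complete bipartite graph")
    return (n_nodes - root) // 2, (n_nodes + root) // 2

N, E = 3454, 55008                # numbers from the post above
n1, n2 = bipartite_sides(N, E)
# Assumption: the overlapping model moves half-edges, two per edge.
half_edges = 2 * E
print(n1, n2)                              # 16 3438
print(half_edges, round(half_edges / N, 2))  # 110016 31.85
```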

Thanks for your help,
Eli


tiago · November 11, 2021, 8:05am

> I was wondering if it would be possible to provide some more examples of
> how to run a nested mixed membership SBM with edge weights. The new
> version seems to have removed the "overlap=True" option for state_args
> in the minimize_* functions.

Indeed, I will add more examples about this. Could you please open an
issue on the website so I don't forget?

> Is this the correct way to do it now?


This is correct. But note that the "sampling=True" option is no longer
needed.

> I am currently running this for a fully connected bipartite graph with
> 3454 nodes and 55008 edges. I understand it would take longer than the
> non-overlapping version, but do you have any suggestions on how to speed
> it up? The non-overlapping version takes about 15 minutes, while the
> overlapping version is still running after 1 day.

The new version will contain much faster code for the overlapping case!

But in the meantime, what you can do is fit the non-overlapping model
first, and use that as a starting point for the MCMC with overlap.
You do that simply by doing:

state = state.copy(state_args=dict(overlap=True))
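Conceptually, this warm start works because a non-overlapping partition is a special case of an overlapping one. The toy sketch below is plain Python, not graph-tool's actual implementation, and `half_edge_labels` and `node_memberships` are hypothetical helpers: every half-edge inherits the group of its node, so each node starts with a single membership, and moving one half-edge (as the MCMC would) gives that node a mixed membership.

```python
from collections import Counter

def half_edge_labels(edges, node_block):
    """Give each half-edge (edge index, endpoint 0/1) the group of its node."""
    labels = {}
    for e, (u, v) in enumerate(edges):
        labels[(e, 0)] = node_block[u]
        labels[(e, 1)] = node_block[v]
    return labels

def node_memberships(edges, labels):
    """Fraction of each node's half-edges that sit in each group."""
    counts = {}
    for e, (u, v) in enumerate(edges):
        for end, node in ((0, u), (1, v)):
            counts.setdefault(node, Counter())[labels[(e, end)]] += 1
    result = {}
    for node, cnt in counts.items():
        total = sum(cnt.values())
        result[node] = {r: c / total for r, c in cnt.items()}
    return result

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
node_block = {0: 0, 1: 0, 2: 0, 3: 1}   # result of a non-overlapping fit

labels = half_edge_labels(edges, node_block)
print(node_memberships(edges, labels)[2])  # {0: 1.0}: a single membership

# Move node 2's half-edge on edge 2-3 (edge index 3, endpoint 0) to group 1.
labels[(3, 0)] = 1
# Node 2 now has 2/3 of its half-edges in group 0 and 1/3 in group 1.
print(node_memberships(edges, labels)[2])
```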

Best,
Tiago

Eli_Draizen · March 2, 2022, 9:48pm

Hi Tiago,

Just checking in on the status of this. I've tried the newer version
(2.44) but am still having the same issues. Your proposed solution of
copying the state and adding the overlap works for the non-nested block
models, but I was having trouble with the nested version. I am also now
running into an issue where I can only run non-degree-corrected SBMs;
otherwise it hangs (logs attached). If you have any suggestions, I would
really appreciate it.

Thanks,
Eli

Legend: NDC = non-degree-corrected, DC = degree-corrected
-> = copied the state with the following options added, then reran MCMC
+  = options defined in the minimization function, followed by MCMC

NDC                      = works
NDC+Overlap              = slow
NDC->Overlap             = works
NDC+Nested               = slow
NDC+Nested+Overlap       = slow
NDC+Nested->Overlap      = not implemented

NDC+Nested+Overlap -> DC = slow

DC+Overlap+Nested        = hangs (this is the goal)
DC+Overlap               = hangs
DC+Nested->Overlap       = not implemented


graph_tool_bipartite_wmmsbm.txt (28.5 KB)

fc_bipartite_graph_edge_weights.gt (668 KB)

Eli_Draizen · March 4, 2022, 5:37am

Hello,

I think I was able to figure it out using the code from the "Characterizing
the posterior distribution" section. I start by equilibrating a
degree-corrected NestedBlockState with edge weights and then inferring
partition modes with ModeClusterState. Is this the same as running
minimize_nested_blockmodel_dl? Are the minimize* functions not used anymore?

Also, is it normal to have inferred only one mode?

Thanks for your help,
Eli


tiago · March 7, 2022, 9:14am

> I think I was able to figure it out using the code from the
> "Characterizing the posterior distribution" section. I start by
> equilibrating a degree-corrected NestedBlockState with edge weights and
> then inferring partition modes with ModeClusterState. Is this the same
> as running minimize_nested_blockmodel_dl?

No, this is not the same. ModeClusterState characterizes the whole
distribution of answers, while minimize_blockmodel_dl() finds the single
most likely (i.e. most compressive) partition.

> Are the minimize* functions not used anymore?

Yes, they are. They just have a different goal.

> Also, is it normal to have inferred only one mode?

Yes, it can happen. This means the solutions are clustered around a
single typical solution, rather than several different ones.

This is discussed in this paper:

Phys. Rev. X 11, 021003 (2021) - Revealing Consensus and Dissensus between Network Partitions
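To make "clustered around a single typical solution" concrete, one can compare sampled partitions pairwise, ignoring how the groups happen to be labelled. The sketch below uses a simple label-invariant similarity, the maximum overlap (fraction of nodes on which two partitions agree under the best one-to-one matching of labels). It is only an illustration: the brute-force matching over permutations suits a handful of groups, the samples are made-up toy data rather than graph-tool output, and this is not how ModeClusterState itself is computed.

```python
from itertools import permutations

def max_overlap(a, b):
    """Fraction of nodes on which partitions a and b agree, maximized over
    one-to-one relabelings of b (brute force, fine for a few groups)."""
    labels_a = sorted(set(a))
    labels_b = sorted(set(b))
    # Pad with None so every label of b can be matched; None never agrees.
    k = max(len(labels_a), len(labels_b))
    targets = labels_a + [None] * (k - len(labels_a))
    best = 0
    for perm in permutations(targets, len(labels_b)):
        mapping = dict(zip(labels_b, perm))
        agree = sum(x == mapping[y] for x, y in zip(a, b))
        best = max(best, agree)
    return best / len(a)

# Hypothetical posterior samples over 6 nodes.
samples = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    [0, 0, 1, 1, 1, 1],  # one node moved
]
# All pairwise overlaps are high (1.0 or 5/6): the samples sit close to
# one typical partition, the kind of pattern expected from a single mode.
for i in range(len(samples)):
    for j in range(i + 1, len(samples)):
        print(i, j, max_overlap(samples[i], samples[j]))
```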

Best,
Tiago