Model specification and extraction of layer-specific partitions in LayeredBlockState when overlap=True

Hello,

I'm studying political coalitions over time (three time slices) and across discrete types of relationships (collaboration, ideological similarity, and discursive similarity). Since graph-tool offers (the only) principled approach that does not require arbitrary user-specified parameters (such as 'resolution' or 'coupling'), I would very much like to use it for my analysis.

I'm trying to fit a SBM to a graph with edges divided into distinct layers (three time slices X three types of relationships). Each edge has a discrete weight in the range of [0, 3], with weight referring to the 'strength' of the relationship. Not all nodes participate in all layers.

I'm interested in how/if the block membership varies across layers that I expect to be somewhat dependent on one another (hence I would like to avoid fitting a separate SBM to each layer). In my ideal scenario, I would be using the NestedBlockState with LayeredBlockState as base and allowing for overlaps across layers:

state=NestedBlockState(g, base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"]))

dS, nmoves=0, 0
for i in range(100):
    ret=state.multiflip_mcmc_sweep(niter=10)
    dS+=ret[0]
    nmoves+=ret[1]

print("Change in description length:", dS)
print("Number of accepted vertex moves:", nmoves)

mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10), verbose=True)

bs=[]
def collect_partitions(s):
   global bs
   bs.append(s.get_bs())

mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)

pm=PartitionModeState(bs, nested=True, converge=True)
pv=pm.get_marginal(g)

bs=pm.get_max_nested()
state=state.copy(bs=bs)

Two questions:

Q1)
This is taking very long (days) on a high performance cluster and actually I haven't seen it finishing up for once. Is my model specification correct or simply too complex? If the latter, is there a better strategy to perform my analysis? All I can think of is to revert to a non-nested model and/or reduce the number of layers and/or edges:

state=LayeredBlockState(g, deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])

dS, nattempts, nmoves=state.multiflip_mcmc_sweep(niter=1000)
print("Change in description length:", dS)
print("Number of accepted vertex moves:", nmoves)

mcmc_equilibrate(state, wait=1000, nbreaks=2, mcmc_args=dict(niter=10), verbose=True)

bs=[]
def collect_partitions(s):
   global bs
   bs.append(s.b.a.copy())

mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)

pm=PartitionModeState(bs, converge=True)
pv=pm.get_marginal(g)

Q2)
Say that the model runs and yields a neat NestedBlockState (or LayeredBlockState) object with X overlapping blocks, 9 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 220 vertices and 11322 edges, 2 internal vertex properties, 4 internal edge properties>.

Now, in previous posts there are mentions about the number of nodes multiplying by the number of layers when 'overlap=True', but in my experiments with only two layers, this does not seem to be the case and I struggle to find a way to extract the layer-specific block and respective memberships of nodes.

I've tried iteration over vertices, but what I eventually need is a 2d_array [y=all nodes in g, x=all layers in g, values=most likely block membership] in order to map how/if the block membership of each node varies across those layers in which they participate (ideally somehow denoting non-participation in the layer with NaN or some unique value). Is there some way to get there?

l=state.get_levels()[0] #NestedBlockState
B=l.get_nonempty_B()
bv=l.get_overlap_blocks()[0]

z=np.zeros((g.num_vertices(), B)) #to be filled with most likely block membership with potential overlaps across ALL layers, but I need the layer-specific information
for i in range(g.num_vertices()):
    z[int(i), :len(z[v])]=bv[i].a+1

for v in range(g.num_vertices()*max(g.ep.layer)+1): #to see if the number of vertices multiplied by layers, which it didn't
    print(bv[v])

Many thanks,
Arttu

Q1)
This is taking very long (days) on a high performance cluster and actually I haven't seen it finishing up for once. Is my model specification correct or simply too complex? If the latter, is there a better strategy to perform my analysis? All I can think of is to revert to a non-nested model and/or reduce the number of layers and/or edges:

Very hard to say something, without more concrete information. How large
is your network?

You are using the most complex combination possible, so indeed it should
take longer than the simpler variants.

My overall suggestion is to always try the
minimize_nested_blockmodel_dl() function first, and (time permitting)
use that as a starting point to the MCMC.

Furthermore, I would suggest starting simple, using non-overlapping,
unweighted and non-layered models first, and seeing how far you get, and
building understanding from your data that way. You can then introduce
more elaborate models step by step.

Q2)
Say that the model runs and yields a neat NestedBlockState (or LayeredBlockState) object with X overlapping blocks, 9 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 220 vertices and 11322 edges, 2 internal vertex properties, 4 internal edge properties>.

Now, in previous posts there are mentions about the number of nodes multiplying by the number of layers when 'overlap=True', but in my experiments with only two layers, this does not seem to be the case and I struggle to find a way to extract the layer-specific block and respective memberships of nodes.

I've tried iteration over vertices, but what I eventually need is a 2d_array [y=all nodes in g, x=all layers in g, values=most likely block membership] in order to map how/if the block membership of each node varies across those layers in which they participate (ideally somehow denoting non-participation in the layer with NaN or some unique value). Is there some way to get there?

In the overlapping model, group membership is a property of _edges_
(actually, half-edges), not nodes directly. Take a look at
OverlapBlockState.get_edge_blocks(), which will give you this
information. From this, you should be able to decompose membership
across layers, etc.

Best,
Tiago

Oi!

The graph has 220 nodes and 11322 edges divided into 9 layers, so it is not huge. But this seems like a good way forward - I'll keep on experimenting.

Excellent - get_edge_blocks is the function I was looking for.

Very helpful, thank you so much!

Arttu

It seems really strange that it would take _days_ for such a small network!

There must be something wrong going on.

Try the step-wise approach, maybe it will help you to track it down.

Best,
Tiago

Starting with minimize_nested_blockmodel_dl indeed speeds up the process and now I'm more confident that the model will eventually run:

state=minimize_nested_blockmodel_dl(g, state_args=dict(base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])))

dS, nmoves=0, 0
for i in range(100):
    ret=state.multiflip_mcmc_sweep(niter=10)
    dS+=ret[0]
    nmoves+=ret[1]

print("Change in description length:", dS)
print("Number of accepted vertex moves:", nmoves)

mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10), verbose=True)

bs=[]
def collect_partitions(s):
   global bs
   bs.append(s.get_bs())

mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), verbose=True, callback=collect_partitions)

pm=PartitionModeState(bs, nested=True, converge=True)
pv=pm.get_marginal(g)

bs=pm.get_max_nested()
state=state.copy(bs=bs)

Coming back to my initial question concerning the layer-specific partitions, it seems that I was too optimistic about being able to decompose this information from get_edge_blocks. I have already spent time reading the docs and the use of _be_ in source codes to convert the block membership of each half-edge to layer-specific block membership of each node, so I need to ask for more instructions.

There are a few previous posts about extracting this information but none of the answers addresses this decomposition issue beyond get_edge_blocks. I presume there is a relatively straightforward way to do this?

I also thought that there could have been a more convenient way of doing this by applying set_edge_filter to state.g and get_majority_blocks to extract this information, but yeah, didn't work.

Do what? Please ask a specific question.

Best,
Tiago

I was referring to the decomposition of layer-specific node membership from get_edge_blocks.

So, how do I process the _be_ returned by layers[0].get_edge_blocks() to yield this information as an np.array or vertex_property_map?

Thanks,
Arttu

I'm sorry, but this is not helping. You are not making any substantial
effort to be specific, despite my asking, and it's still not clear what
you want that is not already given.

A half-edge points to a node. Each half-edge belongs to a group, as
given by get_edge_blocks(). Each half-edge also belongs to a specific layer.

What else do you need?

Hi,

I'm sorry, I'll try to be more specific.

I eventually need a matrix that I can plot as done in this post by ulgenklc:
https://nabble.skewed.de/Temporal-community-detection-with-DSBM-tt4028581.html#a4028592

Here, I use a subgraph with only two layers to demonstrate this seemingly trivial issue:

np.random.seed(42)
gt.seed_rng(42)

state = minimize_nested_blockmodel_dl(g, state_args=dict(base_type=LayeredBlockState, state_args=dict(deg_corr=True, overlap=True, layers=True, ec=g.ep.layer, recs=[g.ep.weight], rec_types=["discrete-binomial"])))

This yields:

<NestedBlockState object, with base <LayeredBlockState object with 4610 overlapping blocks, 2 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 82 vertices and 2305 edges, 2 internal vertex properties, 4 internal edge properties, and 8 levels of sizes [(82, 3), (3, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1)] at 0x2b1aa8982dc0>

While there are 4610 overlapping blocks in this stage, state.draw() divides the nodes neatly into three sensible communities. Hence, I know that the snippet of code that I need already exists somewhere in draw_hierarchy.

Then, I filter the edges in state.g by layer and use get_edge_blocks() to extract the layer-specific half-edges:

lay = []
for e in g.edges():
    lay.append(g.ep.layer[e]==0) #edges in the first layer
lay = np.array(lay, dtype="int")
ft = g.new_edge_property("bool")
ft.a = lay
state.g.set_edge_filter(ft)
be = state.levels[0].get_edge_blocks()

This is where I struggle to find a way to extract the desired information. My best guess on how to extract the layer-specific node memberships from _be_ is the following:

s = OverlapBlockState(state.g, b=be) #state.g with filter on
mb = s.get_majority_blocks()
for v in g.vertices():
    print(mb[v])

If I compare the _mb_ for the discrete layers, the memberships vary slightly (this is what I need), but I'm unsure whether this is the correct way to extract this information?

Many thanks,
Arttu

If "be" is the property map returned by
state.levels[0].get_edge_blocks(), then

bm = g.new_vp("vector<int>", val=[0] * state.levels[0].get_B())
for e in g.edges():
     if layer[e] != 0:
         continue
     u, v = e
     u, v = sorted((u, v))
     r, s = be[e]
     bm[u][r] += 1
     bm[v][s] += 1

will give you the memberships in each group at layer 0 in the property
map "bm". Is this what you want?

Best,
Tiago

Yep, this is what I needed - "bm" works fine by handing me the (possibly) overlapping node memberships across layers.

This'll save my week, many thanks!

Best,
Arttu

Hi again,

With regard to the previous posts in this thread, I still have one more question.

My motivation to use the overlapping model with discrete layers (instead of fitting a separate model to each layer) was my understanding that this model will somehow account for dependencies between the layers, and return partitions that (potentially) vary across layers. After reading through https://journals.aps.org/prx/abstract/10.1103/PhysRevX.5.011033, I'm still not sure whether/how my model accounts for such dependencies.

What is the difference between A) fitting a separate model to each layer and B) fitting an overlapping model with layers=True? When interpreting my results, what can I say about the dependencies between layers with discrete types of relationships?

Many thanks,
Arttu

With B) the overlapping partition is encoded in a manner that takes
correlations between the layers into account, so that the group
memberships will vary between layers only if there is enough evidence
requiring this. With A) the independent models do not take advantage of
any similarity between layers.

Excellent! Once again, many thanks for a prompt and clarifying response!

a.

Hi Tiago,

Still working on this same issue here... A quick question - is there any principled way to reconcile between the overlapping partitions (i.e., return majority blocks) stored in the new vertex property map "bm"?

1. I thought about passing bm to BlockState(b = bm) using a layer-specific GraphView and get_majority_blocks(), but the constructor doesn't accept overlapping partitions as b.
2. I also noticed that I can use state.layer_states[0].get_majority_blocks(), but for some reason the number of vertices in each of the 10 overlapping states differs from the ones they should have, thus not sure about how to interpret them.

gt.seed_rng(42); np.random.seed(42)
state = gt.minimize_blockmodel_dl(g, state = gt.LayeredBlockState, state_args = dict(deg_corr = True, overlap = True, layers = True, ec = g.ep.layer, recs = [g.ep.weight], rec_types = ["real-exponential"]))

print(state)
...
<LayeredBlockState object with 9012 overlapping blocks, 10 layers, degree-corrected, with 2 edge covariates, for graph <Graph object, undirected, with 113 vertices and 4506 edges, 1 internal vertex property, 2 internal edge properties, at 0x2abef36cf6d0>, at 0x2abef48360a0>
...

be = state.get_edge_blocks()
bm = g.new_vertex_property("vector<int>", val = [0] * state.get_B())

for e in g.edges():
    if g.ep.layer[e] != 0: #the layer I want the memberships for
        continue
    u, v = e
    u, v = sorted((u, v))
    r, s = be[e]
    bm[u][r] += 1
    bm[v][s] += 1

for v in g.vertices():
    bm[v] = [i for i in bm[v] if i != 0]

for v in g.vertices()
    print(bm[v])
...
array([9], dtype=int32)
array([5], dtype=int32)
array([22, 1], dtype=int32)
array([15], dtype=int32)
array([ 3, 12], dtype=int32)
...

All help is much appreciated.

Best,
Arttu

Oh well, I noticed that passing the state object to mcmc_equilibrate() improves the partitions such that I'm now getting certain group memberships repeatedly, thus allowing me to extract the "majority" membership for each node.

This solves my issue.

a.