Temporal community detection with DSBM

ulgenklc · February 26, 2021, 7:00pm

This is more of like an implementation question.

I'm PhD student in math and I've been trying out different dynamic community
detection(DCD) algorithms to compare their performance. I was stuck on
graph-tool's implementation for a few months because the documentation for
the layered networks was really short and I noticed this mailing list just
now.

I have multiple questions: first, am I initiating the network correctly
below? My network is a list of weighted adjacency matrices corresponding to
each snapshots.

g = Graph()
e_weight = g.new_ep("double")
e_layer = g.new_ep("int")
g.add_edge_list(edge_list, hashed = True, eprops=[e_weight, e_layer]) ##
edge_list has (v1,v2,w,t) format
g.edge_properties["edge_weight"] = e_weight ## weights of the intralayer
edges
g.edge_properties["edge_layer"] = e_layer ## since this is a multilayer
network, each edge has to come with a layer information that it belongs to(a
layer is a snapshot of the temporal network and has nothing to do with
nested model which I will use the word 'level' for)

Then, when I run the optimizer with the below code, I feel like I'm doing
something wrong. I want to use 'independent layers' model since this is a
temporal network and edge weights within each layer are varying between
[0,1].

for deg_corr in [True, False]:
    state = minimize_nested_blockmodel_dl(g, layers = True, deg_corr =
deg_corr,
                                          state_args=dict(ec =
g.ep.edge_layer,
                                                          layers = True,

recs=[g.ep.edge_weight],

rec_types=["real-exponential"]))

S1 = state.entropy()

# we will pad the hierarchy with another four empty levels, to
# give it room to potentially increase

state = state.copy(bs=state.get_bs() + [np.zeros(1)] * 4, sampling =
True)

    for i in range(100): # this should be sufficiently large
        state.multiflip_mcmc_sweep(beta=np.inf, niter=10)
    S2 = state.entropy()

    print(S1,S2, S2-S1)

I have a serious problem with getting the node labels. The code below only
returns the node membership information for N nodes(size of the aggregated
network). But since this is an evolving network, nodes are expected to
change communities over time, so below code should return NxT(number of
nodes times number of layers) many community labels?

levels = state.get_levels()
for s in levels:
print(s)

This returns the network partition at different levels of the nested SBM but
only for N many nodes.

What am I missing?

tiago · March 1, 2021, 9:04am

You are missing the fact that the layered model only has a single
partition shared between all the layers.

If you want a model that gives a different partition for every layer you
can either:

1. Split each layer into a separate graph, and fit a different SBM
for each layer.

2. Fit an overlapping SBM (i.e. by passing overlap=True to
LayeredBLockState), which will cluster the "half-edges", and hence give
partitions that (potentially) vary between the layers.

Best,
Tiago

ulgenklc · March 1, 2021, 4:19pm

Ahh, yes I noticed that when I scanned through the previous posts, sorry for
replication. I have three more quick questions:

1) If I fitted a different SBM to splitted the graph, wouldn't the
communities in individual layers be temporally discrete? That's not quite
what I want because I want to track an evolution, but I'm open to process
that result in a temporally overlapping fashion(some sort of set matching
algorithms between layers.)

2) When I pass overlap =True to LayeredBlockState, I have NxT many nodes as
a result like you said, that's all fine. However, now I can't get the node
membership using .get_blocks().

levels = states[0].get_levels()

levels[0].get_blocks() is a list of length M(total number of edges in the
network) which should be a list of length NxT, isn't it? Again, it feels
like I'm missing something very trivial here...

3) Also, is there a method for levels[0] that I can call to get the total
number of communities up front? I can see the number of blocks when I do
states[0].print_summary() but I need the integer value of this number for
preprocessing..

Thank you!!

tiago · March 1, 2021, 6:57pm

Ahh, yes I noticed that when I scanned through the previous posts, sorry for
replication. I have three more quick questions:

1) If I fitted a different SBM to splitted the graph, wouldn't the
communities in individual layers be temporally discrete? That's not quite
what I want because I want to track an evolution, but I'm open to process
that result in a temporally overlapping fashion(some sort of set matching
algorithms between layers.)

I'm not sure exactly what you mean, but if you want to "align" the
different partitions you can take a look at

There is an example of this alignment being done here:

2) When I pass overlap =True to LayeredBlockState, I have NxT many nodes as
a result like you said, that's all fine. However, now I can't get the node
membership using .get_blocks().

levels = states[0].get_levels()

levels[0].get_blocks() is a list of length M(total number of edges in the
network) which should be a list of length NxT, isn't it? Again, it feels
like I'm missing something very trivial here...

The overlapping block model consists of a labeling of the half-edges of
the graph. Since each half-edge can belong to only one layer, you can
then tell how the membership of the respective node has changed in the
time slice. Please take a look at the documentation on how to extract
this information, e.g. via

3) Also, is there a method for levels[0] that I can call to get the total
number of communities up front? I can see the number of blocks when I do
states[0].print_summary() but I need the integer value of this number for
preprocessing..

It's worthwhile to peruse the documentation, which contains all the
answers to questions such as this:

ulgenklc · March 17, 2021, 3:14pm

Thank you very much!

I experimented a little bit with overlap = True in the LayeredBlockState and
I don't know if this is a common issue or it is just me but 1) solver takes
way longer to equilibrate compared to BlockState(and networks are not very
large, around N=100 with 4 layers on average, which I also threshold the
weighted network edges) and 2) more interestingly, partitions in the layers
that are towards the end of the temporal network are found perfectly whereas
the communities I want to find in the first a few layers are partitioned
into almost singleton communities. This is what I mean:

<https://nabble.skewed.de/file/t496292/Screen_Shot_2021-03-17_at_11.png>

I wasn't sure if this is because of the low number of iterations which I
kept 100<niter<1000 (I guess reasonable given the network size and number of
edges) but I kept getting similar results over different trials. One other
thing I noticed is that DSBM fits non-communities(the upper left stairs in
the ground truth) with %100 accuracy.

I think I'm going to try fitting a different sbm to each layer as next step
and share how it goes.

Sorry for keep bugging you!!

Best