Issue with SBM Method for Overlapping Community Detection in Graphs

Hi here,

I hope this email finds you well?

I am currently working on a Python script to detect overlapping communities using the graph-tool library. I copied the first lines of my script from the graph-tool documentation (“minimize_blockmodel_dl — graph-tool 2.68 documentation”).

My purpose is to obtain my overlapping communities in the form of a list of sets, where each community is represented as a set. However, I am encountering an issue where the detected communities do not match the plotted ones. Specifically, the plot shows two communities, but the variable communities in my final output only shows one community, indicating no community.

I believe the problem lies in how I am extracting the communities. I would greatly appreciate your assistance on how to resolve this issue.

Thank you!

The code is below:

import numpy as np
import graph_tool.all as gt
import networkx as nx
from cdlib import NodeClustering, evaluation

# Set seeds for reproducibility
np.random.seed(42)
gt.seed_rng(42)

# Load the "football" graph
g = gt.collection.data["polbooks"]
print(g)

# Detect communities using the degree-corrected stochastic blockmodel
state = gt.minimize_blockmodel_dl(g, state=gt.OverlapBlockState)

# Print the state blocks
print(state.get_blocks())

# Draw the graph with the detected communities
state.draw()

# Generate layout for visualization
pos = gt.sfdp_layout(g)

# Ensure we use the correct vertex property map
vertex_blocks = state.get_blocks()

# Extract communities in the form of a list of sets
num_communities = vertex_blocks.a.max() + 1
communities = [set() for _ in range(num_communities)]

for v in g.vertices():
block_id = vertex_blocks[v]
communities[block_id].add(int(v))

# Filter out empty communities
non_empty_communities = [community for community in communities if community]
1 Like

Hello

Specifically, the OverlapBlockState returns a membership vector where each node can belong to multiple communities, unlike the non-overlapping version. This means you need to handle multiple memberships correctly when extracting the communities.

Below is modified version of your code that should properly extract overlapping communities:

import numpy as np
import graph_tool.all as gt

# Set seeds for reproducibility
np.random.seed(42)
gt.seed_rng(42)

# Load the "polbooks" graph
g = gt.collection.data["polbooks"]

# Detect communities using the degree-corrected stochastic blockmodel with overlap
state = gt.minimize_blockmodel_dl(g, state_args=dict(overlap=True))

# Draw the graph with the detected communities
state.draw()

# Extract communities in the form of a list of sets
vertex_blocks = state.get_blocks()
num_blocks = vertex_blocks.max() + 1
communities = [set() for _ in range(num_blocks)]

for v in g.vertices():
    memberships = vertex_blocks[v]
    for block_id in memberships:
        communities[block_id].add(int(v))

# Filter out empty communities
non_empty_communities = [community for community in communities if community]

# Print the non-empty communities
print("Detected overlapping communities:")
for i, community in enumerate(non_empty_communities):
    print(f"Community {i + 1}: {community}")

This version properly handles overlapping community detection and outputs the communities in the correct format.

Thank you
nolanmaris

Good Morning @nolanmaris,

Thank you very much for the solution provided.

However, it is still not working. When I run the code, I get the error “AttributeError: ‘VertexPropertyMap’ object has no attribute ‘max’”. Moreover, even when I set a random value for num_blocks, I always get some errors. Could you please run the code yourself to see what I mean?

Thank you.

Good evening @nolanmaris ,

Thank you once more for the provided code.

Using what you have given before, I tried to solve some issues to make the code run.

In the end, I get the following code that runs. However, to be honest, I am not yet totally convinced. On all the graphs to which I applied the algorithm, including the overlapping LFR benchmarks (LFR graph with overlapping ground truth), I obtain non-overlapping communities. Maybe Professor @tiago can provide some insight into this. I think that among all the graphs I have used to test my source code, there should be at least one that detects overlapping communities.

Because I am in the process of writing a paper urgently, I would like to have the results of the SBM in the overlapping case. I am concerned that I might not have handled the overlapping properly in my code, which means I might be comparing a non-overlapping method (that I think is overlapping) to other overlapping algorithms.

You can find the code below.

import numpy as np
import graph_tool.all as gt

# Set seeds for reproducibility
np.random.seed(42)
gt.seed_rng(42)

# Load the “lesmis” graph
g = gt.collection.data["polbooks"]

# Detect communities using the degree-corrected stochastic blockmodel with overlap
state = gt.minimize_blockmodel_dl(g, state=gt.OverlapBlockState)

# Draw the graph with the detected communities
state.draw()

# Extract overlapping communities
bv, bc_in, bc_out, bc_total = state.get_overlap_blocks()

# Initialize a dictionary to hold communities
communities = {}

# Initialize a set to hold overlapping nodes
overlapping_nodes = set()

# Assign vertices to their respective overlapping communities
for vertex in g.vertices():
    memberships = bv[vertex]  # This gives the block memberships of the vertex
    if len(memberships) > 1:
        overlapping_nodes.add(int(vertex))
    for membership in memberships:
        if membership not in communities:
            communities[membership] = set()
        communities[membership].add(int(vertex))

# Print the overlapping nodes
print("Overlapping nodes:", overlapping_nodes)

# Print the communities with overlaps
for community, members in communities.items():
    print(f"Community {community}: {members}")

Sincerely,
Martin WAFFO KEMGNE