Potential bug in the clustering.motif_significance function


I am analyzing the motifs in a network with self-loops. I ran the clustering.motifs function, which identifies the motifs and returns the count for each one. Then I looked at the clustering.motif_significance function, whose full output also includes the motifs and counts (along with the z-scores, sample counts and sample standard deviations).

However, the two functions return motif arrays of different lengths, even though the documentation says that they produce the same motif output. Additionally, the counts array returned by clustering.motif_significance ends with several 0 values. I think those values correspond to motifs at other positions in the motif array (and a motif that occurs 0 times should probably not be listed at all).

There are also nan values in the zscores array, potentially caused by the self-loops.
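
One way to narrow down the nan values is to check whether they line up with a sample standard deviation of zero. The sketch below assumes the z-score is the usual (count - sample mean) / sample sd, so that a 0/0 gives nan; I have not verified that this is what happens inside graph-tool. The helper names and the numbers are made up for illustration.

```python
import math

def zscore(count, sample_mean, sample_sd):
    """Return (count - sample_mean) / sample_sd with IEEE 754 float semantics."""
    diff = count - sample_mean
    if sample_sd == 0:
        # 0/0 is nan; a nonzero difference over zero spread is +/- infinity.
        return math.nan if diff == 0 else math.copysign(math.inf, diff)
    return diff / sample_sd

def zero_spread_indices(s_dev):
    """Indices whose sample standard deviation is exactly zero."""
    return [i for i, sd in enumerate(s_dev) if sd == 0]

# Made-up numbers for illustration only, not output from graph-tool.
counts = [12, 0, 3]
s_counts = [10.0, 0.0, 3.5]
s_dev = [2.0, 0.0, 0.5]

zs = [zscore(c, m, sd) for c, m, sd in zip(counts, s_counts, s_dev)]
print(zs)
print(zero_spread_indices(s_dev))
```

If the nan positions in the real zscores array match the zero entries of s_dev, the nan values would simply reflect motifs that never vary across the shuffled samples. If they do not match, something else is going on.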

In the example below, I added a short loop that checks the motifs returned by the two functions for isomorphism. For some isomorphic pairs, the indices do not coincide.

  1. Your exact graph-tool version: 2.58
  2. Your operating system: macOS
  3. A minimal working example that shows the problem:
from graph_tool import all as gt

g = gt.random_graph(100, lambda: (5,5), self_loops=True)
motifs_1, counts_1 = gt.motifs(g, 3)
motifs_2, zscores, counts_2, s_counts, s_dev = gt.motif_significance(
    g, 3, self_loops=True, full_output=True)

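# Print the lengths of the two motif arrays and the counts
print(f"Motif_1 array length: {len(motifs_1)}")
print(f"Motif_2 array length: {len(motifs_2)}")
print(f"Counts_1: {counts_1}")
print(f"Counts_2: {counts_2}")

# Print the z-scores (some entries are nan)
print(f"Z-scores: {zscores}")

# The graph at index 18 differs between the two motif arrays, but the count is the same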

gt.graph_draw(motifs_1[18],  vertex_font_size=12,  edge_pen_width=1.5,
               output_size=(1000, 1000),  vertex_color="black",
               edge_font_size=10, edge_text_color="red")

gt.graph_draw(motifs_2[18],  vertex_font_size=12,  edge_pen_width=1.5,
               output_size=(1000, 1000),  vertex_color="black",
               edge_font_size=10, edge_text_color="red")

# Initialize a list to store isomorphic pairs
isomorphic_pairs = []

# Iterate through the graphs in motifs_1
for index, graph in enumerate(motifs_1):
    # Iterate through the graphs in motifs_2
    for s_index, s_graph in enumerate(motifs_2):
        # Record the pair if the two motifs are isomorphic
        if gt.isomorphism(graph, s_graph):
            isomorphic_pairs.append((index, s_index))

# Print the isomorphic pairs
if isomorphic_pairs:
    for motif_index, s_index in isomorphic_pairs:
        print(f"motif_1 index: {motif_index}, motif_2 index: {s_index}")
else:
    print("No isomorphic pairs found.")
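
The pairwise check above prints every match, which makes it hard to spot the problematic entries. A more compact report could be sketched as follows. The helper names are hypothetical, and the stand-in data uses strings instead of graphs so that the snippet runs without graph-tool.

```python
def align_by_predicate(items_a, items_b, same):
    """For each index in items_a, list the indices in items_b that `same` accepts."""
    return {i: [j for j, b in enumerate(items_b) if same(a, b)]
            for i, a in enumerate(items_a)}

def report_mismatches(matches):
    """Split indices into unmatched, ambiguous (several matches) and shifted ones."""
    unmatched = [i for i, js in matches.items() if not js]
    ambiguous = [i for i, js in matches.items() if len(js) > 1]
    shifted = [i for i, js in matches.items() if len(js) == 1 and js[0] != i]
    return unmatched, ambiguous, shifted

# Stand-in data: strings compared by their sorted letters instead of graphs.
a = ["ab", "cd", "ef", "gh"]
b = ["ba", "fe", "dc", "dc"]
matches = align_by_predicate(a, b, lambda x, y: sorted(x) == sorted(y))
print(report_mismatches(matches))
```

With graph-tool, the same helpers would be called as align_by_predicate(motifs_1, motifs_2, gt.isomorphism). Entries in the "ambiguous" list would be motifs that match more than one graph in the other array, and entries in the "shifted" list would be motifs whose single match sits at a different index.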


Thank you for the bug report. Please open an issue at https://graph-tool.skewed.de/issues so this can be tracked properly.

Thank you! I opened an issue on GitLab.

Posting the issue here for reference: Potential bug in the clustering.motif_significance function (#772) · Issues · Tiago Peixoto / graph-tool · GitLab


I also found something else related to self-loops and isomorphism. While working through some of the motifs to figure out why they appear multiple times, I realized that gt.isomorphism reports different graphs with self-loops as isomorphic. Here is an example of two such graphs:

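from graph_tool import all as gt

# First graph: path 0 -> 1 -> 2 with a self-loop on every vertex
g1 = gt.Graph(directed=True)
g1.add_edge(0, 1)
g1.add_edge(1, 2)
g1.add_edge(0, 0)
g1.add_edge(2, 2)
g1.add_edge(1, 1)

# Second graph: 0 and 2 are connected in both directions, one self-loop on vertex 1
g2 = gt.Graph(directed=True)
g2.add_edge(0, 1)
g2.add_edge(1, 2)
g2.add_edge(2, 0)
g2.add_edge(0, 2)
g2.add_edge(1, 1)

# Reported as isomorphic in graph-tool 2.58, although the graphs differ
print(gt.isomorphism(g1, g2))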


In the first graph, nodes 0 and 2 are not connected, yet it is still reported as isomorphic to the second graph, where 0 and 2 are connected. Maybe that is part of the reason why some motifs appear multiple times?
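
As a sanity check that does not depend on graph-tool, the two edge lists can be compared with a brute-force search over all vertex relabellings that keeps self-loops. This is only a sketch for tiny graphs, with hypothetical helper names; it is not how graph-tool implements isomorphism.

```python
from collections import Counter
from itertools import permutations

def degree_pairs(n, edges):
    """Sorted (out-degree, in-degree) pairs; a self-loop counts once in each."""
    out_deg, in_deg = [0] * n, [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return sorted(zip(out_deg, in_deg))

def brute_force_isomorphic(n, edges_a, edges_b):
    """Try every vertex relabelling; self-loops and parallel edges are kept."""
    target = Counter(edges_b)
    return any(Counter((p[u], p[v]) for u, v in edges_a) == target
               for p in permutations(range(n)))

# Same edges as g1 and g2 above, written as plain (source, target) pairs.
edges_g1 = [(0, 1), (1, 2), (0, 0), (2, 2), (1, 1)]
edges_g2 = [(0, 1), (1, 2), (2, 0), (0, 2), (1, 1)]

print(degree_pairs(3, edges_g1) == degree_pairs(3, edges_g2))
print(brute_force_isomorphic(3, edges_g1, edges_g2))
```

The first line prints True and the second prints False. The two graphs have identical in- and out-degree sequences when each self-loop counts once in each direction, but no relabelling maps one onto the other, since the first graph has three self-loops and the second has one.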


Yes, this could be related. It looks like self-loops are being ignored. It should be an easy fix. I’ll look into it.