`local_clustering` versus `extended_clustering` on directed graphs

On graph-tool version 2.98extended_clustering of degree 1 does not match local_clustering for a directed graph even when specifying undirected=True. Is this expected or a bug?

import graph_tool.all as gt
import numpy as np

graph = gt.collection.data["football"]  # Undirected graph
lc = gt.local_clustering(graph, undirected=True)
ec = gt.extended_clustering(graph, undirected=True)
np.allclose(lc.a, ec[0].a)  # This is True

graph = gt.collection.data["celegansneural"]  # Directed graph
lc = gt.local_clustering(graph, undirected=True)
ec = gt.extended_clustering(graph, undirected=True)
np.allclose(lc.a, ec[0].a)  # This is False

Hi Leo. I can take a look at this in the next few days if @tiago doesn’t beat me to it. I wonder if it could have to do with how multiple edges get handled in either case.

I actually did one other quick check and can add that for the ‘celegans’ network the local clustering doesn’t change whether we pick undirected to be True or False.

My “quick check” was in fact too quick, I just didn’t notice that local_clustering defaults undirected=True, so setting undirected=False does yield different results.

I believe the difference you’re seeing is mostly related to how multigraphs are handled by the local_clustering algorithm.

The implementation corresponds to what is described for weighted graphs in the documentation, and if no weights are passed it uses unitary weights. As this formulation considers the graph to be a real valued adjacency matrix, the case of multiple edges is not well defined.

That said, it is true that the current behavior is not very intuitive, such that a triangle with multiple edges yields a coefficient lower than unity:

In [21]: import graph_tool.all as gt
    ...: g = gt.Graph(directed=False)
    ...: g.add_edge_list(
    ...:     [
    ...:         (0, 1),
    ...:         (0, 1),
    ...:         (1, 2),
    ...:         (1, 2),
    ...:         (2, 0),
    ...:         (2, 0),
    ...:     ]
    ...: )
    ...: lcc = gt.local_clustering(g)
    ...: print(lcc.a)
[0.66666667 0.66666667 0.66666667]

I guess the function could either emit a warning or raise an error if passed a multigraph, or project it onto a simple graph before calculating the coefficient. One would perhaps prefer to coalesce the multiple edges as a weight, but the formulation excludes weights outside [0, 1] so there would be a choice of normalization involved (of which a simple projection would be a special case).

But maybe @tiago has a better idea. I could find some more time to help out, so let me know if it’s worth creating an issue or later a PR. Cheers!