Clustering coefficient with parallel edges

Hi Tiago,

I was recently comparing the behavior of the clustering coefficient in
different graph libraries and I was surprised by the results given by
graph-tool with parallel edges:

    import graph_tool as gt
    from graph_tool.clustering import local_clustering
    from graph_tool.stats import remove_parallel_edges

    num_nodes = 5
    edge_list = [(0, 3), (3, 0), (1, 0), (0, 1), (1, 2), (2, 1), (2, 4),
    (4, 2), (4, 1), (1, 4), (4, 3), (3, 4)]

    g = gt.Graph()

    g.add_vertex(num_nodes)

    g.add_edge_list(edge_list)

    print(local_clustering(g, undirected=False).a)
    print(local_clustering(g, undirected=True).a)

    g.set_directed(False)
    remove_parallel_edges(g)
    print(local_clustering(g, undirected=True).a)

Gives:

    [0. 0.33333333 1. 0. 0.33333333]
    [0. 0.26666667 0.66666667 0. 0.26666667]
    [0. 0.33333333 1. 0. 0.33333333]

And I must say I was expecting the same result for all three ^^

The graph:

From what I understand, then, it seems that, when using
undirected=True, graph-tool considers reciprocal edges as two different
ones and considers the possible triangles between parallel edges as
distinct.

Is this desired behavior, and if so, why?
I know it stems from the mathematical formula, but I think this one was
derived from simple graphs and I'm not sure I see the physical meaning
of the "extension" to parallel edges, besides the fact that it would
perhaps give the same result as weighting the edges by their number.

If this is indeed desired, is there a quick way to do the equivalent of
setting the graph to undirected and removing the parallel edges in
"view-mode" (i.e. making the removal temporary, like a filter)?

Thanks in advance for your help and for your work on graph-tool.

Best,
Tanguy

attachment.html (2.88 KB)

jhjjilnoicknoeba.png

From what I understand, then, it seems that, when using undirected=True,
graph-tool considers reciprocal edges as two different ones and
considers the possible triangles between parallel edges as distinct.

Is this desired behavior, and if so, why?> I know it stems from the mathematical formula, but I think this one was
derived from simple graphs and I'm not sure I see the physical meaning
of the "extension" to parallel edges, besides the fact that it would
perhaps give the same result as weighting the edges by their number.

Yes, it's desired behavior.

Ignoring parallel edges would have complicated and slowed down the code,
and in any case can be easily implemented by the user (see below), who
should have something specific in mind when attempting to compute the
value for multigraphs.

If this is indeed desired, is there a quick way to do the equivalent of
setting the graph to undirected and removing the parallel edges in
"view-mode" (i.e. making the removal temporary, like a filter)?

Yes, just use:

  u = GraphView(g, directed=False)
  GraphView(u, efilt=label_parallel_edges(u).fa == 0)

Best,
Tiago

Hi Tiago,

thanks for the quick and precise answer!

Yes, it's desired behavior.

Ignoring parallel edges would have complicated and slowed down the code,
and in any case can be easily implemented by the user (see below), who
should have something specific in mind when attempting to compute the
value for multigraphs.

OK, do you think it would be relevant to include it in the documentation?
It was not very obvious to me...

If this is indeed desired, is there a quick way to do the equivalent of
setting the graph to undirected and removing the parallel edges in
"view-mode" (i.e. making the removal temporary, like a filter)?

Yes, just use:

   u = GraphView(g, directed=False)
   GraphView(u, efilt=label_parallel_edges(u).fa == 0)

Works like a charm, thank you!

Best,
Tanguy

But doesn't the function compute exactly what it says in the documentation?

But doesn't the function compute exactly what it says in the documentation?

It does, but one needs to know what the undirected=True implies for the
graph, otherwise the result is surprising (or just not what one's think
one is computing, if not working on a super simple test graph where it
is easy to check).

I think some people might end up not noticing and therefore would not
compute what they think they are computing...

Best,
Tanguy

But doesn't the function compute exactly what it says in the
documentation?

It does, but one needs to know what the undirected=True implies for the
graph, otherwise the result is surprising (or just not what one's think
one is computing, if not working on a super simple test graph where it
is easy to check).

It is consistent with how directionality is implement in graph-tool,
which preserves the total number of edges. Furthermore, the concept of
multigraph vs simple graph is orthogonal to directed vs undirected graphs.

So this is not really about the clustering coefficient computation.

I think some people might end up not noticing and therefore would not
compute what they think they are computing...

I don't like second-guessing misinterpretations like this. What is
important is to be consistent.

The surprise came from not understanding how the undirected/directed
filter works. This kind of surprise will appear in almost every function
of this kind, and I don't think that putting a warning everywhere is the
right approach.

Besides, I think the way the undirected filter works is perfectly
intuitive. What you seemed to be expecting was for a simple directed
graph to become a simple undirected graph. But in this case how would
property maps be handled? Suppose the incoming edge had a weight value
and the outgoing edge had another, which one should be kept in the
simple graph? How would a directed multigraph be handled when converted
to undirected? Should it magically become a simple graph? Wouldn't that
be a lot more surprising?

In graph-tool all graphs are multigraphs per default, which makes
decisions such as these easier.

Best,
Tiago

Hi Tiago,

It is consistent with how directionality is implement in graph-tool,
which preserves the total number of edges. Furthermore, the concept of
multigraph vs simple graph is orthogonal to directed vs undirected graphs.

So this is not really about the clustering coefficient computation.

I understand.

Besides, I think the way the undirected filter works is perfectly
intuitive. What you seemed to be expecting was for a simple directed
graph to become a simple undirected graph. But in this case how would
property maps be handled? Suppose the incoming edge had a weight value
and the outgoing edge had another, which one should be kept in the
simple graph? How would a directed multigraph be handled when converted
to undirected? Should it magically become a simple graph? Wouldn't that
be a lot more surprising?

Yes, I was thinking that way because there were no attributes in the
current graph, but you are right, of course.
For instance igraph provides a keyword for edge coalescence to sum/take
the max/other of the edge attributes; is there a similar way of doing
this that is already available in graph-tool?

Best,
Tanguy

Yes, take a look at the condensation_graph() function.

Ah, I did not think I could use it that way!

I'll try it out, thank you.

Best,
Tanguy

PS: (sorry for the spam, I hit reply again)

Hello Tiago and Tanguy,

I'm following your discussion with great interest as I have a similar
task. I want to convert a multigraph into a simple graph. This works
very nicely with

GraphView(u, efilt=label_parallel_edges(u).fa == 0)

However, removing the parallel edges this way means that the edge
properties of the parallel edges are lost. I have an edge property
called "eqntag" which is a string and I want to concatenate all eqntags
from the parallel edges into the simple graph. I think this is what
Tanguy meant by "edge coalescence". I did not understand how to use
condensation_graph() for this purpose, so I've written my own function:

def remove_parallel_edges_merge_eqntags(g):
    # create simple graph g2:
    g2 = gt.GraphView(g, efilt=gt.label_parallel_edges(g).fa == 0)
    # loop over the remaining edges in the simple graph:
    for e in g2.edges():
        # create a list with all edges from g that connect the same
        # vertices as the current edge in g2:
        edgelist = g.edge(e.source(), e.target(), all_edges=True)
        # replace eqntag by semicolon-separated merged eqntag:
        g2.ep.eqntag[e] = ';'.join([g.ep.eqntag[edge] for edge in edgelist])
    return g2

Best regards
Rolf