C++ extension: retrieve the filtered graph

Hi all,

I'm currently offloading construction of the uni-modal projection of a bipartite graph from Python to C++ (NB: I am not very familiar with C++ or BGL, _yet_).
Thanks to the docs on writing such extensions, I know roughly where I'm heading.

This is the signature of the projection-function:

    template <class Graph, class ProjectedGraph>
    void projection(const Graph &g, ProjectedGraph &pg)

Here, g is the bipartite graph, and pg is constructed in Python as:
    pg = Graph(GraphView(g, vfilt=lambda v: g.vp.type == 0))

I am thus passing in the graph containing only the vertices I wish to project onto, and add the edges between them in C++.

In Python, pg.num_edges() correctly returns zero.
In C++, it returns the same number of edges as the base-graph it was constructed from, g.

How can I obtain the filtered view of pg in C++, which I expect to have no edges?

Kind regards,
Oliver

Hi all,

I'm currently offloading construction of the uni-modal projection of a bipartite graph from Python to C++ (NB: I am not very familiar with C++ or BGL, _yet_).
Thanks to the docs on writing such extensions, I know roughly where I'm heading.

This is the signature of the projection-function:

     template <class Graph, class ProjectedGraph>
     void projection(const Graph &g, ProjectedGraph &pg)

Here, g is the bipartite graph, and pg is constructed in Python as:
     pg = Graph(GraphView(g, vfilt=lambda v: g.vp.type == 0))

This would remove every vertex from the graph, since g.vp.type == 0 will
always return False.

I also do not understand why do you construct a new Graph object,
instead of working with the GraphView directly.

I am thus passing in the graph containing only the vertices I wish to project onto, and add the edges between them in C++.

In Python, pg.num_edges() correctly returns zero.
In C++, it returns the same number of edges as the base-graph it was constructed from, g.

How can I obtain the filtered view of pg in C++, which I expect to have no edges?

Since you have not provided any code, I can only guess what is going on.

A typical pitfall is that the C++ function num_edges(g) will always
return the number of edges in the _unfiltered_ graph, even if the graph
is being filtered. This is due to the BGL semantics that requires the
function to be computed in time O(1). But that does not mean that the
graph is not being filtered... If you iterate over it, you will get what
you expect.

Best,
Tiago

Tiago de Paula Peixoto wrote:

> Hi all,
>
> I'm currently offloading construction of the uni-modal projection of a bipartite
> graph from Python to C++ (NB: I am not very familiar with C++ or BGL, _yet_).
> Thanks to the docs on writing such extensions, I know roughly where I'm heading.
>
> This is the signature of the projection-function:
>
> template <class Graph, class ProjectedGraph>
> void projection(const Graph &g, ProjectedGraph &pg)
>
> Here, g is the bipartite graph, and pg is constructed in Python as:
> pg = Graph(GraphView(g, vfilt=lambda v: g.vp.type == 0))
This would remove every vertex from the graph, since g.vp.type == 0 will
always return False.

This was a typo whilst drafting my message, it is of course `lambda v: g.vp.type[v] == 0`.

I also do not understand why do you construct a new Graph object,
instead of working with the GraphView directly.

To my understanding, modifying the GraphView by adding edges to it would also reflect in the original, bipartite graph, which I don't want. Would an EdgePropertyMap only on the view with a binary "exists/missing" be the cleaner approach, you reckon?

> I am thus passing in the graph containing only the
> vertices I wish to project onto, and add the edges between them in C++.
>
> In Python, pg.num_edges() correctly returns zero.
> In C++, it returns the same number of edges as the base-graph it was constructed from,
> g.
>
> How can I obtain the filtered view of pg in C++, which I expect to have no edges?
Since you have not provided any code, I can only guess what is going on.

A typical pitfall is that the C++ function num_edges(g) will always
return the number of edges in the _unfiltered_ graph, even if the graph
is being filtered. This is due to the BGL semantics that requires the
function to be computed in time O(1). But that does not mean that the
graph is not being filtered... If you iterate over it, you will get what
you expect.

This is exactly it, thanks for clarifying that iteration will be performed on the filtered set.
The result is as expected, however with the full bipartite graph (|V| 32k, |E| 280k), I run out of memory on a 16GB machine.
Slightly hijacking the topic, is there any room for improvement on the projection code:

#include <graph_tool.hh>

namespace gt = graph_tool;

template <class Graph, class ProjectedGraph>
void projection(const Graph &g, ProjectedGraph &pg) {
  // iterate the filtered nodes, type==0
  for (const auto v : gt::vertices_range(pg)) {
    // iterate the neighbourhood of type == 1
    for (const auto direct_neigh : gt::all_neighbors_range(v, g)) {
      // iterate this neighbourhood, again of type == 0 to determine edges between type-0-nodes
      for (const auto next_neigh : gt::all_neighbors_range(direct_neigh, g)) {
        if (v != nn) {
          boost::add_edge(v, next_neigh, pg);
        }
      }
    }
  }
}

This is the direct translation of what networkx.bipartite_projection [1] does.

Best,
Tiago

Thanks a lot,
Oliver

[1] networkx.algorithms.bipartite.projection — NetworkX 3.2.1 documentation

Tiago de Paula Peixoto wrote:

   Hi all,
  
  I'm currently offloading construction of the uni-modal projection of a bipartite
graph from Python to C++ (NB: I am not very familiar with C++ or BGL, _yet_).
  Thanks to the docs on writing such extensions, I know roughly where I'm heading.
  
  This is the signature of the projection-function:
  
       template <class Graph, class ProjectedGraph>
       void projection(const Graph &g, ProjectedGraph &pg)
  
  Here, g is the bipartite graph, and pg is constructed in Python as:
       pg = Graph(GraphView(g, vfilt=lambda v: g.vp.type == 0))

This would remove every vertex from the graph, since g.vp.type == 0 will
always return False.

This was a typo whilst drafting my message, it is of course `lambda v: g.vp.type[v] == 0`.

I also do not understand why do you construct a new Graph object,
instead of working with the GraphView directly.

To my understanding, modifying the GraphView by adding edges to it would also reflect in the original, bipartite graph, which I don't want. Would an EdgePropertyMap only on the view with a binary "exists/missing" be the cleaner approach, you reckon?

If you don't want to modify in place, then you should remove the
filtering altogether by passing "prune=True" to the Graph() constructor.
This will make the code faster.

Slightly hijacking the topic, is there any room for improvement on the projection code:

#include <graph_tool.hh>

namespace gt = graph_tool;

template <class Graph, class ProjectedGraph>
void projection(const Graph &g, ProjectedGraph &pg) {
   // iterate the filtered nodes, type==0
   for (const auto v : gt::vertices_range(pg)) {
     // iterate the neighbourhood of type == 1
     for (const auto direct_neigh : gt::all_neighbors_range(v, g)) {
       // iterate this neighbourhood, again of type == 0 to determine edges between type-0-nodes
       for (const auto next_neigh : gt::all_neighbors_range(direct_neigh, g)) {
         if (v != nn) {
           boost::add_edge(v, next_neigh, pg);
         }
       }
     }
   }
}

This looks like it adds every edge twice...

Thanks for the valuable advice, Tiago.

Adding a lookup before inserting edges indeed brought down the memory consumption to a manageable load.
I will look into pruning the new graph, right now the code relies too much on the original, pre-filtered vertex indices, which get reindexed upon pruning. Shouldn't be too hard to de-couple that logic, though.

Obrigado!

Oliver