Block models with 'unasigned' verticies

Nik_Drummond · April 9, 2024, 12:37pm

Hi,

I am just wondering if it is possible to define a block model where some of the nodes in your graph are explicitly not assigned to a block?

In more detail, I have a network where some, but not all, nodes can already be assigned to a block. Within the whole network, I have the optimisation problem of assigning as many nodes as possible to blocks, whilst I know that some will not fit any particular block (without knowing which). Is there a way to have say a ‘noise’ block of unassigned nodes, or nodes below some threshold of p which are excluded from the partitioning?

Further, is it possible to then start with the initial partitioned graph and define the block state based on “known” blocks and refine it to find an optimal partition - by both allowing for movement of assigned nodes between blocks, and assigning unallocated nodes to a block or moving them to the ‘unallocated’ block.

Is this currently approachable?

I’m happy to provide any more information as needed as well! This is a very general description.

Thank you,

Nik

tiago · April 10, 2024, 2:56pm

The notion of nodes that are not assigned to any group does not exist in an SBM. If you want some nodes (and their incident edges) not to affect the clustering of the graph, you should remove (or filter) them from the graph.

However, if you want to force some nodes to belong to a given group, this is a different matter.

You can do that by specifying the bfield parameter to BlockState:

 bfield :  VertexPropertyMap (optional, default: ``None``)
      Local field acting as a prior for the node partition. This should be a
      vector property map of type ``vector<double>``, and contain the
      log-probability for each node to be placed in each group.

So, for example, if you set bfield[v] = [-inf, -inf, 0, -inf], then vertex v will be fixed to group label 2.

Note that the values of group labels that exceed the size of this vector will take the values of the last item. So the above means that group label 100 will have a log-probability of -inf.

Nik_Drummond · April 12, 2024, 9:39am

Ah great thanks for this.

So for the first problem, as I don’t know before hands which nodes may need to be remove, would it be reasonable to use the marginals for each node to reason that nodes which often change between clusters are more likely to be ‘noise’?

tiago · April 13, 2024, 7:09pm

I don’t really understand at all what you mean by “noise” in this context.

Nik_Drummond · April 15, 2024, 9:09am

Hi Tiago,

Sorry, I used ‘noise’ somewhat liberally. As an example I would mean a graph built of two communities joint by a single vertex where all edges between the two communities pass through this one vertex. Lets assume the two communities are identical and have equal edges passing through the joining vertex. To my understanding, an SBM would arbitrarily assign the joining vertex to one of the communities ~50% of the time. It is these kinds of vertices I am looking to identify, at least in the case where you fix the number of blocks to 2 before hand.

Another analogous concept is that of noise points found from say, the DBSCAN clustering algorithm or it’s equivalents.

tiago · April 16, 2024, 8:17am

In case of nodes with “ambiguous” group membership (I would not call this “noise”), two things can happen:

As you say, the node will have a 50% membership to each group in the posterior distribution.
The node will be put in a group of its own.

If the degree of the node is large enough, option 2 will be the most parsimonious, unless you constrain the number of groups, in which case option 1 will be the only possible outcome.