I am just wondering if it is possible to define a block model where some of the nodes in your graph are explicitly not assigned to a block?
In more detail, I have a network where some, but not all, nodes can already be assigned to a block. Within the whole network, I have the optimisation problem of assigning as many nodes as possible to blocks, whilst I know that some will not fit any particular block (without knowing which). Is there a way to have say a ‘noise’ block of unassigned nodes, or nodes below some threshold of p which are excluded from the partitioning?
Further, is it possible to then start with the initial partitioned graph and define the block state based on “known” blocks and refine it to find an optimal partition - by both allowing for movement of assigned nodes between blocks, and assigning unallocated nodes to a block or moving them to the ‘unallocated’ block.
Is this currently approachable?
I’m happy to provide any more information as needed as well! This is a very general description.
The notion of nodes that are not assigned to any group does not exist in an SBM. If you want some nodes (and their incident edges) not to affect the clustering of the graph, you should remove (or filter) them from the graph.
However, if you want to force some nodes to belong to a given group, this is a different matter.
You can do that by specifying the bfield parameter to BlockState:
bfield : VertexPropertyMap (optional, default: ``None``)
Local field acting as a prior for the node partition. This should be a
vector property map of type ``vector<double>``, and contain the
log-probability for each node to be placed in each group.
So, for example, if you set bfield[v] = [-inf, -inf, 0, -inf], then vertex v will be fixed to group label 2.
Note that the values of group labels that exceed the size of this vector will take the values of the last item. So the above means that group label 100 will have a log-probability of -inf.
So for the first problem, as I don’t know before hands which nodes may need to be remove, would it be reasonable to use the marginals for each node to reason that nodes which often change between clusters are more likely to be ‘noise’?
Sorry, I used ‘noise’ somewhat liberally. As an example I would mean a graph built of two communities joint by a single vertex where all edges between the two communities pass through this one vertex. Lets assume the two communities are identical and have equal edges passing through the joining vertex. To my understanding, an SBM would arbitrarily assign the joining vertex to one of the communities ~50% of the time. It is these kinds of vertices I am looking to identify, at least in the case where you fix the number of blocks to 2 before hand.
Another analogous concept is that of noise points found from say, the DBSCAN clustering algorithm or it’s equivalents.
In case of nodes with “ambiguous” group membership (I would not call this “noise”), two things can happen:
As you say, the node will have a 50% membership to each group in the posterior distribution.
The node will be put in a group of its own.
If the degree of the node is large enough, option 2 will be the most parsimonious, unless you constrain the number of groups, in which case option 1 will be the only possible outcome.