We are using the stochastic block model in a research project and trying to formulate how we would best utilise the community structure results downstream, and would welcome any suggestions.
We fit a nested stochastic block model with an edge weight, which gives us a hierarchical partition. From this, we want to report not just the underlying community structure, but also some form of corresponding weight for each block: how 'important' a given block is with respect to the edge weight, say.
Considering an example on the weighted foodweb network:
import graph_tool.all as gt
import numpy as np
import matplotlib

g = gt.collection.ns["foodweb_baywet"]
# Fit a nested SBM, passing the edge weight as a real-valued edge covariate
state = gt.minimize_nested_blockmodel_dl(
    g, state_args=dict(recs=[g.ep.weight], rec_types=["real-exponential"]))
This yields a hierarchical community structure, but how would you most suitably determine which communities are 'most' or 'least' important/influential/correlated with respect to the edge weight?
I have considered whether this might be done with centrality metrics on the blocks (or perhaps vcount and ecount data from a condensation graph on the hierarchical blocks), but was keen to see if you had a more innovative idea...
It is impossible to answer this kind of question without a very
specific context and objective in mind. One of the biggest sins in
network science is the proliferation of centrality metrics that attempt
to define which node is "best" or "most important" as if there were a
general answer to this question.
So, I can't tell you which community is most "important"; you have to
tell me what you mean by this.
Apologies for not being clearer. Let me try and make my example more specific:
I have a network derived from brain imaging and am passing a clinical variable as an edge weight. In this example the nodes are voxels of brain tissue, an edge indicates that two voxels are structurally connected in imaging space, and the edge weight is the relationship of this connection to a clinical variable, here one which incorporates ageing. Firstly, I want to derive the community structure, passing the edge weight, which ultimately gives me clusters of voxels. In addition, I want to derive some kind of weight for each community block that describes its relation to the passed edge weight. For instance, I would want a block containing voxels within the hippocampus to be negatively associated with an age weight, given the atrophy associated with age, but a block containing voxels of the ventricular system to be positively associated, as the ventricles enlarge with age.
I’m wondering why you use the stochastic block model if you want to identify community structure? The SBM groups nodes with similar positions but, roughly speaking, does not maximize the number of edges within communities.
How would you go about doing this?
The seemingly obvious answer is to look at the distribution of edge
covariates on edges incident on the groups. But it is still not very
clear exactly what you want to find.
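As a rough sketch of that suggestion, and not a definitive recipe, the following summarises the covariates of the edges incident on each group using plain NumPy arrays. The function name block_weight_summary and the toy data are hypothetical. With graph-tool, the inputs could presumably be taken from g.get_edges(), g.ep.weight.a and state.get_levels()[0].get_blocks().a, assuming no edges have been removed from the graph so that the edge ordering matches.

```python
import numpy as np

def block_weight_summary(edges, weights, blocks):
    """Summarise edge covariates on the edges incident on each block.

    edges   : (E, 2) integer array of (source, target) node indices
    weights : (E,) array of edge covariates
    blocks  : (N,) integer array; blocks[v] is the block label of node v
    """
    edges = np.asarray(edges)
    weights = np.asarray(weights, dtype=float)
    blocks = np.asarray(blocks)

    src_b = blocks[edges[:, 0]]  # block of each edge's source node
    tgt_b = blocks[edges[:, 1]]  # block of each edge's target node

    summary = {}
    for r in np.unique(blocks):
        # An edge is incident on block r if either endpoint lies in r;
        # the boolean OR counts within-block edges only once.
        incident = (src_b == r) | (tgt_b == r)
        w = weights[incident]
        if w.size == 0:
            continue  # block with no incident edges
        summary[int(r)] = {
            "n_edges": int(w.size),
            "mean": float(w.mean()),
            "median": float(np.median(w)),
            "std": float(w.std()),
        }
    return summary

# Toy data (hypothetical): 5 nodes in 2 blocks, 5 weighted edges
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [0, 4]])
weights = np.array([-1.0, -0.5, 0.2, 1.0, 0.4])
blocks = np.array([0, 0, 0, 1, 1])

summary = block_weight_summary(edges, weights, blocks)
for r, stats in summary.items():
    print(r, stats)
```

Whether the mean, the median or the full distribution is the right summary depends on what is meant by a block being "associated" with the covariate, so the choice of statistics here is only an assumption.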
In any case, this is a question about a particular research problem, so
I don't believe it is appropriate for this list, which is about using
graph-tool.