Multiple edge weights and model comparison in the SBM

Dear Tiago / Graph Tool community,

I am running the SBM on a directed network of approximately 60 nodes, where each edge weight is the conditional probability of one event given the other.

The network was originally generated from a binary matrix of repeated samples, recording whether or not each of these 60 variables (the nodes) occurred. However, this binary matrix has some missing data, which cannot be easily imputed and which I suspect is not missing at random.
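
A minimal sketch of how one such conditional probability could be estimated from a binary matrix with missing entries. It assumes pairwise deletion (only samples where both variables were observed are used) and represents missing values as `None`. The function name and toy data are hypothetical, and this is not necessarily how the network above was built. If the data are not missing at random, estimates obtained this way can be biased, which is the concern raised here.

```python
def conditional_probability(samples, i, j):
    """Estimate P(j = 1 | i = 1) from rows where both i and j were observed."""
    # Pairwise deletion: drop any sample missing either variable.
    observed = [(row[i], row[j]) for row in samples
                if row[i] is not None and row[j] is not None]
    n_i = sum(1 for a, _ in observed if a == 1)
    if n_i == 0:
        return None  # event i never observed, so the estimate is undefined
    n_ij = sum(1 for a, b in observed if a == 1 and b == 1)
    return n_ij / n_i

# Hypothetical toy data: rows are samples, columns are variables,
# None marks a missing measurement.
samples = [
    [1, 1, 0],
    [1, 0, None],
    [1, None, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
]

p_1_given_0 = conditional_probability(samples, 0, 1)
p_2_given_0 = conditional_probability(samples, 0, 2)
print(p_1_given_0, p_2_given_0)
```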

I am keen to find a way either to control for this as a covariate, or alternatively to show that the missingness does not cause a big issue in the SBM fit.
One possibility I have been exploring is computing a conditional probability network of the missingness and passing that as an additional edge covariate to the model.

So, in this instance I end up with two SBMs:
- One with a single edge weight, the conditional probability of the events occurring. The community structure of this model is meaningful in its interpretation.
- One with two edge weights, the conditional probability of the events occurring and the conditional probability of the events being missing from the dataset (the missingness matrix). The community structure of this model is not as meaningful: events that are missing together largely just cluster together. (For instance, I have several nodes representing decades of life, which all cluster together, because if it is unknown whether someone is 20, then it is also unknown whether they are 30.)

What is the best way to robustly compare the two SBM models and (hopefully) illustrate the model fit is better with the conditional probability of the event alone?
Alternatively, do you have any suggestions on how I would account for the missingness in the data when constructing the model?

Thank you for your time,
James

Dear James,

This is a very involved question, and it's a bit difficult to get to the
bottom of what you want.

I'll start with the last question, which I think can be addressed more
directly:

Alternatively, do you have any suggestions on how I would account for the missingness in the data when constructing the model?

I suggest you take a look at the following paper, which directly
addresses the missing data problem:

  Phys. Rev. X 8, 041011 (2018) - Reconstructing Networks with Unknown and Heterogeneous Errors

The idea of coercing the "missingness" into edge covariates as you
describe does not seem correct to me. Missing data is not data, it's
lack of data. The paper above treats it this way, and of course the
code is in graph-tool as well.

What is the best way to robustly compare the two SBM models and (hopefully) illustrate the model fit is better with the conditional probability of the event alone?

You can only compare models that generate the same data. If they do,
the procedure is described in the documentation, i.e. you can compare
the description length or Bayesian evidence.

If the two SBMs generate different data, then the question is ill posed.
It would be like comparing a discrete geometric distribution with a
Gaussian; they will always be different, as they do not even have the
same support.

Best,
Tiago
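
As a library-free illustration of the point above about comparing models that generate the same data, the sketch below fits two discrete distributions to one hypothetical set of counts and compares their description lengths (negative log-likelihood in nats). The data, function names, and choice of distributions are assumptions made for illustration. This is not graph-tool code, and it ignores the cost of encoding the fitted parameter, which is the same size for both models here.

```python
import math

def geometric_logpmf(k, p):
    # P(k) = (1 - p)^k * p for k = 0, 1, 2, ...
    return k * math.log(1 - p) + math.log(p)

def poisson_logpmf(k, lam):
    # P(k) = lam^k * exp(-lam) / k! for k = 0, 1, 2, ...
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def description_length(data, logpmf, param):
    # Negative log-likelihood of the whole dataset, in nats.
    return -sum(logpmf(k, param) for k in data)

data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]  # hypothetical counts
mean = sum(data) / len(data)

# Maximum-likelihood parameter for each model.
p_hat = 1 / (1 + mean)
lam_hat = mean

dl_geom = description_length(data, geometric_logpmf, p_hat)
dl_pois = description_length(data, poisson_logpmf, lam_hat)

# The model with the shorter description length fits this data better.
print(f"geometric: {dl_geom:.3f} nats, Poisson: {dl_pois:.3f} nats")
```

Both distributions are defined on the non-negative integers, which is what makes the two numbers comparable. Swapping one of them for a Gaussian density would break this, since a density over real values and a probability over integers are not measured on the same scale.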

Dear Tiago,

Thank you for your reply; this is really helpful, and I think that the MixedMeasuredBlockState is just what I need. I would like to run the MixedMeasuredBlockState, collect the marginals, and then store these as an edge property on the original graph for later analysis alongside my other vertex and edge properties. However, I seem to be unable to do this because the graph object generated does not match the original. This occurs both when the number of edges remains the same and when some are removed from the MixedMeasuredBlockState analysis.

My self-contained example, with only a small addition to your existing working example online:

import graph_tool.all as gt
import seaborn as sns

#Graph Tool online example
g = gt.collection.data["lesmis"].copy()
n = g.new_ep("int", 2) # number of measurements
x = g.new_ep("int", 2) # number of observations
e = g.edge(11, 36)
x[e] = 1 # pretend we have observed edge (11, 36) only once
e = g.add_edge(15, 73)
n[e] = 2 # pretend we have measured non-edge (15, 73) twice,
x[e] = 1 # but observed it as an edge once.
state = gt.MixedMeasuredBlockState(g, n=n, n_default=1, x=x, x_default=0)

gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

u = None # marginal posterior edge probabilities
bs = [] # partitions
cs = [] # average local clustering coefficient

def collect_marginals(s):
    global u, bs, cs
    u = s.collect_marginal(u)
    bstate = s.get_block_state()
    bs.append(bstate.levels[0].b.a.copy())
    cs.append(gt.local_clustering(s.get_graph()).fa.mean())

gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

#I then want to pull the edge probability from the collected marginals and store that as an edge property on the original graph.
print(u.ep.count.get_array().shape)
print(u.ep.eprob.get_array().shape)

#histogram of edge probabilities
sns.distplot(u.ep.eprob.get_array())

#This always raises an error: the edge probabilities from the marginal graph cannot be stored on the original graph. How can this be fixed?
g.ep['mixedmeasures_eprob'] = u.ep.eprob
#ValueError: Received property map for graph <Graph object, undirected, with 77 vertices and 439 edges, 2 internal edge properties, 1 internal graph property, at 0x14e7fce10> (base: <Graph object, undirected, with 77 vertices and 439 edges, 2 internal edge properties, 1 internal graph property, at 0x14e7fce10>), expected: <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x151a3de90> (base: <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x151a3de90>)

How do you manage to get around this issue? Presumably there is a simpler way with the graph-tool syntax than for-looping over everything and matching edges up between the two graphs?

Thank you again!
James


Obviously, the reconstructed graph is different from what has been
measured. Before we talk about how to do anything efficiently, can you
first explain what "get around" this issue means? How do you expect two
graphs with a different set of edges to share the same edge property map?

Hi Tiago,

It seemed reasonable that one might want to review the results of a MixedMeasuredBlockState with respect to the original graph. For instance, perhaps I have certain vertex or edge properties within my original graph object I want to review for association to this model's collected eprobs. Another example, perhaps I want to run both the Measured and MixedMeasured and compare/contrast the eprobs from the marginals stored as separate edge properties on the graph. Is there a way to marry up these results of the MixedMeasuredBlockState eprob back to the original graph?

Unless there is some intuitive syntax within graph-tool that I am not aware of, I suppose it would be a case of pulling out the edge list from the original and the model, then taking the union of the two.

Thanks,
James

It seemed reasonable that one might want to review the results of a MixedMeasuredBlockState with respect to the original graph. For instance, perhaps I have certain vertex or edge properties within my original graph object I want to review for association to this model's collected eprobs. Another example, perhaps I want to run both the Measured and MixedMeasured and compare/contrast the eprobs from the marginals stored as separate edge properties on the graph. Is there a way to marry up these results of the MixedMeasuredBlockState eprob back to the original graph?

There are many ways to do this, I was just asking you to be more
specific about what you wanted to accomplish.

Unless there is some intuitive syntax within graph-tool that I am not aware of, I suppose it would be a case of pulling out the edge list from the original and the model, then taking the union of the two.

To take the union of all edges between two graphs, you can use the
graph_union(g1, g2) function, while passing the 'intersection' parameter
with the Graph.vertex_index property map of g2 (to avoid the nodes being
duplicated). This will create parallel edges for those that appear in
both graphs, which you can remove with remove_parallel_edges(). You can
then copy any edge property map from either g1 or g2 by calling
Graph.copy_property() on the union graph.
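
The union-and-copy procedure described above can be pictured without any library at all. The following toy sketch, which does not use graph-tool, treats each undirected edge as an ordered pair of shared vertex indices, takes the union of two edge sets, collapses duplicates, and carries a per-edge value over from the second graph. All names are hypothetical, and giving edges absent from the second graph a default of 0.0 is an assumption, not graph-tool behaviour.

```python
def edge_key(s, t):
    # Undirected edge: store endpoints in sorted order so (1, 0) == (0, 1).
    return (s, t) if s <= t else (t, s)

def union_with_property(edges_g, eprob_u, default=0.0):
    """Union the edges of two graphs and attach the second graph's edge values.

    edges_g: iterable of (source, target) pairs from the first graph.
    eprob_u: dict mapping (source, target) pairs of the second graph to a value.
    """
    u_prob = {edge_key(s, t): p for (s, t), p in eprob_u.items()}
    # Using a set collapses edges present in both graphs into one entry,
    # which mirrors removing parallel edges after taking the union.
    union = {edge_key(s, t) for s, t in edges_g} | set(u_prob)
    return {e: u_prob.get(e, default) for e in sorted(union)}

# Hypothetical data: both graphs share the vertex indices 0..3.
edges_g = [(0, 1), (1, 2), (2, 3)]
eprob_u = {(1, 0): 0.9, (1, 2): 0.7, (0, 3): 0.2}

result = union_with_property(edges_g, eprob_u)
for edge, prob in result.items():
    print(edge, prob)
```

Sharing vertex indices between the two edge lists plays the role that the 'intersection' parameter plays in graph_union(): without it, the same node would be counted twice.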

Thanks for the suggestion Tiago. Unfortunately, it is resulting in the same issue for me.

In extension to the previous example:
ug = gt.graph_union(g,u,intersection=u.vertex_index,internal_props=True)
gt.remove_parallel_edges(ug)
print(ug)

print(g.vp.label[0])
#Myriel

print(ug.vp.label[0])
#' '
#Presumably graph_union wipes the original properties to blank?

#reclaim the lost labels...
label=g.copy_property(src=g.vp.label)
label[0]
#'Myriel'

ug.vp["copied_label"]=label #or similarly, ug.copy_property(src=g.vp.label,tgt=ug.vp.label)
#ValueError: Received property map for graph <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x152bc7410> (base: <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x152bc7410>), expected: <Graph object, undirected, with 77 vertices and 447 edges, 2 internal vertex properties, 3 internal edge properties, 3 internal graph properties, at 0x153ebb890> (base: <Graph object, undirected, with 77 vertices and 447 edges, 2 internal vertex properties, 3 internal edge properties, 3 internal graph properties, at 0x153ebb890>)

#naturally, collecting the eprob will cause the same error:
ug.copy_property(src=u.ep.eprob,tgt=ug.ep.eprob)
#ValueError: ….

Could you kindly provide a working example to the advised solution?

James


Why won't you even attempt to do what I described?

You keep insisting on passing properties belonging to other graphs as
internal properties, despite the explicit error messages that tell you
that you cannot do that. You need to use `Graph.copy_property()`.

Please show me an example of where you do something that is _supposed_
to work, so I can say what the problem is.

Please send complete and minimal examples.

Best,
Tiago

Tiago,

I’m sorry that you feel that my code to use the graph_union with intersection, following which to remove parallel edges, then call Graph.copy_property on the union graph, did not follow your instructions which, as far as I can read, specifically asked I use all of those and in that order.

# To take the union of all edges between two graphs, you can use the graph_union(g1, g2) function, while passing the 'intersection' parameter with the Graph.vertex_index property map of g2 (to avoid the nodes being duplicated). This will create parallel edges for those that appear in both graphs, which you can remove with remove_parallel_edges(). You can then copy any edge property map from either g1 or g2 by calling Graph.copy_property() on the union graph.

A complete and minimal example with just vertex labels is the following:

import graph_tool.all as gt
g = gt.collection.data["lesmis"].copy()
state = gt.MixedMeasuredBlockState(g, n=n, n_default=1, x=x, x_default=0)
gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10)
u = None; bs = []; cs = []

def collect_marginals(s):
    global u, bs, cs
    u = s.collect_marginal(u)
    bstate = s.get_block_state()
    bs.append(bstate.levels[0].b.a.copy())
    cs.append(gt.local_clustering(s.get_graph()).fa.mean())

gt.mcmc_equilibrate(state, force_niter=5, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

#take the graph union with intersection
ug = gt.graph_union(g,u,intersection=u.vertex_index,internal_props=True)

#remove the parallel edges
gt.remove_parallel_edges(ug)

#this process will erase the labels property in the union graph object, proved here for vertex 0.
print(ug.vp.label[0])
#' '

#run copy property on the union graph, where src is the source property, namely the original label file
ug.copy_property(src=g.vp.label)
####This is the error I want to fix in this example, and transfer the label property from the original lesmis graph object to the union graph

Please do let me know if this example is not clear enough.

James

Tiago,

I’m sorry that you feel that my code to use the graph_union with intersection, following which to remove parallel edges, then call Graph.copy_property on the union graph, did not follow your instructions which, as far as I can read, specifically asked I use all of those and in that order.

I didn't "feel" it, you had not called copy_property in your example,
you had just alluded that it "didn't work".

# To take the union of all edges between two graphs, you can use the graph_union(g1, g2) function, while passing the 'intersection' parameter with the Graph.vertex_index property map of g2 (to avoid the nodes being duplicated). This will create parallel edges for those that appear in both graphs, which you can remove with remove_parallel_edges(). You can then copy any edge property map from either g1 or g2 by calling Graph.copy_property() on the union graph.

A complete and minimal example with just vertex labels is the following:

It had a syntax error, and some missing parts, but I managed to get it
working.

#run copy property on the union graph, where src is the source property, namely the original label file
ug.copy_property(src=g.vp.label)
####This is the error I want to fix in this example, and transfer the label property from the original lesmis graph object to the union graph

The documentation of "copy_property" says: "The optional parameter ``g``
specifies the source graph to copy properties from (defaults to self)."
Since this property is not owned by the same graph, you need to do:

  ug.copy_property(src=g.vp.label, g=g)

(I will actually change this, so that the default becomes the graph that
owns the property, which is less error prone).

Thanks
