Thank you for your reply, this is really helpful, and I think that the
MixedMeasuredBlockState is just what I need. I would like to run the
MixedMeasuredBlockState, collect the marginals, and then store these as an
edge property on the original graph for later analysis alongside my other
vertex and edge properties. However, I seem to be unable to do this because
the graph object generated does not match the original. This occurs both
when the number of edges remains the same and when some edges are removed
during the MixedMeasuredBlockState analysis.
My self-contained example, with only a small addition to your pre-existing functional example online:
import graph_tool.all as gt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Graph-tool online example
g = gt.collection.data["lesmis"].copy()
n = g.new_ep("int", 2)  # number of measurements
x = g.new_ep("int", 2)  # number of observations
e = g.edge(11, 36)
x[e] = 1                # pretend we have observed edge (11, 36) only once
e = g.add_edge(15, 73)
n[e] = 2                # pretend we have measured non-edge (15, 73) twice,
x[e] = 1                # but observed it as an edge once.

state = gt.MixedMeasuredBlockState(g, n=n, n_default=1, x=x, x_default=0)
gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

u = None  # marginal posterior edge probabilities
bs = []   # partitions
cs = []   # average local clustering coefficient

def collect_marginals(s):
    global u, bs, cs
    u = s.collect_marginal(u)
    bstate = s.get_block_state()
    bs.append(bstate.levels[0].b.a.copy())
    cs.append(gt.local_clustering(s.get_graph()).fa.mean())

gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)
#I then want to pull the edge probability from the collected marginals and
#store that as an edge property on the original graph, e.g. for a histogram
#of edge probabilities. The line below always fails, because the property
#map belongs to the marginal graph u rather than to g. How do I fix this?
g.ep['mixedmeasures_eprob'] = u.ep.eprob
#ValueError: Received property map for graph <Graph object, undirected,
#with 77 vertices and 439 edges, 2 internal edge properties, 1 internal
#graph property, at 0x14e7fce10> (base: <Graph object, undirected, with 77
#vertices and 439 edges, 2 internal edge properties, 1 internal graph
#property, at 0x14e7fce10>), expected: <Graph object, undirected, with 77
#vertices and 255 edges, 2 internal vertex properties, 1 internal edge
#property, 2 internal graph properties, at 0x151a3de90> (base: <Graph
#object, undirected, with 77 vertices and 255 edges, 2 internal vertex
#properties, 1 internal edge property, 2 internal graph properties, at
#0x151a3de90>)
How do you manage to get around this issue? Presumably there is a simpler way with the graph-tool syntax than for-looping over everything and matching edges up between the two graphs…?
Thank you again!
This is a very involved question, and it's a bit difficult to get to the
bottom of what you want. I'll start with the last question, which I think
can be addressed more directly.
On 03.06.20 at 20:35, James Ruffle wrote:
Alternatively, do you have any suggestions on how I would account for the missingness in the data when constructing the model?
I would suggest you take a look at the following paper, which directly
addresses the missing-data problem:
The idea of encoding the "missingness" as edge covariates, as you
describe, does not seem correct to me. Missing data is not data; it's a
lack of data. The paper above frames it this way, and of course the code
is in graph-tool as well.
What is the best way to robustly compare the two SBM models and (hopefully) show that the model fit is better with the conditional probability of the event alone?
You can only compare models that generate the same data. If they do, you
can proceed as described in the documentation, i.e. compare the
description length or the Bayesian evidence.
If the two SBMs generate different data, then the question is ill-posed.
It would be like comparing a discrete geometric distribution with a
Gaussian; they will always be different, as they do not even have the
same sample space.
Tiago de Paula Peixoto <email@example.com>
graph-tool mailing list