Thank you for your reply; this is really helpful, and I think MixedMeasuredBlockState is just what I need. I would like to run MixedMeasuredBlockState, collect the marginals, and then store these as an edge property on the original graph for later analysis alongside my other vertex and edge properties. However, I seem to be unable to do this, because the generated graph object does not match the original. This occurs both when the number of edges remains the same and when some edges are removed during the MixedMeasuredBlockState analysis.

My self-contained example, with only a small addition to your pre-existing working example online:

import graph_tool.all as gt
import seaborn as sns

# graph-tool online example

g = gt.collection.data["lesmis"].copy()

n = g.new_ep("int", 2)  # number of measurements
x = g.new_ep("int", 2)  # number of observations

e = g.edge(11, 36)
x[e] = 1                # pretend we have observed edge (11, 36) only once

e = g.add_edge(15, 73)
n[e] = 2                # pretend we have measured non-edge (15, 73) twice,
x[e] = 1                # but observed it as an edge once.

state = gt.MixedMeasuredBlockState(g, n=n, n_default=1, x=x, x_default=0)

gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

u = None   # marginal posterior edge probabilities
bs = []    # partitions
cs = []    # average local clustering coefficient

def collect_marginals(s):
    global u, bs, cs
    u = s.collect_marginal(u)
    bstate = s.get_block_state()
    bs.append(bstate.levels[0].b.a.copy())
    cs.append(gt.local_clustering(s.get_graph()).fa.mean())

gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

print(u.ep.count.get_array().shape)
print(u.ep.eprob.get_array().shape)

sns.distplot(u.ep.eprob.get_array())

# ValueError: Received property map for graph <Graph object, undirected, with 77 vertices and 439 edges, 2 internal edge properties, 1 internal graph property, at 0x14e7fce10> (base: <Graph object, undirected, with 77 vertices and 439 edges, 2 internal edge properties, 1 internal graph property, at 0x14e7fce10>), expected: <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x151a3de90> (base: <Graph object, undirected, with 77 vertices and 255 edges, 2 internal vertex properties, 1 internal edge property, 2 internal graph properties, at 0x151a3de90>)

How do you get around this issue? Presumably there is a simpler way with the graph-tool syntax than looping over everything and matching edges up between the two graphs…?

Thank you again!

James

On 4 Jun 2020, at 10:29, Tiago de Paula Peixoto <tiago@skewed.de> wrote:

Dear James,

This is a very involved question, and it's a bit difficult to get to the bottom of what you want.

I'll start with the last question, which I think can be addressed more directly:

On 03.06.20 at 20:35, James Ruffle wrote:

Alternatively, do you have any suggestions on how I would account for the missingness in the data when constructing the model?

I would suggest you take a look at the following paper, which addresses the missing-data problem directly:

https://dx.doi.org/10.1103/PhysRevX.8.041011

The idea of coercing the "missingness" into edge covariates, as you describe, does not seem correct to me. Missing data is not data; it is the lack of data. The paper above frames it this way, and of course the code is in graph-tool as well.

What is the best way to robustly compare the two SBM models and (hopefully) illustrate that the model fit is better with the conditional probability of the event alone?

You can only compare models that generate the same data. If they do, this is described in the documentation, i.e. you can compare the description length or Bayesian evidence.

If the two SBMs generate different data, then the question is ill posed. It would be like comparing a discrete geometric distribution with a Gaussian; they will always be different, as they do not even have the same support.

Best,

Tiago

--

Tiago de Paula Peixoto <tiago@skewed.de>

_______________________________________________

graph-tool mailing list

graph-tool@skewed.de

https://lists.skewed.de/mailman/listinfo/graph-tool