# Getting edge probabilities for SBM.

Hi Tiago, I am a new user of graph-tool. I am trying to fit a model to a
graph, and to measure the goodness of fit I intend to use the
log-likelihood score of the generated graph and the real graph.

For this, I needed to calculate the probability of each edge existing in
the generated graph. However, when I use the 'get_edges_prob' function of
BlockState, I get values greater than 0 for an edge existing, which should
not be possible, since log(prob) <= 0. It is supposed to return the
'unnormalised log-probability' of the edge.

Thanks,
Sukrit

The exact quantity computed is explained here:

https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#predicting-spurious-and-missing-edges

It is the posterior likelihood of a missing edge, conditioned on the
observed graph and model. It is unnormalized because computing the
normalization constant would require obtaining this value for every
possible missing edge, which is typically of order O(N^2).

Since it is not normalized, it can also return log-probability values that
are above zero.
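To see concretely how unnormalized log-values above zero become proper log-probabilities once the constant is computed, here is a plain-Python sketch (not graph-tool; the numbers are made up for illustration):

```python
import math

# Hypothetical unnormalized log-probabilities for three candidate
# missing edges; note that one of them is positive.
log_p = [1.2, -0.5, -2.0]

# Normalization constant via log-sum-exp: log Z = log(sum_i exp(log_p_i)),
# computed in a numerically stable way.
m = max(log_p)
log_Z = m + math.log(sum(math.exp(x - m) for x in log_p))

# After subtracting log Z, every log-probability is <= 0 and the
# probabilities sum to one.
log_p_norm = [x - log_Z for x in log_p]
print(log_p_norm)
print(sum(math.exp(x) for x in log_p_norm))  # ~1.0
```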

Best,
Tiago

Hi Tiago, how do we obtain the normalisation constant in this case? Please
help with some sample code for it.

Regards,
Sukrit

I just explained it. You need to sum the probabilities for every possible
edge (which will be very slow, and I suspect is not really the quantity you
are interested in).

If you need help understanding the general framework, just scroll down in
the page I sent you, and you will find the relevant literature.

Please verify if I have understood this correctly:

I calculate a normalisation constant for each iteration of the network
generated by 'mcmc_equilibrate'. This constant is the sum of probabilities
of each edge existing between all pairs of nodes in the graph.
To get the actual value of an edge existing in the graph, I divide the
probability of that edge existing in the graph by the above mentioned
constant.

Thank you for being patient with me!

Sukrit

> I calculate a normalisation constant for each iteration of the network
> generated by 'mcmc_equilibrate'.

mcmc_equilibrate() will generate fits of the SBM, i.e. partitions of the
network according to the posterior probability. It will not generate networks.

> This constant is the sum of probabilities
> of each edge existing between all pairs of nodes in the graph.

No, you should consider only all pairs of nodes that are _not_ connected in
the graph. This is about predicting _missing_ edges.

> To get the actual value of an edge existing in the graph, I divide the
> probability of that edge existing in the graph by the above mentioned
> constant.

Yes.

(Note that this probability will correspond to the question: Assuming there
is only one missing edge in the network, what is the likelihood of it being
a particular edge? You have to be sure this is what you want to compute.)

Hi Tiago, so this is what I want:
I want to calculate the log-likelihood that the SBM fitted to a network
describes that network correctly, and compare it with the likelihoods of
other model fits to the same network. I already have the log-likelihood
scores for the other model fits, but the SBM is still pending.
Until now, I was trying to get the probabilities of all the edges and
calculate the log-likelihood from those, but this does not seem like a
good way to go.

Can you suggest a way to do this with graph-tool?

I want to calculate the likelihood that the model inferred by
minimize_blockmodel_dl fits the graph given to it as input, without
dropping any constants in the process. The constants are important
because I want the actual likelihood, so that I can compare it with fits
of other models.

Sukrit

Ah, OK. That is something else entirely. You just want to do model
selection. This is fully supported in graph-tool.

There are two things you can compute:

1. The joint likelihood of the graph and partition. You get this
simply via the state.entropy() function, which returns the
negative logarithm of this value (what is also known as the
description length).

2. The full marginal distribution, corresponding to the sum of the
likelihood above over all partitions. This cannot be done
exactly, but can be done approximately. This is explained here:

https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#model-class-selection

Best,
Tiago