Thanks for the quick reply.
> Note that link prediction itself is a valid criterion for model
> selection, so you might just want to focus on that.
Not sure how to interpret this. Are you saying that even if the non-Poisson model has lower entropy, the Poisson model may still be 'better' for community detection if it's better at link prediction (which you suggest is most likely the case)?
> I'm not sure what you mean. The site seems fine.
>
> You should use logsumexp():
Wouldn't that be non-incremental? Wouldn't the incremental version of this be summing the exp(a) of the results at each step, and then taking log at the end?
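To make concrete what I mean by "incremental", here is a minimal sketch in plain Python. The names `log_add` and `RunningLogMeanExp` are made up for illustration and are not part of graph-tool. It assumes each step yields one log-probability and that what I want at the end is the log of the average probability. Instead of summing `exp(a)` directly, which underflows to zero for very negative values, it keeps the running total in log space, which should give the same result as calling logsumexp once on the full list.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow or underflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    # Factor out the larger term so the exponent is never positive.
    return hi + math.log1p(math.exp(lo - hi))

class RunningLogMeanExp:
    """Accumulate log-probabilities one at a time (hypothetical helper)."""

    def __init__(self):
        self.log_total = -math.inf  # log of an empty sum
        self.count = 0

    def add(self, log_p):
        self.log_total = log_add(self.log_total, log_p)
        self.count += 1

    def value(self):
        """Log of the mean of exp(log_p) over everything added so far."""
        if self.count == 0:
            raise ValueError("no samples added yet")
        return self.log_total - math.log(self.count)

# Made-up log-probabilities, small enough that exp() would underflow.
acc = RunningLogMeanExp()
for log_p in (-1000.0, -1001.0, -999.5):
    acc.add(log_p)
print(acc.value())
```

If NumPy is available, `numpy.logaddexp` performs the same pairwise step as `log_add` here.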
> The problem with this is that it will not resolve very small
> probabilities, since the edge would never be seen in such a case. But
> the whole approach would be much faster.
Yes, I ran into this problem. I'll paste the code to make sure I'm not doing anything wrong:
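```
import graph_tool.all as gt

# G is my (directed) graph_tool.Graph, loaded elsewhere.

# Every observed edge was measured once (n=1) and observed once (x=1).
n = G.new_ep('int', 1)
x = G.new_ep('int', 1)

state = gt.MeasuredBlockState(G, n=n, x=x, n_default=1, x_default=0,
                              nested=True, self_loops=True,
                              state_args={'deg_corr': True})

# Equilibrate the chain first.
gt.mcmc_equilibrate(state,
                    wait=1000,
                    epsilon=1e-5,
                    mcmc_args=dict(niter=10),
                    multiflip=True,
                    verbose=True)

u = None  # marginal graph, accumulated across samples


def collect_marginals(s):
    global u
    u = s.collect_marginal(u)


# Collect the edge marginals over 1000 further iterations.
gt.mcmc_equilibrate(state,
                    force_niter=1000,
                    mcmc_args=dict(niter=10),
                    multiflip=True,
                    verbose=True,
                    callback=collect_marginals)

eprob = u.ep.eprob

non_edge_found_c = 0
edge_not_found_c = 0

# Compare every ordered vertex pair in G against the marginal graph u.
# Note: the loop rebinds the name x; the property map above is already
# held by state at this point.
for x in G.vertices():
    for y in G.vertices():
        edge = G.edge(x, y)      # None if (x, y) is not an edge of G
        u_edge = u.edge(x, y)    # None if (x, y) is not an edge of u
        if edge is None:
            if u_edge is not None:
                non_edge_found_c += 1
                print("Non-edge in original, found in marginal graph.")
                print(u_edge, eprob[u_edge])
                print()
        else:
            if u_edge is None:
                edge_not_found_c += 1
                print("Edge in original graph, but not an edge in marginal graph.")
                print(edge)
                print()

print(non_edge_found_c, edge_not_found_c)
```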
I'm assuming that `u = s.collect_marginal(u)` takes care of counting the non-edge appearances over every sample (and not just the last one). I also assume that if `u.edge(x, y)` is `None`, the non-edge was never observed. Is that correct?
I ran it just now on a smaller graph as a test. The graph is directed, with 475 nodes and 20,911 edges. Intuitively, I would expect a reasonable number of non-edges to be observed given that there are about 21k edges, maybe ~500 at least. Running the code above, `non_edge_found_c` is 13. Are only 13 non-edges observed with non-zero probability? The largest of those 13 probabilities is 0.07. Also, `edge_not_found_c` is 2. How should I interpret the case where an edge is in the original graph but not in the marginal graph (not in any of the samples?)?
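As a rough sanity check on that expectation, here is a back-of-the-envelope sketch in plain Python (no graph-tool involved; `prob_never_sampled` is a hypothetical helper I made up). It assumes the 1000 collected samples are independent, which MCMC sweeps are not exactly, that the graph has no parallel edges, and that `eprob` is the fraction of samples in which an edge appeared, which is my understanding but not something I have verified.

```python
# Back-of-the-envelope check; all names here are illustrative.
n_vertices = 475
n_edges = 20911
n_samples = 1000  # force_niter in the collection run above

# Ordered pairs (self-loops included, as in the double loop above)
# that are not edges of the observed graph, assuming no parallel edges.
n_non_edges = n_vertices ** 2 - n_edges


def prob_never_sampled(p, n_samples):
    """Chance that an edge with posterior probability p appears in none
    of n_samples draws, assuming the draws are independent."""
    return (1.0 - p) ** n_samples


# If the marginals are sample frequencies (my assumption), the smallest
# non-zero value the run can report is 1 / n_samples.
resolution = 1.0 / n_samples

print("candidate non-edges:", n_non_edges)
print("smallest resolvable probability:", resolution)
for p in (1e-2, 1e-3, 1e-4):
    print(p, prob_never_sampled(p, n_samples))
```

Under these assumptions, a non-edge with posterior probability around 1e-4 would be missed by all 1000 samples about 90% of the time, so a small count is not impossible on its face, but 13 still seems low to me.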
Am I doing something wrong here? Do I need to adjust `n`, `x`, `n_default`, or `x_default`?
Thanks for your help, as always.