> You seem to be talking about the "probability of predicting a missing edge with a binary classifier", whereas I was referring to the scores themselves not being proper probabilities.
Indeed, that's what I was pointing out, but when you say this:
> Most typically, edge prediction is formulated as a binary classification task [7], in which each missing (or spurious) edge is attributed a “score” (which may or may not be a probability), so that those that reach a prespecified discrimination threshold are classified as true edges (or true nonedges)
I think this is misleading. This is not typical.
As far as I can tell, very few people use unsupervised binary classification for link prediction.
Most typically, edge prediction is formulated as a *supervised* binary classification task. From that setup, you can calibrate the scores into probabilities based on the classifier's predictive performance on held-out data.
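For concreteness, this is roughly what I mean by calibrating; just a minimal sketch assuming scikit-learn, and some feature matrix X with labels y for candidate edges (all of which is my illustration, not anything from your paper):

```python
# Minimal sketch of calibrating edge scores into probabilities on held-out data.
# X = features for candidate node pairs, y = 1 for held-out true edges, 0 for
# sampled non-edges; both are placeholders for this illustration.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def calibrated_edge_probabilities(X, y, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    base = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf = CalibratedClassifierCV(base, cv=3)   # cross-validated score calibration
    clf.fit(X_train, y_train)
    return clf.predict_proba(X_test)[:, 1], y_test
```

The point being that the output of predict_proba is tied to held-out predictive performance, rather than being a raw score.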
> Indeed selecting the optimal threshold can be done by cross validation in a *supervised* setting, but even then this will depend in general on the fraction of removed edges, size of test set, etc.
I agree, this is a valid concern. But if this is your objection to the *supervised* binary classification formulation, then I think that is the statement that should appear in the paper.
This could be settled, to some degree, experimentally: adjust the fraction of removed edges, the test-set size, etc., and see how much the number of edges with probability > 0.5 changes. My hypothesis: not very much, assuming reasonable fractions.
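A rough sketch of the experiment I have in mind, using networkx and a handful of simple similarity features (the graph, feature set, and classifier are arbitrary choices on my part):

```python
# Sketch of the experiment: vary the fraction of removed edges and see how much
# the number of held-out candidate pairs with predicted probability > 0.5 moves.
# The graph, feature set, and classifier are arbitrary choices for illustration.
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def pair_features(g, pairs):
    jac = [s for _, _, s in nx.jaccard_coefficient(g, pairs)]
    aa = [s for _, _, s in nx.adamic_adar_index(g, pairs)]
    pa = [s for _, _, s in nx.preferential_attachment(g, pairs)]
    return np.column_stack([jac, aa, pa])

def sample_nonedges(g, k, rng):
    nodes = list(g.nodes())
    nonedges = set()
    while len(nonedges) < k:
        i, j = rng.choice(len(nodes), 2, replace=False)
        u, v = nodes[i], nodes[j]
        if not g.has_edge(u, v) and (v, u) not in nonedges:
            nonedges.add((u, v))
    return list(nonedges)

def count_confident_edges(g, frac_removed, seed=0):
    rng = np.random.default_rng(seed)
    edges = list(g.edges())
    idx = rng.choice(len(edges), int(frac_removed * len(edges)), replace=False)
    removed = [edges[i] for i in idx]
    g_train = g.copy()
    g_train.remove_edges_from(removed)

    # Candidates: held-out true edges plus an equal number of sampled non-edges.
    pairs = removed + sample_nonedges(g, len(removed), rng)
    y = np.array([1] * len(removed) + [0] * (len(pairs) - len(removed)))
    X = pair_features(g_train, pairs)

    X_tr, X_te, y_tr, _ = train_test_split(X, y, test_size=0.5,
                                           random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return int((clf.predict_proba(X_te)[:, 1] > 0.5).sum())

g = nx.les_miserables_graph()
for frac in (0.05, 0.1, 0.2, 0.3):
    print(frac, count_confident_edges(g, frac))
```

If the count stays roughly stable across these fractions, that would support my hypothesis; if it swings wildly, then your concern about the threshold stands.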
> but in many scenarios that is applicable. There is a big difference between measuring a non-edge and not doing a measurement.
Yeah, I saw that part of the paper, and it makes sense to me. But that situation seems most common in the scientific domain; in digital networks (my case: Twitter), everything is usually easy to measure. Again, if I just hand you a graph, how will this apply?
> Just an observation: these numbers do not mean much by themselves. MCMC equilibration times will vary a lot between datasets. To be sure your sampling is good you have to run diagnostics, e.g. running the chain multiple times and checking they give the same answer, and checking that your results are not changing by increasing the number of iterations.
> It's difficult to say. I would investigate the MCMC equilibration in more detail before declaring defeat.
Well, I have run it multiple times with different settings. To be sure, I just now ran it with the epsilon removed, wait=2000, multiflip on, and then 100k(!) sampling iterations. The results were pretty much the same.
I noticed that in the docs you now recommend setting beta=np.inf when using multiflip. What exactly is that doing? (I plan to read the other paper you mention soon, which probably covers this.) I tried it and noticed the chain equilibrated much faster, with many of the iterations involving no moves. The resulting entropy was lower, at ~48k versus ~62k with beta at the default. Not sure what's going on there. When I run the sampling step, am I not supposed to also set beta=np.inf? I tried that and got 0 results. I also tried equilibrating with beta=np.inf and then sampling with the default; the results were still pretty poor.
Are there any other tricks I should know about for evaluating how well the chain has equilibrated? For reference, the kind of multi-chain check I have in mind is sketched below.
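```python
# Rough multi-chain check: run independent chains from fresh initial states and
# compare their entropy traces. The call names follow my reading of graph-tool's
# inference API; in particular the multiflip flag on mcmc_equilibrate and the
# callback usage are assumptions on my part.
import numpy as np
import graph_tool.all as gt

def entropy_trace(g, seed, n_sweeps=2000):
    gt.seed_rng(seed)                      # make the chains independent
    state = gt.minimize_blockmodel_dl(g)   # fresh starting point per chain
    trace = []
    gt.mcmc_equilibrate(state, force_niter=n_sweeps, multiflip=True,
                        mcmc_args=dict(niter=10),
                        callback=lambda s: trace.append(s.entropy()))
    return np.array(trace)

# Equilibrated chains should agree on the long-run entropy and show no
# systematic drift in the tail of the trace.
g = gt.collection.data["polbooks"]         # placeholder graph for illustration
traces = [entropy_trace(g, seed) for seed in (1, 2, 3)]
for t in traces:
    print(f"tail mean = {t[-200:].mean():.1f}, tail std = {t[-200:].std():.1f}")
```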
I noticed that in your paper you didn't compare the performance of your reconstruction method against any baseline. How do you know how well it's performing without one? I'm currently pessimistic, given its performance on the graph I'm testing. Some of the graphs you tested on in the paper are loaded into graph-tool, right? It would be fairly easy to train a RandomForest (no fancy boosted trees necessary) on the stacked similarity measures from graph-tool (and maybe a few other simple features I have in mind...) and test its performance against your reconstruction approach, at least for plain link prediction. Interested in this? Conjectures? I would be willing to do it for some of the moderately sized graphs; a rough sketch of what I mean is below.
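```python
# Sketch of the proposed baseline: a RandomForest over stacked similarity scores,
# evaluated as plain link prediction on a graph from graph-tool's collection.
# The vertex_similarity signature, the sim_type names, and the "polbooks" key
# are my assumptions from the docs.
import numpy as np
import graph_tool.all as gt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def stacked_similarities(g, pairs):
    """Stack several similarity measures for the given (source, target) pairs."""
    pair_list = [(int(a), int(b)) for a, b in pairs]
    feats = [gt.vertex_similarity(g, sim, vertex_pairs=pair_list)
             for sim in ("jaccard", "dice", "salton", "inv-log-weight")]
    return np.column_stack(feats)

def baseline_auc(g, frac_removed=0.1, seed=0):
    rng = np.random.default_rng(seed)
    edges = g.get_edges()                                    # (E, 2) vertex indices
    idx = rng.choice(len(edges), int(frac_removed * len(edges)), replace=False)
    removed = edges[idx]

    g_obs = g.copy()                                         # observed graph
    for s, t in removed:
        g_obs.remove_edge(g_obs.edge(s, t))

    # Negatives: random node pairs that are non-edges in the full graph.
    n = g.num_vertices()
    negatives = []
    while len(negatives) < len(removed):
        a, b = rng.integers(0, n, 2)
        if a != b and g.edge(a, b) is None:
            negatives.append((a, b))

    pairs = np.vstack([removed, np.array(negatives)])
    y = np.r_[np.ones(len(removed)), np.zeros(len(negatives))]
    X = stacked_similarities(g_obs, pairs)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=500, random_state=seed)
    clf.fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

g = gt.collection.data["polbooks"]
print(baseline_auc(g))
```

I would then compare the resulting AUC against the marginal edge probabilities produced by your reconstruction approach on the same held-out edges.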
Thanks for your help, as always