I think, totally by accident, I've discovered the issue! In the process of setting up the testing between the reconstructive method, the `get_edges_prob` method, and the simple stacked topological similarity classifier, I accidentally set the graph to undirected without setting it back to directed. The results finally looked reasonable. I've gone back and tested on the graph I've been testing on, and indeed:

Undirected: 488/1000 correctly identified
Directed: ~13/1000 correctly identified

A giant improvement! Of course, I'm relieved, since this solves my problem (I can at least set the graph to undirected to compute this). But now I'm wondering what's going on. Is there some predictable reason why the directed case performs so much worse? Something to do with directed graphs and equilibration? But if it were equilibration, why did this issue never affect `get_edges_prob`'s performance? Could there be a bug in how graph-tool handles the directed case? In the paper it seems you've worked with the directed case before, so I assume you would have noticed if there were.

To be sure, I tested this on the "polblogs" graph as well. Undirected: reconstructive performance is great; directed: terrible.

> You didn't even have to read any paper, just the docstring would have
> been enough.
> Beta is the inverse temperature parameter, and setting it to infinity
> means turning the MCMC into a greedy optimization heuristic.
> And I don't "recommend it". It is not applicable to your context.

I've read the documentation. The docstring says "Inverse temperature". The relevant section of the docs says:

> An alternative is to use a greedy algorithm based on a merge-split MCMC with zero temperature [peixoto-merge-split-2020], which requires a single global state, and thus can reduce memory usage.

Just from this I don't understand. Not setting beta to infinity has already reduced memory usage for me. I don't understand the significance of the algorithm being greedy, or why it doesn't apply to my situation. If the explanation of all this is in one of the papers, please point to it and I'll read it.

> To be honest, I think the pattern of saying "I plan to read your
> documentation/paper at some point, but could you please just explain
> this to me before I do so" is a bit disrespectful. Why is my time worth
> less than yours?

I'm sorry it seems that way to you. I have read all of the documentation at least once, and some parts of the papers. I don't think my time is more valuable. I would happily volunteer to contribute to the docs or to the project in general in any way I can. I've found some more typos in the docs in addition to the one I already told you about; I can at least contribute these. Also, I'll email you those paper typos after this.

> This kind of comparison has been done already in
> https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578.
> The SBM approach is the single best classifier among the over a hundred
> they consider, which is marginally beaten only by a stacking of around 40
> other predictors.

I've read both these papers and referred to the latter twice in this thread. We've already discussed that they use neither `get_edges_prob` nor the full reconstructive marginal edge probability. On page 8 of the former, they use this as their similarity for the SBM: s_ij = θ_i θ_j · l_{g_i, g_j}.

As I've already told you, the "score" I'm getting from `get_edges_prob` is also the *most impactful feature of all the features I've tested* (including many newer network-embedding methods not considered in the aforementioned papers). This is impressive, and those papers didn't test it as far as I'm aware. The reconstructive approach, however, was giving me worse results. This discrepancy is what I was trying to account for (and now, I think, have; see above).

> In any case that was not the point of the PRX paper, which was to
> develop an actual Bayesian reconstruction algorithm, not a binary
> classifier. AFAIK there is no other algorithm that does this, so there
> was nothing to compare to. If you're worried about comparing with binary
> classifiers, you can just convert this approach into one by using the
> marginal posterior probabilities as "scores" and go from there, as the
> papers above do. Then you are comparing apples to apples.

Yes, this is my intention, as I've stated.

Thanks for your help, as always.

On Wed, Apr 15, 2020 at 2:04 AM Tiago de Paula Peixoto <tiago@skewed.de> wrote:

On 15.04.20 at 05:40, Deklan Webster wrote:
>> Most typically, edge prediction is
>> formulated as a binary classification task [7], in which each
>> missing (or spurious) edge is attributed a "score" (which
>> may or may not be a probability), so that those that reach a
>> prespecified discrimination threshold are classified as true
>> edges (or true nonedges)
>
> I think this is misleading. This is not typical. As far as I can tell,
> very few people use unsupervised binary classification for link
> prediction. Most typically, edge prediction is formulated as a
> *supervised* binary classification task. From that setup, you can
> calibrate probabilities based on ability to predict.
It's not misleading at all, this is exactly how a binary classifier
works, supervised or otherwise. How you find the threshold is beside
the point.
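That is, regardless of how the threshold is chosen, a score-based binary classifier reduces to the following toy sketch (the scores here are hypothetical numbers, not output from any particular method):

```python
def classify(scores, threshold):
    # A score-based binary classifier: every candidate edge whose score
    # is at or above the threshold is predicted to be a true edge.
    return [s >= threshold for s in scores]

# Hypothetical scores for four candidate edges:
scores = [0.9, 0.7, 0.4, 0.1]
print(classify(scores, 0.5))  # [True, True, False, False]
print(classify(scores, 0.8))  # [True, False, False, False]
```

The predictions depend on the threshold, but the scoring itself does not; that is the sense in which finding the threshold is a separate question.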
>> Indeed selecting the optimal threshold can be done by cross validation
>> in a *supervised* setting, but even then this will depend in general on
>> the fraction of removed edges, size of test set, etc.
>
> I agree, this is a valid concern. If this is your objection to the
> *supervised* binary classification formulation then I think this should
> be the statement in the paper.
This is just another difference. And I wasn't "objecting", just
explaining what you had misunderstood.
> Well, I have run it multiple times with different numbers. To be sure, I
> just now ran it with the epsilon removed, 2000 wait, multiflip on, and
> then 100k(!) sampling iterations. Results were pretty much the same.
It's a shame.
> I noticed that in the docs you now recommend setting beta=np.inf when
> using multiflip. What exactly is this doing? (I plan to soon read that
> other paper you mention, which probably includes this...)
You didn't even have to read any paper, just the docstring would have
been enough.
Beta is the inverse temperature parameter, and setting it to infinity
means turning the MCMC into a greedy optimization heuristic.
And I don't "recommend it". It is not applicable to your context.
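As a toy illustration of the role of the inverse temperature (this is only a sketch of a generic Metropolis acceptance rule, not graph-tool's actual implementation):

```python
import math
import random

def accept_move(delta_S, beta):
    """Toy Metropolis rule: a move changing the description length by
    delta_S is accepted with probability min(1, exp(-beta * delta_S))."""
    if delta_S <= 0:
        return True   # improvements are always accepted
    if math.isinf(beta):
        return False  # beta = inf: never accept a worsening move (greedy)
    return random.random() < math.exp(-beta * delta_S)

# At beta = 1 the chain samples from the posterior: worsening moves are
# sometimes accepted, which is what makes the samples usable for
# estimating marginal probabilities. At beta = inf the dynamics reduce
# to a greedy descent that only ever decreases the description length.
```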
To be honest, I think the pattern of saying "I plan to read your
documentation/paper at some point, but could you please just explain
this to me before I do so" is a bit disrespectful. Why is my time worth
less than yours?
> I noticed that in your paper you didn't compare your reconstruction
> method performance against any baseline. How do you know how well it's
> performing if you don't have a baseline? I'm currently pessimistic given
> its performance on the graph I'm testing. Some of those graphs you were
> testing on in the paper are loaded into graph-tool, right? It would be
> fairly easy to train a RandomForest (no fancy boosted trees necessary)
> with the stacked similarity measures from graph-tool (and maybe a few
> other simple features I have in mind...) and test the performance
> against your reconstruction approach (at least for just link
> prediction). Interested in this? Conjectures? I would be willing to do
> it for some of the moderately-sized graphs.
This kind of comparison has been done already in
https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578.
The SBM approach is the single best classifier among the over a hundred
they consider, which is marginally beaten only by a stacking of around 40
other predictors.
In any case that was not the point of the PRX paper, which was to
develop an actual Bayesian reconstruction algorithm, not a binary
classifier. AFAIK there is no other algorithm that does this, so there
was nothing to compare to. If you're worried about comparing with binary
classifiers, you can just convert this approach into one by using the
marginal posterior probabilities as "scores" and go from there, as the
papers above do. Then you are comparing apples to apples.
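A minimal sketch of that conversion, with hypothetical numbers: treat the marginal posterior probability of each candidate edge as its score, and summarize the scores threshold-free with the AUC (the probability that a true missing edge outranks a true non-edge), as those papers do.

```python
def auc(scores_pos, scores_neg):
    """AUC of a score as a binary classifier: the probability that a
    positive (true missing edge) receives a higher score than a
    negative (true non-edge). O(n*m) pairwise version, fine for small
    illustrative inputs."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical marginal posterior probabilities for held-out edges
# (positives) and sampled non-edges (negatives):
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.5, 0.3, 0.2, 0.1]
print(auc(pos, neg))  # 15/16 = 0.9375
```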
If you have further questions about how to use the library, please go
ahead and ask. But if you want to discuss how to compare supervised vs
unsupervised edge prediction, etc, please take this off this list since
it's off-topic.
Best,
Tiago
--
Tiago de Paula Peixoto <tiago@skewed.de>
_______________________________________________
graph-tool mailing list
graph-tool@skewed.de
https://lists.skewed.de/mailman/listinfo/graph-tool