> Let's pause and appreciate the learning opportunity.
> After considerable amount of unsubstantiated speculation about bugs in
> the code, and even the underlying soundness of the algorithm, the
> problem turned out to be a basic misuse. One which was impossible to
> guess from the information given.
> It's also a learning on my part, in fact about something that I thought
> I already knew: It's a pointless exercise to try to troubleshoot
> something without a _minimal_ and _complete_ example at hand.
I think you may have misunderstood me. Sorry if I wasn't communicating clearly.
I made a typo in my second-to-last email, swapping the numbers for "directed" and "undirected". I sent another email immediately afterwards to correct the typo. Hope you saw that. I wasn't saying I'd been making a silly mistake this whole time (as far as I can tell...). I was saying that I accidentally set my (properly) directed graph to undirected and the performance skyrocketed.
Unfortunately, that performance improvement was just the result of a mistake on my part. I didn't realize that when a graph is set to undirected, graph-tool turns each pair of reciprocal edges into two multiedges. So, when I removed the sampled edges, I was only removing one of the two multiedges (for the fraction of the randomly selected edges which had a reciprocal). I.e., it was basically pure data leakage. Once I corrected that error by only sampling edges which don't have a reciprocal, the performance difference between undirected and directed mostly disappeared. So, performance is still bad. To make sure it wasn't just my network or something to do with directedness, I tried several simple undirected graphs from graph-tool's collection (where I don't need to worry about multiedges). All performed poorly.
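To make the leakage concrete, here is a minimal sketch in plain Python rather than graph-tool. The helper names `non_reciprocal_edges` and `leaked_after_removal` and the toy edge list are made up for illustration; this is not the code I actually ran.

```python
import random


def non_reciprocal_edges(edges):
    """Return the directed edges (u, v) whose reverse (v, u) is absent."""
    edge_set = set(edges)
    return [(u, v) for (u, v) in edges if (v, u) not in edge_set]


def leaked_after_removal(edges, held_out):
    """Return held-out edges that are still visible once direction is ignored.

    Ignoring direction makes u->v and v->u two parallel edges between the
    same pair of nodes, so removing only one of them leaves the other behind.
    """
    remaining = list(edges)
    for e in held_out:
        remaining.remove(e)  # removes a single copy, as in my mistaken setup
    remaining_pairs = {frozenset(e) for e in remaining}
    return [e for e in held_out if frozenset(e) in remaining_pairs]


# Hypothetical toy graph: 0<->1 and 2<->3 are reciprocal pairs.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (0, 3)]

# Holding out a reciprocated edge leaks it through its reverse.
print(leaked_after_removal(edges, [(0, 1), (1, 2)]))

# Sampling only from non-reciprocated edges avoids the leak.
safe = non_reciprocal_edges(edges)
held_out = random.Random(0).sample(safe, 1)
print(leaked_after_removal(edges, held_out))
```

The first call reports `(0, 1)` as leaked because `(1, 0)` is still in the graph, while any held-out set drawn from `safe` leaks nothing.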
Since I haven't seen this network reconstruction perform well on any of the graphs I've tested, I've given up on this approach. `get_edges_prob` is working very well for me, so I'll just stick to that, despite its slowness.
Although, this possibility occurred to me: I wonder how the reconstruction would do if I fed back the probabilities that I get from my binary classifier using the method described in the "Incorporating Extrinsic Uncertainty Estimates" section of your paper. I could regard it as a final step in a link prediction flow. Seems plausible to me that with this extra information the reconstruction might work better. I'm assuming no one has tried this yet?
> The directed SBM does not generate reciprocal edges very well, hence it is not a good
> predictor in this case.
Where can I read more about this? I assumed the SBM handles the directed case just fine, even if there are many reciprocal edges. Does this have implications for community detection as well?
> That is in fact very similar to the conditional probability computed by
> get_edges_prob(). The latter is more accurate, yields an actual
> normalized probability, and includes information from the priors, but
> for sufficiently sparse and large graphs, both computations should coincide.
Hmm, I didn't realize this. Thanks.
Thanks for your help, as always.