efficient random sampling of paths between two nodes

Franco_Peschiera · March 21, 2020, 4:44pm

Good day to all,

I wrote an initial message (through the issues site) asking about the
possibility of using the all_paths to random sample paths between two nodes
in a graph.

It was pointed out that the: 1. The algorithm is deterministic and that 2.
It’s not possible to adapt it to iterate in a unbiased random way.

Now I have a couple more questions, namely searching for advice on how to
achieve what I want.

attachment.html (8.97 KB)

tiago · March 23, 2020, 2:58pm

Since I could potentially do many iterations (and samplings), I would
like to have an unbiased sampling method for the Y paths without having
to enumerate them all.

One option is to do the sampling during the construction of the paths. I
have in mind to just use the |get_out_neighbors| method on the start
node to do my own depth-first search, randomly choosing a neighbor at
each moment until I get to the |cutoff| or the end node. This, I could
do for each path until I get to my limit of paths to sample. I’m not
sure, though, if this is 1) similar in logic to the |all_paths| method
and 2) if this is close in efficiency (due to using python to iterate
and sample instead of C++).

This would be heavily biased. It could be that there are many more paths
that go through one of the neighbors, so if you want sample uniformly
from all paths, you have to do a weighted sample of the neighbors at
each step. And these weights would change at each step.

Another option, although a lot more far-fetched, is to create my own
>all_paths| method in C++ and re-compile my own version of graph-tool.
Is this realistic?

The problem here is not which language should be used.

Are there better options to doing this?

Counting all paths between two nodes is conjectured to be NP-Hard. This
is also known as the self-avoiding-walk (SAW) problem. Sampling SAWs is
not straightforward either, I believe most efficient approaches are
based on MCMC. I recommend you to study the literature a little bit
before attempting a solution.

(It would be a lot simpler if you would be seeking instead to sample
*shortest* paths between two nodes. This can be done in linear time, is
even implemented in graph-tool already in random_shortest_path().)

Best,
Tiago