Hello Tiago,

Thanks again for the quick answers and feedback.

In fact, the more I think about how to take the most advantage of the

graph, the more I think that in my case what I would really like is to

control the way the paths are being sampled. So, the process would become a

biased sampling where I’m the one giving the bias in the form of the

probability of choosing each edge when traversing from the origin to the

destination. This probability would depend on information of the node and

additional information that will depend on each iteration of my algorithm.

It could also be an edge property, I guess.

One caveat is that DFS will not be possible since I want to make each

sampling independent from the previous one and so getting several paths

would be less efficient. Although I’d assume that this is not too much of a

problem given that I’d need a smaller sample if I know it’s probably going

to be a better one.

I was thinking on doing something on the lines of the following (I’ve only

tested it with a small graph):

def sample_path_from_nodes(node1, node2, all_weights, cutoff):

# all_weights is a dictionary on the graph edges

current = node1

path = []

visited = set()

while True:

if current == node2:

# we finished!

return path + [current]

visited.add(current) # maybe we should add (current, len(path))

neighbors = set(current.out_neighbors()) - visited

if len(path) >= cutoff or not len(neighbors):

# this path does not reach node2

# we backtrack

# and we will never visit this node again

current = path.pop()

continue

if not len(path) and not len(neighbors):

# there is no path to node2

# this should not be possible

return None

# we haven't reached node2

# but there is still hope

path.append(current)

weights = [all_weights.get((current, n), 1) for n in neighbors]

_sum = sum(weights)

weights = [w/_sum for w in weights]

current = np.random.choice(a=list(neighbors), p=weights)

I fear this will not be as efficient as just getting a lot of paths via

all_paths. Do you think this makes sense for sampling with preferences on

edges? And if so, do you think it will perform a lot better if I did it as

a graph-tool C++ extension? I guess I can always test the python version

and see how it goes…

thanks!

Franco

attachment.html (17.1 KB)