Hi all,

I'm using graph-tool a lot, and I usually perform multiple random initializations, choosing in the end the solution with the lowest entropy. Since each initialization is a separate, independent process, I was thinking of using joblib to parallelize it. However, I noticed something weird. Here we go:
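To frame what I mean, the pattern is just joblib's Parallel/delayed over independent runs, keeping the lowest-scoring result. A minimal graph-tool-free sketch (the scoring function here is a toy placeholder, not graph-tool's entropy):

```python
from joblib import delayed, Parallel

def minimize(seed):
    # toy stand-in for one random initialization: returns a
    # deterministic "entropy" score for the given seed
    return (seed * 7) % 5

seeds = range(3)
# run the independent initializations in parallel
scores = Parallel(n_jobs=3, prefer='threads')(delayed(minimize)(s) for s in seeds)
# keep the run with the lowest score
best = min(range(len(scores)), key=lambda i: scores[i])
```

The graph-tool version below follows exactly this shape, with fast_min in place of the toy function and state.entropy() as the score.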

from joblib import delayed, Parallel
import graph_tool.all as gt
import numpy as np

# choose a graph
g = gt.collection.data['football']

# set some variables, such as the number of inits
n_init = 3
fast_tol = 1e-3
beta = 1000
n_sweep = 10

# define a function for sweep, this is essentially what is found in official docs

def fast_min(state, beta, n_sweep, fast_tol):
    dS = 1
    while np.abs(dS) > fast_tol:
        dS, _, _ = state.multiflip_mcmc_sweep(beta=beta, niter=n_sweep)
    return state

# test with standard python list comprehension, this works

pstates = [gt.PPBlockState(g) for x in range(n_init)]
pstates = [fast_min(state, beta, n_sweep, fast_tol) for state in pstates]
selected = pstates[np.argmin([x.entropy() for x in pstates])]
print(gt.modularity(g, selected.get_blocks()))

0.5986403881107808

# test with 'threading' backend in joblib

pstates = [gt.PPBlockState(g) for x in range(n_init)]
pstates = Parallel(n_jobs=3, prefer='threads')(delayed(fast_min)(state, beta, n_sweep, fast_tol) for state in pstates)
selected = pstates[np.argmin([x.entropy() for x in pstates])]
print(gt.modularity(g, selected.get_blocks()))

0.5926606505592532

# test with default backend in joblib