Accessing a compressed matrix is super slow!

Hang_Mang · March 21, 2014, 11:15am

Hi, I found that accessing a compressed matrix is really slow. I'm
computing a similarity index called LHN1. It took 34 seconds to compute
when I accessed the sparse 'paths' matrix directly, but only 11 seconds
after I converted it with paths.toarray(). So now I end up
calling toarray() all over my code. I'm not sure why compressed
matrices are used in the first place, or how I could overcome this. Forget about get_degrees_dic()
for now.

Here's my code:

def lhn1(graph):
    A = gts.adjacency(graph)
    paths = A**2
    paths = paths.toarray()
    S = np.zeros(A.shape)
    degrees = get_degrees_dic(graph)
    for i in xrange(S.shape[0]):
        for j in xrange(S.shape[0]):
            i_degree = degrees[i] #graph.vertex(i).out_degree()
            j_degree = degrees[j] #graph.vertex(j).out_degree()
            factor = i_degree * j_degree
            S[i,j] = (1.0/factor) * paths[i,j]
    return S


tiago · March 21, 2014, 11:32am

This doesn't really have anything to do with graph-tool.

Compressed matrices are always slower than dense ones, but they take
less space. Please take a look at the scipy documentation for sparse
matrices:

http://docs.scipy.org/doc/scipy/reference/sparse.html

Read about the different types of sparse matrices (csr, lil, coo, etc.)
and their advantages and disadvantages.
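
As a rough illustration of how the formats differ, here is a minimal sketch on a made-up 4-vertex path graph (the matrix and all variable names are assumptions for the example, not something graph-tool returns). It builds a CSR matrix, squares it while it is still sparse, and then walks only the stored entries through the COO view instead of indexing paths[i, j] for every pair:

```python
import numpy as np
import scipy.sparse as sp

# Made-up adjacency matrix of the undirected path graph 0-1-2-3.
dense = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

csr = sp.csr_matrix(dense)  # compressed rows: good for products and row slicing
lil = csr.tolil()           # list of lists: good for changing single entries
lil[0, 3] = 1.0             # cheap in LIL, comparatively costly in CSR

# Matrix products are where CSR does well; the result stays sparse.
paths = csr.dot(csr)        # entry (i, j) counts the length-2 paths from i to j

# Visit only the stored entries via the COO (row, col, value) triplets,
# rather than indexing paths[i, j] for all n*n pairs.
paths_coo = paths.tocoo()
nonzero = {}
for i, j, value in zip(paths_coo.row, paths_coo.col, paths_coo.data):
    nonzero[(int(i), int(j))] = float(value)
```

Per-element indexing such as paths[i, j] has to search the compressed structure on every call, which is the likely source of the slowdown in a double loop; iterating over the stored triplets avoids that.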

Since graphs are most often sparse, it makes more sense to return sparse
arrays. If the user wants dense ones, it is trivial to convert.
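
For the LHN1 computation in the question, the double loop can probably be avoided altogether. The sketch below is one possible way to do it, not graph-tool's own method: it assumes an undirected graph, so that the row sums of the adjacency matrix equal the vertex degrees, and it assumes that vertices of degree zero should simply get a score of 0. The function name lhn1_sparse and the test matrix are made up for the example. Dividing entry (i, j) by k_i * k_j is the same as multiplying by the inverse degree matrix on both sides, so everything stays sparse until the final, optional toarray():

```python
import numpy as np
import scipy.sparse as sp

def lhn1_sparse(A):
    """Return S with S[i, j] = paths[i, j] / (k_i * k_j) as a sparse matrix."""
    A = sp.csr_matrix(A, dtype=float)
    paths = A.dot(A)  # number of length-2 paths between each pair
    # Assumption: undirected graph, so row sums are the vertex degrees.
    degrees = np.asarray(A.sum(axis=1)).ravel()
    # Assumption: isolated vertices get 0 instead of a division by zero.
    inv = np.zeros_like(degrees)
    mask = degrees > 0
    inv[mask] = 1.0 / degrees[mask]
    D_inv = sp.diags(inv, 0)
    # Row scaling by 1/k_i, then column scaling by 1/k_j.
    return D_inv.dot(paths).dot(D_inv)

# Made-up example: the undirected path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = lhn1_sparse(A).toarray()  # convert to dense only once, at the end
```

With this approach toarray() is called a single time on the result, if a dense matrix is needed at all.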

Best,
Tiago