gt.scalar_assortativity returns wrong values

Hello Tiago,

I am using gt.scalar_assortativity and I observed that it returns a non-zero
value and a large variance even when the values on all the nodes are
exactly the same.

g = gt.collection.data['karate']
s = g.new_vertex_property('float')
for v in g.vertices():
    s[v] = 0.9999
gt.scalar_assortativity(g, deg = s)

This returns: (1.0, 8.889098493616578)

I expect to see (0, 0) here. What am I missing?

Thank you
Snehal Shekatkar


The scalar assortativity coefficient is undefined if the variance is zero,
since it appears in the denominator.

The expectation that it will be zero in this case is incorrect, since the
limit where the variance goes to zero is also undefined in general.

The proper answer in this case would be to return "NaN". I'll modify the
code in this way.
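To see why the ratio is undefined, here is a minimal numpy sketch of the underlying correlation formula (an illustration, not graph-tool's implementation):

```python
import numpy as np

# Scalar assortativity is essentially a Pearson correlation between the
# property values at the two ends of each edge. If every node carries
# the same value c, the deviations from the mean are all zero, so both
# the covariance and the variance vanish and the ratio is 0/0.
c = 0.9999
x = np.array([c, c, c, c])   # values at edge sources (toy example)
y = np.array([c, c, c, c])   # values at edge targets
cov = np.mean((x - x.mean()) * (y - y.mean()))
var = np.sqrt(np.var(x) * np.var(y))
with np.errstate(invalid="ignore"):
    print(cov / var)         # nan
```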

Best,
Tiago

Thanks for the quick reply. It is indeed true that the variance should be NaN,
but the assortativity should be zero if I understand correctly. Now, when I
use 'int' instead of 'float' as the type for the property map, I do get
0 for the assortativity. Thus I guess that the values are wrong and
it is a bug. Am I right? From your reply, it isn't clear to me whether this is
a bug.

Snehal


Thanks for the quick reply. It is indeed true that the variance should be NaN,
but the assortativity should be zero if I understand correctly.

No, that is not correct. The variance is not NaN, it is zero. Just look at
the formula for scalar assortativity: If the variance is zero, the value of
the coefficient is undefined, as I explained.

Now, when
instead of 'float', I use 'int' as the type for the property map, I do get 0
value for the assortativity. Thus I guess that the values are wrong and it
is a bug. Am I right? From your reply, it isn't clear to me if this is a bug.

It is not a bug. The reason why you get different answers is numerical
instability. Instead of a variance of exactly zero, what ends up being
computed is a very small number, due to limited floating-point precision.

I've modified the version in git to always return NaN in such cases, instead
of this unstable behavior.
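The effect can be reproduced outside graph-tool; a small numpy sketch (illustrative only) of how a naive variance computation can leave a rounding residue:

```python
import numpy as np

# The algebraically equivalent forms E[(x - E[x])^2] and E[x^2] - E[x]^2
# need not agree in floating point. With many copies of a value that is
# not exactly representable in binary, the second form can leave a tiny
# residue instead of an exact zero, and dividing by that residue gives
# an arbitrary finite number instead of NaN.
x = np.full(1000, 0.9999)
var_naive = np.mean(x * x) - np.mean(x) ** 2
print(var_naive)   # tiny value near 0, not necessarily exactly 0.0
```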

Thanks for the clarification and the quick modification.

Snehal


The gt.assortativity still seems to be wrong to me. I tried calculating the
degree-assortativity using gt.assortativity and gt.scalar_assortativity as
well as manually. My code:

import graph_tool.all as gt

# Load a graph
g = gt.collection.data['karate']

# Get the adjacency matrix
adj_mat = gt.adjacency(g).todense()

# Calculate S values
S1 = sum([v.out_degree() for v in g.vertices()])

S2 = sum([(v.out_degree())**2 for v in g.vertices()])

S3 = sum([(v.out_degree())**3 for v in g.vertices()])

Se = 0

for i in range(g.num_vertices()-1):
    for j in range(i+1, g.num_vertices()):
        Se += adj_mat[i, j] * g.vertex(i).out_degree() * g.vertex(j).out_degree()

Se *= 2

# Calculate the assortativity coefficient

print((S1*Se-S2**2)/(S1*S3-S2**2))
print(gt.assortativity(g, deg = 'out'))
print(gt.scalar_assortativity(g, deg = 'out'))

Results are:

-0.475613097685
(-0.07774502579218864, 0.024258508125118667)
(-0.4756130976846143, 0.1669286053081143)

Am I missing something?

Thank you
Snehal Shekatkar


What is your point?

The first and third values are identical, up to floating point accuracy. The
second value is different, as it should be, since it is a completely
different coefficient.

I am sorry that I didn't pay attention to the formulae. But now I have two
points.

First, the formula for gt.assortativity implies that we are talking about
discrete categories for the vertices. If this is true, how can we use it at
all for "degree", since we treat that as a continuous variable? Thus, I
don't understand what "in", "out" and "total" do in this formula.

Second, I tried implementing the formula itself assuming that the actual
degree values to be discrete types and my code gives different results than
the result given by gt.assortativity. I agree that I might be interpreting
the whole thing in a different fashion and I would be very happy to
understand it. My code:

import numpy as np
import graph_tool.all as gt

# Load a graph
g = gt.collection.data['karate']

# Unique degree values or types
deg_vals = list(set([v.out_degree() for v in g.vertices()]))
n = len(deg_vals)

e = np.zeros(n) # fraction of edges that connect similar vertices
a = np.zeros(n) # fraction of edges connected to a vertex of a given type

for v in g.vertices():
    a[deg_vals.index(v.out_degree())] += 1
    for nbr in v.out_neighbours():
        if v.out_degree() == nbr.out_degree():
            e[deg_vals.index(v.out_degree())] += 1

a /= g.num_edges()
e /= g.num_edges()

r = (sum(e)-sum(a**2))/(1-sum(a**2))

print(r)
print(gt.assortativity(g, deg = 'out'))

The output of these two is:

0.0435967302452
(-0.07774502579218864, 0.024258508125118667)

Why are these two values different?

Thank you
Snehal Shekatkar


First, the formula for gt.assortativity implies that we are talking about
discrete categories for the vertices. If this is true, how can we use it at
all for "degree", since we treat that as a continuous variable? Thus, I
don't understand what "in", "out" and "total" do in this formula.

Degrees are discrete, not continuous.

Second, I tried implementing the formula itself assuming that the actual
degree values to be discrete types and my code gives different results than
the result given by gt.assortativity. I agree that I might be interpreting
the whole thing in a different fashion and I would be very happy to
understand it. My code:

import numpy as np
import graph_tool.all as gt

# Load a graph
g = gt.collection.data['karate']

# Unique degree values or types
deg_vals = list(set([v.out_degree() for v in g.vertices()]))
n = len(deg_vals)

Why are you doing this? The moment you discard repetitions, all the
fractions you compute will be wrong.

Why are these two values different?

Because they come from different algorithms.

Best,
Tiago

I am very sorry to bug you so long, but I am bewildered now.

Of course, the degree is a discrete variable. I said we treat it as a
continuous variable because we don't categorize degree values the way we
do for gender. For example, we don't treat degree values 25 and 26 as two
different categories. (Formula 7.82 in Newman's book.)

I am discarding repetitions because I wanted to treat unique degree values
as discrete types. For example, to study mixing by gender, I would first
have to find out the unique gender values. What is wrong with this?

Thank you


Of course, the degree is a discrete variable. I said we treat it as a
continuous variable because we don't categorize degree values the way we do
for gender. For example, we don't treat degree values 25 and 26 as two
different categories. (Formula 7.82 in Newman's book.)

You are confusing "continuous" with "scalar". Degrees are discrete _scalar_
values, that are better characterized via the _scalar_ assortativity
coefficient, rather than the plain assortativity coefficient (which is not
invalid, only less useful).
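The distinction can be made concrete with a toy example. The following sketch (plain numpy, not graph-tool's code; the graph and degree values are made up) computes both coefficients for the same small graph:

```python
import numpy as np

# Toy undirected graph (each edge listed once); the value at each node
# is its degree. This is an illustration, not graph-tool's code.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
deg = {0: 2, 1: 2, 2: 3, 3: 1}

# Endpoint values, counting each edge in both directions
x = np.array([deg[u] for u, v in edges] + [deg[v] for u, v in edges], float)
y = np.array([deg[v] for u, v in edges] + [deg[u] for u, v in edges], float)

# Scalar assortativity: Pearson correlation of the endpoint values
r_scalar = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# Plain assortativity: each distinct value is an opaque category
vals = sorted(set(deg.values()))
idx = {c: i for i, c in enumerate(vals)}
e = np.zeros((len(vals), len(vals)))
for u, v in edges:
    e[idx[deg[u]], idx[deg[v]]] += 1
    e[idx[deg[v]], idx[deg[u]]] += 1
e /= e.sum()
a = e.sum(axis=1)
r_plain = (np.trace(e) - a @ a) / (1 - a @ a)

print(r_scalar, r_plain)   # two different numbers for the same graph
```

The scalar coefficient sees that categories 2 and 3 are numerically close; the plain coefficient treats them as unrelated labels, which is why the two values differ.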

I am discarding repetitions because I wanted to treat unique degree values
as discrete types. For example, to study mixing by genders, I will have
first to find out the unique gender values. What is wrong with this?

Sorry, I misunderstood your code. It is actually wrong in other places. What
you wanted to compute was:

import numpy as np
import graph_tool.all as gt

g = gt.collection.data['karate']

# Unique degree values or types
deg_vals = list(set([v.out_degree() for v in g.vertices()]))
n = len(deg_vals)

e = np.zeros(n) # fraction of edges that connect similar vertices
a = np.zeros(n) # fraction of edges connected to a vertex of a given type

for v in g.vertices():
    for nbr in v.out_neighbours():
        a[deg_vals.index(v.out_degree())] += 1
        if v.out_degree() == nbr.out_degree():
            e[deg_vals.index(v.out_degree())] += 1

a /= 2 * g.num_edges()
e /= 2 * g.num_edges()
r = (sum(e)-sum(a**2))/(1-sum(a**2))

print(r)
print(gt.assortativity(g, deg = 'out'))

Which yields:

-0.0777450257922
(-0.07774502579218864, 0.024258508125118667)

(Note that it is not up to me to show how your calculation is wrong; it is
up to you to show that it is right.)

I am very sorry for this silly mistake. The last question though: when I
have discrete categories, and I am using gt.assortativity, what role does
the parameter "deg" play? The comment in the source code says: "this will
calculate the assortativity coefficient, based on the property pointed by
'deg' ". What does this mean?

Thank you
Snehal


"deg" is the vertex property map that determines the discrete categories.

Yes, that's true. The documentation says: "degree type (“in”, “out” or
“total”) or vertex property map, which specifies the vertex types". This
means that the "deg" parameter can also be the "in", "out" or "total" degree.
So if I understand correctly, one can treat, say, out-degrees as discrete
categories, but that just won't be very useful (from your previous email).
Is that right?

Thank you
Snehal


Yes, that is right, otherwise I would not have written it in the documentation.

Whether or not it is useful depends on the circumstances. In general, for
scalar properties like degrees, scalar assortativity is more meaningful.

Thank you very much for all the clarification and patience. I really
appreciate it.

Thank you
Snehal


Dear Tiago,

I upgraded to version 2.24 (commit 9642dd05, Sat Oct 7 23:40:03 2017
+0100) and tried calculating gt.assortativity and gt.scalar_assortativity
with a property map whose values are the same for all nodes.

In [1]: import graph_tool.all as gt

In [2]: gt.__version__
Out[2]: '2.24 (commit 9642dd05, Sat Oct 7 23:40:03 2017 +0100)'

In [3]: g = gt.collection.data['karate']

In [4]: s = g.new_vertex_property('float')

In [5]: for v in g.vertices():
   ...:     s[v] = 0.9999

In [6]: gt.scalar_assortativity(g, deg = s)
Out[6]: (1.0, 8.889098493616578)

In [7]: gt.assortativity(g, deg = s)
Out[7]: (nan, nan)

Shouldn't gt.scalar_assortativity also return (nan, nan) here?

Thank you
Snehal


It should, and this is what I get.

What Ubuntu release are you using?

Best,
Tiago

Hi Tiago,

I am using 16.04.

Thank you
Snehal


In order to determine whether NaN should be returned, I use the function
boost::math::relative_difference(), which is only available in boost 1.60
and newer. In Ubuntu 16.04 the boost version is too old and does not
contain this function, so a simpler, but less stable, option is used
instead. If you want the more robust behavior, you should upgrade to a
newer Ubuntu version.
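The kind of check involved can be sketched in Python (a hypothetical illustration, not graph-tool's actual code; the function name and tolerance are made up):

```python
# Treat a computed variance as zero, and hence the coefficient as NaN,
# when it is negligible relative to the scale of the quantities it was
# computed from, rather than comparing it to 0.0 exactly.
def effectively_zero(value, scale, tol=1e-12):
    # relative comparison, in the spirit of boost::math::relative_difference
    return abs(value) <= tol * abs(scale)

var = 1e-17       # residue left over from accumulated rounding
scale = 0.9998    # magnitude of the terms the variance was computed from
print(effectively_zero(var, scale))   # True: report NaN instead of a ratio
```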