Strange outputs for Newman RMI? (graph_tool.inference.partition_centroid.reduced_mutual_information(x, y, norm=False))

JohannHM · April 30, 2021, 4:53pm

Hi team.
I'm wondering whether you could help me understand what is happening with your reduced_mutual_information() function, because I found several mismatched outputs in this implementation.

1. RMI is a value in [0, 1], so why is the output in your example negative when I compare two partitions?
import numpy as np
import graph_tool.all as gt
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
gt.reduced_mutual_information(x, y)
-0.065562...

2. In your example, you create two partitions by drawing labels at random. Isn't that precisely the case where RMI should be zero, or very close to zero?

3. When I use the exact partitions that Newman provides with his own code (wine.txt), your function gives
0.7890319931250596
but Newman's function gives
Reduced mutual information M = 1.21946279985 bits per object
Why are these results so different, and how can they be related?

4. Finally, what is the required format for the partitions passed to the function, and where is it described?
I'm confused about how the x (or y) variables should be arranged. Is each row index the node label? If so, how do I encode nodes that belong to several communities?

Thanks in advance for your answers, and congratulations on creating this tool!
JM

tiago · May 2, 2021, 9:10am


> 1. RMI is a value in [0, 1], so why is the output in your example negative when I compare two partitions?
> x = np.random.randint(0, 10, 1000)
> y = np.random.randint(0, 10, 1000)
> gt.reduced_mutual_information(x, y)
> -0.065562...

RMI is _not_ between [0,1]. It can take negative values!

The *normalized* value of RMI is at most one, but it can still be
negative.
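To see concretely how RMI can go negative, here is a minimal Python sketch (not graph-tool's implementation) of the definition as we understand it from Newman et al.: RMI = (1/n)[ln(n! Π_rs m_rs! / (Π_r a_r! Π_s b_s!)) - ln Ω(a, b)], where m_rs is the contingency table, a_r and b_s are the group sizes, and Ω(a, b) is the number of contingency tables with those row and column sums. The helper names (compositions, count_tables, exact_rmi, normalized_rmi) are hypothetical. Ω is counted exactly by brute force, which only works for tiny inputs; real implementations approximate ln Ω, so this sketch will not reproduce their numbers exactly. The normalization 2·RMI(x, y)/(RMI(x, x) + RMI(y, y)) is an assumption about what norm=True computes.

```python
from collections import Counter
from functools import lru_cache
from math import lgamma, log


def compositions(total, caps):
    """Yield all ways to split `total` into len(caps) parts with 0 <= part_i <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def count_tables(rows, cols):
    """Exact number of nonnegative integer tables with the given row and column sums."""
    @lru_cache(maxsize=None)
    def count(i, remaining):
        if i == len(rows):
            return 1 if all(c == 0 for c in remaining) else 0
        # Fill row i in every admissible way, then recurse on the leftover column sums.
        return sum(count(i + 1, tuple(c - r for c, r in zip(remaining, row)))
                   for row in compositions(rows[i], remaining))
    return count(0, tuple(cols))


def log_fact(k):
    return lgamma(k + 1)  # ln(k!)


def exact_rmi(x, y):
    """Reduced mutual information in nats, with ln(Omega) counted exactly (tiny n only)."""
    n = len(x)
    a = Counter(x)            # group sizes in x (row sums)
    b = Counter(y)            # group sizes in y (column sums)
    m = Counter(zip(x, y))    # nonzero contingency-table entries m_rs
    mi = (log_fact(n) + sum(map(log_fact, m.values()))
          - sum(map(log_fact, a.values())) - sum(map(log_fact, b.values())))
    omega = count_tables(tuple(a.values()), tuple(b.values()))
    return (mi - log(omega)) / n


def normalized_rmi(x, y):
    # Assumed normalization: 2 RMI(x, y) / (RMI(x, x) + RMI(y, y)).
    return 2 * exact_rmi(x, y) / (exact_rmi(x, x) + exact_rmi(y, y))


x = [0, 0, 1, 1]
print(exact_rmi(x, [0, 0, 1, 1]))       # identical partitions: about +0.173
print(exact_rmi(x, [0, 1, 0, 1]))       # unrelated partitions: about -0.173
print(normalized_rmi(x, [0, 1, 0, 1]))  # about -1.0
```

The unrelated pair gets a negative RMI, and its normalized value is negative too, even though the normalized value never exceeds one in this example.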

> 2. In your example, you create two partitions by drawing labels at random. Isn't that precisely the case where RMI should be zero, or very close to zero?

-0.065562 is close to zero.

> 3. When I use the exact partitions that Newman provides with his own code (wine.txt), your function gives
> 0.7890319931250596
> but Newman's function gives
> Reduced mutual information M = 1.21946279985 bits per object
> Why are these results so different, and how can they be related?

Newman's code returns the value in bits (base 2), whereas graph-tool by
convention returns it in nats (base e). Just divide the value obtained
via graph-tool by log(2) and the results should match.

By the way, for the "wine.txt" example I get:

reduced_mutual_information(x, y) -> 0.8452672015130195
reduced_mutual_information(x, y) / log(2) -> 1.2194627998489254

I can only reproduce your value with norm=True:

reduced_mutual_information(x, y, norm=True) -> 0.7890319931250596

which is not the quantity returned by Newman's code, so please make sure you are comparing the same thing.
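The mismatch in question 3 can be replayed with plain arithmetic on the numbers quoted in this thread: only the norm=False value, converted from nats to bits, reproduces Newman's output. This sketch does not call graph-tool or Newman's program; it only reuses the reported values, and the variable names are illustrative.

```python
from math import log

# Numbers quoted in this thread for the wine.txt partitions.
newman_bits = 1.21946279985           # Newman's program, bits per object
gt_plain_nats = 0.8452672015130195    # graph-tool, norm=False, nats
gt_norm = 0.7890319931250596          # graph-tool, norm=True, a unit-free ratio

plain_bits = gt_plain_nats / log(2)   # nats -> bits
norm_over_log2 = gt_norm / log(2)     # a normalized ratio has no unit to convert

print(plain_bits)       # agrees with newman_bits to about 1e-12
print(norm_over_log2)   # about 1.138, not Newman's value
```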

> 4. Finally, what is the required format for the partitions passed to the function, and where is it described?
> I'm confused about how the x (or y) variables should be arranged. Is each row index the node label? If so, how do I encode nodes that belong to several communities?

I honestly do not understand the source of the confusion. A label
partition is a 1D array containing the group label of each node,
indexed by the node index.
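A small illustration of that format, using a hypothetical helper groups_from_labels (not part of graph-tool): position i of the array holds the group label of node i, so each node carries exactly one label. A NumPy array such as the np.random.randint output in the question is read the same way.

```python
def groups_from_labels(labels):
    """Invert a label array: return {group label: [node indices]}."""
    groups = {}
    for node, label in enumerate(labels):   # the node index is the position in the array
        groups.setdefault(label, []).append(node)
    return groups


# Node i belongs to group x[i]: node 0 is in group 0, node 1 in group 1, and so on.
x = [0, 1, 0, 1, 1, 1]
print(groups_from_labels(x))  # {0: [0, 2], 1: [1, 3, 4, 5]}
```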

Best,
Tiago

JohannHM · May 3, 2021, 10:28am

Dear Tiago, thanks for your helpful answers.
My final question is how to handle a node that belongs to more than one community. For example, if I have these communities:
cover = [ [0,1,2,3], [3,1,5,4] ]
what should the label partition be?
[0, ?, 0, ?, 1, 1]
The question marks are the entries I do not know how to fill in.
Which label should I write for node 1, which is in two communities (those labeled 0 and 1)?
The same goes for node 3.

Any comments on how to pass covers to your function would be very welcome.
Thanks in advance for your answer.
Johann

tiago · May 3, 2021, 1:51pm

This isn't a partition. A partition, by definition, is non-overlapping.
RMI is not applicable to this case.
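For the cover in the question, a quick check with two hypothetical helpers (overlapping_nodes and is_partition, not graph-tool functions) shows why no label array can represent it: nodes 1 and 3 would each need two labels.

```python
from collections import Counter


def overlapping_nodes(groups):
    """Return the sorted nodes that appear in more than one group."""
    counts = Counter(node for group in groups for node in group)
    return sorted(node for node, c in counts.items() if c > 1)


def is_partition(groups, n):
    """True if every node 0..n-1 appears in exactly one group."""
    nodes = [node for group in groups for node in group]
    return sorted(nodes) == list(range(n))


cover = [[0, 1, 2, 3], [3, 1, 5, 4]]  # the cover from the question
print(overlapping_nodes(cover))        # [1, 3]
print(is_partition(cover, 6))          # False
```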

JohannHM · May 3, 2021, 3:20pm

Thanks for your comment, Tiago.
Could you please recommend one of the methods included in graph-tool that is useful for comparing:
a. a cover vs. a partition
b. a cover vs. another cover

Thanks in advance.

tiago · May 3, 2021, 3:23pm

Nothing in particular comes to mind.

Best,
Tiago