numbers format

Hi guys, when saving a graph with .save(), I noticed that floats are
represented in hexadecimal exponent format. I was wondering what the
reason for that is.

The motivation for my question is that I built a graph with 5M vertices and
50M edges, so storing it as an XML file produces a 5GB file. If I can
find a way to reduce the size, I will be very happy.

Best,
F.

That doesn't seem like an unreasonably large file for the number of edges
you have (I'm assuming they also carry float or double properties). That's
only about 100 bytes per edge. It should compress nicely for storage.
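
For reference, the per-edge figure is just the reported file size divided by the edge count. A rough sketch, assuming 1 GB = 10^9 bytes and ignoring the space taken by the 5M vertices and the XML header:

```python
# Figures reported in the original post; GB taken as 10**9 bytes (assumption).
file_size_bytes = 5 * 10**9
num_edges = 50 * 10**6

# Average bytes of XML per edge, ignoring vertices and header overhead.
bytes_per_edge = file_size_bytes / num_edges
print(bytes_per_edge)
```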

The values you are storing are literally binary, and a hexadecimal
representation is the most compact text form AFAIK. I am not certain, but
maybe you can round your floating point values to sums of a few (negative)
powers of two to make them shorter. For example, .3 is not a simple number
in hex:
http://www.binaryconvert.com/result_double.html?decimal=046051 but .75 is,
because it is a sum of negative powers of two (1/2 + 1/4).

But, I'd bet the XML saves all the zeros for round numbers anyway. You'd
have to check.
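
A quick way to see the difference, sketched with Python's built-in float.hex() rather than graph-tool's own writer (whose exact output format is not shown in this thread):

```python
# 0.75 = 1/2 + 1/4 has a short mantissa; 0.3 has a repeating one.
simple = (0.75).hex()
messy = (0.3).hex()

print(simple)  # 0x1.8000000000000p-1
print(messy)   # 0x1.3333333333333p-2
```

Note that Python pads the mantissa to 13 hex digits, so both strings have the same length here. Any saving from rounding depends on whether the writer trims trailing zeros, which is the thing to check.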

Elliot

> Hi guys, when saving a graph with .save(), I understood that floats are
> represented in hexadecimal exponentiation format. I was wondering the
> reason for that.

The hexadecimal format is there to guarantee exact representation, so that
you load exactly the same number you saved, without any loss. A decimal
format does not guarantee this unless it is written with enough digits
(up to 17 significant digits for a double).
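
A minimal sketch of the round-trip property, using only Python's built-in float.hex() and float.fromhex() (this is an illustration, not graph-tool's actual serialization code). The hex text restores the exact value, while a decimal string cut to six significant digits does not:

```python
# A value with no short exact decimal form.
x = 0.1 + 0.2

# Round trip through hexadecimal text: bit-for-bit identical.
hex_text = x.hex()
restored = float.fromhex(hex_text)

# Round trip through a 6-significant-digit decimal string: lossy.
short_decimal = float(f"{x:.6g}")

print(restored == x)       # True
print(short_decimal == x)  # False
```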

> The thing behind my question is that I built a graph with 5M vertices
> and 50M edges so storing it as an XML file leads to a 5GB one. So, if
> I can find a way to reduce the size, I will be very happy.

Just use gzip or bzip2.

Note that in graph-tool you can save directly in a compressed format by
doing:

    g.save("graph.xml.gz") # for gzip

or

    g.save("graph.xml.bz2") # for bzip2

The same thing works for load.

This should lead to a strong reduction of your file size.
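
To get a feel for why this works so well, here is a rough sketch using only Python's standard gzip module on synthetic data. The edge record layout below is a made-up stand-in, not the exact XML that graph-tool writes, so the ratio on a real file will differ:

```python
import gzip

# Hypothetical, repetitive XML-like edge records (not graph-tool's real schema).
lines = [
    f'<edge source="n{i}" target="n{i + 1}"><data key="w">0x1.8p-1</data></edge>'
    for i in range(10000)
]
raw = "\n".join(lines).encode("utf-8")

# Compress in memory and compare sizes.
packed = gzip.compress(raw)
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes (ratio {ratio:.3f})")
```

XML repeats the same tag and attribute names on every line, which is exactly the kind of redundancy gzip and bzip2 remove.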

Best,
Tiago

Thanks for the explanation! I was already playing with the compression.
Best,
F.
