Hi,
I am trying to read graph-tool's graphml output using networkx.
graphml - compatibility with graph-tool · Issue #843 · networkx/networkx · GitHub
Unfortunately this does not work with any of the vector_* type property maps
which graph-tool uses. Have you encountered this issue before?
Yes, this is expected, because the graphml specification only defines
the following types: boolean, int, long, float, double, and string.
If you want another type, you are out of luck.
It seems the right thing to do might be to extend your graphml to hold the
vector_* attributes as detailed:
GraphML Primer
Is there some reason why it was done the way it is? How do you manage
read/writing graphml data to other tools?
Extending it this way would be the strictly "correct" approach. However,
it has two downsides: Firstly, it is much more cumbersome to
implement. Essentially, the reader must be aware of this whole xml
schema extension stuff, which currently it simply ignores. Secondly, it
does not really fix the problem of interoperability, it only punts
it. Two pieces of software would still need to agree and know about the
extension for it to work. In other words, you still would not be able to
make networkx read the vector types, unless they modify their
reader. It seems to me that simply adding a nonstandard type is much
more straightforward, albeit "unclean" from the point of view of XML
validity.
Regarding reading data from other tools, there is no issue, since the
standard types are fully supported. If the user wants to feed graphml
data produced with graph-tool to other programs, then only the standard
types should be used.
In the meantime, it might be possible to hack some read support for
graph-tool's xml into networkx. To this end, could you please advise how to
parse the 'key1' data (should be two floats)
<node id="n1">
<data key="key0">6</data>
<data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>
The delimiter is a comma, and spaces should be ignored. The individual
values are encoded according to the %a format from C99. This is to
ensure exact binary representation. From the printf manpage:
a, A (C99; not in SUSv2) For a conversion, the double argument is converted to
hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp±d;
for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator
P is used. There is one hexadecimal digit before the decimal point, and
the number of digits after it is equal to the precision. The default precision
suffices for an exact representation of the value if an exact representation
in base 2 exists and otherwise is sufficiently large to distinguish
values of type double. The digit before the decimal point is unspecified for
nonnormalized numbers, and nonzero but otherwise unspecified for normalized
numbers.
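To make the format concrete: a value such as 0x1.5c71d0cb8d943p+3 is a
hexadecimal mantissa multiplied by a power of two, with the exponent written
in decimal after the "p". Below is a minimal hand-written sketch of that
decoding. The name hexfloat_to_double is made up for illustration, and the
sketch assumes finite values only (no inf or nan).

```python
import math
import re

# Hypothetical helper, for illustration only: decodes one finite %a-style
# literal such as "0x1.8p+1". Special values (inf, nan) are not handled.
_HEXFLOAT = re.compile(
    r"^\s*([+-]?)0[xX]([0-9a-fA-F]*)\.?([0-9a-fA-F]*)[pP]([+-]?\d+)\s*$"
)

def hexfloat_to_double(text):
    m = _HEXFLOAT.match(text)
    if m is None:
        raise ValueError("not a hexadecimal float: %r" % text)
    sign, whole, frac, exp = m.groups()
    if not whole and not frac:
        raise ValueError("no mantissa digits in: %r" % text)
    # Read all mantissa digits as one integer, then move the binary point
    # back by 4 bits for every hex digit that came after the '.'.
    mantissa = int(whole + frac, 16)
    value = math.ldexp(mantissa, int(exp) - 4 * len(frac))
    return -value if sign == "-" else value
```

For example, hexfloat_to_double("0x1.8p+1") gives 3.0, since 0x1.8 is 1.5
and 1.5 * 2**1 = 3.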
I'm not sure there is any python function which can read this
automatically. You can do it with ctypes:
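>>> from ctypes import cdll, c_double, byref
>>> libc = cdll.LoadLibrary("libc.so.6")  # glibc on Linux only
>>> d = c_double()
>>> # "%la" (not "%a") is needed so that sscanf stores a double, not a float
>>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%la", byref(d))
1
>>> print(round(d.value, 4))
10.8889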
But this would not be the most portable approach... Otherwise you can
write a simple parser based on the format description above.
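For what it's worth, Python's built-in float.fromhex() accepts this
hexadecimal notation, so a portable reader can be quite short. The following
is only a sketch: parse_vector_double is a made-up name, the node is the one
from the question with no XML namespace, and a real graphml file would
normally need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

def parse_vector_double(text):
    """Decode a comma-separated list of C99 %a hex floats (hypothetical helper)."""
    text = text.strip()
    if not text:
        return []
    # The delimiter is a comma; surrounding spaces are ignored.
    return [float.fromhex(item.strip()) for item in text.split(",")]

# The node from the question, without any XML namespace (an assumption).
sample = """<node id="n1">
<data key="key0">6</data>
<data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>"""

node = ET.fromstring(sample)
values = {d.get("key"): d.text for d in node.findall("data")}
pos = parse_vector_double(values["key1"])
print(pos)  # two floats, roughly 10.89 and 11.53
```

The reverse direction is float.hex(), which produces the same kind of
notation, so values round-trip exactly.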
Please keep me informed on any progress on this. Interoperability with
other programs is important, so if there is anything I can do to help,
I'd be glad to do it. If the networkx people would like to consider a
common approach, I'm open for discussion.
Cheers,
Tiago