using internal properties with string type to filter graph

mvdnheuv · May 12, 2020, 12:38pm

Hi,

I am working on a very large graph of companies and wanted to make some
functions to easily filter out certain subgraphs I would need for some
calculations.

So I made a graph G and populated it with vertices, edges and some internal
property maps, because I don't want to rebuild this graph every time. The
idea is to write the complete graph out to a file once I am through all the
data cleaning, so that the final graph lives in a file.

So this gives me the graph G:
    <Graph object, directed, with 11944189 vertices and 7828750 edges at 0x7f49254e90f0>
with the following properties, from G.list_properties():
    ID (vertex) (type: string)
    company_country (edge) (type: string)
    shareholder_country (edge) (type: string)
    shareholderdirect (edge) (type: double)

Now I just want to filter on these properties, as suggested earlier in this
forum:

    g_AT = GraphView(G, efilt=G.ep.company_country.a == 'AT')

But as mentioned in the docs for internal properties: "Internal graph
property maps behave slightly differently. Instead of returning the property
map object, the value itself is returned from the dictionaries"

I guess this is why running G.ep.company_country.a gives me None, while
running G.ep['company_country'][G.edges().next()] gives me 'AT'.

So for filtering I now do:

    ep_filter = G.new_ep('bool')
    for e in G.edges():
        # one Python-level lookup and comparison per edge
        ep_filter[e] = G.ep['company_country'][e] == 'AT'

But I was wondering whether there is some way to avoid going through this
edge by edge and instead get the whole PropertyArray at once. That would be
more elegant and would avoid constantly creating new boolean properties.

PS: for the double type, G.ep.shareholderdirect.a does give me a nice
PropertyArray that I can use directly, as in G.ep.shareholderdirect.a > .5,
which gives me an easy-to-use filtering array to pass to GraphView.
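The numeric case works because the `.a` of a double-valued property map behaves like a NumPy array, so comparisons are element-wise. A minimal sketch of that pattern in plain NumPy (so it runs without graph-tool); the array is a made-up stand-in for `G.ep.shareholderdirect.a`, and the values are hypothetical:

```python
import numpy as np

# Hypothetical stand-in for G.ep.shareholderdirect.a: one double per edge,
# in edge-index order (made-up values, not from the thread).
shareholderdirect = np.array([0.10, 0.75, 0.50, 0.90])

# An element-wise comparison yields one boolean per edge, the kind of
# array described above as being passed to GraphView.
mask = shareholderdirect > .5

# Several conditions combine with & and |; the parentheses are required
# because & binds more tightly than > and < in Python.
mask_mid = (shareholderdirect > .2) & (shareholderdirect < .8)

print(mask)      # [False  True False  True]
print(mask_mid)  # [False  True  True False]
```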

Thanks in advance,
Milan

tiago · May 12, 2020, 1:10pm

This has nothing to do with property maps being internal or not. It is
not possible to obtain an array of a string-type property map (internal
or not) because its storage is not contiguous in memory. This is also true
for vector and Python object types.

I think the best alternative for you is to convert the strings to numeric
codes, with the mapping stored in a dictionary, so that you can do:

    g.ep.company_country_code.a == code["AT"]
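A minimal sketch of this string-to-code approach, in plain NumPy so it runs without graph-tool. The country list is made up and assumed to be ordered by edge index; `np.unique(..., return_inverse=True)` is just one convenient way to build the codes, not something prescribed above. The graph-tool assignment is only shown as a comment:

```python
import numpy as np

# Hypothetical country strings for five edges, assumed to be listed in
# edge-index order (in graph-tool that position is G.edge_index[e]).
countries = ["AT", "DE", "AT", "NL", "DE"]

# np.unique returns the sorted distinct strings and, for every input
# element, the index of its string in that sorted list: those indices
# serve as the numeric codes.
labels, codes = np.unique(countries, return_inverse=True)

# Dictionary from country string to integer code (str() turns the NumPy
# string scalars back into plain Python strings).
code = {str(label): i for i, label in enumerate(labels)}

# In graph-tool the codes would be stored once in an int-valued edge
# property, e.g.
#   G.ep.company_country_code = G.new_ep("int")
#   G.ep.company_country_code.a = codes
# after which every filter is a single vectorised comparison:
mask_AT = codes == code["AT"]

print(code)     # {'AT': 0, 'DE': 1, 'NL': 2}
print(mask_AT)  # [ True False  True False False]
```

Building the dictionary once and keeping it alongside the saved graph means the string lookup happens only at encoding time, while each later filter is a comparison against a contiguous integer array.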

Best,
Tiago