Problems in plotting subgraphs using original graph colors

Hello everybody,

I am trying the following steps:

  1. Import an initial graph using load_graph_from_csv
  2. Plot it assigning a color to every node given a vertex property
  3. Reduce the original graph to its biggest component
  4. Plot the biggest component sticking to the same colors for nodes as point 2
  5. Experiment with a seed propagation algorithm to see how these colors change

I am working in parallel with neo4j to run all the algorithms and the reduction to connected components. Starting from an edges.csv that contains two columns, “sourceNodeId” and “targetNodeId”, I first read it in. Then I use a nodes.csv with “nodeId” and “property” to create a vprop, which I later pass to vertex_fill_color. Here is the code:

g = graph_tool.load_graph_from_csv('edges.csv', directed=True, hashed=True, skip_first=True)

nodes_prop = nodes.sort_values(by = "nodeId")
vprop = g.new_vertex_property("int", vals=nodes_prop["property"])

graph_draw(g, vertex_fill_color = vprop)

I then proceed to do the same thing with a connected component, i.e. an edgesCC.csv with fewer edges but the same node ids, and a nodesCC.csv with the same property but fewer nodes.

gCC = graph_tool.load_graph_from_csv('edgesCC.csv', directed=True, hashed=True, skip_first=True)

nodes_propCC = nodesCC.sort_values(by = "nodeId")
vpropCC = gCC.new_vertex_property("int", vals=nodes_propCC["property"])

graph_draw(gCC, vertex_fill_color = vpropCC)

And at this point I realised that the colors are probably wrongly assigned. Even though we cannot see correspondences from image 1 to 2 in node positions, the colors (I think at least) should be the same. But in the second image, that should be a subgraph of the first image, I cannot find color correspondences. I noticed it particularly in the yellowish cluster that in the first image is basically just a big one on the left with some green, and in the second is on the right, but now there is no green but blue and red instead.

Any hints?

Christian

When you pass integer-valued property maps to the vertex_color argument of graph_draw, it uses internal logic to distribute the colors evenly among the values, using the default color map. So the colors are not expected to be the same if the number of nodes or the distribution of integer values changes.

To obtain consistent colors you can either:

  1. Pass your own color map via the vcmap parameter.
  2. Pass explicit colors e.g. in RGB format using string valued (HTML format) or vector-valued property maps
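
As a minimal sketch of option 2, assuming the property takes a small set of integer values: a fixed lookup table from property value to RGBA color gives every value the same color in any graph, however many vertices it has. The palette, the example values and the helper name `colors_for` are illustrative and not part of graph-tool.

```python
# Hypothetical palette: one fixed RGBA color (components in [0, 1]) per property value.
PALETTE = {
    1: (0.0, 0.0, 0.0, 1.0),  # black
    2: (0.5, 0.5, 0.5, 1.0),  # grey
}


def colors_for(values, palette=PALETTE):
    """Return one RGBA list per property value, in the same order as `values`."""
    return [list(palette[v]) for v in values]


# Illustrative property values for a full graph and for a smaller subgraph.
full_values = [1, 2, 1, 2, 2, 1, 2]
sub_values = [1, 2, 2, 1, 2]

full_colors = colors_for(full_values)
sub_colors = colors_for(sub_values)

# The color depends only on the property value, not on the graph it came from.
print(full_colors[0] == sub_colors[0])
```

A list like this could then serve as the values of a vector-valued vertex property, provided it is ordered by vertex index.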

But how do I connect them to a node property? Meaning how do I connect the property value to colors without creating a vprop first?

As I said, either you do create a property map first, or you use a color map: Colormap reference — Matplotlib 3.9.1 documentation

Thanks Tiago, I managed to assign the colors I want by creating a vector property map. Nevertheless, I run into problems when plotting the subgraph. At this point I think I am doing something wrong when creating the property and assigning the values. I see two sources of the problem:

  1. When creating the big graph with load_graph_from_csv (NOW, NOT IN THE CODE ABOVE) I am using hashed=False because the vertices in edges.csv correspond to indices. But when I create the graph from the connected-component edge list edgesCC.csv, which contains just a subset of edges.csv, again with hashed=False, the drawing does not represent a connected component, and it additionally reports the same number of vertices as the starting graph.
  2. When I create the property with g.new_vertex_property("vector", vals=nodes_prop["property"]), I first sort nodes_prop by nodeId in the previous line, since I imagine that vals expects this: it just wants a list of values, not a dictionary nodeId:property. I am not sure whether this is correct. I can see from the two plots that the two points above cause problems in the representation.

Could you help me identify the problem?

Christian

Unfortunately, it’s not possible to provide a solution to your problem or an answer to your question with the information provided.

To enable us to understand the situation, you need to provide all the items below:

  1. Your exact graph-tool version.
  2. Your operating system.
  3. A minimal working example (MWE) that shows the problem. The MWE needs to be
    a. Self-contained. I.e. it cannot depend on any other context or resource. If it depends on files, they need to be provided.
    b. Minimal. I.e. it needs to include only what is necessary to reproduce the problem.

Item 3 above is very important! If you provide us only with a vague description, or only with the part of the code that you believe causes the problem, then it is not possible to understand the context that may have contributed to it.

You need to provide us with a complete example that runs and reproduces the problem you are observing.

Working on wsl using graph-tool version 2.74

edges.csv

sourceNodeId,targetNodeId
3,2
4,18
4,19
4,20
4,21
import graph_tool
import pandas as pd
from graph_tool.draw import graph_draw

g = graph_tool.load_graph_from_csv('edges.csv', directed=True, hashed=False, skip_first=True)
nodes_data = {
    "nodeId": [2, 3, 4, 18, 19, 20, 21],
    "property": [1, 2, 1, 2, 2, 1, 2]
}
nodes = pd.DataFrame(nodes_data)
prop_to_color = {
    1: "[0,0,0,1]",
    2: "[0.5,0.5,0.5,0.5]"
}
nodes = nodes.replace({"property": prop_to_color})
nodes_properties = nodes.sort_values(by = "nodeId")
vprop = g.new_vertex_property("vector<double>", vals=nodes_properties["property"].apply(eval))
graph_draw(g, vertex_fill_color = vprop)

Expected behaviour: 7 nodes, 2 connected components, 4 grey and 3 black.
result:

I need to keep the node ids, because for the connected component I then have
edgesCC.csv

sourceNodeId,targetNodeId
4,18
4,19
4,20
4,21
gCC = graph_tool.load_graph_from_csv('edgesCC.csv', directed=True, hashed=False, skip_first=True)
nodes_dataCC = {
    "nodeId": [4, 18, 19, 20, 21],
    "property": [1, 2, 2, 1, 2]
}
nodesCC = pd.DataFrame(nodes_dataCC)
nodesCC = nodesCC.replace({"property": prop_to_color})
nodes_propertiesCC = nodesCC.sort_values(by = "nodeId")
vpropCC = gCC.new_vertex_property("vector<double>", vals=nodes_propertiesCC["property"].apply(eval))
graph_draw(gCC, vertex_fill_color = vpropCC)

Expected behaviour: 5 nodes, 1 connected component, 3 grey and 2 black. It is a subgraph of the first one.
result:

Could you please explain more clearly what you wanted to obtain in each case, and what is surprising you?

Basically, I have a scenario in neo4j where nodes already have preassigned nodeIds. This means that in the first example I would like to plot only the nodes [2, 3, 4, 18, 19, 20, 21], as the other nodes created by graph-tool simply do not exist. Likewise, when creating the connected component, I would like to plot just [4, 18, 19, 20, 21]. Additionally, in both examples I want to plot each node with its given property. Instead, since graph-tool creates as many nodes as the largest nodeId, the values are not assigned correctly when I create the vertex property.
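
The mismatch described here can be sketched in plain Python, without any graph-tool call. The sketch assumes that with hashed=False each id in the CSV is used directly as a vertex index, so the graph holds every index from 0 up to the largest id. Variable names are illustrative.

```python
# Edge list from edges.csv in this thread.
edges = [(3, 2), (4, 18), (4, 19), (4, 20), (4, 21)]

# Ids that actually occur in the data.
real_ids = sorted({v for edge in edges for v in edge})

# Without hashing, an id is used directly as a vertex index,
# so indices run from 0 up to the largest id.
num_vertices = max(real_ids) + 1

# Indices that exist in the graph but match no real node.
extra = [i for i in range(num_vertices) if i not in real_ids]

print(num_vertices, len(real_ids), len(extra))  # 22 7 15
```

A list of 7 values sorted by nodeId therefore cannot line up, position by position, with 22 vertex indices.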

I hope that explains it.

You can achieve what you want by passing the option hashed=True to load_graph_from_csv(). This is in fact the default, so I don’t know why you set this parameter to False, since this just does the opposite of what you want.

The node ids will be accessible via the “name” property map:

g = graph_tool.load_graph_from_csv('edges.csv', directed=True, hashed=True, hash_type="int", skip_first=True)
print(g.vp.name.a)
[ 3  2  4 18 19 20 21]
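
With hashed=True the vertex order follows the edge list rather than the sorted ids, as the output above shows (3 comes before 2). A rough sketch of reordering the node table to match, in plain Python: the `names` list stands in for g.vp.name.a, and the rule that ids are numbered in order of first appearance is an assumption inferred from that output, not taken from the documentation.

```python
# Edge list from edges.csv in this thread.
edges = [(3, 2), (4, 18), (4, 19), (4, 20), (4, 21)]

# Stand-in for g.vp.name.a: ids in order of first appearance (assumption).
names = []
for source, target in edges:
    for node_id in (source, target):
        if node_id not in names:
            names.append(node_id)

# The node table as a lookup nodeId -> property, so its row order no longer matters.
prop_by_id = dict(zip([2, 3, 4, 18, 19, 20, 21], [1, 2, 1, 2, 2, 1, 2]))

# One value per vertex, in vertex-index order.
vals = [prop_by_id[node_id] for node_id in names]

print(names)  # [3, 2, 4, 18, 19, 20, 21]
print(vals)   # [2, 1, 1, 2, 2, 1, 2]
```

In the real graph, `names` would be read from the "name" property map and `vals` passed as the vals argument when creating the vertex property, instead of a list sorted by nodeId.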

If you want to extract the largest component of this graph, you can just use a graph view:

u = extract_largest_component(g, directed=False)
print(u.vp.name.fa)
PropertyArray([ 4, 18, 19, 20, 21], dtype=int32)

Does this solve your problem?

Thanks Tiago, through the hint of the vertex property name I managed to fix the situation.
