# How to improve similarity for the Ising and SIS models

Hi, everyone!
I ran into a problem while learning how to use graph-tool. I read the paper
"Network reconstruction and community detection from dynamics", and I am
trying to reproduce its results. When I followed the same settings for
real networks with synthetic dynamics, the similarities were only about
0.2. I have a question about how to control the number of infection events
per node, a, for the first model and the number of microstates, M, for the
second model. The whole process is shown below.

import graph_tool.all as gt
from matplotlib import cm
from numpy import mean, std  # used when printing the inferred beta below

# airport network with SIS dynamics
g = gt.collection.konect_data["openflights"]
gt.remove_parallel_edges(g)
g = gt.extract_largest_component(g, prune=False)

#simulation of an empirical dynamic model

# The algorithm accepts multiple independent time-series for the
# reconstruction. We will generate 100 SIS cascades starting from a
# random node each time, and uniform infection probability beta=0.2.

ss = []
for i in range(100):
    si_state = gt.SISState(g, beta=.2)
    s = [si_state.get_state().copy()]
    for j in range(10):
        si_state.iterate_sync()
        s.append(si_state.get_state().copy())
    # Each time series should be represented as a single vector-valued
    # vertex property map with the states for each node at each time.
    s = gt.group_vector_property(s)
    ss.append(s)

# Prepare the initial state of the reconstruction as an empty graph
u = g.copy()
u.clear_edges()
# time series properties need to be 'owned' by graph u
ss = [u.own_property(s) for s in ss]

# Create reconstruction state
rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6,
                                global_beta=.2,
                                state_args=dict(B=20), nested=False,
                                aE=g.num_edges())

# Now we collect the marginals for exactly 10,000 sweeps, at
# intervals of 10 sweeps:

gm = None
bm = None
betas = []

def collect_marginals(s):
    global gm, bm
    gm = s.collect_marginal(gm)
    # perfect_prop_hash returns a list of property maps; take the first one
    b = gt.perfect_prop_hash([s.bstate.b])[0]
    bm = s.bstate.collect_vertex_marginals(bm, b=b)
    betas.append(s.params["global_beta"])

gt.mcmc_equilibrate(rstate, force_niter=1000,
                    mcmc_args=dict(niter=10, xstep=0),
                    callback=collect_marginals)

print("Posterior similarity: ",
      gt.similarity(g, gm, g.new_ep("double", 1), gm.ep.eprob))
print("Inferred infection probability: %g ± %g" % (mean(betas), std(betas)))

> Hi, everyone!
> I ran into a problem while learning how to use graph-tool. I read the paper
> "Network reconstruction and community detection from dynamics", and I am
> trying to reproduce its results. When I followed the same settings for
> real networks with synthetic dynamics, the similarities were only about
> 0.2. I have a question about how to control the number of infection events
> per node, a, for the first model and the number of microstates, M, for the
> second model. The whole process is shown below.

You just copied the example in the documentation and changed the
network. That's a good start, but I recommend trying to understand what
each part does.

In the SIS example, as the comments clearly state, the generated data
correspond to 100 cascades of length 10.

In the Ising model example you sent, you sample M=1000 microstates.

> Moreover, I also wonder how to do a nested version for the same network.

Just don't pass nested=False when you create the reconstruction state.

Best,
Tiago

Dear Professor Peixoto,

The dynamics example in the documentation sets 100 initially infected
nodes and iterates 10 times synchronously. So the epidemic process runs on
the network and the time T lies in [0, 9]. Then the time series is copied
to an identical but masked network. Am I correct? But I still wonder how to
control the number of infection events per node. I noticed that the infected
nodes are selected randomly.

Moreover, should I set it up like this for the Ising model?
"
for i in range(1000):
    si_state = gt.IsingGlauberState(g, beta=.02)
    s = [si_state.get_state().copy()]
    si_state.iterate_async()
    s.append(si_state.get_state().copy())
    # Each time series should be represented as a single vector-valued
    # vertex property map with the states for each node at each time.
    s = gt.group_vector_property(s)
    ss.append(s)
"

sincerely,
Gege Hou

> Dear Professor Peixoto,
>
> The dynamics example in the documentation sets 100 initially infected
> nodes and iterates 10 times synchronously. So the epidemic process runs on
> the network and the time T lies in [0, 9]. Then the time series is copied
> to an identical but masked network. Am I correct?

In the example in the documentation the time series is copied to an
empty graph, which will be the starting point of the reconstruction.

> But I still wonder how to control the number of infection events per
> node. I noticed that the infected nodes are selected randomly.

This is not controlled explicitly; after you generate the time series
you count the number of times each node flipped, and you average.
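
The counting described above can be sketched as follows. This is a minimal illustration in plain NumPy rather than graph-tool code. It assumes each cascade has already been converted to an integer array of shape (T, N), with 0 = susceptible and 1 = infected, and the function name `infection_events_per_node` is hypothetical. It counts only S -> I transitions, so the initial seed of each cascade is not counted as an event; whether the paper's definition of a includes seeds or recoveries is not settled in this thread.

```python
import numpy as np

def infection_events_per_node(cascades):
    """Count S -> I transitions per node and return (counts, average).

    `cascades` is a list of integer arrays of shape (T, N): T recorded
    time steps for N nodes, with 0 = susceptible and 1 = infected.
    """
    n_nodes = np.asarray(cascades[0]).shape[1]
    events = np.zeros(n_nodes, dtype=int)
    for x in cascades:
        x = np.asarray(x)
        # A node is infected between t and t+1 when its state goes 0 -> 1.
        events += ((x[:-1] == 0) & (x[1:] == 1)).sum(axis=0)
    return events, events.mean()

# Toy data (made up): 2 cascades, 4 time steps, 3 nodes.
toy = [
    np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [1, 1, 1]]),
    np.array([[0, 0, 1],
              [0, 1, 1],
              [0, 0, 1],
              [0, 1, 0]]),
]
events, a = infection_events_per_node(toy)
print(events, a)
```

Since a is measured after the fact, changing it presumably means changing the number or length of the cascades (the 100 and 10 in the loop of the original script), or beta, and counting again.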

> Moreover, should I set it up like this for the Ising model?
> "
> for i in range(1000):
>     si_state = gt.IsingGlauberState(g, beta=.02)
>     s = [si_state.get_state().copy()]
>     si_state.iterate_async()
>     s.append(si_state.get_state().copy())
>     # Each time series should be represented as a single vector-valued
>     # vertex property map with the states for each node at each time.
>     s = gt.group_vector_property(s)
>     ss.append(s)
> "

Since the Ising reconstruction expects uncorrelated samples, I think
it's best to use only one "time series", i.e.

si_state = gt.IsingGlauberState(g, beta=.02)
ss = [si_state.get_state().copy()]

for i in range(1000):
    si_state.iterate_async()
    ss.append(si_state.get_state().copy())

ss = gt.group_vector_property(ss)
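
To make "uncorrelated samples" concrete, here is a small diagnostic sketch in plain NumPy. It is not part of graph-tool and not something prescribed in this thread. It estimates the lag autocorrelation of the magnetization (mean spin) across recorded states: values near 1 suggest that consecutive samples are still strongly correlated, values near 0 suggest they are close to independent. The toy chain below flips one randomly chosen spin per step and merely stands in for a real simulation; the function name `magnetization_autocorr` and the thinning interval of 200 are illustrative assumptions.

```python
import numpy as np

def magnetization_autocorr(samples, lag=1):
    """Lag autocorrelation of the mean spin over a sequence of states.

    `samples` has shape (M, N): M recorded states of N spins (+1 or -1).
    """
    m = np.asarray(samples, dtype=float).mean(axis=1)
    m = m - m.mean()
    denom = float((m * m).sum())
    if denom == 0.0:
        return 0.0  # constant magnetization: nothing to correlate
    return float((m[:-lag] * m[lag:]).sum()) / denom

rng = np.random.default_rng(0)
n_spins, n_steps = 50, 40000

# Toy stand-in for a simulation: flip one randomly chosen spin per step.
state = rng.choice([-1, 1], size=n_spins)
chain = np.empty((n_steps, n_spins), dtype=int)
for t in range(n_steps):
    state[rng.integers(n_spins)] *= -1
    chain[t] = state

consecutive = magnetization_autocorr(chain[:1000])  # every step recorded
thinned = magnetization_autocorr(chain[::200])      # every 200th step recorded
print(consecutive, thinned)
```

The same function could be applied to the recorded Ising states once they are available as an (M, N) array; if the value stays close to 1, recording states less often may be worth trying.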

Best,
Tiago

Hi, Professor Peixoto.
Please forgive me. After comparing carefully, I am still confused about the
difference between the example in the documentation and the SIS model in
the paper. The only difference I found is that the dolphins network is
undirected while the openflights network is directed. So should I handle
directed networks differently?

> In the example in the documentation the time series is copied to an
> empty graph, which will be the starting point of the reconstruction.

Should I copy the time series to the openflights graph directly, as
follows?

"
ss = [g.own_property(s) for s in ss]
rstate = gt.EpidemicsBlockState(g, s=ss, beta=None, r=1e-6,
                                global_beta=.2,
                                state_args=dict(B=1), nested=False)
"
Why couldn't I use an empty graph as a starting point?

Sincerely,
Gege Hou

> Hi, Professor Peixoto.
> Please forgive me. After comparing carefully, I am still confused about the
> difference between the example in the documentation and the SIS model in
> the paper. The only difference I found is that the dolphins network is
> undirected while the openflights network is directed. So should I handle
> directed networks differently?

No, the procedure is exactly the same for directed and undirected networks.

>> In the example in the documentation the time series is copied to an
>> empty graph, which will be the starting point of the reconstruction.

> Should I copy the time series to the openflights graph directly, as
> follows?
>
> "
> ss = [g.own_property(s) for s in ss]
> rstate = gt.EpidemicsBlockState(g, s=ss, beta=None, r=1e-6,
>                                 global_beta=.2,
>                                 state_args=dict(B=1), nested=False)
> "
> Why couldn't I use an empty graph as a starting point?

Of course you could start with an empty graph; this is precisely what is
done in the example in the documentation. You seem to have interpreted
my statement as the precise opposite of what it said.

Best,
Tiago

Hi, Prof. Peixoto. I reread the paper and found that the existence of edges
depends on their posterior probability. So should I delete edges whose
posterior probabilities are less than 0.5 before comparing with the original
network?
I'm also wondering how to calculate the inverse temperature for the food web
network. Is it obtained from the simulation, or is it based on a property
specific to the network?

sincerely,
Gege Hou
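
On the thresholding question in the last message, here is a hedged sketch of the two possible comparisons, using plain NumPy on a toy example. It assumes the normalized similarity has the L1 form 1 - sum|A1 - A2| / (sum|A1| + sum|A2|), which is how the `gt.similarity` documentation appears to define it for `p=1` and `norm=True`; the helper name `adjacency_similarity` and the probability values are made up for illustration. The first number compares the true adjacency matrix against the posterior edge probabilities used as weights, as the `gt.similarity(..., gm.ep.eprob)` call in the original script does. The second compares it against the graph obtained by keeping only edges with probability of at least 0.5. Which of the two the paper reports is not answered in this thread.

```python
import numpy as np

def adjacency_similarity(a1, a2):
    """Normalized L1 similarity: 1 - sum|a1 - a2| / (sum|a1| + sum|a2|)."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    d = np.abs(a1 - a2).sum()
    e = np.abs(a1).sum() + np.abs(a2).sum()
    return 1.0 - d / e

# True directed adjacency matrix (toy example with 3 nodes).
true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]])

# Hypothetical posterior edge probabilities from a reconstruction.
eprob = np.array([[0.0, 0.9, 0.2],
                  [0.0, 0.0, 0.6],
                  [0.4, 0.1, 0.0]])

# Comparison 1: probabilities used directly as edge weights.
weighted = adjacency_similarity(true_adj, eprob)

# Comparison 2: keep only edges with posterior probability >= 0.5.
thresholded = adjacency_similarity(true_adj, (eprob >= 0.5).astype(int))

print(weighted, thresholded)
```

The two values generally differ, so it matters to state which one is being compared against the figures in the paper.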