Questions about output of "history" of graph_tool.inference.mcmc_equilibrate

I have just run the following snippet of code:

    mcmc_args = dict(parallel=True, niter=10)
    history = gt.mcmc_equilibrate(state, wait=1000, history=True,
                                  mcmc_args=mcmc_args)
    with open('history1.pkl', 'wb') as his1_pkl:
        pickle.dump(history, his1_pkl, -1)

According to the manual, `history` is a "list of tuples of the form
(iteration, entropy)". When unpickling it, however, I get a list of length
2000, where each element is another list of length two, containing `nan` as
the first entry and a single-digit integer as the second.
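
For reference, I am inspecting the unpickled history roughly like this:

    import pickle

    with open('history1.pkl', 'rb') as his1_pkl:
        history = pickle.load(his1_pkl)

    print(len(history))   # 2000 in this run
    print(history[0])     # e.g. [nan, 5]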

A couple of questions:
1) I would expect a tuple, not a list, for each entry in the list. Is the
manual wrong or is the code wrong? Or did I do something wrong?
2) Why am I receiving `nan` rather than a value for "iteration" as the first
entry of my list?
3) Is there a particular reason why the length of the list is precisely 2000
in this case? (Obviously there is, I just haven't quite figured it out yet.)

Best,

Philipp

Philipp wrote

1) I would expect a tuple, not a list, for each entry in the list. Is the
manual wrong or is the code wrong? Or did I do something wrong?

The point of the documentation is that two values are returned for each
step, not that the actual type is a tuple. Most code should not care about
this.
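
For example, code that consumes the history can simply unpack each entry
into two values, regardless of the concrete type:

    # Works whether the entries are tuples or lists:
    for it, (a, b) in enumerate(history):
        print(it, a, b)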

2) Why am I receiving `nan` rather than a value for "iteration" as the first
entry of my list?

I have no idea. I can't reproduce this. You have to send a complete example
that shows the problem.

3) Is there a particular reason why the length of the list is precisely 2000
in this case? (Obviously there is, I just haven't quite figured it out yet.)

As stated in the documentation, this is a stochastic algorithm which will
stop after equilibration has been detected (using a record-breaking
heuristic). Hence, the length of the history will be different each time.
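
As a minimal sketch (using one of the small example graphs shipped with
graph-tool rather than your data), two runs of the same call will generally
produce histories of different lengths:

    import graph_tool.all as gt

    g = gt.collection.data["football"]
    state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
    state = state.copy(sampling=True)

    h1 = gt.mcmc_equilibrate(state, wait=100, history=True,
                             mcmc_args=dict(niter=10))
    h2 = gt.mcmc_equilibrate(state, wait=100, history=True,
                             mcmc_args=dict(niter=10))

    # The stopping point is stochastic, so the lengths will usually differ.
    print(len(h1), len(h2))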

Best,
Tiago

Tiago Peixoto wrote

1) I would expect a tuple, not a list, for each entry in the list. Is the
manual wrong or is the code wrong? Or did I do something wrong?

The point of the documentation is that two values are returned for each
step, not that the actual type is a tuple. Most code should not care about
this.

It is indeed irrelevant to my code; seeing something different just threw me
off.

Tiago Peixoto wrote

2) Why am I receiving `nan` rather than a value for "iteration" as the first
entry of my list?

I have no idea. I can't reproduce this. You have to send a complete example
that shows the problem.

The graph file is 675 MB in size, so probably not terribly amenable to
sharing online. If I come across the issue with a smaller file, I shall
upload it.

Tiago Peixoto wrote

3) Is there a particular reason why the length of the list is precisely 2000
in this case? (Obviously there is, I just haven't quite figured it out yet.)

As stated in the documentation, this is a stochastic algorithm which will
stop after equilibration has been detected (using a record-breaking
heuristic). Hence, the length of the history will be different each time.

OK, thank you for the explanation.

Best wishes,

Philipp

Hi Tiago,

I have not reproduced the same problem yet, but I have found a different
problem with the history, using a smaller graph which I can upload. I ran
the following piece of code (this is a deliberately small network, so ignore
the actual results of the code):

import graph_tool.all as gt
import timeit
import random
import cPickle as pickle

def collect_edge_probs(s):
    # Callback: collect, for each candidate missing edge, its probability
    # under the current state of the chain.
    for i in range(len(missing_edges)):
        p = s.get_edges_prob([missing_edges[i]],
                             entropy_args=dict(partition_dl=False))
        probs[i].append(p)

g = gt.load_graph('graph_no_multi_clean.gt')

pub_years = [1800]
vertex_filter = g.new_vertex_property("bool")
edge_filter = g.new_edge_property("bool")
for pub_year in pub_years:
    # Initialise parallel-edges filter
    parallel_edges_filter = g.new_edge_property("int", val=0)
    
    #filter vertices by date
    for v in g.vertices():
        if g.vp.v_pub_year[v] <= pub_year:
            vertex_filter[v] = True
        else:
            vertex_filter[v] = False
    
    g.set_vertex_filter(vertex_filter)
    #now filter edges by date
    for e in g.edges():
        if g.ep.pub_year[e] <= pub_year:
            edge_filter[e] = True
        else:
            edge_filter[e] = False
    g.set_edge_filter(edge_filter)
    #cannot simply delete all parallel edges as that might prevent accurate
    #filtering of edges by date in the next step
    gt.label_parallel_edges(g, eprop=parallel_edges_filter)
    for e in g.edges():
        if parallel_edges_filter[e] != 0:
            edge_filter[e] = False
    g.set_edge_filter(edge_filter)
    remaining_v_indices = []
    for v in g.vertices():
        remaining_v_indices.append(int(g.vertex_index[v]))
    num_vertices = g.num_vertices()
    random_origins = random.sample(remaining_v_indices,
                                   int(0.01*num_vertices))
    random_targets = random.sample(remaining_v_indices,
                                   int(0.01*num_vertices))
    missing_edges = []
    for v1 in random_origins:
        for v2 in random_targets:
            if v1 == v2:
                continue
            elif g.edge(v1, v2) is None:
                missing_edges.append((v1,v2))

    state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
    state = state.copy(sampling=True)
    
    probs = [[] for _ in range(len(missing_edges))]
    
    mcmc_args = dict(niter=10)
    # Now we collect the probabilities for exactly 10,000 sweeps
    # (force_niter=1000 iterations of niter=10 sweeps each)
    history = gt.mcmc_equilibrate(state, force_niter=1000,
                                  mcmc_args=mcmc_args,
                                  callback=collect_edge_probs, history=True)
    name = 'history' + str(g.num_vertices()) + '.pkl'
    with open(name, 'wb') as missing_edges_pkl:
        pickle.dump(history, missing_edges_pkl, -1)

    #undo filtering
    g.set_edge_filter(None)
    g.set_vertex_filter(None)

Now, when looking at the output of `history`, I find that every entry has
the form [7842.8484318875344, a], where `a` is some single-digit integer.
Given that the expected format is [iteration, entropy], I can't quite make
sense of it: the first entry is always the same, and a decimal number wasn't
quite what I expected for an iteration counter. The last number doesn't work
as an iteration counter either, as it doesn't seem to straightforwardly
increment. Do you know what is going wrong here? Is this maybe a similar
issue to what I had observed previously? I have attached the history output
here ( history1023.pkl
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027051/history1023.pkl>
) and the graph as a zipped file here, as it was too large otherwise (
graph_no_multi_clean.zip
<http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/file/n4027051/graph_no_multi_clean.zip>
).

It seems that the documentation is wrong; the history in fact contains
(entropy, nmoves), where nmoves is the number of vertices moved in each
iteration. Returning the iteration number would be redundant anyway, since
the length of the history gives you that already.
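
For instance, given the (entropy, nmoves) format, the two series can be
recovered from your saved file like this:

    import pickle

    with open('history1023.pkl', 'rb') as f:
        history = pickle.load(f)

    # Each entry is (entropy, nmoves): the entropy after each iteration,
    # and the number of vertices moved during it.
    entropies = [h[0] for h in history]
    nmoves = [h[1] for h in history]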

I'll fix the documentation.