minimize_nested_blockmodel_dl performance

Hello,
    I am a new user of the library, so forgive me if my questions are
very basic.
    I am using minimize_nested_blockmodel_dl to create a hierarchical
clustering of a graph.
    The graph is created from a distance matrix of the nodes in the
graph. There are 180 nodes.
    I have tried different transformations of the distances into
weights, basically w = 1 - d and w = 1/d, after first normalizing the
distances to the range [0, 1].
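
    For concreteness, here is a sketch of the two transformations (the
distances below are made up, and numpy is assumed):

        import numpy as np

        # Hypothetical pairwise distances, already normalized to [0, 1].
        d = np.array([0.05, 0.30, 0.75, 0.90])

        w1 = 1.0 - d                     # first transformation: w = 1 - d
        w2 = 1.0 / np.maximum(d, 1e-9)   # second: w = 1/d (guard d == 0)
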
    Only with the second one have I been able to obtain a non-trivial
partition, and the partition obtained is quite bad (low NMI and high
NVI ...).
    Using other, much simpler methods (agglomerative clustering, for
example) I have found much better partitions, so I know the data allows
for a better partition to be obtained.
    
    The method is non-parametric once the weights are provided, apart
from the number of sweeps, the epsilons, etc.

     How can I know whether I need to increase the number of sweeps, or
whether I have reached the limit of the algorithm?
     So far I have run up to 10K sweeps, but there is not much
difference in the execution time or in the result.

      What other parameters can be adjusted and under what criteria?

      How must the data be normalized to obtain optimal results (to
increase the algorithm's chances of finding a better series of
partitions)?

       Thanks in advance
      
          Jose


Hi Jose,

The stochastic blockmodel implemented in graph-tool only covers
unweighted graphs, or multigraphs where the weights correspond to edge
multiplicities. The edge weight parameter of the
minimize_nested_blockmodel_dl() function corresponds to edge
multiplicities in a multigraph, not arbitrary real weights. If you pass
real weights, they will be truncated to integers, e.g. to zero if the
weight lies in the range [0, 1), which is probably why you are getting
bad results.
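
For instance, a minimal sketch of the truncation described above (an
illustration, not graph-tool's exact internals):

    import numpy as np

    # Real-valued weights such as w = 1 - d, all in [0, 1):
    w = np.array([0.12, 0.55, 0.99])

    # Truncated to integer multiplicities, every edge gets weight zero:
    print(w.astype(int))   # -> [0 0 0]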

You could discretize the weights by multiplying by some large constant,
but you have to do this carefully, since the results will depend on your
quantization, which will arbitrarily change the density of the graph...
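
For example, one possible quantization (a sketch; the constant C below
is an arbitrary choice, and different values will produce different
multigraphs):

    import numpy as np

    C = 100  # arbitrary quantization constant; results depend on it
    w = np.array([0.12, 0.55, 0.99])

    # Rounded to integer edge multiplicities for a multigraph:
    m = np.rint(w * C).astype(int)
    print(m)   # -> [12 55 99]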

Best,
Tiago