I have a question regarding model selection with different distributions.
When we want to decide which partition best describes the data for a given
distribution, we choose the one that gives the smallest entropy. However,
say we want to compare two different distributions, d1 and d2, where the
best fit for d1 gives an entropy value of e1 and the best fit for d2 gives
e2. If e1 < e2, can we say that d1 describes our data better than d2?
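Assuming both e1 and e2 are description lengths of the same data, i.e. -ln P(data, partition | model) in nats, their difference can be read as a log posterior odds ratio when both models are given equal prior probability. The sketch below shows that conversion; the helper names are hypothetical and not graph-tool functions.

```python
import math


def log_posterior_odds(e1: float, e2: float) -> float:
    """ln[P(d1 | data) / P(d2 | data)], assuming equal model priors.

    e1 and e2 are assumed to be description lengths -ln P(...) in nats
    of the *same* data under models d1 and d2.
    """
    return e2 - e1


def posterior_prob_d1(e1: float, e2: float) -> float:
    """Posterior probability of d1 when only d1 and d2 are considered.

    Written as a numerically stable logistic function so that large
    entropy differences do not overflow math.exp.
    """
    delta = e1 - e2
    if delta >= 0:
        z = math.exp(-delta)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(delta))


# Hypothetical values: d1 is 5 nats shorter than d2.
print(log_posterior_odds(1000.0, 1005.0))
print(posterior_prob_d1(1000.0, 1005.0))
```

Under these assumptions, e1 < e2 means d1 is favoured, and the size of the gap (not just its sign) says how strongly.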

Yes, I mean edge covariates. In the example you referenced, you compare
state.entropy() for two distributions, exponential and log-normal, where
for the log-normal model the covariates were scaled; this is handled by
subtracting log(g.ep.weight.a).sum().

Suppose I want to compare two models with unscaled discrete covariates:
one using a geometric distribution and one using a binomial distribution.
Can I perform model selection by simply comparing their state.entropy()
values?
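Both the geometric and the binomial distribution assign probabilities to the same integer values, so their -ln P values are in the same units and no change-of-variables term is involved. The sketch below illustrates this on the covariates alone, assuming plain maximum-likelihood plug-in estimates; that is a simplification and not what graph-tool computes (its entropy also covers the partition and integrates over parameters). The number of binomial trials, `n_trials`, is a hypothetical parameter assumed to be known.

```python
import math
from typing import Sequence


def _xlogy(x: float, y: float) -> float:
    # x * ln(y), with the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(y)


def geometric_nll(xs: Sequence[int]) -> float:
    """-ln P(xs) under a geometric on {0, 1, 2, ...}, P(x) = p (1 - p)^x,
    with p set to its maximum-likelihood value 1 / (1 + mean)."""
    n, s = len(xs), sum(xs)
    p = n / (n + s)
    return -(_xlogy(n, p) + _xlogy(s, 1.0 - p))


def binomial_nll(xs: Sequence[int], n_trials: int) -> float:
    """-ln P(xs) under Binomial(n_trials, p), p set to mean / n_trials.

    n_trials is a hypothetical, assumed-known parameter of this sketch.
    """
    if any(x < 0 or x > n_trials for x in xs):
        raise ValueError("values must lie in [0, n_trials]")
    n, s = len(xs), sum(xs)
    p = s / (n * n_trials)
    # ln C(n_trials, x) summed over all observations
    log_binom = sum(
        math.lgamma(n_trials + 1) - math.lgamma(x + 1) - math.lgamma(n_trials - x + 1)
        for x in xs
    )
    return -(log_binom + _xlogy(s, p) + _xlogy(n * n_trials - s, 1.0 - p))


# Hypothetical covariate values; the smaller value indicates the better fit.
data = [0, 0, 0, 0, 1, 1, 2, 3, 5, 9]
print("geometric:", geometric_nll(data))
print("binomial: ", binomial_nll(data, n_trials=10))
```

For continuous covariates that are transformed before fitting, the comparison would additionally need the change-of-variables term, which is what the log(g.ep.weight.a).sum() correction is for.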