Hello, I read that since version 2.40 some optimizations have been implemented to speed up inference.
I used to run a number of concurrent minimizations to find partition overlaps; this is handled by joblib, which spawns n concurrent tasks (with the loky backend). I'm possibly experiencing some performance degradation (still investigating, though), possibly due to OpenMP "colliding" with joblib. Which optimizations were introduced in version 2.40? At what level?
Never mind, after further investigation I found that the degradation was only apparent. Still, I'd like to know which optimizations have been included and what they affect.
You can take a look at the git commit history for all the gritty
details, but in a nutshell:
- The agglomeration algorithm has been moved entirely to C++ (some
higher level functions were in Python before)
- Many data structures have been improved (e.g. the bookkeeping
necessary for move proposals)
- The initialization of the agglomeration has been changed: when
starting with B=N groups, instead of performing merge/sweeps, we just do
single-node sweeps, which have the same effect as merges, but are much
faster. Only once the number of groups stops decreasing fast enough do we
switch to merges.
The last modification turned out to have a big relative impact in practice.
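
To make the last point concrete, here is a toy sketch of that two-phase
schedule in plain Python. It is not the actual C++ code: the cost function
(within-group squared deviation plus a fixed penalty per group), the names
cost, node_sweep, merge_once and agglomerate, and the min_shrink threshold
are all assumptions made up for illustration. It only shows the control
flow: start with B=N, do single-node sweeps while they keep shrinking the
number of groups quickly, then fall back to explicit merges.

```python
def cost(xs, b, penalty):
    """Toy objective: within-group squared deviation + penalty per group."""
    groups = {}
    for x, r in zip(xs, b):
        groups.setdefault(r, []).append(x)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total + penalty * len(groups)


def node_sweep(xs, b, penalty):
    """Move each node, in turn, to the group that lowers the cost most."""
    for i in range(len(xs)):
        old = b[i]
        best_r, best_c = old, cost(xs, b, penalty)
        for r in set(b):
            if r == old:
                continue
            b[i] = r
            c = cost(xs, b, penalty)
            if c < best_c:  # strict improvement only; ties leave the node alone
                best_r, best_c = r, c
        b[i] = best_r


def merge_once(xs, b, penalty):
    """Merge the pair of groups whose union gives the lowest cost."""
    labels = sorted(set(b))
    best = None
    for k, r in enumerate(labels):
        for s in labels[k + 1:]:
            trial = [r if t == s else t for t in b]
            c = cost(xs, trial, penalty)
            if best is None or c < best[0]:
                best = (c, trial)
    b[:] = best[1]


def agglomerate(xs, target_B, penalty=1.0, min_shrink=0.9):
    b = list(range(len(xs)))  # start with B = N: every node in its own group
    phases = []
    # Phase 1: single-node sweeps while B keeps dropping fast enough.
    while len(set(b)) > target_B:
        before = len(set(b))
        node_sweep(xs, b, penalty)
        after = len(set(b))
        phases.append(("sweep", after))
        if after > min_shrink * before:  # hypothetical "too slow" criterion
            break
    # Phase 2: explicit merges until the target number of groups is reached.
    while len(set(b)) > target_B:
        merge_once(xs, b, penalty)
        phases.append(("merge", len(set(b))))
    return b, phases


xs = [0.0, 1.0, 2.0, 50.0, 51.0, 52.0, 110.0, 111.0, 112.0]
b, phases = agglomerate(xs, target_B=2, penalty=10.0)
print(phases)
```

On this small example the first sweep alone takes the partition from 9
groups down to 3, the second sweep makes no progress, and a single merge
then reaches the target of 2 groups.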