Replacing boost::any arguments with std::variant

Jeff_Trull · February 6, 2020, 6:12am

Hi all,

I've been wrestling with long compile times in graph-tool while doing some
development work, and I noticed that template instantiations seem to be an
important factor. Even functions of modest compile time can be very slow if
they are multiplied by the product of several type alternatives, each of
which much be instantiated. For example, a single function in Python may
dispatch to N graph types x M degree map types x P weight map types, which
can mean hundreds of instantiations.

At the same time I noticed a pattern in the code where there is a set of
types (represented by a Boost.MPL typelist) that accompany a boost::any
argument. At dispatch time the "any" object is interrogated to find out
which of the types it stores, and the correct instantiation is called.
Dispatching this way requires a linear search through the typelist for each
argument.

It seems to me that replacing (MPL typelist + boost::any) with std::variant
would improve both of those factors with:

1. Constant time dispatch to the appropriate instantiation via std::visit
2. The ability to reduce compile time by reducing the N x M... type
product, depending on how well the argument use can be refactored.

I think the approach described in
http://jefftrull.github.io/c++/boost/python/2020/01/30/variants-in-boost-python.html
could
be applied here.

Would there be any interest in exploring this kind of refactoring? I think
there could be substantial benefits in compile time, as well as some
runtime improvement (depending on how often the Python/C++ boundary is
crossed).

Thanks,
Jeff

attachment.html (1.88 KB)

tiago · February 6, 2020, 1:09pm

I don't think the important issue is whether we use boost::any or
std::variant. The high compile times stem simply from the fact we have
to cycle through the Cartesian product of the set of types of each
parameter. I don't see how using std::variant changes this in any way.

Jeff_Trull · February 6, 2020, 5:59pm

std::variant. The high compile times stem simply from the fact we have
to cycle through the Cartesian product of the set of types of each

I absolutely agree on that diagnosis, but I think std::variant can play a
role. The key
is to postpone dispatch, where possible, to refactor out one or more of the
product
terms.

At the moment the dispatch mechanism identifies all the concrete types
first, *then*
runs the correct instantiated function. If there were more flexibility in
this process,
dispatching to selected type-dependent code could happen later and cover
less code.
Consider:

template<class A, class B, class C>
void foo(A a, B b, C c)
{
    // lots of A and B code
    // a single use of C
    // lots more A and B
}

For the sake of a small amount of code involving C the entire function gets
rebuilt as
many times as there are C types. Now consider an approach that postponed
determining C's concrete type:

template<class A, class B>
void foo(A a, B b, std::variant<C1, C2, ...> c_var)
{
    // lots of A and B
    std::visit((auto const & c){ // use of C }, c_var);
    // lots more A and B
}

If C was something based on scalar_types this would mean a factor of 6
reduction in
instantiations!

It's true that you could do the same trick - though IMO less cleanly - with
boost::any and
a typelist. But this would leave off the other advantage of variant:
constant time dispatch.
I don't know what the runtime impact of the current iterative approach is,
but it seems like
it could particularly important in filtered graphs, or in algorithms with a
lot of callbacks
(like my personal favorite, astar_search).

attachment.html (4.34 KB)

tiago · February 6, 2020, 9:18pm

But that is not the relevant pattern. What we require is the *joint*
instantiation of A B and C types, and a dispatch that is specialized for
this joint combination, and hence is as fast as possible.

What you are describing (independent dispatching for every type) is not
very different from dynamic typing, unless the code can be divided into
clear disjoint blocks as in your example, which is not the case for most
algorithms in graph-tool.

Best,
Tiago

Jeff_Trull · February 7, 2020, 7:28pm

But that is not the relevant pattern. What we require is the *joint*
instantiation of A B and C types, and a dispatch that is specialized for
this joint combination, and hence is as fast as possible.

The joint combination is straightforward:

template<class A, class B, class C>
void fooImpl(A a, B b, C c);

void foo(std::variant<A1..> avar,
         std::variant<B1..> bvar,
         std::variant<C1..> cvar)
{
    std::visit(
        (auto a, auto b, auto c)
        {
            fooImpl(a, b, c);
        },
        avar, bvar, cvar);
}

but in this case, as you pointed out, we don't get much compile time
advantage - but we do still enjoy the fast dispatch.

What you are describing (independent dispatching for every type) is not
very different from dynamic typing, unless the code can be divided into
clear disjoint blocks as in your example, which is not the case for most
algorithms in graph-tool

I will defer to your judgment on that one, though perhaps surprisingly I
found this worked in the first (only) algorithm I tried applying it to:
assortativity.
I selected it based on how long it took to build. The results were:

boost::any + typelist : 177s, 4.5GB memory
std::variant for edge weights only: 37s + 1.74GB

The memory reduction is very useful in that it enables parallel builds.

The prototype can be found here:

Best,
Jeff

attachment.html (5.59 KB)

tiago · February 7, 2020, 8:51pm

But that is not the relevant pattern. What we require is the *joint*
instantiation of A B and C types, and a dispatch that is specialized for
this joint combination, and hence is as fast as possible.

The joint combination is straightforward:

template<class A, class B, class C>
void fooImpl(A a, B b, C c);

void foo(std::variant<A1..> avar,
std::variant<B1..> bvar,
std::variant<C1..> cvar)
{
std::visit(
(auto a, auto b, auto c)
{
fooImpl(a, b, c);
},
avar, bvar, cvar);
}

but in this case, as you pointed out, we don't get much compile time
advantage - but we do still enjoy the fast dispatch.

It's possible that the compiler is able to compile this in less time and
memory than what is currently done with MPL + boost::any (which predates
the existence of std::variant). It's something to be considered.

What you are describing (independent dispatching for every type) is not
very different from dynamic typing, unless the code can be divided into
clear disjoint blocks as in your example, which is not the case for most
algorithms in graph-tool

I will defer to your judgment on that one, though perhaps surprisingly I
found this worked in the first (only) algorithm I tried applying it to:
assortativity.
I selected it based on how long it took to build. The results were:

boost::any + typelist : 177s, 4.5GB memory
std::variant for edge weights only: 37s + 1.74GB

The memory reduction is very useful in that it enables parallel builds.

The prototype can be found
here: master to feature/variant-conversion-demo · Jeff Trull / graph-tool · GitLab

I don't find this kind of modification interesting, because the
trade-off goes precisely in the opposite direction of what is the
intention of the library: you're sacrificing run time for compile time.
In particular you replace _every_ lookup of property maps with a variant
visit, including type conversion. Sure, that compiles faster because it
reduces the type combinations, but it just shifts the burden to run
time. This could also be achieved in the traditional way of doing run
time polymorphism, virtual functions, etc.

Best,
Tiago

Jeff_Trull · February 7, 2020, 9:13pm

It's possible that the compiler is able to compile this in less time and
memory than what is currently done with MPL + boost::any (which predates
the existence of std::variant). It's something to be considered.

I'm sorry to say that in my experiments it made only a small difference.
I think the main benefit would be felt at runtime. It may be worth it for
that, though.

I don't find this kind of modification interesting, because the
trade-off goes precisely in the opposite direction of what is the
intention of the library: you're sacrificing run time for compile time.
In particular you replace _every_ lookup of property maps with a variant
visit, including type conversion. Sure, that compiles faster because it

Quite true! Thanks for considering it, though.

Best,
Jeff

attachment.html (4.59 KB)