max_cardinality_matching heuristic

Hi all,

In what sense is the algorithm in
graph_tool.topology.max_cardinality_matching with the heuristic=True
switch a heuristic rather than exact? As I understand it, it might not
return a maximum-cardinality matching even if one exists. But what
about the edge weights? Does it always return a minimum-weight matching
(respectively a maximum-weight matching with minimize=False)? Or is
this not guaranteed?

Note that there are pitfalls to this question: e.g., if all edge
weights are equal to one, then the optimal min-weight matching is empty
if I do not insist on maximum cardinality. Can someone enlighten me as
to what the heuristic actually optimizes? Cardinality or weights? Both
(in which sense), or neither (and what is it good for, then)? And does
it always find the optimum for one of those two criteria?
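To make that pitfall concrete, here is a tiny brute-force sketch in
plain Python (independent of graph-tool, purely illustrative): on a
path with unit weights, minimizing weight alone selects the empty
matching, while insisting on maximum cardinality first gives a
sensible answer.

```python
from itertools import combinations

def is_matching(edges):
    """A set of edges is a matching if no two edges share a vertex."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

# A path graph 0-1-2-3 with unit weight on every edge.
edges = [(0, 1), (1, 2), (2, 3)]
weight = {e: 1 for e in edges}

# Enumerate all matchings (all subsets of edges that are matchings).
matchings = [m for r in range(len(edges) + 1)
             for m in combinations(edges, r) if is_matching(m)]

# Without a cardinality constraint, the min-weight matching is empty.
best_uncon = min(matchings, key=lambda m: sum(weight[e] for e in m))
print(best_uncon)  # ()

# Insisting on maximum cardinality first gives a different answer.
max_card = max(len(m) for m in matchings)
best_card = min((m for m in matchings if len(m) == max_card),
                key=lambda m: sum(weight[e] for e in m))
print(best_card)  # ((0, 1), (2, 3))
```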

Best,

Daniel

Hi,

> Hi all,
>
> In what sense is the algorithm in
> graph_tool.topology.max_cardinality_matching with the heuristic=True
> switch a heuristic rather than exact? As I understand it, it might
> not return a maximum-cardinality matching even if one exists. But
> what about the edge weights? Does it always return a minimum-weight
> matching (respectively a maximum-weight matching with
> minimize=False)? Or is this not guaranteed?

Indeed this is a bit confusing. If heuristic=False, the maximum is
always guaranteed, and the is_maximal return value should therefore
always be true. Why is this value returned at all, then? This is
explained in the Boost documentation:

    http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/maximum_matching.html

    Why is a verification algorithm needed? Edmonds' algorithm is fairly
    complex, and it's nearly impossible for a human without a few days of
    spare time to figure out if the matching produced by edmonds_matching on
    a graph with, say, 100 vertices and 500 edges is indeed a maximum
    cardinality matching. A verification algorithm can do this mechanically,
    and it's much easier to verify by inspection that the verification
    algorithm has been implemented correctly than it is to verify by
    inspection that Edmonds' algorithm has been implemented correctly. The
    Boost Graph library makes it incredibly simple to perform the
    subroutines needed by the verifier (such as finding all the connected
    components of odd cardinality in a graph, or creating the induced graph
    on all vertices with a certain label) in just a few lines of code.

So, in reality, as long as the algorithm is correct, this test is not
necessary and could have been omitted, but I decided to keep it. Maybe
a better option would be to activate it optionally, since in most cases
it is not desired.

If heuristic=True, a much simpler algorithm is run, which does not
guarantee that the optimum is found, and which also performs no
checking.
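To convey the flavor of such a fast, non-optimal approach (this is a
generic greedy sketch in plain Python, not the actual algorithm used
internally by graph-tool):

```python
def greedy_matching(edges, weight=None, minimize=True):
    """Greedy matching sketch: scan the edges once (optionally in
    weight order) and take any edge whose endpoints are both still
    free. The result is maximal (no further edge can be added) but
    not necessarily of maximum cardinality."""
    if weight is not None:
        edges = sorted(edges, key=weight.get, reverse=not minimize)
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# A path 0-1-2-3-4-5: encountering the "middle" edges first shows how
# a single greedy pass can miss the maximum.
edges = [(1, 2), (3, 4), (0, 1), (2, 3), (4, 5)]
print(greedy_matching(edges))  # [(1, 2), (3, 4)] -- size 2; the maximum has size 3
```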

> Note that there are pitfalls to this question: e.g., if all edge
> weights are equal to one, then the optimal min-weight matching is
> empty if I do not insist on maximum cardinality. Can someone
> enlighten me as to what the heuristic actually optimizes? Cardinality
> or weights? Both (in which sense), or neither (and what is it good
> for, then)? And does it always find the optimum for one of those two
> criteria?

The weighted version optimizes the sum of the weights. Thus, if the
weights are all one (or simply equal), it reduces to the cardinality.
The heuristic _does not_ guarantee that the optimum is found; it only
guarantees that it runs fast, in linear time. In fact, in many cases
the maximum will _not_ be found. This algorithm is used by
sfdp_layout(), which needs to be fast rather than exact.

There are exact algorithms for the weighted version, but I have yet to
implement them. I intend to do so at some point, and thus also make the
'weight' parameter effective when heuristic=False.

I hope this clarifies things.

Cheers,
Tiago

Hi Tiago and everyone else,

> So, in reality, as long as the algorithm is correct, this test is not
> necessary and could have been omitted, but I decided to keep it.
> Maybe a better option would be to activate it optionally, since in
> most cases it is not desired.

In case the validation does not take much time (say, less than 20% of
the time needed to find the matching), I'd suggest keeping it but not
outputting the result. Instead, I'd raise an exception if the
validation failed (which should never happen unless there is a bug!).
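In code, the suggested behavior would look roughly like this (a toy
sketch with hypothetical stand-in names, not the actual graph-tool
API):

```python
def matching_with_check(find_matching, verify, g):
    """Sketch of the suggested interface: compute the matching, run
    the cheap verification internally, and raise instead of returning
    a flag. find_matching and verify are stand-ins for the real
    routines."""
    match = find_matching(g)
    if not verify(g, match):
        raise RuntimeError("matching verification failed -- "
                           "this indicates a bug in the algorithm")
    return match

# With a verifier that always passes, the matching is simply returned.
print(matching_with_check(lambda g: "match", lambda g, m: True, None))
```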

Alternatively: isn't the validation much more valuable when the
heuristic is used, i.e. when it can indeed happen that the matching
does not have maximum cardinality? Is it possible to include the
validation for the heuristic=True setting? In that case, it would make
more sense to keep the test result as an output value, but then for
both "heuristic" settings!

Anyway that's just a suggestion.

> The weighted version optimizes the sum of the weights. Thus, if the
> weights are all one (or simply equal), it reduces to the cardinality.
> The heuristic _does not_ guarantee that the optimum is found; it only
> guarantees that it runs fast, in linear time.

It's still not clear to me what the heuristic optimizes, since the two
criteria "find a matching with maximum cardinality" and "minimize the
sum of edge weights" are contrary goals. Let's assume unit edge weights
for simplicity. If someone told me to write a matching algorithm that
does not necessarily yield a maximum-cardinality matching but optimizes
for a minimal sum of weights, I'd play dumb and output an empty
matching: it has weight zero, is thus "better" than any other partial
matching, and is generated in constant time. I'm sure the algorithm
does something cleverer, but what? I suspect I could extract the answer
by studying the paper that Tiago gave as a reference in the
documentation, but in case you remember what the algorithm really aims
for, can you spell it out for everyone, Tiago?

I'd indeed be very interested in a refined max_cardinality_matching
that has an exact algorithm for the weighted problem. I find the way
networkx approaches the problem very good; the only issue there is
speed. In networkx, the user chooses whether an overall maximum-weight
matching is sought (which might not have maximum cardinality) or
whether a maximum-weight matching is chosen among all
maximum-cardinality matchings.

Best,

Daniel

Hi Tiago and everyone else,

>> So, in reality, as long as the algorithm is correct, this test is
>> not necessary and could have been omitted, but I decided to keep it.
>> Maybe a better option would be to activate it optionally, since in
>> most cases it is not desired.

> In case the validation does not take much time (say, less than 20% of
> the time needed to find the matching), I'd suggest keeping it but not
> outputting the result. Instead, I'd raise an exception if the
> validation failed (which should never happen unless there is a bug!).

This makes a lot of sense, and it is much better than the current
state. The verification step is much faster, O(N + E), while the
matching algorithm itself is O(NE), so it should not be a problem. I'll
make this modification.

> Alternatively: isn't the validation much more valuable when the
> heuristic is used, i.e. when it can indeed happen that the matching
> does not have maximum cardinality? Is it possible to include the
> validation for the heuristic=True setting? In that case, it would
> make more sense to keep the test result as an output value, but then
> for both "heuristic" settings!

Yes, it is possible to include the verification when heuristic=True,
but only for the unweighted version. This sounds like a meaningful
modification as well.

>> The weighted version optimizes the sum of the weights. Thus, if the
>> weights are all one (or simply equal), it reduces to the
>> cardinality. The heuristic _does not_ guarantee that the optimum is
>> found; it only guarantees that it runs fast, in linear time.

> It's still not clear to me what the heuristic optimizes, since the
> two criteria "find a matching with maximum cardinality" and "minimize
> the sum of edge weights" are contrary goals. Let's assume unit edge
> weights for simplicity. If someone told me to write a matching
> algorithm that does not necessarily yield a maximum-cardinality
> matching but optimizes for a minimal sum of weights, I'd play dumb
> and output an empty matching: it has weight zero, is thus "better"
> than any other partial matching, and is generated in constant time.
> I'm sure the algorithm does something cleverer, but what? I suspect I
> could extract the answer by studying the paper that Tiago gave as a
> reference in the documentation, but in case you remember what the
> algorithm really aims for, can you spell it out for everyone, Tiago?

Maximum cardinality is always implied, in all cases, even when the
weights are minimized. So returning an empty matching would not be
meaningful.
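In brute-force terms, that objective can be sketched as follows (a toy
illustration in plain Python, not the actual implementation): the
weight is only minimized among the maximum-cardinality matchings, so
the empty matching is never a candidate.

```python
from itertools import combinations

def is_matching(m):
    """A set of edges is a matching if no two edges share a vertex."""
    seen = set()
    for u, v in m:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

# A 4-cycle 0-1-2-3-0 with unequal weights: two perfect matchings.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
weight = {(0, 1): 5, (1, 2): 1, (2, 3): 5, (0, 3): 1}

matchings = [m for r in range(len(edges) + 1)
             for m in combinations(edges, r) if is_matching(m)]
max_card = max(len(m) for m in matchings)

# Minimize weight *among maximum-cardinality matchings* only.
best = min((m for m in matchings if len(m) == max_card),
           key=lambda m: sum(weight[e] for e in m))
print(best)  # ((1, 2), (0, 3)), total weight 2
```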

> I'd indeed be very interested in a refined max_cardinality_matching
> that has an exact algorithm for the weighted problem. I find the way
> networkx approaches the problem very good; the only issue there is
> speed. In networkx, the user chooses whether an overall
> maximum-weight matching is sought (which might not have maximum
> cardinality) or whether a maximum-weight matching is chosen among all
> maximum-cardinality matchings.

The heuristic in graph-tool performs the second version.

I think it is clear that the documentation right now is not as good as
it could be, and the algorithm could also be improved to cover more
general cases. I'll keep this on my TODO list. If you wish, you can
open a ticket for it, so that I can be reminded of it and you can keep
track of the progress.

Cheers,
Tiago