2015/06/09

Turning a Hard Problem Into an Easy One

Recently I continued my stewardship (for want of a better word) of List::Util by finally adding the last function in a useful set of the pairwise key/value list management functions, called unpairs.

I had been intending to add to the set of pairfirst, pairgrep and pairmap, a sorting function called pairsort. To manage that I would first have to get around a slightly awkward interface requirement. Perl's regular grep and map pass the single items in via $_, whereas pairgrep and pairmap operate not on single values of their input list, but odd/even sets of two elements each, consisting of a notional "key" and "value". They have to pass the key and value items in via $a and $b. Since regular sort already uses $a and $b, there'd be no easily convenient place to pass the second key and value in to pairsort - perl doesn't provide a $c and $d for this purpose.

It would be easier, in hindsight, if the pair* versions of these functions had continued using the $_ variable to pass both key and value by using the Pair objects in the list provided by the pairs function. If this were the case, then pairsort could be implemented using just $a and $b, as the comparison function would have access to all four items using $a->key, $a->value, $b->key and $b->value.

I started by thinking about how a pairsort function would do this, taking the values in the list and splitting them into key/value pairs, packing them into Pair objects, and... Wait a moment - this is exactly the task that the pairs function already does. Rather than writing a pairsort function as a native XS function from scratch, it would be far easier to reuse pairs and regular sort. All I'd need is a function to take the sorted output, that list of Pair objects, and turn it back into the even-sized list of keys and values. An inverse of the pairs function - call it unpairs.

Rather than a complex XS function to implement sorting a list of keys and values, all I'd need is the simpler pairs and unpairs functions, wrapped around a regular sort. This could easily be done in Perl code:

sub pairsort(&@)
{
   my $code = shift;
   unpairs sort &$code pairs @_;
}

In fact, by writing them out in a little table we can see that the two functions pairs and unpairs are in fact quite sufficient to let us implement any of the higher-order pairwise functions, using only the regular single-element functions perl already provides:

pairfirst { COND } LIST  ===   unpairs first { COND } pairs LIST
pairgrep { COND } LIST   ===   unpairs grep { COND }  pairs LIST
pairmap { FUNC } LIST    ===   unpairs map { FUNC }   pairs LIST
pairsort { CMP } LIST    ===   unpairs sort { CMP }   pairs LIST

You might at this point start to notice some similarities with the Schwartzian Transform; a method of sorting a list by the values returned by some transformation function on each item. To remind ourselves what this looks like, here's an example that sorts a list of strings by the number of vowels it has:

my @sorted_strings = map { $_->[0] }
                     sort { $a->[1] <=> $b->[1] }
                     map { [ $_, length( $_ =~ s/[^aeiou]+//gr ) ]
                     @strings;

This starts to look rather similar to sorting an even-sized kvlist by, say, a simple string order on its keys:

my @sorted_kvlist = unpairs
                    sort { $a->key cmp $b->key }
                    pairs
                    @kvlist;

What both of these have in common is their method of attacking a problem that is in some way inconvenient to process by changing the form of the data into a shape that is easier to solve. This is done in a reversible way so that at the end we can undo the initial transformation and return back to a list in the form we started with, but now in the right order.

In the Schwartzian's case, we take the original list of values and turn them into two-element array references containing the transformed value we'd like to sort by and the original value. In the key/value pair case, we take the original list in sets of two elements and bundle these keys and values into Pair objects. Each transformation can be easily undone, but before we undo it, we first pass the list through our required sort function, to obtain the order that we'd like.

What we have achieved then, is a way to create new sorting behaviours by just providing a pair of functions that let us use a regular sort. In the Schwartzian's case it simply gives us a more efficient way to achieve this sort by reducing the number of times we need to transform items into their sorting values, but in the pairsort case it lets us perform what previously we could not; as perl's native sort does not understand how to take items in key/value pairs from its input list. In both cases, we can write in a more generalised form, as:

my @output = UNTRANSFORM
             sort { COMPARE($a, $b) }
             TRANSFORM
             @input;

This works because the two functions of TRANSFORM and UNTRANSFORM are inverses of each other. We transform the input list from one type into a list of a different type, perform the sorting operation on that type instead, then undo that transformation and return back to our original type now in a new order.

There is of course nothing unique to strings or pairs, or indeed even sorting, in this pattern. It can apply to many other kinds of list-processing function, or even problems on other kinds of data. Whenever you have a problem that at first glance seems hard, it pays to consider whether the problem can be turned into one that is inherently easier by changing the values from one type into another, and then back again.