Recently I continued my stewardship (for want of a better word) of `List::Util` by finally adding the last function in a useful set of the pairwise key/value list management functions, called `unpairs`.

I had been intending to add to the set of `pairfirst`, `pairgrep` and `pairmap`, a sorting function called `pairsort`. To manage that I would first have to get around a slightly awkward interface requirement. Perl's regular `grep` and `map` pass the single items in via `$_`, whereas `pairgrep` and `pairmap` operate not on single values of their input list, but odd/even sets of two elements each, consisting of a notional "key" and "value". They have to pass the key and value items in via `$a` and `$b`. Since regular sort already uses `$a` and `$b`, there'd be no easily convenient place to pass the second key and value in to `pairsort` - perl doesn't provide a `$c` and `$d` for this purpose.

It would be easier, in hindsight, if the `pair*` versions of these functions had continued using the `$_` variable to pass both key and value by using the `Pair` objects in the list provided by the `pairs` function. If this were the case, then `pairsort` could be implemented using just `$a` and `$b`, as the comparison function would have access to all four items using `$a->key`, `$a->value`, `$b->key` and `$b->value`.

I started by thinking about how a `pairsort` function would do this, taking the values in the list and splitting them into key/value pairs, packing them into `Pair` objects, and... Wait a moment - this is exactly the task that the `pairs` function already does. Rather than writing a `pairsort` function as a native XS function from scratch, it would be far easier to reuse `pairs` and regular sort. All I'd need is a function to take the sorted output, that list of `Pair` objects, and turn it back into the even-sized list of keys and values. An inverse of the `pairs` function - call it `unpairs`.

Rather than a complex XS function to implement sorting a list of keys and values, all I'd need is the simpler `pairs` and `unpairs` functions, wrapped around a regular `sort`. This could easily be done in Perl code:

sub pairsort(&@) { my $code = shift; unpairs sort &$code pairs @_; }

In fact, by writing them out in a little table we can see that the two functions `pairs` and `unpairs` are in fact quite sufficient to let us implement any of the higher-order pairwise functions, using only the regular single-element functions perl already provides:

pairfirst { COND } LIST === unpairs first { COND } pairs LIST pairgrep { COND } LIST === unpairs grep { COND } pairs LIST pairmap { FUNC } LIST === unpairs map { FUNC } pairs LIST pairsort { CMP } LIST === unpairs sort { CMP } pairs LIST

You might at this point start to notice some similarities with the Schwartzian Transform; a method of sorting a list by the values returned by some transformation function on each item. To remind ourselves what this looks like, here's an example that sorts a list of strings by the number of vowels it has:

my @sorted_strings = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, length( $_ =~ s/[^aeiou]+//gr ) ] @strings;

This starts to look rather similar to sorting an even-sized kvlist by, say, a simple string order on its keys:

my @sorted_kvlist = unpairs sort { $a->key cmp $b->key } pairs @kvlist;

What both of these have in common is their method of attacking a problem that is in some way inconvenient to process by changing the form of the data into a shape that is easier to solve. This is done in a reversible way so that at the end we can undo the initial transformation and return back to a list in the form we started with, but now in the right order.

In the Schwartzian's case, we take the original list of values and turn them into two-element array references containing the transformed value we'd like to sort by and the original value. In the key/value pair case, we take the original list in sets of two elements and bundle these keys and values into `Pair` objects. Each transformation can be easily undone, but before we undo it, we first pass the list through our required sort function, to obtain the order that we'd like.

What we have achieved then, is a way to create new sorting behaviours by just providing a pair of functions that let us use a regular sort. In the Schwartzian's case it simply gives us a more efficient way to achieve this sort by reducing the number of times we need to transform items into their sorting values, but in the `pairsort` case it lets us perform what previously we could not; as perl's native sort does not understand how to take items in key/value pairs from its input list. In both cases, we can write in a more generalised form, as:

my @output = UNTRANSFORM sort { COMPARE($a, $b) } TRANSFORM @input;

This works because the two functions of `TRANSFORM` and `UNTRANSFORM` are inverses of each other. We transform the input list from one type into a list of a different type, perform the sorting operation on that type instead, then undo that transformation and return back to our original type now in a new order.

There is of course nothing unique to strings or pairs, or indeed even sorting, in this pattern. It can apply to many other kinds of list-processing function, or even problems on other kinds of data. Whenever you have a problem that at first glance seems hard, it pays to consider whether the problem can be turned into one that is inherently easier by changing the values from one type into another, and then back again.

The example that sorts strings by number of vowels may need to be rewritten to something like:

ReplyDeletemy @sorted_strings = map { $_->[0] }

sort { $a->[1] <=> $b->[1] }

map { [ $_, length( $_ =~ s/[^aeiou]+//gr ) ]}

@strings;

Oops yes indeed. Now corrected. Thanks.

DeleteHow would you feel about adding a Perl6 pairup() function to List::Util ? If you inspired by Damian's modules I'm pretty sure something like the pairup() functionality exists in `Var::Pairs`

ReplyDeleteI'm not sure I understand. How is my `List::Util::pairs()` function not already exactly that?

Delete