2014/06/06

List::Util additions in Perl 5.20

For a while now, I have taken over maintaining List::Util and Scalar::Util, the utility modules that ship with core Perl. After a while of getting used to what's where, I've actually now started adding things to it again; mostly by surveying what's commonly used from some other utility modules, and bringing them in so they can be nicely implemented in XS for efficiency. Now that Perl 5.20 is out, all of these latest updates now ship with core perl.

From List::MoreUtils, I have taken the four shortcutting reduce-like boolean test functions of all, any, none and notall. These are all similar to grep, in that they take a block that evaluates some predicate test on each element of the list. Where they differ from grep, is that grep will count the total number of items in the list that returned true, whereas these four functions will simply indicate what the overall result was; allowing them to short-circuit as soon as the result is determined.

use List::Util 1.33 qw( any );

if( any { $_ == 0 } @numbers ) {
  say "The list of numbers includes zero";
}

As this module ships as part of the Perl core, it can reliably make use of the C compiler to build it, so most of the functions it contains are implemented in efficient XS code. Specifically these four also use an optimisation technique called MULTICALL, which improves the efficiency of functions of this form, where a given small block of code is repeatedly executed many times, with $_ set differently every time.

Another set of functions copied from elsewhere are the pair* functions taken and extended from List::Pairwise. These are all functions that interpret their list as an even-sized list of pairs, executing the code block with $a and $b set to the first and second value of each pair. This could be used to operate on regular perl hashes (by assigning keys to $a and their associated values to $b), though there is no requirement that it really be a hash. The functions will preserve the order of the pairs, and won't get upset if the "keys" are not plain strings, or not unique. As the result is also returned in a list of pairs, it could be assigned into a hash, or used elsewhere.

use List::Util 1.33 qw( pairgrep pairmap );

# Take a subset of a hash whose keys are ALLCAPS
my %capitals = pairgrep { $a =~ m/^[A-Z]+$/ } %hash

# Rename keys in a hash
my %renamed = pairmap { ($a =~ s/^foo_/bar_/r), $b } %hash

(This latter example also makes use of the perl 5.14 s///r flag, to return the result of the substitution instead of editing in place.)

use List::Util 1.33 qw( pairs );

foreach ( pairs %hash ) {
   # $_ will be a 2-element ARRAY ref
   say "$_->[0] has value $_->[1]";
}

As of version 1.39 (so a little too late to make it into perl 5.20.0, but still available on CPAN), pairs returns blessed array references that respond to methods called key and value (inspired by DCONWAY's Var::Pairs), as well as being accessible by array indexing.

use List::Util 1.39 qw( pairs );

foreach ( pairs %hash ) {
   say $_->key, " has value ", $_->value;
}

I have many more ideas for functions that could be added, though some care will need to be taken not to invent experimental ideas; but instead to take inspiration of tried-and-tested from CPAN, as all these have done, to bring into core and standardise existing ideas.

One other thing I have my sights set on is to implement further-optimised versions of at least some of the functions in Scalar::Util and List::Util as custom ops on perl 5.16 onwards. This will give them an even further performance boost, as they won't even be regular XS functions any more, so will completely remove the expensive call-time overhead of the ENTERSUB/LEAVESUB pair.


I should also add, that for a while now I've been a self-employed IT contractor, which has given me a lot more free time to be able to write such things as named above. If anyone is interested in supporting or sponsoring similar work on Perl, contact me by email. I'd be happy to give most reasonable Perl jobs a consideration. For that matter, I also work in C, or other languages, and I've even been known to build small-scale electronics projects.