Meta-Meta Documentation

When any project of mine gets over a certain size and people start actually using it for things, inevitably people ask me for more documentation. Often there is a problem here in that I'm never really quite sure what to write about. I find it very hard to provide tutorial-style documentation "up-front" explaining the hows and the whys (as compared to reference-style docs, which just lists all the pieces that exist).

Let me give a random example: Consider Parser::MGC, a base for writing grammar parsers. Someone might want to use this for parsing some sort of expression language, and they'd find a lack of explaining how to do that. It's not that I don't know how to explain that, it's simply that it hasn't occurred to me to write that documentation as a discrete subject. If they'd ask me on IRC I could explain at great length, because they'd given me that initial impulse to give them an answer. But at some point someone will say "Well why isn't that written in documentation?" or "You should write a tutorial", and the same cycle of conversation will usually end with something like

But I don't know what documentation to write for you.

I have sometimes asked people to explain to me what documentation they want, and usually people don't like doing that because they feel since they don't already know the answers, they can't provide the questions. This article is my attempt to break out of that cycle.

Here now then is my rather direct plea to anyone who would ask me for more documentation. Here is your documentation on how to document what documentation I should write - a meta-meta-document:

If you explain to me what problems you want to solve, or what situations you want documentation for, that's a good start. It would be better if you can provide me some introduction text around your problem, your background, and then I can write something better for you specifically, that still stands nicely on its own.

But, here's the killer revelation:

If you write some documentation wherein you write the English waffly-wordy bits that explain the background of the problem and introduce the reader to the background context of the code, and then write a block of code that just says
banana banana banana
then I will find that about a hundred times easier to write. All I have to do is fill in that blob of perl code, and hey presto; there's your documentation.

I know this is a lot to ask of people, and I don't imagine it will happen very often, but that would be by far the fastest quickest way to get more documentation out of me.


Turning a Hard Problem Into an Easy One

Recently I continued my stewardship (for want of a better word) of List::Util by finally adding the last function in a useful set of the pairwise key/value list management functions, called unpairs.

I had been intending to add to the set of pairfirst, pairgrep and pairmap, a sorting function called pairsort. To manage that I would first have to get around a slightly awkward interface requirement. Perl's regular grep and map pass the single items in via $_, whereas pairgrep and pairmap operate not on single values of their input list, but odd/even sets of two elements each, consisting of a notional "key" and "value". They have to pass the key and value items in via $a and $b. Since regular sort already uses $a and $b, there'd be no easily convenient place to pass the second key and value in to pairsort - perl doesn't provide a $c and $d for this purpose.

It would be easier, in hindsight, if the pair* versions of these functions had continued using the $_ variable to pass both key and value by using the Pair objects in the list provided by the pairs function. If this were the case, then pairsort could be implemented using just $a and $b, as the comparison function would have access to all four items using $a->key, $a->value, $b->key and $b->value.

I started by thinking about how a pairsort function would do this, taking the values in the list and splitting them into key/value pairs, packing them into Pair objects, and... Wait a moment - this is exactly the task that the pairs function already does. Rather than writing a pairsort function as a native XS function from scratch, it would be far easier to reuse pairs and regular sort. All I'd need is a function to take the sorted output, that list of Pair objects, and turn it back into the even-sized list of keys and values. An inverse of the pairs function - call it unpairs.

Rather than a complex XS function to implement sorting a list of keys and values, all I'd need is the simpler pairs and unpairs functions, wrapped around a regular sort. This could easily be done in Perl code:

sub pairsort(&@)
   my $code = shift;
   unpairs sort &$code pairs @_;

In fact, by writing them out in a little table we can see that the two functions pairs and unpairs are in fact quite sufficient to let us implement any of the higher-order pairwise functions, using only the regular single-element functions perl already provides:

pairfirst { COND } LIST  ===   unpairs first { COND } pairs LIST
pairgrep { COND } LIST   ===   unpairs grep { COND }  pairs LIST
pairmap { FUNC } LIST    ===   unpairs map { FUNC }   pairs LIST
pairsort { CMP } LIST    ===   unpairs sort { CMP }   pairs LIST

You might at this point start to notice some similarities with the Schwartzian Transform; a method of sorting a list by the values returned by some transformation function on each item. To remind ourselves what this looks like, here's an example that sorts a list of strings by the number of vowels it has:

my @sorted_strings = map { $_->[0] }
                     sort { $a->[1] <=> $b->[1] }
                     map { [ $_, length( $_ =~ s/[^aeiou]+//gr ) ]

This starts to look rather similar to sorting an even-sized kvlist by, say, a simple string order on its keys:

my @sorted_kvlist = unpairs
                    sort { $a->key cmp $b->key }

What both of these have in common is their method of attacking a problem that is in some way inconvenient to process by changing the form of the data into a shape that is easier to solve. This is done in a reversible way so that at the end we can undo the initial transformation and return back to a list in the form we started with, but now in the right order.

In the Schwartzian's case, we take the original list of values and turn them into two-element array references containing the transformed value we'd like to sort by and the original value. In the key/value pair case, we take the original list in sets of two elements and bundle these keys and values into Pair objects. Each transformation can be easily undone, but before we undo it, we first pass the list through our required sort function, to obtain the order that we'd like.

What we have achieved then, is a way to create new sorting behaviours by just providing a pair of functions that let us use a regular sort. In the Schwartzian's case it simply gives us a more efficient way to achieve this sort by reducing the number of times we need to transform items into their sorting values, but in the pairsort case it lets us perform what previously we could not; as perl's native sort does not understand how to take items in key/value pairs from its input list. In both cases, we can write in a more generalised form, as:

my @output = UNTRANSFORM
             sort { COMPARE($a, $b) }

This works because the two functions of TRANSFORM and UNTRANSFORM are inverses of each other. We transform the input list from one type into a list of a different type, perform the sorting operation on that type instead, then undo that transformation and return back to our original type now in a new order.

There is of course nothing unique to strings or pairs, or indeed even sorting, in this pattern. It can apply to many other kinds of list-processing function, or even problems on other kinds of data. Whenever you have a problem that at first glance seems hard, it pays to consider whether the problem can be turned into one that is inherently easier by changing the values from one type into another, and then back again.


Printing function calls in Perl

Lately my favourite way to interpolate function calls into printed strings is to use the ${\ ... } notation for embedding arbitrary expressions into strings:

print "f($number) = ${\ fibonacci($number) }\n";

Other techniques include separate arguments:

print "f($number) = ", fibonacci($number), "\n";

or a helper variable:

my $result = fibonacci($number);
print "f($number) = $result\n";

or even printf:

printf "f(%d) = %d\n", $number, fibonacci($number);

Of all these techniques I tend to prefer either of the first two, because they lead to putting the expressions "in-line" with the rest of the text string, whereas in the latter two they sit elsewhere, making it harder to see at a glance what gets printed where. Especially with printf's positional arguments, it can be easy to be "off-by-one" with a large number of arguments, and put everything in the wrong place.

(This post was originally written to answer a question on StackOverflow.)


Perl module idea for making bitfields


I've been doing a lot of hardware IO work lately, which involves lots of talking to hardware devices, where there are byte-wide registers that store little bitfields, sometimes of individual and unrelated bits packed together. To make this easier I tend to write myself lots of pairs of functions to pack/unpack the fields in one of these bytes; for instance:

use constant {
      MASK_RX_RD      => 1<<6,
      MASK_TX_DS      => 1<<5,
      MASK_MAX_RT     => 1<<4,
      EN_CRC          => 1<<3,
      CRCO            => 1<<2,
      PWR_UP          => 1<<1,
      PRIM_RX         => 1<<0,

sub unpack_CONFIG
   my ( $config ) = @_;
      MASK_RX_RD  => !!( $config & MASK_RX_RD ),
      MASK_TX_DS  => !!( $config & MASK_TX_DS ),
      MASK_MAX_RT => !!( $config & MASK_MAX_RT ),
      EN_CRC      => !!( $config & EN_CRC ),
      CRCO        => $CRCOs[!!( $config & CRCO )],
      PWR_UP      => !!( $config & PWR_UP ),
      PRIM_RX     => !!( $config & PRIM_RX );

sub pack_CONFIG
   my %config = @_;
      ( $config{MASK_RX_RD}  ? MASK_RX_RD  : 0 ) |
      ( $config{MASK_TX_DS}  ? MASK_TX_DS  : 0 ) |
      ( $config{MASK_MAX_RT} ? MASK_MAX_RT : 0 ) |
      ( $config{EN_CRC}      ? EN_CRC      : 0 ) |
      ( ( _idx_of $config{CRCO}, @CRCOs )
         // croak "Unsupported 'CRCO'" ) * CRCO |
      ( $config{PWR_UP}      ? PWR_UP      : 0 ) |
      ( $config{PRIM_RX}     ? PRIM_RX     : 0 ) );

This convenient pair of functions bidirectionally converts bytes into key/value lists. As well as containing 6 simple boolean values, there's also an enumerated field, represented by the values in the array @CRCOs. These functions convert those too.

I'm getting tired of writing these pairs of functions. Hey, this is Perl right? :) I should get Perl to write them for me.

TL;DR - I propose the following

I'd like instead to write something like:

use bitfield;

bitfield CONFIG =>
   MASK_RX_DS  => boolfield(6),
   MASK_TX_DS  => boolfield(5),
   MASK_MAX_RT => boolfield(4),
   EN_CRC      => boolfield(3),
   CRCO        => enumfield(2, qw( values here )),
   PWR_UP      => boolfield(1),
   PRIM_RX     => boolfield(0);

The operation here is that 'bitfield' is automatically exporting a function called bitfield, along with the various *field() functions that create field definitions. boolfield() declares a single true/false boolean bit position, enumfield() declares a range of bits that give an enumeration. I guess also would be required intfield() to use a range of bits as an integer, and finally most likely customfield() taking a pair of conversion CODE refs or somesuch.

What does anyone think to that?


Released: Tickit 0.47

New up in Tickit 0.47, some fairly small and incremental updates:

  • Support the 'blink' terminal attribute

    Both in libtickit C library and Tickit perl module now support the blink attribute, much to my hesitation. ;)

    I'm not sure I want to encourage this sort of thing, but the Neovim project said they wanted this, so I've reluctantly added support for it all the same.

  • Bugfix for renderbuffer 'get*' methods

    When offset and clipping are applied, previously the get* methods didn't pay attention to this, fetching content relative to the toplevel, or segfaulting if requested out of bounds. This has now been fixed.

  • Tickit::Widget::HBox and ::VBox have now been moved to the Tickit-Widgets distribution.

    This dist is now linked to explicitly from the documentation. This supports the longterm goal of turning 'Tickit' into purely the window-layer downwards, and having all the widget support live in its own distribution, backed eventually by its own C library.

  • Nicer handling of fallback terminfo attributes for definitions missing them.

    Certain distributions of terminfo databases seem to be lacking certain essential attributes for some termtypes. Such examples as the upstream 'screen' terminfo is lacking erase_chars. libtickit now includes some fallback "likely to work" strings to handle these cases. This turns it from an instant failure on startup, to an at-worst wrong output, but hopefully most terminals should understand these standard strings. At least, if they don't I suspect libtickit is far from the only place that's broken, if they don't supply a more correct string in their terminfo.


Event Reflexivity - A Design Pattern Pattern?

The previous series of posts on the topic of Event Reflexivity each posed a question about what the general shape of the design pattern actually is. Over the following months after the posts I thought about the pattern a lot more and eventually came to the conclusion that a lot of the questions aren't a simple matter of choosing what's correct - the reason these questions are hard to answer is that both options could be correct. What I have here is in fact not a single unclear design pattern, but instead a whole family of possible choices on a design pattern, with different specifics being more useful to specific cases.

The specific design choices that a particular implementation takes should answer such questions as:

  • Ordering control: Are subscribers of some named action invoked in a controllable fashion?
  • Invocation functions: What kinds of actions or ways of invoking them are actually provided?
  • Heirarchial actions: Do the types of actions occupy a simple flat namespace, or is there some heirarchial structure to them? Can a subscriber catch an entire subspace of actions at once?
  • Explicit or implicit subscription: Do subscribers have to explicitly list every action (or action subspace) they wish to receive?
  • Filterable arguments: Does the implementation offer a built-in way to filter specific values of arguments by some pre-declared pattern?

Perhaps the only real design question for the Event-Reflexive Design Pattern Pattern is to decide on some neat concise language that implementations can use to explain their specific choices on these issues.


List::Util additions in Perl 5.20

For a while now, I have taken over maintaining List::Util and Scalar::Util, the utility modules that ship with core Perl. After a while of getting used to what's where, I've actually now started adding things to it again; mostly by surveying what's commonly used from some other utility modules, and bringing them in so they can be nicely implemented in XS for efficiency. Now that Perl 5.20 is out, all of these latest updates now ship with core perl.

From List::MoreUtils, I have taken the four shortcutting reduce-like boolean test functions of all, any, none and notall. These are all similar to grep, in that they take a block that evaluates some predicate test on each element of the list. Where they differ from grep, is that grep will count the total number of items in the list that returned true, whereas these four functions will simply indicate what the overall result was; allowing them to short-circuit as soon as the result is determined.

use List::Util 1.33 qw( any );

if( any { $_ == 0 } @numbers ) {
  say "The list of numbers includes zero";

As this module ships as part of the Perl core, it can reliably make use of the C compiler to build it, so most of the functions it contains are implemented in efficient XS code. Specifically these four also use an optimisation technique called MULTICALL, which improves the efficiency of functions of this form, where a given small block of code is repeatedly executed many times, with $_ set differently every time.

Another set of functions copied from elsewhere are the pair* functions taken and extended from List::Pairwise. These are all functions that interpret their list as an even-sized list of pairs, executing the code block with $a and $b set to the first and second value of each pair. This could be used to operate on regular perl hashes (by assigning keys to $a and their associated values to $b), though there is no requirement that it really be a hash. The functions will preserve the order of the pairs, and won't get upset if the "keys" are not plain strings, or not unique. As the result is also returned in a list of pairs, it could be assigned into a hash, or used elsewhere.

use List::Util 1.33 qw( pairgrep pairmap );

# Take a subset of a hash whose keys are ALLCAPS
my %capitals = pairgrep { $a =~ m/^[A-Z]+$/ } %hash

# Rename keys in a hash
my %renamed = pairmap { ($a =~ s/^foo_/bar_/r), $b } %hash

(This latter example also makes use of the perl 5.14 s///r flag, to return the result of the substitution instead of editing in place.)

use List::Util 1.33 qw( pairs );

foreach ( pairs %hash ) {
   # $_ will be a 2-element ARRAY ref
   say "$_->[0] has value $_->[1]";

As of version 1.39 (so a little too late to make it into perl 5.20.0, but still available on CPAN), pairs returns blessed array references that respond to methods called key and value (inspired by DCONWAY's Var::Pairs), as well as being accessible by array indexing.

use List::Util 1.39 qw( pairs );

foreach ( pairs %hash ) {
   say $_->key, " has value ", $_->value;

I have many more ideas for functions that could be added, though some care will need to be taken not to invent experimental ideas; but instead to take inspiration of tried-and-tested from CPAN, as all these have done, to bring into core and standardise existing ideas.

One other thing I have my sights set on is to implement further-optimised versions of at least some of the functions in Scalar::Util and List::Util as custom ops on perl 5.16 onwards. This will give them an even further performance boost, as they won't even be regular XS functions any more, so will completely remove the expensive call-time overhead of the ENTERSUB/LEAVESUB pair.

I should also add, that for a while now I've been a self-employed IT contractor, which has given me a lot more free time to be able to write such things as named above. If anyone is interested in supporting or sponsoring similar work on Perl, contact me by email. I'd be happy to give most reasonable Perl jobs a consideration. For that matter, I also work in C, or other languages, and I've even been known to build small-scale electronics projects.