2020/12/25

2020 Perl Advent Calendar - Day 25

Bonus Day!

Over this blog post series we have built up to the post on day 24, which explains that all of what we've seen this series is available and working in Perl, right now. It is the Perl we can write in 2020. All of this has been possible because of the custom keyword feature which was first introduced to Perl in version 5.14.

When Perl gained the ability to support custom keywords provided by modules it started down the path that CPAN modules would experiment with new language ideas. Already a number of such modules exist, and it is likely this idea will continue to develop. What new ideas might turn up in the next few years, and will any of them evolve to become parts of the actual core language?

Here's a collection of some thoughts of mine. Some of these can be implemented in CPAN modules, in the same way as the four modules we've already seen this series. Other ideas however go beyond what would be possible via keywords alone, and stray into the realm of ideas that really do need core Perl support.

Match/case Syntax

Perl 5.10 added the smartmatch operator, ~~. I think we can mostly agree it has not been the success many had been hoping for. Its rules are complex and subtle, and there are far too many of them to remember. Furthermore, it still doesn't answer the most basic question of whether scalar comparisons for equality are performed as string or number tests. For example, is the expression 5 ~~ "5.0" true or false? I honestly don't know, and the fact I'd have to look it up in a big table of behavior suggests that the thing has failed to achieve its goal.

Yet still we are left without a useful syntax to express control-flow dispatch based on comparing a given value to several example choices - a task for which many languages use keywords switch and case. I have already written to Perl5 Porters with my thoughts on a design I have nicknamed "dumb match", in response to this. The basic idea of dumb match is to make the programmer write down their choice of operator to be used to compare the given value with the various alternatives.

match($var : eq) {
    case("abc") { ... }
    case("def") { ... }
    
    case("ij", "kl", "mno") { ... }  # any of these will match
}

Here the programmer has specifically requested the eq operator, so we know these are stringy comparisons. Alternatively they could have requested any of:

match($var : ==) {
    # numerical comparisons
    case(123) { ... }
    case(456) { ... }
}

match($string : =~) {
    # regexp matches
    case(m/^pattern/)    { ... }
    case(m/morestring$/) { ... }  # only the first match wins
}

match($obj : isa) {
    # object class matches
    case(IO::Handle) { ... }
}
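
None of this syntax exists yet, but the intended semantics can be roughly emulated in plain Perl today. The following is only a sketch of the idea; the helper name dumb_match and its calling convention are invented here for illustration and are not part of any module or proposal:

```perl
use strict;
use warnings;

# Hypothetical table of the comparisons a caller may explicitly request.
my %COMPARE = (
    'eq' => sub { $_[0] eq $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '=~' => sub { $_[0] =~ $_[1] },
);

# Hypothetical helper: compare $value against each case using the named
# operator; the first case with any matching candidate wins.
sub dumb_match {
    my ($value, $op, @cases) = @_;
    my $cmp = $COMPARE{$op} or die "Unknown operator '$op'";

    # @cases is a flat list of [ candidates... ] => sub { ... } pairs
    while (my ($candidates, $code) = splice @cases, 0, 2) {
        return $code->() if grep { $cmp->($value, $_) } @$candidates;
    }
    return;
}

my $result = dumb_match("kl", 'eq',
    [ "abc" ]             => sub { "first" },
    [ "ij", "kl", "mno" ] => sub { "second" },
);
print "$result\n";
```

Because the operator is named at the call site, there is no table of rules to remember: the reader can see that "kl" is being compared as a string.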

Type Assertions

Various people have at various times written about or designed all sorts of variations on a theme of a "type system" for Perl. I have written reactions to some of those ideas before.

The idea I have in mind here is less a feature in itself, and more a piece of common ground for several of the other ideas, though it may have applications to existing pieces of Perl syntax. Common to several ideas is the need to be able to ask, at runtime, whether a given value satisfies some classification criteria. People often bring up thoughts of assertions like "this is a string" or "this is an integer" at the start of these discussions, but that isn't really within the nature or spirit of what Perl's value system can answer. Instead, I think any workable solution would be written in terms of the existing kinds of comparisons.

Perl 5.32 added the isa operator - a real infix operator that asks if its first operand is an object derived from the class given by its second.

if($arg isa IO::Handle) {
    ...
}

This is certainly one kind of type assertion. I could imagine a new keyword, for the sake of argument for now let's call it is*, which can answer similar yes/no questions on a broader category of criteria. It is likely that the righthand side argument would have to be some sort of expression giving a "type constraint", though exactly what that is I admit I don't have a neat design for currently.

*: Yes, I'm aware this operator choice would probably interfere with Test::More::is. Likely a solution can be found somehow, either by a better naming choice, better parser disambiguation, or a lexical feature guard.

It may be the case that generic type constraints can be constructed with an arbitrary Perl expression to explain how to test if a value meets the constraint:

type PositiveNumber is Numeric where { $_ > 0 };

While in general that would be the most powerful system, it may not lead to a very good performance for several of the other ideas here, so I am still somewhat on the fence about this sort of detail. Because I don't have a firm design on this yet, for the rest of this post I'm just going to give examples using the isa operator instead. But any of the examples or ideas would definitely apply to a more generalised type constraint operator or system, whenever one came to exist.
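
To make the "arbitrary expression" idea a little more concrete, here is a minimal sketch in plain Perl that treats each constraint as a named predicate which may refine a parent constraint. The names declare_type and value_is, and the registry hash, are assumptions made up for this illustration; they do not correspond to any existing module or to a settled design:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical registry of named constraints; each one is just a predicate.
my %TYPE;

sub declare_type {
    my ($name, $parent, $where) = @_;
    my $parent_check = defined $parent ? $TYPE{$parent} : sub { 1 };
    $TYPE{$name} = sub {
        my ($value) = @_;
        return 0 unless $parent_check->($value);
        local $_ = $value;          # the 'where' block tests $_
        return $where->() ? 1 : 0;
    };
}

sub value_is {
    my ($value, $name) = @_;
    my $check = $TYPE{$name} or die "No such type '$name'";
    return $check->($value);
}

declare_type(Numeric        => undef,     sub { looks_like_number($_) });
declare_type(PositiveNumber => 'Numeric', sub { $_ > 0 });

print value_is(5,     'PositiveNumber') ? "yes" : "no", "\n";
print value_is(-2,    'PositiveNumber') ? "yes" : "no", "\n";
print value_is("abc", 'PositiveNumber') ? "yes" : "no", "\n";
```

Every check here costs at least one function call per level of refinement, which hints at the performance concern mentioned above.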

In any case, once a generic is operator exists for testing type constraints, it feels natural to allow that in match/case syntax too:

match($value : is) {
    case(PositiveNumber) { ... }
    case(NegativeNumber) { ... }
}

In addition it would be wanted in function and method signatures:

method exec($code is Callable)
{
    ...
}

And also object slot variables:

class Caption
{
    has $text is Textual;
    ...
}

Multiple Dispatch

Another idea that comes once you have assertions is the idea of hooking that into function dispatch itself. Some languages give you the ability to define the same-named function multiple times, with different kinds of assertion on its arguments, and at runtime the one that best matches the given arguments will be chosen. There are usually many rules and subtleties to this idea, so it may not ultimately be very suitable for Perl, but if a constraint system did exist then it would be relatively simple to write a CPAN module providing a multi keyword to allow these.

multi sub speak($animal isa Cow) { say "Moo" }

multi sub speak($animal isa Sheep) { say "Baaah" }

Naturally this syntax ought to be implemented in a way that means it still works with method and async as well, allowing us to just as easily write:

async multi method speak_to($animal isa Goose)
{
    await $self->say("Boo", to => $animal);
}
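
No multi keyword exists at present. As a sketch of the runtime behaviour it might have, the dispatch can be emulated in plain Perl with a table of candidates. The table layout and the two toy classes are invented for this example, and it simply takes the first matching candidate rather than attempting any "best match" rules:

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util qw( blessed );

package Cow   { sub new { bless {}, shift } }
package Sheep { sub new { bless {}, shift } }

package main;

# Hypothetical candidate table for 'speak': each entry pairs a class
# constraint with the body to run when the argument satisfies it.
my @speak_candidates = (
    [ Cow   => sub { "Moo" } ],
    [ Sheep => sub { "Baaah" } ],
);

sub speak {
    my ($animal) = @_;
    for my $candidate (@speak_candidates) {
        my ($class, $body) = @$candidate;
        return $body->($animal)
            if blessed($animal) && $animal->isa($class);
    }
    die "No candidate of speak() matches the given arguments\n";
}

say speak(Cow->new);
say speak(Sheep->new);
```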

Signature-like List Assignment

Perl 5.20 introduced signatures, which can be imagined as a neatening up of the familiar syntax of unpacking the @_ list into variables. In some ways the following two functions could be considered identical:

sub add_a
{
    my ($x, $y) = @_;
    return $x + $y;
}

sub add_b($x, $y)
{
    return $x + $y;
}

This does however brush over a few more subtle details of signatures. Firstly, signatures are more strict on the number of values they receive vs. how many they were expecting. While this is a useful feature, it seems odd that Perl now lacks any syntax for performing a list unpack and checking that it has exactly the right number of elements in any situation other than the arguments from function entry.

For that task, I could imagine an operator maybe spelled := which acts exactly the same as a signature on a function:

my ($x, $y) := (1, 2);

my ($x, $y) := (1, 2, 3);  # complains about too many values
my ($x, $y) := (1);        # complains about not enough values

Of course, there's more to signatures than simply counting the elements. Signatures permit a default value to be used if the caller did not specify it; we could allow that too:

my ($one, $two, $three = 3) := (1, 2);
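
Until such an operator exists, much the same checking can be borrowed from signatures themselves by passing the list through an anonymous signatured function that is called immediately. This is a workaround sketch rather than a recommendation, and it needs a Perl with the signatures feature (5.20 or later):

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# The anonymous sub applies the usual arity checks to any list.
my ($x, $y) = sub ($p, $q) { ($p, $q) }->(1, 2);
print "x=$x y=$y\n";

# Defaults work in the same way as for a named function.
my ($one, $two, $three) = sub ($p, $q, $r = 3) { ($p, $q, $r) }->(1, 2);
print "three=$three\n";

# Too many values is a runtime exception, caught here with eval.
my $ok = eval { my @got = sub ($p, $q) { ($p, $q) }->(1, 2, 3); 1 };
print "three values into two: ", ($ok ? "accepted" : "rejected"), "\n";
```

The noise of repeating every variable name is a fair demonstration of why dedicated syntax would be nicer.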

If signatures gain features like type assertions then it seems natural to apply them to the signature-like list assignment operator as well, allowing that to check also:

my ($item isa Item, $group isa Group) := @itemgroup;

If key/value unpacking of named arguments arrives then that too would be useful for unpacking a hash:

my (:$height, :$width) := %params;

Twigils

The slot variables introduced by Object::Pad are written the same as regular lexical variables. I have for a while wished them to be distinct from regular lexicals, so they stand out better visually. The $: syntax can easily be made available, allowing them to be written with that instead:

class Point
{
    has $:x = 0;
    has $:y = 0;
    
    method describe($name) {
        say "Hello $name, this point is at ($:x, $:y)";
    }
}

I accept this is a much more subjective idea than most of the other features. Personally I find it helps to visually distinguish object slots, now that they don't have such notation as $self->{...} to remind you.

True Core Implementations

As mentioned earlier, some of these ideas can be implemented as CPAN modules (those introduced by new keywords), but others (such as the := operator) would require core Perl support. It would also be nice to see some of the more established and stable CPAN keyword modules implemented in core Perl as true syntax as well.

It would be great if, in 2025, we could simply

use v5.40;    # or maybe it will be use v7.x by then

try { ... }
catch ($e) { ... }

class Calculator {
    method add($x, $y) { ... }
}

Having these available to the core language would hopefully mean that a lot more code would more quickly adopt them as features. While these things are all available as CPAN modules, and work even on historic Perl versions as far back as 5.16 from 2012, it seems that some people don't want to make use of such syntax features unless they are provided by the core language itself. Moving the implementation into core may help for other reasons too, such as efficiency of operation, or allowing them to gain yet more abilities not available to them while they are third-party modules.

All in all, it's something we can hope for over the next five years...

2020/12/24

2020 Perl Advent Calendar - Day 24

Over the course of this blog post series we have seen a number of syntax-providing modules from CPAN. Each of them sets out to neaten up some specific kind of structure often found in Perl code.

  • Future::AsyncAwait aims to neaten up asynchronous code flow, replacing older techniques like ->then method chaining and helper functions from Future::Utils with regular Perl syntax.
  • Syntax::Keyword::Try brings the familiar try/catch pattern for handling exceptions, replacing more manual techniques involving eval {} blocks and inspecting the $@ variable.
  • Object::Pad provides an entire set of syntax keywords for managing classes of objects, allowing stateful object-oriented code to be neatly written without the risk of things like hash key collisions on $self->{...}.

Each one of these allows writing shorter, neater code that has less "machinery noise". With fewer distractions in the code it becomes clearer to see the detail of the specific situation the code is for. With less code to write there's less opportunity to introduce bugs.

Moreover we have seen that these syntax modules can be combined together, used in conjunction to allow even greater benefits. We saw on day 4 that try/catch control flow works within async sub, on day 22 that object methods can be marked as asynchronous with async method, and on day 23 we explored how the dynamically assignment syntax can be combined with objects, asynchronous functions, and even both at the same time.

The various code examples we've seen over the past 22 days or so have been written using these syntax modules, and also make use of Perl's signatures feature, and other things where possible, all to help in this regard. The shorter neatness that comes from not needing to write the line (or two) of code to unpack the function's arguments from @_ (and maybe the $self method invocant as well) removes yet another distraction and potential source of errors.

In summary: This series has been about what it feels like to write Perl code in the year 2020 - it has been about 2020 Perl. This is a language just as flexible and adaptable as Perl has ever been, yet still capable of any of the modern techniques common to other languages, which perhaps even the Perl of five or ten years ago was lacking in - neat function arguments, asynchronous control, exception handling, and syntax for object orientation. With all these new abilities, 2020 has been a great year for writing Perl code.

2020/12/23

2020 Perl Advent Calendar - Day 23

For today's article, I'd like to take a look at yet another of my syntax-providing CPAN modules, Syntax::Keyword::Dynamically. This provides a single new keyword, dynamically. To quote its documentation:

Syntactically and semantically it is similar to the built-in perl keyword local, but is implemented somewhat differently to give two key advantages over regular local:
  • You can dynamically assign to lvalue functions and accessors.
  • You can dynamically assign to regular lexical variables.

This is important to us when working with Object::Pad because of the way slot variables work. Within a method body a slot looks like a regular lexical variable. This means that Perl's regular local keyword refuses to interact with one. If we want to assign a new value temporarily, only for the duration of one block of code and have it restored automatically afterwards, we must use dynamically instead.

For example, both Syntax::Keyword::Dynamically and Object::Pad contain a copy of a unit test which asserts that their interaction works as expected:

has $value = 1;
method value { $value }

method test
{
    is $self->value, 1, 'value is 1 initially';

    {
        dynamically $value = 2;
        is $self->value, 2, 'value is 2';
    }

    is $self->value, 1, 'value is 1 finally';
}

If instead we were to try this using core Perl's local it fails to compile:

...
    {
        local $value = 2;
        ...

$ perl -c example.pl
Can't localize lexical variable $value at ...

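The refusal of local, and the save-and-restore behaviour that has to be provided some other way, can both be seen without either module installed. The second half below is only a hand-written emulation for a single block of synchronous code; it is not how dynamically is implemented, and it makes no attempt at the suspend and resume handling needed for asynchronous functions:

```perl
use strict;
use warnings;

# local refuses lexicals. This is a compile-time error, so the code is
# compiled from a string here in order to observe it.
my $compiled = eval q{ my $value = 1; { local $value = 2; } 1 };
print "local on a lexical: ", ($compiled ? "compiled" : "refused"), "\n";
print $@ unless $compiled;

my $value = 1;
sub value { $value }

{
    # Manual emulation: save, assign, run, then always restore.
    my $saved = $value;
    $value = 2;
    my $ok  = eval { print "inside: ", value(), "\n"; 1 };
    my $err = $@;
    $value = $saved;
    die $err unless $ok;
}

print "after: ", value(), "\n";
```
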
When a variable is dynamically assigned a new value inside an asynchronous function it has to be swapped back to its original value while that function is suspended, and its new value put back when the function resumes. This may have to happen several times before the function eventually returns. The way that dynamically is implemented means it is supported by Future::AsyncAwait and can detect the times it needs to swap values back and forth.

There is also a unit test which checks this interaction in both Syntax::Keyword::Dynamically and Future::AsyncAwait:

my $var = 1;

async sub with_dynamically
{
    my $f = shift;

    dynamically $var = 2;

    is $var, 2, '$var is 2 before await';
    await $f;
    is $var, 2, '$var is 2 after await';
}

my $f1 = Future->new;
my $fret = with_dynamically( $f1 );

is $var, 1, '$var is 1 while suspended';

$f1->done;
is $var, 1, '$var is 1 after finish';

Given these three modules are now known to be working nicely in each of the three pairwise combinations, you might wonder if all three can be combined at once - can you dynamically change the value of an object slot during an async method? The answer is still yes.

All three of these module distributions contain a copy of a unit test which checks this behaviour:

class Logger {
    has $_level = 1;

    method level { $_level }

    async method verbosely {
        my ( $code ) = @_;
        dynamically $_level = $_level + 1;
        is $self->level, 2, 'level is 2 before code';
        await $code->();
        is $self->level, 2, 'level is 2 after code';
    }
}

my $logger = Logger->new;

my $f1 = Future->new;
my $fret = $logger->verbosely(async sub {
    is $logger->level, 2, 'level is 2 before await';
    await $f1;
    is $logger->level, 2, 'level is 2 after await';
});

is $logger->level, 1, 'level is 1 outside';

$f1->done;

is $logger->level, 1, 'level is 1 finally';

Each of these syntax modules has provided something useful on its own, but as we have seen both yesterday and today they can be combined with each other to provide even more useful behaviours. It is easily possible to create CPAN modules that operate together to extend the Perl language with new syntax and semantics, and have those extensions work and feel every bit as convenient and powerful as all of the native syntax built into the language.

2020/12/22

2020 Perl Advent Calendar - Day 22

We started off this advent calendar series looking at the async/await syntax provided by Future::AsyncAwait, and the way that functions can be marked as async. More recently we have been looking at the class and object syntax provided by Object::Pad, such as syntax to provide named methods. Some of you may be wondering whether these two things can be combined; whether methods can be marked as being asynchronous. The answer is yes.

The way that these two modules are implemented means that they can cooperate on how functions are parsed. The end result is that a method can be declared using the combined keywords async method and it behaves exactly as expected. Namely, that $self and the class's slot variables are available within the code, it returns a future-wrapped value, and permits the await keyword.

For example, back on day 6 we saw an example of await with a //= shortcircuit expression to optionally wait for a read operation to fill a cache on an object, implemented with a $self->{...} key inside async sub. At the time I said that the example was slightly reworded from the original code. That is because in reality, the code is implemented using the combination of async and method:

use Object::Pad;
use Future::AsyncAwait;

class Device::Chip::TSL256x extends Device::Chip;

...

has $_TIMINGbytes;

async method _cached_read_TIMING ()
{
    return $_TIMINGbytes //= await $self->_read(REG_TIMING, 1);
}

In fact, almost every post after that also had some code taken from modules that are implemented using async method. In each case, the real code was in fact shorter and more concise than the posted example because it did not have to start with the my $self = shift; line initially, and could use the shorter slot variables instead of hash key accesses on $self->{...}.

These two syntax modules - either individually or in combination - are able to greatly neaten a lot of common code patterns. To see just how much they provide, here is how the method above might have been written if neither syntax module was used:

sub _cached_read_TIMING
{
    my $self = shift;

    return Future->done($self->{TIMINGbytes})
        if defined $self->{TIMINGbytes};
    
    return $self->_read(REG_TIMING, 1)->then(sub {
        ($self->{TIMINGbytes}) = @_;
        return Future->done($self->{TININGbytes});
    });
}

In this version of the code it is far less obvious to see the flow of the logic. The caching behaviour of the TIMINGbytes field is harder to see, hidden by the various machinery of the future return value and ->then chaining. Additionally, the $self->{TIMINGbytes} field is referred to four times here - each one being just a hash key, and thus prone to typos. Sure there are techniques to help detect such problems with classical Perl hash-based objects (such as locked hashes), but those all detect runtime attempts to actually touch the fields; none of them are able to point out problems at compiletime.

Such an error would be detected at compiletime using an Object::Pad-based slot variable:

has $_TIMINGbytes;

async method _cached_read_TIMING {
    return $_TININGbytes //= await $self->_read(REG_TIMING, 1);
}

$ perl -c example.pl
Global symbol "$_TININGbytes" requires explicit package name
  (did you forget to declare "my $_TININGbytes"?) at ...
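
The same contrast can be reproduced in plain Perl, using an ordinary my variable to stand in for the slot. This is an illustrative sketch only: it shows a misspelled hash key passing silently, a locked hash (via the core Hash::Util module) catching it at runtime, and a misspelled lexical being rejected when the code is compiled:

```perl
use strict;
use warnings;
use Hash::Util qw( lock_keys );

# A misspelled hash key is only data, so strict cannot object to it.
my $self = { TIMINGbytes => "\x02" };
my $got  = $self->{TININGbytes};
print "plain hash: ", (defined $got ? "defined" : "undef"), "\n";

# A locked hash does catch it, but only when the line actually runs.
lock_keys(%$self);
my $ok = eval { my $v = $self->{TININGbytes}; 1 };
print "locked hash: ", ($ok ? "no error" : "runtime error"), "\n";

# A misspelled lexical is rejected as soon as the code is compiled.
my $compiled = eval q{ my $TIMINGbytes; $TININGbytes = 1; 1 };
print "lexical: ", ($compiled ? "compiled" : "compile error"), "\n";
```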

By the way, did anyone spot the typo in the long example code above? I didn't, the first time I wrote it... ;)

2020/12/21

2020 Perl Advent Calendar - Day 21

So far we've been looking at features of some syntax modules that are relatively well-established - Future::AsyncAwait has a couple of years of production battle-testing against it, and even Object::Pad's basic class features have been found to be quite stable over the past six months or so. For today's article I'd like to take a slightly different direction and take a look at something much newer and still under experimental design.

Some object systems which use inheritance to create derived classes out of base ones (including the base system in Perl itself) support the idea that a given class may have multiple bases. This is called Multiple Inheritance. Initially it may sound like a useful feature to have, but in practice trying to support it makes implementations of object systems more complicated, and can lead to situations where the choice of correct behaviour is non-obvious, or in some cases conflicting with what may seem sensible. Situations get especially complicated if the same partial class appears multiple times in the inheritance hierarchy leading up to a given class.

For this reason most modern object systems, including Object::Pad, do not support multiple inheritance, to keep behaviours simpler. In order to try to provide the same useful properties (that of being able to share code from multiple component classes), they provide a somewhat different idea, called roles. A role can be considered similar to a partial class which can be merged into a real class. A role can provide methods, BUILD blocks, and slot variables. In many ways a role appears the same as a class, except that instances of it cannot be directly created. To be used as an instance a role must be applied to a class. This has the effect of copying all of the pieces of that role into the target class.

For example, in the Tickit-Widget-Menu distribution there are two different classes of object that can appear in a menu - an individual menu item, or a submenu. In order to avoid code duplication by copying parts of the implementation around both classes, the common behaviours are implemented in a role, by using the role keyword:

use Object::Pad 0.33;

role Tickit::Widget::Menu::itembase;

has $_name;

BUILD (%args)
{
    $_name = $args{name}
}

...

To apply this role to both of the required classes each uses the implements keyword on its class statement to copy the components of that role into the class:

use Object::Pad 0.33;

class Tickit::Widget::Menu::Item
    implements Tickit::Widget::Menu::itembase;
...

class Tickit::Widget::Menu::base
    implements Tickit::Widget::Menu::itembase;
...

Superficially this might feel like it suffers the same problems as multiple inheritance, but keep in mind that applying a role is basically just a fancy form of copy-pasting the code into the class. There is no runtime lookup of methods or other class items whenever they are accessed. The parts of a role are simply copied individually into the class that applies it. This means that any naming conflicts are detected as errors at compile-time, alerting the programmer to the potential problem:

use Object::Pad 0.33;

role R
{
    method collides() {}
}

class C implements R
{
    method collides() {}
}

$ perl example.pl
Method 'collides' clashes with the one provided by role R at ...

A program will only successfully compile if there are no naming collisions. As a result of this, and because the pieces of the role are simply copied into a class, it means that it does not matter in what order individual roles are applied to a class, nor does it matter if the same role is applied multiple times within the hierarchy (e.g. if both a class and its base class tried to apply the same role). The end result is always the same, presuming no conflicts. This compiletime check, and flexibility on ordering and duplicate application, helps to ensure more robust code.
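
The "fancy copy-paste" model is simple enough to sketch in plain Perl with symbol table manipulation. The apply_role helper below is hypothetical and much cruder than what Object::Pad does (it copies methods only, and it detects clashes at runtime rather than at compile time), but it shows why the result does not depend on ordering or on repeated application:

```perl
use strict;
use warnings;

# Hypothetical helper: copy every sub from the role package into the
# class package, refusing to overwrite a different existing method.
sub apply_role {
    my ($role, $class) = @_;
    no strict 'refs';
    for my $name (sort keys %{"${role}::"}) {
        my $code = *{"${role}::${name}"}{CODE} or next;
        if (defined &{"${class}::${name}"}) {
            next if \&{"${class}::${name}"} == $code;  # already applied
            die "Method '$name' clashes with the one provided by role $role\n";
        }
        *{"${class}::${name}"} = $code;
    }
}

package R  { sub collides { "from R" }  sub extra { "extra from R" } }
package C1 { sub own      { "from C1" } }
package C2 { sub collides { "from C2" } }

package main;

apply_role(R => 'C1');
apply_role(R => 'C1');      # applying the same role again is harmless
print C1->extra, "\n";

my $ok = eval { apply_role(R => 'C2'); 1 };
print $ok ? "applied\n" : "error: $@";
```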

2020/12/20

2020 Perl Advent Calendar - Day 20

We have now seen the way that the has keyword creates a new kind of variable, called a slot variable, where object instances can store their state values. All of the code in yesterday's examples creates variables that begin, like a new my variable, as the undefined value. Often though with an object instance we want to store some other value initially. For this there are two options available.

In simple cases where slot variables of any new object should start off with the same default value we can use an expression on the has statement itself to assign a default value. In these two examples, the slot is initialised from a simple constant.

class Device::Chip::AD9833 extends Device::Chip;

has $_config = 0;

class Tickit::Widget::LinearSplit
    extends Tickit::ContainerWidget;
    
has $_split_fraction = 0.5;

These are compiletime constants, though any form of expression is allowed here. Note, however, that much as with a my or our variable in the scope of an entire package or class, any expression is evaluated just once, at the time the class itself is first created. The resulting value is stored as the default for every new instance. This expression is not evaluated for each new instance individually. Thus it is rare in practice to see anything other than a constant here. For example, using an expression that created some new helper object would mean that all new instances of the containing class will share the same reference to the same helper object - unlikely to be what was intended.
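
The shared-reference pitfall is the same one that exists in classical Perl whenever a default is computed once and reused. As a sketch using ordinary hash-based classes (the two class names are made up for this example), compare a default evaluated once with one evaluated per constructor call:

```perl
use strict;
use warnings;

package Shared {
    my $default_list = [];      # evaluated once, when the class is set up
    sub new  { my $class = shift; bless { list => $default_list }, $class }
    sub list { $_[0]{list} }
}

package PerInstance {
    # the [] here is evaluated afresh on every call to the constructor
    sub new  { my $class = shift; bless { list => [] }, $class }
    sub list { $_[0]{list} }
}

package main;

my ($s1, $s2) = (Shared->new, Shared->new);
push @{ $s1->list }, "surprise";
print "shared: ", scalar @{ $s2->list }, "\n";

my ($p1, $p2) = (PerInstance->new, PerInstance->new);
push @{ $p1->list }, "mine";
print "per-instance: ", scalar @{ $p2->list }, "\n";
```

Pushing onto the first Shared object's list changes what the second one sees, because both hold the same array reference.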

For more complex situations which require code to be evaluated for every new instance of a class we can use a BUILD block. This provides a block of code which is run as part of the construction process for every individual instance of the class. For example, this BUILD block allows us to create a new mutex helper instance for every instance of the containing class:

class Device::Chip::LEO1306
    extends Device::Chip::Base::RegisteredI2C;

use Future::Mutex;

has $_mutex;

BUILD
{
    $_mutex = Future::Mutex->new;
}

The BUILD block is basic syntax, similar to Perl's own BEGIN block for instance. People familiar with object systems like Moo and Moose especially should take note - a BUILD block is not a method. It does not take the sub or method keyword, and it cannot be called like one.

Whenever a new instance is constructed, the BUILD block is passed a copy of the argument list given to the constructor. A common task is to set slot variables from those, perhaps applying defaults if values weren't specified. It is also a common style in Perl for constructor arguments to be passed in an even-sized key/value list, so they can be easily unpacked as a hash variable. This makes it simple for BUILD blocks to inspect the named keys they're interested in. Despite not being a true method, a BUILD block still permits a signature to unpack its arguments as if it were one.

class Device::Chip::CC1101 extends Device::Chip;

has $_fosc;
has $_poll_interval;

BUILD (%opts)
{
    $_fosc          = $opts{fosc} // 26E6;
    $_poll_interval = $opts{poll_interval} // 0.05;
}

There is still much ongoing design work here. It turns out in practice that a large majority of the code in BUILD blocks is something like this form - a series of lines, each setting a slot variable from one constructor argument.

There may be value in having Object::Pad provide a convenient way to let each slot variable declaration specify how it should be initialised from named constructor arguments. This would help keep the code less cluttered by the low-level machinery, and allow additional features such as error checking by rejecting unrecognised key names. This would, however, involve Object::Pad specifying that constructor arguments must be in named argument pairs, which it currently does not.
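
To illustrate the kind of behaviour being considered, here is a sketch in classical hash-based Perl of a constructor driven by a table of named arguments and defaults, which also rejects unrecognised keys. The class name Radio and the %DEFAULTS table are assumptions for this example; nothing here reflects an actual or planned Object::Pad interface:

```perl
use strict;
use warnings;

package Radio {
    # Hypothetical declaration table: argument name => default value
    my %DEFAULTS = ( fosc => 26E6, poll_interval => 0.05 );

    sub new {
        my ($class, %opts) = @_;
        my $self = bless {}, $class;
        for my $name (sort keys %DEFAULTS) {
            $self->{$name} = delete($opts{$name}) // $DEFAULTS{$name};
        }
        # Anything left over was not declared, so treat it as a mistake.
        die "Unrecognised constructor argument(s): "
            . join(", ", sort keys %opts) . "\n" if %opts;
        return $self;
    }
}

package main;

my $radio = Radio->new(poll_interval => 0.1);
print "$radio->{fosc} $radio->{poll_interval}\n";

my $ok = eval { Radio->new(fsoc => 27E6); 1 };   # note the misspelled key
print $ok ? "accepted\n" : "rejected: $@";
```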

2020/12/19

2020 Perl Advent Calendar - Day 19

We have already discussed that the most fundamental property of object-oriented programming is the idea that a collection of state can be encapsulated into a single piece, and given behaviours that operate on the state. In yesterday's article we saw how to create new classes of object (with the class keyword), and how to add behaviours (with the method keyword). Today we'll take a closer look at the other half of this - how to add state.

While the word "method" seems to be fairly well entrenched, various object systems across various languages have a variety of different words to describe the state values stored for each given instance. The word "field" has been used in Perl before, and refers specifically to the now-obsolete fields pragma. Sometimes programmers refer to "attributes" of an object, but in Perl this is also an overloaded term referring to the :named annotations that can be applied to functions or variables. In Object::Pad the per-instance state is stored in variables called "slots".

Within a class, slots are created by the has keyword. This looks and feels similar to the my and our keywords. It introduces a new variable, optionally initialised with the value of an expression. Whereas a my or our variable is visible to all subsequent code (including nested functions) within its scope, a has variable is only visible within functions declared as method, because it will be associated with individual instances of the object class.

In this example the slot variables storing the label and click behaviour are available within any method:

class Tickit::Widget::Button extends Tickit::Widget;

has $_label;
has $_on_click;

method label { return $_label; }

method set_label
{
    ($_label) = @_;
    $self->redraw;
}

method on_click { return $_on_click; }

method click
{
    $_on_click->($self);
}

In terms of visibility these slot variables behave much like other kinds of lexical variable - namely, they are not visible from outside the source of this particular class. This means that by default any such state variables are private to the class's implementation, inaccessible by other code that uses the class. We can choose to expose certain parts of it via the class's interface by providing these accessor methods, but we are not required to do so.

It is a common style in Object::Pad-based code to name the slot variables with a leading underscore, as in this example, as it helps them to stand out visually in larger code. It helps remind people that these are slot variables, because they now lack other visual signalling (such as $self->{...}) to otherwise distinguish them.

Another common behaviour is creating simple accessor methods that just return the value of a slot, thus deciding to expose that particular variable as part of the object's interface, visible to callers. So common in fact that Object::Pad provides a shortcut to create these accessor methods automatically:

class Device::Chip::SSD1306 extends Device::Chip;

has $_rows :reader;
has $_columns :reader;

# now the class has ->rows and ->columns methods visible

The :reader attribute requests that a simple accessor method is created to return the current value of the slot. It is named the same as the slot, with a leading underscore first removed to account for the common naming convention.

One key advantage that these variable-like slots have over classical Perl objects built on hash keys or data provided by accessor methods is that the names are scoped within just the class body that defines them. Names cannot collide with those defined by subclasses. This is even checked by one of Object::Pad's own unit tests, which defines a base class and a subclass from it that both have a slot called $data:

class Base::Class {
    has $data;
    method data { $data }
}
 
class Derived::Class extends Base::Class {
    has $data;
    method data { $data }
}

It then has some tests to check that each of these methods behaves differently. In particular, this provides the guarantee that classes can freely add, delete, or rename their own slot variables without risking breaking other related classes. This leads to more robust class definitions.
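
For contrast, here is a sketch of the collision that slot scoping avoids, written with classical hash-based objects. The class names mirror the test above but the code is invented for this illustration; both classes believe they own the data key of the shared hash:

```perl
use strict;
use warnings;

package Base::Class {
    sub new       { my $class = shift; bless { data => "base" }, $class }
    sub base_data { $_[0]{data} }
}

package Derived::Class {
    our @ISA = ('Base::Class');

    sub new {
        my $class = shift;
        my $self  = $class->SUPER::new;
        $self->{data} = "derived";   # silently overwrites the base's value
        return $self;
    }
    sub derived_data { $_[0]{data} }
}

package main;

my $obj = Derived::Class->new;
print $obj->base_data, " ", $obj->derived_data, "\n";
```

The base class's accessor now returns the subclass's value, with no warning from Perl that anything went wrong.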

2020/12/18

2020 Perl Advent Calendar - Day 18

Yesterday we took our first glance at some example code using Object::Pad. Today I'd like to continue with some more in-depth examples showing a few details of the new syntax provided. These will be real examples from actual code on CPAN.

The class keyword introduces a new package that will form a class. It creates the new package, much as Perl's existing package statement does, and additionally sets up the various Object::Pad-related machinery to have the new package be a proper class. It also makes the other new keywords available - method and has. As with package it supports setting the $VERSION of the new package by specifying a version number after the name. It also supports several new sub-keywords to further specify details about the class, such as a base class that it is extending (via the extends keyword).

Even though the class keyword acts the same as the package keyword, it isn't currently recognised by parts of CPAN infrastructure, such as the indexer which creates package-to-file indexes. As such, any module uploaded to CPAN still needs to have a package statement as well, to keep these tools happy. It's usual to find them both in combination:

use Object::Pad;

package Tickit::Widget::HBox 0.49;
class Tickit::Widget::HBox extends Tickit::Widget::LinearBox;

...

As with package, the class syntax can be used in either of two forms. It can set the prevailing package name for following declarations if used as a simple statement, or it can take a block of code surrounded by braces, in which case it applies just to the contents of that block. The first form is usually preferred for the toplevel class in a file, with the latter form being seen for internal "helper" classes within a file. For example, the Device::Chip::NoritakeGU_D module contains three small internal helper classes defined using a block:

class Device::Chip::NoritakeGU_D::_Iface::UART {
    use constant DEFAULT_BAUDRATE => 38400;

    has $_baudrate;

    ...
}

The class keyword was at least partly designed during the 2019 Perl 5 Hackathon event in Amsterdam, at which there was a similar idea for a module keyword. That has yet to be implemented anywhere, but a common theme to both ideas was that they would imply a more modern set of default pragma settings than default Perl begins with. After a class statement (or inside its block), the strict and warnings pragmas are applied, and on versions of Perl new enough to support it, the signatures feature is turned on and the indirect feature is turned off.

The method keyword adds a new function into the class namespace, much like sub does. The $self invocant parameter is handled automatically within the body of a method, meaning that a parameter signature or @_ unpacking code does not have to handle it specially. The code can totally ignore this and it will work correctly.

Because the signatures feature is automatically enabled on supported Perl versions, it makes method declarations inside classes particularly short and neat. For example, this from Tickit::Widget::Scroller:

method scroll ($delta, %opts)
{
    return unless $delta;
    
    my $window = $self->window;
    @_items or return;
    
    ...
}

Straight away we haven't needed to write the usual two lines of method setup code, of handling the $self variable and then unpacking the other arguments out of @_. As we have already seen with the use of async/await syntax, this method keyword helps reduce a lot of the "noise" of machinery out of the code, and lets us more clearly and easily see the domain-specific details inside it.
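
For comparison, this is roughly what those two lines of setup look like when written by hand in classical Perl. It is only a sketch using core Perl; Demo::Scroller and its position field are hypothetical stand-ins, not code from Tickit::Widget::Scroller.

```perl
use strict;
use warnings;

package Demo::Scroller {
    sub new {
        my ($class) = @_;
        return bless { position => 0 }, $class;
    }

    sub scroll {
        my $self = shift;             # setup line 1: take the invocant off @_
        my ($delta, %opts) = @_;      # setup line 2: unpack the remaining arguments

        return unless $delta;

        $self->{position} += $delta;
        return $self->{position};
    }
}

my $scroller = Demo::Scroller->new;
my $position = $scroller->scroll(3, redraw => 1);

print "Now at line $position\n";
```

The method keyword removes both of the commented lines, leaving only the part of the body that is specific to scrolling.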

2020/12/17

2020 Perl Advent Calendar - Day 17

For the past 16 days we've been looking at the subject of asynchronous programming, and how using async/await syntax as provided by the Future::AsyncAwait module leads to code that is much simpler and easier to read, as compared to other ways to achieve similar results. I now want to shift focus and take a look at an entirely different area - object-oriented programming.

Perl has supported object-oriented programming ever since version 5.000, though people tend to find the built-in mechanisms a little short on features. Over the years various CPAN modules have been created to fill in the missing pieces. Entire articles could be written just listing and comparing them, but Moo and Moose seem to be among the more commonly-used ones. Many of these systems are written in Perl, and thus to use them code has to be written entirely in existing Perl syntax. Even when some object systems end up being implemented in C for efficiency, they still require Perl syntax to operate them. This often leads to non-ideal behaviour.

Consider the most fundamental property of object systems: the idea that a collection of state can be bundled up into a convenient and encapsulated place, and given behaviours (which we call "methods") that can operate on that state. In classical Perl classes, we usually use a hash reference to store the state. Individual named keys can store fields of this state.

package Point;
use feature qw( say signatures );

sub new($class)
{
    return bless {
        x => 0,
        y => 0,
    }, $class;
}

sub move($self, $dx, $dy)
{
    $self->{x} += $dx;
    $self->{y} += $dy;
}

sub describe($self)
{
    say "A point at ($self->{x}, $self->{y})";
}

Here we have used the keys "x" and "y" inside this blessed hash reference to store state about the object instance. It's accepted convention that code outside of the object class's implementation should not interfere with these. Still, there is no enforcement of this separation, and no automation of the various parts of code that need to be written for basically any class - namely, things like the bless expression, or the $self argument of method functions.
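
To make the lack of enforcement concrete, here is a small sketch in core Perl. Demo::Point is a hypothetical cut-down variant of the class above, written without signatures so that it runs on older Perls too.

```perl
use strict;
use warnings;

package Demo::Point {
    sub new {
        my ($class) = @_;
        return bless { x => 0, y => 0 }, $class;
    }

    sub describe {
        my ($self) = @_;
        return "A point at ($self->{x}, $self->{y})";
    }
}

my $point = Demo::Point->new;

# Code outside the class reaches straight into the internal state.
# Nothing in the language stops it, or even warns about it.
$point->{x} = "not a number";

my $description = $point->describe;
print "$description\n";
```

The class has no opportunity to validate or reject the change; only convention keeps callers away from the hash keys.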

Object systems such as Moo or Moose have popularised the idea of a has statement, at the class level, which attempts to provide some automation around these kinds of object fields. These provide a certain amount of automation of tasks like instance constructors. But they don't add much overall convenience because they are limited to only working within existing Perl syntax, and that restricts the options available for accessing instance data. The usual style is to make internal state accessible via accessor methods.

package Point;
use feature qw( say signatures );
use Moo;

has "x", is => "rw", default => 0;
has "y", is => "rw", default => 0;

sub move($self, $dx, $dy)
{
    $self->x($self->x + $dx);
    $self->y($self->y + $dy);
}

sub describe($self)
{
    say "A point at (", $self->x, ", ", $self->y, ")";
}

This has helped in some ways (e.g. we didn't have to think about providing a constructor this time), but in other ways it feels like less of an improvement. Notably, because object fields no longer behave like regular Perl variables (as hash elements do), they can't be mutated by the convenient += operator in the move method, nor interpolated into a string in the describe method. Moreover, nothing here separates, or even suggests a difference between, the external interface of methods that users of this class should call, and the internal interface that these methods use to access the state fields directly. Users of this class are not prevented, or even discouraged, from calling $point->x on some instance, to either read or even modify a field. This does not encourage data encapsulation.

In an attempt to fix some of these shortcomings, Ovid has been working on a design called Cor. Along with this design I have been working on an implementation of it, as the CPAN module Object::Pad.

The aim of this design is to provide new syntax as real keywords, which is therefore able to do things that none of the previous generation of object systems could do. An important feature is the way that instance data is provided.

use feature 'say';
use Object::Pad;
class Point;

has $x = 0;
has $y = 0;

method move($dx, $dy)
{
    $x += $dx;
    $y += $dy;
}

method describe()
{
    say "A point at ($x, $y)";
}

This is close to ideal in terms of code size. We have expressed all the behaviours of the previous two examples, but with a minimum of extra "noise" of exposed machinery. We didn't need to provide a constructor method, or think about a bless expression. None of our methods have had to consider a $self - either in the list of arguments provided, or in using it to access the instance fields. The fields have been directly accessible as if they were lexical variables.

Over the next several posts, we will continue to explore this syntax module, and see its various features and advantages in more detail.

2020/12/16

2020 Perl Advent Calendar - Day 16

Over the past couple of weeks we've seen lots of syntax for managing asynchronous functions using the async and await keywords. While some things have been new (such as awaiting on a future returned by the needs_all or wait_any constructors), much of it has been the same as regular synchronous Perl syntax just with functions declared using async sub and called in await expressions. All of the usual forms of control flow - conditional if blocks, while and foreach loops, and so on - have been exactly the same.

This contrasts with the original blog post series from seven years ago which explained Futures as they looked before the creation of async/await syntax. That spent many days building up to the day 22 post which had a long list of examples of synchronous vs. future-based control flow, looking different on each side even though they were trying to do fundamentally the same logic. In yesterday's post we also saw some more complex code structures that were necessary to create an HTTP client, when async/await is not available.

Day 23 of the original series then summarized a lot of advantages in terms of new things that can be done, such as the concurrency constructors. These advantages still hold when using async/await, though now we can use the additional neatness of such syntax to make the code even more readable.

The entire series last time culminated in day 24's retrospective overview, parts of which I shall quote here:

Futures allow the control- and data-flow structure of a program to be inherently expressed together, describing the dependency relationships between individual operations.
This makes for convenient control-flow that coincides with data-flow; ensuring that the result of an operation is passed to the next operation in the sequence at the time it is executed.
[This is] in contrast to the split nature of other kinds of concurrency control, such as callback functions or locks and mutexes, which generally only manage the flow of control and require other techniques like lexical variables shared between multiple closures to provide the data flow. Such sharing of mutable state between domains of concurrency is the source of many kinds of concurrency bug which cannot happen with Futures.

When working with async/await syntax, all of these advantages remain because async and await are keywords that inherently deal with futures. What they give, via the basic mechanism of suspending and resuming a function around a pending future, is a way to still use regular Perl syntax to form recognisable code shapes. Asynchronous code written using async/await syntax is thus inherently more comprehensible by Perl programmers, because it follows existing patterns, while still allowing the extra abilities of concurrency and asynchronous behaviour provided by futures.

2020/12/15

2020 Perl Advent Calendar - Day 15

Yesterday we took a deeper dive into the insides of one asynchronous IO system, Future::IO, to get an impression of the sorts of things that go into creating it. Before we conclude our 2020 update on the concept of futures, I'd like to do one more deep dive into some real-world application code, to show some more details of the insides of an implementation. Today we'll take a look inside Net::Async::HTTP.

This module has existed on CPAN for a long time. It certainly predates Future::AsyncAwait and the mechanisms to provide parser plugin modules that allow async/await syntax. It even predates the Future module. In fact, the initial version of futures, CPS::Future, was built partly as an experiment to provide Net::Async::HTTP with a way to cancel pending requests. As a result of this history, a desire to keep the module as lightweight as possible in dependencies, and to support as many older Perl versions as possible, it does not use async/await syntax in its implementation. Therefore parts of the code are somewhat more convoluted and harder to read or follow than would be the case for a cleaner version built on more modern techniques.

Keep that in mind on today's exploration, as much of this will be a tour of what tricks are otherwise required, when async/await is not available. As with yesterday's overview of Future::IO, the point today is not necessarily to follow and understand what is going on, but rather to simply see the amount of work that is required, and the steps that are necessary, when writing large asynchronous systems without it.

The inner mechanism of Net::Async::HTTP which makes the whole thing work is actually in an internal class which represents a connection to one HTTP server, named Net::Async::HTTP::Connection. The actual method that starts the request/response cycle is called request. The full version is 150 lines long at current count, but for our purposes the basic outline is as follows.

sub request
{
    ...
    
    my $f = $self->loop->new_future;
    
    push @{$self->{request_queue}}, RequestContext(
        on_read => $self->_mk_on_read_header(...),
        f       => $f,
        ...
    );
    
    ...
    $self->write(join($CRLF, @headers) . $CRLF . $CRLF);
    $self->write($req->content);
    
    return $f;
}

This creates a new future instance, builds a context structure containing that (among other things), pushes this structure to an internal array, arranges for the request itself to be sent, and then returns the future instance to the caller, which propagates up to become the return value of the actual GET call. The context structure also captures many variables and state about the request process, because those will be needed later on when response data starts to arrive. Of particular importance is the on_read field, which stores a code reference for actually handling the response.

The response half of the process is a little more distributed around a few places. Since the connection class is a subclass of IO::Async::Stream it has an on_read event which invokes a method of the same name, which is the start of the control path here. Its job is relatively tiny, in that it mostly just has to invoke the current on_read handler code for the topmost item in the request queue.

sub on_read
{
    my $self = shift;
    my ($buffref, $closed) = @_;

    while(my $head = $self->{request_queue}[0]) {
        my $ret = $head->on_read->($self, $buffref, $closed, $head);
        ...
    }
}

Initially, the on_read handler code is set up from the start of the request for handling a response header. This will be the anonymous sub returned by the _mk_on_read_header method. The code there begins:

sub _mk_on_read_header
{
    my $self = shift;
    ...
    
    return sub {
        my ($self, $buffref, $closed, $ctx) = @_;
        ...
        unless($$buffref =~ s/(.*?$CRLF$CRLF)//s) {
            return 0;
        }
        my $header = HTTP::Response->parse($1);
        my $content_length = $header->content_length;
        ...
        
        if(defined $content_length) {
            $ctx->on_read =
                $self->_mk_on_read_length($content_length, ...);
            return 1;
        }
        
        ...
    };
}

This will continue to read bytes of response until it encounters the double-CRLF sequence that marks the end of the header. At that point it will need to decide what kind of handler is best for the response content - whether it should just read until the filehandle is closed, or a fixed amount of content, or use chunked encoding. This choice is made by inspecting various headers, then calling one of three more _mk_on_read_* methods to generate a new on_read handler to put back into the context structure.

This part needs emphasising again, as it is the important central part of this whole "request context" technique. The anonymous function returned by _mk_on_read_header is only for reading the header part of the response. Once it has determined the appropriate way to read the body content of the response, it creates a new anonymous function for that instead, and stores it in the on_read field of the request context. This piece of mutable data, in effect, stores the running state of this particular request/response cycle. Having stored it there, it just returns to the main on_read method, which will repeat its operation, now with the new handler in place to read the response body content.
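
The handler-swapping idea can be shown in isolation with a much smaller sketch in core Perl. Everything here is a simplified assumption made for illustration: the function names make_header_reader, make_body_reader and feed are hypothetical, the context is a plain hash rather than a structure object, a Content-Length header is assumed to always be present, and a finished cycle is marked by clearing the handler. It is not the code from Net::Async::HTTP.

```perl
use strict;
use warnings;

my $CRLF = "\x0d\x0a";

# Handler for the body: consume bytes until a fixed count has been received
sub make_body_reader {
    my ($remaining) = @_;
    my $body = "";

    return sub {
        my ($buffref, $ctx) = @_;

        my $chunk = substr($$buffref, 0, $remaining, "");  # take at most $remaining bytes
        $remaining -= length $chunk;
        $body .= $chunk;

        if($remaining == 0) {
            $ctx->{result}  = $body;
            $ctx->{on_read} = undef;   # no handler left; this cycle is finished
        }
        return 0;                      # wait for more data to arrive
    };
}

# Handler for the header: once it is complete, replace ourselves in the context
sub make_header_reader {
    return sub {
        my ($buffref, $ctx) = @_;

        $$buffref =~ s/\A(.*?)$CRLF$CRLF//s or return 0;   # header not complete yet
        my $header = $1;
        my ($length) = $header =~ m/^Content-Length:\s*(\d+)/mi;

        $ctx->{on_read} = make_body_reader($length);       # the mutable "running state"
        return 1;                                          # ask to be invoked again at once
    };
}

# Plays the role of the on_read method: keep invoking whatever handler is current
sub feed {
    my ($ctx, $buffref) = @_;
    while(my $handler = $ctx->{on_read}) {
        last unless $handler->($buffref, $ctx);
    }
}

my %ctx    = ( on_read => make_header_reader() );
my $buffer = "";

# The response arrives split at arbitrary points, as it would from a socket
for my $packet ("HTTP/1.1 200 OK${CRLF}Content-", "Length: 5${CRLF}${CRLF}he", "llo") {
    $buffer .= $packet;
    feed(\%ctx, \$buffer);
}

print "Body: $ctx{result}\n";
```

Notice that nothing in feed knows which phase the response is in. That knowledge lives entirely in whichever code reference currently occupies the on_read slot.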

For example, the handler for reading a fixed length of content (from the Content-Length header) will maintain a count of the number of bytes remaining, counting down to zero:

sub _mk_on_read_length
{
    my $self = shift;
    my ($content_length, ...) = @_;
    
    return sub {
        my ($self, $buffref, $closed, $ctx) = @_;
        
        my $content = $$buffref;
        $$buffref = "";
        
        $content_length -= length $content;
        $response->add_content($content);
        
        if($content_length == 0) {
            $ctx->f->done($response);
        }
        
        return 0;
    };
}

Once the required amount of content has been received, this loop will finally finish. The response object, having finally been constructed and filled with content, is passed to the done method of the future in the request context object. This is the one that the original request method first created at the start of the cycle.

We've now taken a very brief, and very summarized tour of the code responsible for a single request/response cycle in the HTTP client. These code examples have been simplified, and many details have been elided, such as content encoding or handling of streaming responses that don't have to store the full content in memory. Already we have seen a number of layers of code, and encountered places where the running state is stored in explicit data structures, rather than being implied by the position of execution within the code itself.

In summary - there's a lot of complex code here, all loosely connected together by passing around a context object. That context object stores a bunch of state - some of it mutable and changing over time - relating to this particular request/response cycle. The context object takes the place of simply storing the state of the program in normal lexical variables, as would be the case for synchronous code, or asynchronous code when using the async/await syntax. Lacking that syntax, we have had to reconstruct many of its abilities using other, less familiar techniques. If we had been able to use async/await then a lot of this code could be much improved, by moving a lot of the "machinery" parts out from custom explicit implementation and into regular Perl code.

2020/12/14

2020 Perl Advent Calendar - Day 14

So far in this series we've seen how to build asynchronous functions by building up smaller components using familiar Perl syntax combined with the async and await keywords, and how to use await at the toplevel of the program in order to cause the whole thing to run. We haven't yet seen the bottom-most layer, how to actually create these base-level components that create pending futures. Let's take a look at some examples of that today.

We'll start with a deeper look at Future::IO, which we have mentioned briefly in some previous posts. We saw how it provides methods named like core IO functions, such as sleep and sysread, which work asynchronously by returning futures. In day 10 we saw some ways to use it, so now we'll take a peek at how it is implemented.

These examples won't be a complete in-depth dive covering all the details; for that you can inspect the code yourself on CPAN, or read various other bits of documentation. This is just a brief trip into a few sections, to get an overview of the general ideas and concepts. It also isn't necessary to really follow or understand these details in order to simply use futures and async/await syntax - these are more details of interest when implementing a base-level system. Don't worry if you don't follow this one, or want to skip over it.

The basic principle is that at the lowest level the event system will provide a subclass of the Future class, which provides some helper methods that the async/await syntax will use to interact with them. The full interface that is required is described by Future::AsyncAwait::Awaitable. Instances of this subclass are then constructed by the basic future-returning functions provided by the event system, and the async/await syntax can then interact with them in the appropriate way.

The job of Future::IO is two-fold. It acts as a common interface for asynchronous code which wants to perform basic IO operations (usually sleeps, or reads and writes on filehandles) asynchronously, and it acts as a central place in which some particular event system can provide an actual implementation of these operations, in terms of its own event loop. This is achieved by the package variable $Future::IO::IMPL which stores the name of a class in which it should find the actual operation methods. Most of the methods in the Future::IO class itself just redirect to wherever that is pointing.

use feature 'signatures';
package Future::IO;

our $IMPL;

sub sleep($class, $secs)
{
    return ($IMPL //= "Future::IO::_DefaultImpl")->sleep($secs);
}

Now if we call Future::IO->sleep we'll get a future created by whichever event system has set the $IMPL variable or, if none has done so yet, by a small default implementation that the Future::IO module itself provides.
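
The redirection pattern itself is independent of futures, and can be tried out with a small sketch in core Perl. The Demo::Greeter packages below are entirely hypothetical; they only mirror the shape of the dispatch through a package variable holding a class name.

```perl
use strict;
use warnings;

package Demo::Greeter {
    our $IMPL;   # name of the class that provides the real behaviour

    sub greet {
        my ($class, $name) = @_;
        # Fall back to the built-in default if nobody has chosen an implementation
        return ($IMPL //= "Demo::Greeter::_DefaultImpl")->greet($name);
    }
}

package Demo::Greeter::_DefaultImpl {
    sub greet {
        my ($class, $name) = @_;
        return "Hello, $name";
    }
}

package Demo::Greeter::Impl::Loud {
    sub greet {
        my ($class, $name) = @_;
        return uc "Hello, $name!";
    }
}

my $default_greeting = Demo::Greeter->greet("world");
print "$default_greeting\n";

# A toplevel program selects a different implementation by setting the variable
$Demo::Greeter::IMPL = "Demo::Greeter::Impl::Loud";

my $loud_greeting = Demo::Greeter->greet("world");
print "$loud_greeting\n";
```

Callers only ever name Demo::Greeter, so they do not need to change when the implementation behind it does.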

The default implementation is provided by an internal package that stores its state in a few lexical variables by using some Struct::Dumb structures. For example, every sleep future is backed by an entry in a list of alarms, where it stores the epoch timestamp for the time it should expire. This package is also the subclass of Future, instances of which are returned as the actual implementation futures.

use feature 'signatures';
package Future::IO::_DefaultImpl;
use base 'Future';

use Struct::Dumb;
struct Alarm => [qw( time f )];

my @alarms;

sub sleep($class, $secs)
{
    my $time = time() + $secs;

    my $f = $class->new;
    
    my $idx;
    # ... code here to find the right index to insert so
    # that @alarms remains sorted in time order
    
    splice @alarms, $idx, 0, Alarm($time, $f);
    
    return $f;
}
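
The index search was left out of the listing above. One simple way to keep such a list sorted is a linear scan, sketched below in core Perl. This is an assumption about how it could be done, not the code Future::IO actually uses; the insert_alarm function is hypothetical, and plain [ time, label ] pairs stand in for the Alarm structures.

```perl
use strict;
use warnings;

# Each entry is a [ $time, $label ] pair, kept in ascending time order
my @alarms;

sub insert_alarm {
    my ($time, $label) = @_;

    # Skip past every existing alarm that expires no later than the new one
    my $idx = 0;
    $idx++ while $idx < @alarms and $alarms[$idx][0] <= $time;

    splice @alarms, $idx, 0, [ $time, $label ];
    return;
}

insert_alarm(30, "c");
insert_alarm(10, "a");
insert_alarm(20, "b");

my $order = join ", ", map { "$_->[1]=$_->[0]" } @alarms;
print "$order\n";
```

Keeping the list sorted means the event loop only ever has to look at the first element to know how long it may wait.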

There are similar structures and methods defined for sysread and syswrite, though they are somewhat more complex so we won't go into the details in this brief overview.

In order for this future class to work properly, it has to provide a method called await, which is used by the async/await syntax to implement a toplevel await expression on such an instance. (Do not be confused by the identical name. While the method is involved in the process, there is more work involved in the await expression than simply invoking the method directly.) As is common with most future class implementations, this particular await method repeatedly invokes ticks of a toplevel event system implemented by an internal function called _await_once until the future instance is ready. Thus most of the real work happens in this event system. This is the mechanism by which we can asynchronously perform other work while waiting for this particular instance to have its result.

The full body of the _await_once function is around 80 lines long, but for the case of sleep futures the relevant code looks somewhat like the following. Various other details about filehandle reads and writes have mostly been elided for clarity.

sub _await_once
{
    my $rvec = ... # code to set up filehandle read vectors
    my $wvec = ... # similar for write vectors
    
    my $maxwait;
    $maxwait = $alarms[0]->time - time() if @alarms;
    
    select($rvec, $wvec, undef, $maxwait);
    
    ... # code to perform any read or write 
        # operations that are now possible
    
    my $now = time();
    while(@alarms and $alarms[0]->time <= $now) {
        (shift @alarms)->f->done;
    }
}

sub await($self)
{
    _await_once until $self->is_ready;
    return $self;
}

Here we see that on each tick of the event system we set up some variables relating to filehandle input or output that is also going on, and also inspect the @alarms array to check the next time we need to complete one of these sleep futures. We then call select to wait for some filehandle IO, waiting for a time no longer than when the next alarm will expire. Once the select call returns we check what the current time is, and complete any of the sleep futures as required. This sequence will continue, performing any IO operations that other concurrent futures have requested, until the particular instance we were waiting for has finished. In the meantime, any other futures that complete might go on to cause more to be added to the toplevel arrays, and so next time we invoke _await_once those too will be taken into account.

This default implementation isn't very good, and can't support many situations (especially on MSWin32 where the select system call is nowhere near as useful as on other OSes), but is sufficient for small use-cases and examples. Any larger script or program would be advised to pick a better event system and set that as the implementation instead.

For example, if the program decides to use IO::Async as its main event system, it can load the module Future::IO::Impl::IOAsync which will set this variable and provide a better implementation package.

use Future::IO;
use Future::IO::Impl::IOAsync;

# now we can use Future::IO->sleep, Future::IO->sysread,
# etc... mixed with other IO::Async operations.

The actual implementation methods mostly just defer to the future support already built into IO::Async. For example, the sleep method can simply call the delay_future method on the toplevel event loop, because that already returns a suitable future instance. As this is in the IO::Async::Future class, it already provides its own await method, and so no further work is necessary here.

my $loop;

sub sleep($class, $secs)
{
    $loop //= IO::Async::Loop->new;
    
    return $loop->delay_future(after => $secs);
}

By using Future::IO a module can have simple access to future-returning IO operations that will operate concurrently with each other, and with other parts of the program. A toplevel program or script can pick which actual event system it wants to use in order to implement these futures, and any module using them will then operate correctly against the other activity going on in the same program.

2020/12/13

2020 Perl Advent Calendar - Day 13

Yesterday we saw Future::Queue, and how it can help adapt data transfer between push- and pull-based code structures. Another kind of helper object that forms a queue is Future::Buffer. This acts somewhat like a UNIX FIFO or named pipe, supporting a write operation to append more data into one end of it, and various read-like operations that consume data from the other end. As with Future::Queue, the methods to consume data return futures, which either complete immediately if the data is already there, or will remain pending until sufficient data has been provided.

Where Future::Queue operates on distinct values, a Future::Buffer maintains a continuous string of data. Write operations will concatenate more data onto the end of the buffer, and read operations will consume substrings from the beginning of it. Thus, while there is a one-to-one correspondence between push and shift operations on a queue, it could be the case that one write into a buffer satisfies multiple reads; or that it takes multiple writes to provide enough data for the next read.

There are in fact three different read-like methods, differing in how they specify the amount of data they'll consume. read_atmost is most similar to sysread, in that it will complete with any amount of data up to the size specified. read_exactly demands an exact amount of data, and will remain pending until that is available. read_until takes a regexp pattern or string (for example, a linefeed), and will consume data up to the next match. This makes it simple to create line-based readers that consume asynchronously arriving bytes of input, for example.
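
The difference between the three can be seen with a synchronous sketch in core Perl, operating on a plain string. The take_atmost, take_exactly and take_until functions are hypothetical helpers, not the Future::Buffer API: where the real methods would return a pending future, these return undef, and the choice to include the matched delimiter in the result of take_until is an assumption of this sketch.

```perl
use strict;
use warnings;

my $buffer = "GET /index\nHost: example\nbody";

# Like read_atmost: whatever is available, up to $len bytes
sub take_atmost {
    my ($len) = @_;
    return substr($buffer, 0, $len, "");
}

# Like read_exactly: all $len bytes or nothing (a real buffer would stay pending)
sub take_exactly {
    my ($len) = @_;
    return undef if length($buffer) < $len;
    return substr($buffer, 0, $len, "");
}

# Like read_until: everything up to and including the next match of $pattern
sub take_until {
    my ($pattern) = @_;
    return undef unless $buffer =~ m/$pattern/;
    return substr($buffer, 0, $+[0], "");   # $+[0] is the offset just past the match
}

my $line  = take_until(qr/\n/);    # "GET /index\n"
my $four  = take_exactly(4);       # "Host"
my $rest  = take_atmost(100);      # everything that is left, fewer than 100 bytes
my $empty = take_exactly(1);       # undef, since the buffer is now empty

printf "line=%d bytes, four=%s, rest=%d bytes, more=%s\n",
    length $line, $four, length $rest, defined $empty ? "yes" : "no";
```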

use Future::Buffer;

my $buffer = Future::Buffer->new;
...

async sub respond_to_lines
{
    while(1) {
        my $line = await $buffer->read_until("\n");
        say "The line was: $line";
    }
}

To keep the buffer full of data to work on, we can as before use another while loop, concurrently with the first, to obtain bytes of data and write them into the buffer. In this example we use Future::IO to asynchronously read bytes from the STDIN filehandle.

use Future::IO;

async sub keep_buffer_filled
{
    while(1) {
        my $bytes = await Future::IO->sysread(\*STDIN, 4096);
        await $buffer->write($bytes);
    }
}

Similar to yesterday's case with Future::Queue, we can see that a Future::Buffer can act as a transfer mechanism between a push-based provider of data and a pull-based consumer.

This mode of operation is not without its downsides though. If data arrives at a rate faster than it can be consumed then the buffer will grow and consume ever more memory. We can instead opt to run the buffer with a pull-based provider as well, having it request more data from upstream when it needs it. In the example above, we see what turns out to be a common pattern - a while loop which just obtains more data from some source then calls write on the buffer. Rather than working in this manner, we can instead give the buffer object itself an asynchronous callback function for it to call when it needs more data, via the fill parameter.

use Future::Buffer;
use Future::IO;

my $buffer = Future::Buffer->new(
    fill => async sub {
        await Future::IO->sysread(\*STDIN, 4096);
    }
);

async sub respond_to_lines
{
    # same as before
}

Here, we have given an asynchronous callback function that lets the buffer request more data from the standard input stream directly. It will invoke this whenever necessary; i.e. when it has at least one read future outstanding for which it doesn't yet have enough data to complete. This allows it to fetch more data to satisfy readers, without always just pushing more data into the buffer as fast as it arrives. It provides a way to give backpressure further up the data flow, into the standard input stream, and beyond into wherever that data was coming from.

2020/12/12

2020 Perl Advent Calendar - Day 12

Yesterday we saw Future::Mutex for constraining the concurrency of certain parts of an asynchronous program. It keeps a queue of pending calls to be invoked once it is free. Today we will look at another future-related helper object which maintains a queue - named, appropriately enough - Future::Queue. It works best in a pipeline-like structure with a producer and a consumer. Each side could be represented by a different asynchronous function, with the queue helping to move data between them.

A queue instance acts similarly to a regular Perl array, storing a list of values which may be of any Perl type. Values can be added at one end with the push method, and consumed from the other using the shift method. What makes it different from a regular array is that the shift method returns a future that will eventually yield the next item in the queue. If an item is available it will be taken immediately; if not then the future will remain pending until the next time one is added using the push method.
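As a minimal sketch of that behaviour, assuming only the push and shift methods described above (the item values are made up for illustration), the pending and ready states of the returned futures can be observed directly:

```perl
use strict;
use warnings;
use Future::Queue;

my $queue = Future::Queue->new;

# Nothing has been pushed yet, so this future stays pending
my $next = $queue->shift;
my $was_pending = !$next->is_ready;

# Pushing an item completes the oldest waiting shift future
$queue->push("first item");
my $got = $next->get;

# An item that is already queued is handed over immediately
$queue->push("second item");
my $immediate = $queue->shift;
my $was_immediate = $immediate->is_ready;
```

Here $was_pending and $was_immediate are both true, which is the difference from a plain array: the consumer can ask for an item before the producer has supplied one.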

A typical use of this object is to store a queue of work items to be performed. Each item could be a hash reference, containing keys that describe the work to be performed and how to return the results. Items would arrive from some part of the program, perhaps a server socket of some kind, in which case the hash could include the socket on which to send the response back.

use feature 'signatures';
use Future::Queue;

my $queue = Future::Queue->new;

sub on_received_request($request, $client)
{
    $queue->push({
        request => $request,
        client  => $client,
    });
}

# configure a server socket somehow, to invoke
# on_received_request() at the appropriate time.

An asynchronous function can then be constructed to wait for items from this queue by waiting on the asynchronous shift method to provide another item. Whatever work is required is then performed, and a result sent back to the client that requested it.

use feature 'signatures';
use Future::AsyncAwait;

async sub perform_work($request) { ... }

async sub queue_worker($queue)
{
    while(1) {
        my $work = await $queue->shift;
        
        my $response = await perform_work($work->{request});
        
        await $work->{client}->send_response($response);
    }
}

Being an asynchronous function, we can now invoke it in a toplevel await expression to start the while loop, which will continue to process requests until the program is terminated. However, since it is an asynchronous function which calls other asynchronous functions to perform the actual work items and send their responses back to the clients, we can start several of them to run concurrently. For example, we could decide to run four copies of it, so we can make use of some concurrency in processing these requests.

...

await Future->needs_all(
    map { queue_worker($queue) } 1 .. 4
);

Using a queue in this manner can be seen as a sort of pattern adapter between a push-based supplier of items (which invokes some code when a new one is available) and a pull-based consumer (which actively asks to be given the next one). Different parts of a large program could be structured in these two different ways, and a Future::Queue makes a convenient conversion point between the two.


2020/12/11

2020 Perl Advent Calendar - Day 11


In the past few days we've seen various structures and techniques for achieving concurrency in asynchronous programs. The needs_all and wait_any constructors make it easy to split the flow of control into multiple concurrent pieces, and reconverge afterwards. Sometimes however, this concurrency gets in the way, because there may be situations in which we need to only do one thing at once. Often this comes up when we need to communicate with some external service or device, where having two concurrent actions in flight at once would confuse it.

Now, while we could just choose not to use any asynchronous programming techniques in such a program, that is something of an over-reaction. Perhaps the service or device is just a small part of a larger program and the rest can still operate concurrently, or perhaps we have some sort of user interface to drive as well, and it would be nice to keep that working even while communicating. What we need is some way to limit the concurrency around a small part of the program, while allowing the rest of the logic to run unimpeded. The simplest way of doing this is to use a mutex, provided by Future::Mutex.

This is an object that provides a single asynchronous method, enter, which takes an asynchronous function and runs it at some point in time. If the mutex is free it will be run immediately; if it is busy it will be queued to run later when the previous call has finished. When the passed function eventually provides a result, that will become the eventual result of calling the method - i.e. it transparently wraps an asynchronous function.

use feature 'say';
use Future::AsyncAwait;
use Future::Mutex;

my $mutex = Future::Mutex->new;

my $result = await $mutex->enter(async sub {
    return "the result";
});

say "Returned: $result";

For example, this (somewhat simplified) code example from a hardware device communication module provides an interface onto certain kinds of IO operation. Internally, each operation is performed as a sequence of steps (a "start", some writing and reading, and finally a "stop") which must proceed in that order, uninterrupted by any others. That inner sequence is surrounded by a mutex lock.

use feature 'signatures';
use Future::AsyncAwait;
use Future::Mutex;

async sub _i2c_txn($self, $code)
{
    my $mutex = $self->{mutex} //= Future::Mutex->new;
    
    return await $mutex->enter(async sub {
        await $self->start_bit;
        my $ret = await $self->$code;
        await $self->stop_bit;
        return $ret;
    });
}

async sub recv($self, $address, $length)
{
    return await $self->_i2c_txn(async sub {
        await $self->write(chr($address << 1 | 1));
        return await $self->read($length);
    });
}

The _i2c_txn helper method is transparent in terms of return value, because of the way it just returns the result of the mutex enter method. Whatever the inner block of code eventually returns becomes its own eventual return value. This is used by the recv method to return the result of the read call. This is another common style with wrappers around asynchronous code - because a future represents both control flow ("I have finished") and data flow ("The answer is ..."), code can be neatly structured by using them. An operation that completes will necessarily yield its answer at the same time.

You may be looking at this wondering "where is the corresponding leave or similar, to go with the enter?". Well this is one of the great strengths of representing asynchronous operations by having functions that return futures. Calling the function starts an operation, and the future that it returns will complete when the operation finishes. There is no need for the code to explicitly point out that it has now finished because we can observe the returned future and be informed when it does. This avoids an entire class of bug caused by forgetting to release a mutex at the end of an operation.
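To see that release-on-completion behaviour without any real hardware, here is a small sketch using plain futures rather than async/await. The $slow_op future and the @log array are hypothetical stand-ins invented for this illustration; only the enter method comes from the discussion above.

```perl
use strict;
use warnings;
use Future;
use Future::Mutex;

my $mutex = Future::Mutex->new;
my @log;

# A hand-made future standing in for a slow operation (hypothetical)
my $slow_op = Future->new;

my $f1 = $mutex->enter(sub {
    push @log, "first started";
    return $slow_op;
});

my $f2 = $mutex->enter(sub {
    push @log, "second started";
    return Future->done("second result");
});

# Only the first body has run so far; the second is queued behind it
my $started_while_busy = scalar @log;

# Completing the first operation is all it takes to release the mutex
$slow_op->done("first result");
my $started_after = scalar @log;
```

At no point does the code call anything like leave. The second body starts as a direct consequence of the first future completing.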


2020/12/10

2020 Perl Advent Calendar - Day 10


In the past couple of days we've seen some techniques for performing multiple concurrent calls to asynchronous functions and waiting for them all to succeed. Sometimes there are situations when we want to make a number of calls and wait for any one of them to complete and take the first result that arrives, rather than waiting for them all.

The original blog series post on day 21 introduced Future->wait_any, which also takes a list of futures and returns one to represent their combination, in a pattern similar to needs_all. Where needs_all needed all of the components to complete, this one will yield the result of whichever is first to finish - either in success or failure. In effect it creates a race between them.

This is most often seen in order to create timeouts around asynchronous functions. As with the needs_all constructor, it is usually seen directly in an await expression. Its arguments, the component futures it waits on, are often the result of directly calling asynchronous functions. Often there are just two arguments - one future to do some "useful" work and yield a successful result, the other to wait until the timeout and then yield a failure. Whichever completes first will be the result of the wait_any and hence what await will see.

For example, this method from a communication protocol sends a byte over a filehandle, then waits for either a single-byte read back from the filehandle, or a timeout.

use Future::AsyncAwait;
use Future::IO;

async sub _break
{
    my $self = shift;
    my $fh = $self->{fh};
    
    $fh->print("\x00");

    await Future->wait_any(
        Future::IO->sysread($fh, 1),
        Future::IO->sleep(0.05)->then_fail("Timed out"),
    );
}

We will introduce the Future::IO module more fully in a later post. For now, consider that it provides some methods named after core IO functions which return futures and allow the operations to take place asynchronously. In this example the sysread future will complete if there is another byte available on the filehandle, and the sleep future will complete after the given delay. The ->then_fail method is a small shortcut on a future instance, for turning a success into a failure. The result here is that if a new byte is soon available to read from the filehandle then it is returned by the first component future, though if none arrives after 50msec the second future yields a failure, which causes the await to throw an exception up to its own caller.

You may be wondering what happens to the sleep call if the byte is read successfully, or to the sysread if the timeout happens. The answer here is that as well as completing with success or failure, pending future instances can also be cancelled. As soon as one of the component futures that wait_any is waiting for is completed, it will cancel all of the other ones. It is up to the creator of the component futures to determine what should happen at this point. The original blog post series at day 19 introduced the on_cancel method, which the implementation can use to set this behaviour.
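The timeout race and the cancellation of the loser can be sketched with hand-made futures. In this illustration $work and $timer are hypothetical stand-ins for the sysread and sleep futures, and the timer is completed by hand to simulate the delay expiring.

```perl
use strict;
use warnings;
use Future;

# Hand-made futures standing in for the real work and a timer (hypothetical)
my $work  = Future->new;
my $timer = Future->new;

# Let the "implementation" of the work notice when it gets cancelled
my $cancel_seen = 0;
$work->on_cancel(sub { $cancel_seen = 1 });

my $race = Future->wait_any(
    $work,
    $timer->then_fail("Timed out"),
);

# Simulate the timer firing before the work has produced anything
$timer->done;

# The race has now failed, and the losing future has been cancelled
my $failure = $race->failure;
my $work_cancelled = $work->is_cancelled;
```

Had $work completed first instead, $race would have yielded its result and the future chained from $timer would have been the one to be cancelled.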

When using async/await syntax this is mostly handled automatically. While an async sub is suspended waiting for a future to complete, if its own return-value future is cancelled then this cancellation is propagated down to the future it was waiting on.
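A minimal sketch of that propagation, assuming a hypothetical wrapper function and a hand-made inner future that never completes by itself:

```perl
use strict;
use warnings;
use Future;
use Future::AsyncAwait;

# A hypothetical low-level operation that never completes by itself
my $inner = Future->new;

async sub wrapper
{
    my ($f) = @_;
    return await $f;
}

# The async sub suspends at the await, since $inner is still pending
my $outer = wrapper($inner);

# Cancelling the returned future is passed down to the awaited one
$outer->cancel;
my $inner_cancelled = $inner->is_cancelled;
```

The wrapper contains no cancellation logic of its own, yet the inner future still ends up cancelled.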

Cancellation is mostly a feature of interest to the lowest level implementation of asynchronous systems using futures, and doesn't often need special handling in intermediate layers of logic. For situations where the logic does however need to handle it, you can use a CANCEL block. At time of writing this syntax is still considered experimental, but it is designed to feel similar to my other still-experimental idea of adding FINALLY blocks to core Perl syntax.

use Future::AsyncAwait ':experimental(cancel)';

async sub f
{
    CANCEL { warn "This task was cancelled"; }

    await ...
}

It turns out in actual practice to be quite rare to need this ability - I had hoped to be able to paste a real example from some real code, but currently there isn't any on CPAN which actually makes use of this. Hopefully there will eventually be enough actual uses of the syntax to be able to judge the experiment, and see whether it should become stable, or still needs work.


2020/12/09

2020 Perl Advent Calendar - Day 9


Yesterday we saw some ways to write concurrent asynchronous code which waits on a few different tasks to complete. Sometimes we want to do the same thing multiple times concurrently but with different data each time. Often it's the case that each item can be processed independently of the others, so it makes sense to try to do several at once.

One approach here is to simply await a future to process each individual item inside a regular foreach loop. This will only work on one item at once, so we won't make use of concurrency.

## A poor idea for iterating a list ##
use Future::AsyncAwait;

foreach my $item (@ITEMS) {
    await PROCESS($item);
}

Another idea is to use map to apply an asynchronous function to every item in the list, and thus start all the items at once, then wait on all those futures using needs_all. This may end up being too concurrent - if the list contained many thousand items we might do too many at once and overload whatever external service or system we are talking to.

## Another poor idea ##
use Future::AsyncAwait;

await Future->needs_all(
    map { PROCESS($_) } @ITEMS
);

A better middle-ground between these two extremes was introduced in the original blog series on day 18, in the form of the Future::Utils::fmap collection of helper functions. The basic idea of fmap is that it is a future-aware equivalent of Perl's map operator. The fmap function is given a block of code which is expected to return a future, and a list of items. It invokes the code block once for each item in the list, collecting up and waiting on the returned future values, until all the items are done. fmap returns a future to represent the entire operation, which will complete with the results of each individual item.

The fmap family actually contains three individual functions, which all operate in the same basic manner. The difference between them is in how they handle return values from each individual item block - fmap_concat can handle an entire list from each item and concatenates all the results together for its overall result, fmap_scalar expects exactly one result per item, and fmap_void does not collect up any results at all; running the code block simply for its side-effects.
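The difference between the three can be sketched using immediately-complete futures, so that no event loop is needed. The item values here are made up for illustration, and a copy of the list is passed each time on the assumption that the functions may consume the array they are given.

```perl
use strict;
use warnings;
use Future;
use Future::Utils qw( fmap_concat fmap_scalar fmap_void );

my @items = ( 1, 2, 3 );

# fmap_scalar: exactly one result per item
my $scalar_f = fmap_scalar {
    my ($n) = @_;
    Future->done( $n * 2 );
} foreach => [ @items ], concurrent => 2;
my @doubled = $scalar_f->get;

# fmap_concat: each item may yield a whole list, all concatenated
my $concat_f = fmap_concat {
    my ($n) = @_;
    Future->done( $n, $n * 10 );
} foreach => [ @items ], concurrent => 2;
my @pairs = $concat_f->get;

# fmap_void: run purely for side-effects, collecting no results
my @seen;
my $void_f = fmap_void {
    my ($n) = @_;
    push @seen, $n;
    Future->done;
} foreach => [ @items ], concurrent => 2;
$void_f->get;
```

In real use the block would return a pending future, such as an HTTP request, and the concurrent parameter would bound how many of those are in flight at once.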

Since fmap expects a future-returning function and itself returns a future it is also ideal for use with async/await syntax. It can be invoked in an await expression and passed an async sub to operate on.

This somewhat-paraphrased example uses the GET method of a Net::Async::HTTP user agent object to concurrently fetch and JSON-decode a collection of data from multiple API endpoints of some remote service.

use feature 'signatures';
use Future::AsyncAwait;
use Future::Utils qw( fmap_scalar );
use JSON::MaybeUTF8 qw( decode_json_utf8 );

use Net::Async::HTTP;
my $ua = Net::Async::HTTP->new;
...

my @urls = ...

my @data = await fmap_scalar(async sub ($url) {
    my $response = await $ua->GET($url);
    return decode_json_utf8($response->content);
}, foreach => \@urls, concurrent => 16);

A few things should be noted about this example. First is that the async sub syntax is used explicitly to create an asynchronous function to pass as the first argument to the fmap_scalar function. Second is the use of the concurrent parameter, telling the function how many items to keep running concurrently. Finally, the list of items has to be passed in an array reference rather than a plain flat list.

These facts all come about because the fmap functions are just plain Perl functions and not special syntax, as opposed to the real syntax provided by the async and await keywords. Whereas these two keywords were inspired by a whole collection of other languages which have all adopted them as a standard pattern, there is not much existing design on the problem of bounded concurrency map-style syntax. This particular area remains a matter of ongoing design and discussion. Thoughts welcome ;)


2020/12/08

2020 Perl Advent Calendar - Day 8


We've now had a good look at a number of situations involving asynchronous code which does one thing at a time. Back on day 3 we noted that an important use-case for asynchronous code is the ability to do multiple actions concurrently and wait for results from all of them at once. We saw a way to achieve that, by starting multiple operations at once by calling multiple asynchronous functions, then using await on each returned future in turn.

This is a sufficiently important and frequent pattern when dealing with asynchronous code and futures, that in the original advent series, day 13 introduced a constructor method, Future->needs_all, which helps with this. It takes a list of futures and yields a new future, which will complete only when all of its components have completed successfully; or alternatively, will fail when any one of them has failed. Since this constructor yields a future, when using async/await syntax we can simply await on it in order to suspend until all of the components are ready.

For example, this method from a hardware chip driver needs to write to two distinct control registers in order to change the chip's configuration. It can do this most efficiently by issuing write commands to both of them individually, then waiting for them both to complete, by using the ->needs_all constructor.

use feature 'signatures';
use Future::AsyncAwait;

async sub change_config($self, %changes)
{
    ...
    await Future->needs_all(
        $self->write_register(REG_CTRL1, $ctrl1),
        $self->write_register(REG_CTRL3, $ctrl3),
    );
}

We can also use this structure to obtain values. When a ->needs_all future completes, it will yield a list of results by concatenating the result lists from each of its individual components. The chip driver makes use of this when reading back the configuration, by issuing two read commands, one for each of the control registers, and awaiting the results of both together.

use Future::AsyncAwait;

async sub read_config
{
    my $self = shift;

    my ($ctrl1, $ctrl3) = await Future->needs_all(
        $self->read_register(REG_CTRL1),
        $self->read_register(REG_CTRL3),
    );
    ...
}

If a failure occurs in any of the component futures, awaiting the needs_all future will re-throw that failure. In effect, it acts as if we had performed an await expression on each of the individual futures in turn, as if we had written:

## A less well-written form of the above example ##
use Future::AsyncAwait;

async sub read_config
{
    my $self = shift;

    my $f1 = $self->read_register(REG_CTRL1);
    my $f3 = $self->read_register(REG_CTRL3);
    
    my $ctrl1 = await $f1;
    my $ctrl3 = await $f3;
}

When waiting for more than one future like this it is preferable to use a structure like needs_all rather than individual await expressions. Having multiple await expressions means that the containing async sub has to be resumed and suspended again each time one of them makes progress, before it finishes them all. Having a single await on the combined future only has to suspend and resume once. Results and errors are still handled just as they would be if multiple awaits were used.
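Both the result concatenation and the failure behaviour of needs_all can be observed with plain futures. This is only a sketch: the values and the failure message are invented for illustration, and immediately-complete futures stand in for the register reads.

```perl
use strict;
use warnings;
use Future;

# Result lists of the components are concatenated in argument order
my @values = Future->needs_all(
    Future->done( "a", "b" ),
    Future->done( "c" ),
)->get;

# One failing component fails the whole combination, and any
# still-pending components are cancelled
my $pending = Future->new;
my $combined = Future->needs_all(
    $pending,
    Future->fail( "register read failed" ),
);
my $failure = $combined->failure;
my $pending_cancelled = $pending->is_cancelled;
```

Inside an async sub, an await on $combined would throw that failure as an exception rather than returning it.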

If there are more than just a few concurrent tasks to perform, there can be even better ways to express this. We will take a look at another approach tomorrow.


2020/12/07

2020 Perl Advent Calendar - Day 7


So far in this advent series we've been following the progression from the original posts from 2013. In the next few posts we'll take things in a slightly different order. There will be a few new things to see about async/await syntax, but before we move onto that I want to continue the subject from the past couple of days, and cover the remaining situations in which asynchronous programming with async/await follows the same shape and style as regular synchronous programming.

The original day 15 post introduced the Future::Utils::repeat helper function, for taking a single block of future-returning code, and calling it repeatedly until some condition is satisfied. The resulting code does behave somewhat like a regular Perl while loop, though its notation looks nothing like one. Using async/await, conversely, this sort of behaviour can simply be written with a regular while loop. Recalling that an await expression can be placed anywhere inside an async sub, we can simply wait for an intermediate result inside the body of the loop, before continuing.

This example from a module that communicates with a microcontroller needs to wait for the chip to not be busy. It can simply await a read of the current status and look for the busy bit inside a while loop.

use Future::AsyncAwait;
use Future::IO;

async sub await_nvm_not_busy
{
    my $self = shift;
    
    my $timeout = 50;
    while(--$timeout) {
        (await $self->load_nvm_status) & NVMCTRL_STATUS_FBUSY
            or last;
        
        await Future::IO->sleep(0.01);
    }
}

In fact, we can even put the await expression inside the loop condition itself. A different chip communication module for a radio transmitter uses this to wait for the chip to enter transmit state.

use Future::AsyncAwait;

async sub start_tx
{
    my $self = shift;
    
    await $self->command(CMD_STX);
    1 until (await $self->read_marcstate) eq "TX";
}

In both of these cases, we have to put the await expression in parentheses because of operator precedence, but other than that the logic reads just as short and tidy as it would for synchronous code.
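The two examples above depend on real hardware, so here is a self-contained sketch of the same loop shape. The read_status function, the STATUS_BUSY constant and the canned status values are all hypothetical, invented so that the loop can run without a device attached.

```perl
use strict;
use warnings;
use Future::AsyncAwait;

use constant STATUS_BUSY => 0x01;   # hypothetical busy bit

# Canned status values standing in for a real device (hypothetical)
my @statuses = ( 0x01, 0x01, 0x00 );

async sub read_status
{
    return shift @statuses;
}

async sub wait_not_busy
{
    my $polls = 0;
    while(1) {
        $polls++;
        # As in the examples above, the await expression is wrapped in
        # parentheses before the bitwise-and is applied to its result
        (await read_status()) & STATUS_BUSY
            or last;
    }
    return $polls;
}

# Every await here completes immediately, so the result is ready at once
my $polls = wait_not_busy()->get;
```

The loop reads the status three times: twice while the busy bit is set, and once more to see it clear.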

Day 16 of the original series introduced another form of the Future::Utils::repeat helper for iterating over a given list of items. As before, this isn't necessary when using async/await syntax because we can just put an await expression somewhere inside a foreach loop to achieve the same effect.

For example, this somewhat-simplified example from a module that communicates with memory chips uses a regular foreach loop to iterate over a range of addresses to read.

use feature 'signatures';
use Future::AsyncAwait;

async sub read_eeprom($self, %opts)
{
    my $start = $opts{start} // 0;
    my $stop  = $opts{stop} // MAXADDR;
    
    my $bytes = "";
    
    foreach my $addr ($start .. $stop-1) {
        await $self->set_address($addr);
        $bytes .= await $self->read_byte();
    }
    
    return $bytes;
}

Now that we have a good overview of which code shapes are identical with async/await syntax, we'll next take a look at some new abilities we can gain from being able to run things concurrently. While we first saw a few glimpses of that back on day 3, this time there are still a number of new things left to see.
