2019/08/16

async/await in Perl 5 and Dart

Dart allows programmers to write programs in an asynchronous style, in order to achieve higher performance through concurrency. Object types like Future and language features like the async/await syntax mean that asynchronous functions can be written in a natural way that reads similar to straight-line code. The function can be suspended midway through execution until a result arrives that allows it to continue.

Here is an example which shows how you can use these to implement a function that suspends in the middle until it has received a response to the HTTP request we sent. We request a JSON-encoded list of numbers, and return their sum:

  import 'dart:convert' as convert;
  import 'package:http/http.dart' as http;

  Future<int> getSumFromUrl(String url) async {
    var response = await http.get(url);
    var data = convert.jsonDecode(response.body);

    return data['numbers'].reduce((a, b) => a + b);
  }

We can write a similar thing in Perl 5. In Dart the event system is built into the language, whereas in Perl 5 we get to choose our own. As a result the Perl example is a little more verbose, since it has to spell out more of these choices - creating the IO loop and adding the HTTP client to it.

  use Future::AsyncAwait;

  use IO::Async::Loop;
  use Net::Async::HTTP;

  use JSON::MaybeXS 'decode_json';
  use List::Util 'sum';

  my $loop = IO::Async::Loop->new;

  my $http = Net::Async::HTTP->new;
  $loop->add( $http );

  async sub get_sum_from_url($url)
  {
    my $response = await $http->GET( $url );
    my $data = decode_json( $response->content );

    return sum( $data->{numbers}->@* );
  }

The examples in both languages make use of a type of object that wraps up the idea of "an operation that may still be pending" - which both languages call a Future. While minor differences exist between the two languages - such as the methods on them - the overall idea remains the same. Essentially, the value is a placeholder for a result that will come later.

In both languages we see the await keyword, which operates on an expression. The argument to the await keyword is a value of one of these future objects. The await keyword is used to suspend the currently-running function until that result is available. Once the result arrives, the await expression itself yields that deferred result.

Similarly, in both languages the async keyword decorates a function declaration and remarks that it may return its own result asynchronously via one of these futures, and allows that function to make use of the await expression.

This similarity is no coïncidence. The Future::AsyncAwait module which adds the async/await syntax to Perl 5 was designed specifically to look and feel very similar to this feature in several other languages - of which Dart is one.

This async/await syntax makes the code read similarly to how it would look if we were not using futures to make it asynchronous, but instead just using the return values of functions directly. This similarity of notation is the reason why we prefer to use the await syntax if we can, as it helps readability of the code. Compare this syntax with earlier techniques - such as callback functions - where the structure of the code can often look very different.

By providing the same (or at least similar) semantics behind the same kind of notation, each language retains a sense of familiarity to users of other languages. It allows readers to make more sense of the program at first glance because the same sorts of structures with the same sorts of behaviour exist there too. By sharing these ideas, each ecosystem gains the strengths of those ideas it borrows from the other, to the overall benefit of both.

async/await in Perl 5 and C# 5

C# 5 allows programmers to write programs in an asynchronous style, in order to achieve higher performance through concurrency. Object types like Task and language features like the async/await syntax mean that asynchronous functions can be written in a natural way that reads similar to straight-line code. The function can be suspended midway through execution until a result arrives that allows it to continue.

Here is an example which shows how you can use these to implement a function that suspends in the middle until it has received a response to the HTTP request we sent. We request a JSON-encoded list of numbers, and return their sum:

  using System.Collections.Generic;
  using System.Linq;                        // for Sum()
  using System.Net.Http;
  using System.Threading.Tasks;             // for Task<T>
  using System.Web.Script.Serialization;

  class ExampleSchema
  {
    public List<int> numbers { get; set; }
  }

  public async Task<int> getSumFromUrl(string url)
  {
    using (HttpClient client = new HttpClient()) {
      string response = await client.GetStringAsync(url);

      ExampleSchema data = new JavaScriptSerializer()
        .Deserialize<ExampleSchema>(response);

      return data.numbers.Sum();
    }
  }

We can write a similar thing in Perl 5. In C# 5 the event system is built into the language, whereas in Perl 5 we get to choose our own. As a result the Perl example is a little more verbose, since it has to spell out more of these choices - creating the IO loop and adding the HTTP client to it.

  use Future::AsyncAwait;

  use IO::Async::Loop;
  use Net::Async::HTTP;

  use JSON::MaybeXS 'decode_json';
  use List::Util 'sum';

  my $loop = IO::Async::Loop->new;

  my $http = Net::Async::HTTP->new;
  $loop->add( $http );

  async sub get_sum_from_url($url)
  {
    my $response = await $http->GET( $url );
    my $data = decode_json( $response->content );

    return sum( $data->{numbers}->@* );
  }

The examples in both languages make use of a type of object that wraps up the idea of "an operation that may still be pending". In C# 5 that is a value of Task type; in Perl 5 it is a Future. While minor differences exist between the two languages - such as the names of the types or methods on them - the overall idea remains the same. Essentially, the value is a placeholder for a result that will come later.

In both languages we see the await keyword, which operates on an expression. The argument to the await keyword is a value of one of these deferred results - a task or future. The await keyword is used to suspend the currently-running function until that result is available. Once the result arrives, the await expression itself yields that deferred result.

Similarly, in both languages the async keyword decorates a function declaration and remarks that it may return its own result asynchronously via one of these deferred-result values (a Task or Future), and allows that function to make use of the await expression.

This similarity is no coïncidence. The Future::AsyncAwait module which adds the async/await syntax to Perl 5 was designed specifically to look and feel very similar to this feature in several other languages - of which C# 5 is one.

This async/await syntax makes the code read similarly to how it would look if we were not using tasks or futures to make it asynchronous, but instead just using the return values of functions directly. This similarity of notation is the reason why we prefer to use the await syntax if we can, as it helps readability of the code. Compare this syntax with earlier techniques - such as callback functions - where the structure of the code can often look very different.

By providing the same (or at least similar) semantics behind the same kind of notation, each language retains a sense of familiarity to users of other languages. It allows readers to make more sense of the program at first glance because the same sorts of structures with the same sorts of behaviour exist there too. By sharing these ideas, each ecosystem gains the strengths of those ideas it borrows from the other, to the overall benefit of both.

2019/08/15

async/await in Perl 5 and Python 3

Python 3 allows programmers to write programs in an asynchronous style, in order to achieve higher performance through concurrency. Object types like Future and language features like the async/await syntax mean that asynchronous functions can be written in a natural way that reads similar to straight-line code. The function can be suspended midway through execution until a result arrives that allows it to continue.

Here is an example which shows how you can use these to implement a function that suspends in the middle until it has received a response to the HTTP request we sent. We request a JSON-encoded list of numbers, and return their sum:

  import aiohttp

  async def get_sum_from_url(url):
    async with aiohttp.ClientSession() as session:
      async with session.get(url) as response:
        data = await response.json()
        return sum(data['numbers'])

We can write a similar thing in Perl 5. In Python 3 the event system is built into the language, whereas in Perl 5 we get to choose our own. As a result the Perl example is a little more verbose, since it has to spell out more of these choices - creating the IO loop and adding the HTTP client to it.

  use Future::AsyncAwait;

  use IO::Async::Loop;
  use Net::Async::HTTP;

  use JSON::MaybeXS 'decode_json';
  use List::Util 'sum';

  my $loop = IO::Async::Loop->new;

  my $http = Net::Async::HTTP->new;
  $loop->add( $http );

  async sub get_sum_from_url($url)
  {
    my $response = await $http->GET( $url );
    my $data = decode_json( $response->content );

    return sum( $data->{numbers}->@* );
  }

The examples in both languages make use of a type of object that wraps up the idea of "an operation that may still be pending" - which both languages call a Future. While minor differences exist between the two languages - such as the methods on them - the overall idea remains the same. Essentially, the value is a placeholder for a result that will come later.

In both languages we see the await keyword, which operates on an expression. The argument to the await keyword is a value of one of these future objects. The await keyword is used to suspend the currently-running function until that result is available. Once the result arrives, the await expression itself yields that deferred result.
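To make the placeholder idea concrete on the Python side, here is a minimal sketch using only the standard asyncio module. The names produce_later and wait_for_value are invented for illustration, and the short delay merely stands in for a real operation such as an HTTP request.

```python
import asyncio

async def produce_later(fut, value):
    # Stand-in for some slow operation; the delay is arbitrary.
    await asyncio.sleep(0.01)
    # Completing the future wakes up whoever is awaiting it.
    fut.set_result(value)

async def wait_for_value():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()      # the placeholder, still pending
    assert not fut.done()

    # Schedule the producer, keeping a reference to the task.
    producer = asyncio.ensure_future(produce_later(fut, 42))

    # Suspends this function until the result arrives, then yields it.
    result = await fut
    await producer
    return result
```

Running it with asyncio.run(wait_for_value()) drives the event loop until the awaited future has been completed by the producer.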

Similarly, in both languages the async keyword decorates a function declaration and remarks that it may return its own result asynchronously via one of these futures, and allows that function to make use of the await expression.

This similarity is no coïncidence. The Future::AsyncAwait module which adds the async/await syntax to Perl 5 was designed specifically to look and feel very similar to this feature in several other languages - of which Python 3 is one.

This async/await syntax makes the code read similarly to how it would look if we were not using futures to make it asynchronous, but instead just using the return values of functions directly. This similarity of notation is the reason why we prefer to use the await syntax if we can, as it helps readability of the code. Compare this syntax with earlier techniques - such as callback functions - where the structure of the code can often look very different.
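As a rough illustration of that difference in Python, the sketch below computes the same value twice: once with a done-callback and once with await. The function names are hypothetical, fetch_number is only a stand-in for a real asynchronous operation, and error handling in the callback version is deliberately left out.

```python
import asyncio

async def fetch_number():
    # Hypothetical stand-in for an asynchronous operation.
    await asyncio.sleep(0.01)
    return 20

def doubled_with_callback(loop):
    # Callback style: the continuation lives in a nested function, and the
    # final value is passed on through a second, hand-made future.
    outer = loop.create_future()
    task = loop.create_task(fetch_number())

    def on_done(finished):
        # No error handling here; a failure in fetch_number is not forwarded.
        outer.set_result(finished.result() * 2)

    task.add_done_callback(on_done)
    return outer

async def doubled_with_await():
    # await style: reads like straight-line code.
    return (await fetch_number()) * 2

async def main():
    loop = asyncio.get_running_loop()
    via_callback = await doubled_with_callback(loop)
    via_await = await doubled_with_await()
    return via_callback, via_await
```

Both versions produce the same result, but only the second keeps the shape of the code close to the order in which things happen.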

By providing the same (or at least similar) semantics behind the same kind of notation, each language retains a sense of familiarity to users of other languages. It allows readers to make more sense of the program at first glance because the same sorts of structures with the same sorts of behaviour exist there too. By sharing these ideas, each ecosystem gains the strengths of those ideas it borrows from the other, to the overall benefit of both.

async/await in Perl 5 and ECMAScript 6

ECMAScript 6 allows programmers to write programs in an asynchronous style, in order to achieve higher performance through concurrency. Object types like Promise and language features like the async/await syntax mean that asynchronous functions can be written in a natural way that reads similar to straight-line code. The function can be suspended midway through execution until a result arrives that allows it to continue.

Here is an example which shows how you can use these to implement a function that suspends in the middle until it has received a response to the HTTP request we sent. We request a JSON-encoded list of numbers, and return their sum:

  const fetch = require("node-fetch");

  async function getSumFromUrl(url) {
    const response = await fetch(url);
    const data = await response.json();

    return data.numbers.reduce((a, b) => a + b, 0);
  }

We can write a similar thing in Perl 5. In ECMAScript 6 the event system is built into the language, whereas in Perl 5 we get to choose our own. As a result the Perl example is a little more verbose, since it has to spell out more of these choices - creating the IO loop and adding the HTTP client to it.

  use Future::AsyncAwait;

  use IO::Async::Loop;
  use Net::Async::HTTP;

  use JSON::MaybeXS 'decode_json';
  use List::Util 'sum';

  my $loop = IO::Async::Loop->new;

  my $http = Net::Async::HTTP->new;
  $loop->add( $http );

  async sub get_sum_from_url($url)
  {
    my $response = await $http->GET( $url );
    my $data = decode_json( $response->content );

    return sum( $data->{numbers}->@* );
  }

The examples in both languages make use of a type of object that wraps up the idea of "an operation that may still be pending". In ECMAScript 6 that is a value of Promise type; in Perl 5 it is a Future. While minor differences exist between the two languages - such as the names of the types or methods on them - the overall idea remains the same. Essentially, the value is a placeholder for a result that will come later.

In both languages we see the await keyword, which operates on an expression. The argument to the await keyword is a value of one of these deferred results - a promise or future. The await keyword is used to suspend the currently-running function until that result is available. Once the result arrives, the await expression itself yields that deferred result.

Similarly, in both languages the async keyword decorates a function declaration and remarks that it may return its own result asynchronously via one of these deferred-result values (a Promise or Future), and allows that function to make use of the await expression.

This similarity is no coïncidence. The Future::AsyncAwait module which adds the async/await syntax to Perl 5 was designed specifically to look and feel very similar to this feature in several other languages - of which ECMAScript 6 is one.

This async/await syntax makes the code read similarly to how it would look if we were not using promises or futures to make it asynchronous, but instead just using the return values of functions directly. This similarity of notation is the reason why we prefer to use the await syntax if we can, as it helps readability of the code. Compare this syntax with earlier techniques - such as callback functions - where the structure of the code can often look very different.

By providing the same (or at least similar) semantics behind the same kind of notation, each language retains a sense of familiarity to users of other languages. It allows readers to make more sense of the program at first glance because the same sorts of structures with the same sorts of behaviour exist there too. By sharing these ideas, each ecosystem gains the strengths of those ideas it borrows from the other, to the overall benefit of both.

2019/06/10

Building for new ATtiny 1-series chips on Debian

In 2018, Microchip released a new range of ATtiny microcontroller chips, called the "ATtiny 1-series" - presumably named after the pattern of the part numbers. In usual Atmel (now bought by Microchip) style, the first digit(s) of the part number give the size of the flash memory; the remaining digits give an indication of the size and featureset of the chip.

ATtinyX12    8 pin package, 5/6 IO pins       ATtiny212, ATtiny412
ATtinyX14    14 pin package, 11/12 IO pins    ATtiny214, ATtiny414, ATtiny814, ATtiny1614
ATtinyX16    20 pin package, 17/18 IO pins    ATtiny416, ATtiny816, ATtiny1616, ATtiny3216
ATtinyX17    24 pin package, 21/22 IO pins    ATtiny417, ATtiny817, ATtiny1617, ATtiny3217
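The naming pattern can be sketched as a small Python helper. This is only an illustration: decode_part is a made-up name, the pin counts come from the table above, and reading the leading digits as kilobytes of flash is my assumption about the scheme rather than something taken from a datasheet.

```python
import re

# Final digit of the part number -> pin count, as listed in the table above.
PINS_BY_SUFFIX = {"2": 8, "4": 14, "6": 20, "7": 24}

def decode_part(name):
    """Split a 1-series part name such as 'ATtiny814' into its fields."""
    match = re.fullmatch(r"ATtiny(\d+)1([2467])", name)
    if match is None:
        raise ValueError(f"not a recognised ATtiny 1-series part: {name}")
    flash_digits, suffix = match.groups()
    return {
        "flash_kb": int(flash_digits),  # assumption: leading digits are KB of flash
        "pins": PINS_BY_SUFFIX[suffix],
    }
```

For instance, decode_part("ATtiny814") splits the name into a flash size of 8 and the 14 pin package, while an older-style name such as "ATtiny85" is rejected.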

I'll write more about these new chips in another post - there's much change from the older style of ATtiny chips you may be familiar with. Many new things added, things improved, as well as a couple of - in my opinion - backward steps.

This post is largely a reminder to myself, and a help to anyone else, on how to build code for these new chips. The trouble is that they're newer than the avr-libc support package in Debian, meaning that you can't actually build code for these yet. Such an attempt will fail:

$ avr-gcc -std=gnu99 -Wall -Os -DF_CPU=20000000 -mmcu=attiny814 -flto -ffunction-sections -fshort-enums -o .build/firmware.elf src/main.c
/usr/lib/avr/include/avr/io.h:625:6: warning: #warning "device type not defined" [-Wcpp]
 #    warning "device type not defined"
      ^
In file included from src/main.c:4:0:
src/main.c: In function ‘RTC_PIT_vect’:
src/main.c:33:5: warning: ‘RTC_PIT_vect’ appears to be a misspelled signal handler, missing __vector prefix [-Wmisspelled-isr]
 ISR(RTC_PIT_vect)
     ^
src/main.c:35:3: error: ‘RTC’ undeclared (first use in this function)
   RTC.PITINTFLAGS = RTC_PI_bm;
   ^
...

This is caused by the fact that, while avr-gcc has support for the chips, the various support files that should be provided by avr-libc are missing. I've reported a Debian bug about this. Until it's fixed, however, it's easy enough to work around by providing the missing files.

Start off by downloading the "Atmel ATtiny Series Device Support" file from http://packs.download.atmel.com/. This is a free and open download, licensed under Apache v2. This file carries the extension atpack but it's actually just a ZIP file:

$ file Atmel.ATtiny_DFP.1.3.229.atpack 
Atmel.ATtiny_DFP.1.3.229.atpack: Zip archive data, at least v1.0 to extract

Note that by default it'll unpack into the working directory, so you'll want to create a temporary folder to work in:

$ mkdir pack

$ cd pack/

$ unzip ~/Atmel.ATtiny_DFP.1.3.229.atpack 
Archive:  /home/leo/Atmel.ATtiny_DFP.1.3.229.atpack
   creating: atdf/
   creating: avrasm/
   creating: avrasm/inc/
...
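If you would rather script this step, the same thing can be done with Python's standard zipfile module. This is just a sketch; unpack_atpack is a name I made up, and it extracts into whatever directory you give it instead of the current one.

```python
import zipfile
from pathlib import Path

def unpack_atpack(pack_path, dest_dir):
    """Extract an .atpack (a plain ZIP archive) into its own directory."""
    if not zipfile.is_zipfile(pack_path):
        raise ValueError(f"{pack_path} does not look like a ZIP archive")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(pack_path) as pack:
        pack.extractall(dest)
        # Return the archive's member names so the caller can see what arrived.
        return sorted(pack.namelist())
```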

From here, you can now copy the relevant files out to where avr-gcc will find them:

$ sudo cp include/avr/iotn?*1[2467].h /usr/lib/avr/include/avr/
$ sudo cp gcc/dev/attiny?*1[2467]/avrxmega3/*.{o,a} /usr/lib/avr/lib/avrxmega3/
$ sudo cp gcc/dev/attiny?*1[2467]/avrxmega3/short-calls/*.{o,a} /usr/lib/avr/lib/avrxmega3/short-calls/

Finally, there's one last task that needs doing. Locate the main avr/io.h file (it should live in /usr/lib/avr/include) and add the following lines somewhere within the main block of similar lines. These are needed to redirect from the toplevel #include <avr/io.h> towards the device-specific file.

#elif defined (__AVR_ATtiny212__)
#  include <avr/iotn212.h>
#elif defined (__AVR_ATtiny412__)
#  include <avr/iotn412.h>
#elif defined (__AVR_ATtiny214__)
#  include <avr/iotn214.h>
#elif defined (__AVR_ATtiny414__)
#  include <avr/iotn414.h>
#elif defined (__AVR_ATtiny814__)
#  include <avr/iotn814.h>
#elif defined (__AVR_ATtiny1614__)
#  include <avr/iotn1614.h>
#elif defined (__AVR_ATtiny3214__)
#  include <avr/iotn3214.h>
#elif defined (__AVR_ATtiny416__)
#  include <avr/iotn416.h>
#elif defined (__AVR_ATtiny816__)
#  include <avr/iotn816.h>
#elif defined (__AVR_ATtiny1616__)
#  include <avr/iotn1616.h>
#elif defined (__AVR_ATtiny3216__)
#  include <avr/iotn3216.h>
#elif defined (__AVR_ATtiny417__)
#  include <avr/iotn417.h>
#elif defined (__AVR_ATtiny817__)
#  include <avr/iotn817.h>
#elif defined (__AVR_ATtiny1617__)
#  include <avr/iotn1617.h>
#elif defined (__AVR_ATtiny3217__)
#  include <avr/iotn3217.h>
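Since the block is so regular, it can also be generated rather than typed. The following Python sketch prints the same lines; the PARTS list is simply copied from the block above, and io_h_lines is a hypothetical helper name.

```python
# Part numbers copied from the #elif lines above.
PARTS = ["212", "412", "214", "414", "814", "1614", "3214",
         "416", "816", "1616", "3216", "417", "817", "1617", "3217"]

def io_h_lines(parts):
    """Return the preprocessor lines routing <avr/io.h> to each device header."""
    lines = []
    for part in parts:
        lines.append(f"#elif defined (__AVR_ATtiny{part}__)")
        lines.append(f"#  include <avr/iotn{part}.h>")
    return lines

if __name__ == "__main__":
    print("\n".join(io_h_lines(PARTS)))
```

The output still has to be pasted into avr/io.h by hand; the script does not touch the file itself.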

Having done this we find we can now compile firmware for these new chips:

$ avr-gcc -std=gnu99 -Wall -Os -DF_CPU=20000000 -mmcu=attiny814 -flto -ffunction-sections -fshort-enums -o .build/firmware.elf src/main.c
$ avr-size .build/firmware.elf
   text    data     bss     dec     hex filename
   3727      30     105    3862     f16 .build/firmware.elf
$ avr-objcopy -j .text -j .rodata -j .data -O ihex .build/firmware.elf firmware-flash.hex

Next post I'll write more about my opinions on these chips, highlighting some of the newer features and changes.

2019/04/10

Awaiting The Future

Introduction

Various articles I have previously written have described Futures and their use, such as the Futures Advent Calendar. In this article, I want to present a new syntax module that greatly improves the expressive power and neatness of writing Future-based code. This module is Future::AsyncAwait.

The new syntax provided by this module is based on two keywords, async and await that between them provide a powerful new ability to write code that uses Future objects. The await keyword causes the containing function to pause while it waits for completion of a future, and the async keyword decorates a function definition to allow this to happen. These keywords encapsulate the idea of suspending some running code that is waiting on a future to complete, and resuming it again at some later time once a result is ready.

use Future::AsyncAwait;

async sub get_price {
    my ($product) = @_;

    my $catalog = await get_catalog();

    return $catalog->{$product}->{price};
}

This already reads a little neater than how this might look with a ->then chain:

sub get_price {
    my ($product) = @_;

    return get_catalog()->then(sub {
        my ($catalog) = @_;

        return Future->done($catalog->{$product}->{price});
    });
}

This new syntax makes a much greater impact when we consider code structures like foreach loops:

use Future::AsyncAwait;

async sub send_message {
    my ($message) = @_;

    foreach my $chunk ($message->chunks) {
        await send_chunk($chunk);
    }
}

Previously we'd have had to use Future::Utils::repeat to create the loop:

use Future::Utils qw( repeat );

sub send_message {
    my ($message) = @_;

    repeat {
        my ($chunk) = @_;
        send_chunk($chunk);
    } foreach => [ $message->chunks ];
}
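For comparison, the same loop shape in Python's async/await looks much like the Perl version with await. This is a sketch with hypothetical names; the sink list exists only so that the order of sends can be observed.

```python
import asyncio

async def send_chunk(chunk, sink):
    # Hypothetical stand-in for a real network write; sink just records order.
    await asyncio.sleep(0)
    sink.append(chunk)

async def send_message(chunks, sink):
    # An ordinary for loop: each iteration suspends until its chunk is sent.
    for chunk in chunks:
        await send_chunk(chunk, sink)
    return sink
```

Because each await completes before the next iteration starts, the chunks arrive in the order they were given.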

Because the entire function is suspended and resumed again later on, the values of lexical variables are preserved for use later on:

use Future::AsyncAwait;

async sub echo {
    my $message = await receive_message();
    await delay(0.2);
    send_message($message);
}

If instead we were to do this using ->then chaining, we'd find that we either have to hoist a variable out to the main body of the function to store $message, or use a further level of nesting and indentation to make the lexical visible to later code:

sub echo {
    my $message;
    receive_message()->then(sub {
        ($message) = @_;
        delay(0.2);
    })->then(sub {
        send_message($message);
    });
}

# or

sub echo {
    receive_message()->then(sub {
        my ($message) = @_;
        delay(0.2)->then(sub {
            send_message($message);
        });
    });
}

These final examples are each equivalent to the version using async and await above, yet are both much longer, and more full of the lower-level "machinery" of solving the problem, which obscures the logical flow of what the code is trying to achieve.

Comparison With Other Languages

This syntax isn't unique to Perl - a number of other languages have introduced very similar features.

ES6, aka JavaScript:

async function asyncCall() {
  console.log('calling');
  var result = await resolveAfter2Seconds();
  console.log(result);
}

Python 3:

import asyncio

async def main():
    print('hello')
    await asyncio.sleep(1)
    print('world')

asyncio.run(main())

C#:

public async Task<int> GetDotNetCountAsync()
{
    var html = await
        _httpClient.GetStringAsync("https://dotnetfoundation.org");

    return Regex.Matches(html, @"\.NET").Count;
}

Dart:

main() async {
  var context = querySelector("canvas").context2D;
  var running = true;    // Set false to stop game.

  while (running) {
    var time = await window.animationFrame;
    context.clearRect(0, 0, 500, 500);
    context.fillRect(time % 450, 20, 50, 50);
  }
}

In fact, much like the recognisable shapes of things like if blocks and while loops, it is starting to look like the async/await syntax is turning into a standard language feature across many languages.

Current State

At the time of writing, this module stands at version 0.22, and has been the result of an intense round of bug-fixing and improvement over the Christmas and New Year break. While it isn't fully production-tested and ready for all uses yet, I have been starting to experiment with using it in a number of less production-critical code paths (such as unit or integration testing, or less widely used CPAN modules) in order to help shake out any further bugs that may arise, and generally evaluate how stable it is becoming.

This version already handles a lot of even non-trivial cases, such as in conjunction with the try/catch syntax provided by Syntax::Keyword::Try:

use Future::AsyncAwait;
use Syntax::Keyword::Try;

async sub copy_data
{
    my ($source, $destination) = @_;

    my @rows = await $source->get_data;

    my $successful = 0;
    my $failed     = 0;

    foreach my $row (@rows) {
        try {
            await $destination->put_row($row);
            $successful++;
        } catch {
            $log->warnf("Unable to handle row ID %s: %s",
                $row->{id}, $@);
            $failed++;
        }
    }

    $log->infof("Copied %d rows successfully, with %d failures",
        $successful, $failed);
}
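The behaviour being relied on here is that a failure of the awaited operation surfaces as an exception at the await expression, where an ordinary try block can catch it. Here is a small Python sketch of the same pattern, using made-up put_row and copy_rows functions in place of a real data source and destination.

```python
import asyncio

async def put_row(row):
    # Hypothetical destination: rejects rows without an "id" key.
    await asyncio.sleep(0)
    if "id" not in row:
        raise ValueError("row has no id")

async def copy_rows(rows):
    successful = failed = 0
    for row in rows:
        try:
            # A failure in the awaited operation is raised right here.
            await put_row(row)
            successful += 1
        except ValueError:
            failed += 1
    return successful, failed
```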

Known Bugs

As already mentioned, the module is not yet fully production-ready as it is known to have a few issues, and likely there may be more lurking around as yet unknown. As an outline of the current state of stability, and to suggest the size and criticality of the currently-known issues, here are a few of the main ones:

Complex expressions in foreach lose values

(RT 128619)

I haven't been able to isolate a minimal test case yet for this one, but in essence the bug is that given some code which performs

foreach my $value ( (1) x ($len - 1), (0) ) {
    await ...
}

the final 0 value gets lost. The loop executes for $len - 1 times with $value set to 1, but misses the final 0 case.

The current workaround for this issue is to calculate the full set of values for the loop to iterate on into an array variable, and then foreach over the array:

my @values = ( (1) x ($len - 1), (0) );
foreach my $value ( @values ) {
    await ...
}

While an easy workaround, the presence of this bug is nonetheless a little worrying, because it demonstrates the possibility for a silent failure. The code doesn't cause an error message or a crash, it simply produces the wrong result without any warning or other indication that anything went wrong. It is, at time of writing, the only bug of this kind known. Every other bug produces an error message, most likely a crash, either at compile or runtime.

Fails on threaded perl 5.20 and earlier

(RT 124351)

The module works on non-threaded builds of perl from version 5.16 onwards, but only on threaded builds 5.22 onwards. Threaded builds of 5.20 or earlier all fail with a wide variety of runtime errors, and are currently marked as not supported. I could look into this if there was sufficient interest, but right now I don't feel it is a good use of time to support these older perl versions, compared to fixing other issues and making improvements elsewhere.

Devel::Cover can't see into async subs

(RT 128309)

This one is likely to need fixing within Devel::Cover itself rather than Future::AsyncAwait, as it probably comes from the optree scanning logic there getting confused by the custom LEAVEASYNC ops created by this module. By comparison, Devel::NYTProf can see them perfectly fine, so this suggests the issue shouldn't be too hard to fix.

Next Directions

There are a few missing features or other details that should be addressed at some point soon.

Core perl integration

Currently, the module operates entirely as a third-party CPAN module, without specific support from the Perl core. While the perl5-porters ("p5p") are aware of and generally encourage this work to continue, there is no specific integration at the code level to directly assist. There are two particular details that I would like to see:

  • Better core support for parsing and building the optree fragment relating to the signature part of a sub definition. Currently, async sub definitions cannot make use of function signatures, because the parser is not sufficiently fine-grained to allow it. An interface in core Perl to better support this would allow async subs to take signatures, as regular non-async ones can.

    A mailing list thread has touched on the issue, but so far no detailed plans have emerged.

  • An eventual plan to migrate parts of the suspend and resume logic out of this module and into core. Or at least, some way to try to make it more future-proof. Currently the implementation is very version-dependent and has to inspect and operate on lots of various inner parts of the Perl interpreter. If core Perl could offer a way to suspend and resume a running CV, it would make Future::AsyncAwait a lot simpler and more stable across versions, and would also pave the way for other CPAN modules to provide other syntax or semantics based around this concept, such as coroutines or generators.

local and await

Currently, the suspend logic will get upset about any local variable modifications that are in scope at the time it has to suspend the function; for instance

async sub x {
    my $self = shift;
    local $self->{debug} = 1;
    await $self->do_work();
    # is $self->{debug} restored to 1 here?
}

This is more than just a limit of the implementation, however, as it extends to fundamental questions about what the semantic meaning of such code should be. It is hard to draw parallels from any of the other languages the async/await syntax was inspired by, because none of these have a construct similar to Perl's local.

Recommendations For Use

Earlier, I stated that Future::AsyncAwait is not fully production-ready yet, on account of a few remaining bugs combined with its general lack of production testing at volume. While it probably shouldn't be used in any business-critical areas at the moment, it can certainly help in many other areas.

Unit tests and developer-side scripts, or things that run less often and are generally supervised when they are, should be good candidates for early adoption. If these do break it won't be critical to business operation, and should be relatively simple to revert to an older version that doesn't use Future::AsyncAwait while a bugfix is found.

The main benefit of beginning adoption is that the syntax provided by this module greatly improves the readability of the surrounding code, to the point that it can itself help reveal other bugs that were underlying in the logic. On this subject, Tom Molesworth writes that:

Simple, readable code is going to be a benefit that may outweigh the potential risks of using newer, less-well-tested modules such as this one.

This advice is similar to my own personal uses of the module, which are currently limited to a small selection of my CPAN modules that relate to various exotic pieces of hardware. Many of the driver modules related to Device::Chip have begun to use it. A list of modules that use Future::AsyncAwait is maintained by metacpan.

I am finding that the overall neatness and expressiveness of using async/await expressions is easy justification against the potential for issues in these areas. As bugs are fixed and the module is found to be increasingly stable and reliable, the boundary can be further pushed back and the module introduced to more places.


This article is adapted from one that was originally written in two parts for the Binary.com internal tech blog - part 1, part 2.

I would also like to thank The Perl Foundation whose grant has enabled me to continue working on this piece of Perl infrastructure.

2018/09/27

Devel::MAT investigation into C - part 3

In the previous part we investigated a memory leak in a Perl program, and found the line of C code in a module responsible for creating the SVs that are leaking. This of course is only half of the problem - the other half is that these SVs aren't getting reclaimed when they're no longer needed. To find out why, we'll look around the code at where this SV pointer is stored, look for other parts of the code that should reclaim it later on, and see if we can work out why they don't.

Taking a look again at the source surrounding the line that is creating these leaking SVs, we see an interesting call to a function called cb_data_advanced_put() which appears to be taking ownership of it.

cb_data_advanced_put(ctx, "tlsext_status_cb!!func", newSVsv(callback));
cb_data_advanced_put(ctx, "tlsext_status_cb!!data", newSVsv(data));
SSL_CTX_set_tlsext_status_cb(ctx, tlsext_status_cb_invoke);

This cb_data_advanced_put() is defined by the XS code itself, storing the data under the named key in a hash associated with the SSL context stored in ctx. The code implementing that looks like this:

int cb_data_advanced_put(void *ptr, const char* data_name, SV* data)
{
    HV * L2HV;
    SV ** svtmp;
    int len;
    char key_name[500];
    dMY_CXT;

    len = my_snprintf(key_name, sizeof(key_name), "ptr_%p", ptr);
    if (len == sizeof(key_name)) return 0; /* error  - key_name too short*/

    /* get or create level-2 hash */
    svtmp = hv_fetch(MY_CXT.global_cb_data, key_name, strlen(key_name), 0);
    if (svtmp == NULL) {
        L2HV = newHV();
        hv_store(MY_CXT.global_cb_data, key_name, strlen(key_name), newRV_noinc((SV*)L2HV), 0);
    }
    else {
        if (!SvOK(*svtmp) || !SvROK(*svtmp)) return 0;
        L2HV = (HV*)MUTABLE_PTR(SvRV(*svtmp));
    }

    /* first delete already stored value */
    hv_delete(L2HV, data_name, strlen(data_name), G_DISCARD);
    if (data!=NULL)
        if (SvOK(data))
            hv_store(L2HV, data_name, strlen(data_name), data, 0);

    return 1;
}

This function takes ownership of the SV pointer given in data by storing it in a hash - an HV. That HV is a second-level hash, which is itself stored in a top-level hash held in MY_CXT.global_cb_data, under a key derived from the given context pointer ptr. To understand the intended code path to release this, we'll have to find where that gets reclaimed. Inspecting the code that manages this top-level hash, and the surrounding functions in the file, soon finds another function which appears to do that:

int cb_data_advanced_drop(void *ptr)
{
    int len;
    char key_name[500];
    dMY_CXT;

    len = my_snprintf(key_name, sizeof(key_name), "ptr_%p", ptr);
    if (len == sizeof(key_name)) return 0; /* error  - key_name too short*/

    hv_delete(MY_CXT.global_cb_data, key_name, strlen(key_name), G_DISCARD);
    return 1;
}

The important line here is the call to hv_delete(), which deletes a key from the shared HV stored in MY_CXT.global_cb_data. When that entry is deleted, Perl should drop the second-level HV it refers to and, recursing into it, reclaim all of the values stored in it, including the data we put there earlier. Yet it appears not to - or else we wouldn't observe our memory leak in the first place - so we must dive a little deeper to understand why.
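To make that expectation concrete, here is a minimal sketch in plain C of how reference-counted containers are meant to behave. It does not use the perl API at all: Node, node_new(), node_store(), node_delete() and demo_drop() are hypothetical names invented for this illustration, and a small fixed array stands in for a hash.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a Perl SV: a reference count plus, for containers,
 * a small fixed array of owned children. Hypothetical names throughout;
 * this is not the perl API. */
#define MAX_KIDS 4

typedef struct Node {
    int refcnt;
    int nkids;
    struct Node *kids[MAX_KIDS];
} Node;

static int live_nodes = 0;   /* nodes allocated and not yet freed */

static Node *node_new(void)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->refcnt = 1;           /* the caller owns the first reference */
    live_nodes++;
    return n;
}

/* Drop one reference; on reaching zero, release every child, then free. */
static void node_dec(Node *n)
{
    if (n == NULL || --n->refcnt > 0) return;
    for (int i = 0; i < n->nkids; i++)
        node_dec(n->kids[i]);
    free(n);
    live_nodes--;
}

/* Store child in parent; the parent takes over the caller's reference. */
static void node_store(Node *parent, Node *child)
{
    assert(parent->nkids < MAX_KIDS);
    parent->kids[parent->nkids++] = child;
}

/* Remove the child at idx and release it, in the spirit of
 * hv_delete(..., G_DISCARD). */
static void node_delete(Node *parent, int idx)
{
    assert(idx >= 0 && idx < parent->nkids);
    Node *victim = parent->kids[idx];
    parent->kids[idx] = parent->kids[--parent->nkids];
    node_dec(victim);
}

/* Build global -> level2 -> {func, data}, then delete level2 from global.
 * Returns the number of nodes still alive straight after the delete. */
int demo_drop(void)
{
    Node *global = node_new();
    Node *level2 = node_new();
    node_store(global, level2);
    node_store(level2, node_new());  /* stands in for the "func" value */
    node_store(level2, node_new());  /* stands in for the "data" value */

    node_delete(global, 0);          /* frees level2 and both values */
    int remaining = live_nodes;      /* only global should be left */
    node_dec(global);
    return remaining;
}
```

In this model a single delete on the top-level container is enough to release the nested container and everything stored in it, which is what we would expect the real code to do. Note that only values which were actually stored get released this way.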

Let's start by dumping the contents of the hash that hv_delete() is discarding, to see if it contains the keys we expect - remember, from the first block of code we found, we're expecting an entry under the key named tlsext_status_cb!!data. We'll do that by adjusting the code and adding a call to the handy debugging function sv_dump() so we can see what's inside it:

SV *tmp = hv_delete(MY_CXT.global_cb_data, key_name, strlen(key_name), 0);
sv_dump(tmp);

When we run this we get a nice detailed snapshot of what was inside the HV at the time it was cleaned up. The full output is quite long, but here is a cut-down version of the most relevant detail:

SV = IV(0x55c769db3358) at 0x55c769db3368
  FLAGS = (TEMP,ROK)
  RV = 0x55c769db2f78
    SV = PVHV(0x55c769dd9280) at 0x55c769db2f78
      KEYS = 1
        Elt "tlsext_status_cb!!func" HASH = 0x567bfc0c

We have a reference to a hash, containing just one key (because KEYS = 1), being the tlsext_status_cb!!func part, but it seems we're missing the related !!data key, which is the leaking one. But this seems unusual - the code path we encountered earlier would always unconditionally insert them both. Either the data key went missing sometime, or never got inserted in the first place.

Perhaps it would help to add some further logging to the cb_data_advanced_put function, having it log what the HV looks like after it's added the key, so we can see if it was never added or goes missing later on. We'll add this to the end of the function:

fprintf(stderr, "After cb_data_advanced_put key=<%s>:\n", data_name);
sv_dump(sv_2mortal(newRV_inc((SV*)L2HV)));

(The added complication of calling sv_2mortal on newRV_inc is simply to get around an odd quirk of sv_dump(), that it will dump the contents of a hash given by indirect RV reference, but not by HV pointer directly. We have to make and dump an RV referencing it, just so we can see those keys.)

Running this, we find some more logs from its behaviour:

After cb_data_advanced_put key=<tlsext_status_cb!!func>:
SV = IV(0x55932b203ec0) at 0x55932b203ed0
  FLAGS = (TEMP,ROK)
  RV = 0x55932b1db110
    SV = PVHV(0x55932b20c4d0) at 0x55932b1db110
      KEYS = 2
        Elt "ssleay_verify_callback!!func" HASH = 0x4101f92c
        Elt "tlsext_status_cb!!func" HASH = 0x462de35f

After cb_data_advanced_put key=<tlsext_status_cb!!data>:
SV = IV(0x55932b204130) at 0x55932b204140
  FLAGS = (TEMP,ROK)
  RV = 0x55932b1db110
    SV = PVHV(0x55932b20c4d0) at 0x55932b1db110
      KEYS = 2
        Elt "ssleay_verify_callback!!func" HASH = 0x4101f92c
        Elt "tlsext_status_cb!!func" HASH = 0x462de35f

Well, this is odd. After adding the func key, it was indeed there (along with another relating to the verify callback, which we can ignore for our purposes), leaving the hash with KEYS = 2. But after attempting to put the data key in, it still only has those two, and no further key was added. It looks like the function didn't even attempt to add the key to the HV.

To understand why, a further look at a small section of the code in cb_data_advanced_put() might help here:

if (data!=NULL)
    if (SvOK(data))
        hv_store(L2HV, data_name, strlen(data_name), data, 0);

The hv_store() function is only called if data points to a real SV and that SV contains some defined value (i.e. SvOK() is true). But recall the leaking SV candidates we found in part 1 - they were all UNDEF. It seems we may have found the reason. If the data for the tlsext_status_cb is an undefined scalar value, a new SV is still allocated for it (by calling newSVsv()), but it is never stored in the HV, and thus is never reclaimed when the HV itself is released. Instead, the SV just sits around in memory, unreferenced by anything else, with nothing to clean it up afterwards.

If this is the case, then we can easily fix it by just throwing away the SV immediately if it is undefined. We can change the code to:

if (data!=NULL) {
    if (SvOK(data))
        hv_store(L2HV, data_name, strlen(data_name), data, 0);
    else
        SvREFCNT_dec(data);
}
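The same ownership rule can be modelled outside of perl. The following is only a sketch under simplifying assumptions: Val, Store, put_leaky(), put_fixed() and leaked_after() are hypothetical names, a one-slot structure stands in for the HV, and a plain counter stands in for perl's count of live SVs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SV: a refcount and a "defined" flag (like SvOK).
 * All names here are hypothetical, not the perl API. */
typedef struct {
    int  refcnt;
    bool defined;
} Val;

static int live_vals = 0;    /* values created and not yet released */

/* Initialise caller-provided storage; the caller owns the one reference. */
static Val *val_init(Val *v, bool defined)
{
    v->refcnt  = 1;
    v->defined = defined;
    live_vals++;
    return v;
}

/* Drop one reference; the value counts as reclaimed at zero. */
static void val_dec(Val *v)
{
    if (v != NULL && --v->refcnt == 0)
        live_vals--;
}

/* A one-slot "hash" that owns whatever is stored in it. */
typedef struct { Val *slot; } Store;

static void store_clear(Store *s)
{
    val_dec(s->slot);
    s->slot = NULL;
}

/* Buggy shape: the caller hands over its reference, but an undefined
 * value is neither stored nor released, so nothing owns it any more. */
void put_leaky(Store *s, Val *data)
{
    store_clear(s);
    if (data != NULL)
        if (data->defined)
            s->slot = data;
}

/* Fixed shape: every path either stores the reference or drops it. */
void put_fixed(Store *s, Val *data)
{
    store_clear(s);
    if (data != NULL) {
        if (data->defined)
            s->slot = data;
        else
            val_dec(data);
    }
}

#define MAX_ROUNDS 16

/* Put one undefined value `rounds` times, then clear the store.
 * Returns how many values were never released. */
int leaked_after(void (*put)(Store *, Val *), int rounds)
{
    Val   backing[MAX_ROUNDS];
    Store s = { NULL };
    int   before = live_vals;

    assert(rounds >= 0 && rounds <= MAX_ROUNDS);
    for (int i = 0; i < rounds; i++)
        put(&s, val_init(&backing[i], false));
    store_clear(&s);
    return live_vals - before;
}
```

In this model, leaked_after(put_leaky, 5) reports five values left behind, one per call, while leaked_after(put_fixed, 5) reports none. The rule it illustrates is that a function which takes ownership of a reference must dispose of it on every path, not just the one that stores it.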

Applying this change to the XS code, rebuilding, and testing again shows that it does seem to fix the bug, as we no longer get a constantly-increasing SV count:

SVs 49656 (+49656)
SVs 56515 (+6859)
SVs 56509 (-6)
SVs 56509 (+0)
SVs 56509 (+0)
SVs 56508 (-1)
SVs 56508 (+0)

All that remains now is to send it upstream in the form of a bug report - which I have done in RT#127131.