LeoNerd's programming thoughts

2011/03/29

When failure isn't failure

Lately I have been looking at two different problems, with a common theme.

My first problem concerns Parser::MGC, and its ability to read input lazily as needed, rather than needing to slurp an entire file all at once. This ability is provided by the from_reader method, which takes a CODE reference to a reader function.

As the documentation points out, this is only supported for input that is split at points of skippable whitespace. This is because it's implemented by calling the reader function for more input only when the current input buffer is completely exhausted. It cannot work in general, with the input stream split at arbitrary points, because Perl's regular expression engine does not give sufficient feedback: it is not possible to ask, after a match attempt, whether the engine reached the end of the stream. For example, when looking for a match for m/food/, an input of "fool" definitely fails, whereas an input of "foo" is not yet a failure, because reading more input from the stream might complete the match. If the regular expression engine gave such feedback, then the reader function could be invoked again to provide more input that may help to resolve the parse.
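Perl can't give this three-way answer for an arbitrary regular expression, but for the simple case of a literal token it is easy to compute by hand. The following is a minimal sketch of the kind of feedback that is missing; classify_literal is a hypothetical helper (not part of Parser::MGC), and it only handles a fixed string rather than a full pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: try to match the literal $token at the start of
# $buffer, where $buffer may be only a prefix of the eventual input.
# Returns "match", "fail" (no amount of extra input can help), or
# "need_more" (the buffer ran out before the match was decided).
sub classify_literal
{
   my ( $token, $buffer ) = @_;

   return "match" if substr( $buffer, 0, length $token ) eq $token;

   # The buffer is shorter than the token but agrees with it so far
   return "need_more"
      if length( $buffer ) < length( $token )
         and substr( $token, 0, length $buffer ) eq $buffer;

   return "fail";
}

print classify_literal( "food", "fool" ), "\n";    # fail
print classify_literal( "food", "foo" ), "\n";     # need_more
print classify_literal( "food", "foods" ), "\n";   # match
```

A lazy parser given this kind of answer could call its reader function again on "need_more", and report a real parse error only on "fail".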

My second problem concerns how to handle UTF-8 encoded data in nonblocking reads. An IO::Async::Stream object wraps a bytestream, such as a TCP socket or pipe. If the underlying stream contains UTF-8 encoded Unicode text, then the Unicode characters need to be decoded from these bytes, by using the Encode module.

The trouble here is that Encode does not provide a way to do this sanely. It is quite likely that a multibyte UTF-8 sequence gets split across multiple read calls. To cope with such a case, Encode has a mode where it will stop on the first error it encounters (called FB_QUIET), returning the prefix it has decoded so far, and deleting the bytes so consumed from the input. The intention here is that another call supplies more bytes, and it continues from there. Problem is, it returns on any failure, whether that's running out of input bytes or encountering an invalid byte. Without the ability to distinguish these two different conditions, it is impossible to handle nonblocking or stream-based UTF-8 decoding while still having sensible error handling.
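The ambiguity is easy to see in a short script. This sketch relies on Encode's documented FB_QUIET behaviour (return what was decoded so far, and leave the unprocessed bytes in the input variable); the two buffers are made-up examples, one holding a truncated but valid sequence and one holding a byte that can never appear in UTF-8.

```perl
use strict;
use warnings;
use Encode qw( decode FB_QUIET );

# A valid two-byte sequence (U+00E9 is C3 A9) of which only the first
# byte has arrived so far.
my $partial = "caf\xC3";
my $text_partial = decode( "UTF-8", $partial, FB_QUIET );

# A byte that can never appear anywhere in valid UTF-8.
my $invalid = "caf\xFF";
my $text_invalid = decode( "UTF-8", $invalid, FB_QUIET );

# In both cases decode() returns "caf" and leaves exactly one undecoded
# byte in the buffer, so the caller cannot tell "wait for more input"
# apart from "this stream is corrupt".
printf "%s, leftover %vX\n", $text_partial, $partial;
printf "%s, leftover %vX\n", $text_invalid, $invalid;
```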

The common theme of these two problems is that neither considers the nature of a failure, treating every cause alike. Both cases have two kinds of failure: one because something has been received that is not correct; the other because something that would be correct has simply not yet been received.
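For the UTF-8 case, the missing distinction can be recovered after the fact by inspecting the bytes FB_QUIET leaves behind. What follows is a minimal sketch under that approach; decode_utf8_chunk is a hypothetical helper (not part of Encode or IO::Async), and the lead-byte and second-byte ranges it checks are the ones given in RFC 3629.

```perl
use strict;
use warnings;
use Encode qw( decode FB_QUIET );

# Hypothetical helper: decode as much UTF-8 as possible from the buffer
# referenced by $bufref (consumed bytes are removed from it), then
# classify whatever was left undecoded.
# Returns ( $text, $status ) where $status is one of
#   "ok"        - everything was decoded
#   "need_more" - the leftover is a valid but incomplete sequence
#   "invalid"   - the leftover can never become valid UTF-8
sub decode_utf8_chunk
{
   my ( $bufref ) = @_;

   my $text = decode( "UTF-8", $$bufref, FB_QUIET );
   return ( $text, "ok" ) unless length $$bufref;

   my $rest = $$bufref;
   my $lead = ord substr( $rest, 0, 1 );

   # Total sequence length implied by the lead byte (0 if not a lead byte)
   my $want = ( $lead >= 0xC2 && $lead <= 0xDF ) ? 2
            : ( $lead >= 0xE0 && $lead <= 0xEF ) ? 3
            : ( $lead >= 0xF0 && $lead <= 0xF4 ) ? 4
            :                                      0;

   # Some lead bytes restrict the range of the second byte (RFC 3629),
   # ruling out overlong forms, surrogates and code points past U+10FFFF
   my %second_range = (
      0xE0 => [ 0xA0, 0xBF ], 0xED => [ 0x80, 0x9F ],
      0xF0 => [ 0x90, 0xBF ], 0xF4 => [ 0x80, 0x8F ],
   );
   if( length( $rest ) >= 2 and my $range = $second_range{$lead} ) {
      my $second = ord substr( $rest, 1, 1 );
      return ( $text, "invalid" )
         if $second < $range->[0] or $second > $range->[1];
   }

   # Incomplete but so far valid: a proper lead byte, fewer bytes than it
   # needs, and only continuation bytes (0x80..0xBF) after it
   return ( $text, "need_more" )
      if $want and length( $rest ) < $want and $rest =~ m/\A.[\x80-\xBF]*\z/s;

   return ( $text, "invalid" );
}

my $buffer = "caf\xC3";
my ( $text, $status ) = decode_utf8_chunk( \$buffer );
print "$status\n";    # need_more: keep the leftover byte for the next read
```

A stream reader built on this would keep a "need_more" leftover in its buffer until the next read arrives, but could report "invalid" as an error straight away.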

Sometimes, failure is not really failure at all. Sometimes it is simply deferred success that is yet to happen.

2011/03/04

Carp from somewhere else

Carp provides two main functions, carp and croak, as replacements for core Perl's warn and die. These report the file and line number slightly differently from core Perl: they walk up the callstack looking for a caller in a different package. The idea is that they are used to report errors in values passed in to a function, typically bad arguments from the caller.
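The ordinary case can be seen in a few lines. This is a sketch using only core Carp; the package Frob and its frobnicate function are made-up names for illustration.

```perl
use strict;
use warnings;

package Frob;
use Carp qw( croak );

# Validate an argument; a bad value is the caller's fault, so croak
# reports the caller's file and line rather than this one.
sub frobnicate
{
   my ( $n ) = @_;
   croak "Expected a positive number" unless $n > 0;
   return $n * 2;
}

package main;

my $line = __LINE__ + 1;
eval { Frob::frobnicate( -1 ) };
print "croak reported: $@";
print "(the frobnicate() call is on line $line)\n";
```

Had frobnicate used die instead, the message would have pointed at the croak line inside package Frob.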

These functions use the dynamic callstack (as provided by caller()) at the time the message is warned (or thrown) to create their context. One scenario where this does not lead to sensible results is when the called function is internally implemented using closures that are later invoked via some other package.

Having thought about this one a bit, it seems what's required is to split collecting the callstack information from creating the message: collect the information in one function call, and pass it into the other.

This would be useful in CPS, for example. Because the closures used in CPS::kseq or kpar aren't necessarily invoked during the dynamic scope of the function that lexically contains the code, a call to croak may not be able to infer the calling context. Even if they are, the presence of stack frames in the CPS package would confuse croak's scanning of the callstack. Instead, it would be better to capture the calling context with a new function, whence, and pass it to a companion function, whoops, if required for generating a message.

For example, this is how it might look in IO::Async::Connector:

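sub connect
{
   my ( %params ) = @_;
   ...

   # whence and whoops are the proposed functions, not part of Carp
   my $where = whence;

   kpar(
      sub {
         my ( $k ) = @_;
         if( exists $params{host} and exists $params{service} ) {
            my $on_resolve_error = $params{on_resolve_error}
               or whoops $where, "Expected 'on_resolve_error' callback";
            ...
         }
         # ... remaining cases elided
      },
      # ... further steps elided
   );
}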

These functions would be a fairly simple replacement for carp and croak: capture the callsite information on entry to a function, and pass it to the message-warning function later.
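A minimal sketch of the proposed pair follows. whence, whoops and make_checker are hypothetical names, defined in main here for brevity rather than in a module, and nothing in it is an existing CPAN API. Unlike croak, this whence simply records the immediate caller; it does not skip over frames belonging to the same package.

```perl
use strict;
use warnings;

# Capture where the *current function* was called from. Inside whence,
# caller(0) describes the call to whence itself; caller(1) describes the
# function that called whence, whose file and line are its call site.
sub whence
{
   my ( undef, $file, $line ) = caller(1);
   return { file => $file, line => $line };
}

# Die with a message attributed to a previously captured location,
# whatever the callstack looks like at the time of the failure.
sub whoops
{
   my ( $where, $message ) = @_;
   die "$message at $where->{file} line $where->{line}.\n";
}

# Usage: the check runs inside a closure invoked later, but the error is
# still reported at the line that called make_checker().
sub make_checker
{
   my ( %params ) = @_;
   my $where = whence;
   return sub {
      whoops $where, "Expected 'on_error' callback" unless $params{on_error};
      return 1;
   };
}

my $check = make_checker();   # no on_error supplied
eval { $check->() };
print $@;   # reports this script and the line of the make_checker() call
```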

It does occur to me, though, that the code will be small, self-contained, and not specific to CPS. I think it ought to live in its own module somewhere; any ideas for a name?

2011/01/22

IPv6 in Perl

A lot of people are talking about IPv6 lately, and perhaps with good reason: it's been years in the coming, but we're finally really starting to run out of IPv4 addresses. Tools like the IPv4 Address Report may add a certain amount of panic with their realtime Flash-based countdown widget, but the problem is real and does need sorting.

The latest development release of Perl, perl-5.13.9, now has full support for IPv6 and for the new address-handling functions specified in RFC 2553. The new functions all live in Socket.

As well as the low-level AF_INET6 constant, and the pack_sockaddr_in6 and unpack_sockaddr_in6 structure functions (already in place in 5.13.8, in fact), there is now the full set of getaddrinfo, getnameinfo and their associated constants. These allow fully protocol-agnostic handling of connections and addresses. There is now enough in core to support IO::Socket::IP, a fully protocol-agnostic replacement for IO::Socket::INET.
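Used directly, the new Socket functions look something like the following sketch. It pushes an IPv4 literal and an IPv6 literal through exactly the same code; AI_NUMERICHOST is passed so that no DNS lookup is needed, and port 80 is an arbitrary choice. The "::1" case assumes the system's resolver understands IPv6 literals, so a failure there is only warned about.

```perl
use strict;
use warnings;
use Socket qw( :addrinfo AF_INET6 SOCK_STREAM );

my @seen;

# The same code handles both address families; each result carries its
# own family and packed sockaddr, so nothing here is IPv4-specific.
foreach my $host ( "127.0.0.1", "::1" ) {
   my ( $err, @results ) = getaddrinfo( $host, "80",
      { flags => AI_NUMERICHOST, socktype => SOCK_STREAM } );
   if( $err ) {
      warn "Cannot getaddrinfo($host) - $err\n";
      next;
   }

   foreach my $ai ( @results ) {
      # Convert the packed sockaddr back into numeric host and port strings
      my ( $nerr, $addr, $port ) =
         getnameinfo( $ai->{addr}, NI_NUMERICHOST|NI_NUMERICSERV );
      if( $nerr ) {
         warn "Cannot getnameinfo - $nerr\n";
         next;
      }

      my $family = ( $ai->{family} == AF_INET6 ) ? "IPv6" : "IPv4";
      push @seen, "$addr $port";
      print "$family: host $addr, port $port\n";
   }
}
```

IO::Socket::IP wraps this getaddrinfo-based resolution, trying each returned address in turn, behind a constructor that looks much like IO::Socket::INET's: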

use IO::Socket::IP;

my $sock = IO::Socket::IP->new(
   PeerHost    => "www.google.com",
   PeerService => "www",
) or die "Cannot construct socket - $@";

printf "Now connected to %s:%s\n", $sock->peerhost_service;

...

What could be simpler?

Perhaps now is the time to make a case for putting IO::Socket::IP itself in the core distribution, such that when perl-5.14.0 comes out, it will be properly ready for the next 30 years of the Internet?

IO::Socket::IP's API is designed to be a drop-in replacement for the IPv4-only IO::Socket::INET. Any program currently using IO::Socket::INET ought to work exactly the same after simply substituting IO::Socket::IP, and will now also work over IPv6.
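IO::Socket::IP accepts IO::Socket::INET's argument names LocalAddr, LocalPort, PeerAddr and PeerPort as synonyms for its own. A small loopback round trip, written as a sketch with those INET-style spellings, might look like this; the 127.0.0.1 address and the message text are arbitrary choices.

```perl
use strict;
use warnings;
use IO::Socket::IP;

# Arguments spelled the IO::Socket::INET way (LocalAddr/LocalPort,
# PeerAddr/PeerPort, Proto); IO::Socket::IP accepts these as synonyms.
my $server = IO::Socket::IP->new(
   LocalAddr => "127.0.0.1",
   LocalPort => 0,          # let the kernel choose a free port
   Listen    => 1,
   Proto     => "tcp",
) or die "Cannot listen - $@";

my $client = IO::Socket::IP->new(
   PeerAddr => $server->sockhost,
   PeerPort => $server->sockport,
   Proto    => "tcp",
) or die "Cannot connect - $@";

my $conn = $server->accept or die "Cannot accept - $!";

$client->syswrite( "Hello world!\n" );
my $line = <$conn>;
print "Server received: $line";
```

Because this particular script only uses IPv4 and INET-compatible arguments, replacing IO::Socket::IP with IO::Socket::INET in it should work just the same.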

If you maintain a distribution that uses IO::Socket::INET, please try out IO::Socket::IP instead. I'd be very keen to hear from anyone who finds it doesn't JustWork in their situation.

See also my earlier post on the subject: Perl - IO::Socket::IP

2010/12/30

Perl - IO::Async - version 0.34

There have been four releases of IO::Async since I last wrote about version 0.30. Here's a rough summary of the more important changes and additions between then and version 0.34:

New Notifier class IO::Async::Timer::Absolute, to invoke events at a fixed point in the future.

New Notifier class IO::Async::PID, to watch a child process for exit(2).

New Notifier class IO::Async::Protocol::LineStream, to implement stream protocols that use lines of plain text.

New method on IO::Async::Protocol that wraps connect(2) functionality, allowing for simpler network protocol client modules.

IO::Async::Loop->connect's on_connect_error and IO::Async::Loop->listen's on_listen_error continuations now both receive errno information.

New direct name resolution methods on IO::Async::Resolver for getaddrinfo(3) and getnameinfo(3). The resolver is now directly accessible from the IO::Async::Loop.

IO::Async::Resolver supports deadline timeouts.

IO::Async::Stream->write supports taking a CODE reference to dynamically generate data for the stream on-demand.

IO::Async::Stream->write supports an on_flush callback.

The IO::Async::Loop->new magic constructor now caches the loop. This is useful for wrapping modules, other event system integration, etc.

Documentation has been rearranged to add new EVENTS sections, documenting the events that Notifier classes can fire either as callbacks in coderefs, or as methods on subclasses.

Various bugfixes and other documentation additions.

2010/12/24

Event loops and Jenga; or 24 Advent Calendar Events in One Go

There are many event loop systems in Perl. Do they play together?

I was thinking about this recently, at my LPW2010 talk about IO::Async. In the hackathon the following day, I managed to write IO::Async::Loop::POE; a way to run IO::Async atop POE.

So I started thinking further: if you can run one event loop system on top of another, how high can we stack them? Can we build a tower, putting each atop the previous, growing taller? Each new layer would be harder to add than the last, increasing the chance that the whole thing comes crashing down. Sort of like a Jenga tower.

So what would a Perl event loop Jenga tower look like?

My attempt looks like this: (326 lines, jenga.pl)

The output looks something like this:

$ perl jenga.pl
AnyEvent resolved 127.0.0.1:80
Glib reads Hello world!
POE reads Hello world!
POE resolved 127.0.0.1:80
IO::Async reads Hello world!
AnyEvent reads Hello world!
AnyEvent listener accepted
POE listener accepted
IO::Async resolved 127.0.0.1:80
AnyEvent connected received
POE connected received
IO::Async listener accepted
IO::Async connected received
Glib child exited 0
POE child exited 0
IO::Async child exited 0
AnyEvent child exited 0
Glib timer
POE timer
IO::Async timer
AnyEvent timer
^CIO::Async SIGINT
AnyEvent SIGINT
POE SIGINT
Stopping...

That's 24 events. Count them. It combines Glib, POE, IO::Async and AnyEvent. It performs a basic filehandle read, a child process watch, and a timed wait in each of these four systems. Because Glib lacks signal watching, only the other three perform this. The other three are also used to perform name resolution, socket listening, and socket connecting.

Everyone seems to be doing Advent Calendar blogs this year: 24 daily posts, each showing one small thing. Someone suggested I should write a Perl event systems advent calendar. So perhaps consider this to be one, except that it opens all 24 windows in one go.

As it turns out, it's possible to make this tower a little higher. There's a module to run Event beneath Glib; that is, it replaces Glib's core polling function with one that uses Event instead. And I suspect it may just about be possible to run Tk on Glib, and then POE on Tk.

At some point in the new year, I have some plans to turn this one-program script into a more useful resource of examples and translations. The Rosetta Stone for Unix provides a cross-reference for looking up Unix concepts between different systems. I feel that a similar attempt at Perl event loops could be quite useful too.

2010/12/19

Perl - CPS - version 0.11

CPS is a Perl module that provides several utilities for writing programs in Continuation Passing Style.

In a nutshell, CPS is a method of control flow within a program that uses values passed around as continuations as a replacement for the usual call/return semantics. In Perl, this is done by passing CODE references: calling a function and passing in a CODE reference to be invoked with the result, rather than having the function return it. While at first this doesn't seem very useful, most of its power comes from the observation that the function doesn't need to invoke its continuation immediately; if it performs some IO operation or similar, it can do so asynchronously, invoking its continuation later. This style of coding is often associated with nonblocking or asynchronous event-driven programming, and is typical of event systems such as IO::Async.
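To make the idea concrete without depending on the CPS module, here is a minimal plain-Perl sketch. The later/run_pending queue is a hypothetical stand-in for a real event loop such as IO::Async, and double/kdouble are made-up example functions.

```perl
use strict;
use warnings;

# A toy stand-in for an event loop: callbacks queued here run later,
# when run_pending() is called.
my @pending;
sub later       { push @pending, $_[0] }
sub run_pending { while( my $cb = shift @pending ) { $cb->() } }

my @log;

# Direct style: the result is returned to the caller.
sub double { my ( $n ) = @_; return $n * 2 }

# Continuation-passing style: the result is passed to the continuation $k.
# Here $k is invoked later rather than immediately, as it would be if the
# work involved waiting on some IO.
sub kdouble
{
   my ( $n, $k ) = @_;
   later( sub { $k->( $n * 2 ) } );
   return;
}

push @log, "direct: " . double( 21 );
kdouble( 21, sub { my ( $result ) = @_; push @log, "cps: $result" } );
push @log, "kdouble returned";
run_pending();

print "$_\n" for @log;
```

The log shows kdouble returning before its continuation runs; the caller learns the result only when the queue is processed.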

A typical problem with implementing CPS control flow is that all of the usual Perl control-flow mechanisms are built for immediate call/return semantics, so they get in the way once CPS is used. The CPS module provides utility functions for implementing control flow in a continuation-passing style, in the form of a set of functions that replace the usual Perl control-flow keywords. For example, the Perl control structure of a foreach loop

foreach my $frob ( @frobs ) {
   my $wibble = mangle( $frob );
   say "Mangled $frob looks like $wibble";
}
say "All done";

becomes a call to CPS::kforeach, where kmangle is a continuation-passing version of mangle

use feature qw( say );
use CPS qw( kforeach );

kforeach( \@frobs, sub {
   my ( $frob, $knext ) = @_;
   kmangle( $frob, sub {
      my ( $wibble ) = @_;
      say "Mangled $frob looks like $wibble";
      $knext->();
   } );
}, sub {
   say "All done";
} );

We haven't really gained anything by doing this, though. If the process of mangling a frob involves some IO tasks, perhaps talking to some remote server, then we'll spend most of our time waiting for it to respond when we could be sending multiple requests and waiting on their responses. We could likely save some time by running them concurrently.

I gave a talk at LPW2010 about Continuation Passing Style and CPS, the slides of which are available here.

After discussing CPS and IO::Async at LPW, I was talking with mst about his IPC::Command::Multiplex module. He came up with the idea for another control-flow function: a combination of kpar and kforeach, which I called kpareach. In use it looks exactly like kforeach, except that it starts the loop body for every item at once, in parallel, rather than waiting for each one to finish before starting the next.

This is a new addition to CPS version 0.11, which is now on CPAN. It is also one of the first new control-flow structures that doesn't have a direct plain-Perl counterpart; a demonstration of the usefulness of CPS in event-driven programming.
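The shape of kpareach can be sketched in a few lines of plain Perl. toy_kpareach is a hypothetical illustration, not the CPS distribution's own code, and its later queue again stands in for real asynchronous IO.

```perl
use strict;
use warnings;

# A toy version of the kpareach idea: start the body for every item at
# once, and invoke the final continuation $k only after every body has
# called its $knext.
sub toy_kpareach
{
   my ( $items, $body, $k ) = @_;

   my $outstanding = scalar @$items;
   return $k->() unless $outstanding;

   foreach my $item ( @$items ) {
      my $finished = 0;
      $body->( $item, sub {
         return if $finished++;    # tolerate a body calling $knext twice
         $k->() if --$outstanding == 0;
      } );
   }
   return;
}

# A toy event queue standing in for real asynchronous IO.
my @pending;
sub later { push @pending, $_[0] }

my @log;
toy_kpareach( [ 1, 2, 3 ], sub {
   my ( $n, $knext ) = @_;
   push @log, "start $n";
   later( sub { push @log, "done $n"; $knext->() } );
}, sub {
   push @log, "all done";
} );

while( my $cb = shift @pending ) { $cb->() }
print "$_\n" for @log;
```

All three bodies start before any of them finishes, and the final continuation runs only once the last $knext has been called; with kforeach, each start would instead wait for the previous item to finish.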

2010/11/21

General Updates

I have no big specific updates today. Instead, a list of lots of little things I've been working on:

IO::Socket::IP now has preliminary non-blocking connect support (version 0.05_003). This isn't quite a perfect solution because the name resolvers still block, but see also Net::LibAsyncNS below.

Created a new CPAN dist, Net::LibAsyncNS, wrapping libasyncns. This provides a simple way to make name resolver lookups asynchronous.

Have fixed a few bugs in IO::KQueue, relating to dodgy handling of Perl scalars in the udata field. Some memory leak bugs still exist, but I believe these to be the kernel's fault. See below.

Spent some time on freebsd-hackers@ arguing about kqueue and managing user data pointers. Long story short, I believe the kqueue API itself is missing a feature, which makes it impossible to wrap generically from any high-level language, or even to wrap properly in C libraries.

Both talks I submitted for LPW2010 were accepted; on the subjects of CPS and IO::Async.

Net::Async::HTTP now has SSL support and can stream response body content as it arrives, rather than waiting for the whole response (version 0.08).

That's all for a quick update, but I may write about any or all of these topics in more detail later...