2010/09/23

Perl - IO::Async - version 0.30

Yesterday, I put the next version of IO::Async on CPAN: version 0.30. This was primarily an update to add some new features, though a few minor bugfixes and documentation updates were included too. Here I want to focus on a few of these new features.

The first of these new features is nothing groundbreaking in itself, but feeds into the others. It's simply the addition of IO::Async::Socket, a notifier subclass to contain a socket that isn't necessarily a stream (primarily SOCK_DGRAM or SOCK_RAW sockets such as UDP, PF_PACKET or PF_NETLINK). This neatens up a few rough edges with trying to put such sockets directly in IO::Async::Handle objects.

The second main new feature is the creation of the IO::Async::Protocol class, and IO::Async::Protocol::Stream subclass. These derive directly from IO::Async::Notifier rather than IO::Async::Handle, and are intended to be abstract containers of code, and not perform any IO operations directly. Instead, they contain a Handle or Stream object as a child notifier. By exposing an API identical to IO::Async::Stream, the IO::Async::Protocol::Stream should be a drop-in replacement for any modules trying to implement a network protocol.

With the addition of IO::Async::SSL, not every stream-like connection can be represented by IO::Async::Stream, so separating the transport layer from the protocol layer is required. This wasn't possible by subclassing, whereas object containment makes it much simpler.
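As a rough sketch of the containment idea in plain Perl: every class name below (Sketch::PlainTransport, Sketch::WrappingTransport, Sketch::LineProtocol) is made up for illustration and is not part of IO::Async. The protocol object holds a transport and calls its write method, so a different transport can be swapped in without touching the protocol code.

```perl
use strict;
use warnings;

# Hypothetical transports: anything with a write() method will do
package Sketch::PlainTransport;
sub new   { return bless { sent => [] }, shift }
sub write { my $self = shift; push @{ $self->{sent} }, @_ }

package Sketch::WrappingTransport;
# Stands in for something like an encrypting layer; here it only tags the data
sub new   { return bless { sent => [] }, shift }
sub write { my $self = shift; push @{ $self->{sent} }, map { "[wrapped]$_" } @_ }

package Sketch::LineProtocol;
# The protocol holds a transport rather than inheriting from one
sub new
{
   my ( $class, %args ) = @_;
   return bless { transport => $args{transport} }, $class;
}

sub send_line
{
   my ( $self, $line ) = @_;
   $self->{transport}->write( "$line\r\n" );
}

package main;

my $plain   = Sketch::PlainTransport->new;
my $wrapped = Sketch::WrappingTransport->new;

# The same protocol code runs unchanged over either transport
Sketch::LineProtocol->new( transport => $plain   )->send_line( "HELLO" );
Sketch::LineProtocol->new( transport => $wrapped )->send_line( "HELLO" );

print "plain:   $plain->{sent}[0]";
print "wrapped: $wrapped->{sent}[0]";
```

With subclassing, the protocol class would have had to pick one transport class as its parent at compile time; with containment the choice is made per object.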

Net::Async::FTP, Net::Async::HTTP, and Net::Async::IRC have all been updated to use it, and most other use cases should be simple to change.

The final main change is that $loop->connect and IO::Async::Listener now support direct on_stream or on_socket continuations, which will be provided an instance of Stream or Socket directly, rather than requiring the invoked code to wrap one. This can then be easily configured as an IO::Async::Protocol's transport.

Having made this change, it leads the way to transparent SSL support across all protocols, and possibly other concerns like SOCKS proxies, by extending the arguments to $loop->connect or Listener. But that's for another post...

Finally, I should announce that I've now started a channel on irc.perl.org called #ioasync, as the official IRC home for IO::Async. Feel free to drop by if you have any issues, comments, questions,...

2010/09/20

Perl - overload::substr

overload allows an object class to provide methods which Perl should use to implement certain operators, like numerical addition or string concatenation. One operator that overload doesn't allow a class to provide is substr.

overload::substr allows this to be overloaded. This allows objects that behave like a string, to specify to Perl how they will handle the substr operator.
$ cat example.pl 
#!/usr/bin/perl

use strict;
use feature qw( say );

package ExampleString;

use overload::substr;

sub new { return bless [ @_ ]; }

sub _substr
{
   my $self = shift;
   my ( $offs, $len, $replace ) = @_;

   return sprintf ">> %s between %d and %d <<", $self, $offs, $offs+$len;
}

package main;

my $str = ExampleString->new( "Hello, world" );

say substr( $str, 2, 5 );

$ perl example.pl
>> ExampleString=ARRAY(0x86dd9c8) between 2 and 7 <<
The module is still in its early days, but the basics appear to be working on all Perl versions back to 5.8. I also want to try extending it, so that split(), regexp matches with m// and substitutions with s/// also use the substr operation. The identity that
$1 eq substr( $str, $-[1], $+[1] - $-[1] )
is sure to be useful here.
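A quick check of that identity on an ordinary string, using only core Perl. The sample string and pattern are arbitrary choices for the demonstration.

```perl
use strict;
use warnings;

my $str = "Hello, world";

$str =~ m/(w\w+)/ or die "no match";

# @- and @+ hold the start and end offsets of each capture group
my $capture    = $1;
my $offset     = $-[1];
my $length     = $+[1] - $-[1];
my $via_substr = substr( $str, $offset, $length );

print "capture='$capture' substr='$via_substr' offset=$offset length=$length\n";
```

Both values come out as the same text, which is what would let a capture be re-expressed as a substr call on an overloaded object.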

I need a good example to show it off with sometime. I have in mind a string-alike object with real positional cursors, which remember their contextual position even after edits in other parts of the string. But more on that later...

2010/09/06

Module name suggestions: A proper IO::Socket for IPv4/IPv6 duality

I currently don't have a good name for a module I'd like to write, because I think it is very much required right now.

We have IO::Socket::INET. It wraps PF_INET, thus making it IPv4 only.

We have IO::Socket::INET6. It wraps either PF_INET or PF_INET6, despite its name. It also uses Socket6, thus restricting it to only working on machines capable of supporting IPv6.

Thus any author wanting to write code to communicate to the internet (apparently that's some new fad everyone's talking about this week) is presented with a moral dilemma: support IPv6 at the cost of not working on older v4-only machines, or support older machines but be incapable of using IPv6.

I originally partially solved this problem some years ago by the creation of Socket::GetAddrInfo, a module that presents the interface of RFC2553's getaddrinfo(3) and getnameinfo(3) functions. This however is not enough for actually connecting and using sockets.

I'd therefore like to propose a new IO::Socket subclass that uses these and only these functions, for converting between addresses and name/service pairs.
use IO::Socket::YourNameHere;

my $sock = IO::Socket::YourNameHere->new(
   PeerHost    => "www.google.com",
   PeerService => "www",
);

printf "Now connected to %s:%s\n", $sock->peerhost, $sock->peerservice;

...
Since it would use Socket::GetAddrInfo, it can transparently support IPv4 or IPv6. Since it would only use Socket::GetAddrInfo, it will work in a v4-only mode on machines incapable of supporting IPv6, and will not be restricted to only IPv4 or IPv6 if and when some new addressing family comes along to replace IPv6 one day; as v6 is now trying to do with v4.
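To give a feel for why getaddrinfo-style resolution is family-agnostic, here is a small sketch. It assumes a Perl recent enough that the core Socket module exports getaddrinfo and getnameinfo itself, and uses that instead of Socket::GetAddrInfo; the numeric host and port are arbitrary and chosen so that no DNS lookup is needed.

```perl
use strict;
use warnings;
use Socket qw(
   getaddrinfo getnameinfo
   AI_NUMERICHOST NI_NUMERICHOST NI_NUMERICSERV SOCK_STREAM
);

# Resolve a numeric host so that the sketch needs no DNS lookup
my ( $err, @results ) = getaddrinfo( "127.0.0.1", "80", {
   socktype => SOCK_STREAM,
   flags    => AI_NUMERICHOST,
} );
die "getaddrinfo failed: $err" if $err;

my @seen;
foreach my $ai ( @results ) {
   # Each result carries the family, socktype, protocol and packed address
   # that socket() and connect() need; the caller never names PF_INET itself
   my ( $nerr, $host, $service ) =
      getnameinfo( $ai->{addr}, NI_NUMERICHOST | NI_NUMERICSERV );
   die "getnameinfo failed: $nerr" if $nerr;

   push @seen, "$host:$service";
   print "family=$ai->{family} host=$host service=$service\n";
}
```

The calling code never mentions a specific address family, so the same loop would handle an IPv6 result without change.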

In order to provide an easy transition period, I'd also support additional IO::Socket::INET options where they still make sense; e.g. accepting {Local/Peer}Port as a synonym for {Local/Peer}Service. The upshot here ought to be that you can simply
sed -i s/IO::Socket::INET/IO::Socket::YourNameHere/
and suddenly your code will JustWork on IPv6 in a good >90% of cases.

Can anyone suggest a better module name for this?


Edit 2010/09/07: We seem to be settling on IO::Socket::IP for this currently.


Edit 2010/09/23: We did indeed settle on IO::Socket::IP; this is now up on CPAN, and will be the subject of a future posting...


This was cross-posted from module-authors@perl.

2010/08/15

Test to assert object identity

I've just copypasted the following test function into about the fifth different test script:
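use Test::More;
use Scalar::Util qw( refaddr );

sub identical
{
   my ( $got, $expected, $name ) = @_;

   # refaddr returns undef for anything that is not a reference
   my $got_addr = refaddr $got;
   my $exp_addr = refaddr $expected;

   # Pass if neither is a reference, or if both refer to the same address
   ok( ( !defined $got_addr && !defined $exp_addr ) ||
       ( defined $got_addr && defined $exp_addr && $got_addr == $exp_addr ),
       $name ) or
      diag( "Expected $got and $expected to refer to the same object" );
}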
Rather than continuing to copypaste it around some more, can anyone suggest a standard Test:: module that contains it? Failing that, if it's really honestly the case that nobody has yet felt it necessary to provide one, could someone suggest a suitable module to contain it?

This behaviour cannot be implemented using, say, is( $obj, $expected ) because that compares its arguments with the eq operator, which of course will give the wrong answer for any object that overloads stringification or string comparison operators.
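A small demonstration of why a comparison routed through overloaded operators is not an identity test. Sketch::Label is a made-up class whose instances all stringify to the same text; string equality then treats two distinct objects as equal, while refaddr still tells them apart.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

# Hypothetical class: every instance stringifies to the same text
package Sketch::Label;
use overload '""' => sub { "label" }, fallback => 1;
sub new { return bless {}, shift }

package main;

my $one = Sketch::Label->new;
my $two = Sketch::Label->new;

# eq goes through the overloaded stringification
my $same_string = ( $one eq $two ) ? 1 : 0;

# refaddr looks at the underlying reference and ignores overloading
my $same_object = ( refaddr( $one ) == refaddr( $two ) ) ? 1 : 0;
my $self_same   = ( refaddr( $one ) == refaddr( $one ) ) ? 1 : 0;

print "eq says equal:             $same_string\n";
print "refaddr says same object:  $same_object\n";
print "refaddr of object vs self: $self_same\n";
```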

2010/08/10

Perl - Config::XPath - new version 0.16

Config::XPath is a Perl module for accessing configuration files using XPath queries. It provides some wrapping around XML::XPath for convenience in using a single config file, and easily fetching string, list or map values from it. It plays nicely alongside, for example, Module::PluginFinder (about which I shall write more another day) for easily building powerful configuration-driven plugin-based programs.
use Config::XPath;
use Module::PluginFinder;

my $conf = Config::XPath->new( filename => 'foomangler.conf' );
my $finder = Module::PluginFinder->new(
   search_path => 'FooMangler::Plugin',
   typefunc    => 'TYPE',
);

my %plugins;

foreach my $plugin_conf ( $conf->get_sub_list( '/plugin' ) ) {
   my $name = $plugin_conf->get_string( '@name' );
   my $type = $plugin_conf->get_string( '@type' );

   $plugins{$name} = $finder->construct( $type, $plugin_conf );
}
Given a config file that perhaps looks like
<foomangler>
<plugin type="hello" name="hello_world">
<message>Hello, world</message>
</plugin>
</foomangler>
We can implement a plugin for this system quite simply, and have it be automatically discovered by the plugin system, instances created, and passed in its configuration from the config file:
package FooMangler::Plugin::Hello;
use constant TYPE => "hello";

sub new
{
   my $class = shift;
   my ( $config ) = @_;

   my $message = $config->get_string( 'message' );
   ...
}

As well as providing one-shot reading support, it also has a subclass, Config::XPath::Reloadable, which allows for convenient reloading of config files. It keeps track of which XML nodes it has already seen, based on some defined key attribute, so it can determine additions and deletions. It will invoke callback functions when items are added or deleted, or when their underlying config may have changed.
use Config::XPath::Reloadable;

my $conf = Config::XPath::Reloadable->new( filename => 'foomangler.conf' );
my $finder = Module::PluginFinder->new( ... );

$SIG{HUP} = sub { $conf->reload };

my %manglers;

$conf->associate_nodeset( '/mangler', '@name',
   add => sub {
      my ( $name, $mangler_conf ) = @_;
      my $type = $mangler_conf->get_string( '@type' );

      $manglers{$name} = $finder->construct( $type, $mangler_conf );
   },

   keep => sub {
      my ( $name, $mangler_conf ) = @_;

      $manglers{$name}->reconfigure( $mangler_conf );
   },

   remove => sub {
      my ( $name ) = @_;

      delete $manglers{$name};
   },
);
Now, whenever a SIGHUP signal is received, the config file is re-read. The configurations for all the current manglers are updated, new ones added, and old ones deleted.
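The add/keep/remove bookkeeping can be pictured with a small plain-Perl sketch. This is not how Config::XPath::Reloadable is implemented; sketch_reconcile is a hypothetical helper that only shows the general idea of comparing the keys seen last time against the keys present now.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the keys seen last time with the keys present
# now, and fire the matching callback for each
sub sketch_reconcile
{
   my ( $known, $current, %on ) = @_;

   foreach my $key ( sort keys %$current ) {
      my $event = exists $known->{$key} ? "keep" : "add";
      $on{$event}->( $key, $current->{$key} );
   }

   foreach my $key ( sort keys %$known ) {
      $on{remove}->( $key ) unless exists $current->{$key};
   }

   # Remember the current key set for the next reload
   %$known = map { $_ => 1 } keys %$current;
}

my %known;
my @log;
my %callbacks = (
   add    => sub { push @log, "add $_[0]" },
   keep   => sub { push @log, "keep $_[0]" },
   remove => sub { push @log, "remove $_[0]" },
);

# First load, then a "reload" in which alpha has gone and gamma has appeared
sketch_reconcile( \%known, { alpha => "a1", beta => "b1" }, %callbacks );
sketch_reconcile( \%known, { beta => "b2", gamma => "g1" }, %callbacks );

print "$_\n" for @log;
```

The first call reports two additions; the second keeps beta, adds gamma, and removes alpha.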

I've just uploaded a new release, 0.16. This release finally gets rid of the awkward Error-based exceptions, instead using plain-old Carp-based string exceptions. This removes a dependency on the old, deprecated, and unsupported Error distribution.

I've also manually set the configure_requires element to lower the required version of Module::Build to 0.2808, which is what Perl 5.10.0 shipped with, rather than let it pick its own version, which would be 0.36. Hopefully this should avoid any awkward "please upgrade Module::Build" messages on clean-slate installs. If this comes out OK I might start applying that by default across all my dists (where appropriate). It does seem a little awkward, but then I can't really think of a neater way for it to detect the minimum version it needs; it would be hard for it to know, for example, about random methods or functionality invoked during the Build.PL file itself, or bugs/features implicitly relied upon. Something to think about for next time, I feel...

2010/07/30

Perl - List::UtilsBy

List::UtilsBy is a module containing a number of list utility functions which all take a block of code to control their behaviour. Among its delights are a neat wrapping of sort by a custom accessor, optimisation, and rearrangement functions. The functions in this module are a loose collection I've written or found useful over the past few months or so. I won't give a full overview here; you can read the docs yourselves. But I will give a brief description of a few functions.

One question we often get in #perl on Freenode concerns how to sort a list of items by some property of the items, perhaps the value of an object accessor or the result of some regexp extraction. Sometimes the answer comes in variants on a theme of
@sorted_items = sort { $a->accessor cmp $b->accessor } @items;
@sorted_items = sort { ( $a =~ m/^(\d+)/ )[0] <=> ( $b =~ m/^(\d+)/ )[0] } @items;
Sometimes the Schwartzian Transform gets a mention.

I decided to take this often-used pattern and find a nicer way to represent it. The result is the sort_by functional.
@sorted_items = sort_by { $_->accessor } @items;
@sorted_items = sort_by { ( $_ =~ m/^(\d+)/ )[0] } @items;
As well as neatness of code, this also has the advantage of invoking the accessor only once per item, rather than once per item pair.
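To illustrate the once-per-item point, here is a plain-Perl stand-in. sketch_sort_by is a hypothetical function written for this post, not the module's actual implementation; it uses a Schwartzian Transform and compares keys as strings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for sort_by: compute each key once, sort on the
# cached keys, then return the original items
sub sketch_sort_by(&@)
{
   my ( $keygen, @items ) = @_;

   return map  { $_->[1] }
          sort { $a->[0] cmp $b->[0] }
          map  { [ scalar $keygen->( $_ ), $_ ] } @items;
}

my $key_calls = 0;
my @words  = qw( pear Apple fig banana );

# Count how often the key block runs while sorting case-insensitively
my @sorted = sketch_sort_by { $key_calls++; lc $_ } @words;

print "@sorted\n";
print "key block ran $key_calls times for ", scalar @words, " items\n";
```

A plain sort { lc $a cmp lc $b } would instead evaluate lc twice for every comparison it makes.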

An operation I've often wanted to perform is to search a list of items for the best match by some metric. For this, there is max_by and variations.
$longest = max_by { length $_ } @strings;
$closest = min_by { distance_between( $_->location, $here ) } @places;
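The same idea written out by hand, as a sketch only: sketch_max_by is a hypothetical function, and its choice of returning the first item when scores tie is an assumption of this sketch rather than something documented above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for max_by: return the item whose metric is largest
sub sketch_max_by(&@)
{
   my ( $metric, @items ) = @_;

   my ( $best, $best_score );
   foreach ( @items ) {
      # $_ is aliased to the current item, so the block can use it directly
      my $score = $metric->( $_ );
      ( $best, $best_score ) = ( $_, $score )
         if !defined $best_score or $score > $best_score;
   }

   return $best;
}

my @strings = qw( a ccc bb );
my $longest = sketch_max_by { length $_ } @strings;

print "longest is '$longest'\n";
```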

Finally, as a replacement for the often-used pattern
@array = grep { wanted $_ } @array;
We have
extract_by { not wanted $_ } @array;
As noted in the documentation, this is implemented by splicing the unwanted elements out, not by assigning a new list, so this is safe to use on lists containing weak references, or tied variables.
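A sketch of what removing elements in place with splice looks like. sketch_extract_by is a hypothetical function for illustration, not the module's code; the point is that the remaining elements are never copied into a fresh list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for extract_by: splice matching elements out of the
# array in place and return them
sub sketch_extract_by(&\@)
{
   my ( $code, $arr ) = @_;

   my @extracted;
   # Walk backwards so that splicing does not disturb indices yet to be visited
   foreach my $idx ( reverse 0 .. $#$arr ) {
      local $_ = $arr->[$idx];
      unshift @extracted, splice( @$arr, $idx, 1 ) if $code->( $_ );
   }

   return @extracted;
}

my @numbers = ( 1 .. 10 );
my @odds    = sketch_extract_by { $_ % 2 } @numbers;

print "kept:      @numbers\n";
print "extracted: @odds\n";
```

Because the surviving elements stay in their original slots, any special property of those slots (such as a weakened reference) is left untouched.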

2010/07/19

My current Perl project - Circle

Recently chromatic wrote that we should "tell the world, what are you working on with Perl?"

So, to answer this then, my current project is Circle, an IRC client. Actually, it's much more than an IRC client, but that will do as a first approximation.

Rather than being Just Another IRC client, this one is split into two programs. The first is a backend server that runs on some machine somewhere, likely your co-located shell hosting box or home server. This maintains all the connections to the IRC networks, persists the scrollback and so on; it is the guts of the logic. Then there's the frontend program, a lightweight GTK application that draws the UI for the backend logic. The frontend doesn't really understand IRC; the backend has no knowledge of GTK. Several readers may recognise something of the MVC pattern about this.

Without going into too much detail here (you can read the above link), this gives you the advantages of a real local native-UI client, plus the advantages of a persistent server. The UI interactions are local, no network latency or bandwidth to get in the way of line editing, backbuffer scrolling, window switching, and so on; yet all the data is persisted in the server so you can just disconnect the thin client and reconnect it from anywhere else.

A common way people solve this sort of problem is to run irssi in a screen session, and reconnect over SSH. The primary downside of this setup is that it requires a low-latency, high-bandwidth connection to the server, as every keypress of the line editor to send your next line has to round-trip over that network. Every backbuffer operation, scrolling up and down, or switching between windows, has to redraw over the link. If that link has high latency, or low bandwidth, the user experience will suffer. If the network charges for bandwidth, you will end up paying many times to keep re-sending the same screenful of scrollback as you switch windows. By not having a real presence on the local desktop, irssi-in-screen also cannot take advantage of local desktop features such as notification sounds or highlight popups, nor can it access the local filesystem to perform DCC transfers or similar.

Another solution to remote persistent IRC is to run an IRC proxy server or bouncer, and point a regular IRC client at that. These either don't support backbuffer refills, or save and replay events, possibly by prefixing timestamps in the message text. They suffer many shortcomings by being a hacked-on proxy in front of an existing IRC client, which overall doesn't really support the disconnect/reconnect model. These solutions are also almost exclusively IRC-specific, and cannot integrate non-IRC protocols (such as Instant Messaging) alongside.

Right now this is pretty-much all there is to it, though the design is such that it can accommodate much more. There's also a plain telnet-alike backend module, but it could quite easily accept Instant Messaging, Email, PIM, whatever. Right now, the only frontend is GTK, but nothing says one couldn't also be written for Qt, Windows, or any other GUI toolkit. I'm also slowly in the process of writing a terminal-mode one.

The code is available on CPAN. Patches, they say, are Welcome.