2010/07/30

Perl - List::UtilsBy

List::UtilsBy is a module containing a number of list utility functions which all take a block of code to control their behaviour. Among its delights are a neat wrapping of sort by a custom accessor, plus optimisation and rearrangement functions. They are a loose collection of functions I've written or found useful over the past few months. I won't give a full overview here (you can read the docs yourselves), but I will give a brief description of a few functions.

One question we often get in #perl on Freenode is how to sort a list of items by some property of those items, perhaps the value of an object accessor or the result of some regexp extraction. Sometimes the answer comes in variants on a theme of
@sorted_items = sort { $a->accessor cmp $b->accessor } @items;
@sorted_items = sort { ( $a =~ m/^(\d+)/ )[0] <=> ( $b =~ m/^(\d+)/ )[0] } @items;
Sometimes a mention of the Schwartzian Transform comes up.
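For reference, a Schwartzian Transform version of the regexp example might look something like the following sketch; it's just the usual map/sort/map shape, nothing specific to List::UtilsBy.
# Schwartzian Transform: compute the sort key once per item, sort on the
# cached keys, then discard them again
my @sorted_items = map { $_->[1] }
                   sort { $a->[0] <=> $b->[0] }
                   map { [ ( $_ =~ m/^(\d+)/ )[0], $_ ] }
                   @items;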

I decided to take this often-used pattern and find a nicer way to represent it. The result is the sort_by functional.
@sorted_items = sort_by { $_->accessor } @items;
@sorted_items = sort_by { ( $_ =~ m/^(\d+)/ )[0] } @items;
As well as being neater code, this also has the advantage of invoking the accessor only once per item, rather than once per pair of items compared.
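For the curious, here is a rough sketch of how such a functional could be implemented. This is not List::UtilsBy's actual code, just an illustration of why the key only needs to be computed once per item:
use strict;
use warnings;

# Sketch only - extract the sort key once per element, then sort the indices
# by the cached keys
sub sort_by_sketch(&@)
{
   my $keyfunc = shift;
   my @keys = map { $keyfunc->() } @_;   # $_ is aliased to each element here
   return map { $_[$_] } sort { $keys[$a] cmp $keys[$b] } 0 .. $#_;
}

my @sorted = sort_by_sketch { length $_ } qw( ccc a bb );
# @sorted now contains ( "a", "bb", "ccc" )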

An operation I've often wanted to perform is to search a list of items for the best match by some metric. For this, there are max_by and its variations.
$longest = max_by { length $_ } @strings;
$closest = min_by { distance_between( $_->location, $here ) } @places;
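Purely as an illustration again (not the module's real implementation), the idea behind max_by is simply to call the block once per item and keep whichever item yielded the largest key so far:
# Sketch only - remember the element whose key is numerically largest
sub max_by_sketch(&@)
{
   my $keyfunc = shift;
   my ( $best, $bestkey );
   foreach ( @_ ) {
      my $key = $keyfunc->();
      ( $best, $bestkey ) = ( $_, $key ) if !defined $bestkey or $key > $bestkey;
   }
   return $best;
}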

Finally, as a replacement for the often-used pattern
@array = grep { wanted $_ } @array;
we have
extract_by { not wanted $_ } @array;
As noted in the documentation, this is implemented by splice()ing the unwanted elements out, rather than by assigning a new list, so it is safe to use on arrays containing weak references, or on tied arrays.
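Roughly speaking, a splice()-based version looks something like the sketch below. It is simplified from the real thing (in particular it copies each element into $_ rather than aliasing it), but it shows the array being modified in place:
# Sketch only - walk the array by index, splicing out each element for which
# the block returns true, and return the removed elements
sub extract_by_sketch(&\@)
{
   my ( $wantfunc, $arrayref ) = @_;
   my @removed;
   my $idx = 0;
   while ( $idx < @$arrayref ) {
      local $_ = $arrayref->[$idx];
      if ( $wantfunc->() ) {
         push @removed, splice @$arrayref, $idx, 1;
      }
      else {
         $idx++;
      }
   }
   return @removed;
}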

2010/07/19

My current Perl project - Circle

Recently chromatic wrote that we should "tell the world, what are you working on with Perl?"

So, to answer that question: my current project is Circle, an IRC client. Actually, it's much more than an IRC client, but that will do as a first approximation.

Rather than being Just Another IRC client, this one is split into two programs. The backend is a server that runs on some machine somewhere, likely your co-located shell hosting box or home server; it maintains all the connections to the IRC networks, persists the scrollback and so on, and contains the guts of the logic. Then there's the frontend, a lightweight GTK application that draws the UI for the backend logic. The frontend doesn't really understand IRC, and the backend has no knowledge of GTK. Several readers may recognise something of the MVC pattern in this.

Without going into too much detail here (you can read the above link), this gives you the advantages of a real local native-UI client, plus the advantages of a persistent server. The UI interactions are local, no network latency or bandwidth to get in the way of line editing, backbuffer scrolling, window switching, and so on; yet all the data is persisted in the server so you can just disconnect the thin client and reconnect it from anywhere else.

A common way people solve this sort of problem is to run irssi in a screen session and reconnect over SSH. The primary downside of this setup is that it requires a low-latency, high-bandwidth connection to the server, as every keypress of the line editor has to round-trip over that network before your next line can be sent. Every backbuffer operation, scrolling up and down or switching between windows, has to be redrawn over the link. If that link has high latency or low bandwidth, the user experience suffers. If the network charges for bandwidth, you end up paying many times over to keep re-sending the same screenful of scrollback as you switch windows. By not having a real presence on the local desktop, irssi-in-screen also cannot take advantage of local desktop features such as notification sounds or highlight popups, nor can it access the local filesystem to perform DCC transfers or similar.

Another solution to remote, persistent IRC is to run an IRC proxy server or bouncer, and point a regular IRC client at it. These either don't support backbuffer refills at all, or save and replay events, possibly by prefixing timestamps to the message text. They suffer many shortcomings from being a proxy hacked on in front of an existing IRC client, which overall doesn't really support the disconnect/reconnect model. These solutions are also almost exclusively IRC-specific, and cannot integrate non-IRC services (such as Instant Messaging) alongside.

Right now this is pretty much all there is to it, though the design is such that it can accommodate much more. There's also a plain telnet-alike backend module, and the backend could quite easily accept Instant Messaging, Email, PIM, whatever. The only frontend at the moment is GTK, but nothing says one couldn't also be written for Qt, Windows, or any other GUI toolkit; I'm also slowly in the process of writing a terminal-mode one.

The code is available on CPAN. Patches, they say, are welcome.

2010/07/01

Good code, bad tests

I've been working on IO::Async::SSL recently. Both the previous upload, and the next one I shall shortly make, are fixes for failing smoke tests. Moreover, they're fixes for failing tests of correct code.

The first concerns the semantics of the END block. The code looked roughly like
use Test::More;
system( 'socat -help' ) == 0 or plan skip_all => "No socat";
plan tests => 3;

open my $kid, "-|", "socat", .... or die;
END { kill TERM => $kid }
The idea of this code was to ensure that socat is no longer running when the test exits. Of course, if the socat program isn't installed the entire test is skipped, but the END block is still run. The result is that, since $kid is undefined, the containing perl process is sent SIGTERM instead, and the test harness exits with an error. Oops.

Now what I actually wanted here was
END { kill TERM => $kid if defined $kid }
I wonder if this situation would warrant a new block-alike, perhaps let's call it ENDX, which is only executed at END time if the line declaring it was actually reached in the code. Perhaps it can be hacked up by
my @ends;
sub ENDX(&) {
   push @ends, $_[0];
}
END { $_->() for @ends }
...
ENDX { kill TERM => $kid };
The second failure is a curious one that starts off looking like an OS-specific bug in the code. All the Linux smoke boxes were giving failures on a particularly innocent-looking test:
is( $client->peeraddr, $server->sockaddr, 'Client socket address' );
On closer inspection, it seems that the way Linux returns socket addresses from the kernel doesn't initialise the "holes" in the address structure, whereas most other OSes do.

The result of this is that, rendered as strings of bytes, the two addresses don't necessarily contain all the same bytes, even though they represent the same address. The way to fix this is to unpack the addresses, which are known to be AF_INET addresses, and use is_deeply instead:
use Socket qw( unpack_sockaddr_in );

is_deeply( [ unpack_sockaddr_in $client->peeraddr ],
           [ unpack_sockaddr_in $server->sockaddr ],
           'Client socket address' );
This doesn't look too neat written out this way, but perhaps there's some scope for considering a Test::Sockaddr module or some such, to neaten it up. Perhaps a little special-purpose, though...
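If such a module existed, an AF_INET-specific helper might look something like the sketch below; the module, function name and interface here are entirely invented for illustration.
use Socket qw( unpack_sockaddr_in inet_ntoa );
use Test::More;

# Hypothetical helper - compare two packed AF_INET addresses by port and IP,
# ignoring any uninitialised padding bytes in the sockaddr structure
sub is_sockaddr_in
{
   my ( $got, $expected, $name ) = @_;
   my ( $got_port, $got_ip ) = unpack_sockaddr_in( $got );
   my ( $exp_port, $exp_ip ) = unpack_sockaddr_in( $expected );
   return is_deeply( [ $got_port, inet_ntoa( $got_ip ) ],
                     [ $exp_port, inet_ntoa( $exp_ip ) ],
                     $name );
}

is_sockaddr_in( $client->peeraddr, $server->sockaddr, 'Client socket address' );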