LeoNerd's programming thoughts

CPAN-based Experiments: A Reminder

2026-03-27T16:45:00.003+00:00

I've been adding new features to core perl for a number of years. For the most part, all of the big things I've been adding have been near-copies of existing abilities in existing CPAN modules. As a reminder, this list contains:

perl 5.34:
`try {} catch {}`	from `Syntax::Keyword::Try`
perl 5.36:
builtins	from `Scalar::Util` and friends
`defer {}`	from `Syntax::Keyword::Defer`
booleans	this was actually new
perl 5.38:
`class { }`	from `Object::Pad`
`||=`, `//=` in signatures	from `Sublike::Extended`
perl 5.40:
`field :reader`, `__CLASS__`	from `Object::Pad`
perl 5.42:
`my method`, `field :writer`	from `Object::Pad`
lexical method invoke `->&`	from `Object::Pad::LexicalMethods`
builtin `any`, `all`	from `List::Util` + `List::Keywords`
(upcoming in perl 5.44):
named `:$params` in signatures	from `Sublike::Extended`

I specifically bring attention to the distinguished booleans ability of perl 5.36, because it is the only major addition I believe I've added in recent years that didn't just come from a CPAN experiment. I would have done that on CPAN if I could have, but it involved such deep internal changes to the very fundamental building blocks in perl that I don't think it would have been possible.^[1]

Everything else was.

At conferences and discussions I enjoy telling the story of the first item on this list. I had spent about 3 years experimenting with Syntax::Keyword::Try as a CPAN module. The nature of the experiment was not "can we get this working?". Getting it working was the easy bit. The hard part was the Perl-visible design of the syntax. We went through a number of iterations there before we ended up with something that looked and felt right, and seemed to gel nicely with other ideas. This was the point at which the idea got copied into core perl. That process involved just copy-pasting the unit tests, rewriting some documentation, and actually reïmplementing the code. The entire time for implementing the feature once it was solidly designed and tested took me about 5 days. That's all it took because we had that design.

When it came time to review the change to put it into core perl, it was easy to make the case that the design of it was right. It had existed for years on CPAN and been battle-tested by lots of users in lots of ways.

This overall principle has two key advantages:

Because every idea has plenty of time (often years) of existing as a CPAN module, we have had time to feel around for any design flaws or usability issues. Remember the disaster that was perl 5.10's smartmatch and given/when? We're still to this day trying to back-pedal out of that one. I'm not going to confidently say that kind of thing wouldn't have happened if given/when was first a CPAN-based experiment, but it feels likely that if more people had tried using it in more ways, at least some of the issues would have been more apparent and the whole idea redesigned or abandoned entirely.
Every one of these existing CPAN experiments works on existing versions of perl; often going back to quite old ones. Even the most complex of my syntax modules, Object::Pad, runs just fine on perl 5.22 which was released over 10 years ago now. This means that people don't have to wait for the next bleadperl or development point release just to try out these new ideas. They can install something from CPAN and use these new things right now in existing code, on existing systems.

These CPAN-based experiments have all worked because of the much faster iteration time on CPAN modules as compared to core perl. We get one major new release of perl every year. Sometimes I've been known to release three versions of a given CPAN module in a single week. That's at least a hundred times faster iteration speed to work on a new idea. I feel quite confident in saying that almost none of the features I listed above would have been possible by now had I been given only core perl to start experimenting in.

We also get a much gentler "adoption curve", if such is a way to phrase it. In core perl, we have "experimental" features, but once something becomes non-experimental, it's declared stable and basically must be supported by perl for all of time. CPAN modules feel like they have much finer levels than that. Within Object::Pad for instance, there's definitely quite a range from "this is something I thought up the other day so I want to see how it plays out", to "here is a stable supported feature that should be copied into core perl". Ideas can slowly graduate along the scale at their own pace.

As a direct consequence of all of the above, we then have the Feature::Compat:: modules. Because core's try feature was directly and deliberately implemented as compatible with Syntax::Keyword::Try, there is a module called Feature::Compat::Try which simply enables the try feature on a sufficiently new perl, or pulls in the CPAN module for it on older perls. It gives authors of Perl code a smooth upgrade path towards using stable core-supported features. Once a Feature::Compat module exists for a feature you want to use, you can just use that module and not worry whether it is yet supported by the perl version you are running.
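To make that mechanism concrete, here is a minimal sketch of what such a compat module does when imported. It is not the real Feature::Compat::Try source: the package name Local::Compat::Try and the 5.034 threshold are assumptions for illustration only (the threshold is simply taken from the list above), and the fallback branch assumes Syntax::Keyword::Try is installed on the older perl.

```perl
use v5.14;
use warnings;

# Hypothetical stand-in for a Feature::Compat-style module.
package Local::Compat::Try {
    sub import {
        if ($] >= 5.034) {
            # New enough perl: switch on the core feature for the calling scope.
            require feature;
            feature->import('try');
            warnings->unimport('experimental::try');
        }
        else {
            # Older perl: pull in the CPAN implementation instead.
            require Syntax::Keyword::Try;
            Syntax::Keyword::Try->import;
        }
    }
}

# Stands in for `use Local::Compat::Try;`, since the package lives in this file.
BEGIN { Local::Compat::Try->import }

sub safe_divide {
    my ($n, $d) = @_;
    try {
        return $n / $d;
    }
    catch ($e) {
        # Division by zero throws; report it as undef.
        return undef;
    }
}

say safe_divide(10, 4) // "undef";   # 2.5
say safe_divide(1, 0)  // "undef";   # undef
```

In real code none of this scaffolding is needed; a single `use Feature::Compat::Try;` line at the top of the file takes its place.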

I started writing this post because I had a specific point to make about Object::Pad and roles, but I feel this has gone on quite long enough already so I will write that in a follow-up shortly.

Instead, for now I will end by reminding people not to be afraid of the word "experimental". It doesn't mean "we're not sure this thing works"; we know it works. The focus of the experiment is "do we like this?" - and by "we" here I specifically mean the entire Perl community - the core maintainers, the CPAN authors, the end-user developers. We don't want to end up in another smartmatch scenario of having a poorly-designed feature suddenly dropping into core untested but needing to be supported. This gradual experiment path through CPAN allows people to try out these new ideas and give feedback which helps guide their design to something solid and stable, that everyone can be happy about supporting long-term. But this entire mechanism only works if people use these modules and give that feedback.

It's probably no surprise to folks that I dogfood^[2] a lot of my own modules, but in some cases I am the only user of those modules that I know about. I am personally happy to continue designing core perl features just suited to my own personal use, but I suspect many others of you would have your own opinions to provide. Please all, try to remember to try things out. Talk to me about these things, use some of these CPAN experiments where and when you can. Or even if you can't directly use them for some reason, just look at them and talk to me anyway. The more feedback I get from more varied people across more varied use-cases, the more confidence I have in the correctness of these designs, and hence the more likely I am to spend the time to migrate them into the core language.

[1]: I would have done 5.32's isa operator on CPAN too, but at the time we didn't have the PL_infix_plugin mechanism, which I later added in 5.38 to allow exactly this sort of thing in future. It has been very helpful in designing a number of new operators since then.

[2]: "Eating your own dog food" on Wikipedia

This post also appears on the perl5-porters mailing list.

Building for new ATtiny 2-series chips on Debian

2023-08-31T12:34:00.000+01:00

I have previously written about how to build code for the ATtiny 1-series chips on Debian, outlining what files are missing from Debian in order to allow this. It seems, three years on, the same stuff is still missing - and more so now that the new 2-series chips are available. Here, now, are some more instructions on top of that to get code working for these newer chips as well.

As before, start off by downloading the "Atmel ATtiny Series Device Support" file from http://packs.download.atmel.com/. This is a free and open download, licensed under Apache v2. This file carries the extension atpack but it's actually just a ZIP file.

Note that by default it'll unpack into the working directory, so you'll want to create a temporary folder to work in:

$ mkdir pack

$ cd pack/

$ unzip ~/Atmel.ATtiny_DFP.2.0.368.atpack 
Archive:  /home/leo/Atmel.ATtiny_DFP.2.0.368.atpack
   creating: atdf/
   creating: avrasm/
   creating: avrasm/inc/
...

From here, you can now copy the relevant files out to where avr-gcc will find them:

$ sudo cp include/avr/iotn?*2[467].h \
    /usr/lib/avr/include/avr/
$ sudo cp gcc/dev/attiny?*2[467]/avrxmega3/*.{o,a} \
    /usr/lib/avr/lib/avrxmega3/
$ sudo cp gcc/dev/attiny?*2[467]/avrxmega3/short-calls/*.{o,a} \
    /usr/lib/avr/lib/avrxmega3/short-calls/

Unlike last time, you'll also need the device-specs files for avr-gcc itself to understand the new chips. You'll have to find the exact path on your system where the existing ones are, and then copy the new ones in there:

$ dpkg -S specs-atmega328
gcc-avr: /usr/lib/gcc/avr/5.4.0/device-specs/specs-atmega328

# So it appears to be /usr/lib/gcc/avr/5.4.0/device-specs

$ sudo cp gcc/dev/attiny?*2[467]/device-specs/* \
    /usr/lib/gcc/avr/5.4.0/device-specs/

Finally, there's one last task that needs doing. Locate the main avr/io.h file (it should live in /usr/lib/avr/include) and add the following lines somewhere within the main block of similar lines. These are needed to redirect from the toplevel #include <avr/io.h> towards the device-specific file.

#elif defined (__AVR_ATtiny424__)
#  include <avr/iotn424.h>
#elif defined (__AVR_ATtiny426__)
#  include <avr/iotn426.h>
#elif defined (__AVR_ATtiny427__)
#  include <avr/iotn427.h>
#elif defined (__AVR_ATtiny824__)
#  include <avr/iotn824.h>
#elif defined (__AVR_ATtiny826__)
#  include <avr/iotn826.h>
#elif defined (__AVR_ATtiny827__)
#  include <avr/iotn827.h>
#elif defined (__AVR_ATtiny1624__)
#  include <avr/iotn1624.h>
#elif defined (__AVR_ATtiny1626__)
#  include <avr/iotn1626.h>
#elif defined (__AVR_ATtiny1627__)
#  include <avr/iotn1627.h>
#elif defined (__AVR_ATtiny3224__)
#  include <avr/iotn3224.h>
#elif defined (__AVR_ATtiny3226__)
#  include <avr/iotn3226.h>
#elif defined (__AVR_ATtiny3227__)
#  include <avr/iotn3227.h>
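
The twelve entries above follow a regular pattern (a leading 4, 8, 16 or 32, then the series digit 2, then a trailing 4, 6 or 7), so if you'd rather not type them by hand, a few lines of Perl can print them ready for pasting. This is only a convenience sketch; the function name io_h_lines is made up for illustration.

```perl
use strict;
use warnings;

# Build the preprocessor lines for the twelve 2-series parts listed above.
sub io_h_lines {
    my @lines;
    for my $size (4, 8, 16, 32) {
        for my $variant (4, 6, 7) {
            my $chip = "${size}2${variant}";
            push @lines,
                "#elif defined (__AVR_ATtiny${chip}__)",
                "#  include <avr/iotn${chip}.h>";
        }
    }
    return @lines;
}

print "$_\n" for io_h_lines();
```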

Having done this we find we can now compile firmware for these new chips:

avr-gcc -std=gnu99 -Wall -Os -DF_CPU=20000000 -mmcu=attiny824 -flto -ffunction-sections -fshort-enums -o .build/firmware_t824.elf src/main.c
avr-size .build/firmware_t824.elf
   text    data     bss     dec     hex filename
   3054      24       9    3087     c0f .build/firmware_t824.elf
avr-objcopy -j .text -j .rodata -j .data -O ihex .build/firmware_t824.elf firmware_t824-flash.hex

Keep an eye on the Debian bug #930195, as hopefully one day these steps will no longer be necessary.

A troubling thought - smartmatch reïmagined

2022-06-25T21:08:00.000+01:00

Preface: This is less a concrete idea, and more a rambling set of thoughts that lead me to a somewhat awkward place. I'm writing it out here in the hope that others can lend suggestions and ideas, and see if we can arrive at a better place.

I've been thinking about comparison operators lately - somewhat in the context of my new Syntax::Operator::Elem / Syntax::Operator::In module, somewhat in the context of smartmatch and the planned deprecations thereof, and partly in the context of my new match/case syntax.

Smartmatch Deprecations

For years now, smartmatch has been an annoying thorny design, and recently we've started making moves to get rid of it. In my mind at least, this is because it has a large and complex behaviour that is often unpredictable in advance. There are two distinct reasons for this:

It tries very hard to (recursively) distribute itself across container values on either side; saying that $x ~~ @y is true if any { $x ~~ $_ } @y for example; sometimes in ways that are surprising (e.g. how do you compare an array with a hash?)
It acts unpredictably with mixed strings or numbers; because those concepts are very fluid in perl and aren't well-defined

`match/case` and New Infix Operators

I've lately been writing some new ideas for new infix operators that Perl might want; partly because they're useful on their own but also because they're useful combined with the match/case syntax provided by Syntax::Keyword::Match. Between them all, these are intended as a replacement for the given/when syntax and its troublesome smartmatch. For example, to select an option based on a string comparison you can write:

match($x : eq) {
  case("abc") { say "It was the string abc" }
  case("def") { say "It was the string def" }
  case($y)    { say "It was whatever string the variable $y gives" }
}

This is much more predictable than given/when and smartmatch, because the programmer declared right upfront that the eq operator is being used here; there's no smartmatch involved.

Initially this feels like a great improvement on given/when and ~~, but it has lots of tricky cornercases to it. For example, the given/when approach can easily handle undef, whereas match/case using only the eq operator cannot distinguish undef from "". For this reason, I invented a new infix operator, called equ (provided by Syntax::Operator::Equ), which can:

say "Equal" if $x equ $y;  # true if they're both undef, or both
                           #   defined and the same string

match($x : equ) {
  # these two cases are now distinct
  case(undef) { say "It was undefined" }
  case("")    { say "It was the empty string" }

  default     { say "It was something else" }
}

Plus of course it also defines a new === operator which performs the numerical equivalent, able to distinguish undef from zero.
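
For anyone who wants to play with these semantics without installing anything, they are easy to state as ordinary functions. The following is only a sketch in plain Perl, not how Syntax::Operator::Equ is implemented; the names equ_str and equ_num are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Stringy version: both undef, or both defined and the same string.
sub equ_str {
    my ($x, $y) = @_;
    return !defined($y) unless defined($x);
    return defined($y) && $x eq $y;
}

# Numerical version: both undef, or both defined and the same number.
sub equ_num {
    my ($x, $y) = @_;
    return !defined($y) unless defined($x);
    return defined($y) && $x == $y;
}

say equ_str(undef, undef) ? "equal" : "different";   # equal
say equ_str(undef, "")    ? "equal" : "different";   # different
say equ_num(undef, 0)     ? "equal" : "different";   # different
say equ_num(0, 0)         ? "equal" : "different";   # equal
```

Because the undef check happens first, neither function ever triggers an "uninitialized value" warning.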

Syntax::Operator::Elem

Another operator I felt was required was one that can test if a given string (or number) is present in a list. For that, I wrote Syntax::Operator::Elem:

say "Present" if $x elem @ys;  # stringy

say "Present" if $x ∈ @ys;     # numerical

(Yes, that really is an operator spelled with a non-ASCII Unicode character. No I will not apologise :P)
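
As a rough picture of what the two operators test, here is a plain-function sketch built on List::Util's any. It is not the module's implementation, and the names elem_str and elem_num are invented for this example.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(any);

# Stringy membership test, comparing with eq.
sub elem_str {
    my ($x, @ys) = @_;
    return any { $x eq $_ } @ys;
}

# Numerical membership test, comparing with ==.
sub elem_num {
    my ($x, @ys) = @_;
    return any { $x == $_ } @ys;
}

my @names = ("apple", "banana");
say "Present" if elem_str("banana", @names);
say "Absent"  unless elem_num(5, 1, 2, 3);
```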

These operators too have the "oops, undef" problem about them - which led me briefly to consider adding two more that consider undef/"" or undef/zero to be distinct. Maybe I'd call them elemu and ... er.. well, Unicode doesn't have a variant of the ∈ operator that can suggest undefness. It was about at that point that I stopped, and wondered if really we're going about this whole thing the right way at all.

Smartmatch Reïmagined

I begin to think that if we go right back to the beginning, we might find that a huge chunk of this is unnecessary, if only we can find a better model.

During the 5.35 development series and now released in 5.36, Perl core has two improvements to what some might call its "type system":

Real booleans - true and false are now first-class values distinct from 1 and zero/emptystring.
Tracking of whether defined, nonboolean, nonreferential values began as strings or numbers; even if they have since evolved to effectively be both.

It is now possible to classify any given scalar value into exactly one of the following five categories:

undef

boolean

initially string

initially number

reference
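
As a sketch of how that classification might look in code, the following uses the builtin:: functions from perl 5.36. It assumes perl 5.36 or newer; the function name category and the short labels it returns are my own choice for illustration.

```perl
use v5.36;
no warnings 'experimental::builtin';

# Sort any scalar into exactly one of the five categories.
sub category ($v) {
    return 'UNDEF' unless defined $v;
    return 'REF'   if ref $v;
    return 'BOOL'  if builtin::is_bool($v);
    return 'NUM'   if builtin::created_as_number($v);
    return 'STR';   # defined, nonboolean, nonreference, began as a string
}

say category($_) for (undef, !!1, "10", 10, []);
# prints UNDEF, BOOL, STR, NUM, REF (one per line)
```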

I start to wonder whether, therefore, we have enough basis to create a better version of what the smartmatch operator tried (but ultimately failed) to be. For sake of argument, since I've already used one Unicode symbol I'm going to use another for this new one: The triple-bar identity symbol, ≡.

Let's consider a few properties this ought to have. First off, it should be well-behaved as an equality operator; it should be reflexive, symmetric and transitive. That is, given any values $x, $y and $z, all three of the following must always hold:

$x ≡ $x  is true                     # reflexive
$x ≡ $y  is the same as  $y ≡ $x     # symmetric
if $x ≡ $y and $y ≡ $z then $x ≡ $z  # transitive

Additionally, I don't think it ought to have any sort of distributive properties like $x ~~ @arr has. That sort of distribution should be handled at a higher level. (For example, the proposed caselist syntax of match/case.)

Because it only operates on pairs of scalars, this is already a much simpler kind of operator to think about. Because of the fact we can classify perl scalar values into these neat five categories, we can already write down five simple rules for when both sides are given the same category of scalar:

UNDEF	undef ≡ undef	is true
BOOL	$x ≡ $y	is true if $x and $y are both true, or both false
STR	$x ≡ $y	is true if $x eq $y
NUM	$x ≡ $y	is true if $x == $y
REF	$x ≡ $y	is true if refaddr($x) == refaddr($y)

I'd also like to suggest a rule that given any pair of scalars of different categories, the result is always false. This means in particular, that undef is never ≡ to any defined value (but never warns), that no boolean is ever ≡ to any non-boolean, and no reference is ever ≡ to any non-reference. I don't think anyone would argue with that.

Already this operator feels useful: because it neatly handles undef as distinct from any number or string, we no longer need the equ or === operators.

The one problem I have with this whole model is what do we do with STR ≡ NUM; how do we handle code like the following:

my $x = "10";
say "Equivalent" if $x ≡ 10;

By my first suggestion, this would always be false. While it's predictable and simple, I don't think it's very useful. It would mean that whenever you want to e.g. perform a numerical case comparison on a value taken from @ARGV, you always have to "numify" it by doing some ugly code like:

match(0 + $ARGV[0] : ≡) {
  case(1) { ... }
}

This does not feel very perlish.

So maybe we can find a more useful handling of STR vs NUM. I can already think of several bad ideas:

Pick the category on the righthand side
Superficially this feels beneficial to the match/case syntax, but it soon falls down in a lot of other scenarios. Plus it is blatantly not symmetric, which we already decided any good equality test operator ought to be.
The operator throws an exception
This doesn't feel like the right way to go. Having things like UNDEF, BOOL and REF already neatly just yield false, means that you can safely mix strings/numbers and undef in match/case labels, for example, and all is handled nicely. To have NUM-vs-UNDEF yield false but NUM-vs-STR throw an exception feels like a bad model. Plus it would not be transitive.

About the only sensible model I can think of in this mixed case, is to say that

NUM ≡ STR  is true if both `eq` and `==` would say true

It's reflexive and symmetric. It feels useful. It does (what most people would argue is) the right thing for "10" ≡ 10.
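
Putting the rules so far together, the whole operator can be prototyped as an ordinary function for experimentation. This is only a sketch of the model described in this post, assuming perl 5.36 or newer; the function names equiv and category are invented here, and a real operator would of course be written as ≡ rather than called like this.

```perl
use v5.36;
no warnings 'experimental::builtin';
use Scalar::Util qw(refaddr);

# Classify a scalar as UNDEF, REF, BOOL, NUM or STR.
sub category ($v) {
    return 'UNDEF' unless defined $v;
    return 'REF'   if ref $v;
    return 'BOOL'  if builtin::is_bool($v);
    return builtin::created_as_number($v) ? 'NUM' : 'STR';
}

sub equiv ($x, $y) {
    my ($cx, $cy) = (category($x), category($y));

    # Same category on both sides: the five simple rules.
    if ($cx eq $cy) {
        return 1            if $cx eq 'UNDEF';
        return !($x xor $y) if $cx eq 'BOOL';
        return $x eq $y     if $cx eq 'STR';
        return $x == $y     if $cx eq 'NUM';
        return refaddr($x) == refaddr($y);   # REF
    }

    # Mixed STR/NUM: true only if both eq and == would say true.
    if (   ($cx eq 'STR' && $cy eq 'NUM')
        || ($cx eq 'NUM' && $cy eq 'STR')) {
        return $x eq $y && $x == $y;
    }

    # Any other mix of categories is simply false, with no warning.
    return '';
}

say equiv("10", 10)    ? "equivalent" : "different";   # equivalent
say equiv("10.0", 10)  ? "equivalent" : "different";   # different
say equiv(undef, "")   ? "equivalent" : "different";   # different
say equiv(undef, undef) ? "equivalent" : "different";  # equivalent
```

Having it as a function makes it straightforward to throw lists of awkward values at it and look for cases that misbehave.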

Still, something at the back of my mind feels wrong about this design for an operator. Some situation in which it will be Obviously Terrible, and thus bring the whole tower crashing down. Perhaps it isn't truly transitive - there might be some set of values for which it fails. Offhand I can't think of one, but maybe someone can find an example?

It's a shame, because if we did happen to find an operator like this, then I think combined with match/case syntax it could go a long way towards providing a far better replacement for smartmatch + given/when and additionally solve a lot of other problems in Perl all in one go.

I'm sorry I don't have a more concrete and specific message to say here, other than that I've given (and will continue to give) this a lot of thought, and that I invite comment and ideas from others on how we might further it towards something that can really work in Perl.

Thanks all for listening.

Perl in 2022 - A Yearly Update

2022-01-26T15:34:00.002+00:00

At the end of 2020, I wrote a series of articles on the subject of recent CPAN modules that provide useful syntax, or recent core features added to perl. The series ended with a bonus post looking forward to imagine what new additions might one day appear. I followed this up with a video-based talk at FOSDEM, titled "Perl in 2025", with yet more ideas considering how a Perl might look in a few more years' time.

Over the past twelve months, I have made progress on several of these ideas. Four of them have already become CPAN modules and thus are available for writing in Perl in 2022:

match/case - Now available as Syntax::Keyword::Match.

match($n : ==) {
   case(1) { say "It's one" }
   case(2) { say "It's two" }
   case(3) { say "It's three" }
}

any, all - Now available as syntax-level keywords from List::Keywords.

if( any { $_->size > 100 } @boxes ) {
   say "There are some large boxes here";
}

multi sub - An early experiment in Syntax::Keyword::MultiSub.

multi sub max()          { return undef; }
multi sub max($x)        { return $x; }
multi sub max($x, @more) { my $y = max(@more);
                           return $x > $y ? $x : $y; }

equ, === - Available from Syntax::Operator::Equ, though at present they are only usable via Syntax::Keyword::Match or a specially-patched version of perl.

if($x equ $y) {
   say "Both are undef, or defined and equal strings";
}
 
if($i === $j) {
   say "Both are undef, or defined and equal numbers";
}

Of the rest:

in - I have the beginnings of some code but it's not yet on CPAN as it again requires a patched version of perl for pluggable infix operators.
let and is - not started yet.

In addition, not mentioned in the original article, the latest development version of perl has gained:

defer blocks.

{
    say "This happens first";
    defer { say "This happens last"; }
 
    say "And this happens inbetween";
}

finally as part of try/catch.

try {
    say "This happens first";
}
catch ($e) {
    say "Oops, it failed";
}
finally {
    say "This happens last in either case";
}

The builtin:: namespace, providing many new utility functions that ought to have been considered part of the core language - copying utilities from places like Scalar::Util and POSIX, as well as providing some new ones.
```
say "The refaddr of my object is ", builtin::refaddr($obj);

use builtin 'ceil';
say "The next integer above the value is ", ceil($value);
```

Real boolean values. These will be useful in many places, such as data serialisation and cross-language conversion modules.

use v5.36;
no warnings 'experimental::builtin';
use builtin qw(true false is_bool);  # spelled is_bool in released perl 5.36

sub serialise($v) {
  return $v ? 'true' : 'false' if is_bool $v;
  return qq("$v");
}

say join ",", map { serialise($_) }
    0, 1, false, true, 'true';

Overall I'm happy with progress so far. A lot of things have been started, laying much of the groundwork for more work that can follow. Behind the scenes all of these syntax modules are now using the XS::Parse::Keyword module to do the bulk of their parsing. This is great for getting something powerful written quickly, and has good properties in terms of interoperability between modules - for example, the way the new infix operators already work with the match/case syntax.

Core perl is on-track for a summer release as usual; hopefully that will provide the new defer and finally syntax, builtin functions and boolean values. I hope to have as much success in 2022 as I did in 2021 at writing more of these things, and with any luck I'll be able to write another article like this next year explaining what new progress has been achieved towards the Perl in 2025 goal.

Perl UV binding hits version 2.000

2021-07-30T12:35:00.001+01:00

Over the past few months I've been working on finishing off the libuv Perl binding module, UV. Yesterday I finally got it finished enough to feel like calling it version 2.000. Now's a good time to take a look at it.

libuv itself is a cross-platform event handling library, which focuses on providing nicely portable abstractions for things like TCP sockets, timers, and sub-process management between UNIX, Windows and other platforms. Traditionally things like event-based socket handling have always been difficult to write in a portable way between Windows and other places due to the very different ways things work on Windows as opposed to anywhere else. libuv provides a large number of helpful wrappers to write event-based code in a portable way, freeing the developer from having to care about these things.

A number of languages have nice bindings for libuv, but until recently there wasn't a good one for Perl. My latest project for The Perl Foundation aimed to fix this. The latest release of UV version 2.000 indicates that this is now done.

It's unlikely that most programs would choose to operate directly with UV itself, but rather via some higher-level event system. There are UV adapter modules for IO::Async (IO::Async::Loop::UV), Mojo (Mojo::Reactor::UV), and Future::IO (Future::IO::Impl::UV) at least.

The UV module certainly wraps much of what libuv has to offer, but there are still some parts missing. libuv can watch filesystems for changes of files, and provides asynchronous filesystem access functions - both of these are currently missing from the Perl binding. Threadpools are an entire concept that doesn't map very well to the Perl language, so they are absent too. Finally, libuv lists an entire category of "miscellaneous functions", most of which are already available independently in Perl, so there seems little point to wrapping those provided by libuv.

Finally, we should take note of one thing that doesn't work - the UV::TCP->open and UV::UDP->open functions when running on Windows. The upshot here is that you cannot create TCP or UDP sockets in your application independently of libuv and then hand them over to be handled by the library; this is not permitted. This is because on Windows, there are fundamentally two different kinds of sockets that require two different sets of API to access them - ones using WSA_FLAG_OVERLAPPED, and ones not. libuv needs that flag in order to perform event-based IO on sockets, and so it won't work with sockets created without it - which is the usual kind that most other modules, and perl itself, will create. This means that on Windows, the only sockets you can use with the UV module are ones created by UV itself - such as by asking it to connect out to servers, or listen and accept incoming connections. Fortunately, this is sufficient for the vast majority of applications.

I would like to finish up by saying thanks to The Perl Foundation for funding me to complete this project.

Writing a Perl Core Feature - part 11: Core modules

2021-02-26T13:00:00.082+00:00

Our new feature is now implemented, tested, and documented. There's just one last thing we need to do - update the bundled modules that come with core. Specifically, because we've added some new syntax, we need to update B::Deparse to be able to deparse it.

When the isa operator was added, the deparse module needed to be informed about the new OP_ISA opcode, in this small addition: (github.com/Perl/perl5).

--- a/lib/B/Deparse.pm
+++ b/lib/B/Deparse.pm
@@ -52,7 +52,7 @@ use B qw(class main_root main_start main_cv svref_2object opnumber perlstring
         MDEREF_SHIFT
     );
 
-$VERSION = '1.51';
+$VERSION = '1.52';
 use strict;
 our $AUTOLOAD;
 use warnings ();
@@ -3060,6 +3060,8 @@ sub pp_sge { binop(@_, "ge", 15) }
 sub pp_sle { binop(@_, "le", 15) }
 sub pp_scmp { maybe_targmy(@_, \&binop, "cmp", 14) }
 
+sub pp_isa { binop(@_, "isa", 15) }
+
 sub pp_sassign { binop(@_, "=", 7, SWAP_CHILDREN) }
 sub pp_aassign { binop(@_, "=", 7, SWAP_CHILDREN | LIST_CONTEXT) }

As you can see it's quite a small addition here; we just need to add a new method to the main B::Deparse package named after the new opcode. This new method calls down to the common binop function which is shared by the various binary operators, and recurses down parts of the optree, returning a combined result using the "isa" string in between the two parts.
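
To see the effect of that method from the outside, you can round-trip a small sub through B::Deparse yourself. The following is a minimal sketch assuming perl 5.36 or newer (so that isa is enabled by the version bundle); the class name Some::Class and the helper name deparsed_isa are placeholders of my own.

```perl
use v5.36;
use B::Deparse;

# Turn a tiny sub that uses `isa` back into source text.
sub deparsed_isa {
    my $code = sub { $_[0] isa Some::Class };
    return B::Deparse->new->coderef2text($code);
}

say deparsed_isa();
```

The printed block should contain the isa expression again, alongside whatever pragma lines the deparser chooses to emit for the enclosing scope.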

A more complex addition was made with the try syntax, as can be seen at (github.com/Perl/perl5); abbreviated here:

+sub pp_leavetrycatch {
+    my $self = shift;
+    my ($op) = @_;
...
+    my $trycode = scopeop(0, $self, $tryblock);
+    my $catchvar = $self->padname($catch->targ);
+    my $catchcode = scopeop(0, $self, $catchblock);
+
+    return "try {\n\t$trycode\n\b}\n" .
+           "catch($catchvar) {\n\t$catchcode\n\b}\cK";
+}

As before, this adds a new method named after the new opcode (in the case of the try/catch syntax this is named OP_LEAVETRYCATCH). The body of this method too just recurses down to parts of the sub-tree it was passed; in this case being two scope ops for the bodies of the blocks, plus a lexical variable name for the catch variable. The method then again returns a new string combining the various parts together along with the required braces, linefeeds, and indentation hints.

We can tell we need to add this for our new banana feature, as currently this does not deparse properly:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -Mexperimental=banana -MO=Deparse -ce 'print ban "Hello, world" ana;'
unexpected OP_BANANA at lib/B/Deparse.pm line 1664.
BEGIN {${^WARNING_BITS} = "\x10\x01\x00\x00\x00\x50\x04\x00\x00\x00\x00\x00\x00\x55\x51\x55\x50\x51\x45\x00"}
use feature 'banana';
print XXX;
-e syntax OK

We'll fix this by adding a new pp_banana in an appropriate place, perhaps just after the ones for lc/uc/fc. Don't forget to bump the $VERSION number too:

leo@shy:~/src/bleadperl/perl [git]
$ nvim lib/B/Deparse.pm 

leo@shy:~/src/bleadperl/perl [git]
$ git diff 
diff --git a/lib/B/Deparse.pm b/lib/B/Deparse.pm
index 67147f12dd..f6039a435d 100644
--- a/lib/B/Deparse.pm
+++ b/lib/B/Deparse.pm
@@ -52,7 +52,7 @@ use B qw(class main_root main_start main_cv svref_2object opnumber perlstring
         MDEREF_SHIFT
     );
 
-$VERSION = '1.56';
+$VERSION = '1.57';
 use strict;
 our $AUTOLOAD;
 use warnings ();
@@ -2824,6 +2824,13 @@ sub pp_lc { dq_unop(@_, "lc") }
 sub pp_quotemeta { maybe_targmy(@_, \&dq_unop, "quotemeta") }
 sub pp_fc { dq_unop(@_, "fc") }
 
+sub pp_banana {
+    my $self = shift;
+    my ($op, $cx) = @_;
+    my $kid = $op->first;
+    return "ban " . $self->deparse($kid, 1) . " ana";
+}
+
 sub loopex {
     my $self = shift;
     my ($op, $cx, $name) = @_;

This new function recurses down to deparse for the subtree, and returns a new string wrapped in the appropriate syntax for it. That should be all we need:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -Mexperimental=banana -MO=Deparse -ce 'print ban "Hello, world" ana;'
BEGIN {${^WARNING_BITS} = "\x10\x01\x00\x00\x00\x50\x04\x00\x00\x00\x00\x00\x00\x55\x51\x55\x50\x51\x45\x00"}
use feature 'banana';
print ban 'Hello, world' ana;
-e syntax OK

Of course, this being a perl module we should remember to update its unit tests.

leo@shy:~/src/bleadperl/perl [git]
$ git diff lib/B/Deparse.t
diff --git a/lib/B/Deparse.t b/lib/B/Deparse.t
index 24eb445041..0fe6940cb3 100644
--- a/lib/B/Deparse.t
+++ b/lib/B/Deparse.t
@@ -3171,3 +3171,10 @@ try {
 catch($var) {
     SECOND();
 }
+####
+# banana
+# CONTEXT use feature 'banana'; no warnings 'experimental::banana';
+ban 'literal' ana;
+ban $a ana;
+ban $a . $b ana;
+ban "stringify $a" ana;

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness lib/B/Deparse.t 
../lib/B/Deparse.t .. ok     
All tests successful.
Files=1, Tests=321,  9 wallclock secs ( 0.14 usr  0.00 sys +  8.99 cusr  0.38 csys =  9.51 CPU)
Result: PASS

Because in part 10 we added documentation for a new function in pod/perlfunc.pod there's another test that needs updating:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ext/Pod-Functions/t/Functions.t 
../ext/Pod-Functions/t/Functions.t .. 1/? 
#   Failed test 'run as plain program'
#   at t/Functions.t line 55.
#          got: '
...
Result: FAIL

We can fix that by adding the new function to the expected list in the test file itself:

leo@shy:~/src/bleadperl/perl [git]
$ nvim ext/Pod-Functions/t/Functions.t

leo@shy:~/src/bleadperl/perl [git]
$ git diff ext/Pod-Functions/t/Functions.t
diff --git a/ext/Pod-Functions/t/Functions.t b/ext/Pod-Functions/t/Functions.t
index 2beccc1ac6..4d5b03e978 100644
--- a/ext/Pod-Functions/t/Functions.t
+++ b/ext/Pod-Functions/t/Functions.t
@@ -76,7 +76,7 @@ Functions.t - Test Pod::Functions
 __DATA__
 
 Functions for SCALARs or strings:
-     chomp, chop, chr, crypt, fc, hex, index, lc, lcfirst,
+     ban, chomp, chop, chr, crypt, fc, hex, index, lc, lcfirst,
      length, oct, ord, pack, q/STRING/, qq/STRING/, reverse,
      rindex, sprintf, substr, tr///, uc, ucfirst, y///
 
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ext/Pod-Functions/t/Functions.t 
../ext/Pod-Functions/t/Functions.t .. ok     
All tests successful.
Files=1, Tests=234,  1 wallclock secs ( 0.04 usr  0.01 sys +  0.23 cusr  0.00 csys =  0.28 CPU)
Result: PASS

At this point, we're done. We've now completed all the steps to add a new feature to the perl interpreter. As well as all the steps required to actually implement it in the core binary itself, we've updated the tests, documentation, and support modules to match.

Along the way we've seen examples from real commits into the perl tree while we made our own. Any particular design of new feature will of course have its own variations and differences - there are still many parts of the interpreter we haven't touched on in this series. It would be difficult to try to cover all the possible ideas of things that could be added or changed, but hopefully having completed this series you'll at least have a good overview of the main pieces that are likely to be involved, and have some starting-off points to explore further to see whatever additional details might be required for whatever situation you encounter.

Writing a Perl Core Feature - part 10: Documentation

2021-02-24T13:00:00.078+00:00

Now that we have our new feature nicely implemented and tested, we're nearly finished. We just have a few more loose ends to tidy up. The first of these is to take a look at some documentation.

We've already done one small documentation addition to perldiag.pod when we added the new warning message, but the bulk of documentation to explain a new feature would likely be found in one of the main documents - perlsyn.pod, perlop.pod, perlfunc.pod or similar. Exactly which of these is best would depend on the nature of the specific feature.

The isa feature, being a new infix operator, was documented in perlop.pod: (github.com/Perl/perl5).

...
+=head2 Class Instance Operator
+X<isa operator>
+
+Binary C<isa> evaluates to true when left argument is an object instance of
+the class (or a subclass derived from that class) given by the right argument.
+If the left argument is not defined, not a blessed object instance, or does
+not derive from the class given by the right argument, the operator evaluates
+as false. The right argument may give the class either as a barename or a
+scalar expression that yields a string class name:
+
+    if( $obj isa Some::Class ) { ... }
+
+    if( $obj isa "Different::Class" ) { ... }
+    if( $obj isa $name_of_class ) { ... }
+
+This is an experimental feature and is available from Perl 5.31.6 when enabled
+by C<use feature 'isa'>. It emits a warning in the C<experimental::isa>
+category.

Let's now write a little bit of documentation for our new banana feature. Since it is a named function-like operator (though with odd syntax involving a second trailing named keyword), perhaps we'll write it in perlfunc.pod. We'll style it similarly to the case-changing functions lc and uc to get some suggested wording.

leo@shy:~/src/bleadperl/perl [git]
$ nvim pod/perlfunc.pod 

leo@shy (1 job):~/src/bleadperl/perl [git]
$ git diff | xml_escape 
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index b655a08ecc..319e9aab96 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -114,6 +114,7 @@ X<scalar> X<string> X<character>
 
 =for Pod::Functions =String
 
+L<C<ban>|/ban EXPR ana>,
 L<C<chomp>|/chomp VARIABLE>, L<C<chop>|/chop VARIABLE>,
 L<C<chr>|/chr NUMBER>, L<C<crypt>|/crypt PLAINTEXT,SALT>,
 L<C<fc>|/fc EXPR>, L<C<hex>|/hex EXPR>,
@@ -136,6 +137,10 @@ prefixed with C<CORE::>.  The
 L<C<"fc"> feature|feature/The 'fc' feature> is enabled automatically
 with a C<use v5.16> (or higher) declaration in the current scope.
 
+L<C<ban>|/ban EXPR ana> is available only if the
+L<C<"banana"> feature|feature/The 'banana' feature.> is enabled or if it is
+prefixed with C<CORE::>.
+
 =item Regular expressions and pattern matching
 X<regular expression> X<regex> X<regexp>
 
@@ -773,6 +778,15 @@ your L<atan2(3)> manpage for more information.
 
 Portability issues: L<perlport/atan2>.
 
+=item ban EXPR ana
+X<ban>
+
+=for Pod::Functions return ROT13 transformed version of a string
+
+Applies the "ROT13" transform to upper- and lower-case letters in the given
+expression string, returning the newly-formed string. Non-letter characters
+are left unchanged.
+
 =item bind SOCKET,NAME
 X<bind>

While this will do as a short example here, any real feature would likely have a lot more words to say than just this.

When editing POD files it's good to get into the habit of running the porting tests (or at least the POD checking ones) before committing, to check the formatting is valid:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/porting/pod*.t
porting/podcheck.t ... ok         
porting/pod_rules.t .. ok   
All tests successful.
Files=2, Tests=1472, 34 wallclock secs ( 0.20 usr  0.00 sys + 33.79 cusr  0.15 csys = 34.14 CPU)
Result: PASS

While I was writing this documentation it occurred to me to write about how the function handles Unicode characters vs byte strings, which got me thinking more about what it actually does with them. It turns out the implementation doesn't work properly for that, as we can demonstrate with a new test:

--- a/t/op/banana.t
+++ b/t/op/banana.t
@@ -11,7 +11,7 @@ use strict;
 use feature 'banana';
 no warnings 'experimental::banana';
 
-plan 7;
+plan 8;
 
 is(ban "ABCD" ana, "NOPQ", 'Uppercase ROT13');
 is(ban "abcd" ana, "nopq", 'Lowercase ROT13');
@@ -23,3 +23,8 @@ my $str = "efgh";
 is(ban $str ana, "rstu", 'Lexical variable');
 is(ban $str . "IJK" ana, "rstuVWX", 'Concat expression');
 is("(" . ban "LMNO" ana . ")", "(YZAB)", 'Outer concat');
+
+{
+    use utf8;
+    is(ban "café" ana, "pnsé", 'Unicode string');
+}

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/op/banana.t 
op/banana.t .. 1/8 # Failed test 8 - Unicode string at op/banana.t line 29
#      got "pnsé"
# expected "pns�"
op/banana.t .. Failed 1/8 subtests

This comes down to a bug in the pp_banana opcode function, which used the internal byte buffer of the incoming SV (SvPV) without inspecting the corresponding SvUTF8 flag. Such a pattern is always indicative of a Unicode support bug. We can easily fix this:

leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index 9725806b84..3dbe21fadd 100644
--- a/pp.c
+++ b/pp.c
@@ -7211,6 +7211,8 @@ PP(pp_banana)
     s = SvPV(arg, len);
 
     mPUSHs(newSVpvn_rot13(s, len));
+    if(SvUTF8(arg))
+        SvUTF8_on(TOPs);
     RETURN;
 }
 

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/op/banana.t 
op/banana.t .. ok   
All tests successful.
Files=1, Tests=8,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.02 cusr  0.00 csys =  0.04 CPU)
Result: PASS
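
To see why the flag matters, here is a minimal standalone sketch of the idea. It assumes a toy string structure (toy_str, invented for this example and not perl's real SV layout) in which the same bytes are read either as one character per byte or as UTF-8, depending on a flag.

```c
#include <stddef.h>

/* Toy string value: a byte buffer plus a flag saying how to read it.
 * This imitates the idea behind SvUTF8 and is not perl's real structure. */
struct toy_str {
    const unsigned char *bytes;
    size_t len;       /* length in bytes */
    int is_utf8;      /* nonzero if bytes hold UTF-8 encoded characters */
};

/* Count characters: every byte is a character unless the flag is set, in
 * which case UTF-8 continuation bytes (10xxxxxx) are not counted */
static size_t toy_char_length(const struct toy_str *s)
{
    size_t i, chars = 0;

    for (i = 0; i < s->len; i++) {
        if (!s->is_utf8 || (s->bytes[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

The five bytes that encode "café" in UTF-8 count as four characters with the flag set, but as five without it. A function that copies the bytes into a new value but forgets to copy the flag has therefore quietly changed the string, which is the kind of mismatch the failing test picked up.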

Writing good documentation is an integral part of the process of developing a new feature. Firstly it helps to explain the feature to users so they know how to use it. But often you find that the process of writing the words helps you think about different aspects of that feature that you may not have considered before. With that new frame of mind you sometimes discover missing parts to it, or uncover bugs or corner-cases that need fixing. Make sure to spend time working on the documentation for any new feature - it is said that you never truly understand something until you try to teach it to someone else.

Writing a Perl Core Feature - part 9: Tests

2021-02-22T13:00:00.090+00:00

By the end of part 8 we finally managed to see an actual implementation of our new feature. We tested a couple of things on the commandline directly to see that it seems to be doing the right thing. For a real core feature though it would be better to have it tested in a more automated, repeatable fashion. This is what the core unit tests are for.

The core perl source distribution contains a t/ directory with unit test files, very similar to the structure used by regular CPAN modules. The process for running these is a little different; as we already saw back in part 3 they need to be invoked by t/harness. The files themselves are somewhat more limited in what other modules they can use, so the full suite of Test:: modules is unavailable. But still they are expected to emit the regular TAP output we've come to expect from Perl unit tests, and tend to be structured quite similarly inside.

For example, the isa feature added an entire new file for its unit tests. As they all relate to the new syntax and semantics around a new opcode, they go in a file under the t/op directory. I won't paste the entire t/op/isa.t file, but consider this small section (github.com/Perl/perl5):

#!./perl

BEGIN {
    chdir 't' if -d 't';
    require './test.pl';
    set_up_inc('../lib');
    require Config;
}

use strict;
use feature 'isa';
no warnings 'experimental::isa';

...

my $baseobj = bless {}, "BaseClass";

# Bareword package name
ok($baseobj isa BaseClass, '$baseobj isa BaseClass');
ok(not($baseobj isa Another::Class), '$baseobj is not Another::Class');

While it doesn't use Test::More, it does still have access to some similar testing functions such as the ok test. The initial lines of boilerplate in the BEGIN block set up the testing functions from the test.pl script, so we can use them in the actual tests.
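
To make the TAP part a little more concrete, here is a minimal sketch in C of the kind of output such testing functions produce. The tap_plan, tap_ok and tap_is names are made up for this example; they only imitate the general behaviour of plan, ok and is, and are not how test.pl itself is written.

```c
#include <stdio.h>
#include <string.h>

static int tap_test_num = 0;   /* running test counter */

/* Announce how many tests are going to run */
static void tap_plan(int count)
{
    printf("1..%d\n", count);
}

/* Print one TAP result line; returns the pass value */
static int tap_ok(int pass, const char *name)
{
    printf("%s %d - %s\n", pass ? "ok" : "not ok", ++tap_test_num, name);
    return pass;
}

/* String comparison test, with diagnostics as comment lines on failure */
static int tap_is(const char *got, const char *expected, const char *name)
{
    int pass = strcmp(got, expected) == 0;

    tap_ok(pass, name);
    if (!pass) {
        printf("#      got \"%s\"\n", got);
        printf("# expected \"%s\"\n", expected);
    }
    return pass;
}
```

A harness only has to read these lines back: the plan says how many results to expect, and each "ok" or "not ok" line reports one of them.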

Let's now have a go at writing some tests for our new banana feature. As it works like a text transformation function we can imagine a few different test strings to throw at it.

leo@shy:~/src/bleadperl/perl [git]
$ nvim t/op/banana.t

leo@shy:~/src/bleadperl/perl [git]
$ cat t/op/banana.t
#!./perl

BEGIN {
    chdir 't' if -d 't';
    require './test.pl';
    set_up_inc('../lib');
    require Config;
}

use strict;
use feature 'banana';
no warnings 'experimental::banana';

plan 7;

is(ban "ABCD" ana, "NOPQ", 'Uppercase ROT13');
is(ban "abcd" ana, "nopq", 'Lowercase ROT13');
is(ban "1234" ana, "1234", 'Numbers unaffected');

is(ban "a! b! c!" ana, "n! o! p!", 'Whitespace and symbols intermingled');

my $str = "efgh";
is(ban $str ana, "rstu", 'Lexical variable');

is(ban $str . "IJK" ana, "rstuVWX", 'Concat expression');
is("(" . ban "LMNO" ana . ")", "(YZAB)", 'Outer concat');

$ ./perl t/harness t/op/banana.t
op/banana.t .. ok   
All tests successful.
Files=1, Tests=4,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.03 cusr  0.00 csys =  0.05 CPU)
Result: PASS

Here we have used the is() testing function to check that the strings generated by the ban ... ana operator are what we expected them to be. We've tested both uppercase and lowercase letters, and that non-letter characters such as numbers, symbols and spaces remain unaffected. In addition we've added some syntax tests as well, to check variables as well as literal string constants, and to demonstrate that the parser works correctly on the precedence of the operator mixed with string concatenation. All appears to be working fine.

Before we commit this one there is one last thing we have to do. Having added a new file to the distribution, one of the porting tests will now be unhappy:

leo@shy:~/src/bleadperl/perl [git]
$ git add t/op/banana.t 

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/manifest.t ........ 9848/? # Failed test 10502 - git ls-files
gives the same number of files as MANIFEST lists at porting/manifest.t line 101
#      got "6304"
# expected "6303"
# Failed test 10504 - Nothing added to the repo that isn't in MANIFEST
at porting/manifest.t line 113
#      got "1"
# expected "0"
# Failed test 10505 - Nothing added to the repo that isn't in MANIFEST
at porting/manifest.t line 114
#      got "not in MANIFEST: t/op/banana.t"
# expected "not in MANIFEST: "
porting/manifest.t ........ Failed 3/10507 subtests

To fix this one we need to manually add an entry in the MANIFEST file; unlike the common practice for CPAN modules, this file is not automatically generated.

leo@shy:~/src/bleadperl/perl [git]
$ nvim MANIFEST

leo@shy:~/src/bleadperl/perl [git]
$ git diff MANIFEST
diff --git a/MANIFEST b/MANIFEST
index 71d3b453da..03ecdda3d2 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5779,6 +5779,7 @@ t/op/attrproto.t          See if the prototype attribute works
 t/op/attrs.t                   See if attributes on declarations work
 t/op/auto.t                    See if autoincrement et all work
 t/op/avhv.t                    See if pseudo-hashes work
+t/op/banana.t                  See if the ban ... ana syntax works
 t/op/bless.t                   See if bless works
 t/op/blocks.t                  See if BEGIN and friends work
 t/op/bop.t                     See if bitops work

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
Result: PASS

Of course, in this test file we've added only 7 tests. It is likely that any actual real feature would have a lot more testing around it, to deal with a wider variety of situations and corner-cases. Often the really interesting cases only come to light after trying to use it for real and finding odd situations that don't quite work as expected; so after adding a new feature expect to spend a while expanding the test file to cover more things. It's especially useful to add new tests of new situations you find yourself using the feature in, even if they currently work just fine. The presence of such tests helps ensure the feature remains working in that manner.

Writing a Perl Core Feature - part 8: Interpreter internals

2021-02-19T13:00:00.200+00:00

At this point we are most of the way to adding a new feature to the Perl interpreter. In part 4 we created an opcode function to represent the new behaviour, part 5 and part 6 added compiler support to recognise the syntax used to represent it, and in part 7 we made a helper function to provide the required behaviour. It's now time to tie them all together.

When we looked at opcodes and optrees back in part 4, I mentioned that each node of the optree performs a little part of the execution of a function, with child nodes usually obtaining some piece of data somewhere that gets passed up to parent nodes to operate on. I skipped over exactly how that all works, so for this part let's look at that in more detail.

The data model used by the perl interpreter for runtime execution of code is based around being a stack machine. Most opcodes that operate in some way on regular perl data values do so by interacting with the data stack (often simply called "the stack"; though this is sometimes ambiguous as there are in fact several stacks within the perl interpreter). As the interpreter walks along an optree invoking the function associated with each opcode, these various functions either push values onto the stack, or pop values already there back off it again, in order to use them.

For example, in part 4 we saw how the line of code my $x = 5; might get represented by an optree of three nodes - an OP_SASSIGN with two child nodes OP_CONST and OP_PADSV.

When this statement is executed the optree nodes are visited in postfix order, with the two child BASEOPs running first in order to push some values to the stack, followed by the assignment BINOP afterwards, which takes those values back off the stack and performs the appropriate assignment.
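
As a rough illustration of that execution order, here is a toy stack machine in C. Everything in it (the toy_interp structure and the toy_const, toy_padsv and toy_sassign functions) is invented for this sketch; it mimics only the push-then-pop pattern and uses plain ints where perl would use SVs.

```c
#include <stdio.h>

/* Toy model only: none of these names are real perl internals */
#define TOY_STACK_MAX 16

struct toy_interp {
    int *stack[TOY_STACK_MAX];  /* the "data stack" holds pointers to values */
    int  sp;                    /* number of items currently on the stack */
    int  pad[4];                /* stand-in for the pad of lexical variables */
};

/* Like OP_CONST: push a pointer to a constant value */
static void toy_const(struct toy_interp *in, int *constval)
{
    in->stack[in->sp++] = constval;
}

/* Like OP_PADSV: push a pointer to the lexical in the given pad slot */
static void toy_padsv(struct toy_interp *in, int padix)
{
    in->stack[in->sp++] = &in->pad[padix];
}

/* Like OP_SASSIGN: pop the destination and the source, then assign.
 * This toy leaves the stack empty afterwards. */
static void toy_sassign(struct toy_interp *in)
{
    int *dst = in->stack[--in->sp];
    int *src = in->stack[--in->sp];
    *dst = *src;
}

/* Runs the equivalent of `my $x = 5;` with children first, parent last */
static int toy_run_assign(struct toy_interp *in, int *five)
{
    toy_const(in, five);   /* first child pushes the value 5 */
    toy_padsv(in, 0);      /* second child pushes the variable $x */
    toy_sassign(in);       /* parent pops both and assigns */
    return in->pad[0];
}
```

The parent never needs to know where its arguments came from; it only relies on its children having left them on the stack in the right order.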

Let's now take a closer look at the code inside one of the actual functions which implements this. For example, pp_const, the function for OP_CONST, consists of three short lines:

PP(pp_const)
{
    dSP;
    XPUSHs(cSVOP_sv);
    RETURN;
}

Of these three lines, all four symbols are in fact macros:

dSP declares some local variables for tracking state, used by later macros
cSVOP_sv fetches the actual SV pointer out of the SVOP itself. This will be the one holding the constant's value
XPUSHs extends the (data) stack if necessary, then pushes that SV onto it
RETURN resynchronises the interpreter state from the local variables, and arranges for the opcode function to return the next opcode, for the toplevel instruction loop

The pp_padsv function is somewhat more complex, but the essential parts of it are quite similar; the following example is heavily paraphrased:

PP(pp_padsv)
{
    SV ** const padentry = &(PAD_SVl(op->op_targ));
    XPUSHs(*padentry);
    RETURN;
}

This time, rather than the cSVOP_sv which takes the SV out of the op itself, we use PAD_SVl which looks up the SV in the currently-active pad, by using the target index which is stored in the op.

When the isa feature was added, its main pp_isa opcode function was actually quite small: (github.com/Perl/perl5).

--- a/pp.c
+++ b/pp.c
@@ -7143,6 +7143,18 @@ PP(pp_argcheck)
     return NORMAL;
 }
 
+PP(pp_isa)
+{
+    dSP;
+    SV *left, *right;
+
+    right = POPs;
+    left  = TOPs;
+
+    SETs(boolSV(sv_isa_sv(left, right)));
+    RETURN;
+}
+

Since OP_ISA is a BINOP it is expecting to find two arguments on the stack; traditionally these are called left and right. This opcode function simply takes those two values and calls the sv_isa_sv() function, which returns a boolean truth value. The boolSV helper macro yields an SV pointer to represent this boolean value, which is then used as the result of the opcode itself.

As a small performance optimisation, this function decides to only POP one argument, before changing the top-of-stack value to its result using SETs. This is equivalent to POPing two of them and PUSHing its result, except that it doesn't have to alter the stack pointer as many times.
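
The equivalence of those two approaches can be sketched with a toy stack of ints. The toy_stack structure and its functions are assumptions made up for this example; they imitate the effect of POPs, TOPs and SETs rather than reproduce the real macros, and they use a simple equality test as the stand-in binary operation.

```c
#include <stdio.h>

/* Toy stack of ints; the names below only imitate perl's macros */
struct toy_stack {
    int items[8];
    int *sp;          /* points at the current top item */
    int sp_moves;     /* counts how often the stack pointer is altered */
};

/* Set up a stack holding a base item followed by the two arguments */
static void toy_init(struct toy_stack *st, int left, int right)
{
    st->items[0] = 0;       /* base item, so sp never moves below the array */
    st->items[1] = left;
    st->items[2] = right;
    st->sp = &st->items[2];
    st->sp_moves = 0;
}

/* Pop both arguments, then push the result: three pointer adjustments */
static void toy_equal_pop_pop_push(struct toy_stack *st)
{
    int right = *st->sp--;  st->sp_moves++;
    int left  = *st->sp--;  st->sp_moves++;
    *++st->sp = (left == right);  st->sp_moves++;
}

/* Pop one argument, then overwrite the new top in place: one adjustment */
static void toy_equal_pop_set(struct toy_stack *st)
{
    int right = *st->sp--;  st->sp_moves++;
    int left  = *st->sp;          /* like TOPs: read without moving */
    *st->sp = (left == right);    /* like SETs: replace the top item */
}
```

Both functions leave the stack in the same final state, with the result sitting where the left argument used to be; the second simply gets there with fewer changes to the stack pointer.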

For more of a look at how the stack works, you could also take a look at another post from my series on Parser Plugins: Part 3a - The Stack.

Let's now take a look at implementing our banana feature for real. Recall in part 4 we added the pp_banana function with some placeholder content that just died with a panic message if invoked. We'll now replace that with a real implementation:

leo@shy:~/src/bleadperl/perl [git]
$ nvim pp.c 

leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index 93141454e1..bced3d23ea 100644
--- a/pp.c
+++ b/pp.c
@@ -7203,7 +7203,15 @@ PP(pp_cmpchain_dup)
 
 PP(pp_banana)
 {
-    DIE(aTHX_ "panic: we have no bananas");
+    dSP;
+    const char *s;
+    STRLEN len;
+    SV *arg = POPs;
+
+    s = SvPV(arg, len);
+
+    PUSHs(newSVpvn_rot13(s, len));
+    RETURN;
 }
 
 /*

Now let's rebuild perl and try it out:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use experimental "banana"; say ban "Hello, world!" ana;'
Uryyb, jbeyq!

Well it certainly looks plausible - we've got back a different string of the same length, with different letters but in the same capitalisation and identical non-letter characters. Let's compare with something like tr to see if it's correct:

leo@shy:~/src/bleadperl/perl [git]
$ echo "Uryyb, jbeyq!" | tr "A-Za-z" "N-ZA-Mn-za-m"
Hello, world!

Seems good. But it turns out we've still missed something. This function has a memory leak. We can demonstrate it by writing a small example that calls ban ... ana a large number of times (say, a thousand), and printing the total count of SVs on the heap before and after. There's a handy function in perl's unit test suite called XS::APItest::sv_count we can use here:

leo@shy (1 job):~/src/bleadperl/perl [git]
$ ./perl -Ilib -I. -MXS::APItest=sv_count -E \
  'use experimental "banana";
   say sv_count();
   ban "Hello, world!" ana for 1..1000;
   say sv_count();'
5321
6321

Oh dear. The SV count is a thousand higher afterwards than before, suggesting we leaked an SV on every call.

It turns out this is because of an optimisation that the interpreter uses, where SV pointers on the Perl data stack don't actually contribute to reference counting. When values get POP'ed from the stack we don't have to decrement their refcount; when values get pushed we don't increment it. This saves an amount of runtime performance to not have to be adjusting those counts all the time. The consequence here is that we have to be a bit more careful when returning newly-constructed values. We must mark the value as mortal, which means we are saying that its reference count is somehow artificially high (because of that pointer on the stack), and perl should decrement the reference count at some point soon, when it next discards temporary values.
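
The bookkeeping behind that can be modelled in a few lines of standalone C. This is a sketch under heavy assumptions: the toy_sv structure, the fixed-size pool and the toy_* functions are all invented here to show the idea of a deferred decrement, and perl's real temporaries stack is considerably more involved.

```c
#include <stdlib.h>

/* Toy model: these structures imitate the idea only, not perl's real SV code */
struct toy_sv { int refcnt; };

#define TOY_POOL_SIZE  4096
#define TOY_TEMPS_SIZE 64

static struct toy_sv toy_pool[TOY_POOL_SIZE];      /* fixed pool, never reused */
static int toy_pool_used = 0;
static int toy_live_count = 0;                     /* values not yet released */
static struct toy_sv *toy_temps[TOY_TEMPS_SIZE];   /* awaiting a decrement */
static int toy_temps_len = 0;

/* Create a new value, which starts with a reference count of 1 */
static struct toy_sv *toy_new(void)
{
    struct toy_sv *sv;

    if (toy_pool_used >= TOY_POOL_SIZE)
        abort();
    sv = &toy_pool[toy_pool_used++];
    sv->refcnt = 1;
    toy_live_count++;
    return sv;
}

/* Drop one reference, releasing the value when none remain */
static void toy_dec(struct toy_sv *sv)
{
    if (--sv->refcnt == 0)
        toy_live_count--;
}

/* Mark a value as mortal: its count will be decremented later on */
static struct toy_sv *toy_mortal(struct toy_sv *sv)
{
    if (toy_temps_len >= TOY_TEMPS_SIZE)
        abort();
    toy_temps[toy_temps_len++] = sv;
    return sv;
}

/* Discard temporary values, applying the deferred decrements */
static void toy_free_temps(void)
{
    while (toy_temps_len > 0)
        toy_dec(toy_temps[--toy_temps_len]);
}

/* Simulate an op that returns a new value on the stack, which the caller
 * then drops without touching the reference count */
static void toy_call_op(int mortalise)
{
    struct toy_sv *result = toy_new();

    if (mortalise)
        toy_mortal(result);
    /* the stack slot holding `result` is simply forgotten here */
}
```

Calling toy_call_op(0) followed by toy_free_temps() a thousand times leaves a thousand live values behind, much like our leak test; with toy_call_op(1) the live count returns to where it started after each round.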

Because this sort of thing is done a lot, there is a handy macro called mPUSHs, which mortalizes an SV when it pushes it to the data stack. We can call that instead:

$ git diff pp.c
...
+    mPUSHs(newSVpvn_rot13(s, len));
+    RETURN;
 }
 
 /*

Now when we try our leak test we find the same SV count before and after, meaning no leak has occurred:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -I. -MXS::APItest=sv_count -E ...
5321
5321

We may be onto a winner here.

Writing a Perl Core Feature - part 7: Support functions

2021-02-17T13:00:00.118+00:00

So far in this series we've seen several modifications and small additions, to add the required bits and pieces for our new feature to various parts of the perl interpreter. Often when adding anything but the very smallest and simplest of features or changes, it becomes necessary not just to modify existing things, but to add some new support functions as well.

For example, adding the isa feature required adding a new function to actually implement the bulk of the operation, which is then called from the pp_isa opcode function. This helper function was added into universal.c in this commit: (github.com/Perl/perl5).

--- a/universal.c
+++ b/universal.c
@@ -187,6 +187,74 @@ Perl_sv_derived_from_pvn(pTHX_ SV *sv, const char *const name, const STRLEN len,
     return sv_derived_from_svpvn(sv, NULL, name, len, flags);
 }
 
+/*
+=for apidoc sv_isa_sv
+
+Returns a boolean indicating whether the SV is an object reference and is
+derived from the specified class, respecting any C<isa()> method overloading
+it may have. Returns false if C<sv> is not a reference to an object, or is
+not derived from the specified class.
...
+
+=cut
+
+*/
+
+bool
+Perl_sv_isa_sv(pTHX_ SV *sv, SV *namesv)
+{
+    GV *isagv;
+
+    PERL_ARGS_ASSERT_SV_ISA_SV;
+
+    if(!SvROK(sv) || !SvOBJECT(SvRV(sv)))
+        return FALSE;
+
...
+    return sv_derived_from_sv(sv, namesv, 0);
+}
+
 /*
 =for apidoc sv_does_sv

Like all good helper functions, this one is named beginning with a Perl_ prefix and takes as its first parameter the pTHX_ macro. To make the function properly visible to other code within the interpreter, an entry needed adding to the embed.fnc file which lists all of the functions. (github.com/Perl/perl5).

--- a/embed.fnc
+++ b/embed.fnc
@@ -1777,6 +1777,7 @@ ApdR      |bool   |sv_derived_from_sv|NN SV* sv|NN SV *namesv|U32 flags
 ApdR   |bool   |sv_derived_from_pv|NN SV* sv|NN const char *const name|U32 flags
 ApdR   |bool   |sv_derived_from_pvn|NN SV* sv|NN const char *const name \
                                     |const STRLEN len|U32 flags
+ApdRx  |bool   |sv_isa_sv      |NN SV* sv|NN SV* namesv
 ApdR   |bool   |sv_does        |NN SV* sv|NN const char *const name
 ApdR   |bool   |sv_does_sv     |NN SV* sv|NN SV* namesv|U32 flags
 ApdR   |bool   |sv_does_pv     |NN SV* sv|NN const char *const name|U32 flags

This file stores pipe-separated columns, containing:

A set of flags - in this case marking an API function (A), having the Perl_ prefix (p), with documentation (d), whose return value must not be ignored (R) and is currently experimental (x)
The return type
The name
Argument types in all remaining columns; where NN prefixes an argument which must not be passed as NULL
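
To illustrate that column layout, here is a small sketch in C that splits one such line into its fields. The toy_split_fields and toy_has_flag functions are hypothetical helpers written for this example; the real regeneration script is written in Perl and also handles things this sketch ignores, such as backslash-continued lines.

```c
#include <ctype.h>
#include <string.h>

/* Splits `line` in place on '|' characters, trimming blanks around each field.
 * Stores up to `max` field pointers in `fields` and returns how many it found. */
static int toy_split_fields(char *line, char **fields, int max)
{
    int n = 0;
    char *p = line;

    while (p && n < max) {
        char *bar = strchr(p, '|');
        char *end;

        if (bar)
            *bar = '\0';

        /* trim leading and trailing whitespace from this field */
        while (isspace((unsigned char)*p))
            p++;
        end = p + strlen(p);
        while (end > p && isspace((unsigned char)end[-1]))
            end--;
        *end = '\0';

        fields[n++] = p;
        p = bar ? bar + 1 : NULL;
    }
    return n;
}

/* True if the flags column contains the given single-letter flag */
static int toy_has_flag(const char *flags, char flag)
{
    return flag != '\0' && strchr(flags, flag) != NULL;
}
```

Given the sv_isa_sv line from the diff above, this yields five fields: the flags ApdRx, the return type bool, the name sv_isa_sv, and the two argument columns.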

For our new banana feature let's now think of some semantics. Perhaps, given the example code we saw yesterday, it should return a new string built from its argument. For arbitrary reasons of having something interesting yet unlikely in practice, let's make it return a ROT13 transformed version.

Let's now add a helper function to do this - something to construct a new string SV containing the ROT13'ed transformation of the given input. We'll begin by picking a new name for this new function, and adding a definition line into the embed.fnc list, and running the regen/embed.pl regeneration script:

leo@shy:~/src/bleadperl/perl [git]
$ nvim embed.fnc 

leo@shy:~/src/bleadperl/perl [git]
$ git diff embed.fnc
diff --git a/embed.fnc b/embed.fnc
index eb7b47601a..74946566e7 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1488,6 +1488,7 @@ ApdR      |SV*    |newSVuv        |const UV u
 ApdR   |SV*    |newSVnv        |const NV n
 ApdR   |SV*    |newSVpv        |NULLOK const char *const s|const STRLEN len
 ApdR   |SV*    |newSVpvn       |NULLOK const char *const buffer|const STRLEN len
+ApdR   |SV*    |newSVpvn_rot13 |NN const char *const s|const STRLEN len
 ApdR   |SV*    |newSVpvn_flags |NULLOK const char *const s|const STRLEN len|const U32 flags
 ApdR   |SV*    |newSVhek       |NULLOK const HEK *const hek
 ApdR   |SV*    |newSVpvn_share |NULLOK const char* s|I32 len|U32 hash

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/embed.pl 
Changed: proto.h embed.h

Take a look now at the changes it's made.

A new macro in embed.h which calls the full Perl_-prefixed function name from its shorter alias. The macro makes sure to pass in the aTHX_ parameter, meaning we don't have to remember that all the time
A prototype and an arguments assertion macro for the function in proto.h
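
The way those generated pieces fit together can be imitated in a standalone file. In this sketch the toy_interp structure and the toy_strlen function are made up, and pTHX_ and aTHX_ are redefined locally in roughly the form they take in a build with multiple-interpreter support; the real definitions live in perl's own headers and differ by configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: the real definitions live in perl.h, embed.h and proto.h */
struct toy_interp { int calls; };

#define pTHX_  struct toy_interp *my_perl,   /* declares the parameter */
#define aTHX_  my_perl,                      /* passes it along as an argument */

/* What a proto.h entry provides: a prototype and an argument assertion */
size_t Perl_toy_strlen(pTHX_ const char *s);
#define PERL_ARGS_ASSERT_TOY_STRLEN  assert(s)

/* What an embed.h entry provides: a short alias that supplies aTHX_ itself */
#define toy_strlen(a)  Perl_toy_strlen(aTHX_ a)

size_t Perl_toy_strlen(pTHX_ const char *s)
{
    size_t len = 0;

    PERL_ARGS_ASSERT_TOY_STRLEN;

    my_perl->calls++;
    while (s[len])
        len++;
    return len;
}

/* A caller only needs an interpreter variable named my_perl in scope */
size_t toy_caller(struct toy_interp *my_perl)
{
    return toy_strlen("banana");
}
```

The caller writes the short name with just the interesting argument, and the macro quietly threads the interpreter pointer through to the full Perl_-prefixed function.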

To actually implement this function we should pick a file to put it in. Since it's creating a new SV, the file sv.c seems reasonable. For neatness we'll put it right next to the other newSVpv* functions, in the same order as the list in embed.fnc:

leo@shy:~/src/bleadperl/perl [git]
$ nvim sv.c

leo@shy:~/src/bleadperl/perl [git]
$ git diff sv.c
diff --git a/sv.c b/sv.c
index e54d0a078f..156e64e879 100644
--- a/sv.c
+++ b/sv.c
@@ -9397,6 +9397,43 @@ Perl_newSVpvn(pTHX_ const char *const buffer, const STRLEN len)
     return sv;
 }
 
+/*
+=for apidoc newSVpvn_rot13
+
+Creates a new SV and copies a string into it by transforming letters by the
+ROT13 algorithm, and copying other bytes literally. The string may contain
+C<NUL> characters and other binary data. The reference count for the new SV
+is set to 1.
+
+=cut
+*/
+
+SV *
+Perl_newSVpvn_rot13(pTHX_ const char *const s, const STRLEN len)
+{
+    char *dp;
+    const char *sp = s, *send = s + len;
+    SV *sv = newSV(len);
+
+    dp = SvPVX(sv);
+    while(sp < send) {
+        char c = *sp;
+        if(isLOWER(c))
+            *dp = 'a' + (c - 'a' + 13) % 26;
+        else if(isUPPER(c))
+            *dp = 'A' + (c - 'A' + 13) % 26;
+        else
+            *dp = c;
+
+        sp++; dp++;
+    }
+
+    *dp = '\0';
+    SvPOK_on(sv);
+    SvCUR_set(sv, len);
+    return sv;
+}
+
 /*
 =for apidoc newSVhek

I don't want to spend a large amount of time or space in this post to explain the whole function, but as a brief summary,

newSV() creates a new SV with a string buffer big enough to store the content (it internally adds 1 more to accommodate the terminating NUL)
The pointers sp and dp are initialised to point into the source and destination string buffers
Characters are copied one at a time; performing the ROT13 algorithm on lower or uppercase letters and passing anything else transparently
The terminating NUL is appended
The current string size and stringiness flag are set on the new SV, which is then returned
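
For anyone wanting to try the transformation without building perl, the same copying loop can be sketched in plain C, using malloc() where the original uses newSV(). The toy_rot13 name is an assumption for this example, and like the function above it presumes an ASCII character set.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly-allocated, NUL-terminated ROT13 copy of the first len bytes
 * of s, or NULL if allocation fails. The caller must free() the result.
 * Like the function in the article, this assumes an ASCII character set. */
static char *toy_rot13(const char *s, size_t len)
{
    const char *sp = s, *send = s + len;
    char *buf = malloc(len + 1);   /* one extra byte for the terminating NUL */
    char *dp = buf;

    if (!buf)
        return NULL;

    while (sp < send) {
        char c = *sp++;
        if (c >= 'a' && c <= 'z')
            *dp++ = 'a' + (c - 'a' + 13) % 26;
        else if (c >= 'A' && c <= 'Z')
            *dp++ = 'A' + (c - 'A' + 13) % 26;
        else
            *dp++ = c;
    }

    *dp = '\0';
    return buf;
}
```

Because the length is passed in separately, the input may contain embedded NUL bytes, and applying the function twice gives back the original string.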

If we run the porting tests again now, we'll find one gets upset:

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/args_assert.t ..... 1/? # Failed test 2 - PERL_ARGS_ASSERT_NEWSVPVN_ROT13 is 
declared but not used at porting/args_assert.t line 64

This test is unhappy because it didn't find any code that actually called the argument-asserting macro which the regeneration script added to proto.h. This is the macro that asserts on the types of arguments to the function. We can fix that by remembering to use it in the function's definition:

leo@shy:~/src/bleadperl/perl [git]
$ nvim sv.c

leo@shy:~/src/bleadperl/perl [git]
$ git diff sv.c
diff --git a/sv.c b/sv.c
index e54d0a078f..d63c8a7bbb 100644
--- a/sv.c
+++ b/sv.c
...
+SV *
+Perl_newSVpvn_rot13(pTHX_ const char *const s, const STRLEN len)
+{
+    char *dp;
+    const char *sp = s, *send = s + len;
+    SV *sv;
+
+    PERL_ARGS_ASSERT_NEWSVPVN_ROT13;
+
+    sv = newSV(len);
+
+    dp = SvPVX(sv);
...

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
Result: PASS

As core functions go this one is actually pretty terrible. It presumes ASCII (and doesn't work properly on EBCDIC platforms), and requires careful handling in the caller to set the UTF8 flag if required. But overall it's at least good enough for demonstration purposes for our feature. In the next part we'll hook this function up with the opcode implementation and finally see our new feature in action.

Writing a Perl Core Feature - part 6: Parser

2021-02-15T13:00:00.211+00:00

In the previous part I introduced the concepts of the lexer and the parser, and the way they combine together to form part of the compiler which actually translates the incoming program source code into the in-memory optree where it can be executed. We took a look at some lexer changes, and the way that the isa operator was able to work with those alone without needing a corresponding change in the parser, but also noted that most non-trivial syntax additions will require concurrent changes to both the parser and the lexer to cope with them.

In particular, although it is the lexer that creates and emits tokens into the parser, it is the parser which maintains the list of what token types it expects. It is there where new token types have to be added.

The isa operator did not need to make any changes in the parser, so for today's article we'll look instead at the recently-added try/catch syntax, which did. That was first added in this commit, though subsequent modifications have been made to it. Go take a look now - perhaps you will find parts of it recognisable, similar to the changes we've already seen with isa and made for our new banana feature we have been building up.

Similar to the situation with features, warnings, and opcodes, the parser is maintained primarily by changes to one source file which is then run through a regeneration script to update several other files that are generated from it. The source of truth in this case is the file perly.y, and the regeneration script for it is regen_perly.pl (neither of which live in the regen directory for reasons lost to the mists of time).

The part of the try/catch commit which updated the parser generation file had two parts to it: (github.com/Perl/perl5).

--- a/perly.y
+++ b/perly.y
@@ -69,6 +69,7 @@
 %token <ival> FORMAT SUB SIGSUB ANONSUB ANON_SIGSUB PACKAGE USE
 %token <ival> WHILE UNTIL IF UNLESS ELSE ELSIF CONTINUE FOR
 %token <ival> GIVEN WHEN DEFAULT
+%token <ival> TRY CATCH
 %token <ival> LOOPEX DOTDOT YADAYADA
 %token <ival> FUNC0 FUNC1 FUNC UNIOP LSTOP
 %token <ival> MULOP ADDOP
@@ -459,6 +460,31 @@ barestmt:  PLUGSTMT
                                  newFOROP(0, NULL, $mexpr, $mblock, $cont));
                          parser->copline = (line_t)$FOR;
                        }
+       |       TRY mblock[try] CATCH PERLY_PAREN_OPEN 
+                       { parser->in_my = 1; }
+               remember scalar 
+                       { parser->in_my = 0; intro_my(); }
+               PERLY_PAREN_CLOSE mblock[catch]
+                       {
...
+                       }
        |       block cont
                        {
                          /* a block is a loop that happens once */

Of these two parts, the first is the bit that defines two new token types. These are types we can use in the lexer - recall from the previous part we saw the lexer emit these tokens as PREBLOCK(TRY) and PREBLOCK(CATCH).

The second part of this change gives the actual parsing rules which the parser uses to recognise the new syntax. This appears in the form of a new alternative to the set of possible rules that the parser may use to create a barestmt (each alternative is separated by | characters). The rules on how to recognise this one are made from a mix of basic tokens (those in capitals) and other grammar rules (those in lower case). The four basic tokens here are the keyword try, an open and close parenthesis pair (represented by tokens called PERLY_PAREN_OPEN and PERLY_PAREN_CLOSE) and the keyword catch.

In effect we can imagine if the rule were expressed instead using literal strings:

barestmt =
    ...
    | "try" mblock "catch" "(" scalar ")" mblock

The other grammar rules that are referred to by this one define the basic shape of a block of code (the one called mblock), and a single scalar variable (the one called scalar). The other parts that I omitted in this simplified version (remember and the two action blocks relating to parser->in_my) are involved with ensuring that the catch variable part of the syntax is recognised as creating a new variable. It pretends that there had been a my keyword just before the variable name, so the name introduces a new variable.

Don't worry too much about the contents of the main action block for this try/catch syntax rule. That's all specific to how to build up the optree for this particular syntax, and in any case was changed in a later commit to move most of it out to a helper function. We'll come back in a moment to see what we can put there for our new syntax.

Let's now begin adding the tokenizing and parsing rules for our new banana feature. Recall from part 5 we decided we'd add two new token types to represent the two basic keywords. We can do that by adding them to the collection of tokens at the top of the perly.y file and running the regeneration script:

leo@shy:~/src/bleadperl/perl [git]
$ nvim perly.y 

leo@shy:~/src/bleadperl/perl [git]
$ git diff perly.y
diff --git a/perly.y b/perly.y
index 184fb0c158..7bbb64f202 100644
--- a/perly.y
+++ b/perly.y
@@ -77,6 +77,7 @@
 %token <ival> LOCAL MY REQUIRE
 %token <ival> COLONATTR FORMLBRACK FORMRBRACK
 %token <ival> SUBLEXSTART SUBLEXEND
+%token <ival> BAN ANA
 
 %type <ival> grammar remember mremember
 %type <ival>  startsub startanonsub startformsub

leo@shy:~/src/bleadperl/perl [git]
$ perl regen_perly.pl 
Changed: perly.act perly.tab perly.h

At this point, if you want, you could take a look at the change the script introduced in perly.h - it just adds the two new token types to the main enum yytokentype, where the tokenizer and the parser can use them. Don't worry about the other two files (perly.act and perly.tab) - they are long tables of automatically generated output; mostly numbers which help the parser to maintain its internal state. The change there won't be particularly meaningful to look at.

As these new token types now exist in perly.h we can use them to update toke.c to recognise them:

leo@shy:~/src/bleadperl/perl [git]
$ nvim toke.c 

leo@shy:~/src/bleadperl/perl [git]
$ git diff toke.c
diff --git a/toke.c b/toke.c
index 628a79fb43..9f86e110ce 100644
--- a/toke.c
+++ b/toke.c
@@ -7686,6 +7686,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
     case KEY_accept:
         LOP(OP_ACCEPT,XTERM);
 
+    case KEY_ana:
+        Perl_ck_warner_d(aTHX_
+            packWARN(WARN_EXPERIMENTAL__BANANA), "banana is experimental");
+        TOKEN(ANA);
+
     case KEY_and:
         if (!PL_lex_allbrackets && PL_lex_fakeeof >= LEX_FAKEEOF_LOWLOGIC)
             return REPORT(0);
@@ -7694,6 +7699,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
     case KEY_atan2:
         LOP(OP_ATAN2,XTERM);
 
+    case KEY_ban:
+        Perl_ck_warner_d(aTHX_
+            packWARN(WARN_EXPERIMENTAL__BANANA), "banana is experimental");
+        TOKEN(BAN);
+
     case KEY_bind:
         LOP(OP_BIND,XTERM);

Now we can rebuild perl and test some examples:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
syntax error at -e line 1, near "say ban"
Execution of -e aborted due to compilation errors.

We get our expected warnings about the experimental syntax, and then a syntax error. This is because, while the lexer recognises our keywords, we haven't yet written a parser rule to tell the parser what to do with it. But we can at least tell the lexer recognised the keywords, because if we test without enabling the feature we get a totally different error:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'say ban "a string here" ana;'
Bareword found where operator expected at -e line 1, near ""a string here" ana"
        (Missing operator before ana?)
syntax error at -e line 1, near ""a string here" ana"
Execution of -e aborted due to compilation errors.

Let's now add a grammar rule to let the parser recognise this syntax:

leo@shy:~/src/bleadperl/perl [git]
$ nvim perly.y 

leo@shy:~/src/bleadperl/perl [git]
$ git diff perly.y
...
                    SUBLEXSTART listexpr optrepl SUBLEXEND
                        { $$ = pmruntime($PMFUNC, $listexpr, $optrepl, 1, $<ival>2); }
+       |       BAN expr ANA
+                       { $$ = newUNOP(OP_BANANA, 0, $expr); }
        |       BAREWORD
        |       listop
...

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl

With this new definition our new syntax:

is recognised as a basic term expression, meaning it can stand in the same parts of syntax as other expressions such as constants or variables
requires an expr expression between the ban and ana keywords, meaning it will accept any sort of complex expression such as a string concatenation operator or function call

After the grammar rule which tells the parser how to recognise the new syntax, we've added a block of code telling it how to implement it. This is translated into some real C code that forms part of the parser, so we can invoke any bits of perl interpreter internals from here. When it gets translated a few special variables are replaced in the code - these are the ones prefixed with $ symbols. The $$ variable is where the parser is expecting to find the output of this particular grammar rule; it's where we put the optree we construct to represent it. For arguments into that we can use the other variable, named after the sub-rule used to parse it - $expr. That will contain the output of parsing that part of the syntax - again an optree.

In this action block it is now a simple matter of generating an optree for the OP_BANANA opcode we added in part 4. Recall that was an op of type UNOP, so we use the newUNOP() function to do this, taking as its child subtree the expression between the two keywords which we got in $expr. We just put that result into the $$ result variable, and we're done.
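As a rough illustration of what that action block amounts to, here is a small self-contained sketch in plain C. It does not use any real perl internals: the ToyOp type and the toy_new_const(), toy_new_unop() and action_ban_expr_ana() functions are all hypothetical stand-ins invented for this example. The idea it shows is simply that $expr arrives as an already-built subtree and $$ leaves as a new node wrapping it.

```c
#include <stdlib.h>

/* Toy op types; hypothetical, not perl's real opcode numbers */
typedef enum { TOY_OP_CONST, TOY_OP_BANANA } ToyOpType;

typedef struct ToyOp {
    ToyOpType     type;
    struct ToyOp *first;  /* single child for a unary op; NULL for a leaf */
    const char   *pv;     /* string payload, used by TOY_OP_CONST only */
} ToyOp;

static ToyOp *toy_new_op(ToyOpType type, ToyOp *first, const char *pv)
{
    ToyOp *op = malloc(sizeof *op);
    if (!op)
        return NULL;
    op->type  = type;
    op->first = first;
    op->pv    = pv;
    return op;
}

/* Stand-in for what the `expr` sub-rule produces for a string constant */
ToyOp *toy_new_const(const char *pv)
{
    return toy_new_op(TOY_OP_CONST, NULL, pv);
}

/* Stand-in for newUNOP(): wrap one child subtree in a unary op */
ToyOp *toy_new_unop(ToyOpType type, ToyOp *child)
{
    return toy_new_op(type, child, NULL);
}

/* The action block, written as a function: `$expr` is the argument,
 * and `$$` is the return value */
ToyOp *action_ban_expr_ana(ToyOp *expr)
{
    return toy_new_unop(TOY_OP_BANANA, expr);
}

/* Free a whole toy tree, children first */
void toy_free(ToyOp *op)
{
    if (!op)
        return;
    toy_free(op->first);
    free(op);
}
```

Calling action_ban_expr_ana(toy_new_const("a string here")) gives a two-node tree with the banana node on top and the constant as its only child, which is the same shape we are asking the real parser to build.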

Now we can try using it:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
panic: we have no bananas at -e line 1.

Hurrah! We get the panic message we added as a placeholder when we created the Perl_pp_banana function back in part 4. The pieces are now starting to come together - in the next part we'll start implementing the actual behaviour behind this syntax.

Let's not forget to add the new "experimental" warnings to pod/perldiag.pod in order to keep the porting test happy:

leo@shy:~/src/bleadperl/perl [git]
$ nvim pod/perldiag.pod 

$ git diff pod/perldiag.pod
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 98d159dc21..66b0a4aa40 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -519,6 +519,11 @@ wasn't a symbol table entry.
 (P) An internal request asked to add a scalar entry to something that
 wasn't a symbol table entry.
 
+=item banana is experimental
+
+(S experimental::banana) This warning is emitted if you use the banana
+syntax (C<ban> ... C<ana>). This syntax is currently experimental.
+
 =item Bareword found in conditional

For now there's one last thing we can look at. Even though we don't have an implementation behind the syntax, we can at least compile it into an optree. We can inspect the generated optree by using the -MO=Concise compiler backend:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -MO=Concise -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 3 -e:1) v:%,us,{,fea=15 ->3
6     <@> say vK ->7
3        <0> pushmark s ->4
5        <1> banana sK/1 ->6
4           <$> const(PV "a string here") s ->5
-e syntax OK

I won't go into the full details here - for that you can read the documentation at B::Concise. For now I'll just remark that we can see the banana op here, as an UNOP (the 1 flag before it), sitting in the optree as a child node of say, with the string constant as its own child op. When working with optree parsing, the B::Concise module is a handy debugging tool you can use to inspect the generated optree and ensure it has the shape you expected.


Writing a Perl Core Feature - part 5: Lexer

2021-02-12T13:00:00.255+00:00


Now we have a controllable feature flag that conditionally recognises our new keywords, and we have a new opcode that we can use to implement some behaviour for it, we can begin to tie them together. The previous post mentioned that the Perl interpreter converts source code of a program into an optree, stored in memory. This is done by a collection of code loosely described as the compiler. Exactly what the compiler will do with these new keywords depends on its two main parts - the lexer, and the parser.

If you're unfamiliar with these general concepts of compiler technology, allow me a brief explanation. A lexer takes the source code, in the form of a stream of characters, and begins analysing it by grouping those characters up into the basic elements of the syntax, called tokens (sometimes called lexemes). This sequence of tokens is then passed into the parser, whose job is to build up the syntax tree representing the program from those analysed tokens. (The lexer is sometimes also called a tokenizer; the two words are interchangeable.)

Tokens may be as small as a single character (for example a + or - operator), or could be an entire string or numerical constant. It is the job of the lexer to skip over things like comments and ignorable whitespace. In compilers, tokens are usually represented by some sort of type system, where each kind of token has a specific type, often with associated values. For example, any numerical constant in the source code would be represented by a token giving a "NUMBER" type, whose associated value was the specific number. In this manner the parser can then consider the types of tokens it has received (for example it may have recently received a number, a + operator, and another number), and emit some form of syntax tree to represent the numerical addition of these two numbers.

For example, in a simple expression language, the source text first gets tokenized into a stream of tokens. Any sequence of digits becomes a NUMBER token with its associated numerical value, and operators become their own token types representing the symbol itself:

It then gets parsed by recursively applying an ordered list of rules (to implement operator precedence) to form some sort of syntax tree. We're looking ultimately for an expr (short for "expression"). At high priority, a sequence of expr-STAR-expr can be considered as an expr (by combining the two numbers by a MULTIPLY operation). At lesser priority, a sequence expr-PLUS-expr can be considered as such (by using ADD). Finally, a NUMBER token can stand alone as an expr.
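To make the two stages concrete, here is a minimal sketch in plain C of such a toy expression language. Everything in it (the TOK_ names, next_token(), toy_eval() and so on) is invented for this example and has nothing to do with Perl's real lexer or parser. It uses hand-written recursive descent, which is only one way of applying ordered rules; Perl's own parser is generated from a grammar file instead. To keep it short it evaluates the expression directly rather than building a syntax tree.

```c
#include <ctype.h>
#include <stddef.h>

/* Token types for the toy language (not Perl's real token set) */
typedef enum { TOK_NUMBER, TOK_PLUS, TOK_STAR, TOK_END, TOK_ERROR } TokType;

typedef struct {
    TokType type;
    long    value;   /* only meaningful for TOK_NUMBER */
} Token;

/* Lexer: read one token starting at *src and advance *src past it */
static Token next_token(const char **src)
{
    Token tok = { TOK_END, 0 };

    while (isspace((unsigned char)**src))   /* skip ignorable whitespace */
        (*src)++;
    if (**src == '\0')
        return tok;

    if (isdigit((unsigned char)**src)) {    /* a run of digits is one NUMBER */
        tok.type = TOK_NUMBER;
        while (isdigit((unsigned char)**src))
            tok.value = tok.value * 10 + (*(*src)++ - '0');
        return tok;
    }

    switch (*(*src)++) {
        case '+': tok.type = TOK_PLUS;  break;
        case '*': tok.type = TOK_STAR;  break;
        default:  tok.type = TOK_ERROR; break;
    }
    return tok;
}

/* Parser state: the remaining source plus one token of lookahead */
typedef struct {
    const char *src;
    Token       look;
} Parser;

static void advance(Parser *p) { p->look = next_token(&p->src); }

/* A NUMBER token can stand alone as an expr */
static int parse_number(Parser *p, long *out)
{
    if (p->look.type != TOK_NUMBER)
        return 0;
    *out = p->look.value;
    advance(p);
    return 1;
}

/* High priority: expr STAR expr */
static int parse_product(Parser *p, long *out)
{
    long rhs;
    if (!parse_number(p, out))
        return 0;
    while (p->look.type == TOK_STAR) {
        advance(p);
        if (!parse_number(p, &rhs))
            return 0;
        *out *= rhs;
    }
    return 1;
}

/* Lesser priority: expr PLUS expr */
static int parse_sum(Parser *p, long *out)
{
    long rhs;
    if (!parse_product(p, out))
        return 0;
    while (p->look.type == TOK_PLUS) {
        advance(p);
        if (!parse_product(p, &rhs))
            return 0;
        *out += rhs;
    }
    return 1;
}

/* Returns 1 and stores the result in *out, or 0 on a syntax error */
int toy_eval(const char *source, long *out)
{
    Parser p;
    p.src = source;
    advance(&p);
    return parse_sum(&p, out) && p.look.type == TOK_END;
}
```

Given the source "1 + 2 * 3", the lexer hands over NUMBER, PLUS, NUMBER, STAR, NUMBER, and because the multiplication rule is tried at the inner level, toy_eval() stores 7 rather than 9.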

Specifically in Perl's case, the lexer is rather more complex than that of most typical languages. It has a number of features which may surprise you if you are familiar with the overall concept of token-based parsing. Whereas some much simpler languages can be tokenized with a statically-defined set of rules, Perl's lexer is much more stateful and dynamically controlled. The recent history of tokens it has already seen can change its interpretation of things to come. The parser can influence what the lexer will expect to see next. Additionally, existing code that has already been seen and parsed will also affect its decisions.

To give a few examples here, consider the way that braces are used both to delimit blocks of code, and anonymous hash references. The lexer resolves which case is which by examining what "expect" state it is in - whether it should be expecting an expression term, or a statement. Consider also the way that the names of existing functions already in scope (and what prototypes, if any, they may have) influences the way that calls to those functions are parsed. This is, in part, performed by the lexer.

my $hashref = { one => 1, two => 2 };
# These braces are a hashref constructor

if($cond) { say "Cond is true"; }
# These braces are a code block

sub listy_func { ... }
sub unary_func($) { ... }

say listy_func 1, 2, 3;
# parsed as  say(listy_func(1, 2, 3));

say unary_func 4, 5, 6;
# parsed as  say(unary_func(4), 5, 6);
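The brace examples above can be mimicked with a very small state machine. The following plain C sketch is a heavy simplification and is not how toke.c is written: the Expect and BraceKind types and the classify_braces() function are hypothetical names made up for this example, and the rule it uses (only the last punctuation character seen decides the state) ignores many real cases such as hash subscripts or braces after a keyword like return.

```c
#include <ctype.h>
#include <stddef.h>

/* Two toy "expect" states; the real lexer has rather more of them */
typedef enum { EXPECT_STATEMENT, EXPECT_TERM } Expect;
typedef enum { BRACE_BLOCK, BRACE_ANON_HASH } BraceKind;

/* Classify each opening brace in `src`, storing at most `max` results
 * in `kinds`. Returns how many opening braces were seen. */
size_t classify_braces(const char *src, BraceKind *kinds, size_t max)
{
    Expect expect = EXPECT_STATEMENT;  /* a program begins at a statement */
    size_t n = 0;

    for (; *src; src++) {
        char c = *src;

        if (isspace((unsigned char)c))
            continue;                  /* whitespace never changes the state */

        if (c == '{') {
            if (n < max)
                kinds[n] = (expect == EXPECT_TERM) ? BRACE_ANON_HASH
                                                   : BRACE_BLOCK;
            n++;
        }
        else if (c == '=' || c == ',' || c == '(') {
            expect = EXPECT_TERM;      /* an expression term should follow */
        }
        else if (c == ')' || c == ';' || c == '}') {
            expect = EXPECT_STATEMENT; /* a block or statement may follow */
        }
        /* any other character leaves the state as it was */
    }
    return n;
}
```

Fed the two brace examples from above, this reports an anonymous hash for the brace that follows the = sign and a code block for the brace that follows the closing parenthesis of the if condition.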

Due to its central role in parsing the source code of a program, it is important that the lexer knows about every keyword and combination of symbols used in its syntax. Not all new features and keywords would need to consider the parser, so for now we'll leave that for the next post in this series and concentrate on the lexer.

The lexer is contained in the file toke.c. When the isa feature was added the change here was rather small: (github.com/Perl/perl5).

--- a/toke.c
+++ b/toke.c
@@ -7800,6 +7800,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
     case KEY_ioctl:
         LOP(OP_IOCTL,XTERM);
 
+    case KEY_isa:
+        Perl_ck_warner_d(aTHX_
+            packWARN(WARN_EXPERIMENTAL__ISA), "isa is experimental");
+        Rop(OP_ISA);
+
     case KEY_join:
         LOP(OP_JOIN,XTERM);

Here we have extended the main function that recognises barewords vs keywords; the function yyl_word_or_keyword. This function is based, in part, on the function in keywords.c that we saw modified back in part 3. (Remember; that added the new keywords, to be conditionally recognised depending on whether our feature is enabled). If the keyword was recognised as the isa keyword (meaning the feature had been enabled), then the lexer will recognise it as a token in the category of "relational operator", called Rop. We additionally report the value of the opcode to implement it; the opcode OP_ISA which we saw added in part 4. Since the feature is experimental, here is the time at which we emit the "is experimental" warning, using the warning category we saw added in part 2.

Because of this neat convenience, the change adding the isa operator didn't need to touch the parser at all. In order for us to have something interesting to talk about when we move on to the parser, let's imagine a slightly weirder grammar shape for our new banana feature. We have two keywords to play with, so let's now imagine that they are used in a pair, surrounding some other expression; as in the syntax:

use feature 'banana';

my $something = ban "Some other stuff goes here" ana;

Because of this rather weird structure, we won't be able to make use of any of the convenience token types, so we'll instead just emit these as plain TOKENs and let the parser deal with it. This will necessitate some changes to the parser as well, to add some new token values for it to recognise, so we'll do that in the next part too.

Before we leave the topic of the lexer, let's just take a look at another recent Perl core change - the one that first introduces the try/catch syntax, via the try named feature: (github.com/Perl/perl5).

...
@@ -7704,6 +7706,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
     case KEY_break:
         FUN0(OP_BREAK);
 
+    case KEY_catch:
+        Perl_ck_warner_d(aTHX_
+            packWARN(WARN_EXPERIMENTAL__TRY), "try/catch is experimental");
+        PREBLOCK(CATCH);
+
     case KEY_chop:
         UNI(OP_CHOP);
 
@@ -8435,6 +8442,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
     case KEY_truncate:
         LOP(OP_TRUNCATE,XTERM);
 
+    case KEY_try:
+        Perl_ck_warner_d(aTHX_
+            packWARN(WARN_EXPERIMENTAL__TRY), "try/catch is experimental");
+        PREBLOCK(TRY);
+
     case KEY_uc:
         UNI(OP_UC);

This was a very similar change - again just two new case labels to handle the two newly-added keywords. Each one emits a token of the PREBLOCK type. This is a hint to the parser that following the keyword it should expect to find a block of code surrounded by braces ({ ... }). In general when adding new syntax, there will likely be some existing token types that can be used for it, because it is likely following a similar shape to things already there.

Each of these changes adds a new warning - a call to Perl_ck_warner_d. There's a porting test file that checks to see that every one of these has been mentioned somewhere in pod/perldiag.pod. In order to keep that test happy, each commit had to add a new section there too; for example for isa: (github.com/Perl/perl5).

--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3262,6 +3262,12 @@ an anonymous subroutine, or a reference to a subroutine.
 (W overload) You tried to overload a constant type the overload package is
 unaware of.
 
+=item isa is experimental
+
+(S experimental::isa) This warning is emitted if you use the (C<isa>)
+operator. This operator is currently experimental and its behaviour may
+change in future releases of Perl.
+
 =item -i used with no filenames on the command line, reading from STDIN
 
 (S inplace) The C<-i> option was passed on the command line, indicating

In the next part, we'll take a look at the other half of the compiler, the parser. It is there where we'll make our next modifications to add the banana feature.


Writing a Perl Core Feature - part 4: Opcodes

2021-02-10T13:00:00.012+00:00


Optrees and Ops

Before we get into this next part, I want to first explain some details about how the Perl interpreter works. In summary, the source code of a Perl program is translated into a more compiled form when the interpreter starts up and reads the files. This form is stored in memory and is used to implement the behaviour of the functions that make up the program. It is called an Optree.

Or rather more accurately, every individual function in the program is represented by an Optree. This is a tree-shaped data structure, whose individual nodes each represent one basic kind of operation or step in the execution of that function. This could be considered similar to a sort of assembly language representation, except that rather than being stored as a flat list of instructions, the tree-shaped structure of the individual nodes (called "ops") helps determine the behaviour of the program when run.

For example, many kinds of ops have no child nodes; these are typically used to represent constants in the program, or to fetch items from well-defined locations elsewhere in the interpreter - such as lexical or package variables. Most other kinds of op take one or more subtrees as child nodes and form the tree structure, where they will operate on the data those child nodes previously fetched - such as adding numbers together, or assigning values into variable locations. To execute the optree the interpreter visits each node in postfix order; recursively gathering results from child nodes of the tree to pass upwards to their parents.
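The postfix idea can be sketched in a few lines of plain C. This is only a toy model under my own invented names (ToyOp, toy_run() and the TOY_ constants are hypothetical and not part of perl). It also walks the tree recursively for clarity, whereas the real interpreter follows a chain of "next op" pointers that is threaded through the tree in execution order.

```c
#include <stddef.h>

/* Toy op types; hypothetical, not perl's real opcodes */
typedef enum { TOY_CONST, TOY_ADD, TOY_MULTIPLY } ToyOpType;

typedef struct ToyOp {
    ToyOpType           type;
    long                value;   /* used by TOY_CONST only */
    const struct ToyOp *first;   /* first child subtree, NULL for a leaf */
    const struct ToyOp *last;    /* second child subtree, NULL for a leaf */
} ToyOp;

/* Run a toy optree: children are visited before the node itself,
 * and their results are passed upwards to the parent. */
long toy_run(const ToyOp *op)
{
    long lhs, rhs;

    switch (op->type) {
        case TOY_CONST:
            return op->value;          /* leaf: nothing to visit below */
        case TOY_ADD:
            lhs = toy_run(op->first);  /* visit children first ... */
            rhs = toy_run(op->last);
            return lhs + rhs;          /* ... then do this node's own work */
        case TOY_MULTIPLY:
            lhs = toy_run(op->first);
            rhs = toy_run(op->last);
            return lhs * rhs;
    }
    return 0;                          /* unknown op type */
}
```

A tree with TOY_ADD at the root, a constant 1 as its first child and a TOY_MULTIPLY of 2 and 3 as its second child is visited in the order 1, 2, 3, multiply, add, and toy_run() on the root returns 7.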

Each individual type of op determines what sort of tree-shaped structure it will have, and the types are grouped together into classes. The most basic class of op (variously called either just "op", or sometimes a "baseop") is one that has no child nodes. An op class with a single child op is called an "unop" (for "unary operator"), one with two children is called a "binop" (for "binary operator"), and one with a variable number of children is a "listop". Within these broad categories there are also sub-divisions: for example a basic op which carries a Perl value with it is an "svop".

Specific types of op are identified by names, given by the constants defined in opnames.h. For example, a basic op carrying a constant value is an OP_CONST, and one representing a lexical variable is an OP_PADSV (so named because variables - SVs - are stored in a data structure called a scratchpad, or pad for short). A binop which performs a scalar assignment between its two child ops is OP_SASSIGN. Thus, for example, the following Perl statement could be represented by the optree given below it:

my $x = 5;

OP_SASSIGN
    OP_CONST (5)
    OP_PADSV ($x)

Of course, in such a brief overview as this I have omitted many details, as well as made many simplifications of the actual subject. This should be sufficient to stand as an introduction into the next step of adding a new core Perl feature, but for more information on the subject you could take a look at another blog post of mine, where I talked about optrees from the perspective of writing syntax keyword modules - Perl Parser Plugins 3 - Optrees.

One final point to note is that in some ways you can think of an optree as being similar to an abstract syntax tree (an AST). This isn't always a great analogy, because some parts of the optree don't bear a very close resemblance to the syntax of the source code that produced it. While there are certain similarities, it is important to remember it is not quite the same. For example, there is no opcode to represent the if syntax; the same opcode is used as for the and infix shortcircuit operator. It is best to think of the optree as representing the abstract algorithm - the sequence of required operations - that were described by the source code that compiled into it.

Opcodes in Perl Core

As with adding features, warnings, and keywords, the first step to adding a new opcode to the Perl core begins with editing a file under regen/. The file in this case is regen/opcodes, and is not a perl script, but a plain-text file listing the various kinds of op, along with various properties about them. The file begins with a block of comments which explains more of the details.

The choice of how to represent a new Perl feature in terms of the optree that the syntax will generate depends greatly on exactly what the behaviour of the feature should be. Especially when creating a new feature as core syntax (rather than just adding some functions in a module) the syntax and semantic shape often don't easily relate to a simple function-like structure. There aren't any hard-and-fast rules here; the best bet is usually to look around the existing ops and syntax definitions for similar ideas to be inspired by.

For example, when I added the isa operator I observed that it should behave as an infix comparison-style operator, similar to perhaps the eq or == ones. In the regen/opcodes file these are defined by the two lines:

eq		numeric eq (==)		ck_cmp		Iifs2	S S<
seq		string eq		ck_null		ifs2	S S

The meanings of these five tab-separated columns are as follows:

The source-level name of the op (this is used, capitalised, to form the constants OP_EQ and OP_SEQ).
A human-readable string description for the op (used in printed warnings).
The name of the op-checker function (more on this later).
Some flags describing the operator itself; notable ones being s - produces a scalar result, and 2 - it is a binop.
More flags describing the operands; in this case two scalars. It turns out in practice nothing cares about that column so on later additions it is omitted.

The definition for the isa operator was added in a similar style: (github.com/Perl/perl5).

--- a/regen/opcodes
+++ b/regen/opcodes
@@ -572,3 +572,5 @@ lvref               lvalue ref assignment   ck_null         d%
 lvrefslice     lvalue ref assignment   ck_null         d@
 lvavref                lvalue array reference  ck_null         d%
 anonconst      anonymous constant      ck_null         ds1
+
+isa            derived class test      ck_isa          s2

Let's now consider what we need for our new banana feature. Although we've added two new keywords in the previous part, that is just for the source code way to spell this feature. Perhaps the semantics we want can be represented by a single opcode (remembering what we said above - that the optree is more a representation of the underlying semantics of the program, and not merely the surface level syntax of how it is written).

For the sake of argument, let us now imagine that whatever new syntax our new banana feature requires, its operation (via that one opcode) will behave somewhat like a string transform function (perhaps similar to uc or lc). As with so many things relating to adding a new feature/keyword/opcode/etc... it is often best to look for something else similar to copy and adjust as appropriate. We'll add a single new opcode to the list by making a copy of one of those and editing it:

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/opcodes

leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/regen/opcodes b/regen/opcodes
index 2a2da77c5c..27114c9659 100644
--- a/regen/opcodes
+++ b/regen/opcodes
@@ -579,3 +579,5 @@ cmpchain_and        comparison chaining     ck_null         |
 cmpchain_dup   comparand shuffling     ck_null         1
 
 catch          catch {} block          ck_null         |
+
+banana         banana operation        ck_null         s1

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/opcode.pl 
Changed: opcode.h opnames.h pp_proto.h lib/B/Op_private.pm

The regeneration script has edited quite a few files this time. Take a look at those now. The notable parts are:

A new value named OP_BANANA has been added to the list in opnames.h.
A new entry has been added to each of several arrays defined in opcode.h. These contain the name and description strings, function pointers, and various bitflags. Of specific note is the new entry in PL_ppaddr[] which points to a new function named Perl_pp_banana.
A new function prototype for Perl_pp_banana in pp_proto.h.
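If the idea of several parallel arrays indexed by opcode number is unfamiliar, the following plain C sketch shows the general pattern in miniature. It is a toy under assumed names: ToyOpcode, toy_op_name[], toy_ppaddr[] and the toy_pp_ functions are all hypothetical and much simpler than the generated tables in opcode.h, and the function signature is invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy opcode numbers; adding an op means adding one enum value here
 * and one entry to every table below, in the same position. */
typedef enum { TOY_OP_UC, TOY_OP_LC, TOY_OP_BANANA, TOY_OP_max } ToyOpcode;

/* Toy implementation function: transforms s in place.
 * Returns 0 on success or -1 on failure. */
typedef int (*ToyPPFunc)(char *s);

static int toy_pp_uc(char *s)
{
    for (; *s; s++)
        *s = (char)toupper((unsigned char)*s);
    return 0;
}

static int toy_pp_lc(char *s)
{
    for (; *s; s++)
        *s = (char)tolower((unsigned char)*s);
    return 0;
}

/* Placeholder, like a stub that has no behaviour yet */
static int toy_pp_banana(char *s)
{
    (void)s;
    return -1;
}

/* Parallel tables, each indexed by opcode number */
static const char *const toy_op_name[TOY_OP_max] = { "uc", "lc", "banana" };
static const ToyPPFunc   toy_ppaddr[TOY_OP_max]  = {
    toy_pp_uc, toy_pp_lc, toy_pp_banana
};

const char *toy_name(ToyOpcode opcode)
{
    return (unsigned)opcode < TOY_OP_max ? toy_op_name[opcode] : NULL;
}

/* Look up the implementation by opcode number and call it */
int toy_dispatch(ToyOpcode opcode, char *s)
{
    if ((unsigned)opcode >= TOY_OP_max)
        return -1;
    return toy_ppaddr[opcode](s);
}
```

The point of the pattern is that the table of function pointers refers to toy_pp_banana by name, so the program cannot link until a function of that name exists somewhere, even if it is only a stub.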

If we were to try building perl now we'd find it won't currently even compile, because the opcode tables are looking for this new Perl_pp_banana function but we haven't even written it yet:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
/usr/bin/ld: globals.o:(.data.rel+0xc88): undefined reference to `Perl_pp_banana'
collect2: error: ld returned 1 exit status

We'll have to provide an actual function for this. There are in fact a number of files which potentially could contain this function. pp_ctl.c contains the control-flow ops (such as entersub and return), pp_sys.c contains the various ops that interact with the OS (such as open and socket), pp_sort.c and pp_pack.c each contain just those specific ops (for various reasons), and the rest of the "normal" ops are scattered between pp.c and pp_hot.c - the latter containing a few of the smaller more-frequently invoked ops.

For adding a new feature like this, it's almost certain that we want to be adding it to pp.c. For now, so that we can at least compile perl again and continue our work, let's just add a little stub function that will panic if actually run.

leo@shy:~/src/bleadperl/perl [git]
$ nvim pp.c 

leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index d0e639fa32..bc54a06aa3 100644
--- a/pp.c
+++ b/pp.c
@@ -7207,6 +7207,11 @@ PP(pp_cmpchain_dup)
     RETURN;
 }
 
+PP(pp_banana)
+{
+    DIE(aTHX_ "panic: we have no bananas");
+}
+
 /*
  * ex: set ts=8 sts=4 sw=4 et:
  */

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl

Before we conclude this already-long part, there's something we have to tidy up to keep the unit tests happy. There are a few tests which care about the total list of opcodes, and since we've added one more they will now need adjusting.

porting/utils.t ........... 58/? # Failed test 59 - utils/cpan compiles at porting/utils.t line 85
#      got "Untagged opnames: banana\nutils/cpan syntax OK\n"
# expected "utils/cpan syntax OK\n"
# when executing perl with '-c utils/cpan'
porting/utils.t ........... Failed 1/82 subtests

It's non-obvious from the error result, but this is actually complaining that the Opcode module (ext/Opcode/Opcode.pm) has not assigned this opcode to a category. We can fix that by editing the module file and again doing similar to whatever uc and lc do. Again, as it's a shipped .pm file, don't forget to update the $VERSION declaration:

leo@shy:~/src/bleadperl/perl [git]
$ nvim ext/Opcode/Opcode.pm 

leo@shy:~/src/bleadperl/perl [git]
$ git diff ext/Opcode/Opcode.pm
diff --git a/ext/Opcode/Opcode.pm b/ext/Opcode/Opcode.pm
index f1b2247b07..eaabc43757 100644
--- a/ext/Opcode/Opcode.pm
+++ b/ext/Opcode/Opcode.pm
@@ -6,7 +6,7 @@ use strict;
 
 our($VERSION, @ISA, @EXPORT_OK);
 
-$VERSION = "1.50";
+$VERSION = "1.51";
 
 use Carp;
 use Exporter ();
@@ -336,7 +336,7 @@ invert_opset function.
     substr vec stringify study pos length index rindex ord chr
 
     ucfirst lcfirst uc lc fc quotemeta trans transr chop schop
-    chomp schomp
+    chomp schomp banana
 
     match split qr

At this point, the tests should all run cleanly again. We're now getting perilously close to actually being able to implement something. Maybe we'll get around to that in the next part.


Writing a Perl Core Feature - part 3: Keywords

2021-02-08T13:00:00.005+00:00


Some Perl features use a syntax entirely made of punctuation symbols; for example Perl 5.10's defined-or operator (//), or Perl 5.24's postfix dereference (->$*, etc.). Other features are based around new keywords spelled like regular identifiers; such as 5.10's state or 5.32's isa. It is rare to find examples where newly-added syntax can be done simply on existing operator symbols, so most new features come in the form of new keywords.

As with adding the named feature itself and its associated warning, the first step to adding a keyword begins with editing a regeneration file. The file required this time is called regen/keywords.pl.

For example when the isa feature was added, it required a new keyword of the same name: (github.com/Perl/perl5).

--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -46,6 +46,7 @@ my %feature_kw = (
     evalbytes => 'evalbytes',
     __SUB__   => '__SUB__',
     fc        => 'fc',
+    isa       => 'isa',
 );
 
 my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -217,6 +218,7 @@ __END__
 -index
 -int
 -ioctl
+-isa
 -join
 -keys
 -kill

There are two parts to this change. The later part adds our new keyword to the main list of all the known keywords in the DATA section at the end of the script. If it wasn't for the first part of this change, then the new keyword would be recognised unconditionally in all code - almost certainly not what we want as that would cause compatibility issues in existing code. Since we have a lexical named feature for exactly this purpose, we made use of it here by listing the new keyword along with its associated feature into the %feature_kw hash so that the keyword is only recognised conditionally based on that feature being enabled.

For our new banana feature we need to decide if we're going to add some keywords, and if so what they will be called. Let's add two to make a more interesting example, called ban and ana. As before we'll start by editing the regeneration script and running it to have it rebuild some files.

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/keywords.pl 

leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/regen/keywords.pl b/regen/keywords.pl
index b9ae8cf0f2..adbec89c71 100755
--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -47,6 +47,8 @@ my %feature_kw = (
     __SUB__   => '__SUB__',
     fc        => 'fc',
     isa       => 'isa',
+    ban       => 'banana',
+    ana       => 'banana',
 );
 
 my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -125,8 +127,10 @@ __END__
 -abs
 -accept
 -alarm
+-ana
 -and
 -atan2
+-ban
 -bind
 -binmode
 -bless

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/keywords.pl 
Changed: keywords.c keywords.h

We still have a few more files to edit before we're done adding the keywords, but before continuing you should take a look at these regenerated files to see what changes have been made. Notice that this time there are no changes to any Perl files, only C files. This is why we didn't need to update any $VERSION values.

The keywords.h file just contains a long list of macros named KEY_... which give numbers to each keyword. Don't worry that most of the numbers have now changed - regen/keywords.pl likes to keep them in alphabetical order, and since we added new ones near the beginning it has had to move the rest downwards. This won't be a problem because the numbers are only internal within the perl lexer and parser, so there's no API compatibility to worry about here.
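
To see why adding two names near the start of the alphabet shifts so many numbers, here is a small sketch. It assumes nothing more than "number the names by their sorted position"; the real generator has more rules than a plain sort, and number_keywords is a hypothetical helper, not something found in the perl source.

```perl
use strict;
use warnings;

# Hypothetical, simplified model: number keywords by sorted position,
# starting from 1.
sub number_keywords {
    my @names = sort @_;
    my %number;
    @number{@names} = (1 .. @names);
    return \%number;
}

my @before = qw( abs accept alarm and atan2 bind );
my @after  = ( @before, qw( ana ban ) );

my $old = number_keywords(@before);
my $new = number_keywords(@after);

# Print the new numbering alongside what each name had previously.
for my $name (sort keys %$new) {
    printf "#define KEY_%-8s %2d   /* was %s */\n",
        $name, $new->{$name}, $old->{$name} // 'absent';
}
```

Every name sorting after ana moves down by one, and every name after ban by two, which matches the kind of churn visible in the regenerated header.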

The keywords.c file contains just one function, whose job is to recognise any of the keywords by name. It returns values of these KEY_... macros. Take a look at the added code, and notice that its recognition of each of our additions is conditional on the FEATURE_BANANA_IS_ENABLED macro we saw added when we added the named feature.

We're not quite done yet though. If we were to run the full test suite now, we'd already find a few tests that fail:

op/coreamp.t .. 1/? # Failed test 591 - ana either has been tested or is not ampable at op/coreamp.t line 1178
# Failed test 593 - ban either has been tested or is not ampable at op/coreamp.t line 1178
op/coreamp.t .. Failed 2/778 subtests 
...
op/coresubs.t .. 1/? perl: op.c:14795: Perl_ck_entersub_args_core: Assertion `!"UNREACHABLE"' failed.
op/coresubs.t .. All 52 subtests passed
...
../lib/B/Deparse-core.t .. 3690/3904 # keyword 'ana' seen in ../regen/keywords.pl, but not tested here!!
# keyword 'ban' seen in ../regen/keywords.pl, but not tested here!!

#   Failed test 'sanity checks'
#   at ../lib/B/Deparse-core.t line 430.
# Looks like you failed 1 test of 3904.
../lib/B/Deparse-core.t .. Dubious, test returned 1 (wstat 256, 0x100)

The two tests in t/op are checking variations on a theme of the &CORE::... syntax, by which core operators can be reïfied into regular code references to functions that behave like the operator. Often this is appropriate for operators which act like regular functions - for example the mathematical sin and cos operators - but it isn't what we want for keywords that act more structurally, like basic syntax. We should tell these tests to skip the new keywords by adding them to each file's skip list:

leo@shy:~/src/bleadperl/perl [git]
$ nvim t/op/coreamp.t t/op/coresubs.t 

leo@shy:~/src/bleadperl/perl [git]
$ git diff t/
diff --git a/t/op/coreamp.t b/t/op/coreamp.t
index b57609bef0..bd60ca83b9 100644
--- a/t/op/coreamp.t
+++ b/t/op/coreamp.t
@@ -1162,7 +1162,7 @@ like $@, qr'^Undefined format "STDOUT" called',
   my %nottest_words = map { $_ => 1 } qw(
     AUTOLOAD BEGIN CHECK CORE DESTROY END INIT UNITCHECK
     __DATA__ __END__
-    and cmp default do dump else elsif eq eval for foreach format ge given goto
+    ana and ban cmp default do dump else elsif eq eval for foreach format ge given goto
     grep gt if isa last le local lt m map my ne next no or our package print
     printf q qq qr qw qx redo require return s say sort state sub tr unless
     until use when while x xor y
diff --git a/t/op/coresubs.t b/t/op/coresubs.t
index 1fa11c02f0..85c08a4756 100644
--- a/t/op/coresubs.t
+++ b/t/op/coresubs.t
@@ -15,7 +15,8 @@ BEGIN {
 use B;
 
 my %unsupported = map +($_=>1), qw (
- __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK and
+ __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK
+  ana and ban
   cmp default do dump else elsif eq eval for foreach
   format ge given goto grep gt if isa last le local lt m map my ne next
   no  or  our  package  print  printf  q  qq  qr  qw  qx  redo  require
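
For anyone who hasn't met it before, this is roughly what the &CORE::... syntax exercised by these two test files looks like in ordinary code. It is a small illustration only, needing perl 5.16 or later; the particular choice of operators here is mine and not taken from the tests.

```perl
use strict;
use warnings;

# Operators that behave like ordinary functions can be taken as code
# references through the CORE:: namespace.
my $uc  = \&CORE::uc;
my $sin = \&CORE::sin;

print $uc->("banana"), "\n";
print $sin->(0), "\n";

# Structural keywords have no such subroutine to find.
{
    no strict 'refs';
    for my $word (qw( sin uc and if )) {
        printf "%-3s %s\n", $word,
            (defined &{"CORE::$word"}) ? "has a &CORE:: sub" : "no &CORE:: sub";
    }
}
```

Our ban and ana keywords belong with and and if in this respect: there should be nothing to find under their names in CORE::.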

Now let's run those two tests in particular. We can do this by using our newly-built perl binary to run the t/harness script and pass in the paths (relative to the t/ directory) to specific tests we wish to run:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t
op/coreamp.t ... ok     
op/coresubs.t .. 1/? # Failed test 51 - no CORE::ana at op/coresubs.t line 53
# Failed test 58 - no CORE::ban at op/coresubs.t line 53
op/coresubs.t .. Failed 2/1099 subtests 

Test Summary Report
-------------------
op/coresubs.t (Wstat: 0 Tests: 1099 Failed: 2)
  Failed tests:  51, 58
Files=2, Tests=1875,  1 wallclock secs ( 0.35 usr  0.02 sys +  0.67 cusr  0.03 csys =  1.07 CPU)
Result: FAIL

Well that's one solved, but the other is still upset. This time it is complaining that it expected not to find a &CORE::ana at all, but instead one was there. In order to fix that we will have to edit the list of exceptions in gv.c.

leo@shy:~/src/bleadperl/perl [git]
$ nvim gv.c

leo@shy:~/src/bleadperl/perl [git]
$ git diff gv.c
diff --git a/gv.c b/gv.c
index 92bada56b1..10271159dc 100644
--- a/gv.c
+++ b/gv.c
@@ -543,8 +543,9 @@ S_maybe_add_coresub(pTHX_ HV * const stash, GV *gv,
     switch (code < 0 ? -code : code) {
      /* no support for \&CORE::infix;
         no support for funcs that do not parse like funcs */
-    case KEY___DATA__: case KEY___END__: case KEY_and: case KEY_AUTOLOAD:
-    case KEY_BEGIN   : case KEY_CHECK  : case KEY_cmp:
+    case KEY___DATA__: case KEY___END__: case KEY_ana   : case KEY_and    :
+    case KEY_AUTOLOAD: case KEY_ban    : case KEY_BEGIN : case KEY_CHECK  :
+    case KEY_cmp     :
     case KEY_default : case KEY_DESTROY:
     case KEY_do      : case KEY_dump   : case KEY_else  : case KEY_elsif  :
     case KEY_END     : case KEY_eq     : case KEY_eval  :

Now we rebuild perl (because we have edited a C file) and rerun the tests:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t 
op/coreamp.t ... ok     
op/coresubs.t .. ok      
All tests successful.
Files=2, Tests=1875,  1 wallclock secs ( 0.43 usr  0.02 sys +  0.76 cusr  0.02 csys =  1.23 CPU)
Result: PASS

The test under ../lib/B/Deparse-core.t checks the behaviour of the B::Deparse module against the core keywords. (The path is relative to the t/ directory, which is why it begins with .., and shows that tests within bundled core modules are counted as part of the full test suite.)

When the isa feature was added, this test file was updated to add some deparsing tests around the isa operator as a regular infix binary syntax. We'll come back later and add some unit tests for our new ban and ana keywords, but for now as with the coreamp and coresubs tests it is best to just add these to the skip list in that test file as well.

leo@shy:~/src/bleadperl/perl [git]
$ nvim lib/B/Deparse-core.t 

leo@shy:~/src/bleadperl/perl [git]
$ git diff lib/B/Deparse-core.t
diff --git a/lib/B/Deparse-core.t b/lib/B/Deparse-core.t
index cdbd27ce5e..edf86f809d 100644
--- a/lib/B/Deparse-core.t
+++ b/lib/B/Deparse-core.t
@@ -362,6 +362,8 @@ my %not_tested = map { $_ => 1} qw(
     END
     INIT
     UNITCHECK
+    ana
+    ban
     default
     else
     elsif

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ../lib/B/Deparse-core.t
../lib/B/Deparse-core.t .. ok         
All tests successful.
Files=1, Tests=3904, 17 wallclock secs ( 1.17 usr  0.06 sys + 16.86 cusr  0.06 csys = 18.15 CPU)
Result: PASS

At this point we now have a named feature with its associated warning, and some conditionally-recognised keywords. In the next parts we will get the compiler to recognise these when parsing Perl code.


Writing a Perl Core Feature - part 2: warnings.pm

2021-02-05T13:00:00.002+00:00

Index | < Prev | Next >

Ever since Perl version 5.18, newly added features are initially declared as experimental. This gives time for them to be more widely tested and used in practice, so that the design can be further refined and changed if necessary. In order to achieve this for a new feature our next step will be to add a warning to warnings.pm.

Similar to the named feature in feature.pm this file also isn't edited directly, but instead is maintained by a regeneration script; this one called regen/warnings.pl.

For example, the isa feature added a new warning here: (github.com/Perl/perl5).

--- a/regen/warnings.pl
+++ b/regen/warnings.pl
@@ -16,7 +16,7 @@
 #
 # This script is normally invoked from regen.pl.
 
-$VERSION = '1.45';
+$VERSION = '1.46';
 
 BEGIN {
     require './regen/regen_lib.pl';
@@ -117,6 +117,8 @@ my $tree = {
                                     [ 5.029, DEFAULT_ON ],
                                 'experimental::vlb' =>
                                     [ 5.029, DEFAULT_ON ],
+                                'experimental::isa' =>
+                                    [ 5.031, DEFAULT_ON ],
                         }],
 
         'missing'       => [ 5.021, DEFAULT_OFF],

This change simply adds another entry into the list of defined warnings. It has a name, a Perl version from which it appears, and is declared to be on by default (as all "experimental" warnings should be). We also have to bump the version number because that is the value inserted into the generated warnings.pm file.
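
The shape of such an entry can be shown in isolation. The sketch below is a standalone toy and not an excerpt of regen/warnings.pl: the constant values are stand-ins chosen so it runs by itself, and the experimental::banana line, including its version number, is a hypothetical guess at what an entry for our example feature might look like.

```perl
use strict;
use warnings;

# Stand-ins for the constants used inside regen/warnings.pl; the values
# here are assumptions made only so this sketch runs on its own.
use constant { DEFAULT_ON => 1, DEFAULT_OFF => 2 };

# Each entry: category name => [ version it first appears in, default state ]
my %experimental = (
    'experimental::vlb'    => [ 5.029, DEFAULT_ON ],
    'experimental::isa'    => [ 5.031, DEFAULT_ON ],
    # Hypothetical entry for our example feature; the version is a guess
    # based on this series being written during the 5.33.x cycle.
    'experimental::banana' => [ 5.033, DEFAULT_ON ],
);

for my $name (sort keys %experimental) {
    my ($since, $default) = @{ $experimental{$name} };
    printf "%-22s since %.3f, %s by default\n",
        $name, $since, $default == DEFAULT_ON ? 'on' : 'off';
}
```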

For adding a new warning to go along with our banana feature, we follow a similar process to what we did for the named feature bit. We edit the regeneration file to make a similar change to the one seen above, then run the script to have it generate the required files.

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/warnings.pl 

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/warnings.pl 
Changed: warnings.h lib/warnings.pm

As before, we can see that it has generated the new lib/warnings.pm Perl pragma file, and also a header file for compiling the interpreter itself. Take a look at these files now to get a feel for what's there.

In particular, the items of note are:

The generated warnings.pm file includes changes to the documented list of known warning categories.
A new WARN_EXPERIMENTAL__BANANA macro has been created in the warnings.h file. We shall be seeing this used soon.

Now that we have both the named feature and the experimental warning we can check that the experimental pragma module can enable it:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -ce 'use experimental "banana";'
-e syntax OK

We're now one step closer to being able to actually start implementing this feature.


Writing a Perl Core Feature - part 1: feature.pm

2021-02-03T13:00:00.014+00:00

Index | < Prev | Next >

The first step towards adding a new feature to Perl is introducing the new name into feature.pm, so that it may be requested by

use feature 'banana';

To accomplish this we don't actually edit feature.pm directly, because that is a file which is automatically generated from other sources. The primary file we need to work on lives in the regen/ directory, and is called regen/feature.pl.

For example, when adding the isa feature this was the change made there: (github.com/Perl/perl5).

--- a/regen/feature.pl
+++ b/regen/feature.pl
@@ -35,6 +35,7 @@ my %feature = (
     unicode_strings => 'unicode',
     fc              => 'fc',
     signatures      => 'signatures',
+    isa             => 'isa',
 );
 
 # NOTE: If a feature is ever enabled in a non-contiguous range of Perl
@@ -752,6 +753,14 @@ Reference to a Variable> for examples.
 
 This feature is available from Perl 5.26 onwards.
 
+=head2 The 'isa' feature
+
+This allows the use of the C<isa> infix operator, which tests whether the
+scalar given by the left operand is an object of the class given by the
+right operand. See L<perlop/Class Instance Operator> for more details.
+
+This feature is available from Perl 5.32 onwards.
+
 =head1 FEATURE BUNDLES
 
 It's possible to load multiple features together, using

We can see two distinct parts in here. The first, a single line addition to the %feature hash, is the part which actually introduces the new name. The second part adds some documentation for it, which will appear in the generated feature.pm file.

To add our new banana feature then, this is where we must start editing. For now don't worry too much about the documentation part - we'll come back to that later. Just add a single line into the %feature hash.

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/feature.pl

Once we've made our required changes in here, we run the script to get it to regenerate its files. Note that we need to use a perl to run this, but it doesn't have to be the one we are trying to build (indeed - that would be problematic would it not? ;) ). Any reasonably up-to-date system Perl install will be fine.

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/feature.pl 
Changed: lib/feature.pm feature.h

Here we can see that it has regenerated two files. The first of these is the lib/feature.pm file that the perl VM will use at runtime to implement the actual use feature pragma with. The second file is feature.h which is used during compiling the interpreter itself and contains the various feature-test macros. If you want, take a look now at the changes it has made.

Specifically, notice that:

A new FEATURE_BANANA_BIT macro has been created, and a value assigned to it. These features are kept in numerical order, so also notice that the subsequent features have been renumbered. This is fine - the bit fields are only used internally and there are no API guarantees of numerical stability between major versions of Perl.
A new FEATURE_BANANA_IS_ENABLED macro has been created, which other code may use to test if the feature is currently in effect during compile-time. Keep note of this - we will be seeing it again later on.
The other change in the file is in the S_magic_sethint_feature() function, which adds code to recognise the string name of the new feature; this is ultimately used by the use feature ... line itself to recognise the names of the requested features.
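
The relationship between the per-feature bit and the "is enabled" test can be pictured with a few lines of Perl. This is a deliberately simplified model under my own assumptions (one bit per feature, handed out in sorted name order); the real macros in feature.h are more involved, and assign_bits and feature_is_enabled are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical model: give each feature one bit, in sorted name order.
sub assign_bits {
    my @names = sort @_;
    my %bit;
    $bit{ $names[$_] } = 1 << $_ for 0 .. $#names;
    return \%bit;
}

my $bits = assign_bits(qw( fc isa signatures banana ));

# Model of a FEATURE_..._IS_ENABLED test: is the feature's bit set in the
# mask that describes the scope currently being compiled?
sub feature_is_enabled {
    my ($mask, $name) = @_;
    return ($mask & $bits->{$name}) ? 1 : 0;
}

printf "FEATURE_%s_BIT 0x%04x\n", uc $_, $bits->{$_} for sort keys %$bits;

my $scope_mask = $bits->{banana} | $bits->{fc};
printf "banana enabled: %d, isa enabled: %d\n",
    feature_is_enabled($scope_mask, 'banana'),
    feature_is_enabled($scope_mask, 'isa');
```

Because banana sorts first in this toy list it takes the lowest bit and pushes the others up, which is the same renumbering effect noted above.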

At this point already, we can test that the newly-created feature is at least recognised by the feature.pm file itself:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...

leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -ce 'use feature "banana";'
-e syntax OK

It actually turns out that the particular commit that added isa was somewhat atypical. It didn't actually need to change the $VERSION of the generated file, because another change earlier in the history had already done so. This is unlikely to be the case most of the time.

Now would be a good time to introduce the porting tests. This is a subset of the full test suite, which checks various details to do with whether the source code is being maintained properly. We can run these directly:

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/cmp_version.t ..... 1/4 # not ok 3 - lib/feature.pm version 1.62
porting/cmp_version.t ..... Failed 1/4 subtests 
...
Test Summary Report
-------------------
porting/cmp_version.t   (Wstat: 0 Tests: 4 Failed: 1)
  Failed test:  3
Files=32, Tests=44043, 188 wallclock secs ( 7.88 usr  0.16 sys + 186.14 cusr  3.98 csys = 198.16 CPU)
Result: FAIL

Here indeed we see that for our banana feature we have forgotten to bump the version number. No matter, we can do that now and test again:

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/feature.pl 

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/feature.pl
Changed: lib/feature.pm

leo@shy:~/src/bleadperl/perl [git]
$ git diff
...
--- a/lib/feature.pm
+++ b/lib/feature.pm
@@ -5,7 +5,7 @@
 
 package feature;
 
-our $VERSION = '1.62';
+our $VERSION = '1.63';
 
 our %feature = (
     fc                   => 'feature_fc',
...

leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
All tests successful.
Files=32, Tests=44044, 175 wallclock secs ( 7.32 usr  0.12 sys + 174.11 cusr  3.58 csys = 185.13 CPU)
Result: PASS

While working on core features it's a good idea to run at least the porting tests regularly. The full test suite takes quite a while to run, and most of it likely won't touch the particular parts of a new feature you are working on (especially as new features should be lexically guarded, and thus have little impact on the vast majority of the existing test suite, which won't be expecting them). The porting tests, by contrast, are designed to be small and lightweight enough to run often, and keep an eye on the things most likely to need checking.


Writing a Perl Core Feature

2021-02-01T14:42:00.014+00:00

(Index) | < Prev | Next >

One of the headline features that was added in Perl version 5.32.0 was the isa operator. This feature was written by me, and while the actual development history of it spanned many commits, they were all squashed into one to be merged into the actual blead branch, which is the main development head for the Perl interpreter itself.

That commit can be seen on github.

At initial glance the commit looks quite long and involved - some might even say scary. But in practice there's somewhat less to it than may first appear. For one thing, while the commit touches 33 different files, 12 of those files are automatically generated from other files in the repository, and comprise the majority of the actual lines of diff (413 of the 656 lines in total).

One thing I found while writing that and getting it reviewed was how few people are actually aware of all the inner details of what goes into creating such a feature. Therefore I've decided to write this blog post series, in which I will take that commit apart in detail, and go over all the individual pieces. My aim here is to not only explain what they're all doing there, but additionally to talk you through the process of creating a feature yourself. Along the way we'll also take a look at some other commits and details of other features. We'll also follow the development of a new, hypothetical feature called banana - a word unlikely to collide with any existing or future feature, so it should be easy enough to grep out, and find in these examples.

These examples have all been written midway through the Perl version 5.33.x development series, and so will relate to various internal details of that version. If you are reading this at some point off in the future when internals have changed significantly you may have to adjust to cope - but hopefully I will have edited these posts to remain relevant to whatever is current-generation technology in the meantime.

I'll be entirely honest here - at least half of the point of my writing this series is as a handy reference for me to read again in future the next time I want to do one of these. But I hope other people will find it useful too. Readers are expected to be familiar with using and writing Perl code, as well as have some experience of writing C code. Some knowledge of the internals of the Perl interpreter (such as from writing XS code) might be useful, but I'll try to explain any particularly in-depth concepts as we encounter them, so don't worry too much there.

Rather than write the whole thing in one big post, I shall split it across various sections. Each of them will be linked from this list for easy reference. Not every potential new feature will need every one of these stages, and of course there may be situations where other things need adding or changing, but overall this is a reasonable first guess at what may need to be done.

If you're generally curious about what goes into these things, or looking for a general overview of the process, I suggest reading all of them in order - several will depend on concepts introduced previously. I'll also leave a handy index of all of them here, for easy reference if you want to look up particular things.

The next parts of this post series are:


2020 Perl Advent Calendar - Day 25

2020-12-25T16:00:00.013+00:00

<< First | < Prev

Bonus Day!

Over this blog post series we have built up to the post on day 24, which explains that all of what we've seen this series is available and working in Perl, right now. It is the Perl we can write in 2020. All of this has been possible because of the custom keyword feature which was first introduced to Perl in version 5.14.

When Perl gained the ability to support custom keywords provided by modules it started down the path that CPAN modules would experiment with new language ideas. Already a number of such modules exist, and it is likely this idea will continue to develop. What new ideas might turn up in the next few years, and will any of them evolve to become parts of the actual core language?

Here's a collection of some thoughts of mine. Some of these can be implemented in CPAN modules, in the same way as the four modules we've already seen this series. Other ideas however go beyond what would be possible via keywords alone, and stray into the realm of ideas that really do need core Perl support.

Match/case Syntax

Perl 5.10 added the smartmatch operator, ~~. I think we can mostly agree it has not been the success many had been hoping for. Its rules are complex and subtle, and there are far too many of them to remember. Furthermore, it still doesn't express the most basic question of whether basic scalar comparisons for equality are performed as string or number tests. For example, is the expression 5 ~~ "5.0" true or false? I honestly don't know and the fact I'd have to look it up in a big table of behavior suggests that the thing has failed to achieve its goal.

Yet still we are left without a useful syntax to express control-flow dispatch based on comparing a given value to several example choices - a task for which many languages use keywords switch and case. I have already written to Perl5 Porters with my thoughts on a design I have nicknamed "dumb match", in response to this. The basic idea of dumb match is to make the programmer write down their choice of operator to be used to compare the given value with the various alternatives.

match($var : eq) {
    case("abc") { ... }
    case("def") { ... }
    
    case("ij", "kl", "mno") { ... }  # any of these will match
}

Here the programmer has specifically requested the eq operator, so we know these are stringy comparisons. Alternatively they could have requested any of

match($var : ==) {
    # numerical comparisons
    case(123) { ... }
    case(456) { ... }
}

match($string : =~) {
    # regexp matches
    case(m/^pattern/)    { ... }
    case(m/morestring$/) { ... }  # only the first match wins
}

match($obj : isa) {
    # object class matches
    case(IO::Handle) { ... }
}
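
The semantics being proposed can be approximated in today's Perl with a helper function, which may make the idea more concrete. This is only a sketch of the behaviour, not of any real or planned implementation: match_with and %compare are hypothetical names, and the isa case is left out so that the example runs on perls older than 5.32.

```perl
use strict;
use warnings;

# Hypothetical helper emulating the proposed semantics.
# The comparison operator is chosen once, explicitly, by name.
my %compare = (
    'eq' => sub { $_[0] eq $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '=~' => sub { $_[0] =~ $_[1] },
);

# match_with($value, $op, [ \@alternatives, $code ], ...)
# Runs the code of the first case that has a matching alternative.
sub match_with {
    my ($value, $op, @cases) = @_;
    my $cmp = $compare{$op} or die "Unknown operator '$op'";

    for my $case (@cases) {
        my ($alternatives, $code) = @$case;
        for my $alt (@$alternatives) {
            return $code->() if $cmp->($value, $alt);
        }
    }
    return;
}

my $result = match_with("kl", 'eq',
    [ [ "abc" ],             sub { "first" } ],
    [ [ "ij", "kl", "mno" ], sub { "any of three" } ],
);
print "$result\n";

my $which = match_with("morestring", '=~',
    [ [ qr/^more/ ],   sub { "prefix" } ],
    [ [ qr/string$/ ], sub { "suffix" } ],   # also matches, but too late
);
print "$which\n";
```

Nothing here has to guess whether a comparison is stringy or numeric, because the caller said so up front; that is the whole of the "dumb" part.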

Type Assertions

Various people have at various times written about or designed all sorts of variations on a theme of a "type system" for Perl. I have written reactions to some of those ideas before.

The idea I have in mind here is less a feature in itself, and more a piece of common ground for several of the other ideas, though it may have applications to existing pieces of Perl syntax. Common to several ideas is the need to be able to ask, at runtime, whether a given value satisfies some classification criteria. People often bring up thoughts of assertions like "this is a string" or "this is an integer" at the start of these discussions, but that isn't really within the nature or spirit of what Perl's value system can answer. Instead, I think any workable solution would be written in terms of the existing kinds of comparisons.

Perl 5.32 added the isa operator - a real infix operator that asks if its first operand is an object derived from the class given by its second.

if($arg isa IO::Handle) {
    ...
}

This is certainly one kind of type assertion. I could imagine a new keyword, for the sake of argument for now let's call it is^*, which can answer similar yes/no questions on a broader category of criteria. It is likely that the righthand side argument would have to be some sort of expression giving a "type constraint", though exactly what that is I admit I don't have a neat design for currently.

^*: Yes, I'm aware this operator choice would probably interfere with Test::More::is. Likely a solution can be found somehow, either by a better naming choice, better parser disambiguation, or a lexical feature guard.

It may be the case that generic type constraints can be constructed with an arbitrary Perl expression to explain how to test if a value meets the constraint:

type PositiveNumber is Numeric where { $_ > 0 };

While in general that would be the most powerful system, it may not lead to a very good performance for several of the other ideas here, so I am still somewhat on the fence about this sort of detail. Because I don't have a firm design on this yet, for the rest of this post I'm just going to give examples using the isa operator instead. But any of the examples or ideas would definitely apply to a more generalised type constraint operator or system, whenever one came to exist.
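
For a feel of what an expression-based constraint would mean at runtime, here is one way to emulate it with plain code references. Everything in it is hypothetical (declare_type, value_is and the %constraint registry are invented for this sketch), and it makes no attempt at the performance concerns mentioned above.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical registry: constraint name => code that tests $_.
my %constraint;

# Declares a constraint as "the parent's test, and also this one".
sub declare_type {
    my ($name, $parent, $where) = @_;
    my $parent_check = defined $parent ? $constraint{$parent} : sub { 1 };
    die "Unknown parent type '$parent'" unless $parent_check;

    $constraint{$name} = sub {
        $parent_check->() && $where->();
    };
}

# Emulates the imagined `is` operator: returns 1 or 0.
sub value_is {
    my ($value, $name) = @_;
    my $check = $constraint{$name} or die "Unknown type '$name'";
    local $_ = $value;
    return $check->() ? 1 : 0;
}

declare_type(Numeric        => undef,     sub { looks_like_number($_) });
declare_type(PositiveNumber => 'Numeric', sub { $_ > 0 });

for my $v (5, -3, "abc") {
    printf "%-5s %d\n", $v, value_is($v, 'PositiveNumber');
}
```

Each test is an arbitrary callback, which is exactly what makes this shape powerful and also what makes it hard to optimise.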

In any case, once a generic is operator exists for testing type constraints, it feels natural to allow that in match/case syntax too:

match($value : is) {
    case(PositiveNumber) { ... }
    case(NegativeNumber) { ... }
}

In addition it would be wanted in function and method signatures:

method exec($code is Callable)
{
    ...
}

And also object slot variables:

class Caption
{
    has $text is Textual;
    ...
}

Multiple Dispatch

Another idea that comes once you have assertions is the idea of hooking that into function dispatch itself. Some languages give you the ability to define the same-named function multiple times, with different kinds of assertion on its arguments, and at runtime the one that best matches the given arguments will be chosen. There are usually many rules and subtleties to this idea, so it may not ultimately be very suitable for Perl, but if a constraint system did exist then it would be relatively simple to write a CPAN module providing a multi keyword to allow these.

multi sub speak($animal isa Cow) { say "Moo" }

multi sub speak($animal isa Sheep) { say "Baaah" }

Naturally this syntax ought to be implemented in a way that means it still works with method and async as well, allowing us to just as easily write

async multi method speak_to($animal isa Goose)
{
    await $self->say("Boo", to => $animal);
}
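
The dispatch idea itself is easy to mimic by hand, which shows roughly what a multi keyword would have to arrange behind the scenes. The sketch below is an assumption-laden toy: define_multi and call_multi are hypothetical, it looks only at the first argument, and it picks the first candidate that fits rather than the best one.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Minimal stand-in classes for the example.
package Cow   { sub new { bless {}, shift } }
package Sheep { sub new { bless {}, shift } }

package main;

# Hypothetical registry: function name => list of [ class, code ] candidates.
my %multi;

sub define_multi {
    my ($name, $class, $code) = @_;
    push @{ $multi{$name} }, [ $class, $code ];
}

# Calls the first candidate whose class the first argument is an instance
# of. A real design would need rules for picking the best match.
sub call_multi {
    my ($name, @args) = @_;
    for my $candidate (@{ $multi{$name} || [] }) {
        my ($class, $code) = @$candidate;
        return $code->(@args) if blessed $args[0] and $args[0]->isa($class);
    }
    die "No candidate of '$name' accepts these arguments\n";
}

define_multi(speak => Cow   => sub { "Moo" });
define_multi(speak => Sheep => sub { "Baaah" });

print call_multi(speak => Sheep->new), "\n";
print call_multi(speak => Cow->new), "\n";
```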

Signature-like List Assignment

Perl 5.20 introduced signatures, which can be imagined as a neatening up of the familiar syntax of unpacking the @_ list into variables. In some ways the following two functions could be considered identical:

sub add_a
{
    my ($x, $y) = @_;
    return $x + $y;
}

sub add_b($x, $y)
{
    return $x + $y;
}

This does however brush over a few more subtle details of signatures. Firstly, signatures are more strict on the number of values they receive vs. how many they were expecting. While this is a useful feature, it seems odd that Perl now lacks any syntax for performing a list unpack and checking that it has exactly the right number of elements in any situation other than the arguments from function entry.

For that task, I could imagine an operator maybe spelled := which acts exactly the same as a signature on a function:

my ($x, $y) := (1, 2);

my ($x, $y) := (1, 2, 3);  # complains about too many values
my ($x, $y) := (1);        # complains about not enough values

Of course, there's more to signatures than simply counting the elements. Signatures permit a default value to be used if the caller did not specify it; we could allow that too:

my ($one, $two, $three = 3) := (1, 2);
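
Until something like that exists, the same checks have to be written out by hand. A sketch of what that involves follows; unpack_exact is a hypothetical helper of my own, taking a count of required values and a list of defaults for the optional trailing ones.

```perl
use strict;
use warnings;

# Hypothetical helper: checks a list against a count of required values
# and a list of defaults for optional trailing ones, like a signature.
sub unpack_exact {
    my ($required, $defaults, @values) = @_;
    my $got = @values;
    my $max = $required + @$defaults;

    die "Too few values (got $got, need at least $required)\n"
        if $got < $required;
    die "Too many values (got $got, at most $max)\n"
        if $got > $max;

    # Fill in defaults for any optional positions not supplied.
    my @missing = @{$defaults}[ $got - $required .. $#$defaults ];
    return (@values, @missing);
}

my ($one, $two, $three) = unpack_exact(2, [ 3 ], 1, 2);
print "$one $two $three\n";

# Both of these lists are the wrong length and get rejected.
for my $list ( [ 1, 2, 3, 4 ], [ 1 ] ) {
    eval { unpack_exact(2, [ 3 ], @$list); 1 }
        or print "Rejected: $@";
}
```

That is a fair amount of machinery for something a function signature already does implicitly, which is the motivation for wanting it as an operator.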

If signatures gain features like type assertions then it seems natural to apply them to the signature-like list assignment operator as well, allowing that to check also:

my ($item isa Item, $group isa Group) := @itemgroup;

If key/value unpacking of named arguments arrives then that too would be useful for unpacking a hash:

my (:$height, :$width) := %params;

Twigils

The slot variables introduced by Object::Pad are written the same as regular lexical variables. I have for a while wished them to be distinct from regular lexicals, so they stand out better visually. The $: syntax can easily be made available, allowing them to be written with that instead:

class Point
{
    has $:x = 0;
    has $:y = 0;
    
    method describe($name) {
        say "Hello $name, this point is at ($:x, $:y)";
    }
}

I accept this is a much more subjective idea than most of the other features. Personally I find it helps to visually distinguish object slots, now that they don't have such notation as $self->{...} to remind you.

True Core Implementations

As earlier mentioned, some of these ideas can be implemented as CPAN modules (those introduced by new keywords), but others (such as the := operator) would require core Perl support. It would also be nice to see some of the more established and stable CPAN keyword modules implemented in core Perl as true syntax as well.

It would be great if, in 2025, we could simply

use v5.40;    # or maybe it will be use v7.x by then

try { ... }
catch ($e) { ... }

class Calculator {
    method add($x, $y) { ... }
}

Having these available to the core language would hopefully mean that a lot more code would more quickly adopt them as features. While these things are all available as CPAN modules, and work even on historic Perl versions as far back as 5.16 from 2012, it seems that some people don't want to make use of such syntax features unless they are provided by the core language itself. Moving the implementation into core may help for other reasons too, such as efficiency of operation, or allowing them to provide yet more abilities not available to them while they are third-party modules.

All in all, it's something we can hope for over the next five years...

<< First | < Prev

2020 Perl Advent Calendar - Day 24

2020-12-24T12:00:00.102+00:00

<< First | < Prev

Over the course of this blog post series we have seen a number of syntax-providing modules from CPAN. Each of them sets out to neaten up some specific kind of structure often found in Perl code.

Future::AsyncAwait aims to neaten up asynchronous code flow, replacing older techniques like ->then method chaining and helper functions from Future::Utils with regular Perl syntax.
Syntax::Keyword::Try brings the familiar try/catch pattern for handling exceptions, replacing more manual techniques involving eval {} blocks and inspecting the $@ variable.
Object::Pad provides an entire set of syntax keywords for managing classes of objects, allowing stateful object-oriented code to be neatly written without the risk of things like hash key collisions on $self->{...}.

Each one of these allows writing shorter, neater code that has less "machinery noise". With fewer distractions in the code it becomes clearer to see the detail of the specific situation the code is for. With less code to write there's less opportunity to introduce bugs.
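
As a reminder of the kind of machinery noise in question, this is roughly what the manual exception-handling pattern looks like without any syntax module. It is a generic sketch using only core Perl; risky and describe are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# A function that may throw.
sub risky {
    my ($n) = @_;
    die "negative input\n" if $n < 0;
    return sqrt $n;
}

# The manual pattern: eval {} returns undef on failure, so end the block
# with a true value and test that, rather than testing $@ alone.
sub describe {
    my ($n) = @_;
    my $result;
    if ( eval { $result = risky($n); 1 } ) {
        return "ok: $result";
    }
    else {
        my $e = $@;
        chomp $e;
        return "failed: $e";
    }
}

print describe($_), "\n" for 16, -1;
```

The trailing 1, the copy of $@ and the extra variable declared outside the block are all distractions from what the code is actually for, and they are the parts a try/catch syntax takes away.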

Moreover we have seen that these syntax modules can be combined together, used in conjunction to allow even greater benefits. We saw on day 4 that try/catch control flow works within async sub, on day 22 that object methods can be marked as asynchronous with async method, and on day 23 we explored how the dynamically assignment syntax can be combined with objects, asynchronous functions, and even both at the same time.

The various code examples we've seen over the past 22 days or so have been written using these syntax modules, and also make use of Perl's signatures feature, and other things where possible, all to help in this regard. The brevity that comes from not needing to write the line (or two) of code to unpack the function's arguments from @_ (and maybe the $self method invocant as well) removes yet another distraction and potential source of errors.

In summary: This series has been about what it feels like to write Perl code in the year 2020 - it has been about 2020 Perl. This is a language just as flexible and adaptable as Perl has ever been, yet still capable of any of the modern techniques common to other languages, which perhaps even the Perl of five or ten years ago was lacking in - neat function arguments, asynchronous control, exception handling, and syntax for object orientation. With all these new abilities, 2020 has been a great year for writing Perl code.

2020 Perl Advent Calendar - Day 23

2020-12-23T12:00:00.005+00:00

For today's article, I'd like to take a look at yet another of my syntax-providing CPAN modules, Syntax::Keyword::Dynamically. This provides a single new keyword, dynamically. To quote its documentation:

Syntactically and semantically it is similar to the built-in perl keyword local, but is implemented somewhat differently to give two key advantages over regular local:

You can dynamically assign to lvalue functions and accessors.

You can dynamically assign to regular lexical variables.

This is important to us when working with Object::Pad because of the way slot variables work. Within a method body a slot looks like a regular lexical variable. This means that Perl's regular local keyword refuses to interact with one. If we want to assign a new value temporarily, only for the duration of one block of code and have it restored automatically afterwards, we must use dynamically instead.

For example, both Syntax::Keyword::Dynamically and Object::Pad contain a copy of a unit test which asserts that their interaction works as expected:

has $value = 1;
method value { $value }

method test
{
    is $self->value, 1, 'value is 1 initially';

    {
        dynamically $value = 2;
        is $self->value, 2, 'value is 2';
    }

    is $self->value, 1, 'value is 1 finally';
}

If instead we were to try this using core Perl's local it fails to compile:

...
    {
        local $value = 2;
        ...

$ perl -c example.pl
Can't localize lexical variable $value at ...

When a variable is dynamically assigned a new value inside an asynchronous function it has to be swapped back to its original value while that function is suspended, and its new value put back when the function resumes. This may have to happen several times before the function eventually returns. The way that dynamically is implemented means it is supported by Future::AsyncAwait and can detect the times it needs to swap values back and forth.

There is also a unit test which checks this interaction in both Syntax::Keyword::Dynamically and Future::AsyncAwait:

my $var = 1;

async sub with_dynamically
{
    my $f = shift;

    dynamically $var = 2;

    is $var, 2, '$var is 2 before await';
    await $f;
    is $var, 2, '$var is 2 after await';
}

my $f1 = Future->new;
my $fret = with_dynamically( $f1 );

is $var, 1, '$var is 1 while suspended';

$f1->done;
is $var, 1, '$var is 1 after finish';

Given these three modules are now known to be working nicely in each of the three pairwise combinations, you might wonder if all three can be combined at once - can you dynamically change the value of an object slot during an async method? The answer is still yes.

All three of these module distributions contain a copy of a unit test which checks this behaviour:

class Logger {
    has $_level = 1;

    method level { $_level }

    async method verbosely {
        my ( $code ) = @_;
        dynamically $_level = $_level + 1;
        is $self->level, 2, 'level is 2 before code';
        await $code->();
        is $self->level, 2, 'level is 2 after code';
    }
}

my $logger = Logger->new;

my $f1 = Future->new;
my $fret = $logger->verbosely(async sub {
    is $logger->level, 2, 'level is 2 before await';
    await $f1;
    is $logger->level, 2, 'level is 2 after await';
});

is $logger->level, 1, 'level is 1 outside';

$f1->done;

is $logger->level, 1, 'level is 1 finally';

Each of these syntax modules has provided something useful on its own, but as we have seen both yesterday and today they can be combined with each other to provide even more useful behaviours. It is easily possible to create CPAN modules that operate together to extend the Perl language with new syntax and semantics, and have those extensions work and feel every bit as convenient and powerful as all of the native syntax built into the language.

2020 Perl Advent Calendar - Day 22

2020-12-22T12:00:00.096+00:00

We started off this advent calendar series looking at the async/await syntax provided by Future::AsyncAwait, and the way that functions can be marked as async. More recently we have been looking at the class and object syntax provided by Object::Pad, such as syntax to provide named methods. Some of you may be wondering whether these two things can be combined; whether methods can be marked as being asynchronous. The answer is yes.

The way that these two modules are implemented means that they can coöperate on how functions are parsed. The end result is that a method can be declared using the combined keywords async method and it behaves exactly as expected. Namely, that $self and the class's slot variables are available within the code, it returns a future-wrapped value, and permits the await keyword.

For example, back on day 6 we saw an example of await with a //= shortcircuit expression to optionally wait for a read operation to fill a cache on an object, implemented with a $self->{...} key inside async sub. At the time I said that the example was slightly reworded from the original code. That is because in reality, the code is implemented using the combination of async and method:

use Object::Pad;
use Future::AsyncAwait;

class Device::Chip::TSL256x extends Device::Chip;

...

has $_TIMINGbytes;

async method _cached_read_TIMING ()
{
    return $_TIMINGbytes //= await $self->_read(REG_TIMING, 1);
}

In fact, almost every post after that also had some code taken from modules that are implemented using async method. In each case, the real code was in fact shorter and more concise than the posted example because it did not have to start with the my $self = shift; line, and could use the shorter slot variables instead of hash key accesses on $self->{...}.

These two syntax modules - either individually or in combination - are able to greatly neaten a lot of common code patterns. To see just how much they provide, here is how the method above might have been written if neither syntax module were used:

sub _cached_read_TIMING
{
    my $self = shift;

    return Future->done($self->{TIMINGbytes})
        if defined $self->{TIMINGbytes};
    
    return $self->_read(REG_TIMING, 1)->then(sub {
        ($self->{TIMINGbytes}) = @_;
        return Future->done($self->{TININGbytes});
    });
}

In this version of the code it is far harder to see the flow of the logic. The caching behaviour of the TIMINGbytes field is harder to see, hidden by the various machinery of the future return value and ->then chaining. Additionally, the $self->{TIMINGbytes} field is referred to four times here - each one being just a hash key, and thus prone to typos. Sure there are techniques to help detect such problems with classical Perl hash-based objects (such as locked hashes), but those all detect runtime attempts to actually touch the fields; none of them are able to point out problems at compiletime.
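
To illustrate how a locked hash only notices the mistake at runtime, here is a minimal sketch using lock_keys from the core Hash::Util module. The stand-in object and the value stored into it are made up for the example; this is not taken from the real Device::Chip code.

```perl
use strict;
use warnings;
use Hash::Util qw( lock_keys );

# A stand-in for a classical hash-based object with one known field
my $self = { TIMINGbytes => undef };
lock_keys( %$self );

# This file passes `perl -c` without complaint; the misspelled key is
# only noticed when this statement actually executes
my $ok = eval { $self->{TININGbytes} = "\x02"; 1 };
my $error = $@;

print "Caught at runtime: $error" unless $ok;
```

The eval block fails with an "Attempt to access disallowed key" error, but only once execution reaches that line. A rarely-used code path containing the same typo could go unnoticed for a long time.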

Such an error would be detected at compiletime using an Object::Pad-based slot variable:

has $_TIMINGbytes;

async method _cached_read_TIMING {
    return $_TININGbytes //= await $self->_read(REG_TIMING, 1);
}

$ perl -c example.pl
Global symbol "$_TININGbytes" requires explicit package name
  (did you forget to declare "my $_TININGbytes"?) at ...

By the way, did anyone spot the typo on the long example code above? I didn't, the first time I wrote it... ;)

2020 Perl Advent Calendar - Day 21

2020-12-21T12:00:00.013+00:00

So far we've been looking at features of some syntax modules that are relatively well-established - Future::AsyncAwait has a couple of years of production battle-testing against it, and even Object::Pad's basic class features have been found to be quite stable over the past six months or so. For today's article I'd like to take a slightly different direction and take a look at something much newer and still under experimental design.

Some object systems which use inheritance to create derived classes out of base ones (including the base system in Perl itself) support the idea that a given class may have multiple bases. This is called Multiple Inheritance. Initially it may sound like a useful feature to have, but in practice trying to support it makes implementations of object systems more complicated, and can lead to situations where the choice of correct behaviour is non-obvious, or in some cases conflicting with what may seem sensible. Situations get especially complicated if the same partial class appears multiple times in the inheritance hierarchy leading up to a given class.

For this reason most modern object systems, including Object::Pad, do not support multiple inheritance, to keep behaviours simpler. In order to try to provide the same useful properties (that of being able to share code from multiple component classes), they provide a somewhat different idea, called roles. A role can be considered similar to a partial class which can be merged into a real class. A role can provide methods, BUILD blocks, and slot variables. In many ways a role appears the same as a class, except that instances of it cannot be directly created. To be used as an instance a role must be applied to a class. This has the effect of copying all of the pieces of that role into the target class.

For example, in the Tickit-Widget-Menu distribution there are two different classes of object that can appear in a menu - an individual menu item, or a submenu. In order to avoid code duplication by copying parts of the implementation around both classes, the common behaviours are implemented in a role, by using the role keyword:

use Object::Pad 0.33;

role Tickit::Widget::Menu::itembase;

has $_name;

BUILD (%args)
{
    $_name = $args{name}
}

...

To apply this role to both of the required classes each uses the implements keyword on its class statement to copy the components of that role into the class:

use Object::Pad 0.33;

class Tickit::Widget::Menu::Item
    implements Tickit::Widget::Menu::itembase;
...

class Tickit::Widget::Menu::base
    implements Tickit::Widget::Menu::itembase;
...

Superficially this might feel like it suffers the same problems as multiple inheritance, but keep in mind that applying a role is basically just a fancy form of copy-pasting the code into the class. There is no runtime lookup of methods or other class items whenever they are accessed. The parts of a role are simply copied individually into the class that applies it. This means that any naming conflicts are detected as errors at compile-time, alerting the programmer to the potential problem:

use Object::Pad 0.33;

role R
{
    method collides() {}
}

class C implements R
{
    method collides() {}
}

$ perl example.pl
Method 'collides' clashes with the one provided by role R at ...

A program will only successfully compile if there are no naming collisions. As a result of this, and because the pieces of the role are simply copied into a class, it means that it does not matter in what order individual roles are applied to a class, nor does it matter if the same role is applied multiple times within the hierarchy (e.g. if both a class and its base class tried to apply the same role). The end result is always the same, presuming no conflicts. This compiletime check, and flexibility on ordering and duplicate application, helps to ensure more robust code.

2020 Perl Advent Calendar - Day 20

2020-12-20T12:00:00.105+00:00

We have now seen the way that the has keyword creates a new kind of variable, called a slot variable, where object instances can store their state values. All of the code in yesterday's examples creates variables that begin, like a new my variable, as the undefined value. Often though with an object instance we want to store some other value initially. For this there are two options available.

In simple cases where slot variables of any new object should start off with the same default value we can use an expression on the has statement itself to assign a default value. In these two examples, the slot is initialised from a simple constant.

class Device::Chip::AD9833 extends Device::Chip;

has $_config = 0;

class Tickit::Widget::LinearSplit
    extends Tickit::ContainerWidget;
    
has $_split_fraction = 0.5;

These are compiletime constants, though any form of expression is allowed here. However, note: much as would apply to a my or our variable in the scope of an entire package or class, any expression is evaluated just once at the time the class itself is first created. The resulting value is stored as the default for every new instance. This expression is not evaluated for each new instance individually. Thus it is rare in practice to see anything other than a constant here. For example, using an expression that created some new helper object would mean that all new instances of the containing class will share the same reference to the same helper object - unlikely to be what was intended.
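
The same evaluate-once effect can be seen with a plain my variable at package scope, which is the comparison drawn above. This sketch uses only core Perl rather than Object::Pad, and the Counter and Widget packages are hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

package Counter {
    sub new  { return bless { count => 0 }, shift }
    sub incr { return ++$_[0]{count} }
}

package Widget {
    # Evaluated once, when this block is run; not once per instance
    my $default_counter = Counter->new;

    sub new     { return bless { counter => $default_counter }, shift }
    sub counter { return $_[0]{counter} }
}

my $first  = Widget->new;
my $second = Widget->new;

# Incrementing via the first widget is visible through the second,
# because both hold a reference to the very same Counter object
$first->counter->incr;

print "second widget sees count ", $second->counter->{count}, "\n";
print "the helper object is shared\n"
    if $first->counter == $second->counter;
```

Moving the Counter->new call inside Widget's constructor would give each instance its own helper, which is the kind of per-instance work that calls for something run at construction time.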

For more complex situations which require code to be evaluated for every new instance of a class we can use a BUILD block. This provides a block of code which is run as part of the construction process for every individual instance of the class. For example, this BUILD block allows us to create a new mutex helper instance for every instance of the containing class:

class Device::Chip::LEO1306
    extends Device::Chip::Base::RegisteredI2C;

use Future::Mutex;

has $_mutex;

BUILD
{
    $_mutex = Future::Mutex->new;
}

The BUILD block is basic syntax, similar to Perl's own BEGIN block for instance. People familiar with object systems like Moo and Moose especially should take note - a BUILD block is not a method. It does not take the sub or method keyword, and it cannot be called like one.

Whenever a new instance is constructed, the BUILD block is passed a copy of the argument list given to the constructor. A common task is to set slot variables from those, or perhaps apply defaults if values weren't specified. It is also a common style in Perl for constructor arguments to be passed in an even-sized key/value list, so they can be easily unpacked as a hash variable. This makes it simple for BUILD blocks to inspect the named keys they're interested in. Despite not being a true method, a BUILD block still permits a signature to unpack its arguments as if it were one.

class Device::Chip::CC1101 extends Device::Chip;

has $_fosc;
has $_poll_interval;

BUILD (%opts)
{
    $_fosc          = $opts{fosc} // 26E6;
    $_poll_interval = $opts{poll_interval} // 0.05;
}

There is still much ongoing design work here. It turns out in practice that a large majority of the code in BUILD blocks is something like this form - a series of lines, each setting a slot variable from one constructor argument.

There may be value in having Object::Pad provide a convenient way to let each slot variable declaration specify how it should be initialised from named constructor arguments. This would help keep the code less cluttered by the low-level machinery, and allow additional features such as error checking by rejecting unrecognised key names. This would, however, involve Object::Pad specifying that constructor arguments must be in named argument pairs, which it currently does not.

2020 Perl Advent Calendar - Day 19

2020-12-19T12:00:00.123+00:00

We have already discussed that the most fundamental property of object-oriented programming is the idea that a collection of state can be encapsulated into a single piece, and given behaviours that operate on the state. In yesterday's article we saw how to create new classes of object (with the class keyword), and how to add behaviours (with the method keyword). Today we'll take a closer look at the other half of this - how to add state.

While the word "method" seems to be fairly well entrenched, various object systems across various languages have a variety of different words to describe the state values stored for each given instance. The word "field" has been used in Perl before, and refers specifically to the now-obsolete fields pragma. Sometimes programmers refer to "attributes" of an object, but in Perl this is also an overloaded term referring to the :named annotations that can be applied to functions or variables. In Object::Pad the per-instance state is stored in variables called "slots".

Within a class, slots are created by the has keyword. This looks and feels similar to the my and our keywords. It introduces a new variable, optionally initialised with the value of an expression. Whereas a my or our variable is visible to all subsequent code (including nested functions) within its scope, a has variable is only visible within functions declared as method, because it will be associated with individual instances of the object class.

In this example the slot variables storing the label and click behaviour are available within any method:

class Tickit::Widget::Button extends Tickit::Widget;

has $_label;
has $_on_click;

method label { return $_label; }

method set_label
{
    ( $_label ) = @_;
    $self->redraw;
}

method on_click { return $_on_click; }

method click
{
    $_on_click->($self);
}

In terms of visibility these slot variables behave much like other kinds of lexical variable - namely, they are not visible from outside the source of this particular class. This means that by default any such state variables are private to the class's implementation, inaccessible by other code that uses the class. We can choose to expose certain parts of it via the class's interface by providing these accessor methods, but we are not required to do so.

It is a common style in Object::Pad-based code to name the slot variables with a leading underscore, as in this example, as it helps them to stand out visually in larger code. It helps remind people that these are slot variables, because they now lack other visual signalling (such as $self->{...}) to otherwise distinguish them.

Another common behaviour is creating simple accessor methods that just return the value of a slot, thus deciding to expose that particular variable as part of the object's interface, visible to callers. So common in fact that Object::Pad provides a shortcut to create these accessor methods automatically:

class Device::Chip::SSD1306 extends Device::Chip;

has $_rows :reader;
has $_columns :reader;

# now the class has ->rows and ->columns methods visible

The :reader attribute requests that a simple accessor method is created to return the current value of the slot. It is named the same as the slot, with a leading underscore first removed to account for the common naming convention.

One key advantage that these variable-like slots have over classical Perl objects built on hash keys or data provided by accessor methods is that the names are scoped within just the class body that defines them. Names cannot collide with those defined by subclasses. This is even checked by one of Object::Pad's own unit tests, which defines a base class and a subclass from it that both have a slot called $data:

class Base::Class {
    has $data;
    method data { $data }
}
 
class Derived::Class extends Base::Class {
    has $data;
    method data { $data }
}

It then has some tests to check that each of these methods behaves differently. In particular, this provides the guarantee that classes can freely add, delete, or rename their own slot variables without risking breaking other related classes. This leads to more robust class definitions.

2020 Perl Advent Calendar - Day 18

2020-12-18T12:00:00.095+00:00

Yesterday we took our first glance at some example code using Object::Pad. Today I'd like to continue with some more in-depth examples showing a few details of the new syntax provided. These will be real examples from actual code on CPAN.

The class keyword introduces a new package that will form a class. It creates the new package, much as Perl's existing package statement does, and additionally sets up the various Object::Pad-related machinery to have the new package be a proper class. It also makes the other new keywords available - method and has. As with package it supports setting the $VERSION of the new package by specifying a version number after the name. It also supports several new sub-keywords to further specify details about the class, such as a base class that it is extending (via the extends keyword).

Even though the class keyword acts the same as the package keyword, it isn't currently recognised by parts of CPAN infrastructure, such as the indexer which creates package-to-file indexes. As such, any module uploaded to CPAN still needs to have a package statement as well, to keep these tools happy. It's usual to find them both in combination:

use Object::Pad;

package Tickit::Widget::HBox 0.49;
class Tickit::Widget::HBox extends Tickit::Widget::LinearBox;

...

As with package, the class syntax can be used in either of two forms. It can set the prevailing package name for following declarations if used as a simple statement, or it can take a block of code surrounded by braces, and applies just to the contents of that block. The first form is usually preferred for the toplevel class in a file, with the latter form being seen for internal "helper" classes within a file. For example, the Device::Chip::NoritakeGU_D module contains three small internal helper classes defined using a block:

class Device::Chip::NoritakeGU_D::_Iface::UART {
    use constant DEFAULT_BAUDRATE => 38400;

    has $_baudrate;

    ...
}

The class keyword was at least partly designed during the 2019 Perl 5 Hackathon event in Amsterdam, at which there was a similar idea for a module keyword. That has yet to be implemented anywhere, but a common theme to both ideas was that they would imply a more modern set of default pragma settings than default Perl begins with. After a class statement (or inside its block), the strict and warnings pragmas are applied, and on versions of Perl new enough to support it, the signatures feature is turned on and the indirect feature is turned off.
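
As a rough illustration of what those defaults amount to, the following is an approximately equivalent preamble written out by hand. It assumes a perl of version 5.32 or newer (where the indirect feature exists), and the add function is a made-up example. It is only a sketch of the resulting settings, not a description of how Object::Pad applies them internally.

```perl
use v5.32;
use strict;
use warnings;

# Signatures were still marked experimental before perl 5.36, so this
# sketch silences that warning for older versions
use feature 'signatures';
no warnings 'experimental::signatures';

# Disallow indirect object syntax such as `new Foo(...)`
no feature 'indirect';

sub add ( $x, $y ) { return $x + $y }

print add( 1, 2 ), "\n";
```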

The method keyword adds a new function into the class namespace, much like sub does. The $self invocant parameter is handled automatically within the body of a method, meaning that a parameter signature or @_ unpacking code does not have to handle it specially. The code can totally ignore this and it will work correctly.

Because the signatures feature is automatically enabled on supported Perl versions, it makes method declarations inside classes particularly short and neat. For example, this from Tickit::Widget::Scroller:

method scroll ($delta, %opts)
{
    return unless $delta;
    
    my $window = $self->window;
    @_items or return;
    
    ...
}

Straight away we haven't needed to write the usual two lines of method setup code, of handling the $self variable and then unpacking the other arguments out of @_. As we have already seen with the use of async/await syntax, this method keyword helps reduce a lot of the "noise" of machinery out of the code, and lets us more clearly and easily see the domain-specific details inside it.
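
For comparison, here is a minimal sketch of those two setup lines in a classical Perl method. The Scroller package and its position field are hypothetical stand-ins, as is the option name passed in; this is not the real Tickit::Widget::Scroller implementation.

```perl
use strict;
use warnings;

package Scroller {
    sub new { return bless { position => 0 }, shift }

    sub scroll
    {
        my $self = shift;              # line one: take the invocant
        my ( $delta, %opts ) = @_;     # line two: unpack the other arguments

        return unless $delta;

        $self->{position} += $delta;
        return $self->{position};
    }
}

my $scroller = Scroller->new;

# 'some_option' is a made-up key, only here to show %opts being filled
my $position = $scroller->scroll( 3, some_option => 1 );
print "position is now $position\n";
```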
