2021/02/08

Writing a Perl Core Feature - part 3: Keywords


Some Perl features use a syntax made entirely of punctuation symbols; for example Perl 5.10's defined-or operator (//), or Perl 5.24's postfix dereference (->$*, etc.). Other features are based around new keywords spelled like regular identifiers, such as 5.10's state or 5.32's isa. It is rare for new syntax to fit neatly onto existing operator symbols, so most new features come in the form of new keywords.

As with adding the named feature itself and its associated warning, the first step to adding a keyword begins with editing a regeneration file. The file required this time is called regen/keywords.pl.

For example, when the isa feature was added, it required a new keyword of the same name (github.com/Perl/perl5):

--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -46,6 +46,7 @@ my %feature_kw = (
     evalbytes => 'evalbytes',
     __SUB__   => '__SUB__',
     fc        => 'fc',
+    isa       => 'isa',
 );
 
 my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -217,6 +218,7 @@ __END__
 -index
 -int
 -ioctl
+-isa
 -join
 -keys
 -kill

There are two parts to this change. The second part adds our new keyword to the main list of all known keywords in the DATA section at the end of the script. If it weren't for the first part of this change, the new keyword would be recognised unconditionally in all code - almost certainly not what we want, as that would cause compatibility issues in existing code. Since we have a lexical named feature for exactly this purpose, we make use of it here by listing the new keyword along with its associated feature in the %feature_kw hash, so that the keyword is only recognised when that feature is enabled.
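
The effect of this gating can be observed from ordinary Perl code, using a keyword that is already listed in that hash. The following is a minimal sketch, assuming a perl of version 5.16 or later; it uses fc (visible in the diff context above) and string eval so that each snippet is compiled under its own feature settings. The variable names are purely illustrative.

```perl
use strict;
use warnings;

# With the feature enabled, `fc` is recognised as a keyword
# (the fold-case builtin).
my $with = eval q{ use feature 'fc'; fc("HELLO") };

# Without the feature, `fc` is just an ordinary identifier, so this
# compiles as a call to a user sub named fc, which does not exist.
my $without = eval q{ no feature 'fc'; fc("HELLO") };
my $error   = $@;

print "with feature:    $with\n";
print "without feature: ", (defined $without ? $without : "failed: $error");
```

The second eval fails at run time with an "Undefined subroutine" error, which is the compatibility property we want: code that does not ask for the feature never sees the keyword.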

For our new banana feature we need to decide if we're going to add some keywords, and if so what they will be called. Let's add two to make a more interesting example, called ban and ana. As before we'll start by editing the regeneration script and running it to have it rebuild some files.

leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/keywords.pl 

leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/regen/keywords.pl b/regen/keywords.pl
index b9ae8cf0f2..adbec89c71 100755
--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -47,6 +47,8 @@ my %feature_kw = (
     __SUB__   => '__SUB__',
     fc        => 'fc',
     isa       => 'isa',
+    ban       => 'banana',
+    ana       => 'banana',
 );
 
 my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -125,8 +127,10 @@ __END__
 -abs
 -accept
 -alarm
+-ana
 -and
 -atan2
+-ban
 -bind
 -binmode
 -bless

leo@shy:~/src/bleadperl/perl [git]
$ perl regen/keywords.pl 
Changed: keywords.c keywords.h

We still have a few more files to edit before we're done adding the keywords, but before continuing you should take a look at these regenerated files to see what changes have been made. Notice that this time there are no changes to any Perl files, only C files. This is why we didn't need to update any $VERSION values.

The keywords.h file just contains a long list of macros named KEY_... which give numbers to each keyword. Don't worry that most of the numbers have now changed - regen/keywords.pl likes to keep them in alphabetical order, and since we added new ones near the beginning it has had to move the rest downwards. This won't be a problem because the numbers are only internal within the perl lexer and parser, so there's no API compatibility to worry about here.
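
To see why so many numbers move, here is a small sketch of the renumbering effect. It assumes, as a simplification, that numbers are handed out in list order starting from 1; the number_keywords helper and the short word lists are hypothetical and the values do not match the real KEY_... macros.

```perl
use strict;
use warnings;

# Hypothetical helper: give each name a number based on its position.
sub number_keywords {
    my @names = @_;
    my %number;
    my $n = 0;
    $number{$_} = ++$n for @names;
    return %number;
}

# A tiny slice of the keyword list before and after adding ana and ban.
my @before = qw(abs accept alarm and atan2 bind);
my @after  = qw(abs accept alarm ana and atan2 ban bind);

my %old = number_keywords(@before);
my %new = number_keywords(@after);

# Everything sorted after a new entry is shifted downwards.
printf "%-6s %d -> %d\n", $_, $old{$_}, $new{$_} for @before;
```

Entries before the insertion point keep their numbers, while everything after it shifts by one for each new keyword that precedes it.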

The keywords.c file contains just one function, whose job is to recognise any of the keywords by name. It returns values of these KEY_... macros. Take a look at the added code, and notice that its recognition of each of our additions is conditional on the FEATURE_BANANA_IS_ENABLED macro we saw added when we added the named feature.
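
The logic of that function can be modelled in a few lines of Perl. This is only a rough sketch of the idea, not the generated C code: lookup_keyword, the %key_number table and its values are all hypothetical, and the set of enabled features is passed in explicitly rather than read from the lexical scope being compiled.

```perl
use strict;
use warnings;

# Keyword name => name of the feature that guards it (as in %feature_kw).
my %feature_kw = (fc => 'fc', isa => 'isa', ban => 'banana', ana => 'banana');

# Hypothetical numbers standing in for the KEY_... macros.
my %key_number = (ana => 1, and => 2, ban => 3, fc => 4, isa => 5);

# Return the keyword's number, or 0 if the word is not (currently) a keyword.
sub lookup_keyword {
    my ($word, %enabled) = @_;
    my $number = $key_number{$word} or return 0;    # not a keyword at all
    my $feature = $feature_kw{$word};
    return 0 if defined $feature && !$enabled{$feature};    # feature is off
    return $number;
}

print lookup_keyword('and'), "\n";                  # unguarded keyword
print lookup_keyword('ban'), "\n";                  # guarded, feature off
print lookup_keyword('ban', banana => 1), "\n";     # guarded, feature on
```

The point of the model is that a guarded word falls through to "not a keyword" when its feature is off, so the rest of the parser treats it as a plain identifier.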

We're not quite done yet though. If we were to run the full test suite now, we'd already find a few tests that fail:

op/coreamp.t .. 1/? # Failed test 591 - ana either has been tested or is not ampable at op/coreamp.t line 1178
# Failed test 593 - ban either has been tested or is not ampable at op/coreamp.t line 1178
op/coreamp.t .. Failed 2/778 subtests 
...
op/coresubs.t .. 1/? perl: op.c:14795: Perl_ck_entersub_args_core: Assertion `!"UNREACHABLE"' failed.
op/coresubs.t .. All 52 subtests passed
...
../lib/B/Deparse-core.t .. 3690/3904 # keyword 'ana' seen in ../regen/keywords.pl, but not tested here!!
# keyword 'ban' seen in ../regen/keywords.pl, but not tested here!!

#   Failed test 'sanity checks'
#   at ../lib/B/Deparse-core.t line 430.
# Looks like you failed 1 test of 3904.
../lib/B/Deparse-core.t .. Dubious, test returned 1 (wstat 256, 0x100)

The two tests in t/op check variations on a theme of the &CORE::... syntax, by which core operators can be reïfied into regular code references to functions that behave like the operator. This is often appropriate for operators that act like regular functions, such as the mathematical sin and cos operators, but it isn't what we want for keywords that act more structurally, like basic syntax. We should tell these tests to skip the new keywords by adding them to each file's skip list:

leo@shy:~/src/bleadperl/perl [git]
$ nvim t/op/coreamp.t t/op/coresubs.t 

leo@shy:~/src/bleadperl/perl [git]
$ git diff t/
diff --git a/t/op/coreamp.t b/t/op/coreamp.t
index b57609bef0..bd60ca83b9 100644
--- a/t/op/coreamp.t
+++ b/t/op/coreamp.t
@@ -1162,7 +1162,7 @@ like $@, qr'^Undefined format "STDOUT" called',
   my %nottest_words = map { $_ => 1 } qw(
     AUTOLOAD BEGIN CHECK CORE DESTROY END INIT UNITCHECK
     __DATA__ __END__
-    and cmp default do dump else elsif eq eval for foreach format ge given goto
+    ana and ban cmp default do dump else elsif eq eval for foreach format ge given goto
     grep gt if isa last le local lt m map my ne next no or our package print
     printf q qq qr qw qx redo require return s say sort state sub tr unless
     until use when while x xor y
diff --git a/t/op/coresubs.t b/t/op/coresubs.t
index 1fa11c02f0..85c08a4756 100644
--- a/t/op/coresubs.t
+++ b/t/op/coresubs.t
@@ -15,7 +15,8 @@ BEGIN {
 use B;
 
 my %unsupported = map +($_=>1), qw (
- __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK and
+ __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK
+  ana and ban
   cmp default do dump else elsif eq eval for foreach
   format ge given goto grep gt if isa last le local lt m map my ne next
   no  or  our  package  print  printf  q  qq  qr  qw  qx  redo  require
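
For context, the &CORE::... syntax that these two tests exercise looks like this in everyday code. This is a small sketch, assuming a perl of version 5.16 or later; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Take references to two core operators via the CORE:: namespace.
my $sin = \&CORE::sin;
my $cos = \&CORE::cos;

# They can now be called like any other code reference.
my $sin_of_zero = $sin->(0);
my $cos_of_zero = $cos->(0);

printf "sin(0) = %g, cos(0) = %g\n", $sin_of_zero, $cos_of_zero;
```

Structural keywords have no sensible function-like form of this kind, which is why they belong in the skip lists instead.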
   

Now let's run those two tests in particular. We can do this by using our newly-built perl binary to run the t/harness script, passing in the paths (relative to the t/ directory) of the specific tests we wish to run:

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t
op/coreamp.t ... ok     
op/coresubs.t .. 1/? # Failed test 51 - no CORE::ana at op/coresubs.t line 53
# Failed test 58 - no CORE::ban at op/coresubs.t line 53
op/coresubs.t .. Failed 2/1099 subtests 

Test Summary Report
-------------------
op/coresubs.t (Wstat: 0 Tests: 1099 Failed: 2)
  Failed tests:  51, 58
Files=2, Tests=1875,  1 wallclock secs ( 0.35 usr  0.02 sys +  0.67 cusr  0.03 csys =  1.07 CPU)
Result: FAIL

Well, that's one solved, but the other is still upset. This time it is complaining that it expected not to find a &CORE::ana (or &CORE::ban) at all, but instead one was there. To fix that we have to edit the list of exceptions in gv.c.

leo@shy:~/src/bleadperl/perl [git]
$ nvim gv.c

leo@shy:~/src/bleadperl/perl [git]
$ git diff gv.c
diff --git a/gv.c b/gv.c
index 92bada56b1..10271159dc 100644
--- a/gv.c
+++ b/gv.c
@@ -543,8 +543,9 @@ S_maybe_add_coresub(pTHX_ HV * const stash, GV *gv,
     switch (code < 0 ? -code : code) {
      /* no support for \&CORE::infix;
         no support for funcs that do not parse like funcs */
-    case KEY___DATA__: case KEY___END__: case KEY_and: case KEY_AUTOLOAD:
-    case KEY_BEGIN   : case KEY_CHECK  : case KEY_cmp:
+    case KEY___DATA__: case KEY___END__: case KEY_ana   : case KEY_and    :
+    case KEY_AUTOLOAD: case KEY_ban    : case KEY_BEGIN : case KEY_CHECK  :
+    case KEY_cmp     :
     case KEY_default : case KEY_DESTROY:
     case KEY_do      : case KEY_dump   : case KEY_else  : case KEY_elsif  :
     case KEY_END     : case KEY_eq     : case KEY_eval  :

Now we rebuild perl (because we have edited a C file) and rerun the tests:

leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t 
op/coreamp.t ... ok     
op/coresubs.t .. ok      
All tests successful.
Files=2, Tests=1875,  1 wallclock secs ( 0.43 usr  0.02 sys +  0.76 cusr  0.02 csys =  1.23 CPU)
Result: PASS

The test under ../lib/B/Deparse-core.t checks the behaviour of the B::Deparse module against the core keywords. (The path is relative to the t/ directory, which is why it begins with .., and shows that tests within bundled core modules are counted as part of the full test suite.)

When the isa feature was added, this test file was updated to add some deparsing tests around the isa operator as a regular infix binary syntax. We'll come back later and add some unit tests for our new ban and ana keywords, but for now, as with the coreamp and coresubs tests, it is best to just add these to the skip list in that test file as well.

leo@shy:~/src/bleadperl/perl [git]
$ nvim lib/B/Deparse-core.t 

leo@shy:~/src/bleadperl/perl [git]
$ git diff lib/B/Deparse-core.t
diff --git a/lib/B/Deparse-core.t b/lib/B/Deparse-core.t
index cdbd27ce5e..edf86f809d 100644
--- a/lib/B/Deparse-core.t
+++ b/lib/B/Deparse-core.t
@@ -362,6 +362,8 @@ my %not_tested = map { $_ => 1} qw(
     END
     INIT
     UNITCHECK
+    ana
+    ban
     default
     else
     elsif

leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ../lib/B/Deparse-core.t
../lib/B/Deparse-core.t .. ok         
All tests successful.
Files=1, Tests=3904, 17 wallclock secs ( 1.17 usr  0.06 sys + 16.86 cusr  0.06 csys = 18.15 CPU)
Result: PASS
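
The "sanity checks" failure we saw earlier amounts to a set difference: every known keyword must either have a test or be explicitly skipped. A simplified sketch of that idea follows. The short word lists and variable names here are hypothetical, and the real test file reads the keyword list from regen/keywords.pl rather than hard-coding it.

```perl
use strict;
use warnings;

# A tiny stand-in for the full keyword list.
my @keywords = qw(ana and ban bless);

# Keywords that have deparse tests, and keywords deliberately skipped.
my %tested     = map { $_ => 1 } qw(bless);
my %not_tested = map { $_ => 1 } qw(and);

# Anything in neither set is reported as a failure.
my @missing = grep { !$tested{$_} && !$not_tested{$_} } @keywords;

print "keyword '$_' seen, but not tested here!!\n" for @missing;
```

Adding ana and ban to the skip list empties the list of missing words, which is what made the test pass above.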

We now have a named feature with its associated warning, and some conditionally-recognised keywords. In the next parts we will get the compiler to recognise these when parsing Perl code.
