2016/08/26

Perl Parser Plugins 2 - Lexical Hints

<< First | < Prev | Next >

In the previous post we saw the introduction to how the perl parser engine can be extended, letting us hook a new function in that gets invoked whenever perl sees something that might be a keyword. The code in the previous post didn't actually add any new functionality. Today we're going to take a look at how new things can actually be added.

A Trivial Keyword

Possibly the simplest form of keyword plugin is one that doesn't actually do anything, other than consume the keyword that controls it. For example, lets consider the following program, in which we've introduced the please keyword. It doesn't actually have any effect on the runtime behaviour of the program, but it lets us be a little more polite in our request to the interpreter.

use tmp;
use feature 'say';

please say "Hello, world!"

To implement this plugin, we start by writing a keyword plugin hook function that recognises the please keyword, and invokes its custom handling function if it's found. It's purely a matter of style, but I like to write this as a small function for the plugin hook itself that simply recognises the keyword. If it finds the keyword it invokes a different function to actually implement the behaviour. This helps keep the code nice and neat.

static int (*next_keyword_plugin)(pTHX_ char *, STRLEN, OP **);

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  if(kwlen == 6 && strEQ(kw, "please"))
    return please_keyword(op_ptr);

  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

MODULE = tmp  PACKAGE = tmp

BOOT:
  next_keyword_plugin = PL_keyword_plugin;
  PL_keyword_plugin = &MY_keyword_plugin;

Next, we need to provide this please_keyword function that implements our required behaviour.

static int please_keyword(OP **op_ptr)
{
  *op_ptr = newOP(OP_NULL, 0);
  return KEYWORD_PLUGIN_STMT;
}

The last line of this function, the return statement, tells the perl parser that this plugin has consumed the keyword, and that the resulting syntactic structure should be considered as a statement. The effect of this resets the parser into wanting to find the start of a new statement following it, which lets it then see the say call as normal.

As mentioned in the previous post, a parser plugin provides new behaviour into the parse tree by using the op_ptr double-pointer argument, to give a new optree back to the parser to represent the code the plugin just parsed. Since our plugin doesn't actually do anything, we don't really need to construct an optree. But since there is no way to tell perl "nothing", instead we have to build a single op which does nothing; this is OP_NULL. Don't worry too much about this line of code; consider it simply as part of the required boilerplate here, and I'll expand more on the subject in a later post.

Lexical Hints

Implemented as it stands, there's one large problem with this syntax plugin. You may recall from part 1 that the plugin hook chain is global to the entire parser, and is invoked whenever code is found anywhere. This means that other code far away from our plugin will also get disturbed. For example, if the following code appears elsewhere when our module is loaded:

sub please { print @_; }

print "Before\n";
please "It works\n";
print "After\n";

__END__
Useless use of a constant ("It works\n") in void context at - line 4.
Before
After

This happens because our plugin has no knowledge of the lexical scope of the keywords; it will happily consume the keyword wherever it appears in any file, in any scope. This doesn't follow the usual way that perl code is parsed; ideally we would like our keywords to be enabled or disabled lexically. The lexical hints hash, %^H, can be used to provide this lexical scoping. We can make use of this by setting a lexical hint in this hash when the module is loaded, and having the XS code look for that hint to control whether it consumes the keyword.

We start by adding a import function to the perl module that loads the XS code, which sets this hint. There's a strong convention among CPAN modules, which must all share this one hash, about how to name keys within it. They are named using the controlling module's name, followed by a / symbol.

package tmp;

require XSLoader;
XSLoader::load( __PACKAGE__ );

sub import
{
  $^H{"tmp/please"}++;
}

The perl interpreter will now assist us with maintaining the value of %^H, ensuring that this key only has a true value during lexical scopes which have used our module. We can now extend the test function in the XS code to check for this condition. In XS code, the hints hash is accessible as the GvHV of PL_hintgv:

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  HV *hints = GvHV(PL_hintgv);
  if(kwlen == 6 && strEQ(kw, "please") &&
     hints && hv_fetchs(hints, "tmp/please", 0))
    return please_keyword(op_ptr);

  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

We now have a well-behaved syntax plugin which is properly aware of lexical scope. It only responds to its keyword in scopes where use tmp is in effect:

$ perl -wE '{use tmp; sub please { say @_ }; please "hello"}'
Useless use of a constant ("hello") in void context at -e line 1.

$ perl -wE '{use tmp;} sub please { say @_ }; please "hello"'
hello

Of course, we still haven't actually added any real behaviour yet but we're at least all set for having a lexically-scoped keyword we use to add that. We'll see how we can start to introduce new behaviour to the perl interpreter in part 3.

<< First | < Prev | Next >

2016/08/12

Perl Parser Plugins - part 1

<< First | < Prev | Next >

Today's subject is the parser plugins that were added to perl in 5.14. The parser plugin API itself is relatively small and modest, but the power it grants the user is enormous. To use this power however, you need to know and be familiar with a lot of the internals of the perl core.

This post will be a little different to most of my usual posts. Rather than announcing and demonstrating some code that's already written and available on CPAN or wherever, I shall today be writing about some code I'm currently in the progress of writing. I'm hoping this series of posts will contain useful information about the underlying subject of perl parser plugins and other internal details.

As a parser plugin would be written in C as part of an XS module, the reader is presumed to be at least somewhat familiar with C syntax, and to have at least a basic understanding of what an XS module is and how to go about creating one.

The PL_keyword_plugin Hook

The initial point of entry into the parser plugin system is a function pointer variable that's part of the interpreter core, called PL_keyword_plugin. This variable points to a function that the perl parser will invoke whenever it finds something that looks like it could be a keyword. It's typed as

int (*PL_keyword_plugin)(pTHX_
    char *keyword_ptr, STRLEN keyword_len,
    OP **op_ptr);

How it behaves is that whenever the parser finds a sequence of identifier-like characters that could be a keyword, it first asks the keyword plugin whether it wishes to handle it. The sequence of characters itself is passed in via the pointer and length pair. Because this is a pointer directly into the parser's working-space buffer, the keyword itself will not be NUL-terminated, and so the length argument is required too. The integer return value works with the op_ptr double-pointer value to let the plugin return its result back to the interpreter in the form of an optree. For today's post we won't concern ourselves with this optree, and instead write a plugin that doesn't actually do anything.

Side note:If you don't recognise what that pTHX_ macro is doing at the start of the argument list, don't worry too much about it. It exists to pass around the perl interpreter state if the perl is built with threads enabled. Just think of it as a curious quirk of writing C functions in the perl interpreter; the detail of its operation does not concern us too closely here.

To use this plugin API, we need to define our own custom handling function, and set this variable to its address so that the perl parser will invoke it. This following example implements a trivially tiny plugin; one that declines to process any keyword, but as a side-effect prints its name to the standard error stream, so we can see it in action.

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  fprintf(stderr, "Keyword [%.*s]\n", kwlen, kw);
  return KEYWORD_PLUGIN_DECLINE;
}

MODULE = tmp  PACKAGE = tmp

BOOT:
  PL_keyword_plugin = &MY_keyword_plugin;

To test this out, I've compiled this little blob of XS code into a tiny module called tmp. The plugin's output can clearly be seen:

$ perl -Mtmp -e 'sub main { print "Hello, world\n" }  main()'
Keyword [sub]
Keyword [print]
Keyword [main]
Hello, world

Here we can see that the keyword was first informed about the sub. Because we declined to handle it, this was taken by the core perl parser which consumes the main identifier and opening brace in its usual manner. Next up is another candidate for being a keyword, print, which we also decline. The entire string literal does not look like a keyword, so the plugin wasn't invoked here. It is finally informed again of the main that appears at the top-level of the program, which it also declines.

From this initial example, we can see already how the parser plugin API is a lot more robust than earlier mechanisms, such as source filters. We haven't had to take any special precautions against false matches of keywords inside string literals, comments, or anything else like that. The perl parser core itself already invokes the plugin only in places where it expects there to be a real keyword, so all of that is taken care of already.

Hook Chains

The plugin we have created above has one critical flaw in it - it is very rude. It takes over control of the PL_keyword_plugin pointer, overwriting whatever else was there. If the containing program had already loaded a module that wanted to provide custom keyword parsing, our module has just destroyed its ability to do that.

The way we solve this in practice is to wrap the pre-existing parser hook with our own function; saving the old pointer value in a local variable within the plugin code, so if we don't want to handle the keyword we simply invoke the next function down the chain. The perl core helpfully initialises this variable to point to a function that simply declines any input, so the first module loaded doesn't have to take special precautions to check if it was set. It can safely chain to the next one in any circumstance.

We can now update our code to do this:

static int (*next_keyword_plugin)(pTHX_ char *, STRLEN, OP **);

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  fprintf(stderr, "Keyword [%.*s]\n", kwlen, kw);

  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

MODULE = tmp  PACKAGE = tmp

BOOT:
  next_keyword_plugin = PL_keyword_plugin;
  PL_keyword_plugin = &MY_keyword_plugin;

The plugin now behaves exactly as it did before, but this time if there had been any more plugins loaded before this one, they will be given a chance to run as well. Hopefully the next one that's loaded does this same logic, so that ours continues to work.

Well, I say "work".

Aside from spying on the keywords in the program, this parser plugin doesn't do anything useful yet. I shall start to explore what things the plugin can actually do in part 2.

<< First | < Prev | Next >