2016/08/26

Perl Parser Plugins 2 - Lexical Hints

<< First | < Prev | Next >

In the previous post we saw the introduction to how the perl parser engine can be extended, letting us hook a new function in that gets invoked whenever perl sees something that might be a keyword. The code in the previous post didn't actually add any new functionality. Today we're going to take a look at how new things can actually be added.

A Trivial Keyword

Possibly the simplest form of keyword plugin is one that doesn't actually do anything, other than consume the keyword that controls it. For example, lets consider the following program, in which we've introduced the please keyword. It doesn't actually have any effect on the runtime behaviour of the program, but it lets us be a little more polite in our request to the interpreter.

use tmp;
use feature 'say';

please say "Hello, world!"

To implement this plugin, we start by writing a keyword plugin hook function that recognises the please keyword, and invokes its custom handling function if it's found. It's purely a matter of style, but I like to write this as a small function for the plugin hook itself that simply recognises the keyword. If it finds the keyword it invokes a different function to actually implement the behaviour. This helps keep the code nice and neat.

static int (*next_keyword_plugin)(pTHX_ char *, STRLEN, OP **);

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  if(kwlen == 6 && strEQ(kw, "please"))
    return please_keyword(op_ptr);

  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

MODULE = tmp  PACKAGE = tmp

BOOT:
  next_keyword_plugin = PL_keyword_plugin;
  PL_keyword_plugin = &MY_keyword_plugin;

Next, we need to provide this please_keyword function that implements our required behaviour.

static int please_keyword(OP **op_ptr)
{
  *op_ptr = newOP(OP_NULL, 0);
  return KEYWORD_PLUGIN_STMT;
}

The last line of this function, the return statement, tells the perl parser that this plugin has consumed the keyword, and that the resulting syntactic structure should be considered as a statement. The effect of this resets the parser into wanting to find the start of a new statement following it, which lets it then see the say call as normal.

As mentioned in the previous post, a parser plugin provides new behaviour into the parse tree by using the op_ptr double-pointer argument, to give a new optree back to the parser to represent the code the plugin just parsed. Since our plugin doesn't actually do anything, we don't really need to construct an optree. But since there is no way to tell perl "nothing", instead we have to build a single op which does nothing; this is OP_NULL. Don't worry too much about this line of code; consider it simply as part of the required boilerplate here, and I'll expand more on the subject in a later post.

Lexical Hints

Implemented as it stands, there's one large problem with this syntax plugin. You may recall from part 1 that the plugin hook chain is global to the entire parser, and is invoked whenever code is found anywhere. This means that other code far away from our plugin will also get disturbed. For example, if the following code appears elsewhere when our module is loaded:

sub please { print @_; }

print "Before\n";
please "It works\n";
print "After\n";

__END__
Useless use of a constant ("It works\n") in void context at - line 4.
Before
After

This happens because our plugin has no knowledge of the lexical scope of the keywords; it will happily consume the keyword wherever it appears in any file, in any scope. This doesn't follow the usual way that perl code is parsed; ideally we would like our keywords to be enabled or disabled lexically. The lexical hints hash, %^H, can be used to provide this lexical scoping. We can make use of this by setting a lexical hint in this hash when the module is loaded, and having the XS code look for that hint to control whether it consumes the keyword.

We start by adding a import function to the perl module that loads the XS code, which sets this hint. There's a strong convention among CPAN modules, which must all share this one hash, about how to name keys within it. They are named using the controlling module's name, followed by a / symbol.

package tmp;

require XSLoader;
XSLoader::load( __PACKAGE__ );

sub import
{
  $^H{"tmp/please"}++;
}

The perl interpreter will now assist us with maintaining the value of %^H, ensuring that this key only has a true value during lexical scopes which have used our module. We can now extend the test function in the XS code to check for this condition. In XS code, the hints hash is accessible as the GvHV of PL_hintgv:

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  HV *hints = GvHV(PL_hintgv);
  if(kwlen == 6 && strEQ(kw, "please") &&
     hints && hv_fetchs(hints, "tmp/please", 0))
    return please_keyword(op_ptr);

  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

We now have a well-behaved syntax plugin which is properly aware of lexical scope. It only responds to its keyword in scopes where use tmp is in effect:

$ perl -wE '{use tmp; sub please { say @_ }; please "hello"}'
Useless use of a constant ("hello") in void context at -e line 1.

$ perl -wE '{use tmp;} sub please { say @_ }; please "hello"'
hello

Of course, we still haven't actually added any real behaviour yet but we're at least all set for having a lexically-scoped keyword we use to add that. We'll see how we can start to introduce new behaviour to the perl interpreter in part 3.

<< First | < Prev | Next >

No comments:

Post a Comment