2016/09/12

Perl Parser Plugins 3 - Optrees


So far we've seen how to interact with the perl parser to introduce new keywords, and how to allow a keyword to be enabled or disabled in lexical scopes. But our newly-introduced syntax still doesn't actually do anything. Today let's change that, and provide some new syntax which really does something.

Optrees

To understand the operation of any parser plugin (or at least, any that actually does anything), we first need to understand more about the internals of perl: a little of how the parser interprets source code, and some detail about how the runtime actually works. I won't go into much depth in this post, only as much as this next example needs; I'll expand on it a lot more in later posts.

Every piece of code in a perl program (i.e. the body of every named and anonymous function, and the top-level code in every file) is represented by an optree: a tree-shaped structure of individual nodes called ops. The structure of an optree broadly mirrors the syntactic structure of the code it was compiled from; it is the parser's job to take the textual form of the program and generate these trees. Each op has an overall type, which determines its runtime behaviour, and may also have additional arguments, flags that alter its behaviour, and child ops related to it. Which of these fields an op carries depends on its type.
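As a rough illustration of that description, here is a toy C model of an op node. It is a sketch written for this explanation, not perl's real struct op (which has separate UNOP, BINOP, SVOP and other layouts), and all the toy_* names are hypothetical. Each node carries a type, some flags, one type-specific argument, and links to its children, stored as a first-child pointer plus a chain of sibling pointers.

```c
#include <stdlib.h>

/* Toy model only: these names are hypothetical and do not match perl's
 * real struct op layout. */
enum toy_optype { TOY_OP_CONST, TOY_OP_ADD, TOY_OP_MULT };

struct toy_op {
    enum toy_optype type;     /* decides what the op does at runtime */
    unsigned flags;           /* bits that modify that behaviour (unused here) */
    long iv;                  /* type-specific argument: the value of a constant */
    struct toy_op *first;     /* first child op, or NULL for a leaf */
    struct toy_op *sibling;   /* next child of the same parent, or NULL */
};

static struct toy_op *toy_newop(enum toy_optype type, unsigned flags)
{
    struct toy_op *o = calloc(1, sizeof *o);  /* zeroes all the links */
    if (!o)
        abort();
    o->type = type;
    o->flags = flags;
    return o;
}

/* A leaf op carrying a constant value. */
static struct toy_op *toy_newconst(long iv)
{
    struct toy_op *o = toy_newop(TOY_OP_CONST, 0);
    o->iv = iv;
    return o;
}

/* An op with two children, chained as first -> sibling. */
static struct toy_op *toy_newbinop(enum toy_optype type,
                                   struct toy_op *left, struct toy_op *right)
{
    struct toy_op *o = toy_newop(type, 0);
    o->first = left;
    left->sibling = right;
    return o;
}

/* Count the ops in a tree by recursing into every child. */
static size_t toy_count_ops(const struct toy_op *o)
{
    size_t n = 1;
    for (const struct toy_op *kid = o->first; kid; kid = kid->sibling)
        n += toy_count_ops(kid);
    return n;
}
```

Building (1 + 2) * 3 with these helpers, as toy_newbinop(TOY_OP_MULT, toy_newbinop(TOY_OP_ADD, toy_newconst(1), toy_newconst(2)), toy_newconst(3)), gives a five-op tree. To keep the sketch short it never frees its nodes.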

To execute the code in one of these optrees, the interpreter walks the tree structure, invoking a built-in function determined by the type of each op it visits. These functions implement the behaviour of the optree through side effects on the interpreter state, which may include global variables, the symbol table, or the temporary value stack.
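Continuing with a toy model (again, every toy_* name is hypothetical), the following sketch executes a tree by visiting each op's children first and then calling a built-in function chosen by the op's type through a dispatch table. The functions act only by pushing and popping values on a stack held in the interpreter state, loosely in the spirit of perl's pp_* functions and argument stack. Real perl does not recurse like this: it precomputes the execution order into op_next links and its run loop follows those, but for a simple arithmetic expression the ops run in the same children-first order.

```c
#include <stdlib.h>

/* Toy model only: hypothetical names, not perl's real ops or pp_* functions. */
enum toy_optype { TOY_OP_CONST, TOY_OP_ADD, TOY_OP_MULT };

struct toy_op {
    enum toy_optype type;
    long iv;                  /* value of a TOY_OP_CONST */
    struct toy_op *first;     /* first child, or NULL */
    struct toy_op *sibling;   /* next child of the same parent, or NULL */
};

#define TOY_STACK_MAX 64

/* The interpreter state the ops act on: here, just a value stack. */
struct toy_interp {
    long stack[TOY_STACK_MAX];
    int sp;                   /* number of values currently on the stack */
};

static void toy_push(struct toy_interp *in, long v)
{
    if (in->sp >= TOY_STACK_MAX)
        abort();              /* stack overflow */
    in->stack[in->sp++] = v;
}

static long toy_pop(struct toy_interp *in)
{
    if (in->sp <= 0)
        abort();              /* stack underflow */
    return in->stack[--in->sp];
}

/* One built-in function per op type; each works purely by side effects. */
static void toy_pp_const(struct toy_interp *in, const struct toy_op *o)
{
    toy_push(in, o->iv);
}

static void toy_pp_add(struct toy_interp *in, const struct toy_op *o)
{
    (void)o;
    long right = toy_pop(in);
    long left = toy_pop(in);
    toy_push(in, left + right);
}

static void toy_pp_mult(struct toy_interp *in, const struct toy_op *o)
{
    (void)o;
    long right = toy_pop(in);
    long left = toy_pop(in);
    toy_push(in, left * right);
}

/* Dispatch table: the op's type selects the function to run. */
static void (*const toy_ppaddr[])(struct toy_interp *, const struct toy_op *) = {
    [TOY_OP_CONST] = toy_pp_const,
    [TOY_OP_ADD]   = toy_pp_add,
    [TOY_OP_MULT]  = toy_pp_mult,
};

/* Walk the tree: run every child first, then the op's own function. */
static void toy_run(struct toy_interp *in, const struct toy_op *o)
{
    for (const struct toy_op *kid = o->first; kid; kid = kid->sibling)
        toy_run(in, kid);
    toy_ppaddr[o->type](in, o);
}
```

For the (1 + 2) * 3 tree discussed next, toy_run pushes 1 and 2, replaces them with 3, pushes the other 3, and finally leaves the single value 9 on the stack.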

For example, let us consider the following arithmetic expression:

(1 + 2) * 3

This expression involves an addition, a multiplication, and three constant values. Expressing it as an optree requires three kinds of op: an OP_ADD represents the addition, an OP_MULT the multiplication, and each constant is represented by its own OP_CONST. These are arranged in a tree with the OP_MULT at the top level; its children are the OP_ADD and one of the OP_CONSTs, and the OP_ADD in turn has the other two OP_CONSTs as its children. The tree looks something like this:

OP_MULT:
  +-- OP_ADD
  |     +-- OP_CONST (IV=1)
  |     +-- OP_CONST (IV=2)
  +-- OP_CONST (IV=3)

Side note: a real program is unlikely ever to contain an optree like this one, because the compiler folds constant expressions such as this into a single constant value (here, 9). But it serves fine as a simple example to demonstrate how optrees work.
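To illustrate the idea behind that folding, here is a toy sketch using the same kind of node. The toy_* names and the TOY_OP_VAR type, which stands in for a value only known at runtime, are hypothetical. It folds bottom-up: once both children of an ADD or MULT are constants, the result is computed immediately and the op itself becomes a constant. This is only an analogy; perl's own folding happens inside the compiler as ops are built and covers far more op types.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not perl's real ops. TOY_OP_VAR
 * stands in for something whose value is unknown until runtime. */
enum toy_optype { TOY_OP_CONST, TOY_OP_VAR, TOY_OP_ADD, TOY_OP_MULT };

struct toy_op {
    enum toy_optype type;
    long iv;                  /* value of a TOY_OP_CONST */
    struct toy_op *first;     /* first child, or NULL */
    struct toy_op *sibling;   /* next child of the same parent, or NULL */
};

/* Fold bottom-up: once both children of an ADD or MULT are constants,
 * compute the result now and turn the op itself into a constant. */
static void toy_fold(struct toy_op *o)
{
    for (struct toy_op *kid = o->first; kid; kid = kid->sibling)
        toy_fold(kid);

    if (o->type != TOY_OP_ADD && o->type != TOY_OP_MULT)
        return;

    struct toy_op *left = o->first;
    struct toy_op *right = left ? left->sibling : NULL;
    if (!left || !right ||
        left->type != TOY_OP_CONST || right->type != TOY_OP_CONST)
        return;               /* an operand is only known at runtime */

    o->iv = (o->type == TOY_OP_ADD) ? left->iv + right->iv
                                    : left->iv * right->iv;
    o->type = TOY_OP_CONST;
    o->first = NULL;          /* detach the children (a real compiler would free them) */
}
```

Applied to (1 + 2) * 3 it leaves a single constant 9, while for something shaped like $x * (1 + 2) only the inner (1 + 2) becomes a constant 3 and the multiplication stays.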

You may recall from the previous post that we implemented a keyword plugin that simply created a new OP_NULL optree, i.e. an optree that doesn't do anything. If we now change it to construct an OP_CONST instead, we can build a keyword that behaves like a symbolic constant: placing it into an expression yields the value of that constant. The returned op is inserted into the optree of the function containing the keyword that invoked our plugin, and is executed at that point in the tree whenever that function runs.

To start with, we'll adjust the main plugin hook function to recognise a new keyword; this time tau:

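static int tau_keyword(pTHX_ OP **op_ptr);  /* defined below */

static int MY_keyword_plugin(pTHX_ char *kw, STRLEN kwlen,
    OP **op_ptr)
{
  HV *hints = GvHV(PL_hintgv);  /* the lexical hints hash, %^H */

  /* kw points into the parser's buffer and is not NUL-terminated,
     so compare only the first kwlen bytes */
  if(kwlen == 3 && strnEQ(kw, "tau", 3) &&
     hints && hv_fetchs(hints, "tmp/tau", 0))
    return tau_keyword(aTHX_ op_ptr);

  /* not our keyword (or not enabled here): defer to the next plugin */
  return (*next_keyword_plugin)(aTHX_ kw, kwlen, op_ptr);
}

Now we can hook this up to a new keyword implementation function, which constructs an optree consisting of an OP_CONST set to the required value and tells the parser that it behaves like an expression:

#include <math.h>   /* for M_PI */

static int tau_keyword(pTHX_ OP **op_ptr)
{
  /* build a constant op holding a new NV with the value 2*pi */
  *op_ptr = newSVOP(OP_CONST, 0, newSVnv(2 * M_PI));

  /* the op yields a value, so it may appear within an expression */
  return KEYWORD_PLUGIN_EXPR;
}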
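The keyword check in MY_keyword_plugin can be exercised away from perl with a small standalone model. Everything in it is a hypothetical stand-in: toy_my_plugin and toy_next_plugin play the roles of MY_keyword_plugin and next_keyword_plugin, the TOY_KEYWORD_* values stand in for perl's KEYWORD_PLUGIN_* results, a plain flag replaces the hints-hash lookup, and a description string replaces the real OP **. It demonstrates two details of the hook: the keyword text is not NUL-terminated, so it is matched by length and bytes, and any keyword we don't claim is passed on to the previously installed hook.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for perl's KEYWORD_PLUGIN_* results (hypothetical names). */
enum { TOY_KEYWORD_DECLINE, TOY_KEYWORD_STMT, TOY_KEYWORD_EXPR };

/* Hypothetical hook signature: the "op" is just a description string here. */
typedef int (*toy_keyword_plugin_t)(const char *kw, size_t kwlen,
                                    const char **op_out);

/* The hook that was installed before ours; by default it declines. */
static int toy_default_plugin(const char *kw, size_t kwlen, const char **op_out)
{
    (void)kw;
    (void)kwlen;
    (void)op_out;
    return TOY_KEYWORD_DECLINE;
}

static toy_keyword_plugin_t toy_next_plugin = toy_default_plugin;

/* Stands in for the hv_fetchs() lookup in the lexical hints hash. */
static int toy_tau_enabled = 1;

static int toy_my_plugin(const char *kw, size_t kwlen, const char **op_out)
{
    /* kw points into the source buffer and is not NUL-terminated,
     * so match on the length and bytes rather than with strcmp(). */
    if (kwlen == 3 && memcmp(kw, "tau", 3) == 0 && toy_tau_enabled) {
        *op_out = "OP_CONST(2*pi)";
        return TOY_KEYWORD_EXPR;
    }
    /* Not ours: pass it along the chain unchanged. */
    return toy_next_plugin(kw, kwlen, op_out);
}
```

Given the buffer "tau;" with a length of 3 the model claims the keyword, whereas "taurus" (length 6) falls through to the default hook and is declined.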

We can now use this new keyword in an expression as if it were a regular constant:

$ perl -E 'use tmp; say "Tau is ", tau'
Tau is 6.28318530717959
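The value prints with 15 significant digits because, on a typical build where an NV is a C double, perl stringifies a floating-point value using %g-style formatting to NV_DIG (15) digits. The standalone function below approximates that with snprintf; toy_format_nv is a hypothetical name, and perl's own conversion code differs in detail.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI                  /* M_PI is not part of ISO C */
#define M_PI 3.14159265358979323846
#endif

/* Approximate how perl typically stringifies an NV when NV is a double:
 * %g style with 15 significant digits. Returns the length snprintf reports. */
static int toy_format_nv(char *buf, size_t len, double nv)
{
    return snprintf(buf, len, "%.15g", nv);
}
```

With 2 * M_PI this yields 6.28318530717959, matching the output above; %g also drops trailing zeros, so 9.0 comes out as 9 and 0.1 as 0.1.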

Of course, so far we could have done this just as easily with a normal constant, such as one provided by use constant. However, since this is now implemented by a keyword plugin, it can do many exciting things not available to normal perl code. In the next part we'll explore this further.
