2014/02/27

Instrumentation and Logging in Event-Reflexive Programming


The previous two posts in this series introduced the idea of event-reflexive programming and began to investigate some of its properties, along with two of the design decisions involved in building such a system: ordering considerations, and the ways in which individual actions can be invoked.

I said this story would continue chronologically, following the history of the idea through the various systems I've encountered or built using it. So far we have an ISP provisioning system and an IRC bot. While quite different in size, scope and semantics, they did have some common features. The one I want to talk about today is debug logging and instrumentation.

At the ISP, every top-level provisioning action was identified by its name, a plain string, and a named set of arguments, the value of each argument also being a string. This made it trivially simple to encode actions over the simple TCP socket protocol we used at the time (this being years before the explosion of YAML- and JSON-driven systems, and before, even, the peak of the XML craze). While it wasn't strictly required, it turned out that keeping this property for all of the inner reflexive events as well made logging very simple. The logging was also aware of the nesting level of the event-reflexive core, allowing it to print simple logs showing the full tree-shaped nature of the events as they took place. I forget the inner details of the logging format, but a hypothetical example could look something like:

Action 'add_user': reseller=foo username=a-new-user product=shiney
+- Action(1) 'make_user_config': reseller=foo username=a-new-user product=shiney
+- Action(1) 'make_user_homedir': reseller=foo username=a-new-user product=shiney
|  +- Action(2) 'copy_skeleton': reseller=foo username=a-new-user product=shiney
...
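Though the real code is long gone, the nesting-aware part can be sketched in a few lines. Everything here is a hypothetical reconstruction, not the original implementation: the core keeps track of its own recursion depth in a package variable, and prefixes each logged action accordingly.

```perl
our @plugins;
our $depth = 0;

# Format one log line for an action, indented by the current nesting
# depth to give the tree shape shown above (exact format is illustrative)
sub format_action_line {
  my ( $action, %args ) = @_;
  my $indent = $depth ? ( "|  " x ( $depth - 1 ) ) . "+- " : "";
  return $indent . "Action" . ( $depth ? "($depth)" : "" ) . " '$action': "
    . join( " ", map { "$_=$args{$_}" } sort keys %args );
}

sub run_plugins {
  my ( $action, @args ) = @_;
  print STDERR format_action_line( $action, @args ), "\n";

  # 'local' restores the previous depth on scope exit, even on exceptions
  local $depth = $depth + 1;
  $_->$action( @args ) for grep { $_->can( $action ) } @plugins;
}
```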

It wasn't too long before I started finding this logging system simply not powerful enough. The "leaf" events, at least, often did genuinely useful things - performing LDAP reads or writes, interacting with the filesystem, talking to various 3rd-party systems, and so on. The logging system therefore gained the ability to record these as well.

While most of the logging/debugging systems in use today offer a simple linear scale of "verboseness", the logged items in our system were tagged with any of a wide set of possible categories. The set of categories in effect at any time was given by an environment variable. For example, to log just the reads and writes to the LDAP directory, along with the attributes of each write, one could set:

DEBUG_INSTRUMENT_FLAGS=LDAPr,LDAPw,LDAPwa

This makes for a much more powerful logging system, because it allows the programmer/operator to choose not simply an overall level of verbosity, but to fine-tune exactly where in the code the logging is more detailed. A few current logging systems now possess this ability too.
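A minimal sketch of the category mechanism, with hypothetical helper names (the real internals are long forgotten): the environment variable is split into a set once at startup, and each logging site names its category.

```perl
# Parse the comma-separated category list once at startup
our %debug_flags = map { $_ => 1 }
                   split m/,/, $ENV{DEBUG_INSTRUMENT_FLAGS} // "";

# Log a message only if its category is currently enabled
sub log_instrument {
  my ( $category, $message ) = @_;
  return unless $debug_flags{$category};
  print STDERR "[$category] $message\n";
}

# e.g. near the LDAP code:
#   log_instrument( LDAPw => "writing dn=$dn" );
```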

The full power of our instrumentation system, however, was that it was integrated with the rest of the event-reflexive core. This meant that it could peek into the action names and parameters, and test their values. These values could then be used to set or change the logging flags. A particularly powerful example could be:

A=add_user,reseller=foo,product=gold{LDAPw,LDAPwa};reseller=bar,user=frank{IMAP}

In this example, we are interested in logging the LDAP writes (and the attributes contained in them) that happen during any attempt to add a new user to the foo reseller for their gold product (perhaps because we are investigating some issue with this one), or separately, any IMAP-related activity performed for the bar reseller's user called frank. Because these tests are interpreted by the core of the event-reflexive system, they apply fully recursively - to any nested inner action performed as part of the ultimate add_user action, for example.

Being strings, these tests could also have been performed using regexps, though I never got around to implementing that, so they remained simple string equality tests. Even so, this gets us an enormous amount of expressive power very cheaply, and virtually for free in the main body of action-handling code in the plugins, because almost all of the logging logic exists only once, at the very core. Notably, much of this power derives exactly from the limited expressiveness of the individual arguments, being simple strings.
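As a much-simplified sketch of the matching, using only the string-equality tests described above (the real rule grammar is forgotten; this handles a single clause rather than the full ;-separated list, and all names are hypothetical):

```perl
# Hypothetical sketch: match one rule clause, such as
#   A=add_user,reseller=foo,product=gold{LDAPw,LDAPwa}
# against an action name and its (string) arguments, returning the
# extra logging flags to enable, or nothing if the tests don't match
sub match_rule {
  my ( $rule, $action, %args ) = @_;

  my ( $tests, $flags ) = $rule =~ m/^(.*)\{(.*)\}$/ or return;

  foreach my $test ( split m/,/, $tests ) {
    my ( $key, $value ) = split m/=/, $test, 2;
    if( $key eq "A" ) {
      return unless $action eq $value;
    }
    else {
      return unless defined $args{$key} and $args{$key} eq $value;
    }
  }

  return split m/,/, $flags;
}
```

Because the core itself performs these tests at every event, the same clause naturally applies to every nested action invoked underneath a matching one.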

Over in my IRC bot, meanwhile, many of the arguments and return values being passed around between the named events were application-level objects expressing such concepts as IRC users, channels, and so on. These objects made it convenient to write powerful code in the action handlers, but limited the ability of the event-reflexive core to introspect into them, and hence to provide abilities such as the detailed debug logging seen above for the ISP provisioning system.

This leads to my third question about event-reflexivity:

To what extent should the arguments of event-reflexive calls be understood and interpreted by the event-reflexive system itself, beyond simply passing them on to the invoked action handlers? Do abilities such as powerfully flexible instrumentation logging justify limiting the expressiveness of the parameters that may be passed?


2014/02/22

Kinds of Invocation in Event-Reflexive Programming


In the previous post I introduced the idea of Event-Reflexive programming, and discussed the first use-case I had for it; driving the user provisioning system at an ISP. I said this story would continue in chronological order.

A couple of years into this job, I felt I had learned enough Perl to do what surely pretty-much every Perl developer does at some point: yes, I decided to write an IRC bot. It's one of those rites of passage that every developer goes through. Of course, even this early in my programming career I had already seen several dozen terrible attempts at this, so I was quite determined to ensure mine wouldn't suffer as many of those mistakes. In my head, of course, I knew it wouldn't, because I, of course, was armed with Event-Reflexive Programming.

I ended up with, I thought, the most amazing (it wasn't), the most powerful (it wasn't) and the most flexible (it wasn't) IRC bot the world had ever seen (it isn't). However, in the process of building it I had expanded on the original concept of event-reflexivity considerably.

In the previous post I introduced the most basic two forms of invoking the plugins in an event-reflexive system, run_plugins and run_plugins_reverse. In the course of developing this IRC bot, I found it necessary to create a number of other variations on this basic theme.

Recalling the original two functions, both of these simply execute action hooks defined by the plugins. Neither of them returns an interesting result. What I found while implementing the IRC bot was that as well as merely requesting that work be performed, I was also using the event-reflexive core to abstract out a number of query-like operations - such as abstracting away the specific mechanism of database used to store information about registered users. At this point, the event-reflexive core needs a way to pose a question to the list of plugins, and return an answer as soon as one has been provided:

sub ask_plugins {
  my ( $query, @args ) = @_;

  foreach my $plugin ( @plugins ) {
    next unless $plugin->can( $query );
    my $ret = $plugin->$query( @args );
    return $ret if defined $ret;
  }

  return undef;
}

Perl being Perl, it's only a short matter of time before we want a list-valued return from some of these queries. And once we're returning a list, we're not restricted to returning the result from a single plugin - we can run them all:

sub ask_plugins_list {
  my ( $query, @args ) = @_;

  my @ret;
  foreach my $plugin ( @plugins ) {
    next unless $plugin->can( $query );
    push @ret, $plugin->$query( @args );
  }

  return @ret;
}

The final and most interesting invocation function was called scatter_plugins. This being written years before I had encountered the concept of Futures, it was initially written with a complex combination of additional code reference arguments, before I managed to neaten it up somewhat with the creation of Async::MergePoint. I won't give the implementation here, but the point of this particular call was to account for the fact that some plugin actions are going to be asynchronous, and only return a result later. What we'd like to do is start all the operations concurrently, then await their eventual completion before continuing.

These days, I would instead implement this operation using Future. In fact, at this point a case could be made for implementing all of them using Future. If the entire event-reflexive core were based on futures, it would trivially cope with any synchronous or asynchronous kind of work environment, due to the universal suitability of futures. If we set aside, for now, our previous question of plugin ordering, and assert that ordering doesn't matter, then all the remaining operations besides reverse can be expressed on top of a single idea:

sub _call_all_plugins {
  my ( $method, @args ) = @_;

  return map {
    my $plugin = $_;
    $plugin->$method( @args );
  } grep { $_->can( $method ) } @plugins;
}

sub run_plugins_concurrently {
  Future->needs_all( _call_all_plugins( @_ ) )
        ->then_done( "" ); # return an empty result
}

sub ask_plugins_concurrently {
  Future->needs_any( _call_all_plugins( @_ ) );
}

sub ask_all_plugins {
  Future->needs_all( _call_all_plugins( @_ ) );
}

In fact, at this point our previous idea of scatter_plugins becomes totally redundant - the universal expressiveness of futures allows it to be expressed even more simply. But this incredibly simple implementation has come at a cost: we've lost the sequential lazy-evaluation nature of ask_plugins. Additionally, whatever stop-on-error semantics we might have wanted out of run_plugins have been lost.

Perhaps instead we decide we need ordering, at least in some cases. This brings to mind some additional invocation functions, that themselves are also wrappers around a single common idea:

use Future::Utils qw( repeat );

sub _call_each_plugin {
  my ( $reverse, $while, $method, @args ) = @_;

  repeat {
    my ( $plugin ) = @_;
    $plugin->$method( @args );
  } foreach => [ grep { $_->can( $method ) }
                 $reverse ? reverse(@plugins) : @plugins ],
    while => $while;
}

sub run_plugins_sequentially {
  _call_each_plugin( 0, sub { 1 }, @_ );
}

sub run_plugins_sequentially_reverse {
  _call_each_plugin( 1, sub { 1 }, @_ );
}

sub ask_plugins_sequentially {
  _call_each_plugin( 0, sub { not shift->get }, @_ );
}

Keeping in mind my first question from the previous post, on the subject of ordering between plugins, this motivates two follow-on questions, one for each possible answer:

If ordering guarantees are not required, are the concurrent invocation functions given above sufficient to express any order-less possibly-asynchronous operation in an event-reflexive system?
If ordering is required, are the additional sequential invocation functions sufficient to express any ordered possibly-asynchronous operation?


2014/02/20

Event-Reflexive Programming

I am relatively certain that nobody reading this article for the first time will know what "event-reflexive" means, because it's a term I just made up. Or rather, one I made up about a year ago while thinking over a bunch of related ideas that I felt needed a name. I'm hoping I didn't invent the basic concept, because the reason for this post is to ask people a bunch of questions about it (as well as to get the idea out of my head and onto at least some virtual paper).

Event-reflexive programming is a technique for assembling a number of loosely-connected and perhaps interchangeable pieces into a larger coherent form, working together to solve some common task. It is perhaps most illustrative to tell this story in roughly chronological order.

My first ever post-University job (and, as it happens, the one where I learned Perl and picked up such controversial habits as three-space indent) was centred around a large Internet Service Provider back-end provisioning system. In this system we had a large number of plugins providing hooks into the various components that made up a user's account. This was all hung together, at its core, by basically a big foreach loop looking something like:

sub run_plugins {
  my ( $action, @args ) = @_;

  foreach my $plugin ( @plugins ) {
    $plugin->$action( @args ) if $plugin->can( $action );
  }
}

At its very basic level, this was all there was to it. Plugins are just classes, each can hook into the processing of some named action merely by having a method of its name. The list of plugins was determined by a config file, one per virtual reseller the ISP hosted users for. This made for a very powerful yet very simple way to customise and vary the capabilities per reseller - simply provide different plugin modules for them.
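For illustration, a plugin under this scheme might have looked like the following (the module and method names here are hypothetical):

```perl
package Plugin::UnixAccount;
# A hypothetical plugin: it hooks into the 'change_password' action
# simply by having a method of that name; any action it lacks a method
# for is skipped by run_plugins' ->can() test

sub new { return bless {}, shift }

sub change_password {
  my $self = shift;
  my %args = @_;
  print "Setting system password for $args{username}\n";
}

1;
```

The per-reseller config file then simply named which such modules to load into @plugins.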

I should perhaps at this point explain the meaning of the word "event". An event here could mean a reaction to some external stimulus - the toplevel events used in this ISP provisioning system were such requests as "create a user" or "change this user's password". However, most large actions were really broken down into a sequence of smaller steps that themselves were run through the plugin system. A simple action like "change the user's password" is likely to be handled directly by each plugin module that cares, but an action as large as creating a user is probably broken down into several stages - create the initial user config in a database, mkdir their home directory, fill it with skeleton files, and so on.

It is at this point that the first of the awkward architectural questions of this approach comes to light. We started off with the list of plugins stored in an array and, as we know, arrays are ordered containers. This necessarily imparts an ordering onto the plugins. The order in which the plugins react in this ISP system is determined by the order their names appear in a configuration file. This allows a certain degree of control, but also brings up the first large problem: it is the plugins that have an order applied to them, whereas sometimes it is the individual action hooks that you wish to order.

For example: the action to create a new user has the stages of creating the user's config in a database, then mkdiring their home directory, then filling it with skeleton files, and so on. The ordering of these steps is fixed - we can't fill the home directory with skeleton files until it exists, and we can't create the directory until we have the basic user config in the database to look up where their home should be. So naturally this leads to a good ordering relationship between the plugins responsible for these steps.

But now let's consider what happens when we want to delete a user again. Under this plugin ordering we'd first remove their config from the database, before we rmdir their home - but if we did it that way, we'd have forgotten where their home was. Clearly, the solution is that some events (such as deleting a user) have to be executed in reverse. This leads fairly quickly to the invention of a second control function:

sub run_plugins_reverse {
  my ( $action, @args ) = @_;

  foreach my $plugin ( reverse @plugins ) {
    $plugin->$action( @args ) if $plugin->can( $action );
  }
}

Some actions have even more interesting ordering constraints than this, however. The most complex action we had was the one used to regrade a user out of one product and into a different one. This basically involved a partial tear-down of the user, followed by some shuffling around of data, then a recreation to add back the pieces that were removed, now in a different product. There is no way to express this in terms of plugin ordering alone, as a lot of plugins are needed both near the beginning and again near the end of the process.

The solution to this was to re-invoke the plugin action system as part of handling a particular action. By decomposing this large action into three smaller steps, the plugin system's action-ordering abilities help us ensure things all run correctly.

sub regrade_user {
  my %args = @_;

  run_plugins_reverse teardown_user_for_regrade => (
    user => $args{user},
    product => $args{old_product},
    ...
  );

  run_plugins regrade_user_inner => (
    user => $args{user},
    ...
  );

  run_plugins setup_user_for_regrade => (
    user => $args{user},
    product => $args{new_product},
    ...
  );
}

This toplevel action has virtually nothing to do - the above code is a slight over-simplification of the real implementation, of course, but in principle all it contained was three recursive calls back down into the plugin management system to execute the three main stages of regrading a user. This is where the "reflexive" part of the technique's name comes from - a lot of its power derives from the way events are broken down into smaller pieces and reflected back through the core of the system a second time, or further, so the whole thing reacts as a large nested tree of individual steps.

I didn't remain at this job long enough to see this idea through to its logical conclusion, but had I continued on this route, I may well have decided to break these actions themselves into smaller chunks with small wrapper functions like the one given above - for example, splitting the creation of a user into the configuration stage, the home directory stage, and the provisioning of 3rd-party services stage. Continuing to break code down into smaller chunks like this leads to the idea that, actually, the order in which the individual plugins run starts not to matter, because the order of the steps is controlled by the way the action is reflexively decomposed.
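Such a decomposed create_user might have looked something like this (a hypothetical sketch; the stage names are invented), with the stage ordering living entirely in the wrapper rather than in the plugin list:

```perl
our @plugins;

# run_plugins as defined earlier, repeated here for completeness
sub run_plugins {
  my ( $action, @args ) = @_;
  $_->$action( @args ) for grep { $_->can( $action ) } @plugins;
}

sub create_user {
  my %args = @_;

  # Each stage is itself reflected back through the plugin system; the
  # ordering of the stages is fixed here, not by plugin order
  run_plugins( make_user_config     => ( user => $args{user} ) );
  run_plugins( make_user_homedir    => ( user => $args{user} ) );
  run_plugins( provision_thirdparty => ( user => $args{user} ) );
}
```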

This now leads me on to ask the first of several questions I still have about this as a general programming technique:

Is it ever required to have strict ordering control of plugins in an event-reflexive system, or can any ordering problem always be resolved by breaking actions down into ordered requests to perform sequences of sub-actions? Even if it is never strictly required, is such ordering control ever desirable anyway; does it make any kind of system neater or easier to create, at the cost of having to manage that order?



Edit 2014/06/09: I am now leaning toward the conclusion that if any kind of ordering were to be guaranteed, it would start to defeat the entire purpose of using event-reflexivity in the first place. The basic premise is to build a large system by loosely gluing together smaller components which are largely independent. If you have to think about which other components exist and what order they come in relative to the current one, that component becomes harder to write, and you might as well have written the system by some other method. I think, therefore, the answer to this question ought to be:

No; the base of an event-reflexive system should not attempt to give any kind of ordering guarantees between plugins. Any strict ordering that is required should be constructed out of explicit inter-component reflexive calls, and not by relying on the implicit side-effect of plugin execution order.