2013/12/24

Futures advent day 24

Day 24 - Futures compared to Callbacks

At first glance, futures seem to offer much the same benefits as managing control flow with callbacks. However, they have several advantages by comparison.

When performing a sequence of many operations using callbacks, the ever-deeper nesting of the callback functions produces an ugly pyramid of indentation in the source code.

FIRST_CB( $arg1, sub {
  SECOND_CB( $arg2, sub {
    THIRD_CB( $arg3, sub {
      FINISHED()
    });
  });
});

Because futures are connected together using the return value of a function, not through a value passed into it, they can avoid this mess and remain at a fixed indentation level. This also allows, for example, a new stage to be added between existing stages without upsetting the indentation of the following code, which makes for neater diff output in revision control systems and gives less chance of a merge conflict when branching.

FIRST_F( $arg1 )->then(sub {
  SECOND_F( $arg2 )
})->then(sub {
  THIRD_F( $arg3 )
})->then(sub {
  FINISHED()
});
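
As a minimal runnable sketch of this chained form, the following assumes a reasonably recent version of the Future module from CPAN. The bodies of FIRST_F, SECOND_F and THIRD_F here are hypothetical stand-ins that return already-completed futures, so that no event loop is needed; a real program would return pending futures from its I/O layer. It also shows each stage receiving the previous stage's result as its arguments.

```perl
use strict;
use warnings;
use Future;

# Hypothetical stand-ins for asynchronous operations. Each returns an
# already-completed Future so the example runs without an event loop.
sub FIRST_F  { Future->done( $_[0] + 1 ) }
sub SECOND_F { Future->done( $_[0] * 2 ) }
sub THIRD_F  { Future->done( "result=$_[0]" ) }

my $f = FIRST_F( 1 )->then(sub {
  my ( $n ) = @_;          # result of FIRST_F
  SECOND_F( $n )
})->then(sub {
  my ( $n ) = @_;          # result of SECOND_F
  THIRD_F( $n )
});

my $result = $f->get;
print "$result\n";
```

Adding a stage between SECOND_F and THIRD_F would mean one more ->then(sub { ... }) block at the same indentation, leaving the lines after it untouched.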

Moreover, many other shapes of control flow start to look much more like their synchronous counterparts, precisely because they are linked together using the return values out of the individual units and require no other values to be passed in.

Possibly the simplest example of concurrent control flow is a two-way merge, where two operations are started concurrently and the program waits for the results of both before continuing. Using callbacks, this requires each callback to store its result in a variable that both capture lexically, and to check whether both results have arrived.

my $one_result; my $two_result;

ONE_CB( sub {
  $one_result = shift;
  if( defined $two_result ) {
    FINISHED($one_result, $two_result);
  }
});
TWO_CB( sub {
  $two_result = shift;
  if( defined $one_result ) {
    FINISHED($one_result, $two_result);
  }
});

Two issues immediately come to light here. The first is the repeated FINISHED code: if that were itself a further chain of operations with callbacks, it would be impossible (or at least very tedious) to write out twice, and of course this gets much worse beyond two concurrent branches. Secondly, we are testing the results for definedness, but undef may be a perfectly valid result from each function. In that case we'd have to track two further variables simply to record whether each operation has completed:

my $one_done; my $one_result;
my $two_done; my $two_result;

ONE_CB( sub {
  $one_result = shift;
  $one_done++;
  if( $two_done ) { ... }
});
...

This example of course only handles the success case. Imagine how much more complex the code would be if each function took two code references, one for success and one for failure, and additionally returned some kind of operation ID that could be used to cancel the operation in progress if it was no longer required. This would now need eight lexically captured variables, adding much more boilerplate control-flow noise to the code. Moreover, with more variables shared among the code blocks, there is a greater chance that strong reference cycles survive after the operation has finished, failed, or been cancelled, keeping an object in memory long after it was needed. It may end up looking something like this (and keep in mind this is the simplest case of two concurrent operations and a single "afterwards"):

my $one_done; my $one_result; my $one_failed; my $one_id;
my $two_done; my $two_result; my $two_failed; my $two_id;

my $finished = sub {
  undef $one_id; undef $two_id;
  FINISHED($one_result, $two_result);
};

$one_id = ONE_CB(
  sub { $one_result = shift; $one_done++;
        $finished->() if $two_done; },
  sub { $one_failed++;
        TWO_CANCEL($two_id) if !$two_done; undef $two_id;
        FAILED() },
);
$two_id = TWO_CB(
  sub { $two_result = shift; $two_done++;
        $finished->() if $one_done; },
  sub { $two_failed++;
        ONE_CANCEL($one_id) if !$one_done; undef $one_id;
        FAILED() },
);

By comparison, the Future needs_all constructor neatly wraps up all this implicit behaviour, removing the control- and data-flow noise from the code, and much more concisely expressing its intent.

Future->needs_all(
  ONE_F(), TWO_F(),
)->then(sub {
  my ( $one_result, $two_result ) = @_;
  FINISHED($one_result, $two_result);
})->else(sub {
  FAILED();
})->get;
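
To illustrate the point about undef results, here is a small sketch, again assuming a reasonably recent version of the Future module. The two pending futures are hypothetical stand-ins for ONE_F() and TWO_F(), completed by hand rather than by an event loop. Because needs_all tracks completion itself, an undef result needs no separate "done" flag.

```perl
use strict;
use warnings;
use Future;

# Two pending futures standing in for ONE_F() and TWO_F()
my $one = Future->new;
my $two = Future->new;

my $all = Future->needs_all( $one, $two );

$one->done( undef );                     # undef is a legitimate result here
my $ready_after_one = $all->is_ready;    # false: still waiting on $two

$two->done( "two" );
my @results = $all->get;                 # results of both, in order

printf "ready after one: %s\n", $ready_after_one ? "yes" : "no";
printf "got %d results; first is %s\n",
  scalar @results, defined $results[0] ? "defined" : "undef";
```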

So, there we have it. In the past 24 posts we have seen how Futures can neatly express all the various kinds of control-flow logic we typically find in a Perl program, and also express the additional shapes of code we find useful when working with asynchronous and concurrent programming. This neatness ultimately comes from the fact that a Future object is a first-class value representing the operation itself, and with being first-class comes the ability to combine it with others to produce new first-class values that represent combinations of this operation with others.

Futures allow the control- and data-flow structure of a program to be inherently expressed together, describing the dependency relationships between individual operations. Both successful results and failures are automatically propagated up from the atomic units that create them, through the various layers of logic up towards the topmost level of the program. Actions in progress can be abandoned when no longer required, causing a graceful cancellation of the activity that had been pending up until that time.
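
Both behaviours can be seen in a short sketch, assuming a reasonably recent version of the Future module; the variable names and failure messages are purely illustrative. The first half shows a failure skipping the later stage of a chain, and the second shows needs_all cancelling a still-pending component when its sibling fails.

```perl
use strict;
use warnings;
use Future;

# Failure propagation: a failed stage skips the later ->then block
my @ran;
my $chain = Future->done( 1 )->then(sub {
  push @ran, "first";
  Future->fail( "stage one broke" );
})->then(sub {
  push @ran, "second";                 # not reached
  Future->done;
});
my $message = $chain->failure;

# Cancellation: when one component fails, needs_all cancels the other
my $left  = Future->new;
my $right = Future->new;
my $cancel_ran = 0;
$right->on_cancel(sub { $cancel_ran++ });   # where cleanup would go

my $both = Future->needs_all( $left, $right );
$left->fail( "left broke" );

print "stages run: @ran\n";
print "chain failed with: $message\n";
print "right cancelled: ", ( $right->is_cancelled ? "yes" : "no" ), "\n";
```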

Futures change state from pending to complete when they are provided with a result, meaning that when they become ready they already have the results stored in them. This makes for convenient control-flow that coincides with data-flow, ensuring that the result of an operation is passed to the next operation in the sequence at the time it is executed. This convenient pairing of control- and data-flow stands in contrast to the split nature of other kinds of concurrency control, such as callback functions or locks and mutexes, which generally only manage the flow of control and require other techniques, like lexical variables shared between multiple closures, to provide the data flow. Such sharing of mutable state between domains of concurrency is the source of many kinds of concurrency bug which cannot happen with Futures.
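
A tiny sketch of this, assuming a reasonably recent version of the Future module: the result travels with the future, so a callback registered after completion still receives it, and no shared lexical is needed to carry the data.

```perl
use strict;
use warnings;
use Future;

my $f = Future->new;
my @log;

$f->on_done(sub { push @log, "early: $_[0]" });  # registered while pending
$f->done( "payload" );                           # control and data arrive together
$f->on_done(sub { push @log, "late: $_[0]" });   # runs at once; result already stored

my $value = $f->get;                             # can be read again at any time
print "$_\n" for @log, "get: $value";
```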

In summary, Futures provide a useful abstraction to build all kinds of program logic on top of, whether it is initially intended to be asynchronous or not. Middle-level library modules especially will benefit from using Futures to express intent and combine actions together, as they will then automatically be able to make use of asynchronous and concurrent abilities of the base layers they are built from, without having to expressly depend on those being present.
