LeoNerd's programming thoughts: cpan

Showing posts with label cpan. Show all posts

2022/06/25

A troubling thought - smartmatch reïmagined

Preface: This is less a concrete idea, and more a rambling set of thoughts that lead me to a somewhat awkward place. I'm writing it out here in the hope that others can lend suggestions and ideas, and see if we can arrive at a better place.

I've been thinking about comparison operators lately - somewhat in the context of my new Syntax::Operator::Elem / Syntax::Operator::In module, somewhat in the context of smartmatch and the planned deprecations thereof, and partly in the context of my new match/case syntax.

Smartmatch Deprecations

For years now, smartmatch has been an annoying thorny design, and recently we've started making moves to get rid of it. In my mind at least, this is because it has a large and complex behaviour that is often unpredictable in advance. There are two distinct reasons for this:

It tries very hard to (recursively) distribute itself across container values on either side; saying that $x ~~ @y is true if any { $x ~~ $_ } @y for example; sometimes in ways that are surprising (e.g. how do you compare an array with a hash?)
It acts unpredictably with mixed strings or numbers; because those concepts are very fluid in perl and aren't well-defined

`match/case` and New Infix Operators

I've lately been writing some new ideas for new infix operators that Perl might want; partly because they're useful on their own but also because they're useful combined with the match/case syntax provided by Syntax::Keyword::Match. Between them all, these are intended as a replacement for the given/when syntax and its troublesome smartmatch. For example, to select an option based on a string comparison you can

match($x : eq) {
  case("abc") { say "It was the string abc" }
  case("def") { say "It was the string def" }
  case($y)    { say "It was whatever string the variable $y gives" }
}

This is much more predictable than given/when and smartmatch, because the programmer declared right upfront that the eq operator is being used here; there's no smartmatch involved.

Initially this feels like a great improvement on given/when and ~~, but it has lots of tricky cornercases to it. For example, the given/when approach can easily handle undef, whereas match/case using only the eq operator cannot distinguish undef from "". For this reason, I invented a new infix operator, called equ (provided by Syntax::Operator::Equ), which can:

say "Equal" if $x equ $y;  # true if they're both undef, or both
                           #   defined and the same string

match($x : equ) {
  # these two cases are now distinct
  case(undef) { say "It was undefined" }
  case("")    { say "It was the empty string" }

  default     { say "It was something else" }
}

Plus of course it also defines a new === operator which performs the numerical equivalent, able to distinguish undef from zero.

Syntax::Operator::Elem

Another operator I felt was required was one that can test if a given string (or number) is present in a list. For that, I wrote Syntax::Operator::Elem:

say "Present" if $x elem @ys;  # stringy

say "Present" if $x ∈ @ys;     # numerical

(Yes, that really is an operator spelled with a non-ASCII Unicode character. No I will not apologise :P)

These operators too have the "oops, undef" problem about them - which lead me briefly to consider adding two more that consider undef/"" or undef/zero to be distinct. Maybe I'd call them elemu and ... er.. well, Unicode doesn't have a variant of the ∈ operator that can suggest undefness. It was about at that point that I stopped, and wondered if really we're going about this whole thing the right way at all.

Smartmatch Reïmagined

I begin to think that if we go right back to the beginning, we might find that a huge chunk of this is unnecessary, if only we can find a better model.

During the 5.35 development series and now released in 5.36, Perl core has two improvements to what some might call its "type system":

Real booleans - true and false are now first-class values distinct from 1 and zero/emptystring.
Tracking of whether defined, nonboolean, nonreferential values began as strings or numbers; even if they have since evolved to effectively be both.

It is now possible to classify any given scalar value into exactly one of the following five categories:

undef

boolean

initially string

initially number

reference

I start to wonder whether, therefore, we have enough basis to create a better version of what the smartmatch operator tried (but ultimately failed) to be. For sake of argument, since I've already used one Unicode symbol I'm going to use another for this new one: The triple-bar identity symbol, ≡.

Lets consider a few properties this ought to have. First off, it should be well-behaved as an equality operator; it should be reflexive, symmetric and transitive. That is, given any values $x, $y and $z, all three of the following must always hold:

$x ≡ $x  is true                     # reflexive
$x ≡ $y  is the same as  $y ≡ $x     # symmetric
if $x ≡ $y and $y ≡ $z then $x ≡ $z  # transitive

Additionally, I don't think it ought to have any sort of distributive properties like $x ~~ @arr has. That sort of distribution should be handled at a higher level. (For example, the proposed caselist syntax of match/case.)

Because it only operates on pairs of scalars, this is already a much simpler kind of operator to think about. Because of the fact we can classify perl scalar values into these neat five categories, we can already write down five simple rules for when both sides are given the same category of scalar:

UNDEF	undef ≡ undef	is true
BOOL	$x ≡ $y	is true if $x and $y are both true, or both false
STR	$x ≡ $y	is true if $x eq $y
NUM	$x ≡ $y	is true if $x == $y
REF	$x ≡ $y	is true if refaddr($x) == refaddr($y)

I'd also like to suggest a rule that given any pair of scalars of different categories, the result is always false. This means in particular, that undef is never ≡ to any defined value (but never warns), that no boolean is ever ≡ to any non-boolean, and no reference is ever ≡ to any non-reference. I don't think anyone would argue with that.

Already this operator feels useful, because of the way it neatly handles undef as distinct from any number or string, we now don't need the equ or === operators.

The one problem I have with this whole model is what do we do with STR ≡ NUM; how do we handle code like the following:

my $x = "10";
say "Equivalent" if $x ≡ 10;

By my first suggestion, this would always be false. While it's predictable and simple, I don't think it's very useful. It would mean that whenever you want to e.g. perform a numerical case comparison on a value taken from @ARGV, you always have to "numify" it by doing some ugly code like:

match(0 + $ARGV[0] : ≡) {
  case(1) { ... }
}

This does not feel very perlish.

So maybe we can find a more useful handling of STR vs NUM. I can already think of several bad ideas:

Pick the category on the righthand side
Superficially this feels beneficial to the match/case syntax, but it soon falls down in a lot of other scenarios. Plus it is blatantly not symmetric, which we already decided any good equality test operator ought to be.
The operator throws an exception
This doesn't feel like the right way to go. Having things like UNDEF, BOOL and REF already neatly just yield false, means that you can safely mix strings/numbers and undef in match/case labels, for example, and all is handled nicely. To have NUM-vs-UNDEF yield false but NUM-vs-STR throw an exception feels like a bad model. Plus it would not be transitive.

About the only sensible model I can think of in this mixed case, is to say that

NUM ≡ STR  is true if both `eq` and `==` would say true

It's reflexive and symmetric. It feels useful. It does (what most people would argue is) the right thing for "10" ≡ 10.

Still, something at the back of my mind feels wrong about this design for an operator. Some situation in which is will be Obviously Terrible, and thus bring the whole tower crashing down. Perhaps it isn't truely transitive - there might be some set of values for which it fails. Offhand I can't think of one, but maybe someone can find an example?

It's a shame, because if we did happen to find an operator like this, then I think combined with match/case syntax it could go a long way towards providing a far better replacement for smartmatch + given/when and additionally solve a lot of other problems in Perl all in one go.

I'm sorry I don't have a more concrete and specific message to say there, other than that I've given (and will continue to give) this a lot of thought, and that I invite comment and ideas from others on how we might further it towards something that can really work in Perl.

Thanks all for listening.

2022/01/26

Perl in 2022 - A Yearly Update

At the end of 2020, I wrote a series of articles on the subject of recent CPAN modules that provide useful syntax, or recent core features added to perl. The series ended with a bonus post looking forward to imagine what new additions might one day appear. I followed this up with a video-based talk at FOSDEM, titled "Perl in 2025", with yet more ideas considering how a Perl might look in a few more years' time.

Over the past twelve months, I have made progress on several of these ideas. Four of them have already become CPAN modules and thus are available for writing in Perl in 2022:

match/case - Now available as Syntax::Keyword::Match.

match($n : ==) {
   case(1) { say "It's one" }
   case(2) { say "It's two" }
   case(3) { say "It's three" }
}

any, all - Now available as syntax-level keywords from List::Keywords.

if( any { $_->size > 100 } @boxes ) {
   say "There are some large boxes here";
}

multi sub - An early experiment in Syntax::Keyword::MultiSub.

multi sub max()          { return undef; }
multi sub max($x)        { return $x; }
multi sub max($x, @more) { my $y = max(@more);
                           return $x > $y ? $x : $y; }

equ, === - Available from Syntax::Operator::Equ, though at present is only usable via Syntax::Keyword::Match or a specially-patched version of perl.

if($x equ $y) {
   say "Both are undef, or defined and equal strings";
}
 
if($i === $j) {
   say "Both are undef, or defined and equal numbers";
}

Of the rest:

in - I have the beginnings of some code but it's not yet on CPAN as it again requires a patched version of perl for pluggable infix operators.
let and is - not started yet.

In addition, not mentioned in the original article, the latest development version of perl has gained:

defer blocks.

{
    say "This happens first";
    defer { say "This happens last"; }
 
    say "And this happens inbetween";
}

finally as part of try/catch.

try {
    say "This happens first";
}
catch ($e) {
    say "Oops, it failed";
}
finally {
    say "This happens last in either case";
}

The builtin:: namespace, providing many new utility functions that ought to have been considered part of the core language - copying utilities from places like Scalar::Util and POSIX, as well as providing some new ones.
```
say "The refaddr of my object is ", builtin::refaddr($obj);

use builtin 'ceil';
say "The next integer above the value is ", ceil($value);
```

Real boolean values. These will be useful in many places, such as data serialisation and cross-language conversion modules.

use builtin qw(true false isbool);

sub serialise($v) {
  return $v ? 'true' : 'false' if isbool $v;
  return qq("$v");
}

say join ",", map { serialise($_) }
    0, 1, false, true, 'true';

Overall I'm happy with progress so far. A lot of things have been started, laying much of the groundwork for more work that can follow. Behind the scenes all of these syntax modules are now using the XS::Parse::Keyword module to do the bulk of their parsing. This is great for getting something powerful written quickly, and has good properties in terms of interoperability between modules - for example, the way the new infix operators already work with the match/case syntax.

Core perl is on-track for a summer release as usual; hopefully that will provide the new defer and finally syntax, builtin functions and boolean values. I hope to have as much success in 2022 as I did in 2021 at writing more of these things, and with any luck I'll be able to write another article like this next year explaining what new progress has been achieved towards the Perl in 2025 goal.

2021/07/30

Perl UV binding hits version 2.000

Over the past few months I've been working on finishing off the libuv Perl binding module, UV. Yesterday I finally got it finished enough to feel like calling it version 2.000. Now's a good time to take a look at it.

libuv itself is a cross-platform event handling library, which focuses on providing nicely portable abstractions for things like TCP sockets, timers, and sub-process management between UNIX, Windows and other platforms. Traditionally things like event-based socket handling have always been difficult to write in a portable way between Windows and other places due to the very different ways things work on Windows as opposed to anywhere else. libuv provides a large number of helpful wrappers to write event-based code in a portable way, freeing the developer from having to care about these things.

A number of languages have nice bindings for libuv, but until recently there wasn't a good one for Perl. My latest project for The Perl Foundation aimed to fix this. The latest release of UV version 2.000 indicates that this is now done.

It's unlikely that most programs would choose to operate directly with UV itself, but rather via some higher-level event system. There are UV adapter modules for IO::Async (IO::Async::Loop::UV), Mojo (Mojo::Reactor::UV), and Future::IO (Future::IO::Impl::UV) at least.

The UV module certainly wraps much of what libuv has to offer, but there are still some parts missing. libuv can watch filesystems for changes of files, and provides asynchronous filesystem access access functions - both of these are currently missing from the Perl binding. Threadpools are an entire concept that doesn't map very well to the Perl language, so they are absent too. Finally, libuv lists an entire category of "miscellaneous functions", most of which are already available independently in Perl, so there seems little point to wrapping those provided by libuv.

Finally, we should take note of one thing that doesn't work - the UV::TCP->open and UV::UDP->open functions when running on Windows. The upshot here is that you cannot create TCP or UDP sockets in your application independently of libuv and then hand them over to be handled by the library; this is not permitted. This is because on Windows, there are fundamentally two different kinds of sockets that require two different sets of API to access them - ones using WSA_FLAG_OVERLAPPED, and ones not. libuv needs that flag in order to perform event-based IO on sockets, and so it won't work with sockets created without it - which is the usual kind that most other modules, and perl itself, will create. This means that on Windows, the only sockets you can use with the UV module are ones created by UV itself - such as by asking it to connect out to servers, or listen and accept incoming connections. Fortunately, this is sufficient for the vast majority of applications.

I would like to finish up by saying thanks to The Perl Foundation for funding me to complete this project.

2020/12/24

2020 Perl Advent Calendar - Day 24

<< First | < Prev

Over the course of this blog post series we have seen a number of syntax-providing modules from CPAN. Each of them sets out to neaten up some specific kind of structure often found in Perl code.

Future::AsyncAwait aims to neaten up asynchronous code flow, replacing older techniques like ->then method chaining and helper functions from Future::Utils by replacing them with regular Perl syntax.
Syntax::Keyword::Try brings the familiar try/catch pattern for handling exceptions, replacing more manual techniques involving eval {} blocks and inspecting the $@ variable.
Object::Pad provides an entire set of syntax keywords for managing classes of objects, allowing stateful object-oriented code to be neatly written without the risk of things like hash key collsions on $self->{...}.

Each one of these allows writing shorter, neater code that has less "machinery noise". With fewer distractions in the code it becomes clearer to see the detail of the specific situation the code is for. With less code to write there's less opportunity to introduce bugs.

Moreover we have seen that these syntax modules can be combined together, used in conjunction to allow even greater benefits. We saw on day 4 that try/catch control flow works within async sub, on day 22 that object methods can be marked as asynchronous with async method, and on day 23 we explored how the dynamically assignment syntax can be combined with objects, asynchronous functions, and even both at the same time.

The various code examples we've seen over the past 22 days or so have been written using these syntax modules, and also make use of Perl's signatures feature, and other things where possible, all to help in this regard. The shorter neatness that comes from not needing to write the line (or two) of code to unpack the function's arguments from @_ (and maybe the $self method invocant as well) removes yet another distraction and potential source of errors.

In summary: This series has been about what it feels like to write Perl code in the year 2020 - it has been about 2020 Perl. This is a language just as flexible and adaptable as Perl has ever been, yet still capable of any of the modern techniques common to other languages, which perhaps even the Perl of five or ten years ago was lacking in - neat function arguments, asynchronous control, exception handling, and syntax for object orientation. With all these new abilities, 2020 has been a great year for writing Perl code.

2022/06/25

Smartmatch Deprecations

match/case and New Infix Operators

Syntax::Operator::Elem

Smartmatch Reïmagined

2022/01/26

2021/07/30

2020/12/24

2020/12/23

2020/12/22

2020/12/21

2020/12/20

2020/12/19

2020/12/18

2020/12/17

2020/12/16

2020/12/15

2020/12/14

2020/12/13

2020/12/12

2020/12/11

2020/12/10

2020/12/09

`match/case` and New Infix Operators