2022/06/25

A troubling thought - smartmatch reïmagined

Preface: This is less a concrete idea, and more a rambling set of thoughts that led me to a somewhat awkward place. I'm writing it out here in the hope that others can lend suggestions and ideas, and see if we can arrive at a better place.

I've been thinking about comparison operators lately - somewhat in the context of my new Syntax::Operator::Elem / Syntax::Operator::In module, somewhat in the context of smartmatch and the planned deprecations thereof, and partly in the context of my new match/case syntax.

Smartmatch Deprecations

For years now, smartmatch has been an annoyingly thorny design, and recently we've started making moves to get rid of it. In my mind at least, this is because its behaviour is large, complex and often hard to predict in advance. There are two distinct reasons for this:

  1. It tries very hard to (recursively) distribute itself across container values on either side, for example saying that $x ~~ @y is true if any { $x ~~ $_ } @y, and it sometimes does so in surprising ways (e.g. how do you compare an array with a hash?)
  2. It acts unpredictably with mixed strings and numbers, because those concepts are very fluid in Perl and aren't well-defined
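
Neither problem needs `~~` itself to demonstrate. The plain-Perl sketch below makes both points concrete:

- The distribution in point 1 is written out by hand with `List::Util`'s `any`, which forces the comparison operator to be chosen explicitly.
- The last two assignments show the ambiguity in point 2: the same pair of values is equal under `==` but not under `eq`.

```perl
use strict;
use warnings;
use List::Util qw(any);

my $x  = "10";
my @ys = ("5", "10.0", "20");

# Point 1, spelled out: "is $x in @ys?" with the comparison stated explicitly
my $found_numeric = any { $x == $_ } @ys;    # true, because "10.0" == "10"
my $found_string  = any { $x eq $_ } @ys;    # false, no element is exactly "10"

# Point 2: whether these two values are "the same" depends on the operator
my $num_equal = "10" == "10.0";              # true
my $str_equal = "10" eq "10.0";              # false

print "== finds it: ", ($found_numeric ? "yes" : "no"), "\n";
print "eq finds it: ", ($found_string  ? "yes" : "no"), "\n";
```

A smartmatch that has to guess between these two readings on the programmer's behalf is what makes it hard to predict.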

match/case and New Infix Operators

I've lately been writing some new ideas for new infix operators that Perl might want; partly because they're useful on their own but also because they're useful combined with the match/case syntax provided by Syntax::Keyword::Match. Between them all, these are intended as a replacement for the given/when syntax and its troublesome smartmatch. For example, to select an option based on a string comparison you can write:

match($x : eq) {
  case("abc") { say "It was the string abc" }
  case("def") { say "It was the string def" }
  case($y)    { say "It was whatever string the variable $y gives" }
}

This is much more predictable than given/when and smartmatch, because the programmer declared right upfront that the eq operator is being used here; there's no smartmatch involved.
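
For readers who haven't used Syntax::Keyword::Match, the `match($x : eq)` block above behaves roughly like the plain `if`/`elsif` chain below. This sketch assumes three things:

- the topic is evaluated once;
- each `case` is compared with `eq` in order;
- the first match wins.

It illustrates the semantics, not the module's actual implementation. `describe` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl rendering of:
#   match($x : eq) { case("abc") {...} case("def") {...} case($y) {...} }
sub describe {
    my ($x, $y) = @_;
    my $topic = $x;    # the topic is evaluated once, then compared with eq
    if    ($topic eq "abc") { return "It was the string abc" }
    elsif ($topic eq "def") { return "It was the string def" }
    elsif ($topic eq $y)    { return "It was whatever string the variable $y gives" }
    return;            # no case matched, and there is no default block
}

print describe("def", "xyz"), "\n";
```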

Initially this feels like a great improvement on given/when and ~~, but it has plenty of tricky corner cases. For example, the given/when approach can easily handle undef, whereas match/case using only the eq operator cannot distinguish undef from "". For this reason, I invented a new infix operator, called equ (provided by Syntax::Operator::Equ), which can:

say "Equal" if $x equ $y;  # true if they're both undef, or both
                           #   defined and the same string

match($x : equ) {
  # these two cases are now distinct
  case(undef) { say "It was undefined" }
  case("")    { say "It was the empty string" }

  default     { say "It was something else" }
}

Plus of course it also defines a new === operator which performs the numerical equivalent, able to distinguish undef from zero.
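
The behaviour described for `equ` and `===` can be sketched as ordinary functions. `str_equ` and `num_equ` are hypothetical names used only for illustration. They are not the API of Syntax::Operator::Equ, and the real operators may differ in details such as warnings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for `$x equ $y`:
# true if both are undef, or both are defined and the same string
sub str_equ {
    my ($x, $y) = @_;
    return 1 if !defined $x && !defined $y;   # both undef
    return 0 if !defined $x || !defined $y;   # exactly one undef
    return $x eq $y;                          # both defined: compare as strings
}

# Hypothetical stand-in for `$x === $y`: the numerical equivalent
sub num_equ {
    my ($x, $y) = @_;
    return 1 if !defined $x && !defined $y;
    return 0 if !defined $x || !defined $y;
    return $x == $y;                          # both defined: compare as numbers
}

print str_equ(undef, "") ? "same\n" : "different\n";
```

The only change from plain `eq`/`==` is the up-front undef handling, so undef never silently becomes "" or 0.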

Syntax::Operator::Elem

Another operator I felt was required was one that can test if a given string (or number) is present in a list. For that, I wrote Syntax::Operator::Elem:

say "Present" if $x elem @ys;  # stringy

say "Present" if $x ∈ @ys;     # numerical

(Yes, that really is an operator spelled with a non-ASCII Unicode character. No I will not apologise :P)

These operators too have the "oops, undef" problem, which led me briefly to consider adding two more that treat undef/"" or undef/zero as distinct. Maybe I'd call them elemu and ... er.. well, Unicode doesn't have a variant of the ∈ operator that can suggest undefness. It was at about that point that I stopped, and wondered whether we're really going about this whole thing the right way at all.
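
A rough plain-Perl model of the two operators and of the undef problem might look like this. `str_elem`, `num_elem` and `str_elem_u` are hypothetical names, not Syntax::Operator::Elem's API. The last one is just one possible meaning for the "elemu" idea above: undef is treated as distinct from "".

```perl
use strict;
use warnings;
use List::Util qw(any);

# Rough model of `$x elem @ys`: stringy membership.
# Like plain eq, undef is treated as "" (the uninitialized warning is silenced here).
sub str_elem {
    my ($x, @ys) = @_;
    no warnings 'uninitialized';
    return any { $x eq $_ } @ys;
}

# Rough model of `$x ∈ @ys`: numeric membership
sub num_elem {
    my ($x, @ys) = @_;
    return any { $x == $_ } @ys;
}

# Hypothetical undef-aware variant (one possible meaning of "elemu"):
# undef matches only undef, and a defined value never matches undef
sub str_elem_u {
    my ($x, @ys) = @_;
    return any { defined $_ ? (defined $x && $x eq $_) : !defined $x } @ys;
}

print str_elem(undef, "", "a")   ? "elem: undef matched \"\"\n"  : "elem: no match\n";
print str_elem_u(undef, "", "a") ? "elemu: undef matched \"\"\n" : "elemu: no match\n";
```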

Smartmatch Reïmagined

I begin to think that if we go right back to the beginning, we might find that a huge chunk of this is unnecessary, if only we can find a better model.

During the 5.35 development series and now released in 5.36, Perl core has two improvements to what some might call its "type system":

  • Real booleans - true and false are now first-class values distinct from 1 and zero/the empty string.
  • Tracking of whether defined, nonboolean, nonreferential values began as strings or numbers, even if they have since evolved to effectively be both.

It is now possible to classify any given scalar value into exactly one of the following five categories:

  • undef
  • boolean
  • initially string
  • initially number
  • reference
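
On perl 5.36 the classification itself can be sketched with the `builtin` functions `is_bool`, `created_as_string` and `created_as_number`. These were still marked experimental in that release, hence the `no warnings` line. `scalar_category` is a hypothetical helper written for illustration, not part of any module.

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(is_bool created_as_string created_as_number);

# Hypothetical helper: sort a scalar into one of the five categories
sub scalar_category ($v) {
    return 'UNDEF' if !defined $v;
    return 'REF'   if ref $v;                    # any reference, blessed or not
    return 'BOOL'  if is_bool($v);               # true/false, e.g. the result of ==
    return 'STR'   if created_as_string($v);     # e.g. "10", or anything read from @ARGV
    return 'NUM'   if created_as_number($v);     # e.g. 10, or the result of 0 + "10"
    die "value does not fit the five-category model\n";   # defensive fallback
}

say scalar_category($_) for undef, !!1, "10", 10, [];
```

The order of the checks matters only for readability here: undef and references are excluded first, so the remaining three tests each see a plain defined scalar.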

I start to wonder whether, therefore, we have enough basis to create a better version of what the smartmatch operator tried (but ultimately failed) to be. For sake of argument, since I've already used one Unicode symbol I'm going to use another for this new one: The triple-bar identity symbol, ≡.

Let's consider a few properties this ought to have. First off, it should be well-behaved as an equality operator: it should be reflexive, symmetric and transitive. That is, given any values $x, $y and $z, all three of the following must always hold:

$x ≡ $x  is true                     # reflexive
$x ≡ $y  is the same as  $y ≡ $x     # symmetric
if $x ≡ $y and $y ≡ $z then $x ≡ $z  # transitive
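
These three properties can be checked mechanically for any candidate comparison over a finite pool of sample values. The sketch below is a brute-force checker; `check_equivalence` and `show` are hypothetical names. The demo runs it on plain `eq`, and on a deliberately non-transitive "within 1 of each other" relation to show what a failure report looks like.

```perl
use strict;
use warnings;

# Printable form of a value, so undef doesn't trigger warnings in messages
sub show { defined $_[0] ? "'$_[0]'" : 'undef' }

# Hypothetical brute-force checker: returns descriptions of any violations of
# reflexivity, symmetry or transitivity for the comparison $eq over @pool.
sub check_equivalence {
    my ($eq, @pool) = @_;
    my @problems;
    for my $x (@pool) {
        push @problems, "not reflexive: " . show($x) unless $eq->($x, $x);
        for my $y (@pool) {
            push @problems, "not symmetric: " . join(', ', map { show($_) } $x, $y)
                if ($eq->($x, $y) xor $eq->($y, $x));
            for my $z (@pool) {
                push @problems, "not transitive: " . join(', ', map { show($_) } $x, $y, $z)
                    if $eq->($x, $y) && $eq->($y, $z) && !$eq->($x, $z);
            }
        }
    }
    return @problems;
}

my @pool = (0, 1, 2);
my @eq_problems    = check_equivalence(sub { $_[0] eq $_[1] }, @pool);
my @close_problems = check_equivalence(sub { abs($_[0] - $_[1]) <= 1 }, @pool);

print scalar(@eq_problems), " problem(s) with eq\n";
print "$_\n" for @close_problems;
```

Because the pool is finite, an empty result is only evidence, never proof, that a comparison is an equivalence.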

Additionally, I don't think it ought to have any sort of distributive properties like $x ~~ @arr has. That sort of distribution should be handled at a higher level. (For example, the proposed caselist syntax of match/case.)

Because it only operates on pairs of scalars, this is already a much simpler kind of operator to think about. And because we can classify Perl scalar values into these five neat categories, we can already write down five simple rules for when both sides are of the same category:

UNDEF  undef ≡ undef is true
BOOL   $x ≡ $y is true if $x and $y are both true, or both false
STR    $x ≡ $y is true if $x eq $y
NUM    $x ≡ $y is true if $x == $y
REF    $x ≡ $y is true if refaddr($x) == refaddr($y)

I'd also like to suggest a rule that, given any pair of scalars of different categories, the result is always false. This means, in particular, that undef is never ≡ to any defined value (but never warns), that no boolean is ever ≡ to any non-boolean, and that no reference is ever ≡ to any non-reference. I don't think anyone would argue with that.
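
Put together, the five same-category rules plus "different categories are never equivalent" fit in a few lines on perl 5.36. `equiv` is a hypothetical function standing in for the proposed `≡` operator. The classification relies on the experimental `builtin` functions, and this version deliberately has no STR/NUM mixing yet.

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(is_bool created_as_string created_as_number refaddr);

sub category ($v) {
    return 'UNDEF' if !defined $v;
    return 'REF'   if ref $v;
    return 'BOOL'  if is_bool($v);
    return 'STR'   if created_as_string($v);
    return 'NUM'   if created_as_number($v);
    die "value does not fit the five-category model\n";
}

# Hypothetical stand-in for the proposed $x ≡ $y (no STR/NUM mixing yet)
sub equiv ($x, $y) {
    my ($cx, $cy) = (category($x), category($y));
    return !!0 if $cx ne $cy;                 # different categories: false, and no warning
    return !!1 if $cx eq 'UNDEF';             # undef ≡ undef
    return !($x xor $y) if $cx eq 'BOOL';     # both true or both false
    return $x eq $y if $cx eq 'STR';
    return $x == $y if $cx eq 'NUM';
    return refaddr($x) == refaddr($y);        # REF: the very same referent
}

my $aref = [];
say equiv(undef, "") ? "equivalent" : "distinct";   # distinct: UNDEF vs STR
```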

Already this operator feels useful: because it neatly handles undef as distinct from any number or string, we no longer need the equ or === operators.

The one problem I have with this whole model is what to do about STR ≡ NUM: how should we handle code like the following?

my $x = "10";
say "Equivalent" if $x ≡ 10;

By my first suggestion, this would always be false. While that's predictable and simple, I don't think it's very useful. It would mean that whenever you want to, e.g., perform a numerical case comparison on a value taken from @ARGV, you always have to "numify" it first by writing some ugly code like:

match(0 + $ARGV[0] : ≡) {
  case(1) { ... }
}

This does not feel very perlish.
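
On perl 5.36 you can watch that numification happen with `builtin::created_as_number`. In the sketch below, `$arg` is a string literal standing in for `$ARGV[0]`, so the example doesn't depend on real command-line arguments. The original value stays a string even after being used numerically, while `0 + ...` produces a fresh value that was created as a number.

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(created_as_string created_as_number);

my $arg = "10";        # stands in for $ARGV[0], which always arrives as a string
my $num = 0 + $arg;    # the "ugly" numification from the match above

say created_as_string($arg) ? "arg: string" : "arg: not a string";
say created_as_number($num) ? "num: number" : "num: not a number";
```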

So maybe we can find a more useful handling of STR vs NUM. I can already think of several bad ideas:

  • Pick the category on the righthand side
    Superficially this feels beneficial to the match/case syntax, but it soon falls down in a lot of other scenarios. Plus it is blatantly not symmetric, which we already decided any good equality test operator ought to be.
  • The operator throws an exception
    This doesn't feel like the right way to go. Because mismatches involving UNDEF, BOOL and REF already neatly yield false, you can safely mix strings/numbers and undef in match/case labels, for example, and all is handled nicely. To have NUM-vs-UNDEF yield false but NUM-vs-STR throw an exception feels like a bad model. Plus it would not be transitive.
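
The asymmetry of the first idea is easy to demonstrate. The sketch below defines a hypothetical `rhs_equiv` and needs perl 5.36 for `created_as_number`. It compares as numbers when the right-hand operand was created as a number, and as strings otherwise. Swapping the operands changes the answer.

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(created_as_number);

# Hypothetical "pick the category on the right-hand side" rule,
# shown for defined, non-reference operands only
sub rhs_equiv ($x, $y) {
    no warnings 'numeric';    # e.g. "abc" == 0 would otherwise warn
    return created_as_number($y) ? $x == $y : $x eq $y;
}

say rhs_equiv("10.0", 10) ? "true" : "false";   # right side is NUM: compares "10.0" == 10
say rhs_equiv(10, "10.0") ? "true" : "false";   # right side is STR: compares "10" eq "10.0"
```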

About the only sensible model I can think of for this mixed case is to say that

NUM ≡ STR  is true if both `eq` and `==` would say true

It's reflexive and symmetric. It feels useful. It does (what most people would argue is) the right thing for "10" ≡ 10.
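
Written out as code, the proposed mixed-category rule is very small. This is a minimal sketch that assumes the caller has already established that one operand was created as a string and the other as a number; `mixed_equiv` is a hypothetical name. Since `eq` and `==` are each symmetric, the rule is symmetric regardless of argument order.

```perl
use strict;
use warnings;

# Hypothetical STR-vs-NUM rule: equivalent only if eq and == both agree.
# Intended for one operand created as a string and the other as a number.
sub mixed_equiv {
    my ($str, $num) = @_;
    return $str eq $num && $str == $num;   # eq is tested first; == only runs if the strings match
}

for my $s ("10", "10.0", "abc") {
    print qq("$s" vs 10: ), (mixed_equiv($s, 10) ? "equivalent" : "distinct"), "\n";
}
```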

Still, something at the back of my mind feels wrong about this design for an operator: some situation in which it will be Obviously Terrible, and thus bring the whole tower crashing down. Perhaps it isn't truly transitive; there might be some set of values for which it fails. Offhand I can't think of one, but maybe someone can find an example?
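
One way to go hunting for such a set of values is brute force: implement the whole proposal, including the "both eq and ==" mixed rule, and test every triple from a pool of awkward values. The sketch below assumes perl 5.36, for the experimental `builtin` classifiers. `category`, `equiv`, `show` and the contents of `@pool` are all illustrative choices, and the search can only report problems among the values it is given.

The pool includes the integer 1000000000000000 alongside the float 1e15 and the string "1e+15". Perl normally prints that float in exponent form while the integer prints in full, so these seemed a likely place to look. `show` receives a copy of each value, so printing a number never changes the original's category.

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(is_bool created_as_string created_as_number refaddr);

sub category ($v) {
    return 'UNDEF' if !defined $v;
    return 'REF'   if ref $v;
    return 'BOOL'  if is_bool($v);
    return 'STR'   if created_as_string($v);
    return 'NUM'   if created_as_number($v);
    die "value does not fit the five-category model\n";
}

# Hypothetical ≡, including the proposed "both eq and ==" rule for STR vs NUM
sub equiv ($x, $y) {
    my ($cx, $cy) = (category($x), category($y));
    if ($cx ne $cy) {
        my $mixed = "$cx/$cy" eq 'STR/NUM' || "$cx/$cy" eq 'NUM/STR';
        return $mixed ? ($x eq $y && $x == $y) : !!0;
    }
    return !!1 if $cx eq 'UNDEF';
    return !($x xor $y) if $cx eq 'BOOL';
    return $x eq $y if $cx eq 'STR';
    return $x == $y if $cx eq 'NUM';
    return refaddr($x) == refaddr($y);
}

# Label a value with its category; $v is a copy, so the pool is never modified
sub show ($v) { defined $v ? category($v) . ":$v" : 'UNDEF' }

my @pool = (undef, !!0, "", 0, "0", 10, "10", "10.0", 10.0,
            1000000000000000, 1e15, "1e+15");

my @violations;
for my $x (@pool) {
    for my $y (@pool) {
        next unless equiv($x, $y);
        for my $z (@pool) {
            push @violations, "not transitive: " . join(', ', map { show($_) } $x, $y, $z)
                if equiv($y, $z) && !equiv($x, $z);
        }
    }
}
say for @violations;
```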

It's a shame, because if we did happen to find an operator like this, then I think combined with match/case syntax it could go a long way towards providing a far better replacement for smartmatch + given/when and additionally solve a lot of other problems in Perl all in one go.

I'm sorry I don't have a more concrete and specific message to give here, other than that I've given (and will continue to give) this a lot of thought, and that I invite comment and ideas from others on how we might further it towards something that can really work in Perl.

Thanks all for listening.

9 comments:

  1. How about special defined case syntax?

    match($x : eq) {
    case(defined) { say "It was defined" }
    case("abc") { say "It was the string abc" }
    case("def") { say "It was the string def" }
    case($y) { say "It was whatever string the variable $y gives" }
    }

  2. What about `"10.0" ≡ 10`? A numeric `10` can't stringify to *both* "10" and "10.0", but intuitively these two values are the same.

  3. Having to numify a string passed through @ARGV is not that big a loss, IMO, if the equivalence operator has so many other benefits. If this perlishness was needed, you could always fall back to ==.

  4. Idle thought, the post mentions this:
    STR ≡ NUM ≝ STR eq NUM and STR == NUM

    So how about this:
    STR ≡ NUM ≝ STR eq NUM or STR == NUM

    The latter looks like it would solve the `"10.0" ≡ 10` case. What would it break though?

  5. Sorry, I meant

    match($x : eq) {
    case_undef { say "It was undefined" }
    case("abc") { say "It was the string abc" }
    case("def") { say "It was the string def" }
    case($y) { say "It was whatever string the variable $y gives" }
    }

    case(defined) was meaningless.

  6. Hi LeoNerd,

    Thanks for opening this discussion -- I would love to see a sensible, modest form of smartmatch in Perl that doesn't have the original problems of v5.10.1 but still offers something better than just deprecating it.

    There are several points in your proposal:

    1. ≡ operator: great idea, I love it, well thought out, excellent compromise!

    2. match/case: here I'm disappointed because it's less flexible than given/when: it forces all cases to be compared through the same operator. I totally agree with the idea of dropping distributive behaviour and just dealing with scalars; but on scalars we might want several kinds of conditions: when (/regex/), when (length($_) < 4), when('foobar'), etc.

    3. elem/equ/etc operators: well, OK, it saves a few characters of typing, but the gain is quite small compared to the more general way of expressing this as: any {$x eq $_} @list, or any {$x == $_} @list, or any {defined $_ and ...} @list, etc. It seems to me we don't really need new operators for this.

  7. TL;DR: It's time to stop trying to reinvent "case" in Perl.
    Long story:
    https://www.reddit.com/r/perl/comments/vkzawl/comment/idyjrbw/?utm_source=share&utm_medium=web2x&context=3

  8. Thank you for putting some very good thought behind this. I've always appreciated that smartmatch/when were doing the right thing when dealing with scalars only, and were always confusing when dealing with containers (arrays/hashes/objects). Your solution captures that correctly.

    I'll say that I like given() better than any other word to dictate the focus of the branch of logic, as it feels more appropriate for Perl. With that being said, I like the *functionality* more, so I'll take whatever keyword it needs to be.

    As for containers, it seems that there could be some keywords other than when() to try and express the right way to delve into them. Perhaps something like within() to search for a value in a container, having() to search for a key within a container, matches() for a regex, is()/isnot() for functions/code; not sure what keyword to suggest with objects.

    --Bob K

  9. We already have implicit matching with regex using the // operator against the $_ variable.

    Would it be feasible to do the same with numeric comparisons? e.g.:

    match($value){
    when(/regex/) {...}; #we can implicitly match against $_
    when(>10){ ...}; #so why not this,
    when(10<){...}; #or this,
    when($_ > 10){...}; #instead of this?
    }

    Also what about having two different 'matches' (with better names than my example below).

    First one that does the existing perl string/number/undef conversions:

    classic_match($value){
    when(undef){}; #Test for definedness of $value
    when(22){}; #numerify and ==. undef value becomes 0
    when(/23/){}; #stringify value then regex match. undef value becomes empty string
    when("df"){}; #Stringify $value and eq. undef value becomes empty string
    when(<10){}; #numerify $value and perform <. undef becomes 0
    default {};
    }

    A second one that uses new boolean logic and does not convert undef to empty strings or 0s

    new_match($value){
    when(undef){} #Test for definedness
    when(22){}; #numerify $value and perform == . undef NEVER matches
    when(/23/){}; #stringify value then regex match. undef NEVER matches
    when("df"){}; #Stringify $value and eq. undef NEVER matches
    when(<10){}; #numerify $value and perform <. undef NEVER matches
    default {};
    }

    This would be in addition to the nice new operators proposed.

    Cheers

    --Ruben