tag:blogger.com,1999:blog-91125603382915743602024-03-13T15:48:06.225+00:00LeoNerd's programming thoughtsMy thoughts, ideas, and sometimes rants, on Perl, C, Linux, terminals,...LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.comBlogger162125tag:blogger.com,1999:blog-9112560338291574360.post-69636045216519438052023-08-31T12:34:00.000+01:002023-08-31T12:34:21.598+01:00 Building for new ATtiny 2-series chips on Debian <p>I have previously written about <a href="http://leonerds-code.blogspot.com/2019/06/building-for-new-attiny-1-series-chips.html">how to build code for the ATtiny 1-series chips on Debian</a>, outlining which files are missing from Debian in order to allow this. It seems, three years on, the same stuff is still missing - and more so now that the new 2-series chips are available. Here, then, are some more instructions on top of that to get code working for these newer chips as well.</p>
<p>As before, start off by downloading the "Atmel ATtiny Series Device Support" file from <a href="http://packs.download.atmel.com/">http://packs.download.atmel.com/</a>. This is a free and open download, licensed under Apache v2. This file carries the extension <tt>atpack</tt> but it's actually just a ZIP file.</p>
<p>Note that by default it'll unpack into the working directory, so you'll want to create a temporary folder to work in:</p>
<pre>$ mkdir pack
$ cd pack/
$ unzip ~/Atmel.ATtiny_DFP.2.0.368.atpack
Archive: /home/leo/Atmel.ATtiny_DFP.2.0.368.atpack
creating: atdf/
creating: avrasm/
creating: avrasm/inc/
...</pre>
<p>From here, you can now copy the relevant files out to where <tt>avr-gcc</tt> will find them:</p>
<pre>$ sudo cp include/avr/iotn?*2[467].h \
/usr/lib/avr/include/avr/
$ sudo cp gcc/dev/attiny?*2[467]/avrxmega3/*.{o,a} \
/usr/lib/avr/lib/avrxmega3/
$ sudo cp gcc/dev/attiny?*2[467]/avrxmega3/short-calls/*.{o,a} \
/usr/lib/avr/lib/avrxmega3/short-calls/</pre>
<p>Unlike last time, you'll also need the <tt>device-specs</tt> files for <tt>avr-gcc</tt> itself to understand the new chips. You'll have to find the exact path on your system where the existing ones are, and then copy the new ones in there:</p>
<pre>$ dpkg -S specs-atmega328
gcc-avr: /usr/lib/gcc/avr/5.4.0/device-specs/specs-atmega328
# So it appears to be /usr/lib/gcc/avr/5.4.0/device-specs
$ sudo cp gcc/dev/attiny?*2[467]/device-specs/* \
/usr/lib/gcc/avr/5.4.0/device-specs/</pre>
<p>Finally, there's one last task that needs doing. Locate the main <tt>avr/io.h</tt> file (it should live in <tt>/usr/lib/avr/include</tt>) and add the following lines somewhere within the main block of similar lines. These are needed to redirect from the toplevel <tt>#include <avr/io.h></tt> towards the device-specific file.</p>
<pre>#elif defined (__AVR_ATtiny424__)
# include <avr/iotn424.h>
#elif defined (__AVR_ATtiny426__)
# include <avr/iotn426.h>
#elif defined (__AVR_ATtiny427__)
# include <avr/iotn427.h>
#elif defined (__AVR_ATtiny824__)
# include <avr/iotn824.h>
#elif defined (__AVR_ATtiny826__)
# include <avr/iotn826.h>
#elif defined (__AVR_ATtiny827__)
# include <avr/iotn827.h>
#elif defined (__AVR_ATtiny1624__)
# include <avr/iotn1624.h>
#elif defined (__AVR_ATtiny1626__)
# include <avr/iotn1626.h>
#elif defined (__AVR_ATtiny1627__)
# include <avr/iotn1627.h>
#elif defined (__AVR_ATtiny3224__)
# include <avr/iotn3224.h>
#elif defined (__AVR_ATtiny3226__)
# include <avr/iotn3226.h>
#elif defined (__AVR_ATtiny3227__)
# include <avr/iotn3227.h></pre>
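<p>Typing out twelve near-identical pairs of lines by hand is error-prone, so here is a minimal sketch of a Perl script that prints them for pasting into <tt>avr/io.h</tt>. It assumes the part numbers follow the pattern visible in the list above (flash size in KiB, then <tt>2</tt>, then a pin-count digit); the function name <tt>io_h_lines</tt> is just an illustrative choice.</p>

```perl
use strict;
use warnings;

# Flash sizes (KiB) and pin-count digits of the 2-series parts named in
# the list above; adjust these if your pack contains other devices.
my @flash = (4, 8, 16, 32);
my @pins  = (4, 6, 7);

# Returns the preprocessor lines to paste into avr/io.h
sub io_h_lines {
    my @lines;
    for my $flash (@flash) {
        for my $pin (@pins) {
            my $chip = "${flash}2${pin}";    # e.g. "824" or "1627"
            push @lines,
                "#elif defined (__AVR_ATtiny${chip}__)",
                "#  include <avr/iotn${chip}.h>";
        }
    }
    return @lines;
}

print "$_\n" for io_h_lines();
```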
<p>Having done this we find we can now compile firmware for these new chips:</p>
<pre>avr-gcc -std=gnu99 -Wall -Os -DF_CPU=20000000 -mmcu=attiny824 -flto -ffunction-sections -fshort-enums -o .build/firmware_t824.elf src/main.c
avr-size .build/firmware_t824.elf
text data bss dec hex filename
3054 24 9 3087 c0f .build/firmware_t824.elf
avr-objcopy -j .text -j .rodata -j .data -O ihex .build/firmware_t824.elf firmware_t824-flash.hex</pre>
<p>Keep an eye on <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930195">the Debian bug #930195</a>, as hopefully one day these steps will no longer be necessary.</p>
2022-06-25T21:08:00.000+01:00 2022-06-25T21:08:08.471+01:00 A troubling thought - smartmatch reïmagined<blockquote>
Preface: This is less a concrete idea, and more a rambling set of thoughts that lead me to a somewhat awkward place. I'm writing it out here in the hope that others can lend suggestions and ideas, and see if we can arrive at a better place.
</blockquote>
<p>I've been thinking about comparison operators lately - somewhat in the context of my new <a href="https://metacpan.org/pod/Syntax::Operator::Elem"><tt>Syntax::Operator::Elem</tt></a> / <a href="https://metacpan.org/pod/Syntax::Operator::In"><tt>Syntax::Operator::In</tt></a> module, somewhat in the context of smartmatch and the planned deprecations thereof, and partly in the context of my new <tt>match/case</tt> syntax.</p>
<h2>Smartmatch Deprecations</h2>
<p>For years now, smartmatch has been an annoying thorny design, and recently we've started making moves to get rid of it. In my mind at least, this is because it has a large and complex behaviour that is often unpredictable in advance. There are two distinct reasons for this:</p>
<ol>
<li>It tries very hard to (recursively) distribute itself across container values on either side; saying that <tt>$x ~~ @y</tt> is true if <tt>any { $x ~~ $_ } @y</tt> for example; sometimes in ways that are surprising (e.g. how do you compare an array with a hash?)</li>
<li>It acts unpredictably with mixed strings or numbers; because those concepts are very fluid in perl and aren't well-defined</li>
</ol>
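<p>To make the second point concrete, here is a small illustration (not taken from the smartmatch implementation itself) of how the same pair of values can be "equal" or "not equal" depending purely on which comparison is applied. An operator that has to guess between the two is inevitably surprising some of the time.</p>

```perl
use strict;
use warnings;

my $s = "1.0";
my $n = 1;

# Numeric comparison converts both sides to numbers: 1.0 == 1
my $num_equal = ($s == $n) ? 1 : 0;

# String comparison converts both sides to strings: "1.0" ne "1"
my $str_equal = ($s eq $n) ? 1 : 0;

# prints "numerically equal: 1, stringwise equal: 0"
print "numerically equal: $num_equal, stringwise equal: $str_equal\n";
```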
<h2><tt>match/case</tt> and New Infix Operators</h2>
<p>I've lately been writing up some ideas for new infix operators that Perl might want; partly because they're useful on their own but also because they're useful combined with the <tt>match/case</tt> syntax provided by <a href="https://metacpan.org/pod/Syntax::Keyword::Match"><tt>Syntax::Keyword::Match</tt></a>. Between them all, these are intended as a replacement for the <tt>given/when</tt> syntax and its troublesome smartmatch. For example, to select an option based on a string comparison you can write:</p>
<pre>
match($x : eq) {
    case("abc") { say "It was the string abc" }
    case("def") { say "It was the string def" }
    case($y)    { say "It was whatever string the variable \$y gives" }
}
</pre>
<p>This is much more predictable than <tt>given/when</tt> and smartmatch, because the programmer declared right upfront that the <tt>eq</tt> operator is being used here; there's no smartmatch involved.</p>
<p>Initially this feels like a great improvement on <tt>given/when</tt> and <tt>~~</tt>, but it has lots of tricky cornercases to it. For example, the <tt>given/when</tt> approach can easily handle <tt>undef</tt>, whereas <tt>match/case</tt> using only the <tt>eq</tt> operator cannot distinguish <tt>undef</tt> from <tt>""</tt>. For this reason, I invented a new infix operator, called <tt>equ</tt> (provided by <a href="https://metacpan.org/pod/Syntax::Operator::Equ"><tt>Syntax::Operator::Equ</tt></a>), which can:</p>
<pre>
say "Equal" if $x equ $y; # true if they're both undef, or both
# defined and the same string
match($x : equ) {
    # these two cases are now distinct
    case(undef) { say "It was undefined" }
    case("")    { say "It was the empty string" }
    default     { say "It was something else" }
}
</pre>
<p>Plus of course it also defines a new <tt>===</tt> operator which performs the numerical equivalent, able to distinguish <tt>undef</tt> from zero.</p>
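<p>For readers without the module installed, the behaviour described above can be approximated in plain Perl. This is only a sketch of the semantics as described in this post; the function names <tt>equ_str</tt> and <tt>equ_num</tt> are made up for illustration, whereas the real module provides infix operators.</p>

```perl
use strict;
use warnings;

# Stringy version: undef only ever matches undef
sub equ_str {
    my ($x, $y) = @_;
    return !defined $y if !defined $x;
    return defined $y && $x eq $y;
}

# Numeric version: as above, but compares defined values with ==
sub equ_num {
    my ($x, $y) = @_;
    return !defined $y if !defined $x;
    return defined $y && $x == $y;
}

printf "%s\n", equ_str(undef, "")    ? "equal" : "different";   # different
printf "%s\n", equ_str(undef, undef) ? "equal" : "different";   # equal
printf "%s\n", equ_num(0, undef)     ? "equal" : "different";   # different
```

<p>Note that neither helper ever triggers an "uninitialized value" warning, because the definedness checks happen before any comparison.</p>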
<h2>Syntax::Operator::Elem</h2>
<p>Another operator I felt was required was one that can test if a given string (or number) is present in a list. For that, I wrote <a href="https://metacpan.org/pod/Syntax::Operator::Elem"><tt>Syntax::Operator::Elem</tt></a>:</p>
<pre>
say "Present" if $x elem @ys; # stringy
say "Present" if $x ∈ @ys; # numerical
</pre>
<p>(Yes, that really is an operator spelled with a non-ASCII Unicode character. No I will not apologise :P)</p>
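<p>In terms of existing tools, these two operators behave roughly like a call to <tt>any</tt> from <tt>List::Util</tt> with the appropriate comparison inside. The sketch below uses the hypothetical function names <tt>elem_str</tt> and <tt>elem_num</tt> to stand in for the infix forms.</p>

```perl
use strict;
use warnings;
use List::Util qw(any);

# Stringy membership test, comparing with eq
sub elem_str { my ($x, @ys) = @_; return any { $x eq $_ } @ys; }

# Numerical membership test, comparing with ==
sub elem_num { my ($x, @ys) = @_; return any { $x == $_ } @ys; }

print "Present\n" if elem_str("b", qw(a b c));
print "Present\n" if elem_num(2.0, 1, 2, 3);
```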
<p>These operators too have the "oops, undef" problem about them - which led me briefly to consider adding two more that consider undef/"" or undef/zero to be distinct. Maybe I'd call them <tt>elemu</tt> and ... er... well, Unicode doesn't have a variant of the ∈ operator that can suggest undefness. It was at about that point that I stopped, and wondered whether we're really going about this whole thing the right way at all.</p>
<h2>Smartmatch Reïmagined</h2>
<p>I begin to think that if we go right back to the beginning, we might find that a huge chunk of this is unnecessary, if only we can find a better model.</p>
<p>During the 5.35 development series and now released in 5.36, Perl core has two improvements to what some might call its "type system":</p>
<ul>
<li>Real booleans - true and false are now first-class values distinct from 1 and zero/emptystring.</li>
<li>Tracking of whether defined, nonboolean, nonreferential values began as strings or numbers; even if they have since evolved to effectively be both.</li>
</ul>
<p>It is now possible to classify any given scalar value into <em>exactly</em> one of the following five categories:</p>
<table border=1>
<tr><td>undef</td></tr>
<tr><td>boolean</td></tr>
<tr><td>initially string</td></tr>
<tr><td>initially number</td></tr>
<tr><td>reference</td></tr>
</table>
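<p>As a rough sketch of how such a classification could be written, the following uses functions from the <tt>builtin</tt> namespace. It assumes a released perl 5.36 or later, where the boolean test is spelled <tt>is_bool</tt> and the functions are still marked experimental; the name <tt>category</tt> is purely illustrative.</p>

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(is_bool created_as_number);

# Returns exactly one of the five category names for any scalar
sub category ($v) {
    return 'undef'     if !defined $v;
    return 'boolean'   if is_bool($v);
    return 'reference' if ref $v;
    return created_as_number($v) ? 'initially number' : 'initially string';
}

say category($_) for undef, !!1, "10", 10, [];
```

<p>The order of the checks matters: booleans and references are filtered out first, so the final string-or-number test only ever sees defined, nonboolean, nonreferential values.</p>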
<p>I start to wonder whether, therefore, we have enough basis to create a better version of what the smartmatch operator tried (but ultimately failed) to be. For sake of argument, since I've already used one Unicode symbol I'm going to use another for this new one: The triple-bar identity symbol, ≡.</p>
<p>Let's consider a few properties this ought to have. First off, it should be well-behaved as an equality operator; it should be reflexive, symmetric and transitive. That is, given any values <tt>$x</tt>, <tt>$y</tt> and <tt>$z</tt>, all three of the following must always hold:</p>
<pre>
$x ≡ $x is true # reflexive
$x ≡ $y is the same as $y ≡ $x # symmetric
if $x ≡ $y and $y ≡ $z then $x ≡ $z # transitive
</pre>
<p>Additionally, I don't think it ought to have any sort of distributive properties like <tt>$x ~~ @arr</tt> has. That sort of distribution should be handled at a higher level. (For example, the proposed <a href="https://rt.cpan.org/Ticket/Display.html?id=143482"><tt>caselist</tt></a> syntax of <tt>match/case</tt>.)</p>
<p>Because it only operates on pairs of scalars, this is already a much simpler kind of operator to think about. Because we can classify perl scalar values into these five neat categories, we can already write down five simple rules for when both sides are given the same category of scalar:</p>
<table border=1>
<tr>
<td>UNDEF</td>
<td>undef ≡ undef</td>
<td>is true</td>
</tr>
<tr>
<td>BOOL</td>
<td>$x ≡ $y</td>
<td>is true if $x and $y are both true, or both false</td>
</tr>
<tr>
<td>STR</td>
<td>$x ≡ $y</td>
<td>is true if $x eq $y</td>
</tr>
<tr>
<td>NUM</td>
<td>$x ≡ $y</td>
<td>is true if $x == $y</td>
</tr>
<tr>
<td>REF</td>
<td>$x ≡ $y</td>
<td>is true if refaddr($x) == refaddr($y)</td>
</tr>
</table>
<p>I'd also like to suggest a rule that given any pair of scalars of different categories, the result is <em>always</em> false. This means in particular, that <tt>undef</tt> is never ≡ to any defined value (but never warns), that no boolean is ever ≡ to any non-boolean, and no reference is ever ≡ to any non-reference. I don't think anyone would argue with that.</p>
<p>Already this operator feels useful: because it neatly handles <tt>undef</tt> as distinct from any number or string, we no longer need the <tt>equ</tt> or <tt>===</tt> operators.</p>
<p>The one problem I have with this whole model is what do we do with <tt>STR ≡ NUM</tt>; how do we handle code like the following:</p>
<pre>
my $x = "10";
say "Equivalent" if $x ≡ 10;
</pre>
<p>By my first suggestion, this would always be false. While it's predictable and simple, I don't think it's very useful. It would mean that whenever you want to e.g. perform a numerical case comparison on a value taken from <tt>@ARGV</tt>, you always have to "numify" it by doing some ugly code like:</p>
<pre>
match(0 + $ARGV[0] : ≡) {
    case(1) { ... }
}
</pre>
<p>This does not feel very perlish.</p>
<p>So maybe we can find a more useful handling of STR vs NUM. I can already think of several bad ideas:</p>
<ul>
<li>Pick the category on the righthand side<br/>
Superficially this feels beneficial to the <tt>match/case</tt> syntax, but it soon falls down in a lot of other scenarios. Plus it is blatantly not symmetric, which we already decided any good equality test operator ought to be.</li>
<li>The operator throws an exception<br/>
This doesn't feel like the right way to go. Having things like UNDEF, BOOL and REF already neatly just yield false, means that you can safely mix strings/numbers and <tt>undef</tt> in <tt>match/case</tt> labels, for example, and all is handled nicely. To have NUM-vs-UNDEF yield false but NUM-vs-STR throw an exception feels like a bad model. Plus it would not be transitive.</li>
</ul>
<p>About the only sensible model I can think of in this mixed case, is to say that</p>
<pre>
NUM ≡ STR is true if both `eq` and `==` would say true
</pre>
<p>It's reflexive and symmetric. It feels useful. It does (what most people would argue is) the right thing for <tt>"10" ≡ 10</tt>.</p>
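<p>Putting the same-category rules, the "different categories are never equivalent" rule, and this mixed STR/NUM rule together, the whole proposal can be sketched as an ordinary function. This is only an experiment to play with the idea, not an implementation of any real operator; it assumes perl 5.36 or later for the <tt>builtin</tt> functions, and <tt>classify</tt> and <tt>equiv</tt> are made-up names.</p>

```perl
use v5.36;
no warnings 'experimental::builtin';
use builtin qw(is_bool created_as_number refaddr);

sub classify ($v) {
    return 'UNDEF' if !defined $v;
    return 'BOOL'  if is_bool($v);
    return 'REF'   if ref $v;
    return created_as_number($v) ? 'NUM' : 'STR';
}

sub equiv ($x, $y) {
    my ($cx, $cy) = (classify($x), classify($y));

    if ($cx eq $cy) {
        return !!1           if $cx eq 'UNDEF';
        return !!$x == !!$y  if $cx eq 'BOOL';
        return $x eq $y      if $cx eq 'STR';
        return $x == $y      if $cx eq 'NUM';
        return refaddr($x) == refaddr($y);      # REF
    }

    # Mixed STR/NUM: true only if both eq and == would say true
    if ("$cx$cy" =~ /^(?:STRNUM|NUMSTR)$/) {
        no warnings 'numeric';
        return $x eq $y && $x == $y;
    }

    # Any other mix of categories is never equivalent
    return !!0;
}

say equiv("10", 10)   ? "Equivalent" : "Different";   # Equivalent
say equiv("10.0", 10) ? "Equivalent" : "Different";   # Different
say equiv(undef, "")  ? "Equivalent" : "Different";   # Different
```

<p>A function like this makes it easy to throw lists of candidate values at the rules and check pairs and triples of them for symmetry and transitivity.</p>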
<p>Still, something at the back of my mind feels wrong about this design for an operator. There may be some situation in which it will be Obviously Terrible, and thus bring the whole tower crashing down. Perhaps it isn't truly transitive - there might be some set of values for which it fails. Offhand I can't think of one, but maybe someone can find an example?</p>
<p>It's a shame, because if we did happen to find an operator like this, then I think combined with <tt>match/case</tt> syntax it could go a long way towards providing a far better replacement for <tt>smartmatch</tt> + <tt>given/when</tt> and additionally solve a lot of other problems in Perl all in one go.</p>
<p>I'm sorry I don't have a more concrete and specific message to give here, other than that I've given (and will continue to give) this a lot of thought, and that I invite comment and ideas from others on how we might further it towards something that can really work in Perl.</p>
<p>Thanks all for listening.</p>
2022-01-26T15:34:00.002+00:00 2022-02-04T13:29:33.249+00:00 Perl in 2022 - A Yearly Update<p>At the end of 2020, I wrote <a href="/2020/12/2020-perl-advent-calendar-day-1.html">a series of articles</a> on the subject of recent CPAN modules that provide useful syntax, or recent core features added to perl. The series ended with <a href="/2020/12/2020-perl-advent-calendar-day-25.html">a bonus post</a> looking forward to imagine what new additions might one day appear. I followed this up with a video-based talk at FOSDEM, titled <a href="https://archive.fosdem.org/2021/schedule/event/perl_in_2025/">"Perl in 2025"</a>, with yet more ideas considering how Perl might look in a few more years' time.</p>
<p>Over the past twelve months, I have made progress on several of these ideas. Four of them have already become CPAN modules and thus are available for writing in Perl in 2022:</p>
<ul>
<li><b>match/case</b> - Now available as <a href="https://metacpan.org/pod/Syntax::Keyword::Match"><tt>Syntax::Keyword::Match</tt></a>.
<pre>
match($n : ==) {
    case(1) { say "It's one" }
    case(2) { say "It's two" }
    case(3) { say "It's three" }
}</pre></li>
<li><b>any, all</b> - Now available as syntax-level keywords from <a href="https://metacpan.org/pod/List::Keywords"><tt>List::Keywords</tt></a>.
<pre>
if( any { $_->size > 100 } @boxes ) {
    say "There are some large boxes here";
}</pre></li>
<li><b>multi sub</b> - An early experiment in <a href="https://metacpan.org/pod/Syntax::Keyword::MultiSub"><tt>Syntax::Keyword::MultiSub</tt></a>.
<pre>
multi sub max()          { return undef; }
multi sub max($x)        { return $x; }
multi sub max($x, @more) { my $y = max(@more);
                           return $x > $y ? $x : $y; }</pre></li>
<li><b>equ, ===</b> - Available from <a href="https://metacpan.org/pod/Syntax::Operator::Equ"><tt>Syntax::Operator::Equ</tt></a>, though at present it is only usable via <tt>Syntax::Keyword::Match</tt> or a specially-patched version of perl.
<pre>
if($x equ $y) {
    say "Both are undef, or defined and equal strings";
}
if($i === $j) {
    say "Both are undef, or defined and equal numbers";
}</pre></li>
</ul>
<p>Of the rest:</p>
<ul>
<li><b>in</b> - I have the beginnings of some code but it's not yet on CPAN as it again requires a patched version of perl for pluggable infix operators.</li>
<li><b>let</b> and <b>is</b> - not started yet.</li>
</ul>
<p>In addition, not mentioned in the original article, the latest development version of perl has gained:</p>
<ul>
<li><b>defer</b> blocks.
<pre>
{
    say "This happens first";
    defer { say "This happens last"; }
    say "And this happens inbetween";
}</pre></li>
<li><b>finally</b> as part of <b>try/catch</b>.
<pre>
try {
    say "This happens first";
}
catch ($e) {
    say "Oops, it failed";
}
finally {
    say "This happens last in either case";
}</pre></li>
<li>The <b>builtin::</b> namespace, providing many new utility functions that ought to have been considered part of the core language - copying utilities from places like <tt>Scalar::Util</tt> and <tt>POSIX</tt>, as well as providing some new ones.
<pre>
say "The refaddr of my object is ", builtin::refaddr($obj);
use builtin 'ceil';
say "The next integer above the value is ", ceil($value);</pre></li>
<li>Real <b>boolean</b> values. These will be useful in many places, such as data serialisation and cross-language conversion modules.
<pre>
use builtin qw(true false isbool);
sub serialise($v) {
    return $v ? 'true' : 'false' if isbool $v;
    return qq("$v");
}
say join ",", map { serialise($_) }
    0, 1, false, true, 'true';</pre></li>
</ul>
<p>Overall I'm happy with progress so far. A lot of things have been started, laying much of the groundwork for more work that can follow. Behind the scenes all of these syntax modules are now using the <a href="https://metacpan.org/pod/XS::Parse::Keyword"><tt>XS::Parse::Keyword</tt></a> module to do the bulk of their parsing. This is great for getting something powerful written quickly, and has good properties in terms of interoperability between modules - for example, the way the new infix operators already work with the <tt>match/case</tt> syntax.</p>
<p>Core perl is on track for a summer release as usual; hopefully that will provide the new <tt>defer</tt> and <tt>finally</tt> syntax, <tt>builtin</tt> functions and boolean values. I hope to have as much success in 2022 as I did in 2021 at writing more of these things, and with any luck I'll be able to write another article like this next year explaining what new progress has been achieved towards the Perl in 2025 goal.</p>
2021-07-30T12:35:00.001+01:00 2021-07-30T12:35:21.221+01:00 Perl UV binding hits version 2.000<p>Over the past few months I've been working on finishing off the <a href="https://github.com/libuv/libuv"><tt>libuv</tt></a> Perl binding module, <a href="https://metacpan.org/pod/UV">UV</a>. Yesterday I finally got it finished enough to feel like calling it version 2.000. Now's a good time to take a look at it.</p>
<p><tt>libuv</tt> itself is a cross-platform event handling library, which focuses on providing nicely portable abstractions for things like TCP sockets, timers, and sub-process management between UNIX, Windows and other platforms. Traditionally things like event-based socket handling have always been difficult to write in a portable way between Windows and other places due to the very different ways things work on Windows as opposed to anywhere else. <tt>libuv</tt> provides a large number of helpful wrappers to write event-based code in a portable way, freeing the developer from having to care about these things.</p>
<p>A number of languages have nice bindings for <tt>libuv</tt>, but until recently there wasn't a good one for Perl. My latest project for The Perl Foundation aimed to fix this. The latest release of UV version 2.000 indicates that this is now done.</p>
<p>It's unlikely that most programs would choose to operate directly with UV itself, but rather via some higher-level event system. There are UV adapter modules for <a href="https://metacpan.org/pod/IO::Async">IO::Async</a> (<a href="https://metacpan.org/pod/IO::Async::Loop::UV">IO::Async::Loop::UV</a>), <a href="https://metacpan.org/pod/Mojo::IOLoop">Mojo</a> (<a href="https://metacpan.org/pod/Mojo::Reactor::UV">Mojo::Reactor::UV</a>), and <a href="https://metacpan.org/pod/Future::IO">Future::IO</a> (<a href="https://metacpan.org/pod/Future::IO::Impl::UV">Future::IO::Impl::UV</a>) at least.</p>
<p>The UV module certainly wraps much of what <tt>libuv</tt> has to offer, but there are still some parts missing. <tt>libuv</tt> can <a href="http://docs.libuv.org/en/v1.x/fs_event.html">watch filesystems</a> for changes to files, and provides <a href="http://docs.libuv.org/en/v1.x/fs.html">asynchronous filesystem access functions</a> - both of these are currently missing from the Perl binding. <a href="http://docs.libuv.org/en/v1.x/threadpool.html">Threadpools</a> are an entire concept that doesn't map very well to the Perl language, so they are absent too. In addition, <tt>libuv</tt> lists an entire category of "<a href="http://docs.libuv.org/en/v1.x/misc.html">miscellaneous functions</a>", most of which are already available independently in Perl, so there seems little point to wrapping those provided by <tt>libuv</tt>.</p>
<p>Finally, we should take note of one thing that doesn't work - the <a href="https://metacpan.org/pod/UV::TCP#open"><tt>UV::TCP->open</tt></a> and <a href="https://metacpan.org/pod/UV::UDP#open"><tt>UV::UDP->open</tt></a> functions when running on Windows. The upshot here is that you cannot create TCP or UDP sockets in your application independently of <tt>libuv</tt> and then hand them over to be handled by the library; this is not permitted. This is because on Windows, there are fundamentally two different kinds of sockets that require two different sets of API to access them - ones using <tt>WSA_FLAG_OVERLAPPED</tt>, and ones not. <tt>libuv</tt> needs that flag in order to perform event-based IO on sockets, and so it won't work with sockets created without it - which is the usual kind that most other modules, and perl itself, will create. This means that on Windows, the only sockets you can use with the <tt>UV</tt> module are ones created by <tt>UV</tt> itself - such as by asking it to connect out to servers, or listen and accept incoming connections. Fortunately, this is sufficient for the vast majority of applications.</p>
<p>I would like to finish up by saying thanks to <a href="https://www.perlfoundation.org/">The Perl Foundation</a> for funding me to complete this project.</p>
2021-02-26T13:00:00.082+00:00 2021-02-26T13:00:03.551+00:00 Writing a Perl Core Feature - part 11: Core modules<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-10.html">< Prev</a></p>
<p>Our new feature is now implemented, tested, and documented. There's just one last thing we need to do - update the bundled modules that come with core. Specifically, because we've added some new syntax, we need to update <tt>B::Deparse</tt> to be able to deparse it.</p>
<p>When the <tt>isa</tt> operator was added, the deparse module needed to be informed about the new <tt>OP_ISA</tt> opcode, in this small addition: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-b5c60c7219d9cb0213f5568513fbc55b14b886c45104767fe47a9c1fe3352f89">(github.com/Perl/perl5)</a>.</p>
<pre>
--- a/lib/B/Deparse.pm
+++ b/lib/B/Deparse.pm
@@ -52,7 +52,7 @@ use B qw(class main_root main_start main_cv svref_2object opnumber perlstring
MDEREF_SHIFT
);
-$VERSION = '1.51';
+$VERSION = '1.52';
use strict;
our $AUTOLOAD;
use warnings ();
@@ -3060,6 +3060,8 @@ sub pp_sge { binop(@_, "ge", 15) }
sub pp_sle { binop(@_, "le", 15) }
sub pp_scmp { maybe_targmy(@_, \&binop, "cmp", 14) }
+sub pp_isa { binop(@_, "isa", 15) }
+
sub pp_sassign { binop(@_, "=", 7, SWAP_CHILDREN) }
sub pp_aassign { binop(@_, "=", 7, SWAP_CHILDREN | LIST_CONTEXT) }
</pre>
<p>As you can see it's quite a small addition here; we just need to add a new method to the main <tt>B::Deparse</tt> package named after the new opcode. This new method calls down to the common <tt>binop</tt> function which is shared by the various binary operators, and recurses down parts of the optree, returning a combined result using the <tt>"isa"</tt> string in between the two parts.</p>
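<p>The shape of this mechanism can be illustrated with a toy model. The sketch below is <em>not</em> how <tt>B::Deparse</tt> is actually implemented; it uses plain hashes instead of real optree objects, a made-up <tt>ToyDeparse</tt> package, and a much simplified <tt>binop</tt>. It only shows the pattern of one method per opcode name, each recursing into its operands and joining the results with the operator's text.</p>

```perl
use strict;
use warnings;

package ToyDeparse;

sub new { return bless {}, shift }

# Dispatch on the node's opcode name, in the style of pp_* methods
sub deparse {
    my ($self, $op, $cx) = @_;
    return $op unless ref $op;          # leaf: a variable or constant
    my $meth = "pp_" . $op->{name};
    return $self->$meth($op, $cx);
}

# Shared helper for binary operators: recurse into both sides, then
# join them around the operator name
sub binop {
    my ($self, $op, $cx, $opname, $prec) = @_;
    my $left  = $self->deparse($op->{left},  $prec);
    my $right = $self->deparse($op->{right}, $prec);
    my $text  = "$left $opname $right";
    # parenthesise when the surrounding context binds more tightly
    return $cx > $prec ? "($text)" : $text;
}

# One small method per opcode; precedence numbers are illustrative
sub pp_isa      { binop(@_, "isa", 15) }
sub pp_multiply { binop(@_, "*",   19) }

package main;

my $tree = {
    name  => 'multiply',
    left  => '$n',
    right => { name => 'isa', left => '$obj', right => 'Some::Class' },
};

# prints: $n * ($obj isa Some::Class)
print ToyDeparse->new->deparse($tree, 0), "\n";
```

<p>With this structure, supporting another binary operator really is a one-line addition, which is why the real commit for <tt>isa</tt> is so small.</p>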
<p>A more complex addition was made with the <tt>try</tt> syntax, as can be seen at <a href="https://github.com/Perl/perl5/commit/683e0651b057a7be4b2765ceb3d9f6617cd4c464#diff-b5c60c7219d9cb0213f5568513fbc55b14b886c45104767fe47a9c1fe3352f89">(github.com/Perl/perl5)</a>; abbreviated here:</p>
<pre>
+sub pp_leavetrycatch {
+ my $self = shift;
+ my ($op) = @_;
...
+ my $trycode = scopeop(0, $self, $tryblock);
+ my $catchvar = $self->padname($catch->targ);
+ my $catchcode = scopeop(0, $self, $catchblock);
+
+ return "try {\n\t$trycode\n\b}\n" .
+ "catch($catchvar) {\n\t$catchcode\n\b}\cK";
+}
</pre>
<p>As before, this adds a new method named after the new opcode (in the case of the <tt>try/catch</tt> syntax this is named <tt>OP_LEAVETRYCATCH</tt>). The body of this method too just recurses down to parts of the sub-tree it was passed; in this case being two scope ops for the bodies of the blocks, plus a lexical variable name for the catch variable. The method then again returns a new string combining the various parts together along with the required braces, linefeeds, and indentation hints.</p>
<p>We can tell we need to add this for our new <tt>banana</tt> feature, as currently this does not deparse properly:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -Mexperimental=banana -MO=Deparse -ce 'print ban "Hello, world" ana;'
unexpected OP_BANANA at lib/B/Deparse.pm line 1664.
BEGIN {${^WARNING_BITS} = "\x10\x01\x00\x00\x00\x50\x04\x00\x00\x00\x00\x00\x00\x55\x51\x55\x50\x51\x45\x00"}
use feature 'banana';
print XXX;
-e syntax OK
</pre>
<p>We'll fix this by adding a new <tt>pp_banana</tt> in an appropriate place, perhaps just after the ones for <tt>lc</tt>/<tt>uc</tt>/<tt>fc</tt>. Don't forget to bump the <tt>$VERSION</tt> number too:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ nvim lib/B/Deparse.pm
leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/lib/B/Deparse.pm b/lib/B/Deparse.pm
index 67147f12dd..f6039a435d 100644
--- a/lib/B/Deparse.pm
+++ b/lib/B/Deparse.pm
@@ -52,7 +52,7 @@ use B qw(class main_root main_start main_cv svref_2object opnumber perlstring
MDEREF_SHIFT
);
-$VERSION = '1.56';
+$VERSION = '1.57';
use strict;
our $AUTOLOAD;
use warnings ();
@@ -2824,6 +2824,13 @@ sub pp_lc { dq_unop(@_, "lc") }
sub pp_quotemeta { maybe_targmy(@_, \&dq_unop, "quotemeta") }
sub pp_fc { dq_unop(@_, "fc") }
+sub pp_banana {
+ my $self = shift;
+ my ($op, $cx) = @_;
+ my $kid = $op->first;
+ return "ban " . $self->deparse($kid, 1) . " ana";
+}
+
sub loopex {
my $self = shift;
my ($op, $cx, $name) = @_;
</pre>
<p>This new function recurses down to <tt>deparse</tt> for the subtree, and returns a new string wrapped in the appropriate syntax for it. That should be all we need:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -Mexperimental=banana -MO=Deparse -ce 'print ban "Hello, world" ana;'
BEGIN {${^WARNING_BITS} = "\x10\x01\x00\x00\x00\x50\x04\x00\x00\x00\x00\x00\x00\x55\x51\x55\x50\x51\x45\x00"}
use feature 'banana';
print ban 'Hello, world' ana;
-e syntax OK
</pre>
<p>Of course, this being a perl module we should remember to update its unit tests.</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ git diff lib/B/Deparse.t
diff --git a/lib/B/Deparse.t b/lib/B/Deparse.t
index 24eb445041..0fe6940cb3 100644
--- a/lib/B/Deparse.t
+++ b/lib/B/Deparse.t
@@ -3171,3 +3171,10 @@ try {
catch($var) {
SECOND();
}
+####
+# banana
+# CONTEXT use feature 'banana'; no warnings 'experimental::banana';
+ban 'literal' ana;
+ban $a ana;
+ban $a . $b ana;
+ban "stringify $a" ana;
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness lib/B/Deparse.t
../lib/B/Deparse.t .. ok
All tests successful.
Files=1, Tests=321, 9 wallclock secs ( 0.14 usr 0.00 sys + 8.99 cusr 0.38 csys = 9.51 CPU)
Result: PASS
</pre>
<p>Because in <a href="/2021/02/writing-perl-core-feature-part-10.html">part 10</a> we added documentation for a new function in <tt>pod/perlfunc.pod</tt> there's another test that needs updating:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ext/Pod-Functions/t/Functions.t
../ext/Pod-Functions/t/Functions.t .. 1/?
# Failed test 'run as plain program'
# at t/Functions.t line 55.
# got: '
...
Result: FAIL
</pre>
<p>We can fix that by adding the new function to the expected list in the test file itself:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ nvim ext/Pod-Functions/t/Functions.t
leo@shy:~/src/bleadperl/perl [git]
$ git diff ext/Pod-Functions/t/Functions.t
diff --git a/ext/Pod-Functions/t/Functions.t b/ext/Pod-Functions/t/Functions.t
index 2beccc1ac6..4d5b03e978 100644
--- a/ext/Pod-Functions/t/Functions.t
+++ b/ext/Pod-Functions/t/Functions.t
@@ -76,7 +76,7 @@ Functions.t - Test Pod::Functions
__DATA__
Functions for SCALARs or strings:
- chomp, chop, chr, crypt, fc, hex, index, lc, lcfirst,
+ ban, chomp, chop, chr, crypt, fc, hex, index, lc, lcfirst,
length, oct, ord, pack, q/STRING/, qq/STRING/, reverse,
rindex, sprintf, substr, tr///, uc, ucfirst, y///
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ext/Pod-Functions/t/Functions.t
../ext/Pod-Functions/t/Functions.t .. ok
All tests successful.
Files=1, Tests=234, 1 wallclock secs ( 0.04 usr 0.01 sys + 0.23 cusr 0.00 csys = 0.28 CPU)
Result: PASS
</pre>
<hr/>
<p>At this point, we're done. We've now completed all the steps to add a new feature to the perl interpreter. As well as all the steps required to actually implement it in the core binary itself, we've updated the tests, documentation, and support modules to match.</p>
<p>Along the way we've seen examples from real commits into the perl tree while we made our own. Any particular design of new feature will of course have its own variations and differences - there are still many parts of the interpreter we haven't touched on in this series. It would be difficult to try to cover all the possible ideas of things that could be added or changed, but hopefully having completed this series you'll at least have a good overview of the main pieces that are likely to be involved, and have some starting-off points to explore further to see whatever additional details might be required for whatever situation you encounter.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-10.html">< Prev</a></p>
2021-02-24T13:00:00.078+00:00 2021-02-26T17:29:14.428+00:00 Writing a Perl Core Feature - part 10: Documentation<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-9.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-11-core.html">Next ></a></p>
<p>Now that we have our new feature nicely implemented and tested, we're nearly finished. We just have a few more loose ends to tidy up. The first of these is to take a look at some documentation.</p>
<p>We've already done one small documentation addition to <tt>perldiag.pod</tt> when we added the new warning message, but the bulk of documentation to explain a new feature would likely be found in one of the main documents - <tt>perlsyn.pod</tt>, <tt>perlop.pod</tt>, <tt>perlfunc.pod</tt> or similar. Exactly which of these is best would depend on the nature of the specific feature.</p>
<p>The <tt>isa</tt> feature, being a new infix operator, was documented in <tt>perlop.pod</tt>: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-45f15865d451b10224b02f3ceeb1335151554ca8b1c03c1d978d4aef590eafb6">(github.com/Perl/perl5)</a>.</p>
<pre>
...
+=head2 Class Instance Operator
+X<isa operator>
+
+Binary C<isa> evaluates to true when left argument is an object instance of
+the class (or a subclass derived from that class) given by the right argument.
+If the left argument is not defined, not a blessed object instance, or does
+not derive from the class given by the right argument, the operator evaluates
+as false. The right argument may give the class either as a barename or a
+scalar expression that yields a string class name:
+
+ if( $obj isa Some::Class ) { ... }
+
+ if( $obj isa "Different::Class" ) { ... }
+ if( $obj isa $name_of_class ) { ... }
+
+This is an experimental feature and is available from Perl 5.31.6 when enabled
+by C<use feature 'isa'>. It emits a warning in the C<experimental::isa>
+category.
</pre>
<p>Let's now write a little bit of documentation for our new <tt>banana</tt> feature. Since it is a named function-like operator (though with odd syntax involving a second trailing named keyword), perhaps we'll write it in <tt>perlfunc.pod</tt>. We'll style it similarly to the case-changing functions <tt>lc</tt> and <tt>uc</tt> to get some suggested wording.</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ nvim pod/perlfunc.pod
leo@shy (1 job):~/src/bleadperl/perl [git]
$ git diff | xml_escape
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index b655a08ecc..319e9aab96 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -114,6 +114,7 @@ X<scalar> X<string> X<character>
=for Pod::Functions =String
+L<C<ban>|/ban EXPR ana>,
L<C<chomp>|/chomp VARIABLE>, L<C<chop>|/chop VARIABLE>,
L<C<chr>|/chr NUMBER>, L<C<crypt>|/crypt PLAINTEXT,SALT>,
L<C<fc>|/fc EXPR>, L<C<hex>|/hex EXPR>,
@@ -136,6 +137,10 @@ prefixed with C<CORE::>. The
L<C<"fc"> feature|feature/The 'fc' feature> is enabled automatically
with a C<use v5.16> (or higher) declaration in the current scope.
+L<C<ban>|/ban EXPR ana> is available only if the
+L<C<"banana"> feature|feature/The 'banana' feature.> is enabled or if it is
+prefixed with C<CORE::>.
+
=item Regular expressions and pattern matching
X<regular expression> X<regex> X<regexp>
@@ -773,6 +778,15 @@ your L<atan2(3)> manpage for more information.
Portability issues: L<perlport/atan2>.
+=item ban EXPR ana
+X<ban>
+
+=for Pod::Functions return ROT13 transformed version of a string
+
+Applies the "ROT13" transform to upper- and lower-case letters in the given
+expression string, returning the newly-formed string. Non-letter characters
+are left unchanged.
+
=item bind SOCKET,NAME
X<bind>
</pre>
<p>While this will do as a short example here, any real feature would likely have a lot more words to say than just this.</p>
<p>When editing POD files it's good to get into the habit of running the porting tests (or at least the POD checking ones) before committing, to check the formatting is valid:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/porting/pod*.t
porting/podcheck.t ... ok
porting/pod_rules.t .. ok
All tests successful.
Files=2, Tests=1472, 34 wallclock secs ( 0.20 usr 0.00 sys + 33.79 cusr 0.15 csys = 34.14 CPU)
Result: PASS
</pre>
<p>While I was writing this documentation it occurred to me to describe how the function handles Unicode characters versus byte strings, which made me think more carefully about what it actually does. It turns out the implementation doesn't handle that properly, as we can demonstrate with a new test:</p>
<pre>
--- a/t/op/banana.t
+++ b/t/op/banana.t
@@ -11,7 +11,7 @@ use strict;
use feature 'banana';
no warnings 'experimental::banana';
-plan 7;
+plan 8;
is(ban "ABCD" ana, "NOPQ", 'Uppercase ROT13');
is(ban "abcd" ana, "nopq", 'Lowercase ROT13');
@@ -23,3 +23,8 @@ my $str = "efgh";
is(ban $str ana, "rstu", 'Lexical variable');
is(ban $str . "IJK" ana, "rstuVWX", 'Concat expression');
is("(" . ban "LMNO" ana . ")", "(YZAB)", 'Outer concat');
+
+{
+ use utf8;
+ is(ban "café" ana, "pnsé", 'Unicode string');
+}
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/op/banana.t
op/banana.t .. 1/8 # Failed test 8 - Unicode string at op/banana.t line 29
# got "pnsé"
# expected "pns�"
op/banana.t .. Failed 1/8 subtests
</pre>
<p>This comes down to a bug in the <tt>pp_banana</tt> opcode function, which used the internal byte buffer of the incoming SV (<tt>SvPV</tt>) without inspecting the corresponding <tt>SvUTF8</tt> flag. Such a pattern is always indicative of a Unicode support bug. We can easily fix this:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index 9725806b84..3dbe21fadd 100644
--- a/pp.c
+++ b/pp.c
@@ -7211,6 +7211,8 @@ PP(pp_banana)
s = SvPV(arg, len);
mPUSHs(newSVpvn_rot13(s, len));
+ if(SvUTF8(arg))
+ SvUTF8_on(TOPs);
RETURN;
}
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness t/op/banana.t
op/banana.t .. ok
All tests successful.
Files=1, Tests=8, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.02 cusr 0.00 csys = 0.04 CPU)
Result: PASS
</pre>
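<p>The reason such a small fix is enough can be sketched outside of perl. In UTF-8 every byte of a multi-byte sequence has its high bit set, so a byte-wise ROT13 that only touches ASCII letters passes those bytes through untouched; all that was missing was the flag saying how to interpret them. The following is a standalone illustration rather than perl source: <tt>rot13_bytes</tt> and <tt>has_high_bytes</tt> are hypothetical names, and it assumes an ASCII platform.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the byte loop in newSVpvn_rot13; assumes ASCII.
 * dst must have room for len + 1 bytes. */
static void rot13_bytes(const char *src, size_t len, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c >= 'a' && c <= 'z')
            dst[i] = (char)('a' + (c - 'a' + 13) % 26);
        else if (c >= 'A' && c <= 'Z')
            dst[i] = (char)('A' + (c - 'A' + 13) % 26);
        else
            dst[i] = (char)c;   /* includes every byte >= 0x80 */
    }
    dst[len] = '\0';
}

/* Returns 1 if any byte has its high bit set, i.e. the buffer is not
 * plain ASCII and its meaning depends on the encoding flag. */
static int has_high_bytes(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)s[i] & 0x80)
            return 1;
    return 0;
}
```

<p>Feeding it the five UTF-8 bytes of <tt>café</tt> gives <tt>pns</tt> followed by the same two bytes that encoded the <tt>é</tt>, so the output bytes were already right; the caller just has to carry the UTF-8 flag across.</p>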
<p>Writing good documentation is an integral part of the process of developing a new feature. Firstly it helps to explain the feature to users so they know how to use it. But often you find that the process of writing the words helps you think about different aspects of that feature that you may not have considered before. With that new frame of mind you sometimes discover missing parts to it, or uncover bugs or corner cases that need fixing. Make sure to spend time working on the documentation for any new feature - it is said that you never truly understand something until you try to teach it to someone else.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-9.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-11-core.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-45957176407051990242021-02-22T13:00:00.090+00:002021-02-24T16:34:20.667+00:00Writing a Perl Core Feature - part 9: Tests<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-8.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-10.html">Next ></a></p>
<p>By the end of <a href="/2021/02/writing-perl-core-feature-part-8.html">part 8</a> we finally managed to see an actual implementation of our new feature. We tested a couple of things on the commandline directly to see that it seems to be doing the right thing. For a real core feature though it would be better to have it tested in a more automated, repeatable fashion. This is what the core unit tests are for.</p>
<p>The core perl source distribution contains a <tt>t/</tt> directory with unit test files, very similar to the structure used by regular CPAN modules. The process for running these is a little different; as we already saw back in <a href="/2021/02/writing-perl-core-feature-part-3.html">part 3</a> they need to be invoked by <tt>t/harness</tt>. The files themselves are somewhat more limited in what other modules they can <tt>use</tt>, so the full suite of <tt>Test::</tt> modules is unavailable. But still they are expected to emit the regular TAP output we've come to expect from Perl unit tests, and tend to be structured quite similarly inside.</p>
<p>For example, the <tt>isa</tt> feature added an entire new file for its unit tests. As they all relate to the new syntax and semantics around a new opcode, they go in a file under the <tt>t/op</tt> directory. I won't paste the entire <tt>t/op/isa.t</tt> file, but consider this small section <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-414d6862e3a9c06d10c2798e20b812a6ae9503a14cf7da20d4da1fd1e548c306">(github.com/Perl/perl5)</a>:</p>
<pre>#!./perl
BEGIN {
chdir 't' if -d 't';
require './test.pl';
set_up_inc('../lib');
require Config;
}
use strict;
use feature 'isa';
no warnings 'experimental::isa';
...
my $baseobj = bless {}, "BaseClass";
# Bareword package name
ok($baseobj isa BaseClass, '$baseobj isa BaseClass');
ok(not($baseobj isa Another::Class), '$baseobj is not Another::Class');</pre>
<p>While it doesn't use <tt>Test::More</tt>, it does still have access to some similar testing functions such as the <tt>ok</tt> test. The initial lines of boilerplate in the <tt>BEGIN</tt> block set up the testing functions from the <tt>test.pl</tt> script, so we can use them in the actual tests.</p>
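<p>If you've not looked closely at TAP before, the core of it is just one line of text per test: <tt>ok</tt> or <tt>not ok</tt>, a test number, and a description. As a rough illustration of what a function like <tt>is()</tt> has to produce, here's a tiny standalone sketch in C. The name <tt>tap_is</tt> is made up for this example; it is not how <tt>test.pl</tt> is implemented.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int test_number = 0;   /* count of tests emitted so far */

/* Compare two strings and format one TAP result line into buf.
 * Returns 1 if the test passed, 0 otherwise. */
static int tap_is(char *buf, size_t bufsize,
                  const char *got, const char *expected, const char *name)
{
    int pass = strcmp(got, expected) == 0;

    assert(buf != NULL && bufsize > 0);
    test_number++;
    snprintf(buf, bufsize, "%s %d - %s",
             pass ? "ok" : "not ok", test_number, name);
    return pass;
}
```

<p>A harness then only needs to read those lines and compare the count against the declared plan.</p>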
<p>Let's now have a go at writing some tests for our new <tt>banana</tt> feature. As it works like a text transformation function we can imagine a few different test strings to throw at it.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim t/op/banana.t
leo@shy:~/src/bleadperl/perl [git]
$ cat t/op/banana.t
#!./perl
BEGIN {
chdir 't' if -d 't';
require './test.pl';
set_up_inc('../lib');
require Config;
}
use strict;
use feature 'banana';
no warnings 'experimental::banana';
plan 7;
is(ban "ABCD" ana, "NOPQ", 'Uppercase ROT13');
is(ban "abcd" ana, "nopq", 'Lowercase ROT13');
is(ban "1234" ana, "1234", 'Numbers unaffected');
is(ban "a! b! c!" ana, "n! o! p!", 'Whitespace and symbols intermingled');
my $str = "efgh";
is(ban $str ana, "rstu", 'Lexical variable');
is(ban $str . "IJK" ana, "rstuVWX", 'Concat expression');
is("(" . ban "LMNO" ana . ")", "(YZAB)", 'Outer concat');
$ ./perl t/harness t/op/banana.t
op/banana.t .. ok
All tests successful.
Files=1, Tests=7, 1 wallclock secs ( 0.02 usr 0.00 sys + 0.03 cusr 0.00 csys = 0.05 CPU)
Result: PASS
</pre>
<p>Here we have used the <tt>is()</tt> testing function to check that the strings produced by the <tt>ban ... ana</tt> operator match what we expect. We've tested both uppercase and lowercase letters, and that non-letter characters such as numbers, symbols and spaces remain unaffected. In addition we've added some syntax tests as well, to check variables as well as literal string constants, and to demonstrate that the parser works correctly on the precedence of the operator mixed with string concatenation. All appears to be working fine.</p>
<p>Before we commit this one there is one last thing we have to do. Having added a new file to the distribution, one of the porting tests will now be unhappy:</p>
<pre>
leo@shy:~/src/bleadperl/perl [git]
$ git add t/op/banana.t
leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/manifest.t ........ 9848/? # Failed test 10502 - git ls-files
gives the same number of files as MANIFEST lists at porting/manifest.t line 101
# got "6304"
# expected "6303"
# Failed test 10504 - Nothing added to the repo that isn't in MANIFEST
at porting/manifest.t line 113
# got "1"
# expected "0"
# Failed test 10505 - Nothing added to the repo that isn't in MANIFEST
at porting/manifest.t line 114
# got "not in MANIFEST: t/op/banana.t"
# expected "not in MANIFEST: "
porting/manifest.t ........ Failed 3/10507 subtests
</pre>
<p>To fix this we need to manually add an entry to the <tt>MANIFEST</tt> file; unlike the common practice for CPAN modules, this file is not automatically generated.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim MANIFEST
leo@shy:~/src/bleadperl/perl [git]
$ git diff MANIFEST
diff --git a/MANIFEST b/MANIFEST
index 71d3b453da..03ecdda3d2 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5779,6 +5779,7 @@ t/op/attrproto.t See if the prototype attribute works
t/op/attrs.t See if attributes on declarations work
t/op/auto.t See if autoincrement et all work
t/op/avhv.t See if pseudo-hashes work
+t/op/banana.t See if the ban ... ana syntax works
t/op/bless.t See if bless works
t/op/blocks.t See if BEGIN and friends work
t/op/bop.t See if bitops work
leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
Result: PASS
</pre>
<p>Of course, in this test file we've added only 7 tests. It is likely that any actual real feature would have a lot more testing around it, to deal with a wider variety of situations and corner-cases. Often the really interesting cases only come to light after trying to use it for real and finding odd situations that don't quite work as expected; so after adding a new feature expect to spend a while expanding the test file to cover more things. It's especially useful to add new tests of new situations you find yourself using the feature in, even if they currently work just fine. The presence of such tests helps ensure the feature remains working in that manner.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-8.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-10.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-91033717672299130082021-02-19T13:00:00.200+00:002021-02-22T13:06:40.666+00:00Writing a Perl Core Feature - part 8: Interpreter internals<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-7.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-9.html">Next ></a></p>
<p>At this point we are most of the way to adding a new feature to the Perl interpreter. In <a href="/2021/02/writing-perl-core-feature-part-4.html">part 4</a> we created an opcode function to represent the new behaviour, <a href="/2021/02/writing-perl-core-feature-part-5.html">part 5</a> and <a href="/2021/02/writing-perl-core-feature-part-6.html">part 6</a> added compiler support to recognise the syntax used to represent it, and in <a href="/2021/02/writing-perl-core-feature-part-7.html">part 7</a> we made a helper function to provide the required behaviour. It's now time to tie them all together.</p>
<p>When we looked at opcodes and optrees back in part 4, I mentioned that each node of the optree performs a little part of the execution of a function, with child nodes usually obtaining some piece of data somewhere that gets passed up to parent nodes to operate on. I skipped over exactly how that all works, so for this part let's look at that in more detail.</p>
<p>The data model used by the perl interpreter for runtime execution of code is based around being a stack machine. Most opcodes that operate in some way on regular perl data values do so by interacting with the data stack (often simply called "the stack"; though this is sometimes ambiguous as there are in fact several stacks within the perl interpreter). As the interpreter walks along an optree invoking the function associated with each opcode, these various functions either push values onto the stack, or pop values already there back off it again, in order to use them.</p>
<p>For example, in part 4 we saw how the line of code <tt>my $x = 5;</tt> might get represented by an optree of three nodes - an <tt>OP_SASSIGN</tt> with two child nodes <tt>OP_CONST</tt> and <tt>OP_PADSV</tt>.</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgudVv5adJvN7zn9dnp36ZuIb5OVKzgkmYl2iLUq0zop9PCaTLQKOulq28eQpIqQtTlYaXEQ0HO1MOiYNusnX-qnSnxX1lbOn_c2y9eC7i7AyXAIO_AF-mu5y8H09V-rcxnDybdCiZxvEi2/s506/corefeature-p4-f1.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="253" data-original-width="506" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgudVv5adJvN7zn9dnp36ZuIb5OVKzgkmYl2iLUq0zop9PCaTLQKOulq28eQpIqQtTlYaXEQ0HO1MOiYNusnX-qnSnxX1lbOn_c2y9eC7i7AyXAIO_AF-mu5y8H09V-rcxnDybdCiZxvEi2/s320/corefeature-p4-f1.png"/></a></div>
<p>When this statement is executed the optree nodes are visited in postfix order, with the two child BASEOPs running first in order to push some values to the stack, followed by the assignment BINOP afterwards, which takes those values back off the stack and performs the appropriate assignment.</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsXfLUvehp0zoyDny2mtS0jwOHsaK49CNjiJhCO7yithcQ0pT9MvVMypxlV-WEVc26kYKhQeZcfuFgjzUvANaT0nmmQI4lRMEAlJ74f1ChQacecuy39YzQh1jqccXPkF1mwwuf3sJSI-Lf/s1000/corefeature-p8-f1.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="882" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsXfLUvehp0zoyDny2mtS0jwOHsaK49CNjiJhCO7yithcQ0pT9MvVMypxlV-WEVc26kYKhQeZcfuFgjzUvANaT0nmmQI4lRMEAlJ74f1ChQacecuy39YzQh1jqccXPkF1mwwuf3sJSI-Lf/s400/corefeature-p8-f1.png"/></a></div>
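<p>To make "postfix order" concrete, here's a small standalone sketch that walks a toy three-node tree and records the order in which nodes would run. The <tt>toy_op</tt> structure and <tt>walk_postfix</tt> function are invented for illustration; real perl ops carry far more state, and as far as I'm describing it here the interpreter follows an execution order worked out at compile time rather than recursing over the tree each time.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy op node with at most two children. */
struct toy_op {
    const char *name;
    struct toy_op *first;   /* first child, or NULL */
    struct toy_op *last;    /* second child, or NULL */
};

/* Append op names to `order` in postfix order: children before parent.
 * `n` is the number of entries already filled; returns the new count. */
static size_t walk_postfix(const struct toy_op *o, const char **order, size_t n)
{
    if (o == NULL)
        return n;
    n = walk_postfix(o->first, order, n);
    n = walk_postfix(o->last, order, n);
    order[n++] = o->name;
    return n;
}
```

<p>For a tree shaped like the one for <tt>my $x = 5;</tt> this yields the constant, then the pad variable, then the assignment.</p>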
<p>Let's now take a closer look at the code inside one of the actual functions which implements this. For example, <tt>pp_const</tt>, the function for <tt>OP_CONST</tt>, consists of three short lines:</p>
<pre>PP(pp_const)
{
dSP;
XPUSHs(cSVOP_sv);
RETURN;
}
</pre>
<p>Of these three lines, all four symbols are in fact macros:</p>
<ol>
<li><tt>dSP</tt> declares some local variables for tracking state, used by later macros</li>
<li><tt>cSVOP_sv</tt> fetches the actual SV pointer out of the SVOP itself. This will be the one holding the constant's value</li>
<li><tt>XPUSHs</tt> extends the (data) stack if necessary, then pushes that SV onto it</li>
<li><tt>RETURN</tt> resynchronises the interpreter state from the local variables, and arranges for the opcode function to return the next opcode, for the toplevel instruction loop</li>
</ol>
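<p>The general shape of this can be sketched as a toy stack machine in plain C. This is only a loose model under simplifying assumptions: values are plain <tt>int</tt>s rather than SVs, the stack has a fixed size, and all the <tt>toy_</tt> names are hypothetical. It does show the essential idea though: two ops push pointers onto a shared stack, and a third consumes them.</p>

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16
#define PAD_SIZE  4

/* Toy interpreter state: a data stack of pointers to values, and a pad. */
struct toy_interp {
    int *stack[STACK_MAX];
    size_t sp;              /* number of items currently on the stack */
    int pad[PAD_SIZE];      /* storage for lexical variables */
};

/* Like pp_const: push a pointer to the constant held by the op. */
static void toy_const(struct toy_interp *in, int *constant)
{
    assert(in->sp < STACK_MAX);
    in->stack[in->sp++] = constant;
}

/* Like pp_padsv: push the pad slot selected by the op's target index. */
static void toy_padsv(struct toy_interp *in, size_t targ)
{
    assert(in->sp < STACK_MAX && targ < PAD_SIZE);
    in->stack[in->sp++] = &in->pad[targ];
}

/* An assignment op: pop the destination, copy the value below it into
 * the destination, and leave the destination as the result. */
static void toy_sassign(struct toy_interp *in)
{
    assert(in->sp >= 2);
    int *dst = in->stack[--in->sp];     /* pop the variable */
    int *src = in->stack[in->sp - 1];   /* peek at the value */
    *dst = *src;
    in->stack[in->sp - 1] = dst;        /* overwrite top with the result */
}
```

<p>Running <tt>toy_const</tt>, <tt>toy_padsv</tt> and <tt>toy_sassign</tt> in that order leaves the constant's value in the pad slot and a single item on the stack.</p>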
<p>The <tt>pp_padsv</tt> function is somewhat more complex, but the essential parts of it are quite similar; the following example is heavily paraphrased:</p>
<pre>
PP(pp_padsv)
{
SV ** const padentry = &(PAD_SVl(op->op_targ));
XPUSHs(*padentry);
RETURN;
}
</pre>
<p>This time, rather than the <tt>cSVOP_sv</tt> which takes the SV out of the op itself, we use <tt>PAD_SVl</tt> which looks up the SV in the currently-active pad, by using the target index which is stored in the op.</p>
<p>When the <tt>isa</tt> feature was added, its main <tt>pp_isa</tt> opcode function was actually quite small: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-a70d20047abcfcd3faacb266bd4822a5816fb7fea1754e2fcc14fc18cc13c426">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/pp.c
+++ b/pp.c
@@ -7143,6 +7143,18 @@ PP(pp_argcheck)
return NORMAL;
}
+PP(pp_isa)
+{
+ dSP;
+ SV *left, *right;
+
+ right = POPs;
+ left = TOPs;
+
+ SETs(boolSV(sv_isa_sv(left, right)));
+ RETURN;
+}
+
</pre>
<p>Since <tt>OP_ISA</tt> is a BINOP it is expecting to find two arguments on the stack; traditionally these are called <tt>left</tt> and <tt>right</tt>. This opcode function simply takes those two values and calls the <tt>sv_isa_sv()</tt> function, which returns a boolean truth value. The <tt>boolSV</tt> helper function returns an SV pointer to represent this boolean value, which is then used as the result of the opcode itself.</p>
<p>As a small performance optimisation, this function decides to only <tt>POP</tt> one argument, before changing the top-of-stack value to its result using <tt>SETs</tt>. This is equivalent to <tt>POP</tt>ing two of them and <tt>PUSH</tt>ing its result, except that it doesn't have to alter the stack pointer as many times.</p>
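<p>That equivalence is easy to check with a toy stack. In this sketch (hypothetical names throughout, with integer addition standing in for the real <tt>sv_isa_sv()</tt> call) both variants leave the same result on the stack, but the second one moves the stack pointer once instead of three times.</p>

```c
#include <assert.h>
#include <stddef.h>

struct toy_stack {
    int items[8];
    size_t sp;          /* number of items on the stack */
    unsigned moves;     /* how many times sp has been changed */
};

static void push(struct toy_stack *s, int v)
{
    assert(s->sp < 8);
    s->items[s->sp++] = v;
    s->moves++;
}

static int pop(struct toy_stack *s)
{
    assert(s->sp > 0);
    s->moves++;
    return s->items[--s->sp];
}

/* Straightforward binary op: pop, pop, push. */
static void add_pop_pop_push(struct toy_stack *s)
{
    int right = pop(s);
    int left = pop(s);
    push(s, left + right);
}

/* The pp_isa shape: pop the right, read the left in place, overwrite it. */
static void add_pop_set(struct toy_stack *s)
{
    int right = pop(s);
    assert(s->sp > 0);
    int left = s->items[s->sp - 1];         /* plays the role of TOPs */
    s->items[s->sp - 1] = left + right;     /* plays the role of SETs */
}
```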
<p>For more of a look at how the stack works, you could also take a look at another post from my series on Parser Plugins: <a href="/2019/09/perl-parser-plugins-3a-stack.html">Part 3a - The Stack</a>.</p>
<p>Let's now take a look at implementing our <tt>banana</tt> feature for real. Recall in part 4 we added the <tt>pp_banana</tt> function with some placeholder content that just died with a panic message if invoked. We'll now replace that with a real implementation:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim pp.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index 93141454e1..bced3d23ea 100644
--- a/pp.c
+++ b/pp.c
@@ -7203,7 +7203,15 @@ PP(pp_cmpchain_dup)
PP(pp_banana)
{
- DIE(aTHX_ "panic: we have no bananas");
+ dSP;
+ const char *s;
+ STRLEN len;
+ SV *arg = POPs;
+
+ s = SvPV(arg, len);
+
+ PUSHs(newSVpvn_rot13(s, len));
+ RETURN;
}
/*
</pre>
<p>Now let's rebuild perl and try it out:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use experimental "banana"; say ban "Hello, world!" ana;'
Uryyb, jbeyq!
</pre>
<p>Well it certainly looks plausible - we've got back a different string of the same length, with different letters but in the same capitalisation and identical non-letter characters. Let's compare with something like <tt>tr</tt> to see if it's correct:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ echo "Uryyb, jbeyq!" | tr "A-Za-z" "N-ZA-Mn-za-m"
Hello, world!
</pre>
<p>Seems good. But it turns out we've still missed something. This function has a memory leak. We can demonstrate it by writing a small example that calls <tt>ban ... ana</tt> a large number of times (say, a thousand), and printing the total count of SVs on the heap before and after. There's a handy function in perl's unit test suite called <tt>XS::APItest::sv_count</tt> we can use here:</p>
<pre>leo@shy (1 job):~/src/bleadperl/perl [git]
$ ./perl -Ilib -I. -MXS::APItest=sv_count -E \
'use experimental "banana";
say sv_count();
ban "Hello, world!" ana for 1..1000;
say sv_count();'
5321
6321
</pre>
<p>Oh dear. The SV count is a thousand higher afterwards than before, suggesting we leaked an SV on every call.</p>
<p>It turns out this is because of an optimisation that the interpreter uses, where SV pointers on the Perl data stack don't actually contribute to reference counting. When values get <tt>POP</tt>'ed from the stack we don't have to decrement their refcount; when values get pushed we don't increment it. Not having to adjust those counts all the time saves some runtime overhead. The consequence here is that we have to be a bit more careful when returning newly-constructed values. We must mark the value as <i>mortal</i>, which means we are saying that its reference count is somehow artificially high (because of that pointer on the stack), and perl should decrement the reference count at some point soon, when it next discards temporary values.</p>
<p>Because this sort of thing is done a lot, there is a handy macro called <tt>mPUSHs</tt>, which mortalizes an SV when it pushes it to the data stack. We can call that instead:</p>
<pre>$ git diff pp.c
...
+ mPUSHs(newSVpvn_rot13(s, len));
+ RETURN;
}
/*
</pre>
<p>Now when we try our leak test we find the same SV count before and after, meaning no leak has occurred:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -I. -MXS::APItest=sv_count -E ...
5321
5321
</pre>
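<p>The idea behind mortal values can be modelled in a few lines of plain C. This is a deliberately simplified sketch with made-up names (<tt>toy_new</tt>, <tt>toy_mortal</tt>, <tt>toy_free_tmps</tt>) and a fixed-size temporaries list; perl's real implementation is rather more involved. A counter plays the role of <tt>sv_count()</tt> so we can see the same leak-then-fix behaviour.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t live_values = 0;      /* plays the role of sv_count() */

struct toy_sv { int refcnt; };

static struct toy_sv *tmps[32];     /* values awaiting a deferred decrement */
static size_t tmps_n = 0;

/* Create a new value with a reference count of 1. */
static struct toy_sv *toy_new(void)
{
    struct toy_sv *sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    live_values++;
    return sv;
}

/* Drop one reference, freeing the value when none remain. */
static void toy_dec(struct toy_sv *sv)
{
    if (--sv->refcnt == 0) {
        free(sv);
        live_values--;
    }
}

/* Mark a value as mortal: remember to drop one reference later. */
static struct toy_sv *toy_mortal(struct toy_sv *sv)
{
    assert(tmps_n < 32);
    tmps[tmps_n++] = sv;
    return sv;
}

/* Discard temporaries by performing the deferred decrements. */
static void toy_free_tmps(void)
{
    while (tmps_n > 0)
        toy_dec(tmps[--tmps_n]);
}
```

<p>Creating a thousand values without mortalizing them raises the live count by a thousand; wrapping each in <tt>toy_mortal()</tt> and discarding temporaries afterwards brings it back to where it started.</p>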
<p>We may be onto a winner here.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-7.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-9.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-63053240849927643962021-02-17T13:00:00.118+00:002021-02-19T13:07:34.716+00:00Writing a Perl Core Feature - part 7: Support functions<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-6.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-8.html">Next ></a></p>
<p>So far in this series we've seen several modifications and small additions, to add the required bits and pieces for our new feature to various parts of the perl interpreter. Often when adding anything but the very smallest and simplest of features or changes, it becomes necessary not just to modify existing things, but to add some new support functions as well.</p>
<p>For example, adding the <tt>isa</tt> feature required adding a new function to actually implement the bulk of the operation, which is then called from the <tt>pp_isa</tt> opcode function. This helper function was added into <tt>universal.c</tt> in this commit: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-d97aaa5c1f94f2b10c80fa3c7cd87a7ae6de41abcbad6ed66a9d318ebe827f68">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/universal.c
+++ b/universal.c
@@ -187,6 +187,74 @@ Perl_sv_derived_from_pvn(pTHX_ SV *sv, const char *const name, const STRLEN len,
return sv_derived_from_svpvn(sv, NULL, name, len, flags);
}
+/*
+=for apidoc sv_isa_sv
+
+Returns a boolean indicating whether the SV is an object reference and is
+derived from the specified class, respecting any C<isa()> method overloading
+it may have. Returns false if C<sv> is not a reference to an object, or is
+not derived from the specified class.
...
+
+=cut
+
+*/
+
+bool
+Perl_sv_isa_sv(pTHX_ SV *sv, SV *namesv)
+{
+ GV *isagv;
+
+ PERL_ARGS_ASSERT_SV_ISA_SV;
+
+ if(!SvROK(sv) || !SvOBJECT(SvRV(sv)))
+ return FALSE;
+
...
+ return sv_derived_from_sv(sv, namesv, 0);
+}
+
/*
=for apidoc sv_does_sv
</pre>
<p>Like all good helper functions, this one is named beginning with a <tt>Perl_</tt> prefix and takes as its first parameter the <tt>pTHX_</tt> macro. To make the function properly visible to other code within the interpreter, an entry needed adding to the <tt>embed.fnc</tt> file which lists all of the functions. <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-6ca428db36ed0a28d7097280a98256b8600e43a806c8e210f82f7f92fd1fabd7">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/embed.fnc
+++ b/embed.fnc
@@ -1777,6 +1777,7 @@ ApdR |bool |sv_derived_from_sv|NN SV* sv|NN SV *namesv|U32 flags
ApdR |bool |sv_derived_from_pv|NN SV* sv|NN const char *const name|U32 flags
ApdR |bool |sv_derived_from_pvn|NN SV* sv|NN const char *const name \
|const STRLEN len|U32 flags
+ApdRx |bool |sv_isa_sv |NN SV* sv|NN SV* namesv
ApdR |bool |sv_does |NN SV* sv|NN const char *const name
ApdR |bool |sv_does_sv |NN SV* sv|NN SV* namesv|U32 flags
ApdR |bool |sv_does_pv |NN SV* sv|NN const char *const name|U32 flags
</pre>
<p>This file stores pipe-separated columns, containing:</p>
<ul>
<li>A set of flags - in this case marking an API function (<tt>A</tt>), having the <tt>Perl_</tt> prefix (<tt>p</tt>), with documentation (<tt>d</tt>), whose return value must not be ignored (<tt>R</tt>) and is currently experimental (<tt>x</tt>)</li>
<li>The return type</li>
<li>The name</li>
<li>Argument types in all remaining columns; where <tt>NN</tt> prefixes an argument which must not be passed as <tt>NULL</tt></li>
</ul>
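<p>As an illustration of that layout, here's a standalone sketch that splits one such line into its columns. To be clear, this is not how <tt>regen/embed.pl</tt> does it (that's a Perl script); <tt>split_columns</tt> and <tt>has_flag</tt> are hypothetical helpers, and the sketch ignores backslash line continuations.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COLS 8

/* Split `line` in place on '|' and trim surrounding blanks from each
 * column. Returns the number of columns found (at most MAX_COLS). */
static size_t split_columns(char *line, char *cols[MAX_COLS])
{
    size_t n = 0;
    char *p = line;

    while (n < MAX_COLS) {
        char *bar = strchr(p, '|');
        if (bar != NULL)
            *bar = '\0';

        while (*p == ' ' || *p == '\t')     /* trim leading blanks */
            p++;
        char *end = p + strlen(p);
        while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';                  /* trim trailing blanks */

        cols[n++] = p;
        if (bar == NULL)
            break;
        p = bar + 1;
    }
    return n;
}

/* True if the flags column contains the given flag letter. */
static int has_flag(const char *flags, char flag)
{
    return flag != '\0' && strchr(flags, flag) != NULL;
}
```

<p>Given the <tt>sv_isa_sv</tt> line above, this yields five columns: the flags <tt>ApdRx</tt>, the return type <tt>bool</tt>, the name, and the two argument declarations.</p>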
<p>For our new <tt>banana</tt> feature let's now think of some semantics. Perhaps, given the example code we saw yesterday, it should return a new string built from its argument. For arbitrary reasons of having something interesting yet unlikely in practice, let's make it return a ROT13 transformed version.</p>
<p>Let's now add a helper function to do this - something to construct a new string SV containing the ROT13'ed transformation of the given input. We'll begin by picking a new name for this new function, and adding a definition line into the <tt>embed.fnc</tt> list, and running the <tt>regen/embed.pl</tt> regeneration script:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim embed.fnc
leo@shy:~/src/bleadperl/perl [git]
$ git diff embed.fnc
diff --git a/embed.fnc b/embed.fnc
index eb7b47601a..74946566e7 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1488,6 +1488,7 @@ ApdR |SV* |newSVuv |const UV u
ApdR |SV* |newSVnv |const NV n
ApdR |SV* |newSVpv |NULLOK const char *const s|const STRLEN len
ApdR |SV* |newSVpvn |NULLOK const char *const buffer|const STRLEN len
+ApdR |SV* |newSVpvn_rot13 |NN const char *const s|const STRLEN len
ApdR |SV* |newSVpvn_flags |NULLOK const char *const s|const STRLEN len|const U32 flags
ApdR |SV* |newSVhek |NULLOK const HEK *const hek
ApdR |SV* |newSVpvn_share |NULLOK const char* s|I32 len|U32 hash
leo@shy:~/src/bleadperl/perl [git]
$ perl regen/embed.pl
Changed: proto.h embed.h
</pre>
<p>Take a look now at the changes it's made.</p>
<ul>
<li>A new macro in <tt>embed.h</tt> which calls the full <tt>Perl_</tt>-prefixed function name from its shorter alias. The macro makes sure to pass in the <tt>aTHX_</tt> parameter, meaning we don't have to remember that all the time</li>
<li>A prototype and an arguments assertion macro for the function in <tt>proto.h</tt></li>
</ul>
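<p>The pattern those two generated files follow can be imitated in a self-contained way. In this sketch everything is hypothetical (the <tt>toy_ctx</tt> structure, the <tt>Toy_</tt> prefix, the <tt>count_upper</tt> function); it only mirrors the general shape: a full-named function that takes a context as its first parameter, a short alias macro that supplies that context, and an assertion macro that checks the non-NULL arguments.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter context, standing in for what the
 * pTHX_ / aTHX_ macros pass around. */
struct toy_ctx { unsigned calls; };
static struct toy_ctx the_ctx;

/* proto.h-style: the full prototype plus an argument assertion macro. */
static size_t Toy_count_upper(struct toy_ctx *ctx, const char *s, size_t len);
#define TOY_ARGS_ASSERT_COUNT_UPPER assert(s)

/* embed.h-style: the short alias supplies the context argument for us. */
#define count_upper(a, b) Toy_count_upper(&the_ctx, a, b)

/* Count ASCII uppercase letters in the first len bytes of s. */
static size_t Toy_count_upper(struct toy_ctx *ctx, const char *s, size_t len)
{
    size_t n = 0;

    TOY_ARGS_ASSERT_COUNT_UPPER;    /* checks the non-NULL argument */

    ctx->calls++;
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            n++;
    return n;
}
```

<p>Callers write <tt>count_upper(str, len)</tt> and never mention the context at all, which is the convenience the generated alias macros give us in the real source.</p>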
<p>To actually implement this function we should pick a file to put it in. Since it's creating a new SV, the file <tt>sv.c</tt> seems reasonable. For neatness we'll put it right next to the other <tt>newSVpv*</tt> functions, in the same order as the list in <tt>embed.fnc</tt>:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim sv.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff sv.c
diff --git a/sv.c b/sv.c
index e54d0a078f..156e64e879 100644
--- a/sv.c
+++ b/sv.c
@@ -9397,6 +9397,43 @@ Perl_newSVpvn(pTHX_ const char *const buffer, const STRLEN len)
return sv;
}
+/*
+=for apidoc newSVpvn_rot13
+
+Creates a new SV and copies a string into it by transforming letters by the
+ROT13 algorithm, and copying other bytes literally. The string may contain
+C<NUL> characters and other binary data. The reference count for the new SV
+is set to 1.
+
+=cut
+*/
+
+SV *
+Perl_newSVpvn_rot13(pTHX_ const char *const s, const STRLEN len)
+{
+ char *dp;
+ const char *sp = s, *send = s + len;
+ SV *sv = newSV(len);
+
+ dp = SvPVX(sv);
+ while(sp < send) {
+ char c = *sp;
+ if(isLOWER(c))
+ *dp = 'a' + (c - 'a' + 13) % 26;
+ else if(isUPPER(c))
+ *dp = 'A' + (c - 'A' + 13) % 26;
+ else
+ *dp = c;
+
+ sp++; dp++;
+ }
+
+ *dp = '\0';
+ SvPOK_on(sv);
+ SvCUR_set(sv, len);
+ return sv;
+}
+
/*
=for apidoc newSVhek
</pre>
<p>I don't want to spend a large amount of time or space in this post to explain the whole function, but as a brief summary,</p>
<ol>
<li><tt>newSV()</tt> creates a new SV with a string buffer big enough to store the content (it internally adds 1 more to accommodate the terminating NUL)</li>
<li>The pointers <tt>sp</tt> and <tt>dp</tt> are initialised to point into the source and destination string buffers</li>
<li>Characters are copied one at a time; performing the ROT13 algorithm on lower or uppercase letters and passing anything else transparently</li>
<li>The terminating NUL is appended</li>
<li>The current string size and stringiness flag are set on the new SV, which is then returned</li>
</ol>
<p>If we run the porting tests again now, we'll find one gets upset:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/args_assert.t ..... 1/? # Failed test 2 - PERL_ARGS_ASSERT_NEWSVPVN_ROT13 is
declared but not used at porting/args_assert.t line 64
</pre>
<p>This test is unhappy because it didn't find any code that actually called the argument-asserting macro which the regeneration script added to <tt>proto.h</tt>. This is the macro that asserts on the types of arguments to the function. We can fix that by remembering to use it in the function's definition:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim sv.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff sv.c
diff --git a/sv.c b/sv.c
index e54d0a078f..d63c8a7bbb 100644
--- a/sv.c
+++ b/sv.c
...
+SV *
+Perl_newSVpvn_rot13(pTHX_ const char *const s, const STRLEN len)
+{
+ char *dp;
+ const char *sp = s, *send = s + len;
+ SV *sv;
+
+ PERL_ARGS_ASSERT_NEWSVPVN_ROT13;
+
+ sv = newSV(len);
+
+ dp = SvPVX(sv);
...
leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
Result: PASS</pre>
<p>As core functions go this one is actually pretty terrible. It presumes ASCII (and doesn't work properly on EBCDIC platforms), and requires careful handling in the caller to set the UTF8 flag if required. But overall it's at least good enough for demonstration purposes for our feature. In the next part we'll hook this function up with the opcode implementation and finally see our new feature in action.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-6.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-8.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-51399714247777168792021-02-15T13:00:00.211+00:002021-05-04T17:32:06.358+01:00Writing a Perl Core Feature - part 6: Parser<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-5.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-7.html">Next ></a></p>
<p>In the previous part I introduced the concepts of the lexer and the parser, and the way they combine together to form part of the compiler which actually translates the incoming program source code into the in-memory optree where it can be executed. We took a look at some lexer changes, and the way that the <tt>isa</tt> operator was able to work with those alone without needing a corresponding change in the parser, but also noted that most non-trivial syntax additions will require concurrent changes to both the parser and the lexer to cope with them.</p>
<p>In particular, although it is the lexer that creates and emits tokens into the parser, it is the parser which maintains the list of what token types it expects. It is there where new token types have to be added.</p>
<p>The <tt>isa</tt> operator did not need to make any changes in the parser, so for today's article we'll look instead at the recently-added <tt>try/catch</tt> syntax, which did. That was first added in <a href="https://github.com/Perl/perl5/commit/a1325b902d57aa7a99bed3d2ec0fa5ce42836207">this commit</a>, though subsequent modifications have been made to it. Go take a look now - perhaps you will find parts of it recognisable, similar to the changes we've already seen with <tt>isa</tt> and made for our new <tt>banana</tt> feature we have been building up.</p>
<p>Similar to the situation with features, warnings, and opcodes, the parser is maintained primarily by changes to one source file which is then run through a regeneration script to update several other files that are generated from it. The source of truth in this case is the file <tt>perly.y</tt>, and the regeneration script for it is <tt>regen_perly.pl</tt> (neither of which live in the <tt>regen</tt> directory for reasons lost to the mists of time).</p>
<p>The part of the <tt>try/catch</tt> commit which updated the parser generation file had two parts to it: <a href="https://github.com/Perl/perl5/commit/a1325b902d57aa7a99bed3d2ec0fa5ce42836207#diff-0f6ed78c83e75dc7b07a379943afb2b8b62906eda1ec5f44a6b4b854b8d7e2b3">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/perly.y
+++ b/perly.y
@@ -69,6 +69,7 @@
%token <ival> FORMAT SUB SIGSUB ANONSUB ANON_SIGSUB PACKAGE USE
%token <ival> WHILE UNTIL IF UNLESS ELSE ELSIF CONTINUE FOR
%token <ival> GIVEN WHEN DEFAULT
+%token <ival> TRY CATCH
%token <ival> LOOPEX DOTDOT YADAYADA
%token <ival> FUNC0 FUNC1 FUNC UNIOP LSTOP
%token <ival> MULOP ADDOP
@@ -459,6 +460,31 @@ barestmt: PLUGSTMT
newFOROP(0, NULL, $mexpr, $mblock, $cont));
parser->copline = (line_t)$FOR;
}
+ | TRY mblock[try] CATCH PERLY_PAREN_OPEN
+ { parser->in_my = 1; }
+ remember scalar
+ { parser->in_my = 0; intro_my(); }
+ PERLY_PAREN_CLOSE mblock[catch]
+ {
...
+ }
| block cont
{
/* a block is a loop that happens once */
</pre>
<p>Of these two parts, the first is the bit that defines two new token types. These are types we can use in the lexer - recall from the previous part we saw the lexer emit these tokens as <tt>PREBLOCK(TRY)</tt> and <tt>PREBLOCK(CATCH)</tt>.</p>
<p>The second part of this change gives the actual parsing rules which the parser uses to recognise the new syntax. This appears in the form of a new alternative to the set of possible rules that the parser may use to create a <tt>barestmt</tt> (each alternative is separated by <tt>|</tt> characters). The rules on how to recognise this one are made from a mix of basic tokens (those in capitals) and other grammar rules (those in lower case). The four basic tokens here are the keyword <tt>try</tt>, an open and close parenthesis pair (represented by tokens called <tt>PERLY_PAREN_OPEN</tt> and <tt>PERLY_PAREN_CLOSE</tt>) and the keyword <tt>catch</tt>.</p>
<p>In effect, we can imagine the rule expressed instead using literal strings:</p>
<pre>barestmt =
...
| "try" mblock "catch" "(" scalar ")" mblock</pre>
<p>The other grammar rules that are referred to by this one define the basic shape of a block of code (the one called <tt>mblock</tt>), and a single scalar variable (the one called <tt>scalar</tt>). The other parts that I omitted in this simplified version (<tt>remember</tt> and the two action blocks relating to <tt>parser->in_my</tt>) are involved with ensuring that the catch variable part of the syntax is recognised as creating a new variable. It pretends that there had been a <tt>my</tt> keyword just before the variable name, so the name introduces a new variable.</p>
<p>Don't worry too much about the contents of the main action block for this <tt>try/catch</tt> syntax rule. That's all specific to how to build up the optree for this particular syntax, and in any case was changed in a later commit to move most of it out to a helper function. We'll come back in a moment to see what we can put there for our new syntax.</p>
<p>Let's now begin adding the tokenizing and parsing rules for our new <tt>banana</tt> feature. Recall from <a href="/2021/02/writing-perl-core-feature-part-5.html">part 5</a> we decided we'd add two new token types to represent the two basic keywords. We can do that by adding them to the collection of tokens at the top of the <tt>perly.y</tt> file and running the regeneration script:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim perly.y
leo@shy:~/src/bleadperl/perl [git]
$ git diff perly.y
diff --git a/perly.y b/perly.y
index 184fb0c158..7bbb64f202 100644
--- a/perly.y
+++ b/perly.y
@@ -77,6 +77,7 @@
%token <ival> LOCAL MY REQUIRE
%token <ival> COLONATTR FORMLBRACK FORMRBRACK
%token <ival> SUBLEXSTART SUBLEXEND
+%token <ival> BAN ANA
%type <ival> grammar remember mremember
%type <ival> startsub startanonsub startformsub
leo@shy:~/src/bleadperl/perl [git]
$ perl regen_perly.pl
Changed: perly.act perly.tab perly.h</pre>
<p>At this point if you want you could take a look at the change the script introduced in <tt>perly.h</tt> - it just adds the two new token types to the main <tt>enum yytokentype</tt>, where the tokenizer and the parser can use them. Don't worry about the other two files (<tt>perly.act</tt> and <tt>perly.tab</tt>) - they are long tables of automatically generated output; mostly numbers which help the parser to maintain its internal state. The change there won't be particularly meaningful to look at.</p>
<p>As these new token types now exist in <tt>perly.h</tt> we can use them to update <tt>toke.c</tt> to recognise them:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim toke.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff toke.c
diff --git a/toke.c b/toke.c
index 628a79fb43..9f86e110ce 100644
--- a/toke.c
+++ b/toke.c
@@ -7686,6 +7686,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
case KEY_accept:
LOP(OP_ACCEPT,XTERM);
+ case KEY_ana:
+ Perl_ck_warner_d(aTHX_
+ packWARN(WARN_EXPERIMENTAL__BANANA), "banana is experimental");
+ TOKEN(ANA);
+
case KEY_and:
if (!PL_lex_allbrackets && PL_lex_fakeeof >= LEX_FAKEEOF_LOWLOGIC)
return REPORT(0);
@@ -7694,6 +7699,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
case KEY_atan2:
LOP(OP_ATAN2,XTERM);
+ case KEY_ban:
+ Perl_ck_warner_d(aTHX_
+ packWARN(WARN_EXPERIMENTAL__BANANA), "banana is experimental");
+ TOKEN(BAN);
+
case KEY_bind:
LOP(OP_BIND,XTERM);
</pre>
<p>Now we can rebuild perl and test some examples:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
syntax error at -e line 1, near "say ban"
Execution of -e aborted due to compilation errors.</pre>
<p>We get our expected warnings about the experimental syntax, and then a syntax error. This is because, while the lexer recognises our keywords, we haven't yet written a parser rule to tell the parser what to do with them. But we can at least tell the lexer recognised the keywords, because if we test without enabling the feature we get a totally different error:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'say ban "a string here" ana;'
Bareword found where operator expected at -e line 1, near ""a string here" ana"
(Missing operator before ana?)
syntax error at -e line 1, near ""a string here" ana"
Execution of -e aborted due to compilation errors.</pre>
<p>Let's now add a grammar rule to let the parser recognise this syntax:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim perly.y
leo@shy:~/src/bleadperl/perl [git]
$ git diff perly.y
...
SUBLEXSTART listexpr optrepl SUBLEXEND
{ $$ = pmruntime($PMFUNC, $listexpr, $optrepl, 1, $<ival>2); }
+ | BAN expr ANA
+ { $$ = newUNOP(OP_BANANA, 0, $expr); }
| BAREWORD
| listop
...
leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl</pre>
<p>With this new definition our new syntax:</p>
<ul>
<li>is recognised as a basic <i>term</i> expression, meaning it can stand in the same parts of syntax as other expressions such as constants or variables</li>
<li>requires an <i>expr</i> expression between the <tt>ban</tt> and <tt>ana</tt> keywords, meaning it will accept any sort of complex expression such as a string concatenation operator or function call</li>
</ul>
<p>After the grammar rule which tells the parser how to recognise the new syntax, we've added a block of code telling it how to implement it. This is translated into some real C code that forms part of the parser, so we can invoke any bits of perl interpreter internals from here. When it gets translated a few special variables are replaced in the code - these are the ones prefixed with <tt>$</tt> symbols. The <tt>$$</tt> variable is where the parser is expecting to find the output of this particular grammar rule; it's where we put the optree we construct to represent it. For arguments into that we can use the other variable, named after the sub-rule used to parse it - <tt>$expr</tt>. That will contain the output of parsing that part of the syntax - again an optree.</p>
<p>In this action block it is now a simple matter of generating an optree for the <tt>OP_BANANA</tt> opcode we added in <a href="/2021/02/writing-perl-core-feature-part-4.html">part 4</a>. Recall that was an op of type <tt>UNOP</tt>, so we use the <tt>newUNOP()</tt> function to do this, taking as its child subtree the expression between the two keywords which we got in <tt>$expr</tt>. We just put that result into the <tt>$$</tt> result variable, and we're done.</p>
<p>Now we can try using it:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
panic: we have no bananas at -e line 1.</pre>
<p>Hurrah! We get the panic message we added as a placeholder when we created the <tt>Perl_pp_banana</tt> function back in part 4. The pieces are now starting to come together - in the next part we'll start implementing the actual behaviour behind this syntax.</p>
<p>Let's not forget to add the new "experimental" warning to <tt>pod/perldiag.pod</tt> in order to keep the porting test happy:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim pod/perldiag.pod
$ git diff pod/perldiag.pod
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 98d159dc21..66b0a4aa40 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -519,6 +519,11 @@ wasn't a symbol table entry.
(P) An internal request asked to add a scalar entry to something that
wasn't a symbol table entry.
+=item banana is experimental
+
+(S experimental::banana) This warning is emitted if you use the banana
+syntax (C<ban> ... C<ana>). This syntax is currently experimental.
+
=item Bareword found in conditional
</pre>
<p>For now there's one last thing we can look at. Even though we don't have an implementation behind the syntax, we can at least compile it into an optree. We can inspect the generated optree by using the <tt>-MO=Concise</tt> compiler backend:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -MO=Concise -E 'use feature "banana"; say ban "a string here" ana;'
banana is experimental at -e line 1.
banana is experimental at -e line 1.
7 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter v ->2
2 <;> nextstate(main 3 -e:1) v:%,us,{,fea=15 ->3
6 <@> say vK ->7
3 <0> pushmark s ->4
5 <1> banana sK/1 ->6
4 <$> const(PV "a string here") s ->5
-e syntax OK</pre>
<p>I won't go into the full details here - for that you can read the documentation at <a href="https://metacpan.org/pod/B::Concise">B::Concise</a>. For now I'll just remark that we can see the <tt>banana</tt> op here, as an UNOP (the <tt>1</tt> flag before it), sitting in the optree as a child node of <tt>say</tt>, with the string constant as its own child op. When working with optree parsing, the <tt>B::Concise</tt> module is a handy debugging tool you can use to inspect the generated optree and ensure it has the shape you expected.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-5.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-7.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-24884920157499218962021-02-12T13:00:00.255+00:002021-02-15T14:26:43.270+00:00Writing a Perl Core Feature - part 5: Lexer<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-4.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-6.html">Next ></a></p>
<p>Now we have a controllable <tt>feature</tt> flag that conditionally recognises our new keywords, and we have a new opcode that we can use to implement some behaviour for it, we can begin to tie them together. The previous post mentioned that the Perl interpreter converts source code of a program into an optree, stored in memory. This is done by a collection of code loosely described as the <i>compiler</i>. Exactly what the compiler will do with these new keywords depends on its two main parts - the <i>lexer</i>, and the <i>parser</i>.</p>
<p>If you're unfamiliar with these general concepts of compiler technology, allow me a brief explanation. A lexer takes the source code, in the form of a stream of characters, and begins analysing it by grouping those characters up into the basic elements of the syntax, called <i>tokens</i> (sometimes called <i>lexemes</i>). This sequence of tokens is then passed into the parser, whose job is to build up the syntax tree representing the program from those analysed tokens. (The lexer is sometimes also called a <i>tokenizer</i>; the two words are interchangeable).</p>
<p>Tokens may be as small as a single character (for example a <tt>+</tt> or <tt>-</tt> operator), or could be an entire string or numerical constant. It is the job of the lexer to skip over things like comments and ignorable whitespace. In compilers, tokens are usually represented by some sort of type system, where each kind of token has a specific type, often with associated values. For example, any numerical constant in the source code would be represented by a token giving a "NUMBER" type, whose associated value was the specific number. In this manner the parser can then consider the types of tokens it has received (for example it may have recently received a number, a <tt>+</tt> operator, and another number), and emit some form of syntax tree to represent the numerical addition of these two numbers.</p>
<p>For example, an expression in a simple expression language is first tokenized into a stream of tokens. Any sequence of digits becomes a <tt>NUMBER</tt> token with its associated numerical value, and operators become their own token types representing the symbol itself:</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB2bH8lkbAaDQNVe7KhFdv3S0vLN92g0yMwyxeSW8BcYC7xep1qTPMFRYLtg73OCqU739QGTyySbvEJUZivLOciPnbuQMBBhW7Q9MjHZgVd9Fz-ORG161jDdG_lLD6XcUOFLG5zpPd6Hom/s800/corefeature-p5-f1.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="263" data-original-width="800" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjB2bH8lkbAaDQNVe7KhFdv3S0vLN92g0yMwyxeSW8BcYC7xep1qTPMFRYLtg73OCqU739QGTyySbvEJUZivLOciPnbuQMBBhW7Q9MjHZgVd9Fz-ORG161jDdG_lLD6XcUOFLG5zpPd6Hom/s400/corefeature-p5-f1.png"/></a></div>
<p>It then gets parsed by recursively applying an ordered list of rules (to implement operator precedence) to form some sort of syntax tree. We're looking ultimately for an <i>expr</i> (short for "expression"). At high priority, a sequence of <i>expr-STAR-expr</i> can be considered as an <i>expr</i> (by combining the two numbers by a <tt>MULTIPLY</tt> operation). At lesser priority, a sequence <i>expr-PLUS-expr</i> can be considered as such (by using <tt>ADD</tt>). Finally, a <i>NUMBER</i> token can stand alone as an <i>expr</i>.</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguR-B295lGvkS1Q0pYooFUxUfRZ5mBdArxgk2AbzedKO5jX6IHulHkTqt56EI-Wh-TEI98kcLYEYI7zs7sBVy5mcbLzlApH3N3x_jId-yG0WUjyR7A1UI8BEz55rkrw_A97T39osLvXpNm/s801/corefeature-p5-f2.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="629" data-original-width="801" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguR-B295lGvkS1Q0pYooFUxUfRZ5mBdArxgk2AbzedKO5jX6IHulHkTqt56EI-Wh-TEI98kcLYEYI7zs7sBVy5mcbLzlApH3N3x_jId-yG0WUjyR7A1UI8BEz55rkrw_A97T39osLvXpNm/s400/corefeature-p5-f2.png"/></a></div>
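<p>As a rough illustration of those two stages working together, here is a small sketch in C of a lexer and parser for this toy expression language. All of the names here are invented for the example. It uses hand-written recursive descent rather than the generated parser tables that Perl itself uses, and to keep it short it computes the numerical result directly instead of building a syntax tree.</p>

```c
#include <ctype.h>

/* Hypothetical token types for the toy expression language */
enum toktype { TOK_NUMBER, TOK_PLUS, TOK_STAR, TOK_EOF };

struct token {
    enum toktype type;
    long value;         /* only meaningful for TOK_NUMBER */
};

struct lexer {
    const char *p;      /* next unread character */
    struct token cur;   /* most recently read token */
};

/* Lexer: skip whitespace, then group characters into one token */
static void next_token(struct lexer *lx)
{
    while (isspace((unsigned char)*lx->p))
        lx->p++;

    if (isdigit((unsigned char)*lx->p)) {
        long v = 0;
        while (isdigit((unsigned char)*lx->p))
            v = v * 10 + (*lx->p++ - '0');
        lx->cur.type = TOK_NUMBER;
        lx->cur.value = v;
    }
    else if (*lx->p == '+') { lx->p++; lx->cur.type = TOK_PLUS; }
    else if (*lx->p == '*') { lx->p++; lx->cur.type = TOK_STAR; }
    else
        lx->cur.type = TOK_EOF;  /* end of input, or anything unrecognised */
}

/* A NUMBER token can stand alone as an expr (anything else counts as 0) */
static long parse_number(struct lexer *lx)
{
    long v = 0;
    if (lx->cur.type == TOK_NUMBER) {
        v = lx->cur.value;
        next_token(lx);
    }
    return v;
}

/* High priority: expr STAR expr */
static long parse_product(struct lexer *lx)
{
    long v = parse_number(lx);
    while (lx->cur.type == TOK_STAR) {
        next_token(lx);
        v *= parse_number(lx);
    }
    return v;
}

/* Lesser priority: expr PLUS expr */
static long parse_sum(struct lexer *lx)
{
    long v = parse_product(lx);
    while (lx->cur.type == TOK_PLUS) {
        next_token(lx);
        v += parse_product(lx);
    }
    return v;
}

/* Entry point: tokenize and parse a whole expression string */
long evaluate(const char *src)
{
    struct lexer lx = { src, { TOK_EOF, 0 } };
    next_token(&lx);
    return parse_sum(&lx);
}
```

<p>Operator precedence falls out of the call structure: <tt>parse_sum()</tt> only ever sees complete products, so in <tt>evaluate("1 + 2 * 3")</tt> the multiplication is grouped first.</p>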
<p>Specifically in Perl's case, the lexer is rather more complex than that of most typical languages. It has a number of features which may surprise you if you are familiar with the overall concept of token-based parsing. Whereas some much simpler languages can be tokenized with a statically-defined set of rules, Perl's lexer is much more stateful and dynamically controlled. The recent history of tokens it has already seen can change its interpretation of things to come. The parser can influence what the lexer will expect to see next. Additionally, existing code that has already been seen and parsed will also affect its decisions.</p>
<p>To give a few examples here, consider the way that braces are used both to delimit blocks of code, and anonymous hash references. The lexer resolves which case is which by examining what "expect" state it is in - whether it should be expecting an expression term, or a statement. Consider also the way that the names of existing functions already in scope (and what prototypes, if any, they may have) influences the way that calls to those functions are parsed. This is, in part, performed by the lexer.</p>
<pre>my $hashref = { one => 1, two => 2 };
# These braces are a hashref constructor
if($cond) { say "Cond is true"; }
# These braces are a code block</pre>
<pre>sub listy_func { ... }
sub unary_func($) { ... }
say listy_func 1, 2, 3;
# parsed as say(listy_func(1, 2, 3));
say unary_func 4, 5, 6;
# parsed as say(unary_func(4), 5, 6);</pre>
<p>Due to its central role in parsing the source code of a program, it is important that the lexer knows about every keyword and combination of symbols used in its syntax. Not all new features and keywords would need to consider the parser, so for now we'll leave that for the next post in this series and concentrate on the lexer.</p>
<p>The lexer is contained in the file <tt>toke.c</tt>. When the <tt>isa</tt> feature was added the change here was rather small: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-eab4115a843bf91f2d20457b59e6b0c6c747ffa64ba1022d3ff958512da8df12">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/toke.c
+++ b/toke.c
@@ -7800,6 +7800,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
case KEY_ioctl:
LOP(OP_IOCTL,XTERM);
+ case KEY_isa:
+ Perl_ck_warner_d(aTHX_
+ packWARN(WARN_EXPERIMENTAL__ISA), "isa is experimental");
+ Rop(OP_ISA);
+
case KEY_join:
LOP(OP_JOIN,XTERM);
</pre>
<p>Here we have extended the main function that recognises barewords vs keywords; the function <tt>yyl_word_or_keyword</tt>. This function is based, in part, on the function in <tt>keywords.c</tt> that we saw modified back in <a href="/2021/02/writing-perl-core-feature-part-3.html">part 3</a>. (Remember; that added the new keywords, to be conditionally recognised depending on whether our feature is enabled). If the keyword was recognised as the <tt>isa</tt> keyword (meaning the feature had been enabled), then the lexer will recognise it as a token in the category of "relational operator", called <tt>Rop</tt>. We additionally report the value of the opcode to implement it; the opcode <tt>OP_ISA</tt> which we saw added in <a href="/2021/02/writing-perl-core-feature-part-4.html">part 4</a>. Since the feature is experimental, here is the time at which we emit the "is experimental" warning, using the warning category we saw added in <a href="/2021/02/writing-perl-core-feature-part-2.html">part 2</a>.</p>
<p>Because of this neat convenience, the change adding the <tt>isa</tt> operator didn't need to touch the parser at all. In order for us to have something interesting to talk about when we move on to the parser, let's imagine a slightly weirder grammar shape for our new <tt>banana</tt> feature. We have two keywords to play with, so let's now imagine that they are used in a pair, surrounding some other expression; as in the syntax:</p>
<pre>use feature 'banana';
my $something = ban "Some other stuff goes here" ana;</pre>
<p>Because of this rather weird structure, we won't be able to make use of any of the convenience token types, so we'll instead just emit these as plain <tt>TOKEN</tt>s and let the parser deal with it. This will necessitate some changes to the parser as well, to add some new token values for it to recognise, so we'll do that in the next part too.</p>
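<p>The overall idea of feature-gated keywords turning into plain tokens can be sketched in a few lines of standalone C. This is only a toy model with made-up names (<tt>classify_word</tt>, <tt>TOY_BAN</tt> and so on); the real lexer goes through the generated keyword lookup and the <tt>yyl_word_or_keyword</tt> switch rather than string comparisons.</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token types; the real ones are generated into perly.h */
enum toy_token { TOY_BAREWORD, TOY_BAN, TOY_ANA };

/* Decide which token a word should become. The two keywords are only
 * recognised when the (toy) banana feature flag is enabled; otherwise
 * they are treated like any other bareword. */
enum toy_token classify_word(const char *word, bool feature_banana)
{
    if (feature_banana) {
        if (strcmp(word, "ban") == 0)
            return TOY_BAN;
        if (strcmp(word, "ana") == 0)
            return TOY_ANA;
    }
    return TOY_BAREWORD;
}
```

<p>With the flag off, <tt>ban</tt> and <tt>ana</tt> fall through to the bareword case, which is why code written before the feature existed keeps its old meaning.</p>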
<p>Before we leave the topic of the lexer, let's just take a look at another recent Perl core change - the one that first introduced the <tt>try/catch</tt> syntax, via the <tt>try</tt> named feature: <a href="https://github.com/Perl/perl5/commit/a1325b902d57aa7a99bed3d2ec0fa5ce42836207#diff-eab4115a843bf91f2d20457b59e6b0c6c747ffa64ba1022d3ff958512da8df12">(github.com/Perl/perl5)</a>.</p>
<pre>...
@@ -7704,6 +7706,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
case KEY_break:
FUN0(OP_BREAK);
+ case KEY_catch:
+ Perl_ck_warner_d(aTHX_
+ packWARN(WARN_EXPERIMENTAL__TRY), "try/catch is experimental");
+ PREBLOCK(CATCH);
+
case KEY_chop:
UNI(OP_CHOP);
@@ -8435,6 +8442,11 @@ yyl_word_or_keyword(pTHX_ char *s, STRLEN len, I32 key, I32 orig_keyword, struct
case KEY_truncate:
LOP(OP_TRUNCATE,XTERM);
+ case KEY_try:
+ Perl_ck_warner_d(aTHX_
+ packWARN(WARN_EXPERIMENTAL__TRY), "try/catch is experimental");
+ PREBLOCK(TRY);
+
case KEY_uc:
UNI(OP_UC);
</pre>
<p>This was a very similar change - again just two new <tt>case</tt> labels to handle the two newly-added keywords. Each one emits a token of the <tt>PREBLOCK</tt> type. This is a hint to the parser that following the keyword it should expect to find a block of code surrounded by braces (<tt>{ ... }</tt>). In general when adding new syntax, there will likely be some existing token types that can be used for it, because it is likely following a similar shape to things already there.</p>
<p>Each of these changes adds a new warning - a call to <tt>Perl_ck_warner_d</tt>. There's a porting test file that checks to see that every one of these has been mentioned somewhere in <tt>pod/perldiag.pod</tt>. In order to keep that test happy, each commit had to add a new section there too; for example for <tt>isa</tt>: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-2180b3a8e0d94767dca4a874a90bd790c4d9730e5171287b73981b125f7d466b">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3262,6 +3262,12 @@ an anonymous subroutine, or a reference to a subroutine.
(W overload) You tried to overload a constant type the overload package is
unaware of.
+=item isa is experimental
+
+(S experimental::isa) This warning is emitted if you use the (C<isa>)
+operator. This operator is currently experimental and its behaviour may
+change in future releases of Perl.
+
=item -i used with no filenames on the command line, reading from STDIN
(S inplace) The C<-i> option was passed on the command line, indicating
</pre>
<p>In the next part, we'll take a look at the other half of the compiler, the parser. It is there where we'll make our next modifications to add the <tt>banana</tt> feature.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-4.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-6.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-48117956430597078782021-02-10T13:00:00.012+00:002021-02-18T22:51:38.843+00:00Writing a Perl Core Feature - part 4: Opcodes<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-3.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-5.html">Next ></a></p>
<h2>Optrees and Ops</h2>
<p>Before we get into this next part, I want to first explain some details about how the Perl interpreter works. In summary, the source code of a Perl program is translated into a more compiled form when the interpreter starts up and reads the files. This form is stored in memory and is used to implement the behaviour of the functions that make up the program. It is called an <i>Optree</i>.</p>
<p>Or rather more accurately, every individual function in the program is represented by an Optree. This is a tree-shaped data structure, whose individual nodes each represent one basic kind of operation or step in the execution of that function. This could be considered similar to a sort of assembly language representation, except that rather than being stored as a flat list of instructions, the tree-shaped structure of the individual nodes (called "ops") helps determine the behaviour of the program when run.</p>
<p>For example, there are many kinds of ops that have no child nodes; these are typically used to represent constants in the program, or to fetch items from well-defined locations elsewhere in the interpreter - such as lexical or package variables. Most other kinds of op take one or more subtrees as child nodes and form the tree structure, where they will operate on the data those child nodes previously fetched - such as adding numbers together, or assigning values into variable locations. To execute the optree the interpreter visits each node in postfix order; recursively gathering results from child nodes of the tree to pass upwards to their parents.</p>
<p>Each individual type of op determines what sort of tree-shaped structure it will have, and the types are grouped together into classes. The most basic class of op (variously called either just "op", or sometimes a "baseop") is one that has no child nodes. An op class with a single child op is called an "unop" (for "unary operator"), one with two children is called a "binop" (for "binary operator"), and one with a variable number of children is a "listop". Within these broad categories there are also sub-divisions: for example a basic op which carries a Perl value with it is an "svop".</p>
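<p>The postfix idea can be modelled with a very small sketch in C. The structure and names below (<tt>toy_op</tt>, <tt>run_op</tt>) are invented for illustration and are much simpler than Perl's real op structures; in particular, real ops pass values around on the interpreter's stack rather than as C return values.</p>

```c
#include <stddef.h>

/* Hypothetical op types for a toy arithmetic optree */
enum optype { TOY_CONST, TOY_NEGATE, TOY_ADD, TOY_MULTIPLY };

struct toy_op {
    enum optype type;
    long value;                 /* used only by TOY_CONST */
    struct toy_op *kids[2];     /* unused child slots are NULL */
};

/* Run one op: visit the children first, then the op itself (postfix order) */
long run_op(const struct toy_op *o)
{
    long args[2] = { 0, 0 };

    /* Gather results from each child subtree */
    for (size_t i = 0; i < 2; i++)
        if (o->kids[i])
            args[i] = run_op(o->kids[i]);

    /* Now operate on the data the children produced */
    switch (o->type) {
    case TOY_CONST:    return o->value;
    case TOY_NEGATE:   return -args[0];
    case TOY_ADD:      return args[0] + args[1];
    case TOY_MULTIPLY: return args[0] * args[1];
    }
    return 0;
}
```

<p>A tree for <tt>2 + 3 * 4</tt> would have a <tt>TOY_ADD</tt> at the root, with a <tt>TOY_CONST</tt> and a <tt>TOY_MULTIPLY</tt> as its children; the leaves run first and the root runs last.</p>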
<p>Specific types of op are identified by names, given by the constants defined in <tt>opnames.h</tt>. For example, a basic op carrying a constant value is an <tt>OP_CONST</tt>, and one representing a lexical variable is an <tt>OP_PADSV</tt> (so named because variables - SVs - are stored in a data structure called a scratchpad, or pad for short). A binop which performs a scalar assignment between its two child ops is <tt>OP_SASSIGN</tt>. Thus, for example, the following Perl statement could be represented by the optree given below it:</p>
<pre>my $x = 5;</pre>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmymqvgGzm4zdeNE-_NY4X59WNM7gJOjOxLjMe5P6jyqrrWX4VWUrjJc0TrtFeWIc-XjQlwk8Ez-dg81msvKwdXASaCWPZhhKGVZSODgLdQpx-MqxcbM6suJyUQVHquui-Uch7enbnjYFT/s506/corefeature-p4-f1.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="253" data-original-width="506" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmymqvgGzm4zdeNE-_NY4X59WNM7gJOjOxLjMe5P6jyqrrWX4VWUrjJc0TrtFeWIc-XjQlwk8Ez-dg81msvKwdXASaCWPZhhKGVZSODgLdQpx-MqxcbM6suJyUQVHquui-Uch7enbnjYFT/s400/corefeature-p4-f1.png"/></a></div>
<p>Of course, in such a brief overview as this I have omitted many details, as well as made many simplifications of the actual subject. This should be sufficient to stand as an introduction into the next step of adding a new core Perl feature, but for more information on the subject you could take a look at another blog post of mine, where I talked about optrees from the perspective of writing syntax keyword modules - <a href="/2016/09/perl-parser-plugins-3-optrees.html">Perl Parser Plugins 3 - Optrees</a>.</p>
<p>One final point to note is that in some ways you can think of an optree as being similar to an abstract syntax tree (an AST). This isn't always a great analogy, because some parts of the optree don't bear a very close resemblance to the syntax of the source code that produced it. While there are certain similarities, it is important to remember it is not quite the same. For example, there is no opcode to represent the <tt>if</tt> syntax; the same opcode is used as for the <tt>and</tt> infix shortcircuit operator. It is best to think of the optree as representing the abstract algorithm - the sequence of required operations - that were described by the source code that compiled into it.</p>
<h2>Opcodes in Perl Core</h2>
<p>As with adding features, warnings, and keywords, the first step to adding a new opcode to the Perl core begins with editing a file under <tt>regen/</tt>. The file in this case is <tt>regen/opcodes</tt>, and is not a perl script, but a plain-text file listing the various kinds of op, along with various properties about them. The file begins with a block of comments which explains more of the details.</p>
<p>The choice of how to represent a new Perl feature in terms of the optree that the syntax will generate depends greatly on exactly what the behaviour of the feature should be. Especially when creating a new feature as core syntax (rather than just adding some functions in a module) the syntax and semantic shape often don't easily relate to a simple function-like structure. There aren't any hard-and-fast rules here; the best bet is usually to look around the existing ops and syntax definitions for similar ideas to be inspired by.</p>
<p>For example, when I added the <tt>isa</tt> operator I observed that it should behave as an infix comparison-style operator, similar to perhaps the <tt>eq</tt> or <tt>==</tt> ones. In the <tt>regen/opcodes</tt> file these are defined by the two lines:</p>
<pre>
eq numeric eq (==) ck_cmp Iifs2 S S<
seq string eq ck_null ifs2 S S</pre>
<p>The meanings of these five tab-separated columns are as follows:</p>
<ol>
<li>The source-level name of the op (this is used, capitalised, to form the constants <tt>OP_EQ</tt> and <tt>OP_SEQ</tt>).</li>
<li>A human-readable string description for the op (used in printed warnings).</li>
<li>The name of the op-checker function (more on this later).</li>
<li>Some flags describing the operator itself; notable ones being <tt>s</tt> - produces a scalar result, and <tt>2</tt> - it is a binop.</li>
<li>More flags describing the operands; in this case two scalars. It turns out in practice nothing cares about that column so on later additions it is omitted.</li>
</ol>
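<p>As a rough illustration of that column layout, here is a small C sketch that splits one such line into its fields. This is not the code perl uses (the real work is done by the Perl script <tt>regen/opcode.pl</tt>); the <tt>toy_</tt> names are made up, and treating a run of several tabs as a single separator is my assumption about how the columns are aligned in the file.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_COLUMNS 5

/* One parsed line of the opcode list (hypothetical struct). */
typedef struct {
    const char *name;      /* column 1: op name, e.g. "eq" */
    const char *desc;      /* column 2: human-readable description */
    const char *checker;   /* column 3: op-checker function name */
    const char *flags;     /* column 4: flags describing the operator */
    const char *operands;  /* column 5: operand flags; NULL when omitted */
} toy_opdef;

/* Splits `line` in place on runs of tab characters.
 * Returns the number of columns found (at most TOY_MAX_COLUMNS). */
static int toy_parse_opdef(char *line, toy_opdef *def)
{
    const char *col[TOY_MAX_COLUMNS] = { NULL, NULL, NULL, NULL, NULL };
    int n = 0;
    char *p;

    assert(line != NULL && def != NULL);

    line[strcspn(line, "\n")] = '\0';   /* drop a trailing newline, if any */

    p = line;
    while (*p != '\0' && n < TOY_MAX_COLUMNS) {
        col[n++] = p;
        p += strcspn(p, "\t");          /* advance to the end of this column */
        if (*p == '\0')
            break;
        *p++ = '\0';                    /* terminate the column */
        p += strspn(p, "\t");           /* skip the rest of the tab run */
    }

    def->name     = col[0];
    def->desc     = col[1];
    def->checker  = col[2];
    def->flags    = col[3];
    def->operands = col[4];
    return n;
}

/* True if the operator flags column contains the given flag letter. */
static int toy_opdef_has_flag(const toy_opdef *def, char flag)
{
    assert(def != NULL);
    return def->flags != NULL && strchr(def->flags, flag) != NULL;
}
```

<p>Given the <tt>eq</tt> line this fills in all five fields, and <tt>toy_opdef_has_flag(&amp;def, '2')</tt> reports that it is a binop. Given a four-column line, the <tt>operands</tt> field is simply left as <tt>NULL</tt>.</p>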
<p>The definition for the <tt>isa</tt> operator was added in a similar style: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-9315ea3e7ef8daeb0063fbc04990bb9b2f6dde93e77b6446bb7639e92aa0838e">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/regen/opcodes
+++ b/regen/opcodes
@@ -572,3 +572,5 @@ lvref lvalue ref assignment ck_null d%
lvrefslice lvalue ref assignment ck_null d@
lvavref lvalue array reference ck_null d%
anonconst anonymous constant ck_null ds1
+
+isa derived class test ck_isa s2
</pre>
<p>Let's now consider what we need for our new <tt>banana</tt> feature. Although we added two new keywords in the previous part, those are just the way the feature is spelled in source code. Perhaps the semantics we want can be represented by a single opcode (remembering what we said above - that the optree is more a representation of the underlying semantics of the program, and not merely the surface level syntax of how it is written).</p>
<p>For the sake of argument, let us now imagine that whatever new syntax our new <tt>banana</tt> feature requires, its operation (via that one opcode) will behave somewhat like a string transform function (perhaps similar to <tt>uc</tt> or <tt>lc</tt>). As with so many things relating to adding a new feature/keyword/opcode/etc., it is often best to look for something else similar to copy and adjust as appropriate. We'll add a single new opcode to the list by making a copy of one of those and editing it:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/opcodes
leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/regen/opcodes b/regen/opcodes
index 2a2da77c5c..27114c9659 100644
--- a/regen/opcodes
+++ b/regen/opcodes
@@ -579,3 +579,5 @@ cmpchain_and comparison chaining ck_null |
cmpchain_dup comparand shuffling ck_null 1
catch catch {} block ck_null |
+
+banana banana operation ck_null s1
leo@shy:~/src/bleadperl/perl [git]
$ perl regen/opcode.pl
Changed: opcode.h opnames.h pp_proto.h lib/B/Op_private.pm</pre>
<p>The regeneration script has edited quite a few files this time. Take a look at those now. The notable parts are:</p>
<ul>
<li>A new value named <tt>OP_BANANA</tt> has been added to the list in <tt>opnames.h</tt>.</li>
<li>A new entry has been added to each of several arrays defined in <tt>opcode.h</tt>. These contain the name and description strings, function pointers, and various bitflags. Of specific note is the new entry in <tt>PL_ppaddr[]</tt> which points to a new function named <tt>Perl_pp_banana</tt>.</li>
<li>A new function prototype for <tt>Perl_pp_banana</tt> in <tt>pp_proto.h</tt>.</li>
</ul>
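<p>If it helps to picture how those generated pieces fit together, here is a much-reduced sketch in C. It is not the generated code itself: the <tt>toy_</tt> names, the three-entry op list, the description strings and the simplified function signature are all invented for illustration. The shape is the point: one enum value per op, and several parallel arrays indexed by that value, one of which holds function pointers.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for opnames.h: one enum value per op. */
typedef enum {
    TOY_OP_UC,
    TOY_OP_LC,
    TOY_OP_BANANA,   /* the newly added op */
    TOY_OP_max       /* count of ops, used to size the tables */
} toy_opcode;

/* Simplified "pp function" signature; real pp functions look different. */
typedef int (*toy_ppaddr_t)(int arg);

static int toy_pp_uc(int c) { return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c; }
static int toy_pp_lc(int c) { return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c; }

/* Stub for the new op: it has to exist for the table below to link. */
static int toy_pp_banana(int c) { (void)c; return -1; }

/* Stand-ins for the parallel arrays in opcode.h, all indexed by opcode. */
static const char *const toy_op_name[TOY_OP_max] = { "uc", "lc", "banana" };
static const char *const toy_op_desc[TOY_OP_max] = {
    "uc", "lc", "banana operation"
};
static const toy_ppaddr_t toy_ppaddr[TOY_OP_max] = {
    toy_pp_uc, toy_pp_lc, toy_pp_banana
};

/* Running an op is just an indexed call through the table. */
static int toy_run_op(toy_opcode code, int arg)
{
    assert((int)code >= 0 && (int)code < TOY_OP_max);
    return toy_ppaddr[code](arg);
}
```

<p>Because the function-pointer table refers to every op's function by name, adding a row for a function that has not been written yet cannot link. That is the situation the real tables are now in with <tt>Perl_pp_banana</tt>.</p>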
<p>If we were to try building perl now we'd find it won't currently even compile, because the opcode tables are looking for this new <tt>Perl_pp_banana</tt> function but we haven't even written it yet:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
/usr/bin/ld: globals.o:(.data.rel+0xc88): undefined reference to `Perl_pp_banana'
collect2: error: ld returned 1 exit status
</pre>
<p>We'll have to provide an actual function for this. There are in fact a number of files which potentially could contain this function. <tt>pp_ctl.c</tt> contains the control-flow ops (such as entersub and return), <tt>pp_sys.c</tt> contains the various ops that interact with the OS (such as open and socket), <tt>pp_sort.c</tt> and <tt>pp_pack.c</tt> each contain just those specific ops (for various reasons), and the rest of the "normal" ops are scattered between <tt>pp.c</tt> and <tt>pp_hot.c</tt> - the latter containing a few of the smaller more-frequently invoked ops.</p>
<p>For adding a new feature like this, it's almost certain that we want to be adding it to <tt>pp.c</tt>. For now, so that we can at least compile perl again and continue our work, let's just add a little stub function that will panic if actually run.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim pp.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff pp.c
diff --git a/pp.c b/pp.c
index d0e639fa32..bc54a06aa3 100644
--- a/pp.c
+++ b/pp.c
@@ -7207,6 +7207,11 @@ PP(pp_cmpchain_dup)
RETURN;
}
+PP(pp_banana)
+{
+ DIE(aTHX_ "panic: we have no bananas");
+}
+
/*
* ex: set ts=8 sts=4 sw=4 et:
*/
leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl</pre>
<p>Before we conclude this already-long part, there's something we have to tidy up to keep the unit tests happy. There are a few tests which care about the total list of opcodes, and since we've added one more they will now need adjusting.</p>
<pre>porting/utils.t ........... 58/? # Failed test 59 - utils/cpan compiles at porting/utils.t line 85
# got "Untagged opnames: banana\nutils/cpan syntax OK\n"
# expected "utils/cpan syntax OK\n"
# when executing perl with '-c utils/cpan'
porting/utils.t ........... Failed 1/82 subtests
</pre>
<p>It's non-obvious from the error result, but this is actually complaining that the <tt>Opcode</tt> module has not assigned this opcode to a category. We can fix that by editing the module file and again doing something similar to whatever <tt>uc</tt> and <tt>lc</tt> do. Again, as it's a shipped <tt>.pm</tt> file, don't forget to update the <tt>$VERSION</tt> declaration:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim ext/Opcode/Opcode.pm
leo@shy:~/src/bleadperl/perl [git]
$ git diff ext/Opcode/Opcode.pm
diff --git a/ext/Opcode/Opcode.pm b/ext/Opcode/Opcode.pm
index f1b2247b07..eaabc43757 100644
--- a/ext/Opcode/Opcode.pm
+++ b/ext/Opcode/Opcode.pm
@@ -6,7 +6,7 @@ use strict;
our($VERSION, @ISA, @EXPORT_OK);
-$VERSION = "1.50";
+$VERSION = "1.51";
use Carp;
use Exporter ();
@@ -336,7 +336,7 @@ invert_opset function.
substr vec stringify study pos length index rindex ord chr
ucfirst lcfirst uc lc fc quotemeta trans transr chop schop
- chomp schomp
+ chomp schomp banana
match split qr
</pre>
<p>At this point, the tests should all run cleanly again. We're now getting perilously close to actually being able to implement something. Maybe we'll get around to that in the next part.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-3.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-5.html">Next ></a></p>
<h1>Writing a Perl Core Feature - part 3: Keywords</h1>
<p>Published 2021-02-08, updated 2021-02-11</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-2.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-4.html">Next ></a></p>
<p>Some Perl features use a syntax entirely made of punctuation symbols; for example Perl 5.10's defined-or operator (<tt>//</tt>), or Perl 5.24's postfix dereference (<tt>->$*</tt>, etc.). Other features are based around new keywords spelled like regular identifiers; such as 5.10's <tt>state</tt> or 5.32's <tt>isa</tt>. It is rare for newly-added syntax to be expressible using operator symbols alone, so most new features come in the form of new keywords.</p>
<p>As with adding the named feature itself and its associated warning, the first step in adding a keyword is to edit a regeneration file. The file required this time is called <tt>regen/keywords.pl</tt>.</p>
<p>For example when the <tt>isa</tt> feature was added, it required a new keyword of the same name: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-bf7f7d2cdfd5d0d114f61240290902dee7017550f306f1bb09508527f23393ab">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -46,6 +46,7 @@ my %feature_kw = (
evalbytes => 'evalbytes',
__SUB__ => '__SUB__',
fc => 'fc',
+ isa => 'isa',
);
my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -217,6 +218,7 @@ __END__
-index
-int
-ioctl
+-isa
-join
-keys
-kill</pre>
<p>There are two parts to this change. The latter part adds our new keyword to the main list of all the known keywords in the <tt>DATA</tt> section at the end of the script. If it wasn't for the first part of this change, then the new keyword would be recognised unconditionally in all code - almost certainly not what we want as that would cause compatibility issues in existing code. Since we have a lexical named feature for exactly this purpose, we made use of it here by listing the new keyword along with its associated feature into the <tt>%feature_kw</tt> hash so that the keyword is only recognised conditionally based on that feature being enabled.</p>
<p>For our new <tt>banana</tt> feature we need to decide if we're going to add some keywords, and if so what they will be called. Let's add two to make a more interesting example, called <tt>ban</tt> and <tt>ana</tt>. As before we'll start by editing the regeneration script and running it to have it rebuild some files.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/keywords.pl
leo@shy:~/src/bleadperl/perl [git]
$ git diff
diff --git a/regen/keywords.pl b/regen/keywords.pl
index b9ae8cf0f2..adbec89c71 100755
--- a/regen/keywords.pl
+++ b/regen/keywords.pl
@@ -47,6 +47,8 @@ my %feature_kw = (
__SUB__ => '__SUB__',
fc => 'fc',
isa => 'isa',
+ ban => 'banana',
+ ana => 'banana',
);
my %pos = map { ($_ => 1) } @{$by_strength{'+'}};
@@ -125,8 +127,10 @@ __END__
-abs
-accept
-alarm
+-ana
-and
-atan2
+-ban
-bind
-binmode
-bless
leo@shy:~/src/bleadperl/perl [git]
$ perl regen/keywords.pl
Changed: keywords.c keywords.h</pre>
<p>We still have a few more files to edit before we're done adding the keywords, but before continuing you should take a look at these regenerated files to see what changes have been made. Notice that this time there are no changes to any Perl files, only C files. This is why we didn't need to update any <tt>$VERSION</tt> values.</p>
<p>The <tt>keywords.h</tt> file just contains a long list of macros named <tt>KEY_...</tt> which give numbers to each keyword. Don't worry that most of the numbers have now changed - <tt>regen/keywords.pl</tt> likes to keep them in alphabetical order, and since we added new ones near the beginning it has had to move the rest downwards. This won't be a problem because the numbers are only internal within the perl lexer and parser, so there's no API compatibility to worry about here.</p>
<p>The <tt>keywords.c</tt> file contains just one function, whose job is to recognise any of the keywords by name. It returns values of these <tt>KEY_...</tt> macros. Take a look at the added code, and notice that its recognition of each of our additions is conditional on the <tt>FEATURE_BANANA_IS_ENABLED</tt> macro we saw added when we added the named feature.</p>
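<p>The overall idea of that generated function can be sketched in a few lines of C. This is only a toy with invented <tt>toy_</tt>/<tt>TOY_</tt> names and made-up numbers; in particular it takes the set of enabled features as an explicit argument so that the example stands alone, which is my simplification and not how the real <tt>FEATURE_BANANA_IS_ENABLED</tt> macro is used.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the KEY_... macros in keywords.h (numbers are made up,
 * but kept in alphabetical order like the real ones). */
enum {
    TOY_KEY_NULL = 0,
    TOY_KEY_abs,
    TOY_KEY_ana,
    TOY_KEY_and,
    TOY_KEY_ban,
    TOY_KEY_bind
};

/* Stand-in for the feature test, taking the feature mask explicitly. */
#define TOY_FEATURE_BANANA_BIT 0x01u
#define TOY_FEATURE_BANANA_IS_ENABLED(mask) \
    (((mask) & TOY_FEATURE_BANANA_BIT) != 0)

/* Recognise a keyword by name; returns TOY_KEY_NULL if it is not one. */
static int toy_keyword(const char *name, size_t len, unsigned features)
{
    assert(name != NULL);

    switch (len) {
    case 3:
        if (memcmp(name, "abs", 3) == 0)
            return TOY_KEY_abs;
        if (memcmp(name, "and", 3) == 0)
            return TOY_KEY_and;
        /* The two new keywords only exist while the feature is enabled. */
        if (memcmp(name, "ana", 3) == 0)
            return TOY_FEATURE_BANANA_IS_ENABLED(features)
                ? TOY_KEY_ana : TOY_KEY_NULL;
        if (memcmp(name, "ban", 3) == 0)
            return TOY_FEATURE_BANANA_IS_ENABLED(features)
                ? TOY_KEY_ban : TOY_KEY_NULL;
        break;
    case 4:
        if (memcmp(name, "bind", 4) == 0)
            return TOY_KEY_bind;
        break;
    }
    return TOY_KEY_NULL;
}
```

<p>With the feature bit clear, <tt>ana</tt> and <tt>ban</tt> come back as "not a keyword" and so remain free for existing code to use as ordinary identifiers, which is the compatibility property we wanted.</p>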
<p>We're not quite done yet though. If we were to run the full test suite now, we'd already find a few tests that fail:</p>
<pre>op/coreamp.t .. 1/? # Failed test 591 - ana either has been tested or is not ampable at op/coreamp.t line 1178
# Failed test 593 - ban either has been tested or is not ampable at op/coreamp.t line 1178
op/coreamp.t .. Failed 2/778 subtests
...
op/coresubs.t .. 1/? perl: op.c:14795: Perl_ck_entersub_args_core: Assertion `!"UNREACHABLE"' failed.
op/coresubs.t .. All 52 subtests passed
...
../lib/B/Deparse-core.t .. 3690/3904 # keyword 'ana' seen in ../regen/keywords.pl, but not tested here!!
# keyword 'ban' seen in ../regen/keywords.pl, but not tested here!!
# Failed test 'sanity checks'
# at ../lib/B/Deparse-core.t line 430.
# Looks like you failed 1 test of 3904.
../lib/B/Deparse-core.t .. Dubious, test returned 1 (wstat 256, 0x100)
</pre>
<p>The two tests in <tt>t/op</tt> are checking variations on a theme of the <tt>&CORE::...</tt> syntax, by which core operators can be reïfied into regular code references to functions that behave like the operator. This is often appropriate for operators which act like regular functions (for example the mathematical <tt>sin</tt> and <tt>cos</tt> operators), but it isn't what we want for keywords that act more structurally, like basic syntax. We should tell these tests to skip the new keywords by adding them to each file's skip list:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim t/op/coreamp.t t/op/coresubs.t
leo@shy:~/src/bleadperl/perl [git]
$ git diff t/
diff --git a/t/op/coreamp.t b/t/op/coreamp.t
index b57609bef0..bd60ca83b9 100644
--- a/t/op/coreamp.t
+++ b/t/op/coreamp.t
@@ -1162,7 +1162,7 @@ like $@, qr'^Undefined format "STDOUT" called',
my %nottest_words = map { $_ => 1 } qw(
AUTOLOAD BEGIN CHECK CORE DESTROY END INIT UNITCHECK
__DATA__ __END__
- and cmp default do dump else elsif eq eval for foreach format ge given goto
+ ana and ban cmp default do dump else elsif eq eval for foreach format ge given goto
grep gt if isa last le local lt m map my ne next no or our package print
printf q qq qr qw qx redo require return s say sort state sub tr unless
until use when while x xor y
diff --git a/t/op/coresubs.t b/t/op/coresubs.t
index 1fa11c02f0..85c08a4756 100644
--- a/t/op/coresubs.t
+++ b/t/op/coresubs.t
@@ -15,7 +15,8 @@ BEGIN {
use B;
my %unsupported = map +($_=>1), qw (
- __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK and
+ __DATA__ __END__ AUTOLOAD BEGIN UNITCHECK CORE DESTROY END INIT CHECK
+ ana and ban
cmp default do dump else elsif eq eval for foreach
format ge given goto grep gt if isa last le local lt m map my ne next
no or our package print printf q qq qr qw qx redo require
</pre>
<p>Now let's run those two tests in particular. We can do this by using our newly-built <tt>perl</tt> binary to run the <tt>t/harness</tt> script and pass in the paths (relative to the <tt>t/</tt> directory) to specific tests we wish to run:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t
op/coreamp.t ... ok
op/coresubs.t .. 1/? # Failed test 51 - no CORE::ana at op/coresubs.t line 53
# Failed test 58 - no CORE::ban at op/coresubs.t line 53
op/coresubs.t .. Failed 2/1099 subtests
Test Summary Report
-------------------
op/coresubs.t (Wstat: 0 Tests: 1099 Failed: 2)
Failed tests: 51, 58
Files=2, Tests=1875, 1 wallclock secs ( 0.35 usr 0.02 sys + 0.67 cusr 0.03 csys = 1.07 CPU)
Result: FAIL</pre>
<p>Well that's one solved, but the other is still upset. This time it is complaining that it expected not to find a <tt>&CORE::ana</tt> at all, but instead one was there. In order to fix that we will have to edit the list of exceptions in <tt>gv.c</tt>.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim gv.c
leo@shy:~/src/bleadperl/perl [git]
$ git diff gv.c
diff --git a/gv.c b/gv.c
index 92bada56b1..10271159dc 100644
--- a/gv.c
+++ b/gv.c
@@ -543,8 +543,9 @@ S_maybe_add_coresub(pTHX_ HV * const stash, GV *gv,
switch (code < 0 ? -code : code) {
/* no support for \&CORE::infix;
no support for funcs that do not parse like funcs */
- case KEY___DATA__: case KEY___END__: case KEY_and: case KEY_AUTOLOAD:
- case KEY_BEGIN : case KEY_CHECK : case KEY_cmp:
+ case KEY___DATA__: case KEY___END__: case KEY_ana : case KEY_and :
+ case KEY_AUTOLOAD: case KEY_ban : case KEY_BEGIN : case KEY_CHECK :
+ case KEY_cmp :
case KEY_default : case KEY_DESTROY:
case KEY_do : case KEY_dump : case KEY_else : case KEY_elsif :
case KEY_END : case KEY_eq : case KEY_eval :
</pre>
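<p>Stripped of everything else, the shape of that exception list is something like the following C sketch. The <tt>toy_</tt> names and key numbers are invented, and the function is far simpler than the real one in <tt>gv.c</tt>; it only mirrors the <tt>switch</tt> on the absolute value of the keyword code that is visible in the diff above.</p>

```c
#include <assert.h>

/* Made-up keyword numbers for the sketch. */
enum {
    TOY_KEY_ana = 1,
    TOY_KEY_and,
    TOY_KEY_ban,
    TOY_KEY_cos,
    TOY_KEY_sin
};

/* Returns 1 if a \&CORE::name code reference should be offered for this
 * keyword code, or 0 if the keyword is on the exception list. */
static int toy_coresub_allowed(int code)
{
    assert(code != 0);

    /* Keyword codes may arrive negated, so compare the absolute value. */
    switch (code < 0 ? -code : code) {
    case TOY_KEY_ana: case TOY_KEY_and: case TOY_KEY_ban:
        /* infix operators and other things that do not parse like funcs */
        return 0;
    default:
        return 1;
    }
}
```

<p>Function-like operators such as <tt>sin</tt> and <tt>cos</tt> fall through to the default case, while our two new keywords now join <tt>and</tt> on the list of things that cannot be turned into a code reference.</p>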
<p>Now we rebuild <tt>perl</tt> (because we have edited a C file) and rerun the tests:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness op/coreamp.t op/coresubs.t
op/coreamp.t ... ok
op/coresubs.t .. ok
All tests successful.
Files=2, Tests=1875, 1 wallclock secs ( 0.43 usr 0.02 sys + 0.76 cusr 0.02 csys = 1.23 CPU)
Result: PASS</pre>
<p>The test under <tt>../lib/B/Deparse-core.t</tt> checks the behaviour of the <tt>B::Deparse</tt> module against the core keywords. (The path is relative to the <tt>t/</tt> directory, which is why it begins with <tt>..</tt>, and shows that tests within bundled core modules are counted as part of the full test suite.)</p>
<p>When the <tt>isa</tt> feature was added, this test file was updated to add some deparsing tests around the <tt>isa</tt> operator as a regular infix binary syntax. We'll come back later and add some unit tests for our new <tt>ban</tt> and <tt>ana</tt> keywords, but for now as with the coreamp and coresubs tests it is best to just add these to the skip list in that test file as well.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim lib/B/Deparse-core.t
leo@shy:~/src/bleadperl/perl [git]
$ git diff lib/B/Deparse-core.t
diff --git a/lib/B/Deparse-core.t b/lib/B/Deparse-core.t
index cdbd27ce5e..edf86f809d 100644
--- a/lib/B/Deparse-core.t
+++ b/lib/B/Deparse-core.t
@@ -362,6 +362,8 @@ my %not_tested = map { $_ => 1} qw(
END
INIT
UNITCHECK
+ ana
+ ban
default
else
elsif
leo@shy:~/src/bleadperl/perl [git]
$ ./perl t/harness ../lib/B/Deparse-core.t
../lib/B/Deparse-core.t .. ok
All tests successful.
Files=1, Tests=3904, 17 wallclock secs ( 1.17 usr 0.06 sys + 16.86 cusr 0.06 csys = 18.15 CPU)
Result: PASS</pre>
<p>At this point we now have a named feature with its associated warning, and some conditionally-recognised keywords. In the next parts we will get the compiler to recognise these when parsing Perl code.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-2.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-4.html">Next ></a></p>
<h1>Writing a Perl Core Feature - part 2: warnings.pm</h1>
<p>Published 2021-02-05, updated 2021-02-08</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-1.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-3.html">Next ></a></p>
<p>Ever since Perl version 5.18, newly added features are initially declared as experimental. This gives time for them to be more widely tested and used in practice, so that the design can be further refined and changed if necessary. In order to achieve this for a new feature our next step will be to add a warning to <tt>warnings.pm</tt>.</p>
<p>Similar to the named feature in <tt>feature.pm</tt> this file also isn't edited directly, but instead is maintained by a regeneration script; this one called <tt>regen/warnings.pl</tt>.</p>
<p>For example, the <tt>isa</tt> feature added a new warning here: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-6b464a6294feea2fbe02fe7848bc39919acfd2b75b0cc779e2348076c3bd06d5">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/regen/warnings.pl
+++ b/regen/warnings.pl
@@ -16,7 +16,7 @@
#
# This script is normally invoked from regen.pl.
-$VERSION = '1.45';
+$VERSION = '1.46';
BEGIN {
require './regen/regen_lib.pl';
@@ -117,6 +117,8 @@ my $tree = {
[ 5.029, DEFAULT_ON ],
'experimental::vlb' =>
[ 5.029, DEFAULT_ON ],
+ 'experimental::isa' =>
+ [ 5.031, DEFAULT_ON ],
}],
'missing' => [ 5.021, DEFAULT_OFF],</pre>
<p>This change simply adds another entry into the list of defined warnings. It has a name, a Perl version from which it appears, and is declared to be on by default (as all "experimental" warnings should be). We also have to bump the version number because that is the value inserted into the generated <tt>warnings.pm</tt> file.</p>
<p>For adding a new warning to go along with our <tt>banana</tt> feature, we follow a similar process to what we did for the named feature bit. We edit the regeneration file to make a similar change to the one seen above, then run the script to have it generate the required files.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/warnings.pl
leo@shy:~/src/bleadperl/perl [git]
$ perl regen/warnings.pl
Changed: warnings.h lib/warnings.pm</pre>
<p>As before, we can see that it has generated the new <tt>lib/warnings.pm</tt> Perl pragma file, and also a header file for compiling the interpreter itself. Take a look at these files now to get a feel for what's there.</p>
<p>In particular, the items of note are:</p>
<ul>
<li>The generated <tt>warnings.pm</tt> file includes changes to the documented list of known warning categories.</li>
<li>A new <tt>WARN_EXPERIMENTAL__BANANA</tt> macro has been created in the <tt>warnings.h</tt> file. We shall be seeing this used soon.</li>
</ul>
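<p>The relationship between the warning category name and the macro name is easy to guess from the two examples we have seen: prefix with <tt>WARN_</tt>, uppercase everything, and turn each colon into an underscore. Here is a small C sketch of that guess. The function <tt>toy_warn_macro_name</tt> is hypothetical and the rule is inferred from those names only, not taken from <tt>regen/warnings.pl</tt>.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the macro-style name for a warning category into buf, e.g.
 * "experimental::banana" becomes "WARN_EXPERIMENTAL__BANANA".
 * Returns 0 on success, or -1 if buf is too small. */
static int toy_warn_macro_name(const char *category, char *buf, size_t size)
{
    static const char prefix[] = "WARN_";
    const char *in;
    char *out;

    assert(category != NULL && buf != NULL);

    if (size < strlen(prefix) + strlen(category) + 1)
        return -1;

    strcpy(buf, prefix);
    out = buf + strlen(prefix);

    /* Each ':' becomes '_', so "::" naturally turns into "__". */
    for (in = category; *in != '\0'; in++)
        *out++ = (*in == ':') ? '_' : (char)toupper((unsigned char)*in);
    *out = '\0';

    return 0;
}
```

<p>This also explains the slightly odd-looking double underscore in the middle of the macro name: it is just the <tt>::</tt> of the category name in disguise.</p>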
<p>Now that we have both the named feature and the experimental warning we can check that the <tt>experimental</tt> pragma module can enable it:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -ce 'use experimental "banana";'
-e syntax OK</pre>
<p>We're now one step closer to being able to actually start implementing this feature.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature-part-1.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-3.html">Next ></a></p>
<h1>Writing a Perl Core Feature - part 1: feature.pm</h1>
<p>Published 2021-02-03, updated 2021-02-05</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-2.html">Next ></a></p>
<p>The first step towards adding a new feature to Perl is introducing the new name into <tt>feature.pm</tt>, so that it may be requested by</p>
<pre>use feature 'banana';</pre>
<p>To accomplish this we don't actually edit <tt>feature.pm</tt> directly, because that is a file which is automatically generated from other source. The primary file we need to work on lives in the <tt>regen/</tt> directory, and is called <tt>regen/feature.pl</tt>.</p>
<p>For example, when adding the <tt>isa</tt> feature this was the change made there: <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757#diff-d8e1b7cbf17955be25b7a5330c762893d6994a768c02277e772ae1990a26ec71">(github.com/Perl/perl5)</a>.</p>
<pre>--- a/regen/feature.pl
+++ b/regen/feature.pl
@@ -35,6 +35,7 @@ my %feature = (
unicode_strings => 'unicode',
fc => 'fc',
signatures => 'signatures',
+ isa => 'isa',
);
# NOTE: If a feature is ever enabled in a non-contiguous range of Perl
@@ -752,6 +753,14 @@ Reference to a Variable> for examples.
This feature is available from Perl 5.26 onwards.
+=head2 The 'isa' feature
+
+This allows the use of the C<isa> infix operator, which tests whether the
+scalar given by the left operand is an object of the class given by the
+right operand. See L<perlop/Class Instance Operator> for more details.
+
+This feature is available from Perl 5.32 onwards.
+
=head1 FEATURE BUNDLES
It's possible to load multiple features together, using</pre>
<p>We can see two distinct parts in here. The first, a single line addition to the <tt>%feature</tt> hash, is the part which actually introduces the new name. The second part adds some documentation for it, which will appear in the generated <tt>feature.pm</tt> file.</p>
<p>To add our new <tt>banana</tt> feature then, this is where we must start editing. For now don't worry too much about the documentation part - we'll come back to that later. Just add a single line into the <tt>%feature</tt> hash.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/feature.pl</pre>
<p>Once we've made our required changes in here, we run the script to get it to regenerate its files. Note that we need to use a <tt>perl</tt> to run this, but it doesn't have to be the one we are trying to build (indeed - that would be problematic would it not? ;) ). Any reasonably up-to-date system Perl install will be fine.</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ perl regen/feature.pl
Changed: lib/feature.pm feature.h</pre>
<p>Here we can see that it has regenerated two files. The first of these is the <tt>lib/feature.pm</tt> file that the perl VM will use at runtime to implement the actual <tt>use feature</tt> pragma. The second file is <tt>feature.h</tt>, which is used when compiling the interpreter itself and contains the various feature-test macros. If you want, take a look now at the changes it has made.</p>
<p>Specifically, notice that:</p>
<ul>
<li>A new <tt>FEATURE_BANANA_BIT</tt> macro has been created, and a value assigned to it. These features are kept in numerical order, so also notice that the subsequent features have been renumbered. This is fine - the bit fields are only used internally and there are no API guarantees of numerical stability between major versions of Perl.</li>
<li>A new <tt>FEATURE_BANANA_IS_ENABLED</tt> macro has been created, which other code may use to test if the feature is currently in effect <em>during compile-time</em>. Keep note of this - we will be seeing it again later on.</li>
<li>The other change in the file is in the <tt>S_magic_sethint_feature()</tt> function, which adds code to recognise the string name of the new feature; this is ultimately used by the <tt>use feature ...</tt> line itself to recognise the names of the requested features.</li>
</ul>
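<p>To give a feel for what those three pieces are doing, here is a small C sketch of the same idea: one bit per feature, a test macro, and a function that maps a feature's string name onto its bit. All of the <tt>toy_</tt>/<tt>TOY_</tt> names and the bit values are invented, and the real code in <tt>feature.h</tt> is considerably more involved than this.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One bit per feature, kept in order (values made up for the sketch). */
#define TOY_FEATURE_BANANA_BIT 0x0001u
#define TOY_FEATURE_FC_BIT     0x0002u
#define TOY_FEATURE_ISA_BIT    0x0004u

/* Feature test: is this bit set in the current mask? */
#define TOY_FEATURE_IS_ENABLED(mask, bit) (((mask) & (bit)) != 0)

/* Recognise a feature by its string name; returns 0 if unknown. */
static unsigned toy_feature_bit(const char *name)
{
    static const struct { const char *name; unsigned bit; } table[] = {
        { "banana", TOY_FEATURE_BANANA_BIT },
        { "fc",     TOY_FEATURE_FC_BIT     },
        { "isa",    TOY_FEATURE_ISA_BIT    },
    };
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].bit;
    return 0;
}

/* Rough equivalent of "use feature NAME": set the bit if the name is known.
 * Returns 0 on success, or -1 for an unrecognised feature name. */
static int toy_use_feature(unsigned *mask, const char *name)
{
    unsigned bit;

    assert(mask != NULL);

    bit = toy_feature_bit(name);
    if (bit == 0)
        return -1;
    *mask |= bit;
    return 0;
}
```

<p>Since nothing outside the interpreter ever sees the numeric values, inserting a new bit and shifting the later ones along is harmless, which is the same reason the renumbering in the real <tt>feature.h</tt> is fine.</p>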
<p>At this point already, we can test that the newly-created feature is at least recognised by the <tt>feature.pm</tt> file itself:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make -j4 perl
...
leo@shy:~/src/bleadperl/perl [git]
$ ./perl -Ilib -ce 'use feature "banana";'
-e syntax OK</pre>
<p>It actually turns out that the particular commit that added <tt>isa</tt> was somewhat atypical. It didn't actually need to change the <tt>$VERSION</tt> of the generated file, because another change earlier in the history had already done so. This is unlikely to be the case most of the time.</p>
<p>Now would be a good time to introduce the porting tests. This is a subset of the full test suite, which checks various details to do with whether the source code is being maintained properly. We can run these directly:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
porting/cmp_version.t ..... 1/4 # not ok 3 - lib/feature.pm version 1.62
porting/cmp_version.t ..... Failed 1/4 subtests
...
Test Summary Report
-------------------
porting/cmp_version.t (Wstat: 0 Tests: 4 Failed: 1)
Failed test: 3
Files=32, Tests=44043, 188 wallclock secs ( 7.88 usr 0.16 sys + 186.14 cusr 3.98 csys = 198.16 CPU)
Result: FAIL</pre>
<p>Here indeed we see that for our <tt>banana</tt> feature we have forgotten to bump the version number. No matter, we can do that now and test again:</p>
<pre>leo@shy:~/src/bleadperl/perl [git]
$ nvim regen/feature.pl
leo@shy:~/src/bleadperl/perl [git]
$ perl regen/feature.pl
Changed: lib/feature.pm
leo@shy:~/src/bleadperl/perl [git]
$ git diff
...
--- a/lib/feature.pm
+++ b/lib/feature.pm
@@ -5,7 +5,7 @@
package feature;
-our $VERSION = '1.62';
+our $VERSION = '1.63';
our %feature = (
fc => 'feature_fc',
...
leo@shy:~/src/bleadperl/perl [git]
$ make test_porting
...
All tests successful.
Files=32, Tests=44044, 175 wallclock secs ( 7.32 usr 0.12 sys + 174.11 cusr 3.58 csys = 185.13 CPU)
Result: PASS</pre>
<p>While working on core features it's often a good idea to run at least the porting tests regularly. The full test suite takes quite a while to run, and most of it is unlikely to be affected by the particular parts of a new feature you are working on (especially as new features should be lexically guarded, and thus have little impact on the vast majority of the existing test suite, which won't be expecting them). The porting tests, by contrast, are designed to be fairly small and lightweight, so they can be run often to keep an eye on the most likely problems.</p>
<p><a href="/2021/02/writing-perl-core-feature.html">Index</a> | <a href="/2021/02/writing-perl-core-feature.html">< Prev</a> | <a href="/2021/02/writing-perl-core-feature-part-2.html">Next ></a></p>
<h1>Writing a Perl Core Feature</h1>
<p>Published 2021-02-01, updated 2021-02-26</p>
<p>(Index) | < Prev | <a href="/2021/02/writing-perl-core-feature-part-1.html">Next ></a></p>
<p>One of the headline features that was added in Perl version 5.32.0 was the <a href="https://perldoc.perl.org/perl5320delta#The-isa-Operator"><tt>isa</tt> operator</a>. This feature was written by me, and while the actual development history of it spanned many commits, they were all squashed into one to be merged into the actual <tt>blead</tt> branch, which is the main development head for the Perl interpreter itself.</p>
<p>That commit can be seen <a href="https://github.com/Perl/perl5/commit/813e85a03dc214f719dc8248bda36156897b0757">on github</a>.</p>
<p>At initial glance the commit looks quite long and involved - some might even say scary. But in practice there's somewhat less to it than may first appear. For one thing, while the commit touches 33 different files, 12 of those files are automatically generated from other files in the repository, and comprise the majority of the actual lines of diff (413 of the 656 lines in total).</p>
<p>One thing I found while writing that and getting it reviewed was how few people are actually aware of all the inner details of what goes into creating such a feature. Therefore I've decided to write this blog post series, in which I will take that commit apart in detail, and go over all the individual pieces. My aim here is to not only explain what they're all doing there, but additionally to talk you through the process of creating a feature yourself. Along the way we'll also take a look at some other commits and details of other features. We'll also follow the development of a new, hypothetical feature called <tt>banana</tt> - a word unlikely to collide with any existing or future feature, so it should be easy enough to grep out, and find in these examples.</p>
<p>These examples have all been written midway through the Perl version 5.33.x development series, and so will relate to various internal details of that version. If you are reading this at some point off in the future when internals have changed significantly you may have to adjust to cope - but hopefully I will have edited these posts to remain relevant to whatever is current-generation technology in the meantime.</p>
<p>I'll be entirely honest here - at least half of the point of my writing this series is as a handy reference for <em>me</em> to read again in future the next time I want to do one of these. But I hope other people will find it useful too. Readers are expected to be familiar with using and writing Perl code, as well as have some experience of writing C code. Some knowledge of the internals of the Perl interpreter (such as from writing XS code) might be useful, but I'll try to explain any particularly in-depth concepts as we encounter them, so don't worry too much there.</p>
<p>Rather than write the whole thing in one big post, I shall split it across various sections. Each of them will be linked from this list for easy reference. Not every potential new feature will need every one of these stages, and of course there may be situations where other things need adding or changing, but overall this is a reasonable first guess at what may need to be done.</p>
<p>If you're generally curious about what goes into these things, or looking for a general overview of the process, I suggest reading all of them in order - several will depend on concepts introduced previously. I'll also leave a handy index of all of them here, for easy reference if you want to look up particular things.</p>
<p>The next parts of this post series are:</p>
<ol>
<li><a href="/2021/02/writing-perl-core-feature-part-1.html"><tt>feature.pm</tt></a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-2.html"><tt>warnings.pm</tt></a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-3.html">Keyword</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-4.html">Opcodes</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-5.html">Lexer</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-6.html">Parser</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-7.html">Support functions</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-8.html">Interpreter internals</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-9.html">Tests</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-10.html">Documentation</a></li>
<li><a href="/2021/02/writing-perl-core-feature-part-11-core.html">Core modules</a></li>
</ol>
<p>(Index) | < Prev | <a href="/2021/02/writing-perl-core-feature-part-1.html">Next ></a></p>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com4tag:blogger.com,1999:blog-9112560338291574360.post-5434168795398965332020-12-25T16:00:00.013+00:002021-02-03T23:20:33.147+00:002020 Perl Advent Calendar - Day 25<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-24.html">< Prev</a>
<h2>Bonus Day!</h2>
<p>Over this blog post series we have built up to the post on day 24, which explains that all of what we've seen this series is available and working in Perl, right now. It is the Perl we can write in 2020. All of this has been possible because of the custom keyword feature which was first introduced to Perl in version 5.14.</p>
<p>When Perl gained the ability to support custom keywords provided by modules it started down the path that CPAN modules would experiment with new language ideas. Already a number of such modules exist, and it is likely this idea will continue to develop. What new ideas might turn up in the next few years, and will any of them evolve to become parts of the actual core language?</p>
<p>Here's a collection of some thoughts of mine. Some of these can be implemented in CPAN modules, in the same way as the four modules we've already seen this series. Other ideas however go beyond what would be possible via keywords alone, and stray into the realm of ideas that really do need core Perl support.</p>
<h2>Match/case Syntax</h2>
<p>Perl 5.10 added the smartmatch operator, <tt>~~</tt>. I think we can mostly agree it has not been the success many had been hoping for. Its rules are complex and subtle, and there's far too many of them to remember. Furthermore, it still doesn't express the most basic question of whether basic scalar comparisons for equality are performed as string or number tests. For example, is the expression <tt>5 ~~ "5.0"</tt> true or false? I honestly don't know and the fact I'd have to look it up in a big table of behavior suggests that the thing has failed to achieve its goal.</p>
<p>Yet still we are left without a useful syntax to express control-flow dispatch based on comparing a given value to several example choices - a task for which many languages use keywords <tt>switch</tt> and <tt>case</tt>. I have already <a href="https://www.nntp.perl.org/group/perl.perl5.porters/2020/06/msg257494.html">written to Perl5 Porters</a> with my thoughts on a design I have nicknamed "dumb match", in response to this. The basic idea of dumb match is to make the programmer write down their choice of operator to be used to compare the given value with the various alternatives.</p>
<pre>match($var : eq) {
case("abc") { ... }
case("def") { ... }
case("ij", "kl", "mno") { ... } # any of these will match
}</pre>
<p>Here the programmer has specifically requested the <tt>eq</tt> operator, so we know these are stringy comparisons. Alternatively they could have requested any of</p>
<pre>match($var : ==) {
# numerical comparisons
case(123) { ... }
case(456) { ... }
}
match($string : =~) {
# regexp matches
case(m/^pattern/) { ... }
case(m/morestring$/) { ... } # only the first match wins
}
match($obj : isa) {
# object class matches
case(IO::Handle) { ... }
}</pre>
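<p>Until such syntax exists, the core idea can be roughly approximated in plain Perl. The following is only a sketch under my own assumptions: <tt>match_with</tt> and the <tt>%COMPARE</tt> table are hypothetical names invented for this illustration, not part of any proposal or module. It shows how naming the operator removes the ambiguity that <tt>~~</tt> suffers from.</p>

```perl
use strict;
use warnings;

# Hypothetical table of the comparison operators the caller may ask for
my %COMPARE = (
    'eq' => sub { $_[0] eq $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '=~' => sub { $_[0] =~ $_[1] },
);

# Hypothetical helper: compares $value against each alternative using the
# one operator the caller named, and runs the first block that matches.
sub match_with {
    my ($value, $op, @cases) = @_;
    my $cmp = $COMPARE{$op} or die "Unsupported operator '$op'";

    # @cases is a flat list of [alternatives...] => sub { ... } pairs
    while (my ($alts, $code) = splice @cases, 0, 2) {
        foreach my $alt (@$alts) {
            return $code->() if $cmp->($value, $alt);
        }
    }
    return;    # nothing matched
}

my $result = match_with("kl", 'eq',
    ["abc"]             => sub { "first" },
    ["ij", "kl", "mno"] => sub { "third" },   # any of these will match
);

# The same operands give different answers depending on the chosen operator
my $num = match_with(5, '==', ["5.0"] => sub { "numeric match" }) // "no match";
my $str = match_with(5, 'eq', ["5.0"] => sub { "string match" })  // "no match";

print "$result / $num / $str\n";
```

<p>Because the operator is written down at the call site, the reader never has to consult a table of rules to know whether <tt>5</tt> and <tt>"5.0"</tt> are considered equal.</p>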
<h2>Type Assertions</h2>
<p>Various people have at various times written about or designed all sorts of variations on a theme of a "type system" for Perl. I have <a href="https://www.nntp.perl.org/group/perl.perl5.porters/2020/07/msg258001.html">written reactions</a> to some of those ideas before.</p>
<p>The idea I have in mind here is less a feature in itself, and more a piece of common ground for several of the other ideas, though it may have applications to existing pieces of Perl syntax. Common to several ideas is the need to be able to ask, at runtime, whether a given value satisfies some classification criteria. People often bring up thoughts of assertions like "this is a string" or "this is an integer" at the start of these discussions, but that isn't really within the nature or spirit of what Perl's value system can answer. Instead, I think any workable solution would be written in terms of the existing kinds of comparisons.</p>
<p>Perl 5.32 added the <tt>isa</tt> operator - a real infix operator that asks if its first operand is an object derived from the class given by its second.</p>
<pre>if($arg isa IO::Handle) {
...
}</pre>
<p>This is certainly one kind of type assertion. I could imagine a new keyword, for the sake of argument for now let's call it <tt>is</tt><sup>*</sup>, which can answer similar yes/no questions on a broader category of criteria. It is likely that the righthand side argument would have to be some sort of expression giving a "type constraint", though exactly what that is I admit I don't have a neat design for currently.</p>
<p><sup>*</sup>: <em>Yes, I'm aware this operator choice would probably interfere with <tt>Test::More::is</tt>. Likely a solution can be found somehow, either by a better naming choice, better parser disambiguation, or a lexical feature guard.</em></p>
<p>It may be the case that generic type constraints can be constructed with an arbitrary Perl expression to explain how to test if a value meets the constraint:</p>
<pre>type PositiveNumber is Numeric where { $_ > 0 };</pre>
<p>While in general that would be the most powerful system, it may not lead to a very good performance for several of the other ideas here, so I am still somewhat on the fence about this sort of detail. Because I don't have a firm design on this yet, for the rest of this post I'm just going to give examples using the <tt>isa</tt> operator instead. But any of the examples or ideas would definitely apply to a more generalised type constraint operator or system, whenever one came to exist.</p>
<p>In any case, once a generic <tt>is</tt> operator exists for testing type constraints, it feels natural to allow that in <tt>match/case</tt> syntax too:</p>
<pre>match($value : is) {
case(PositiveNumber) { ... }
case(NegativeNumber) { ... }
}</pre>
<p>In addition it would be wanted in function and method signatures:</p>
<pre>method exec($code is Callable)
{
...
}</pre>
<p>And also object slot variables:</p>
<pre>class Caption
{
has $text is Textual;
...
}</pre>
<h2>Multiple Dispatch</h2>
<p>Another idea that comes once you have assertions is the idea of hooking that into function dispatch itself. Some languages give you the ability to define the same-named function multiple times, with different kinds of assertion on its arguments, and at runtime the one that best matches the given arguments will be chosen. There are usually many rules and subtleties to this idea, so it may not ultimately be very suitable for Perl, but if a constraint system did exist then it would be relatively simple to write a CPAN module providing a <tt>multi</tt> keyword to allow these.</p>
<pre>multi sub speak($animal isa Cow) { say "Moo" }
multi sub speak($animal isa Sheep) { say "Baaah" }</pre>
<p>Naturally this syntax ought to be implemented in a way that means it still works with <tt>method</tt> and <tt>async</tt> as well, allowing us to just as easily</p>
<pre>async multi method speak_to($animal isa Goose)
{
await $self->say("Boo", to => $animal);
}</pre>
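<p>To make the dispatch idea a little more concrete, here is a rough plain-Perl sketch of what such a <tt>multi</tt> keyword might do behind the scenes. Everything in it is an assumption for illustration: the <tt>@speak_candidates</tt> list is a made-up name, and it simply picks the <em>first</em> candidate whose class check passes rather than attempting any "best match" rules.</p>

```perl
use strict;
use warnings;

package Cow   { sub new { return bless {}, shift } }
package Sheep { sub new { return bless {}, shift } }
package Goose { sub new { return bless {}, shift } }

# Hypothetical candidate list: each entry pairs a class constraint with
# the body to run when the argument satisfies it.
my @speak_candidates = (
    [ Cow   => sub { "Moo" } ],
    [ Sheep => sub { "Baaah" } ],
);

sub speak {
    my ($animal) = @_;
    foreach my $candidate (@speak_candidates) {
        my ($class, $code) = @$candidate;
        return $code->($animal) if $animal->isa($class);
    }
    die "No candidate of speak() accepts " . ref($animal) . "\n";
}

my $cow_says   = speak(Cow->new);
my $sheep_says = speak(Sheep->new);
my $goose_err  = eval { speak(Goose->new); 1 } ? "" : $@;

print "$cow_says, $sheep_says, $goose_err";
```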
<h2>Signature-like List Assignment</h2>
<p>Perl 5.20 introduced signatures, which can be imagined as a neatening up of the familiar syntax of unpacking the <tt>@_</tt> list into variables. In some ways the following two functions could be considered identical:</p>
<pre>sub add_a
{
my ($x, $y) = @_;
return $x + $y;
}
sub add_b($x, $y)
{
return $x + $y;
}</pre>
<p>This does however brush over a few more subtle details of signatures. Firstly, signatures are more strict on the number of values they receive vs. how many they were expecting. While this is a useful feature, it seems odd that Perl now lacks any syntax for performing a list unpack and checking that it has exactly the right number of elements in any situation other than the arguments from function entry.</p>
<p>For that task, I could imagine an operator maybe spelled <tt>:=</tt> which acts exactly the same as a signature on a function:</p>
<pre>my ($x, $y) := (1, 2);
my ($x, $y) := (1, 2, 3); # complains about too many values
my ($x, $y) := (1); # complains about not enough values</pre>
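<p>Lacking such an operator, the counting part can be imitated today with a small helper function. This is merely a sketch; <tt>exactly</tt> is a hypothetical name of my own choosing, and it handles only the element count, none of the other signature features.</p>

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: passes its list through unchanged, but only if it
# contains exactly $count elements.
sub exactly {
    my ($count, @values) = @_;
    my $got = @values;
    croak "Too many values (expected $count, got $got)"   if $got > $count;
    croak "Not enough values (expected $count, got $got)" if $got < $count;
    return @values;
}

my ($x, $y) = exactly(2, 1, 2);

# Each failing case is wrapped in eval so the example can carry on
my $too_many = eval { my ($p, $q) = exactly(2, 1, 2, 3); 1 } ? "" : $@;
my $too_few  = eval { my ($p, $q) = exactly(2, 1);       1 } ? "" : $@;

print "x=$x y=$y\n";
print $too_many, $too_few;
```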
<p>Of course, there's more to signatures than simply counting the elements. Signatures permit a default value to be used if the caller did not specify it; we could allow that too:</p>
<pre>my ($one, $two, $three = 3) := (1, 2);</pre>
<p>If signatures gain features like type assertions then it seems natural to apply them to the signature-like list assignment operator as well, allowing that to check also:</p>
<pre>my ($item isa Item, $group isa Group) := @itemgroup;</pre>
<p>If key/value unpacking of named arguments arrives then that too would be useful for unpacking a hash:</p>
<pre>my (:$height, :$width) := %params;</pre>
<h2>Twigils</h2>
<p>The slot variables introduced by <a href="https://metacpan.org/pod/Object::Pad"><tt>Object::Pad</tt></a> are written the same as regular lexical variables. I have for a while wished them to be distinct from regular lexicals, so they stand out better visually. The <tt>$:</tt> syntax can easily be made available, allowing them to be written with that instead:</p>
<pre>class Point
{
has $:x = 0;
has $:y = 0;
method describe($name) {
say "Hello $name, this point is at ($:x, $:y)";
}
}</pre>
<p>I accept this is a much more subjective idea than most of the other features. Personally I find it helps to visually distinguish object slots, now that they don't have such notation as <tt>$self->{...}</tt> to remind you.</p>
<h2>True Core Implementations</h2>
<p>As earlier mentioned, some of these ideas can be implemented as CPAN modules (those introduced by new keywords), but others (such as the <tt>:=</tt> operator) would require core Perl support. It would also be nice to see some of the more established and stable CPAN keyword modules implemented in core Perl as true syntax as well.</p>
<p>It would be great if, in 2025, we could simply</p>
<pre>use v5.40; # or maybe it will be use v7.x by then
try { ... }
catch ($e) { ... }
class Calculator {
method add($x, $y) { ... }
}</pre>
<p>Having these available to the core language would hopefully mean that a lot more code would more quickly adopt them as features. While these things are all available as CPAN modules, and work even on historic Perl versions as far back as 5.16 from 2012, it seems that some people don't want to make use of such syntax features unless they are provided by the core language itself. Moving the implementation into core may help for other reasons too, such as efficiency of operation, or allowing them to provide yet more abilities not available to them while they are third-party modules.</p>
<p>All in all, it's something we can hope for over the next five years...</p>
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-24.html">< Prev</a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com4tag:blogger.com,1999:blog-9112560338291574360.post-2009129845570433502020-12-24T12:00:00.102+00:002020-12-24T12:00:02.510+00:002020 Perl Advent Calendar - Day 24<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-23.html">< Prev</a>
<p>Over the course of this blog post series we have seen a number of syntax-providing modules from CPAN. Each of them sets out to neaten up some specific kind of structure often found in Perl code.</p>
<ul>
<li><a href="https://metacpan.org/pod/Future::AsyncAwait"><tt>Future::AsyncAwait</tt></a> aims to neaten up asynchronous code flow, replacing older techniques like <tt>->then</tt> method chaining and helper functions from <tt>Future::Utils</tt> with regular Perl syntax.</li>
<li><a href="https://metacpan.org/pod/Syntax::Keyword::Try"><tt>Syntax::Keyword::Try</tt></a> brings the familiar <tt>try/catch</tt> pattern for handling exceptions, replacing more manual techniques involving <tt>eval {}</tt> blocks and inspecting the <tt>$@</tt> variable.</li>
<li><a href="https://metacpan.org/pod/Object::Pad"><tt>Object::Pad</tt></a> provides an entire set of syntax keywords for managing classes of objects, allowing stateful object-oriented code to be neatly written without the risk of things like hash key collisions on <tt>$self->{...}</tt>.</li>
</ul>
<p>Each one of these allows writing shorter, neater code that has less "machinery noise". With fewer distractions in the code it becomes clearer to see the detail of the specific situation the code is for. With less code to write there's less opportunity to introduce bugs.</p>
<p>Moreover we have seen that these syntax modules can be combined together, used in conjunction to allow even greater benefits. We saw on <a href="/2020/12/2020-perl-advent-calendar-day-4.html">day 4</a> that <tt>try/catch</tt> control flow works within <tt>async sub</tt>, on <a href="/2020/12/2020-perl-advent-calendar-day-22.html">day 22</a> that object methods can be marked as asynchronous with <tt>async method</tt>, and on <a href="/2020/12/2020-perl-advent-calendar-day-23.html">day 23</a> we explored how the <tt>dynamically</tt> assignment syntax can be combined with objects, asynchronous functions, and even both at the same time.</p>
<p>The various code examples we've seen over the past 22 days or so have been written using these syntax modules, and also make use of Perl's <tt>signatures</tt> feature, and other things where possible, all to help in this regard. The shorter neatness that comes from not needing to write the line (or two) of code to unpack the function's arguments from <tt>@_</tt> (and maybe the <tt>$self</tt> method invocant as well) removes yet another distraction and potential source of errors.</p>
<p>In summary: This series has been about what it feels like to write Perl code in the year 2020 - it has been about 2020 Perl. This is a language just as flexible and adaptable as Perl has ever been, yet still capable of any of the modern techniques common to other languages, which perhaps even the Perl of five or ten years ago was lacking in - neat function arguments, asynchronous control, exception handling, and syntax for object orientation. With all these new abilities, 2020 has been a great year for writing Perl code.</p>
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-23.html">< Prev</a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-1609227857371107362020-12-23T12:00:00.005+00:002020-12-24T12:06:39.375+00:002020 Perl Advent Calendar - Day 23<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-22.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-24.html">Next ></a>
<p>For today's article, I'd like to take a look at yet another of my syntax-providing CPAN modules, <a href="https://metacpan.org/pod/Syntax::Keyword::Dynamically"><tt>Syntax::Keyword::Dynamically</tt></a>. This provides a single new keyword, <a href="https://metacpan.org/pod/Syntax::Keyword::Dynamically#dynamically"><tt>dynamically</tt></a>. To quote its documentation:</p>
<blockquote>Syntactically and semantically it is similar to the built-in perl keyword <tt>local</tt>, but is implemented somewhat differently to give two key advantages over regular <tt>local</tt>:
<ul>
<li>You can <tt>dynamically</tt> assign to lvalue functions and accessors.</li>
<li>You can <tt>dynamically</tt> assign to regular lexical variables.</li>
</ul>
</blockquote>
<p>This is important to us when working with <tt>Object::Pad</tt> because of the way slot variables work. Within a <tt>method</tt> body a slot looks like a regular lexical variable. This means that Perl's regular <tt>local</tt> keyword refuses to interact with one. If we want to assign a new value temporarily, only for the duration of one block of code and have it restored automatically afterwards, we must use <tt>dynamically</tt> instead.</p>
<p>For example, both <tt>Syntax::Keyword::Dynamically</tt> and <tt>Object::Pad</tt> contain a copy of a unit test which asserts that their interaction works as expected:</p>
<pre>has $value = 1;
method value { $value }
method test
{
is $self->value, 1, 'value is 1 initially';
{
dynamically $value = 2;
is $self->value, 2, 'value is 2';
}
is $self->value, 1, 'value is 1 finally';
}</pre>
<p>If instead we were to try this using core Perl's <tt>local</tt> it fails to compile:</p>
<pre>...
{
local $value = 2;
...</pre>
<pre>$ perl -c example.pl
Can't localize lexical variable $value at ...</pre>
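<p>To see the difference using nothing but core Perl, compare <tt>local</tt> on a package variable with a hand-rolled scope guard on a lexical. The <tt>RestoreGuard</tt> class below is a hypothetical helper written just for this illustration. It imitates only the visible effect of a temporary assignment; it is not how <tt>Syntax::Keyword::Dynamically</tt> is implemented.</p>

```perl
use strict;
use warnings;

my @seen;

# A package variable can be localised with core Perl's local
our $pkg_value = 1;
sub pkg_value { $pkg_value }
{
    local $pkg_value = 2;
    push @seen, pkg_value();    # 2 inside the block
}
push @seen, pkg_value();        # restored to 1 afterwards

# Hypothetical guard: assigns a new value through a reference, and puts
# the old value back when the guard object is destroyed at scope exit.
package RestoreGuard {
    sub new {
        my ($class, $varref, $newvalue) = @_;
        my $self = bless { ref => $varref, old => $$varref }, $class;
        $$varref = $newvalue;
        return $self;
    }
    sub DESTROY {
        my $self = shift;
        ${ $self->{ref} } = $self->{old};
    }
}

# A lexical cannot be given to local, but the guard works on it
my $lex_value = 1;
sub lex_value { $lex_value }
{
    my $guard = RestoreGuard->new(\$lex_value, 2);
    push @seen, lex_value();    # 2 inside the block
}
push @seen, lex_value();        # restored to 1 afterwards

print "@seen\n";
```

<p>The guard has to be stored in a variable for the duration of the block, which is exactly the kind of machinery noise that a dedicated keyword avoids.</p>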
<p>When a variable is dynamically assigned a new value inside an asynchronous function it has to be swapped back to its original value while that function is suspended, and its new value put back when the function resumes. This may have to happen several times before the function eventually returns. The way that <tt>dynamically</tt> is implemented means it is supported by <tt>Future::AsyncAwait</tt> and can detect the times it needs to swap values back and forth.</p>
<p>There is also a unit test which checks this interaction in both <tt>Syntax::Keyword::Dynamically</tt> and <tt>Future::AsyncAwait</tt>:</p>
<pre>my $var = 1;
async sub with_dynamically
{
my $f = shift;
dynamically $var = 2;
is $var, 2, '$var is 2 before await';
await $f;
is $var, 2, '$var is 2 after await';
}
my $f1 = Future->new;
my $fret = with_dynamically( $f1 );
is $var, 1, '$var is 1 while suspended';
$f1->done;
is $var, 1, '$var is 1 after finish';</pre>
<p>Given these three modules are now known to be working nicely in each of the three pairwise combinations, you might wonder if all three can be combined at once - can you <tt>dynamically</tt> change the value of an object slot during an <tt>async method</tt>? The answer is still yes.</p>
<p>All three of these module distributions contain a copy of a unit test which checks this behaviour:</p>
<pre>
class Logger {
has $_level = 1;
method level { $_level }
async method verbosely {
my ( $code ) = @_;
dynamically $_level = $_level + 1;
is $self->level, 2, 'level is 2 before code';
await $code->();
is $self->level, 2, 'level is 2 after code';
}
}
my $logger = Logger->new;
my $f1 = Future->new;
my $fret = $logger->verbosely(async sub {
is $logger->level, 2, 'level is 2 before await';
await $f1;
is $logger->level, 2, 'level is 2 after await';
});
is $logger->level, 1, 'level is 1 outside';
$f1->done;
is $logger->level, 1, 'level is 1 finally';</pre>
<p>Each of these syntax modules has provided something useful on its own, but as we have seen both yesterday and today they can be combined with each other to provide even more useful behaviours. It is easily possible to create CPAN modules that operate together to extend the Perl language with new syntax and semantics, and have those extensions work and feel every bit as convenient and powerful as all of the native syntax built into the language.</p>
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-22.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-24.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-69763702203447621172020-12-22T12:00:00.096+00:002020-12-23T12:00:16.464+00:002020 Perl Advent Calendar - Day 22<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-21.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-23.html">Next ></a>
<p>We started off this advent calendar series looking at the <tt>async/await</tt> syntax provided by <a href="https://metacpan.org/pod/Future::AsyncAwait"><tt>Future::AsyncAwait</tt></a>, and the way that functions can be marked as <tt>async</tt>. More recently we have been looking at the class and object syntax provided by <a href="https://metacpan.org/pod/Object::Pad"><tt>Object::Pad</tt></a>, such as syntax to provide named methods. Some of you may be wondering whether these two things can be combined; whether methods can be marked as being asynchronous. The answer is yes.</p>
<p>The way that these two modules are implemented means that they can coöperate on how functions are parsed. The end result is that a method can be declared using the combined keywords <tt>async method</tt> and it behaves exactly as expected. Namely, that <tt>$self</tt> and the class's slot variables are available within the code, it returns a future-wrapped value, and permits the <tt>await</tt> keyword.</p>
<p>For example, back on <a href="/2020/12/2020-perl-advent-calendar-day-6.html">day 6</a> we saw an example of <tt>await</tt> with a <tt>//=</tt> shortcircuit expression to optionally wait for a read operation to fill a cache on an object, implemented with a <tt>$self->{...}</tt> key inside <tt>async sub</tt>. At the time I said that the example was slightly reworded from the original code. That is because in reality, the code is implemented using the combination of <tt>async</tt> and <tt>method</tt>:</p>
<pre>use Object::Pad;
use Future::AsyncAwait;
class Device::Chip::TSL256x extends Device::Chip;
...
has $_TIMINGbytes;
async method _cached_read_TIMING ()
{
return $_TIMINGbytes //= await $self->_read(REG_TIMING, 1);
}</pre>
<p>In fact, almost every post after that also had some code taken from modules that are implemented using <tt>async method</tt>. In each case, the real code was in fact shorter and more concise than the posted example because it did not have to start with the <tt>my $self = shift;</tt> line initially, and could use the shorter slot variables instead of hash key accesses on <tt>$self->{...}</tt>.</p>
<p>These two syntax modules - either individually or in combination - are able to greatly neaten a lot of common code patterns. To see just how much they provide, here is how the method above might have been written if neither syntax module was used:</p>
<pre>sub _cached_read_TIMING
{
my $self = shift;
return Future->done($self->{TIMINGbytes})
if defined $self->{TIMINGbytes};
return $self->_read(REG_TIMING, 1)->then(sub {
($self->{TIMINGbytes}) = @_;
return Future->done($self->{TININGbytes});
});
}</pre>
<p>In this version of the code it is far less obvious to see the flow of the logic. The caching behaviour of the <tt>TIMINGbytes</tt> field is harder to see, hidden by the various machinery of the future return value and <tt>->then</tt> chaining. Additionally, the <tt>$self->{TIMINGbytes}</tt> field is referred to four times here - each one being just a hash key, and thus prone to typos. Sure there are techniques to help detect such problems with classical Perl hash-based objects (such as locked hashes), but those all detect runtime attempts to actually touch the fields; none of them are able to point out problems at compiletime.</p>
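<p>As a small illustration of that runtime-only detection, here is a sketch using <tt>lock_keys</tt> from the core <tt>Hash::Util</tt> module. The hash here is a stand-in for an object's fields, made up for the example. The misspelled key compiles without complaint, and is only noticed when the offending line is actually executed.</p>

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

# Stand-in for a hash-based object with one permitted field
my $self = { TIMINGbytes => undef };
lock_keys(%$self);

# Correctly spelled key: allowed
$self->{TIMINGbytes} = "\x02";

# Misspelled key: passes perl -c, but dies when this line runs
my $ok    = eval { $self->{TININGbytes} = "\x02"; 1 };
my $error = $ok ? "" : $@;

print "Runtime error: $error";
```

<p>If the misspelled line sat on a rarely-taken code path, the mistake could stay hidden until that path happened to run.</p>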
<p>Such an error would be detected at compiletime using an <tt>Object::Pad</tt>-based slot variable:</p>
<pre>has $_TIMINGbytes;
async method _cached_read_TIMING {
return $_TININGbytes //= await $self->_read(REG_TIMING, 1);
}</pre>
<pre>$ perl -c example.pl
Global symbol "$_TININGbytes" requires explicit package name
(did you forget to declare "my $_TININGbytes"?) at ...</pre>
<p>By the way, did anyone spot the typo on the long example code above? I didn't, the first time I wrote it... ;)</p>
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-21.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-23.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-50143036078458710122020-12-21T12:00:00.013+00:002020-12-22T12:01:29.055+00:002020 Perl Advent Calendar - Day 21<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-20.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-22.html">Next ></a>
<p>So far we've been looking at features of some syntax modules that are relatively well-established - <tt>Future::AsyncAwait</tt> has a couple of years of production battle-testing against it, and even <tt>Object::Pad</tt>'s basic class features have been found to be quite stable over the past six months or so. For today's article I'd like to take a slightly different direction and take a look at something much newer and still under experimental design.</p>
<p>Some object systems which use inheritance to create derived classes out of base ones (including the base system in Perl itself) support the idea that a given class may have multiple bases. This is called <em>Multiple Inheritance</em>. Initially it may sound like a useful feature to have, but in practice trying to support it makes implementations of object systems more complicated, and can lead to situations where the choice of correct behaviour is non-obvious, or in some cases conflicting with what may seem sensible. Situations get especially complicated if the same partial class appears multiple times in the inheritance hierarchy leading up to a given class.</p>
<p>For this reason most modern object systems, including <tt>Object::Pad</tt>, do not support multiple inheritance, to keep behaviours simpler. In order to try to provide the same useful properties (that of being able to share code from multiple component classes), they provide a somewhat different idea, called <em>roles</em>. A role can be considered similar to a partial class which can be merged into a real class. A role can provide methods, <tt>BUILD</tt> blocks, and slot variables. In many ways a role appears the same as a class, except that instances of it cannot be directly created. To be used as an instance a role must be <em>applied</em> to a class. This has the effect of copying all of the pieces of that role into the target class.</p>
<p>For example, in the <a href="https://metacpan.org/release/Tickit-Widget-Menu"><tt>Tickit-Widget-Menu</tt></a> distribution there are two different classes of object that can appear in a menu - an individual menu item, or a submenu. In order to avoid code duplication by copying parts of the implementation around both classes, the common behaviours are implemented in a role, by using the <a href="https://metacpan.org/pod/Object::Pad#role"><tt>role</tt></a> keyword:</p>
<pre>use Object::Pad 0.33;
role Tickit::Widget::Menu::itembase;
has $_name;
BUILD (%args)
{
$_name = $args{name}
}
...</pre>
<p>To apply this role to both of the required classes each uses the <tt>implements</tt> keyword on its <tt>class</tt> statement to copy the components of that role into the class:</p>
<pre>use Object::Pad 0.33;
class Tickit::Widget::Menu::Item
implements Tickit::Widget::Menu::itembase;
...
class Tickit::Widget::Menu::base
implements Tickit::Widget::Menu::itembase;
...
</pre>
<p>Superficially this might feel like it suffers the same problems as multiple inheritance, but keep in mind that applying a role is basically just a fancy form of copy-pasting the code into the class. There is no runtime lookup of methods or other class items whenever they are accessed. The parts of a role are simply copied individually into the class that applies it. This means that any naming conflicts are detected as errors at compile-time, alerting the programmer to the potential problem:</p>
<pre>use Object::Pad 0.33;
role R
{
method collides() {}
}
class C implements R
{
method collides() {}
}</pre>
<pre>$ perl example.pl
Method 'collides' clashes with the one provided by role R at ...</pre>
<p>A program will only successfully compile if there are no naming collisions. As a result of this, and because the pieces of the role are simply copied into a class, it means that it does not matter in what order individual roles are applied to a class, nor does it matter if the same role is applied multiple times within the hierarchy (e.g. if both a class and its base class tried to apply the same role). The end result is always the same, presuming no conflicts. This compiletime check, and flexibility on ordering and duplicate application, helps to ensure more robust code.</p>
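<p>The "fancy copy-paste" model can be sketched in plain Perl by copying subs between packages. To be clear, this is only a toy under my own assumptions: <tt>apply_role</tt> is a hypothetical helper, it works at runtime rather than compiletime, and it says nothing about how <tt>Object::Pad</tt> really does it. It does show why ordering and repeated application do not matter, while a genuine name clash is rejected.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: copies every sub from $role into $class, refusing
# to replace a different sub that the class already has.
sub apply_role {
    my ($role, $class) = @_;
    no strict 'refs';
    foreach my $name (sort keys %{"${role}::"}) {
        my $code = *{"${role}::$name"}{CODE} or next;
        my $existing = *{"${class}::$name"}{CODE};
        next if $existing && $existing == $code;    # already applied
        die "Method '$name' clashes with the one provided by role $role\n"
            if $existing;
        *{"${class}::$name"} = $code;
    }
    return;
}

package Role::Greeter { sub greet { return "Hello from " . ref($_[0]) } }
package Class::A      { sub new { return bless {}, shift } }
package Class::B      { sub new { return bless {}, shift }
                        sub greet { return "custom" } }

apply_role('Role::Greeter', 'Class::A');
apply_role('Role::Greeter', 'Class::A');    # applying again is harmless

my $greeting = Class::A->new->greet;
my $clash = eval { apply_role('Role::Greeter', 'Class::B'); 1 } ? "" : $@;

print "$greeting\n$clash";
```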
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-20.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-22.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-7523800334506220352020-12-20T12:00:00.105+00:002020-12-21T12:13:04.447+00:002020 Perl Advent Calendar - Day 20<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-19.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-21.html">Next ></a>
<p>We have now seen the way that the <tt>has</tt> keyword creates a new kind of variable, called a slot variable, where object instances can store their state values. All of the code in yesterday's examples creates variables that begin, like a new <tt>my</tt> variable, as the undefined value. Often though with an object instance we want to store some other value initially. For this there are two options available.</p>
<p>In simple cases where slot variables of any new object should start off with the same default value we can use an expression on the <tt>has</tt> statement itself to assign a default value. In these two examples, the slot is initialised from a simple constant.</p>
<pre>class Device::Chip::AD9833 extends Device::Chip;
has $_config = 0;</pre>
<pre>class Tickit::Widget::LinearSplit
extends Tickit::ContainerWidget;
has $_split_fraction = 0.5;</pre>
<p>These are compiletime constants, though any form of expression is allowed here. However, note: much like would apply to a <tt>my</tt> or <tt>our</tt> variable in the scope of an entire package or class, any expression is evaluated just once at the time the class itself is first created. The resulting value is stored as the default for every new instance. This expression is <em>not</em> evaluated for each new instance individually. Thus it is rare in practice to see anything other than a constant here. For example, using an expression that created some new helper object would mean that all new instances of the containing class will share the same reference to the same helper object - unlikely to be what was intended.</p>
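<p>The same sharing effect is easy to reproduce with classical Perl objects, which may help to picture the problem. In this sketch (the class names are invented for the example) one class takes its default from an expression evaluated once, while the other evaluates it afresh in every constructor call.</p>

```perl
use strict;
use warnings;

package Counter::Shared {
    # Evaluated once: every instance receives this same array reference
    my $default_log = [];
    sub new    { my $class = shift; return bless { log => $default_log }, $class }
    sub record { my $self = shift; push @{ $self->{log} }, @_; return $self }
    sub count  { my $self = shift; return scalar @{ $self->{log} } }
}

package Counter::PerInstance {
    our @ISA = ('Counter::Shared');
    # Evaluated on every call: each instance receives its own fresh array
    sub new { my $class = shift; return bless { log => [] }, $class }
}

my ($s1, $s2) = (Counter::Shared->new, Counter::Shared->new);
$s1->record("event");
my $shared_count = $s2->count;      # $s2 sees the event recorded on $s1

my ($p1, $p2) = (Counter::PerInstance->new, Counter::PerInstance->new);
$p1->record("event");
my $separate_count = $p2->count;    # $p2 is unaffected

print "shared: $shared_count, separate: $separate_count\n";
```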
<p>For more complex situations which require code to be evaluated for every new instance of a class we can use a <a href="https://metacpan.org/pod/Object::Pad#BUILD"><tt>BUILD</tt></a> block. This provides a block of code which is run as part of the construction process for every individual instance of the class. For example, this <tt>BUILD</tt> block allows us to create a new mutex helper instance for every instance of the containing class:</p>
<pre>class Device::Chip::LEO1306
extends Device::Chip::Base::RegisteredI2C;
use Future::Mutex;
has $_mutex;
BUILD
{
$_mutex = Future::Mutex->new;
}</pre>
<p>The <tt>BUILD</tt> block is basic syntax, similar to Perl's own <tt>BEGIN</tt> block for instance. People familiar with object systems like <tt>Moo</tt> and <tt>Moose</tt> especially should take note - a <tt>BUILD</tt> block is not a method. It does not take the <tt>sub</tt> or <tt>method</tt> keyword, and it cannot be called like one.</p>
<p>Whenever a new instance is constructed, the <tt>BUILD</tt> block is passed a copy of the argument list given to the constructor. A common task is to set slot variables from those arguments, perhaps applying defaults if values weren't specified. It is also a common style in Perl for constructor arguments to be passed in an even-sized key/value list, so they can be easily unpacked as a hash variable. This makes it simple for <tt>BUILD</tt> blocks to inspect the named keys they're interested in. Despite not being a true method, a <tt>BUILD</tt> block still permits a signature to unpack its arguments as if it were one.</p>
<pre>class Device::Chip::CC1101 extends Device::Chip;
has $_fosc;
has $_poll_interval;
BUILD (%opts)
{
$_fosc = $opts{fosc} // 26E6;
$_poll_interval = $opts{poll_interval} // 0.05;
}</pre>
<p>There is still much ongoing design work here. It turns out in practice that a large majority of the code in <tt>BUILD</tt> blocks is something like this form - a series of lines, each setting a slot variable from one constructor argument.</p>
<p>There may be value in having <tt>Object::Pad</tt> provide a convenient way to let each slot variable declaration specify how it should be initialised from named constructor arguments. This would help keep the code less cluttered by the low-level machinery, and allow additional features such as error checking by rejecting unrecognised key names. This would, however, involve <tt>Object::Pad</tt> specifying that constructor arguments must be in named argument pairs, which it currently does not.</p>
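<p>To make the idea of rejecting unrecognised key names concrete, here is a minimal sketch written as a classical Perl constructor. It is not an <tt>Object::Pad</tt> feature or API; the package <tt>Example::Radio</tt> and the <tt>%DEFAULTS</tt> table are hypothetical, reusing the key names from the example above.</p>

```perl
use strict;
use warnings;

package Example::Radio {
    # Hypothetical table of accepted constructor keys and their defaults.
    my %DEFAULTS = (
        fosc          => 26E6,
        poll_interval => 0.05,
    );

    sub new {
        my ($class, %opts) = @_;

        # Reject any key the class does not know about.
        my @unknown = grep { !exists $DEFAULTS{$_} } sort keys %opts;
        die "Unrecognised constructor argument(s): @unknown\n" if @unknown;

        # Fill each field from its argument, falling back to the default.
        my %self = map { ( $_ => $opts{$_} // $DEFAULTS{$_} ) } keys %DEFAULTS;
        return bless \%self, $class;
    }

    sub poll_interval { $_[0]{poll_interval} }
    sub fosc          { $_[0]{fosc} }
}

my $radio = Example::Radio->new( poll_interval => 0.1 );
print "poll interval: ", $radio->poll_interval, "\n";

# A misspelled key is caught rather than silently ignored.
my $ok = eval { Example::Radio->new( pol_interval => 0.1 ); 1 };
print "error: $@" unless $ok;
```

<p>The point of the sketch is that a single declarative table can drive both the defaults and the validation, which is the kind of machinery a per-slot declaration could hide.</p>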
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-19.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-21.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-91398348872681623532020-12-19T12:00:00.123+00:002021-01-18T19:13:54.087+00:002020 Perl Advent Calendar - Day 19<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-18.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-20.html">Next ></a>
<p>We have already discussed that the most fundamental property of object-oriented programming is the idea that a collection of state can be encapsulated into a single piece, and given behaviours that operate on the state. In yesterday's article we saw how to create new classes of object (with the <tt>class</tt> keyword), and how to add behaviours (with the <tt>method</tt> keyword). Today we'll take a closer look at the other half of this - how to add state.</p>
<p>While the word "method" seems to be fairly well entrenched, various object systems across various languages have a variety of different words to describe the state values stored for each given instance. The word "field" has been used in Perl before, and refers specifically to the now-obsolete <a href="https://metacpan.org/pod/fields"><tt>fields</tt></a> pragma. Sometimes programmers refer to "attributes" of an object, but in Perl this is also an overloaded term referring to the <tt>:named</tt> annotations that can be applied to functions or variables. In <tt>Object::Pad</tt> the per-instance state is stored in variables called "slots".</p>
<p>Within a class, slots are created by the <a href="https://metacpan.org/pod/Object::Pad#has"><tt>has</tt></a> keyword. This looks and feels similar to the <tt>my</tt> and <tt>our</tt> keywords. It introduces a new variable, optionally initialised with the value of an expression. Whereas a <tt>my</tt> or <tt>our</tt> variable is visible to all subsequent code (including nested functions) within its scope, a <tt>has</tt> variable is only visible within functions declared as <tt>method</tt>, because it will be associated with individual instances of the object class.</p>
<p>In this example the slot variables storing the label and click behaviour are available within any method:</p>
<pre>class Tickit::Widget::Button extends Tickit::Widget;
has $_label;
has $_on_click;
method label { return $_label; }
method set_label
{
( $_label) = @_;
$self->redraw;
}
method on_click { return $_on_click; }
method click
{
$_on_click->($self);
}</pre>
<p>In terms of visibility these slot variables behave much like other kinds of lexical variable - namely, they are not visible from outside the source of this particular class. This means that by default any such state variables are private to the class's implementation, inaccessible by other code that uses the class. We can choose to expose certain parts of it via the class's interface by providing these accessor methods, but we are not required to do so.</p>
<p>It is a common style in <tt>Object::Pad</tt>-based code to name the slot variables with a leading underscore, as in this example, as it helps them to stand out visually in larger code. It helps remind people that these are slot variables, because they now lack other visual signalling (such as <tt>$self->{...}</tt>) to otherwise distinguish them.</p>
<p>Another common behaviour is creating accessor methods that simply return the value of a slot, thus deciding to expose that particular variable as part of the object's interface, visible to callers. So common in fact that <tt>Object::Pad</tt> provides a shortcut to create these accessor methods automatically:</p>
<pre>class Device::Chip::SSD1306 extends Device::Chip;
has $_rows :reader;
has $_columns :reader;
# now the class has ->rows and ->columns methods visible</pre>
<p>The <tt>:reader</tt> attribute requests that a simple accessor method is created to return the current value of the slot. It is named the same as the slot, with a leading underscore first removed to account for the common naming convention.</p>
<p>One key advantage that these variable-like slots have over classical Perl objects built on hash keys or data provided by accessor methods is that the names are scoped within just the class body that defines them. Names cannot collide with those defined by subclasses. This is even checked by one of <tt>Object::Pad</tt>'s own unit tests, which defines a base class and a subclass from it that both have a slot called <tt>$data</tt>:</p>
<pre>class Base::Class {
has $data;
method data { $data }
}
class Derived::Class extends Base::Class {
has $data;
method data { $data }
}</pre>
<p>It then has some tests to check that each of these methods behaves differently. In particular, this provides the guarantee that classes can freely add, delete, or rename their own slot variables without risking breaking other related classes. This leads to more robust class definitions.</p>
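<p>For contrast, the collision that slots avoid is easy to provoke with classical hash-based objects. This is a minimal sketch in plain Perl; the <tt>Example::Base</tt> and <tt>Example::Derived</tt> packages are invented for illustration and are not taken from the <tt>Object::Pad</tt> test suite.</p>

```perl
use strict;
use warnings;

package Example::Base {
    sub new {
        my ($class) = @_;
        my $self = bless {}, $class;
        $self->{data} = "base value";
        return $self;
    }

    sub base_data { $_[0]{data} }
}

package Example::Derived {
    our @ISA = ('Example::Base');

    sub new {
        my ($class) = @_;
        my $self = $class->SUPER::new;
        # Unknowingly reuses the hash key already claimed by the base class.
        $self->{data} = "derived value";
        return $self;
    }

    sub derived_data { $_[0]{data} }
}

my $obj = Example::Derived->new;

# The base class's own method now reports the subclass's value.
print $obj->base_data, "\n";
```

<p>Because both classes share one hash, the subclass has silently overwritten the base class's state. With slot variables each class would keep its own separate <tt>$data</tt>.</p>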
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-18.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-20.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-16934491647512050192020-12-18T12:00:00.095+00:002020-12-19T13:14:34.332+00:002020 Perl Advent Calendar - Day 18<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-17.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-19.html">Next ></a>
<p>Yesterday we took our first glance at some example code using <tt>Object::Pad</tt>. Today I'd like to continue with some more in-depth examples showing a few details of the new syntax provided. These will be real examples from actual code on CPAN.</p>
<p>The <a href="https://metacpan.org/pod/Object::Pad#class"><tt>class</tt></a> keyword introduces a new package that will form a class, much like Perl's existing <tt>package</tt> keyword. It creates the new package, much as the <tt>package</tt> statement does, and additionally sets up the various <tt>Object::Pad</tt>-related machinery to have the new package be a proper class. It also makes the other new keywords available - <tt>method</tt> and <tt>has</tt>. As with <tt>package</tt> it supports setting the <tt>$VERSION</tt> of the new package by specifying a version number after the name. It also supports several new sub-keywords to further specify details about the class, such as a base class that it is extending (via the <tt>extends</tt> keyword).</p>
<p>Even though the <tt>class</tt> keyword acts the same as the <tt>package</tt> keyword, it isn't currently recognised by parts of CPAN infrastructure, such as the indexer which creates package-to-file indexes. As such, any module uploaded to CPAN still needs to have a <tt>package</tt> statement as well, to keep these tools happy. It's usual to find them both in combination:</p>
<pre>use Object::Pad;
package Tickit::Widget::HBox 0.49;
class Tickit::Widget::HBox extends Tickit::Widget::LinearBox;
...</pre>
<p>As with <tt>package</tt>, the <tt>class</tt> syntax can be used in either of two forms. It can set the prevailing package name for following declarations if used as a simple statement, or it can take a block of code surrounded by braces, and apply just to the contents of that block. The first form is usually preferred for the toplevel class in a file, with the latter form being seen for internal "helper" classes within a file. For example, the <tt>Device::Chip::NoritakeGU_D</tt> module contains three small internal helper classes defined using a block:</p>
<pre>class Device::Chip::NoritakeGU_D::_Iface::UART {
use constant DEFAULT_BAUDRATE => 38400;
has $_baudrate;
...
}</pre>
<p>The <tt>class</tt> keyword was at least partly designed during the 2019 Perl 5 Hackathon event in Amsterdam, at which there was a similar idea for a <tt>module</tt> keyword. That has yet to be implemented anywhere, but a common theme to both ideas was that they would imply a more modern set of default pragma settings than default Perl begins with. After a <tt>class</tt> statement (or inside its block), the <tt>strict</tt> and <tt>warnings</tt> pragmas are applied, and on versions of Perl new enough to support it, the <tt>signatures</tt> feature is turned on and the <tt>indirect</tt> feature is turned off.</p>
<p>The <a href="https://metacpan.org/pod/Object::Pad#method"><tt>method</tt></a> keyword adds a new function into the class namespace, much like <tt>sub</tt> does. The <tt>$self</tt> invocant parameter is handled automatically within the body of a method, meaning that a parameter signature or <tt>@_</tt> unpacking code does not have to handle it specially. The code can totally ignore this and it will work correctly.</p>
<p>Because the <tt>signatures</tt> feature is automatically enabled on supported Perl versions, it makes method declarations inside classes particularly short and neat. For example, this from <tt>Tickit::Widget::Scroller</tt>:</p>
<pre>method scroll ($delta, %opts)
{
return unless $delta;
my $window = $self->window;
@_items or return;
...
}</pre>
<p>Straight away we haven't needed to write the usual two lines of method setup code, of handling the <tt>$self</tt> variable and then unpacking the other arguments out of <tt>@_</tt>. As we have already seen with the use of <tt>async/await</tt> syntax, this <tt>method</tt> keyword helps reduce a lot of the "noise" of machinery out of the code, and lets us more clearly and easily see the domain-specific details inside it.</p>
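<p>For comparison, this is roughly what the same method opening looks like in classical Perl, with the two setup lines written out by hand. It is only a sketch: the <tt>Example::Scroller</tt> package, its <tt>position</tt> key and the <tt>allow_gap</tt> option are hypothetical stand-ins, not the real <tt>Tickit::Widget::Scroller</tt> code.</p>

```perl
use strict;
use warnings;

package Example::Scroller {
    sub new {
        my ($class) = @_;
        return bless { position => 0 }, $class;
    }

    sub scroll {
        # The two lines of setup that 'method' with a signature removes:
        my $self = shift;
        my ($delta, %opts) = @_;

        return unless $delta;

        $self->{position} += $delta;
        return $self->{position};
    }
}

my $scroller = Example::Scroller->new;
print $scroller->scroll( 3, allow_gap => 1 ), "\n";
```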
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-17.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-19.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0tag:blogger.com,1999:blog-9112560338291574360.post-78682354083422907212020-12-17T12:00:00.003+00:002020-12-18T12:40:16.031+00:002020 Perl Advent Calendar - Day 17<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-16.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-18.html">Next ></a>
<p>For the past 16 days we've been looking at the subject of asynchronous programming, and how using <tt>async/await</tt> syntax as provided by the <tt>Future::AsyncAwait</tt> module leads to code that is much simpler and easier to read, as compared to other ways to achieve similar results. I now want to shift focus entirely, and take a look at an entirely different area - object-oriented programming.</p>
<p>Perl has supported object-oriented programming ever since version 5.000, though people tend to find the built-in mechanisms a little short on features. Over the years various CPAN modules have been created to fill in the missing pieces. Entire articles could be written just listing and comparing them, but <tt>Moo</tt> and <tt>Moose</tt> seem to be among the more commonly-used ones. Many of these systems are written in Perl, and thus to use them code has to be written entirely in existing Perl syntax. Even when some object systems end up being implemented in C for efficiency, they still require Perl syntax to operate them. This often leads to non-ideal behaviour.</p>
<p>Consider the most fundamental property of object systems: the idea that a collection of state can be bundled up into a convenient and encapsulated place, and given behaviours (which we call "methods") that can operate on that state. In classical Perl classes, we usually use a hash reference to store the state. Individual named keys can store fields of this state.</p>
<pre>package Point;
use feature qw( say signatures );
sub new($class)
{
return bless {
x => 0,
y => 0,
}, $class;
}
sub move($self, $dx, $dy)
{
$self->{x} += $dx;
$self->{y} += $dy;
}
sub describe($self)
{
say "A point at ($self->{x}, $self->{y})";
}</pre>
<p>Here we have used the keys "x" and "y" inside this blessed hash reference to store state about the object instance. It's accepted convention that code outside of the object class's implementation should not interfere with these. Still, there is no enforcement of this separation, and no automation of the various parts of code that need to be written for basically any class - namely, things like the <tt>bless</tt> expression, or the <tt>$self</tt> argument of method functions.</p>
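<p>The lack of enforcement is easy to demonstrate. In this small sketch (the <tt>Example::Point</tt> package is a cut-down copy of the class above, renamed for illustration) code outside the class reaches directly into the hash, and Perl raises no objection.</p>

```perl
use strict;
use warnings;

package Example::Point {
    sub new {
        my ($class) = @_;
        return bless { x => 0, y => 0 }, $class;
    }

    sub move {
        my ($self, $dx, $dy) = @_;
        $self->{x} += $dx;
        $self->{y} += $dy;
    }

    sub describe {
        my ($self) = @_;
        return "A point at ($self->{x}, $self->{y})";
    }
}

my $point = Example::Point->new;
$point->move( 1, 2 );

# Nothing stops outside code from reaching past the methods...
$point->{x} = 100;
# ...or from silently creating a key the class never defined.
$point->{z} = "oops";

print $point->describe, "\n";
```

<p>Neither assignment produces an error or a warning, even under <tt>strict</tt> and <tt>warnings</tt>; the separation exists by convention only.</p>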
<p>Object systems such as <tt>Moo</tt> or <tt>Moose</tt> have popularised the idea of a <tt>has</tt> statement, at the class level, which automates some of the work around these kinds of object fields, such as providing instance constructors. But they don't add much overall convenience because they are limited to only working within existing Perl syntax, and that restricts the options available for accessing instance data. The usual style is to make internal state accessible via accessor methods.</p>
<pre>package Point;
use feature qw( say signatures );
use Moo;
has "x", is => "rw", default => 0;
has "y", is => "rw", default => 0;
sub move($self, $dx, $dy)
{
$self->x($self->x + $dx);
$self->y($self->y + $dy);
}
sub describe($self)
{
say "A point at (", $self->x, ", ", $self->y, ")";
}</pre>
<p>This has helped in some ways (e.g. we didn't have to think about providing a constructor this time), but in other ways it feels less of an improvement. Notably, because object fields no longer behave like regular Perl variables (as hash elements do), they can't be mutated by the convenient <tt>+=</tt> operator in the <em>move</em> method, nor interpolated into a string in the <em>describe</em> method. Moreover, there is nothing about this which separates, or even suggests a difference between, the external interface of method calls that users of this class should call to access it, from the internal interface that these methods use to access the state fields directly. Users of this class are not prevented, or even discouraged, from calling <tt>$point->x</tt> on some instance, to either read or even modify a field. This does not encourage data encapsulation.</p>
<p>In an attempt to fix some of these shortcomings, Ovid has been working on a design called <a href="https://github.com/Ovid/Cor/wiki">Cor</a>. Along with this design I have been working on an implementation of it, as the CPAN module <a href="https://metacpan.org/pod/Object::Pad"><tt>Object::Pad</tt></a>.</p>
<p>The aim of this design is to provide new syntax as real keywords, which is therefore able to do things that none of the previous generation of object systems could do. An important feature is the way that instance data is provided.</p>
<pre>use Object::Pad;
class Point;
has $x = 0;
has $y = 0;
method move($dx, $dy)
{
$x += $dx;
$y += $dy;
}
method describe()
{
say "A point at ($x, $y)";
}</pre>
<p>This is close to ideal in terms of code size. We have expressed all the behaviours of the previous two examples, but with a minimum of extra "noise" of exposed machinery. We didn't need to provide a constructor method, or think about a <tt>bless</tt> expression. None of our methods have had to consider a <tt>$self</tt> - either in the list of arguments provided, or in using it to access the instance fields. The fields have been directly accessible as if they were lexical variables.</p>
<p>Over the next several posts, we will continue to explore this syntax module, and see its various features and advantages in more detail.</p>
<a href="/2020/12/2020-perl-advent-calendar-day-1.html"><< First</a> | <a href="/2020/12/2020-perl-advent-calendar-day-16.html">< Prev</a> | <a href="/2020/12/2020-perl-advent-calendar-day-18.html">Next ></a>LeoNerdhttp://www.blogger.com/profile/06161372680495361467noreply@blogger.com0