Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at) / 2018-09-22T07:11:15

Weekly changes in and around Perl 6: 2018.38 Three Versus Six

Published by liztormato on 2018-09-17T21:44:34

Patrick Spek has written a nice blog post about some Hackerrank solutions for Python 3 and Perl 6, which drew quite a few comments on /r/python, /r/programming and /r/perl6. For some people it provided a nice way to show off their versions of the code in question!

A new Oddmuse?

Something yours truly forgot to mention last week. Alex Schroeder has been thinking about re-implementing the Oddmuse wiki using Cro (FaceBook comments). Your views are very welcome!

Thoughts on sigils?

An interesting thread caused by the question: What are your thoughts on sigils? on Reddit. A good read, with even some APL mixed in.

Signatures in Perl 6

Yours truly had the fourth installment of her series on migrating code from Perl 5 to Perl 6 published on Which caused some comments on Reddit: /r/perl and /r/perl6.

The 128-Language Quine Relay

An interesting blog post on of last April actually also contains Perl 6! As ab6tract noticed.

Core Developments

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New modules:

Updated Modules:

Winding Down

Even though this week was a day shorter (on account of last week’s Perl 6 Weekly being a day late), there was plenty to mention and research. As you may have noticed. See you again next week!

Jo Christian Oterhals: Perl 6 small stuff #10: Q: How many seconds is a day? A: 86.400, 82.800 and 90.000

Published by Jo Christian Oterhals on 2018-09-08T12:16:53

Over on Twitter Joelle Maslak (Twitter handle @jmaslak) asked a question about a date and time problem I honestly had never thought about.

Am I missing something? Is there a good, portable date library for #Perl6? I.E. can I write this: How many seconds long is March 11, 2018 in my local time zone (start of daylight savings time in my locale, so it's not 86_400)?

 — @jmaslak

How to calculate the number of seconds in a day on dates starting or ending DST? I’ve always just assumed that 86.400 is the correct answer. It makes sense, doesn’t it? 24 hours * 60 minutes * 60 seconds = 86.400.

Joelle reminded me that these two days a year (at least here in Norway) are either longer or shorter than 86.400 seconds. This year (2018) we switched to DST March 25. So I thought that, surely, that had to be something Perl 6’s DateTime object could figure out? To be honest I thought Joelle had misunderstood something. As it turned out, she hadn’t:

my $d1 = Date.new("2018-03-25").DateTime.posix;
my $d2 = Date.new("2018-03-26").DateTime.posix;
say $d2 - $d1;   # 86400

Stubborn as I am I double checked what’d happen if I checked October 28, 2018 (the day Norway switches from DST to winter time).

my $d1 = Date.new("2018-10-28").DateTime.posix;
my $d2 = Date.new("2018-10-29").DateTime.posix;
say $d2 - $d1;   # 86400

It seems that DateTime ignores daylight savings time here, and counts 1 day from the first to the latter. I guess there are good reasons for this — without having checked the underlying code; perhaps it is based on UTC which is without all that DST nonsense?
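A note on that guess: Date.new(...).DateTime produces a DateTime in UTC, which is why both differences come out as 86400. As a minimal sketch of what the local day looks like, the snippet below builds the two local midnights with explicit UTC offsets. The offsets (+1 hour before the switch, +2 hours after) are assumptions hard-coded for Norway in 2018; core DateTime only knows fixed offsets, not DST rules, so a real program would have to get them from somewhere else.

```perl6
# Assumed offsets for Norway in 2018 (not looked up from any time zone database):
my $cet  = 1 * 3600;   # winter time, UTC+1
my $cest = 2 * 3600;   # summer time (DST), UTC+2

# Local midnight on 25 March is still winter time,
# local midnight on 26 March is already summer time.
my $start = DateTime.new(year => 2018, month => 3, day => 25, timezone => $cet);
my $end   = DateTime.new(year => 2018, month => 3, day => 26, timezone => $cest);

say $end.posix - $start.posix;   # 82800
```

Swapping the two offsets and using 28 and 29 October gives the 25-hour day in the same way.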

No matter what the reason is: Those of us with locales and time zones that suffer DST changes twice yearly, and who experience that days sometimes are longer, sometimes shorter, know that the reality feels and is different. What to do?

Well, sometimes it helps to have lived for a while, because I vaguely remembered that Perl 5’s localtime function returned an array with an element that flagged whether that particular time was DST or not (the Perl 5 localtime documentation calls this flag $isdst). Alas, it seemed that Perl 6 didn’t have a similar built-in.

Researching this I found Liz’s module Time::localtime. It basically ports Perl 5’s localtime functions to Perl 6 [1]. Using zef to install it, I had what it took to write a code snippet that answered Joelle’s original question:

#!/usr/bin/env perl6
use Time::localtime;
sub seconds-in-a-day($start-date) {
    my $d1 = Date.new($start-date).DateTime.posix;
    my $d2 = Date.new($start-date).DateTime.later(days => 1).posix;
    my $l1 = localtime($d1);
    my $l2 = localtime($d2);
    return ($l2.isdst ?? $d2 - 3600 !! $d2)
         - ($l1.isdst ?? $d1 - 3600 !! $d1);
}
say seconds-in-a-day("2018-03-25");
say seconds-in-a-day("2018-05-17");
say seconds-in-a-day("2018-10-28");

So now we’ve solved Joelle’s problem. The number of seconds in a day is 86.400 and 82.800 and 90.000. Case closed…

…or is it? Since one day is 82.800, another is 90.000 and the rest are 86.400, a year surely consists of 365 * 86.400 = 31.536.000? Well, yes, unless you consider leap years, which consist of 366 days.

Luckily, DateTime handles leap years perfectly and reports that the year 2016 consists of 31.622.400 seconds. Great. That was what I expected. Case closed…
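The check can be sketched with the same Date-to-posix approach used earlier. The loop and the variable names are only illustrative, not the code that was actually run for this post.

```perl6
# Seconds between 1 January of a year and 1 January of the next one,
# measured in POSIX time (which knows about leap days but not leap seconds).
for 2016, 2018 -> $year {
    my $start = Date.new("{$year}-01-01").DateTime.posix;
    my $end   = Date.new("{$year + 1}-01-01").DateTime.posix;
    say "{$year}: { $end - $start }";
}
# 2016: 31622400
# 2018: 31536000
```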

…or is it? No, because 2016 also had a leap second, so the answer should really have been 31.622.401 seconds. If I’m honest we’ve now entered the realm of obscure details. But having fixed the 86.400 DST problem, it’s infuriating to learn that a day really isn’t 86.400 seconds long either. Rather it’s 86.400 and a few milliseconds; every year the day is slowing a little more as well.

This phenomenon is called solar drift. From time to time it’s decided that we need a leap second to compensate for the drift. A bureau oversees this, and the rule of thumb is that every time the drift approaches or exceeds 0.6 seconds compared to how we count, they insert a leap second (the goal is to keep the difference less than 0.9 seconds at all times). The leap second comes at irregular times and as such is probably difficult to take into account when programming date calculations. For these reasons I guess we shouldn’t expect core Perl 6 to handle it.

Anyway, we’re lucky that Joelle didn’t ask about these things as well, because that would have become a very different article :-)

Update: Read Brad Gilbert’s excellent comment below. Perl 6 does take leap seconds into consideration, provided that the programmer knows how to ask for it. I sure didn’t.
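A minimal sketch of the difference, assuming the way to ask is to subtract Instant values rather than POSIX timestamps (the comment itself is not reproduced here, so the exact code is an assumption):

```perl6
my $start = DateTime.new("2016-01-01T00:00:00Z");
my $end   = DateTime.new("2017-01-01T00:00:00Z");

# POSIX time pretends every day has exactly 86400 seconds.
say $end.posix - $start.posix;              # 31622400

# Instant also counts the leap second inserted at the end of 2016.
say ($end.Instant - $start.Instant).Int;    # 31622401
```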


[1] There’s another module that implements a version of is-dst as well, DateTime::DST. For all I know it works as well as Time::localtime, but I couldn’t get it to install on MacOS. My guess is that I (or MacOS in general) lack some kind of OS level shared library for this to work. Luckily Time::localtime did the job for me.

Jo Christian Oterhals: Great response as always, Brad.

Published by Jo Christian Oterhals on 2018-09-12T08:51:13

Great response as always, Brad. I should have consulted the documentation and tried to understand local myself. Now I see a vague reference to leap seconds in the brief description of in-timezone, but I’m not sure I would have understood it if you hadn’t pointed it out to me.

And I wouldn’t have thought of subtracting two DateTime objects to get the difference in seconds. The documentation doesn’t indicate this in a way that was understandable to me, as the description of output was entirely different.

From the documentation:

say perl -;           # OUTPUT: «␤»

I’m impressed by the work that’s gone into the documentation, but I have to admit that it’s often surprisingly hard to understand, in particular when it comes to the finer points such as this. So what I’m perhaps really wondering is where you guys pick up all of this :-)

As for the main subject of the article, I’m not able to see that your solution changes anything with regards to day length. I still get 86400:

$ perl6 -e 'my $a = Date.new("2018-03-25").DateTime.local; my $b = Date.new("2018-03-26").DateTime.local; say $b - $a; say $a; say $b;'

Again — it’s probably something I haven’t picked up on.

As for your answer I am, as always, very grateful that you took the time to share. I do, as I usually say, learn something new each time I blog about these things :-)

Weekly changes in and around Perl 6: 2018.37 A DEtour of Damian

Published by liztormato on 2018-09-11T12:19:11

Out of the blue, or so it seemed, Damian Conway has appeared to do a Tour of Germany, giving presentations in Frankfurt, Erlangen, Dresden and Berlin. The last day will be a full-day Advanced Technical Presentation class (thanks to Strato AG). Entrance is always free, thanks to sponsoring by the Frankfurt Perl Mongers (FaceBook comments).

Linux packages updated

Thanks to the tireless work of Claudio Ramirez, the Linux packages for the Rakudo 2018.08 Compiler Release have been updated. They now also include support for Alpine 3.8 and openSUSE 15.0. Check them out at

Improved hygiene in JIT templating

After last week’s Rakudo Compiler release, Bart Wiegmans has merged a branch that provided many updates to the JIT expression template compiler in MoarVM. This should make it easier and less error-prone to write JIT expression templates, as he explains in a blog post.

What I did not steal from Perl 6

Ilya Sher, the author of the Next Generation Shell describes in his blog post things that he would (like to) steal from Perl 6, or not (Reddit comments).

Math::Matrix Introduction

Herbert Breunung introduces the Math::Matrix module in the first blog post of a series (Reddit comments).

Tailgrepping Spinners

Brian Matatu has written a nice blog post about grepping the output of a tail -f (aka looking for things from a process that writes lines to standard output as they are written) (Reddit comments). Then Ralph Mellor took the idea to ask a question about the readability of the script. One interesting quote from the responses:

The start react ... whenever { } stuff looks interesting enough for me to stick this on my Big Pile of Stuff to Look intho, though.

Twisting the Rationals

Donaldh dug deeper into the performance of rational numbers (aka Rats) in a blog post titled A Twist To The Rational Story. It also showed a nice way to use a combination of [+] and race that yours truly hadn’t thought of yet!

Spotlight on Timo Paulssen

Timo Paulssen responded to a question on the new Curious Cat platform on how he got involved with Perl 6. He further described how he made videos of cellular automata in Perl 6. Interesting stuff!

Jo Christian Oterhals at it again!

The past week did not see 1, not 2 but 3 blog posts by Jo Christian Oterhals:

All very interesting reads!

I Am Sparrowdo

Alexey Melezhik has written an introduction on how to use Sparrowdo on Windows. Cool to see a nice tool getting a proper introduction!

Swiss Perl Workshop

The past weekend also saw the Swiss Perl Workshop 2018. Separate videos of each presentation are not available yet, but since everything was live streamed, there are archives available of the raw streams: Day 1 and Day 2. Kudos to the SPW organizers and to Lee Johnson for making all of the streaming and recording seem so easy.

Core Developments

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

So. Much. Happening. And not enough time. Hopefully next week’s Perl 6 Weekly will be more in time than this one. See you then!

Jo Christian Oterhals: Thanks Brent!

Published by Jo Christian Oterhals on 2018-09-08T09:30:01

Thanks Brent! I hadn’t thought of this at all. Now that you’ve pointed this out to me, I feel that your version is the best illustration yet as to why learning 6 is to forget 5. BTW, I haven’t ventured into parallelizing at all, so your comment gave me the reasons I need.

However, I did some testing using your suggestions and experience the opposite of you. Here’s the code:

my $start = now;
say "^yl: " ~ @array.grep(*.starts-with("yl")).elems;
say "Ordinary Grep duration: " ~ now - $start;
$start = now;
say "^yl: " ~ @array.hyper.grep(*.starts-with("yl")).elems;
say "Hyper Grep duration: " ~ now - $start;
$start = now;
say "^yl: " ~ @array.race.grep(*.starts-with("yl")).elems;
say "race Grep duration: " ~ now - $start;

What’s interesting is that the ordinary grep is the fastest in my testing:

^yl: 1563
Ordinary Grep duration: 1.4747359
^yl: 1563
Hyper Grep duration: 10.06229302
^yl: 1563
race Grep duration: 5.56080517

As I’ve noted quite a few times in these blog posts, my machine is a relatively old MacBook Pro (a 2013 version with a two-core i7 processor). Could it be that my machine is not quite up to the task of parallelization, and that attempting to parallelize incurs a speed penalty instead?

Jo Christian Oterhals: Another great one, Timo.

Published by Jo Christian Oterhals on 2018-09-07T18:54:53

Another great one, Timo. Regarding xx I have just assumed that the left side was evaluated once and repeated the number of times stated on the right side.

I attended a conference yesterday (Nordic Perl Workshop NPW) that had a double session on Perl 6. The speaker noted that Perl 6 was a *big* language. It sure is. It really takes the “there’s more than one way to do it” mantra seriously :-)

Thanks for pointing out the xx usage to me.

Jo Christian Oterhals: Perl 6 small stuff #9: Vantage points and the perception of speed

Published by Jo Christian Oterhals on 2018-09-06T17:57:02

This is not a post complaining about speed. It’s a post about how expectations influence our perception of speed.

Speed isn’t everything. Sometimes clarity and elegance get precedence. I think Perl 6 enables elegance, so much so that I don’t mind that it’s not a speed demon [1]. Perl 5 on the other hand is. And we shouldn’t expect any less of it, since 5 has been trimmed and tinkered with for a quarter of a century now.

Perl 6 is the newcomer, so it’d be unreasonable to expect the same optimisation. But the thing is that guys like me, i.e. people with some Perl 5 experience, are carriers of perl5-isms. Some of those -isms spill over when we write Perl 6 code [2]. Not only do I expect the result to be the same, but if I’m honest also that it’d match some of 5’s speed.

Consider the code below.

# Perl 5
$ time perl -E 'my @a = "a".."z"; my @e = map join("", map $a[rand @a], 1..8), 1..1_000_000; say "P5: $e[0]";'
P5: glgvlxzm
real 0m3.563s
user 0m3.478s
sys 0m0.069s

This snippet generates an array of 1,000,000 eight-character strings (in itself not very interesting, but I’ll use the array later on for something else). On my old-ish MacBook Pro, Perl 5 takes around 3.5 seconds to generate one million strings and populate an array with them.

If I, out of habit, program the same thing in Perl 6, using my Perl 5 thinking, I end up with almost identical code.

$ time perl6 -e 'my @a = "a".."z"; my @e = map { map( { @a.pick }, 1..8).join }, 1..1_000_000; say "P6: @e[0]";'
P6: uywokcsh
real 0m49.320s
user 0m49.029s
sys 0m0.296s

The result’s exactly what I expected; what’s unexpected is the time the program takes to complete [3]. Had Perl 6 been called, say, Camelia or Century or whatever else, I think I wouldn’t have thought about speed in these terms. Sure, I’d notice that it was a little sluggish. I’d maybe compare the results to Python or Julia or Perl 5 and be a little surprised, but I wouldn’t have expected 1:1 similarity speed wise.

Perhaps I’d be more occupied with the ways Perl 6 enables beautiful, readable and concise code. Because those are, to me, Perl 6’s main selling points at the moment. It’s just that as it is now, in the shadow of Perl 5, they’re a little easy to forget.

If you’re interested you can now go on to read the second part: Perl 6 small stuff #9½: Perception of speed — benchmarking grep. In that one I discover that it’s really the small stuff that makes a big difference.

One day later: Read Simon Proctor’s insightful comment below. He pointed me to the Perl 6 solution that is the fastest [5].


[1] Here’s Zoffix Znet’s excellent presentation on how to speed up Perl 6 programs. Watch it. It’s worth spending 27 1/2 minutes on.

[2] Perl 5 has become so fast over time that I get away with lots of sloppy code, something that Perl 6 won’t let me do, yet. So I could flip this on its head and say that Perl 6 teaches me to write better code. The value of that can’t be overestimated.

[3] The code above can become almost 2.5x faster just by un-perl5-ifying it. Replace the my @e part of the code with this:

my @e = ((@a.pick for ^8).join("") for ^1_000_000);

This little change makes the code execute in around 20 seconds. We’re closing in on acceptable territory here, and I’m sure the excellent work done by all the volunteers making Perl 6 will just make it even better over time.

[4] The most elegant way to generate a million element array with arbitrary strings is, in my eyes, the one below. It’s the way I’d do it if I wanted to show off a little, and maybe even if I preferred the most readable code:

my @e = (pick(8, "a".."z").join for ^1_000_000);

But showing off comes with a performance penalty, obviously because "a".."z" is generated a million times here. The execution time (52 seconds) is even slower than the Perl 5-ish Perl 6 code we started with.

The middle ground seems, as always, to be the best.

[5] Simon Proctor made me aware of two things, one I didn’t know and one I had forgotten. The first is that there is a certain speed difference between pick and roll. Since it’s not mandatory for me to have eight distinct letters (repeats are acceptable), I can switch from pick to roll. The one I had forgotten is that the routines roll and pick can take a parameter, i.e. how many to pick or roll. That makes my for ^8 loop redundant. The pick/roll method loops faster. So…

File: optimized-gather.p6
my @alphabet = "a".."z";
my @array = (@alphabet.roll(8).join for ^1_000_000);
say "^yl: " ~ @array.grep(*.starts-with("yl")).elems;
$ time perl6 optimized-gather.p6
^yl: 1452
real 0m8.536s
user 0m8.438s
sys 0m0.254s

This version is 5.8x faster than the one we started with, and just 2.4x slower than the original Perl 5 version. But please note how unlike Perl 5 the end result has become. This goes to show that style and other choices influence Perl 6's speed immensely. See my follow-up article for more on that.
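For readers who haven't met the two routines: the semantic difference between pick and roll mentioned in [5] can be sketched like this. It is a small illustration only and says nothing about timing.

```perl6
my @alphabet = "a".."z";

# pick never returns the same element twice, so it runs out after 26 letters.
say @alphabet.pick(30).elems;                   # 26
say @alphabet.pick(8).unique.elems;             # 8

# roll draws independently each time, so repeats are possible.
say @alphabet.roll(30).elems;                   # 30
say @alphabet.roll(30).unique.elems <= 26;      # True
```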

brrt to the future: Template Compiler Update

Published by Bart Wiegmans on 2018-09-04T17:30:00

Hi everybody. After samcv++ released MoarVM again, it was finally time to introduce some updates to the expression template compiler.

As you may or may not know, the expression JIT backend maps MoarVM opcodes to its internal representation via expression templates. At runtime (during the JIT compilation phase), these templates are combined to form an expression tree. The expression template language remains somewhat underdocumented, but quite a few brave developers have still ventured to add templates. With the new template compiler, a few things will change:

(macro: ^foo (,bar)
    (let (($obj ...))
        (add $obj ,bar)))
(template: foobar
    (let (($obj ...))
        (^foo $obj)))

Prior to these patches, this would expand to something like:

(template: foobar
    (let (($obj ...))
        (let (($obj ...))       # name conflict is here
            (add $obj $obj))))  # which $obj?

Although this code would previously fail during compilation, having to deal with such conflicts in remote code segments is annoying. But now named declarations are resolved prior to expanding the inner macro, and this means that the $obj for macro ^foo is resolved to the let: within that macro, and the outer $obj in the template refers to the outer declaration, as if you'd written:

(template: foobar
    (let (($obj1 ...))
        (let (($obj2 ...))
            (add $obj2 $obj1))))

I believe that this makes writing templates safer and simpler, catching more errors at compile time and fewer at runtime. Happy hacking!

Weekly changes in and around Perl 6: 2018.36 Normality Returns

Published by liztormato on 2018-09-03T15:36:30

A little later than originally anticipated, but Rakudo has had another compiler release: 2018.08, thanks to Samantha McVey (MoarVM release) and Aleks-Daniel Jakimenko-Aleksejev. After it turned out impossible to make 2018.07 stable enough for a release, it was decided to scrap that release. The 2018.08 release had a few stability issues as well, but these were all fixed in the past week. This means we’re ready for a new round of optimizations to be tried and tested!

London Perl Workshop on 24 November

Due to a planning conflict, most notably with freenode #live, the London Perl Workshop has been moved from 3 November to 24 November. You can still submit your Perl 6 related presentation!

“Learning Perl 6” Available

brian d foy tells us that Learning Perl 6 is now available as an ebook in various formats. Some reviews are already available. If you contributed to the Kickstarter, you should have been notified by now.

More about “AAA” .. “ABS” Strangeness

Jo Christian Oterhals continued his quest on finding the difference in behaviour of "AAA" .. "ABS" between Perl 5 and Perl 6 (FaceBook comments).

Containers in Perl 6

Elizabeth Mattijsen had part 3 of her series on the differences between Perl 5 and Perl 6 published on (Reddit comments).

Faster FASTA

Timo Paulssen got triggered by a question on StackOverflow, did some research which resulted in an excellent blog post (Reddit comments).

Awesome and Fascinating

Those were the words that Mr. Spaz used to describe a presentation about Perl 6 that JJ Merelo gave at the ZipRecruiter offices to the Los Angeles Perl Mongers: There’

One Year Of Squashathon

Although the past weekend’s documentation Squashathon wasn’t such a big success, it was the 13th Squashathon in a row. Which means we have now over a year of Squashathons! Kudos again to Aleks-Daniel Jakimenko-Aleksejev for setting all of this in motion!

Core Developments

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Perl 6 in comments

Perl Modules

New Modules:

Updated Modules:

Winding Down

It’s good to be back home and be able to write the Perl 6 Weekly in familiar surroundings for a change. Next week’s Perl 6 Weekly will be coming from somewhere near Switzerland, while returning from the Swiss Perl Workshop. See you next week!

my Timotimo \this: Faster FASTA, please

Published by Timo Paulssen on 2018-08-31T16:52:11


The other day on Stack Overflow: User Beuss asked "What is the best way for dealing with very big files?". The question features the task of parsing a FASTA file, which is a common file format in Bioinformatics. The format is extremely simple. It's based on lines, where a line starting with a > gives an identifier which is then followed by any number of lines containing protein sequences or nucleic acid sequences. The simplest you'll see will contain the letters A, C, G, and T, though there are far more letters that have meaning, and extra symbols. The "parsing" part of the question was limited to efficiently splitting the file up in its individual sequences and storing each sequence in a hash keyed on its identifier.

The end of the question asks, specifically, "Did I reach the maximum performances for the current version of Perl6 ?". This immediately caught my eye, of course.

In this post I'd like to take you through my process of figuring out the performance characteristics and potential gains for a program like this.

Let's start with the original code suggested by Beuss and make a few tiny modifications: The original Stack Overflow code was lacking the class seq, but as far as I could tell it needed nothing but two attributes. There was a line that would output every id it came across, but I/O like that is really slow, so I just removed it. I also added very simple timing code around the three stages: Slurping, splitting, and parsing. Here's the result:

my class seq {
  has $.id;
  has $.seq;
}
my class fasta {
  has Str $.file is required;
  has %!seq;

  submethod TWEAK() {
    my $id;
    my $s;
    my $now = now;

    say "Slurping ...";
    my $f = $!file.IO.slurp;
    say "slurped in { now - $now }";
    $now = now;

    say "Splitting file ...";
    my @lines = $f.split(/\n/);
    say "split in { now - $now }";
    $now = now;

    say "Parsing lines ...";
    for @lines -> $line {
      if $line !~~ /^\>/ {
          $s ~= $line;
      }
      else {
        if $id.defined {
          %!seq{$id} = seq.new(id => $id, seq => $s);
        }
        $id = $line;
        $id ~~ s:g/^\>//;
        $s = "";
      }
    }
    %!seq{$id} = seq.new(id => $id, seq => $s);
    say "parsed in { now - $now }";
  }
}

sub MAIN() {
    my $f = fasta.new(file => "genome.fa");
}

And let's generate an example genome.fa file to test it out with. This one-liner will give you a genome.fa file that's 150_000_200 characters long, has 2_025_704 lines in total, 192_893 of which are lines with identifiers, and the remaining 1_832_811 lines are sequence lines with 80 characters each.

perl6 -e 'srand(2); my $f = "genome.fa".IO.open(:w); while $f.tell < 150_000_000 { $f.put(">" ~ flat("A".."Z", "a".."z", "0".."9", "_", "-").roll((5..7).pick).join); $f.put(<A C G T>.roll(80).join()) for ^(3..16).pick }'

This script has not been optimized for performance at all ;)

Okay, we're just about ready to go with this. Let's have a look at how long only the first two stages take by just hitting Ctrl-C after it outputs "Parsing lines ...":

Slurping ...
slurped in 1.4252846
Splitting file ...
split in 30.75685953
Parsing lines ...

Huh. That's pretty darn slow, isn't it? 67k lines split per second? We should really be able to do better than that. Let's zoom in on the slurping and splitting:

say "Slurping ...";
my $f = $!file.IO.slurp;

say "Splitting file ...";
my @lines = $f.split(/\n/);

My experience with Rakudo has taught me many times that currently our regexes are much more expensive than they have to be. Even though this regex is extremely simple, the regex engine is currently an all-or-nothing deal.

Let's use the built-in method lines on Str instead and see how that fares:

Slurping ...
slurped in 1.4593975
Splitting file ...
split in 2.9614959
Parsing lines ...
parsed in 32.9007177

Cool, that's already about 10x the performance for just the splitting! If I had let the program run to completion before, the whole program's run time would have been 50% slurping and splitting and 50% parsing. But if you look at the parsing part, there are two more regexes in there, too:

for @lines -> $line {
  if $line !~~ /^\>/ {
      # ...
  }
  else {
    # ...
    $id ~~ s:g/^\>//;
    $s = "";
  }
}

Can we do the same thing without regex here, too? Sure! $line !~~ /^\>/ is equivalent to the much, much faster not $line.starts-with(">"), and since we already know in that branch of the if statement that the line starts with > we can replace $id ~~ s:g/^\>// with just $id .= substr(1). Let's see what happens to the performance now:

Slurping ...
slurped in 1.463816
Splitting file ...
split in 2.9924887
Parsing lines ...
parsed in 3.8784822

Cool. It's about 8.5x as much speed. In total, it used to take about 1m10s, now it takes 8.6s.
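Putting the two replacements together, the regex-free parsing loop might look roughly like the sketch below. To keep it self-contained it parses a tiny inline string instead of genome.fa, stores plain strings instead of seq objects, and leaves out the timing code; those simplifications are assumptions of this sketch, not part of the original program.

```perl6
# Tiny stand-in for the slurped file (hypothetical data).
my $f = ">id1\nACGT\nTTGA\n>id2\nGGCC\n";

my %seq;
my $id;
my $s;
for $f.lines -> $line {
    if not $line.starts-with(">") {
        # Sequence line: append to the sequence collected so far.
        $s ~= $line;
    }
    else {
        # Identifier line: store the previous record, then start a new one.
        %seq{$id} = $s if $id.defined;
        $id = $line;
        $id .= substr(1);   # drop the leading ">"
        $s = "";
    }
}
%seq{$id} = $s;             # store the last record

say %seq<id1>;   # ACGTTTGA
say %seq<id2>;   # GGCC
```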

Second implementation

Let's switch gears for a bit. Before I came into the discussion, Stack Overflow user Christoph already came up with a good answer. They also immediately had the instinct to cut out the regex/grammar engine to get a speed-up. The first suggested piece of code looks like this:

my %seqs = slurp('genome.fa', :enc<latin1>).split('>')[1..*].map: {
    .[0] => .[1..*].join given .split("\n");
}

It works like this: It splits the whole file by the > character. Now every chunk after the split is a string consisting of the ID line and all FASTA sequence lines that come after it and before the next ID line - except of course if there's a > in the middle of some line. [1]

Since the file itself starts with a > character [2] we have to skip the very first entry, as it would just be an empty string. The code does that with array slice syntax [1..*]. Then in the block it splits the individual strings that each start with the ID line followed by the sequence data into lines.

I like this answer a lot. It's short and sweet, but it's not golfed to the point of being unreadable. Let's see how it performs!

time perl6 christoph-fasta.p6 
38.29user 0.53system 0:38.85elapsed 99%CPU (0avgtext+0avgdata 1040836maxresident)k

Whoops, that's very slow compared to our optimized code from above! Since this program is mostly methods from built-in classes, we'll most likely have to find more efficient versions of what we've got in the code right now.

The slurp and split invocations in the beginning are probably as fast as we can get, but what about using [1..*] to skip the first element?

split returns a Seq object, which is the Perl 6 user's way to work with iterators. One important feature of Seq is that it throws away values after they have been consumed. However, if you use array accesses like [1], [4, 5, 2, 1] it can't do that. The code doesn't know if you're going to have lower indices later in the list, so writing that last example would lead to an error. So it caches the values - literally by calling the cache method on the Seq.
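A small illustration of that behaviour, using a made-up three-element string rather than the genome file:

```perl6
my $seq = "a>b>c".split(">");
say $seq ~~ Seq;   # True

# Indexing makes the Seq cache its values behind the scenes,
# which is why asking for a lower index afterwards still works.
say $seq[2];       # c
say $seq[0];       # a
```

The convenience is what costs time here: every value has to be kept around in case it is asked for again.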

Surely there's a way to skip a single element without having to memoize the resulting list? Turns out that there is: The skip method is one of the few methods on the Seq class itself! Let's go ahead and replace the first [1..*] with a call to skip. Another thing we can do is replace .[0] with .head and the other .[1..*] with a .skip(1) as well. For these to work we'll have to add our own .cache call on the .split, though. Here's the code we end up with:

my %seqs = slurp('genome.fa', :enc<latin-1>).split('>').skip(1).map: {
    .head => .skip(1).join given .split("\n").cache;
}

And here's the run time:

time perl6 christoph-fasta-no-circumfix.p6 
12.18user 0.57system 0:12.79elapsed 99%CPU (0avgtext+0avgdata 1034176maxresident)k

That's already better, but nowhere near where I'd like it to be. However, I couldn't yet come up with a way to make this variant any faster.

Third Implementation

Stack Overflow user Christoph also had a second implementation in their answer. It's based on finding the next interesting character with the .index method on strings. Let's see how it compares! Here's the code in full:

my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my $pos = 0;
loop {
    $pos = $data.index('>', $pos) // last;

    my $ks = $pos + 1;
    my $ke = $data.index("\n", $ks);

    my $ss = $ke + 1;
    my $se = $data.index('>', $ss) // $data.chars;

    my @lines;

    $pos = $ss;
    while $pos < $se {
        my $end = $data.index("\n", $pos);
        @lines.push($data.substr($pos..^$end));
        $pos = $end + 1
    }

    %seqs{$data.substr($ks..^$ke)} = @lines.join;
}

And a first timing run:

time perl6 christoph-two.p6 
15.65user 0.44system 0:16.05elapsed 100%CPU (0avgtext+0avgdata 1011608maxresident)k

Now that doesn't look too bad. It's already faster than the previous implementation's first version, but a bit slower than the final version I presented just above.

Let's grab a profile with the profiler and see what we can find!

Opening up the routines tab and sorting by exclusive time, we immediately see something rather suspicious:

[Screenshot: profiler routines tab, sorted by exclusive time]

Here we can see that the Range construction operator ..^ is responsible for a big chunk of time – almost a third – and the substr method is responsible for almost a quarter. The new method just below that isn't actually interesting, as it's just always called by the ..^ operator, and as such has its time captured in the operator's inclusive time.

So what are we using ..^ for? The code uses the form of substr that passes a Range instead of a start and length argument. If you have a start and an end position like in this case, it's a whole lot nicer to look at. Unfortunately, it seems to suffer from a large amount of overhead.

Let's rewrite the code to use the .substr($start, $amount) form instead. The transformation is very simple:

@lines.push($data.substr($pos..^$end));
# becomes
@lines.push($data.substr($pos, $end - $pos));
# and
%seqs{$data.substr($ks..^$ke)} = @lines.join;
# becomes
%seqs{$data.substr($ks, $ke - $ks)} = @lines.join;
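As a quick sanity check that the two forms select the same characters, here is a sketch on a short made-up string (the variable names follow the code above):

```perl6
my $data = ">id1\nACGT\n";
my $ks = 1;                          # first character after ">"
my $ke = $data.index("\n", $ks);     # position of the newline ending the ID line

say $data.substr($ks..^$ke);         # id1
say $data.substr($ks, $ke - $ks);    # id1
```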

And now we can time the result:

time perl6 christoph-two-no-range.p6 
8.34user 0.44system 0:08.72elapsed 100%CPU (0avgtext+0avgdata 1010172maxresident)k

Great result! We've shaved off around 45% of the run time with just our first find!

What else can we do? Let's see if sprinkling some native types would help gain performance for this script. Let's make all the integer variables be typed int, which is a native 64bit integer instead of the potentially infinitely big Int. We can also turn the @lines array into a native string array, saving us one layer of indirection for every entry we have. Here's the full code:

my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;
loop {
    $pos = $data.index('>', $pos) // last;

    my int $ks = $pos + 1;
    my int $ke = $data.index("\n", $ks);

    my int $ss = $ke + 1;
    my int $se = $data.index('>', $ss) // $data.chars;

    my str @lines;

    $pos = $ss;
    while $pos < $se {
        my int $end = $data.index("\n", $pos);
        @lines.push($data.substr($pos, $end - $pos));
        $pos = $end + 1
    }

    %seqs{$data.substr($ks, $ke - $ks)} = @lines.join;
}

And here's the timing:

time perl6 christoph-two-no-range-native-int.p6 
6.29user 0.36system 0:06.60elapsed 100%CPU (0avgtext+0avgdata 1017040maxresident)k

Oh, huh. That's not quite an amazing improvement, but let's see if we can push it further by turning $data into a native string! Surely that'll give us a little speed-up?

time perl6 christoph-two-no-range-native-int-str.p6 
7.16user 0.36system 0:07.45elapsed 100%CPU (0avgtext+0avgdata 1017076maxresident)k

Isn't that interesting? Turns out that in order to call a method on a native string, rakudo has to create a temporary Str object to "box" the value into something that can have methods and such. That means that every method call on $data will create one shiny Str object for us. That's not quite what we want ☺

Conveniently, there are also sub forms of index and substr. We can either rewrite the method calls to sub calls and move the invocant (this is how we refer to the thing the method is called on) to be the first argument, or we can use the convenient "use sub but with method syntax" feature Perl 6 has. It looks like $data.&substr($ks, $ke - $ks) and all it does is put the invocant as the first argument of the sub and the rest of the arguments follow.

Unfortunately, there aren't actually candidates for these subs that will take native strings and ints, and so we'll end up with the same problem!
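
For readers who haven't seen the .& syntax before, here's a small sketch of the three spellings side by side. The sample string is made up, and this only shows that the results agree. It says nothing about which one is faster:

```perl6
# Made-up sample data; a native str, as in the experiment above.
my str $data = ">chr1\nACGT\n";

my $as-method = $data.substr(1, 4);    # ordinary method call
my $as-sub    = substr($data, 1, 4);   # sub form, invocant as first argument
my $as-amp    = $data.&substr(1, 4);   # sub called with method syntax

say $as-method;   # chr1
say $as-sub;      # chr1
say $as-amp;      # chr1
```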

Eliminating boxed objects, Str, Int, Num, and similar things, is actually on the agenda for MoarVM. Recent improvements to the dynamic specializer "spesh" by jnthn have been laying the foundation on top of which improving this situation should be doable.

Illegal Performance Gains

So is this the most we can get? Not quite. That was actually a pun, because there's a thing called NQP, which stands for "Not Quite Perl". It's both a separate language with much stricter rules that rakudo itself is written in, and the namespace under which most of the low-level operations that the VM knows are available. These ops are not part of the Perl 6 Language Specification, and the rakudo developers do not guarantee that any code you write using NQP ops will continue working on newer versions of rakudo.

What it does allow us to do is find out where the performance ceiling is, roughly. I'll first write the code to use NQP ops, and then I'll explain what I mean by that.

use nqp;

my Mu $seqs := nqp::hash();
my str $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;

my str @lines;

loop {
    $pos = nqp::index($data, '>', $pos);

    last if $pos < 0;

    my int $ks = $pos + 1;
    my int $ke = nqp::index($data, "\n", $ks);

    my int $ss = $ke + 1;
    my int $se = nqp::index($data ,'>', $ss);

    if $se < 0 {
        $se = nqp::chars($data);
    }

    $pos = $ss;
    my int $end;

    while $pos < $se {
        $end = nqp::index($data, "\n", $pos);
        nqp::push_s(@lines, nqp::substr($data, $pos, $end - $pos));
        $pos = $end + 1
    }

    nqp::bindkey($seqs, nqp::substr($data, $ks, $ke - $ks), nqp::join("", @lines));
    nqp::setelems(@lines, 0);
}

Let's go through it piece by piece. The first line is new, it's use nqp;. It's a synonym for use MONKEY-GUTS; which is a bold declaration meaning in essence "I know what I'm doing, and I deserve whatever I've got coming to me".

We'll use a low-level hash object taken from the nqp world by binding to a scalar variable. We use the type constraint Mu here, because nqp types aren't part of the Perl 6 type hierarchy, and thus will not go through a type check for Any, which is the default type constraint for scalar variables. Also, it does not do the Associative role, which is why we can't bind it to a variable with a % sigil.

Next, we'll pull out the @lines array, so that we don't have to allocate a new one for every round through the loop. We don't have to use nqp::list_s() here like with the hash, because the native string array you get from my str @foo has barely any overhead if we use nqp ops on it rather than methods.

I've removed usage of the // operator, though I am not actually sure how much overhead it has.

The signature of nqp::index is the same as the three-argument sub version of index, and the same is true for the nqp::substr op. There's also an nqp::join op that will only accept a native (or literal) string as first argument and a native string array as the second argument.

You'll also notice that the $end variable is now outside of the inner loop. That has a relatively simple reason: A block that introduces lexical variables cannot be inlined into the outer block. That means that the inner block has to be invoked as a closure, so that it has access to all of the relevant variables. This adds the combined overhead of invoking a code object and of taking a closure. The Garbage Collector has to sweep all of those closures up for us. It'll be best not to generate them in the first place.

We use the nqp op nqp::push_s to add to the @lines array because the regular nqp::push op works with "objects", rather than native strings.

Then there's something that has no corresponding piece of code in the previous version: nqp::setelems(@lines, 0). Since we keep the @lines array around instead of building a new one every time, we have to empty it out. That's what nqp::setelems does, and it's very cheap.
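
Here's the reuse pattern on its own, as a rough sketch with made-up data. It uses only the nqp ops that already appear in the listing above, and like everything involving use nqp it comes with no guarantee of working on future rakudo versions:

```perl6
use nqp;

my str $data = "ACGT\nTTGA\n";   # made-up sample, two lines
my str @lines;                   # allocated once, reused afterwards
my int $pos = 0;
my int $end;

# Collect every line (without its newline) into the native string array.
while $pos < nqp::chars($data) {
    $end = nqp::index($data, "\n", $pos);
    nqp::push_s(@lines, nqp::substr($data, $pos, $end - $pos));
    $pos = $end + 1;
}

my str $joined = nqp::join("", @lines);
say $joined;

# Empty the array so the next round can fill it again.
nqp::setelems(@lines, 0);
say @lines.elems;
```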

A profile of this code tells us that all that's left being allocated is Str objects, exactly 192_899 of them. This comes from the fact that the hash wants to store objects, not native strings.

Let's see what the run time is!

time perl6 christoph-two-no-range-native-nqp.p6 
2.04user 0.33system 0:02.27elapsed 104%CPU (0avgtext+0avgdata 1004752maxresident)k

Whew! Our fastest implementation so far took 6.6s, now we're down to 2.3s, which is close to a third of the time the second-fastest version takes.

What's a "performance ceiling"?

Everything our code does will at some point end up using nqp:: ops to actually do work. Have a look at the substr method of Str, the index method of Str, the push method of strarray, and the ASSIGN-KEY method of Hash, which sits behind postcircumfix:<{ }>. In between our code and these ops there are often multiple layers of methods that make things more comfortable to work with, or check values for validity.

Rakudo's static optimizer already works towards simpler code consisting of fewer calls. For example it would replace a call to infix:<+> with a low-level nqp::add_i op if it knows only native ints are involved. MoarVM's dynamic specializer has a lot more knowledge to work with, as it watches the code during execution, and can speculatively inline sub and method calls. After inlining, more optimizations are available, such as removing boxing that was necessary for passing arguments into, and returning the result out of the routine – this is currently being worked on!

If the MoarVM specializer were flawless, it ought to be able to generate the equivalent of what I coded up by hand. It will not do something as bold as keeping that one array around between rounds and just clearing it at the right moment, as it is currently not able to prove that the array doesn't get stashed away somewhere. But all in all, most of what I did should be achievable with more intelligence in the specializer.

The term "performance ceiling" is still not quite accurate, though. There's still a lot of optimization potential in the JIT compiler, for example. Bart Wiegmans just blogged about a benchmark where recent improvements to MoarVM got us to the point where the code was only 30% slower than an equivalent implementation in C. That was mostly due to the focus of the code being floating point operations, which likely take so long individually that imperfect code-gen is less of a problem.

But this is about the most we can get from the current version of rakudo, unless we find a better algorithm to do what we want.

The Elephant in the RAM

One thing that you've surely noticed is that this program uses about one gigabyte of memory at its highest point (that's what "maxresident" means). Sadly, this is a property of MoarVM's string implementation. In order for grapheme-level access to be fast ("constant time"), we upgrade strings to 32 bits per grapheme if needed, rather than storing strings as utf8 internally. Our strings also support 8 bits per character storage, which I had expected to be used here, but something in the machinery upgrades the string data from 8 bits to 32 bits, even though all character values ought to fit.

In the medium to far future, we'll also get strings that sacrifice constant-time access for storage efficiency, but we're not at that point just yet.

Is there something else we could do to get around this? Sure! Instead of saving the string we cut out of the source file with substr and concatenated with join, we could save the start and end value of every piece of string in a native int array. We could implement a Hash subclass or compose in a role that grabs the data from the source file whenever the user asks for it. Native int arrays are much faster to GC than string arrays, and if you instead hold an index into a single giant int array in the hash, you can reduce the pressure on the GC even further!
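
To make that idea a bit more concrete, here's a rough sketch. The names remember and fetch, the variables, and the sample data are all made up for illustration. A real version would live in a Hash subclass or role as described above, and would compute the offsets with index instead of hard-coding them:

```perl6
# Made-up sample: two sequences, "a" and "b".
my $data = ">a\nAC\nGT\n>b\nTT\n";

my int @spans;   # flat native int array: start0, end0, start1, end1, ...
my %slot;        # sequence name => index of its start offset in @spans

# Record where a sequence lives in $data instead of copying its text.
sub remember($name, $start, $end) {
    %slot{$name} = @spans.elems;
    @spans.push($start);
    @spans.push($end);
}

# Cut the sequence out of $data only when somebody asks for it.
sub fetch($name) {
    my $i = %slot{$name};
    $data.substr(@spans[$i], @spans[$i + 1] - @spans[$i]).subst("\n", "", :g)
}

remember('a', 3, 9);     # "AC\nGT\n" sits at positions 3..8
remember('b', 12, 15);   # "TT\n" sits at positions 12..14

say fetch('a');   # ACGT
say fetch('b');   # TT
```

The price is that every lookup redoes the substr and newline removal, so this trades run time on access for memory and GC pressure.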

That's a task for another post, though, as this one rapidly approaches 4k words.

Yet another task would be to write the same program in perl 5, python, and/or ruby. It should be interesting to compare performance characteristics among those. Surely our fastest code is still slower than at least one of these, but having a target in sight could help figure out which parts exactly are slower than they should be.

I personally don't code any ruby or perl 5, so I'd be happy if someone would contribute those implementations!

Parting QWORDS

Thanks for sticking with me for the duration of this huge post, and thanks to raiph for requesting this article. It may have been a bit more work than I had anticipated, but it was fun, and hopefully it is interesting for you readers!

And here's the QWORDS I promised: 0x7fffffffd950, 0x7ffff3a17c88, and 0x7ffff62dd700.

Have a good time, and see you in the next one!
  - Timo

  1. This ought to be a relatively simple fix. Split by \n> instead of >, and handle the very first blob differently, because it now starts with a > still left in it. ↩︎

  2. We're ignoring the possibility of adding comments to the beginning of the file, or anywhere in the file, really. ↩︎

Weekly changes in and around Perl 6: 2018.35 A Quick One From Kilfenora

Published by liztormato on 2018-08-27T23:20:43

Coming to you while yours truly is enjoying some of the Burren, the Cliffs of Moher and Ireland in general: it’s good to meet up with old friends in Kilfenora.

A Glimpse Of The Future

Bart Wiegmans got inspired by one of Ovid's benchmarks to write a blog post called "A Curious Benchmark". Short-term it shows that all of the optimization work that Jonathan Worthington and Timo Paulssen have done over the past weeks (which will be merged after the 2018.08 Rakudo Compiler Release) already made that benchmark about 3x as fast in Perl 6. But maybe even more interestingly, the blog post shows the potential of Perl 6 becoming about 10x as fast as Perl 5 (for this particular benchmark at least) and being only 30% slower than compiled C code (Reddit comments).

Cro Middleware Tweak

Jonathan Worthington is inviting comments on a proposal to rework the functionality of before and after in Cro middleware.

Upcoming Conferences

Early September will see two Perl conferences, which are sadly overlapping: the Nordic Perl Workshop and Mojoconf and the Swiss Perl Workshop. Both will have some Perl 6 related presentations:

Doomed to Extinction

A poorly informed blog post by Nick Kowalski titled 5 Programming Languages Doomed to Extinction listed both Perl 5 and Perl 6 among them. This spurred quite a lot of comments on Reddit.

LinkedIn Languages Index

Francesco Nidito took a fresh approach on trying to figure out the popularity of programming languages. An interesting approach that did not lump together Perl 5 and Perl 6 for a change. Yours truly must disagree with one of the points, though. Since I find Perl 6 being used by more and more people I do not know, the following statement appears to be in error:

«Perl 6» is the only language that permits you to know easily all the community 😊

How many between “AAA”..”ABS”?

Jo Christian Oterhals wrote a blog post about an intriguing difference between Perl 5 and Perl 6: Perl 6 small stuff #7. Yours truly initially assumed a bug, fixed it, then unfixed it after TimToady showed that the behaviour is intentional (although not tested for and not documented). To be continued…

Free all the butterflies

Nigel Hamilton re-kindled the naming debate with A plan for Perl’s branding – let’s free all the butterflies with quite a few Reddit comments.

The Perl Conference in Glasgow

Patrick Spek has written a nice blog post about his experiences at the Perl Conference in Glasgow (Reddit comments).

Core Developments

Trying to summarize core developments for 1 week has become more and more problematic. Doing 3 weeks, it feels nearly impossible to do justice to all the work many people have done. So if your work is not mentioned here, please take that as shoddy work by yours truly, rather than anything else.

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on StackOverflow

Meanwhile on PerlMonks

Perl 6 in comments

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

The past weeks went by much more quickly than yours truly ever imagined. But on my toes, I will need to be! Aleks-Daniel Jakimenko-Aleksejev found mention of twimbot, that is much like our notable6 bot, but with a twist:

…it either assists the author by making it easier to produce the blog post you’re reading, or it coldly replaces the author…

Hopefully it will not get to that point! Please check in again next week to see whether yours truly really still is writing the Perl 6 Weekly!

brrt to the future: A Curious Benchmark

Published by Bart Wiegmans on 2018-08-25T15:57:00

Hi hackers! I recently saw a curious benchmark passed round on the #moarvm channel. (I understand that this originates from Ovid during his Future of Perl 5 and 6 presentation, but I haven't seen it myself, so if I'm wrong, don't hesitate to correct me). It's curious because it runs on perl5 and perl6, and because the difference is… significant:

my $x = 0;
$x += 1/$_ for 1..50_000_000;
print "$x\n";

Perl 5 (5.24.1), on my laptop (Intel i7-4700HQ CPU @ 2.40GHz, 16GB ram), takes approximately 2.7s to print the value 18.3047492382933. Perl 6, on the same script, takes just shy of 2m18s, or nearly 140s. This is 51 times as much. Perl 5 is not known as a particularly fast language, but in this case perl 6 is really very slow indeed.

Let's try and do a little better. As pointed out by timotimo, the benchmark above is a pessimal case for rational numbers, which perl 6 uses by default. Perl 5 uses floating point throughout. So we can do better by explicitly using floating point calculations in perl6:

# reciprocal.pl6
my $x = 0e0;
$x += 1e0/$_.Num for 1..50_000_000;
say $x;

This takes approximately 30s on my machine, approximately 5 times faster, but still over 11 times slower than perl5. (NB: for all these numbers I'm posting here, I didn't run exhaustive tests and calculate the statistics, but I feel like this is reliable).
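
As a small aside, here is a minimal sketch of the type difference behind that speed-up. It only illustrates which numeric type each expression produces and is not part of the benchmark itself:

```perl6
my $rat = 1/3;      # dividing two integers yields an exact Rat
my $num = 1e0/3;    # a floating point operand makes the result a Num

say $rat.^name;     # Rat
say $num.^name;     # Num

# Rat arithmetic is exact, floating point arithmetic is not.
say 1/3 + 1/6 == 1/2;            # True
say 0.1e0 + 0.2e0 == 0.3e0;      # False
```

The exactness is what makes the Rat version expensive here: every addition has to keep a numerator and denominator up to date instead of doing a single hardware floating point operation.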

We can typically avoid some overhead in Perl 6 if we avoid using scalar containers by means of binding. We can avoid the dynamic lookup of $_ by replacing the for with a while loop. And we can skip the cast from Int to Num by using a Num iterator value. That gives us the following code:

# reciprocal-while.pl6
my $x := 0e0;
my $i := 0e0;
while (($i := $i + 1e0) < 5e7) {
    $x := $x + 1e0/$i;
}
say $x;

This reduces the run time to approximately 26.5s. So instead of well over 11 times slower than perl 5, perl 6 is now a little less than 10 times slower.
I tried using native types, but that increased the run time to up to 36s. Native type performance (except for native typed arrays) has so far not met expectations for perl 6. (I understand that this is due to excessive boxing and unboxing, unfortunately). So it seems that I failed to make a perl6 program that performs comparably to perl 5.

And yet…

With all due respect to the perl 5 (and perl 6) developers, I think MoarVM ought to be able to do better. MoarVM has support for native-typed values and operators, the perl interpreter does not. It has a dynamic specializer and JIT compiles to native code, the perl interpreter does not. I should expect MoarVM to do better on this benchmark. (Otherwise, what have I been doing with my life, the last few years?)

Let's try the secret weapon: NQP. NQP stands for Not Quite Perl and it is the language the rakudo compiler is built in. It acts like a 'bootstrap' language and compatibility layer between the various backends of rakudo perl 6, such as MoarVM, Java and Javascript, and in the near future Truffle. I like to think that it relates to MoarVM in the same way that C relates to contemporary CPUs - it is a sort of low level, high level language. Although NQP fully supports perl6 classes, regular expressions and grammars, it has no support for ranges or C-style for loops, so it does tend to look a bit primitive:

# reciprocal-boxed.nqp
my $x := 0e0;
my $i := 1e0;
while ($i < 5e7) {
    $x := $x + 1e0/$i;
    $i := $i + 1e0;
}

This version uses boxed objects and takes (on my machine) approximately 12.5s. If you recall, perl 5 took 2.7s, so this is just a bit less than 5 times slower than perl 5. Still not very satisfying though.

Let's improve this version a little bit and add native types here. The only difference between this code and the code above it, is that we have explicitly opted to use num values for $x and $i.

# reciprocal-native.nqp
my num $x := 0e0;
my num $i := 1e0;
while ($i < 5e7) {
    $x := $x + 1e0/$i;
    $i := $i + 1e0;
}

This code, with JIT enabled, consistently runs in approximately 0.28s on my machine. That is not a typing error. It prints the correct result. I emphatically want you to try this at home: simply save it as reciprocal.nqp and run time nqp reciprocal.nqp. (With JIT disabled, it runs in 2.4s, which is (finally) a bit faster than perl 5 in the interpreter).

Just out of curiosity, I tried comparing this result with the following C code:
#include <stdio.h>

int main(int argc, char **argv) {
    double x = 0.0;
    double i;
    for (i = 1.0; i < 5e7; i += 1.0)
        x += 1.0/i;
    printf("%f", x);
}

On my machine, this takes approximately 0.22s per run, which means that NQP on MoarVM, using native types, is roughly 1.3x slower than compiled C, excluding the compilation time of the C program. The NQP program does include JIT compilation.

For the record, the C compiled code is much simpler and faster than that generated by the MoarVM JIT - 75 bytes for C 'main' vs 1139 bytes for the NQP 'mainline' (149 bytes for the hot loop). So there is much to improve left, for sure. (EDIT: I originally wrote this in a rather confusing way, but the C code was always the shorter version).

So what does that tell us? Is perl6 50 times slower than perl5, or is NQP just 30% slower than C? There is actually much more to learn from the relative performance of these programs:
After I wrote this (but before publishing), I reran the benchmarks with the new branch postrelease-opts. The results were rather different and encouraging (units in seconds):


Congratulations to jnthn++ and timotimo++ for this fantastic result. As I understand it, we expect to merge this branch after the release of MoarVM 2018.08.

As always, benchmarks should be taken with a grain of salt - results do not always generalize cleanly to production code. But I hope this has given some insights into the performance characteristics of perl 6, and I think it highlights some areas of interest.

PS. The postrelease-opts branch has taken the place of the master branch as the mainline for development, as we aim to get master fit for a release. Personally, I'd rather have master open for development and a release branch for cutting a stable release. I expect most users get MoarVM from a specific revision via MOAR_REVISION anyway, so having master be the 'development' branch should cause little harm.

Weekly changes in and around Perl 6: 2018.34 A Quick One From Tyndrum

Published by liztormato on 2018-08-20T23:21:36

While enjoying some touristic activities in the Northern parts of Scotland (well, North of Glasgow anyway), this Perl 6 Weekly comes from the town of Tyndrum. According to the Wikipedia article:

Thus unusually there are two stations serving the same small village, only a few hundred yards apart, but about 10 miles (16 km) apart by rail.

Which yours truly finds oddly descriptive of the situation in the Perl community: so close, yet seen as so far apart by some.

The Perl Conference in Glasgow

And what an excellent conference it was! And live-streamed as well all over the world. Some blog reports have already surfaced, all of them referencing Perl 6 in some form or another:

By next week, there will be separate videos for each presentation. Until then, you will have to do with the raw recorded live-streams.

Alas, not all things were hunky dory: Mark Keating issued a formal apology for things that had gone wrong (Reddit comments).

Other Blog Posts

Core Developments

Alas, too much R&R to dive into this with the appropriate amount of depth and precision. Hopefully next week a 3-week overview.

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on StackOverflow

Meanwhile on PerlMonks

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

Having done a whole-day workshop + a last-minute presentation has taken a lot of energy from yours truly. Good thing the weather cooperated for a time by being typical late-summer weather for Glasgow. Please check in again next week for more Perl 6 news!

my Timotimo \this: The first public release!

Published by Timo Paulssen on 2018-08-15T14:52:11

Hello esteemed readers, and thank you for checking in on my progress. Not a full month ago I showed off the GC tab in the previous post, titled "Wow check out this garbage". Near the end of that post I wrote this:

I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.

Well, turns out I got a whole lot of work done since then. Not only is the Routine tab pretty, but there's also a Call Graph explorer beside it. This post has the details, and at the end I'll give you the link to the github repository where you can get the code and try it out for yourself!

Routine Tab

Routine Overview

Now, the simplest part of the Routine tab is its overview function. Every routine that has been called while the profiler was recording shows up here. Here's a screenshot so you can understand what I'm referring to:


For comparison, here's the old profiler's Routine Tab


There's actually a lot of differences. Going from the old to the new, the Interp / Spesh / Jit column has disappeared, and with it the OSR badges. Also, the table headers are no longer clickable (to change the sorting order) and there is no filter search field. Both of these features will make a come-back in the new profiler's UI as well, though!

There are also additions, though: There's now a column labelled "Sites", some of the filename + line texts are clickable (they take you directly to github, though opening the files locally in an editor is a feature on my wish-list), and there's a column of mysterious buttons. On top of that, you can now see not only the exclusive / inclusive times for each routine, but also how much time that is when divided by the number of entries.

I wonder what happens when I click one of these!



Neat, clicking the button expands an extra section below the routine. It has three tabs: Callees, Paths, and Allocations. Let's go through them one by one.


Listed here are all routines that got called by the parent routine (in this case ACCEPTS from the Regex source file). They are ordered by inclusive time, as opposed to the outer list which is ordered by exclusive time. [1]

Since there is now a parent/child relationship between the routines, there's also the number of entries per entry in the entries column. That's simply the entries of the child divided by the entries of the parent. This number can tell you how often another routine is mentioned, or what the probability for the child being called by the parent is.

Paths, and what's this about "Sites"?


The next tab you can open in the expanded view is called "Paths". Here you can see a vaguely tree-shaped table with four rows. These rows actually correspond to the Sites column that I have not explained yet. That's because the number of Sites corresponds directly to the number of rows in this table, or the number of leaves in the tree it displays.

The same Routine can behave in different ways depending on where it was called from. Normally, a difference in arguments passed is the main cause of different behaviour, but it's very likely that each unique location in the program will call the Routine the same way every time. Such a "location" is sometimes called a "Call Site", i.e. the site where a call resides. A Site in the profiler's nomenclature refers to one specific path from the outermost routine of the profile to the given Routine. In the screenshot above, all paths to ACCEPTS go through to-json either once, twice, or three times. And every path goes through str-escape.

The names in the table/tree (trable?) are all clickable. I can tell you right now, that they bring you right over to the Call Graph. More on that later, though.



The third tab, Allocations, is a small one, at least for ACCEPTS. It has one row per type, and splits the number of objects created for each type into before spesh and after spesh/jit. Coincidentally, the ACCEPTS method is already a good example for having the split: BOOTHash is just a lower-level hash class used in some internals. Notably, BOOTHash is used to pass named arguments to methods. Of course, many method invocations don't actually pass named arguments at all, and many methods don't care about named arguments either. Thankfully, spesh is competent at spotting this situation and removes all traces of the hash ever existing. The Scalar on the other hand seems to be used, so it stays even after spesh optimized the code.

There's also a Sites column here. This lets you spot cases where one or two sites differ strikingly from the others.

Call Graph Tab


Here's the new Call Graph tab. It's similar to the old one with one major omission. The new version of the call graph explorer currently lacks a flame graph (or icicle graph in this case). It will return later, of course.

Until then, there's a few improvements to be enjoyed. One of them is that the breadcrumbs navigation now works reliably, whereas in the previous profiler it tended to lose elements near the beginning or in the middle sometimes. On top of that, your browser's back and forward buttons will work between nodes in the call graph as well as the different tabs!

Something that's completely new is the Allocations section at the bottom of the page, shown in the screenshot below:


Here you can see that the Routine in question (it's the body of this for loop) allocates Str and Scalar objects. The Str objects seem to be optimized by spesh again.

There's still two unexplained buttons here, though. The one at the bottom is labelled "Load inclusive allocations". Clicking on it reveals a second table of allocations, which is quite a lot bigger:


This view is about everything allocated by anything in the call graph from the current node downwards (towards the leaves, not towards the root. You know, because trees hang from the ceiling, right?). For something so close to the root in a rather deep call graph, you'll get quite a big list, and it's not very helpful in finding where exactly individual types are being allocated.

That's where the second button comes in. It says "Show allocations for all children" on it, and clicking it expands every row in the table of routines:


This way you can drill down from the root towards nodes that interest you.

What's missing?

Here's a little overview of what I'd like to include in the near and medium future:

Where can I get it?

I just uploaded all the code to my github. You can find it here. To use it, you will have to run npm install in the source folder to get all the javascript dependencies, and npm run build to compile all the javascript bundles. It will run a watcher process that will update as soon as any sources change, but if you're not working on the moarperf source code, you can just kill it after it has done its thing once.

Next, you'll have to install the dependencies of the Perl 6 program that acts as the back-end. You can do that with zef --depsonly install in the same folder as the META6.json. Please note that I haven't prepared the app for actually being installed, so don't do that yet :)

You can then start the backend with perl6 -Ilib service.p6 /path/to/profile.sql, where profile.sql is a profile generated with a commandline like perl6 --profile --profile-filename=profile.sql my_script.p6. Passing the filename to service.p6 on the commandline is optional, you can also enter it in the web frontend. By default, it is reachable on http port 20000 on localhost, but you can set MOARPERF_HOST and MOARPERF_PORT in the environment using the env or export commands of your shell.

A word of warning, though: It still has a bunch of rough edges. In some parts, loading data isn't implemented cleanly yet, which can lead to big red blocks with frowny faces. A refresh and/or doing things more slowly can help in that case. There's places where "there is no data" looks a lot like "something broke". Little usability problems all around.

I would appreciate a bit of feedback either on the github issue tracker, on #perl6 on the freenode IRC server, on reddit if someone's posted this article in /r/perl6, or via mail to the perl6-users mailing list or to timo at this website's domain name.

Thanks again to The Perl Foundation for funding my grant, and to you for reading my little report.

Have fun with the program!
  - Timo

  1. The reasoning behind the different default sorting modes is that in the first step you are likely more interested in routines that are expensive by themselves. Sorting by inclusive time just puts the entry frame first, and then gives you the outermost frames that quite often hardly do any work themselves. When you've found a routine that has a lot of exclusive time, the next step is - at least for me - to look at what routines below that take up the most time in total. That's why I prefer the inner routines list to be sorted by inclusive time. Once the user can change sorting in the UI I'll also offer a way to set the selected sorting as the default, I think. ↩︎

  2. The difference between managed and unmanaged bytes is that the managed bytes are what lands in the actual nursery, whereas unmanaged bytes refers to all kinds of extra data allocated for an Object. For example, an Array, Hash, or String would have a memory buffer somewhere in memory, and a header object pointing to that buffer in the nursery or gen2 memory pools. The managed size doesn't change for objects of the same type, which is why it's fine to put them in the allocations tabs. ↩︎

Zoffix Znet: The 100 Day Plan: The Update on Perl 6.d Preparations

Published on 2018-08-09T00:00:00

Info on how 6.d release prep is going

Rakudo Star Release 2018.06

Published on 2018-08-06T00:00:00

Zoffix Znet: Introducing: Perl 6 Marketing Assets Web App

Published on 2018-08-05T00:00:00

Get your Perl 6 flyers and brochures

Zoffix Znet: Introducing: Newcomer Guide to Contributing to Core Perl 6

Published on 2018-08-02T00:00:00

Info on the new guide for newcomers

Zoffix Znet: Newcomer Guide to Contributing to Core Perl 6

Published on 2018-08-02T00:00:00

How to start contributing to Rakudo Perl 6 compiler

Zoffix Znet: Talk Slides and Recording: "Faster Perl 6 Programs"

Published on 2018-07-29T00:00:00

Tips and tricks for better performance in Perl 6

6guts: Redesigning Rakudo’s Scalar

Published by jnthnwrthngtn on 2018-07-26T23:54:45

What’s the most common type your Perl 6 code uses? I’ll bet you that in most programs you write, it’ll be Scalar. That might come as a surprise, because you pretty much never write Scalar in your code. But in:

my $a = 41;
my $b = $a + 1;

Then both $a and $b point to Scalar containers. These in turn hold the Int objects. Contrast it with:

my $a := 42;
my $b := $a + 1;

Where there are no Scalar containers. Assignment in Perl 6 is an operation on a container. Exactly what it does depends on the type of the container. With an Array, for example, it iterates the data source being assigned, and stores each value into the target Array. Assignment is therefore a copying operation, unlike binding which is a referencing operation. Making assignment the shorter thing to type makes it more attractive, and having the more attractive thing decrease the risk of action at a distance is generally a good thing.

Having Scalar be first-class is used in a number of features:

And probably some more that I forgot. It’s powerful. It’s also torture for those of us building Perl 6 implementations and trying to make them run fast. The frustration isn’t so much the immediate cost of allocating all of those Scalar objects – that of course costs something, but modern GC algorithms can throw away short-lived objects pretty quickly – but rather the difficulties it introduces for program analysis.

Despite all the nice SSA-based analysis we do, tracking the contents of Scalar containers is currently beyond that. Rather than any kind of reasoning to prove properties about what a Scalar holds, we instead handle it through statistics, guards, and deoptimization at the point that we fetch a value from a Scalar. This still lets us do quite a lot, but it’s certainly not ideal. Guards are cheap, but not free.

Looking ahead

Over the course of my current grant from The Perl Foundation, I’ve been working out a roadmap for doing better with optimization in the presence of Scalar containers. Their presence is one of the major differences between full Perl 6 and the restricted NQP (Not Quite Perl), and plays a notable part in the performance difference between the two.

I’ve taken the first big step towards improving this situation by significantly re-working the way Scalar containers are handled. I’ll talk about that in this post, but first I’d like to provide an idea of the overall direction.

In the early days of MoarVM, when we didn’t have specialization or compilation to machine code, it made sense to do various bits of special-casing of Scalar. As part of that, we wrote code handling common container operations in C. We’ve by now reached a point where the C code that used to be a nice win is preventing us from performing the analyses we need in order to do better optimizations. At the end of the day, a Scalar container is just a normal object with an attribute $!value that holds its value. Making all operations dealing with Scalar containers really be nothing more than some attribute lookups and binds would allow us to solve the problem in terms of more general analyses, which stand to benefit many other cases where programs use short-lived objects.

The significant new piece of analysis we’ll want to do is escape analysis, which tells us which objects have a lifetime bounded to the current routine. We understand “current routine” to incorporate those that we have inlined.

If we know that an object’s usage lies entirely within the current routine, we can then perform an optimization known as scalar replacement, which funnily enough has nothing much to do with Scalar in the Perl 6 sense, even if it solves the problems we’re aiming to solve with Scalar! The idea is that we allocate a local variable inside of the current frame for each attribute of the object. This means that we can then analyze them like we analyze other local variables, subject them to SSA, and so forth. This for one gets rid of the allocation of the object, but also lets us replace attribute lookups and binds with operations that have one less level of indirection. It will also let us reason about the contents of the once-attributes, so that we can eliminate guards that we previously inserted because we only had statistics, not proofs.
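As a rough illustration of the transformation, here is a minimal sketch written in Python rather than MoarVM bytecode. The Point class and both function names are hypothetical and exist only for this example. The first function allocates a short-lived object that never leaves the routine; the second shows what scalar replacement would conceptually turn it into, with one local variable per attribute.

```python
class Point:
    """A small object with two attributes, standing in for any short-lived object."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def squared_length_with_object(x, y):
    # The Point never escapes: it is not returned, stored, or passed on.
    p = Point(x, y)
    return p.x * p.x + p.y * p.y


def squared_length_scalar_replaced(x, y):
    # After scalar replacement, each attribute becomes a local variable,
    # so there is no allocation and no attribute lookup left.
    p_x = x
    p_y = y
    return p_x * p_x + p_y * p_y


print(squared_length_with_object(3, 4))      # 25
print(squared_length_scalar_replaced(3, 4))  # 25
```

Both versions compute the same result; the second simply has nothing left for an allocator or an attribute lookup to do, which is what makes the locals amenable to the usual analyses.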

So, that’s the direction of travel, but first, Scalar and various operations around it needed to change.

Data structure redesign

Prior to my recent work, a Scalar looked something like:

class Scalar {
    has $!value;        # The value in the Scalar
    has $!descriptor;   # rw-ness, type constraint, name
    has $!whence;       # Auto-vivification closure
}

The $!descriptor held the static information about the Scalar container, so we didn’t have to hold it in every Scalar (we usually have many instances of the same “variable” over a program’s lifetime).

The $!whence was used when we wanted to do some kind of auto-vivification. The closure attached to it was invoked when the Scalar was assigned to, and then cleared afterwards. In an array, for example, the callback would bind the Scalar into the array storage, so that element – if assigned to – would start to exist in the array. There are various other forms of auto-vivification, but they all work in roughly the same way.
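The closure-based scheme can be modelled in a few lines. This is a toy sketch in Python, not the Rakudo implementation: the ToyArray class and the at_pos and assign method names are assumptions made up for the example, and the descriptor slot is left out for brevity.

```python
class Scalar:
    """Toy model of the old Scalar layout, reduced to value and whence."""
    def __init__(self, value=None, whence=None):
        self.value = value
        self.whence = whence  # closure to run on first assignment, or None

    def assign(self, new_value):
        if self.whence is not None:
            self.whence()       # vivify: bind this container into its owner
            self.whence = None  # cleared afterwards, so it runs only once
        self.value = new_value


class ToyArray:
    def __init__(self):
        self.storage = {}

    def at_pos(self, index):
        if index in self.storage:
            return self.storage[index]
        # Element does not exist yet: hand out a detached container whose
        # whence closure will bind it into storage if it is assigned to.
        fresh = Scalar()
        fresh.whence = lambda: self.storage.__setitem__(index, fresh)
        return fresh


arr = ToyArray()
arr.at_pos(5)                # merely looking does not create the element
print(5 in arr.storage)      # False
arr.at_pos(5).assign(42)     # assignment vivifies it
print(arr.storage[5].value)  # 42
```

The closure captures both the array and the fresh container, which hints at why this is awkward for an optimizer: the callee needs an outer scope to point at.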

This works, but closures aren’t so easy for the optimizer to deal with (in short, a closure has to have an outer frame to point to, and so we can’t inline a frame that takes a closure). Probably some day we’ll find a clever solution to that, but since auto-vivification is an internal mechanism, we may as well make it one that we can see a path to making efficient in the near term future.

So, I set about considering alternatives. I realized that I wanted to replace the $!whence closure with some kind of object. Different types of object would do different kinds of vivification. This would work very well with the new spesh plugin mechanism, where we can build up a set of guards on objects. It also will work very well when we get escape analysis in place, since we can then potentially remove those guards after performing scalar replacement. Thus after inlining, we might be able to remove the “what kind of vivification does this assignment cause” checking too.

So this seemed workable, but then I also realized that it would be possible to make Scalar smaller by:

This not only makes Scalar smaller, but it means that we can use a single guard check to indicate the course of action we should take with the container: a normal assignment, or a vivification.

The net result: vivification closures go away giving more possibility to inline, assignment gets easier to specialize, and we get a memory saving on every Scalar container. Nice!

C you later

For this to be really worth it from an optimization perspective, I needed to eliminate various bits of C special-case code around Scalar and replace it with standard MoarVM ops. This implicated:

The first 3 became calls to code registered to perform the operations, using the 6model container API. The second two cases were handled by replacing the calls to C extops with desugars, which is a mechanism that takes something that is used as an nqp::op and rewrites it, as it is compiled, into a more interesting AST, which is then in turn compiled. Happily, this meant I could make all of the changes I needed to without having to go and do a refactor across the CORE.setting. That was nice.

So, now those operations were compiled into bytecode operations instead of ops that were really just calls to C code. Everything was far more explicit. Good! Alas, the downside is that the code we generate gets larger in size.

Optimization with spesh plugins

I talked about specializer plugins in a recent post, where I used them to greatly speed up various forms of method dispatch. However, they are also applicable to optimizing operations on Scalar containers.

The change to decontainerizing return values was especially bad at making the code larger, since it had to do quite a few checks. However, with a spesh plugin, we could just emit a use of the plugin, followed by calling whatever the plugin produces.

Here’s a slightly simplified version of the plugin I wrote, annotated with some comments about what it is doing. The key thing to remember about a spesh plugin is that it is not doing an operation, but rather it’s setting up a set of conditions under which a particular implementation of the operation applies, and then returning that implementation.

nqp::speshreg('perl6', 'decontrv', sub ($rv) {
    # Guard against the type being returned; if it's a Scalar then that
    # is what we guard against here (nqp::what would normally look at
    # the type inside such a container; nqp::what_nd does not do that).
    nqp::speshguardtype($rv, nqp::what_nd($rv));

    # Check if it's an instance of a container.
    if nqp::isconcrete_nd($rv) && nqp::iscont($rv) {
        # Guard that it's concrete, so this plugin result only applies
        # for container instances, not the Scalar type object.

        # If it's a Scalar container then we can optimize further.
        if nqp::eqaddr(nqp::what_nd($rv), Scalar) {
            # Grab the descriptor.
            my $desc := nqp::speshguardgetattr($rv, Scalar, '$!descriptor');
            if nqp::isconcrete($desc) {
                # Has a descriptor, so `rw`. Guard on type of value. If it's
                # Iterable, re-containerize. If not, just decont.
                my $value := nqp::speshguardgetattr($rv, Scalar, '$!value');
                nqp::speshguardtype($value, nqp::what_nd($value));
                return nqp::istype($value, $Iterable) ?? &recont !! &decont;
            }
            else {
                # No descriptor, so it's already readonly. Return as is.
                return &identity;
            }
        }

        # Otherwise, full slow-path decont.
        return &decontrv;
    }
    else {
        # No decontainerization to do, so just produce identity.
        return &identity;
    }
});

Where &identity is the identity function, &decont removes the value from its container, &recont wraps the value in a new container (so an Iterable in a Scalar stays as a single item), and &decontrv is the slow-path for cases that we do not know how to optimize.
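The "set up conditions, then return an implementation" pattern can be modelled outside of MoarVM. The following is a simplified sketch in Python; the CallSite class, the guards list, and the Container type are all hypothetical stand-ins, and real spesh guards are recorded by the VM rather than as Python lambdas.

```python
class Container:
    def __init__(self, value):
        self.value = value


def identity(rv):
    return rv


def decont(rv):
    return rv.value


def decontrv_plugin(rv, guards):
    """Pick an implementation for `rv` and record the conditions it relies on."""
    rv_type = type(rv)
    guards.append(lambda v: type(v) is rv_type)  # plays the role of a type guard
    if isinstance(rv, Container):
        return decont
    return identity


class CallSite:
    """Caches (guards, implementation) pairs produced by the plugin."""
    def __init__(self, plugin):
        self.plugin = plugin
        self.cache = []
        self.plugin_runs = 0

    def __call__(self, rv):
        for guards, impl in self.cache:
            if all(guard(rv) for guard in guards):
                return impl(rv)          # fast path: guards hold
        guards = []
        self.plugin_runs += 1
        impl = self.plugin(rv, guards)   # slow path: ask the plugin
        self.cache.append((guards, impl))
        return impl(rv)


site = CallSite(decontrv_plugin)
print(site(Container(1)), site(Container(2)), site(3))  # 1 2 3
print(site.plugin_runs)                                 # 2
```

The plugin runs once per distinct guard set, not once per call; every later call that satisfies the recorded guards goes straight to the chosen implementation.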

The same principle is also used for assignment, however there are more cases to analyze there. They include:

Vivifying hash assignments are not yet optimized by the spesh plugin, but will be in the near future.

The code selected by the plugin is then executed to perform the operation. In most cases, there will only be a single specialization selected. In that case, the optimizer will inline that specialization result, meaning that the code after optimization is just doing the required set of steps needed to do the work.

Next steps

Most immediately, a change to such a foundational part of the Rakudo Perl 6 implementation has had some fallout. I’m most of the way through dealing with the feedback from toaster (which runs all the ecosystem module tests), being left with a single issue directly related to this work to get to the bottom of. Beyond that, I need to spend some time re-tuning array and hash access to better work with these changes.

Then will come the step that this change was largely in aid of: implementing escape analysis and scalar replacement, which for much Perl 6 code will hopefully give a quite notable performance improvement.

This brings me to the end of my current 200 hours on my Perl 6 Performance and Reliability Grant. Soon I will submit a report to The Perl Foundation, along with an application to continue this work. So, all being well, there will be more to share soon. In the meantime, I’m off to enjoy a week’s much needed vacation.

my Timotimo \this: Wow, check out this garbage

Published by Timo Paulssen on 2018-07-26T16:58:48

Hello everyone! It's been more than a month since the last report, but I've been able to put hours in and get code out. And now I'll show you what's come out of the last weeks.

The Garbage Collector

One important aspect of dynamic languages is that memory is managed for the user. You don't usually see malloc and free in your Perl code. But objects are constantly being created, often "in the background". Asking the user to take care of freeing up space again for objects that may have been created and become obsolete in the same line of code sounds like a good recipe for user dissatisfaction.

To take this potential full-time-job off the user's hands, MoarVM employs Garbage Collection, specifically a scheme based on "reachability". Whenever space needs to be freed up, the process begins. Starting from a set of objects that MoarVM happens to know for sure are currently needed, references from one object to another are followed until everything "reachable" has been reached and "marked" for being kept. Afterwards, everything that has been created but not "marked" will be removed in a step commonly called "sweeping". Garbage Collectors following this scheme are called "Mark & Sweep Garbage Collectors". Find a section about this on the "Tracing Garbage Collection" wikipedia article, though MoarVM's garbage collector has a few additional tweaks for performance reasons.
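The basic scheme fits in a short sketch. This is a textbook mark and sweep pass written in Python for illustration, not MoarVM's collector; the Obj class and the collect function are names made up for the example, and none of MoarVM's performance tweaks are modelled.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def collect(heap, roots):
    """Return the list of surviving objects after one mark & sweep pass."""
    # Mark: follow references from the roots until nothing new is reached.
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)
    # Sweep: keep marked objects, drop the rest, and reset the marks.
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)      # a -> b
c.refs.append(d)      # c -> d, but nothing reachable points at c
d.refs.append(c)      # a cycle does not keep unreachable objects alive
heap = collect([a, b, c, d], roots=[a])
print([obj.name for obj in heap])  # ['a', 'b']
```

Note that c and d reference each other yet are still collected, because reachability is judged from the roots only.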

Naturally, going through a large amount of objects consumes time. Seeing what the program spends its time doing is an important part of making it perform better, of course. That's why the MoarVM profiler records data about the Garbage Collector's activities.

GC in the profiler

The old HTML profiler frontend already showed you how many GC runs happened during your program's run. For each time the GC ran you'll see how long the run took, whether it went through the nursery only (a "minor" collection) or everything on the heap (a "major" collection, called "full" in the tools). I recently added a column displaying when the run started in milliseconds from the start of recording. However, the old profiler frontend didn't do anything with information recorded by threads other than the main thread. Here's a screenshot from the old frontend:

[Screenshot: GC section of the old HTML profiler frontend]

I'm not quite sure why it says it cleared about 4 gigabytes of data during the first two runs, but that'll go on the pile of things to check out later.

For now, I can use this to explain a few things before I go on to show the first version of the new interface and what it does with multithreaded programs.

The profile was taken from a hypered (i.e. automatically multithreaded) implementation of the fannkuch benchmark. The rightmost column shows how much data was copied over from the nursery to the next nursery, how much was taken into the "old generation" (either called "old" or "gen2"), and how much was freed.

There's also a count of how many references there were from objects in the old generation to objects in the nursery, the "gen2 roots". This happens when you keep adding objects to a list, for example. At some point the list becomes "old" and fresh objects that are inserted into it have to be specifically remembered so that they are considered reachable, even if the old objects aren't fully analyzed.
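To see why such references have to be remembered, consider this small model of a nursery-only collection, sketched in Python. The store_ref write barrier, the gen2_roots set, and the minor_collect function are assumptions for the example and do not mirror MoarVM's actual data structures.

```python
class Obj:
    def __init__(self, name, old=False):
        self.name = name
        self.old = old
        self.refs = []


gen2_roots = set()  # old objects known to reference nursery objects


def store_ref(holder, target):
    """Write barrier: remember old objects that gain a reference to a young one."""
    holder.refs.append(target)
    if holder.old and not target.old:
        gen2_roots.add(holder)


def minor_collect(nursery, roots):
    """Return nursery objects that survive, without tracing the old generation."""
    live = set()
    worklist = list(roots) + [t for h in gen2_roots for t in h.refs]
    while worklist:
        obj = worklist.pop()
        if obj.old or obj in live:
            continue      # old objects are not analyzed in a minor collection
        live.add(obj)
        worklist.extend(obj.refs)
    return [obj for obj in nursery if obj in live]


old_list = Obj("list", old=True)
fresh, garbage = Obj("fresh"), Obj("garbage")
store_ref(old_list, fresh)
survivors = minor_collect([fresh, garbage], roots=[old_list])
print([obj.name for obj in survivors])  # ['fresh']
```

Without the gen2_roots entry, the minor collection would skip old_list entirely and wrongly treat fresh as garbage.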

The new frontend

[Screenshot: GC section of the new profiler frontend]

Looking at the next screenshot, which shows the GC section of the new profiler frontend I've been working on, you'll see it looks very different. The existing information is re-organized. Instead of a column of "time taken" with a bar and "time since start", this info is now in simple text columns in addition to bar charts at the top. You can now expand individual GC runs to get at the other information from the previous profiler: The amounts kept, promoted, and freed, as well as the inter-generational roots. However, they are now per-thread, and there's an additional bar chart. It's lacking a legend or title right now, but if it had one, it'd say that it's the individual start and end times of each thread's GC work, in milliseconds. The little asterisk denotes which thread was responsible for kicking off the GC run.

You'll surely notice another difference: There's a whole lot more bars in the top bar chart than there were entries in the GC table in the old profiler frontend. The simplest explanation is that I just took a profile from a different program. However, it was the same code that generated these two profiles. The difference comes from the fact that the old frontend currently only displays the GC runs it finds in the first thread's list. That corresponds to entries in the new list that feature 1 in the "threads" column.

The near future

There's many more things I want to do with this tab, like getting statistics on a per-thread rather than per-run level, and maybe a "participation tally" graph that'd show when which threads participated. And there's details shown by the graphs that I haven't noticed before. For example, what causes some threads to join in near the end, and does that force the other threads to wait for a long time before doing anything at all? The screenshot below is an example of that pattern.

[Screenshot: a GC run in which some threads join in near the end]

The questions raised by this fresh perspective on data we've been capturing for a long time have already made it worth quickly shoving the GC tab in between the other parts of the frontend. I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.

Thanks to The Perl Foundation for the grant, and masak for this beautiful closing paragraph:

In conclusion, all of the people who oppose my plan for world domination will be treated in the most appropriate manner. I wish nothing but the best for my enemies. May you rest easily at night, and may your futures bloom and burgeon!

- Timo

6guts: Dynamic lookups and context introspection with inlining

Published by jnthnwrthngtn on 2018-07-22T23:27:41

Inlining is one of the most important optimizations that MoarVM performs. Inlining lets us replace a call to some Block, Sub, or Method with the code that is inside of it. The most immediate benefit is to eliminate the overhead of calling, but that’s just the start. Inlined code has often already been specialized for a certain set of argument types. If we already have proven those argument types in the caller, then there’s no need to re-check them. Inlining can also expose pairs of operations that can cancel, such as box/unbox, and bring the point a control exception is thrown into the same body of code where it is caught, which may allow the exception throw to be rewritten to a far cheaper goto.

In a language like Perl 6, where every operator is a call to a multiple dispatch subroutine, inlining can be a significant win. In the best cases, inlining can lead to smaller code, because the thing that is inlined ends up being smaller than the bytecode for the call sequence. Of course, often it leads to bigger code, and so there’s limits to how much of it we really want to do. But still, we’ve been gradually pushing on with increasing the range of things that we’re able to inline.

The problem with inlining is that the very call boundaries it does away with may carry semantic significance for the program. In this post, I’ll talk about a couple of operations that became problematic as we ramped up our inlining capabilities, and discuss a new abstraction I recently added to MoarVM – the frame walker – which provides a common foundation for solving the problem.

A little inlining history

Inlining first showed up in MoarVM back in 2014, not too many months after the type-specializing optimizer was added. MoarVM has done speculative optimizations from the start, performing deoptimization (falling back to the interpreter) in the case that an unexpected situation shows up. But what if we had to deoptimize in code that had been inlined? Then we’d have to pretend we never did the inlines! Therefore, MoarVM can uninline too – that is, untangle the results of inlining and produce a call stack as if we’d been running the unoptimized code all along.

MoarVM has also from the start supported nested inlines – that is, inlining things that themselves contained inlines. However, the initial implementation of inlining was restricted in what it could handle. The first implementation could not inline anything with exception handlers, although that was supported within a couple of months. It also could not inline closures. Only subs in the outermost scope or from the CORE.setting, along with simple method calls, were possible to inline, because those were the only cases where we had enough information about what was being called, which is a decided prerequisite for inlining it.

Aside from bug fixes, things stayed the same until 2017. The focus in that time largely switched away from performance and towards the Perl 6.c release. Summer of 2017 brought some very large changes to how dynamic optimization worked in MoarVM, moving optimization to a background thread, along with changing and extending the statistics that were collected. A new kind of call optimization became possible, whereby if we could not prove what we were going to call, but the statistics showed a pattern, then we could insert a guard and speculatively optimize the call. Speculative inlining fell neatly out of that. Suddenly, a bunch more things could be considered for inlining.

Further work lifted some of the inlining restrictions. Deoptimization learned how to cope if we deoptimized in the middle of processing named arguments, so we could optimize code where that situation occurred. It became possible to inline many closures, by rewriting the lexical lookup operations into an indirection through the code object of the code that we had inlined. It also became possible to inline code involving lexical throws of exceptions and their handlers. Since that is how return works in Perl 6, that again made quite a few more things possible to inline. A more fine-grained analysis allowed us to do some amount of cross-language inlining, meaning bits of the Rakudo internals written in NQP could be inlined into the Perl 6 code calling them, including closure cloning. I’ll add at this point that while it’s easy to write a list of these improvements now, realizing various of them was quite challenging.

Now it’s summer 2018, and my work has delivered some more advances. Previously, we would only do an inlining if we already had produced a specialized version of the callee. This usually worked out, and we sorted by maximum call stack depth and specialized deepest first to help with that. However, sometimes that was not enough, and we missed inlining opportunities. So, during the last month, I added support for producing code to inline on-demand. I also observed that we were only properly doing speculative (that is, based on statistics) inlines of calls made that were expected to return an object, but not those in void context. (If that sounds like an odd oversight, it’s because void calls were previously rare. It was only during the last month, when I improved code-gen to spot a lot more opportunities to emit void context calls, that we got a lot more of them and I spotted the problem.)

More is better, no?

Being able to inline a wider range of calls is a good thing. However, it also made it far more likely that we would run into constructs that don’t cope well with inlining. We’ve got a long way by marking ops that we know won’t cope well with it as :noinline (and then gradually liberalizing that over time where it was beneficial). The improvements over the previous month created a more difficult problem, however. We have a number of ops that allow for introspection and walking of the call stack. These are used to implement Perl 6 features such as the CALLER:: pseudo-package. However, they are also the way that $/ can be set by things like match.

Marking the ctx op as :noinline got us a long way. However, we ran into trouble because once a context handle has been obtained, one could then start traversing from it to callers or outers, and then start some kind of lookup relative to that point. But what if the caller was an inline? Then we don’t have a callframe to reference in the context object that we return.

A further problem was that a non-introspection form of dynamic lookup, which traverses the lexical chain hanging off each step of the dynamic chain, also was not aware of inlines. In theory, this would have become a problem when we started doing inlining of closures last year. However, since it is used in a tiny number of places, and those places didn’t permit inlining, we didn’t notice until this month, when inlining started to cover more cases.

Normal dynamic lookup, used for $*foo style variables, has been inline-aware for about as long as we’ve had inlining. However, this greatly complicated the lookup code. Replicating such inline compensation code in a bunch of places was clearly a bad idea. It’s a tricky problem, since we’re effectively trying to model a callstack that doesn’t really exist by using information telling us what it would look like if it did. It’s the same problem that deoptimization has to solve, except this time we’re just imagining what the call stack would look like unoptimized, not actually trying to recreate it. It’s certainly not a problem we want solved repeatedly around MoarVM’s implementation.

A new frame walker

To help tackle all of these problems, I introduced a new abstraction: the specialization-aware frame walker. It provides an iterator over the call stack as if no inlining had taken place, figuring out as much as it needs to in order to recreate the information that a particular operation wants.
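The core idea can be sketched as an iterator that expands each real frame into the logical frames it contains. This is a deliberately simplified Python model: the RealFrame class and walk_frames function are hypothetical, and it assumes every listed inline is currently active, whereas the real walker has to work out which inlines apply from the current position in the code.

```python
from dataclasses import dataclass, field


@dataclass
class RealFrame:
    """A frame that really exists on the stack, innermost inline listed first."""
    name: str
    inlines: list = field(default_factory=list)  # names of inlined callees
    caller: "RealFrame | None" = None


def walk_frames(frame):
    """Yield frame names from innermost to outermost as if nothing was inlined."""
    while frame is not None:
        # Inlined callees logically sit below the frame that contains them.
        yield from frame.inlines
        yield frame.name
        frame = frame.caller


main = RealFrame("main")
outer = RealFrame("outer", inlines=["innermost", "inner"], caller=main)
print(list(walk_frames(outer)))  # ['innermost', 'inner', 'outer', 'main']
```

Code that consumes the iterator sees four frames even though only two exist, so it no longer needs its own inline compensation logic.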

First, I used it to make caller-dynamic lookup inline-aware. That went pretty well, and immediately fixed one of the module regressions that was “caused” by the recent inlining improvements.

Next, I used it to refactor the normal dynamic lookup. That needed careful work teasing out the details of dynamic lookup and caching of dynamic variable lookups from the inlining traversal. However, the end result was far simpler code, with much less duplication, since the JITted inline, interpreted inline, and non-inline paths largely collapsed and were handled by the frame walker.

Next up, contexts. Alas, this would be trickier.

Embracing laziness

Previously, context traversal had worked eagerly. When we asked for a context object representing the caller or outer of a particular context we already had, we immediately walked one frame in the appropriate direction and produced a result. Of course, this did not go well if there were inlines, since it always walked one real frame, but that may have multiple inlined frames within it.

One possibility was to make the ctx op immediately deoptimize the whole call stack, so that we then had a chain of real call frames to traverse. However, when I looked at some of the places using ctx, it became clear this would have some very negative performance consequences: every regex match causing a global deopt was not going to work!

Another option, that would largely preserve the existing design, was to store extra information about the inline we were inside of at the time we walked to a caller. This, however, had the weakness that we might do a deoptimization between the two points, thus invalidating the information. That was probably also possible to fix up, but the complexity of doing so put me off that approach.

Instead, I switched to a model where “move to caller” and “move to outer” would be stored as displacements to apply when the context object was used in order to obtain information. The frame walker could make these movements, before doing whatever lookup was required. Thus, even if a deoptimization were to take place between obtaining the context and using it, we could still do a correct traversal.
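A small model of the displacement approach, again in Python and with made-up names (Frame, Context, and lookup are assumptions for this sketch): the context records which moves to make, and only performs them when a lookup is actually requested.

```python
class Frame:
    def __init__(self, name, lexicals, caller=None, outer=None):
        self.name = name
        self.lexicals = lexicals
        self.caller = caller
        self.outer = outer


class Context:
    """Holds a starting frame plus moves that are applied only on use."""
    def __init__(self, frame, moves=()):
        self.frame = frame
        self.moves = tuple(moves)

    def caller(self):
        return Context(self.frame, self.moves + ("caller",))

    def outer(self):
        return Context(self.frame, self.moves + ("outer",))

    def lookup(self, name):
        # Re-walk from the starting frame every time, so the traversal
        # reflects the stack as it is now, not as it was when we were created.
        frame = self.frame
        for move in self.moves:
            frame = getattr(frame, move)
        return frame.lexicals[name]


main = Frame("main", {"$x": 1})
sub = Frame("sub", {"$x": 2}, caller=main)
ctx = Context(sub).caller()      # nothing is walked yet
print(ctx.lookup("$x"))          # 1, found by walking one caller step now
```

Because the walk is replayed on each use, a change to the underlying frames between obtaining the context and using it does not leave the context holding a stale reference.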

Too much laziness

This helped, but wasn’t quite enough either. If a context handle was taken and used immediately, things worked out fine. However, if the situation was like this:

| Frame we used ctx op on | (Frame 1)
|    Frame with inlines   | (Frame 2)

And then we used the handle some time later, things worked out less well. The problem was that I used the current return address of Frame 2 in order to understand which inline(s) we were inside of. However, if we’d executed more code in Frame 2, then it could have made another call. The current return address could thus point to the wrong inline. Oops.

However, since the ctx operation at the start of the lookup is never inlined, and a given call frame can only ever be called from one location in the caller, there’s a solution. If the ctx op is used to get a first-class reference to a frame on the call stack, we walk down the call stack and make sure that each frame called from inlined code preserves enough location information that we can later reconstruct what we need. It only needs to walk down the call stack until it sees a point where another ctx operation already preserved that information, so in programs with lots of use of the ctx op, we can avoid doing full stack walks each time, and just walk over recently created frames.

In closing

With those changes, the various Perl 6 modules exhibiting lookup problems since this month’s introduction of more aggressive inlining were fixed. Along the way, a common mechanism was introduced allowing us to walk a call stack as if no inlines had taken place. There’s at least one more place that we can use this: in order to make stack trace output not be sensitive to inlining. I’ll get to that in the coming weeks, or it might make a nice task for somebody looking to get themselves (more) involved with MoarVM development. Last but not least, I’d like to once again thank The Perl Foundation for organizing the funding that made this work possible.

6guts: More precise deoptimization usage tracking

Published by jnthnwrthngtn on 2018-07-21T15:59:19

In my previous post here, I talked about deoptimization and its implications for usage information. If you didn’t read that post, I suggest reading it before continuing, since the work described in this post builds upon it. Further background on deoptimization and its use in MoarVM may be found in my talk slides from last year’s Swiss Perl Workshop.

An example to consider

To keep things a bit simpler, we’ll look at an NQP program. NQP is a simplified subset of Perl 6, and so its naive compilation – before the optimizer gets at it – is much simpler than that produced by Rakudo. Here’s a small program to consider.

class Wrapper {
    has $!x;
    method x() { $!x }
}
class C { }

sub test($w) {
    my $var := "Used later";
    if nqp::istype($w.x, C) {
    }
    else {
    }
}

my int $i := 0;
my $wrapper := Wrapper.new(x => C.new);
while $i < 1_000_000 {
}
say(test(Wrapper.new(x => NQPMu)));

We’ll consider the test subroutine’s optimization. First, let’s walk through the bytecode before optimization. We always have a dummy empty basic block at the start of the graph, used for internal purposes, so we can disregard that.

  BB 0 (0x7f1be817b070):
    line: 7 (pc 0)
    Successors: 1
    Dominance children: 1

The next basic block starts with a bunch of null instructions, which will be mostly deleted. In the slow path interpreter, we null out registers in case code tries to read them as part of callframe setup. However, we get rid of that in the optimized code by letting the optimizer prove that such work is mostly not needed. Since we didn’t do any optimization yet, here’s all of those instructions.

  BB 1 (0x7f1be817b0f8):
    line: 7 (pc 0)
      null              r5(1)
      null              r4(1)
      null              r3(1)
      null              r1(1)
      null              r0(1)

Next we receive the parameter.

      checkarity      liti16(1), liti16(1)
      param_rp_o        r0(2), liti16(0)

Then we have the line my $var := "Used later";. Rakudo is smarter than NQP in compiling such a thing: it would just emit a reference to a single constant string rather than boxing it each time like NQP’s simpler code-gen does.

      [Annotation: Line Number: x.nqp:7]
      const_s           r2(1), lits(Used later)
      hllboxtype_s      r3(2)
      box_s             r3(3),   r2(1),   r3(2)
      set               r1(2),   r3(3)

Now we have the code for $w.x. It starts out with a decont, since we may have been passed something in a Scalar container (note that NQP code may be called from full Perl 6 code, so this is possible). We then look up the method and call it.

      [Annotation: INS Deopt One (idx 0 -> pc 46; line 9)]
      [Annotation: Logged (bytecode offset 40)]
      decont            r4(2),   r0(2)
    Successors: 2
    Predecessors: 0
    Dominance children: 2

  BB 2 (0x7f1be817b158):
    line: 9 (pc 46)
      findmeth          r3(4),   r4(2), lits(x)
    Successors: 3
    Predecessors: 1
    Dominance children: 3

  BB 3 (0x7f1be817b1b8):
    line: 9 (pc 56)
      [Annotation: INS Deopt One (idx 1 -> pc 56; line 9)]
      prepargs        callsite(0x7f1bf1f0b340, 1 arg, 1 pos, nonflattening, interned)
      arg_o           liti16(0),   r0(2)
      [Annotation: INS Deopt All (idx 3 -> pc 72; line 9)]
      [Annotation: INS Deopt One (idx 2 -> pc 72; line 9)]
      [Annotation: Logged (bytecode offset 66)]
      invoke_o          r3(5),   r3(4)
    Successors: 4
    Predecessors: 2
    Dominance children: 4

Notice how various instructions here are annotated with Deopt points. These are places where we might, after optimization has taken place, insert an instruction that could cause us to deoptimize. The pc 72 refers to the offset into the unoptimized bytecode at which we should continue execution back in the interpreter.

There are also various Logged annotations, which indicate instructions that may log some statistics – for example, about what code is invoked, and what type of value it returns (for invoke_o) or what kind of type we get out of a decont operation that actually had to read from a container.

Next up is the type check. Again, there’s a decont instruction, just in case the call to $w.x returned something in a container. We then have the istype instruction.

  BB 4 (0x7f1be817b218):
    line: 9 (pc 72)
      [Annotation: INS Deopt One (idx 4 -> pc 78; line 9)]
      [Annotation: Logged (bytecode offset 72)]
      decont            r4(3),   r3(5)
    Successors: 5
    Predecessors: 3
    Dominance children: 5

  BB 5 (0x7f1be817b278):
    line: 9 (pc 78)
      wval              r5(2), liti16(0), liti16(5) (P6opaque: C)
      istype            r6(1),   r4(3),   r5(2)
    Successors: 6
    Predecessors: 4
    Dominance children: 6

Next comes the if part of the branch:

  BB 6 (0x7f1be817b2d8):
    line: 9 (pc 94)
      unless_i          r6(1),   BB(8)
    Successors: 8, 7
    Predecessors: 5
    Dominance children: 7, 8, 9

  BB 7 (0x7f1be817b338):
    line: 9 (pc 102)
      const_s           r2(2), lits(C)
      hllboxtype_s      r4(4)
      box_s             r4(5),   r2(2),   r4(4)
      set               r5(3),   r4(5)
      goto              BB(9)
    Successors: 9
    Predecessors: 6
    Dominance children: 

Followed by the else part. And what is r1(2)? It’s the “Used later” string from earlier.

  BB 8 (0x7f1be817b398):
    line: 12 (pc 134)
      set               r5(4),   r1(2)
    Successors: 9
    Predecessors: 6
    Dominance children: 

Finally, we’re done, and return the result of the branch of the if statement that was executed.

  BB 9 (0x7f1be817b3f8):
    line: 12 (pc 140)
      PHI               r5(5),   r5(3),   r5(4)
      PHI               r4(6),   r4(5),   r4(3)
      PHI               r2(3),   r2(2),   r2(1)
      return_o          r5(5)
    Predecessors: 7, 8
    Dominance children: 

How we optimize it

Let’s now walk through the optimized output. The argument handling has been reduced to a single instruction that does an unchecked read of the incoming argument. This is because we’re producing a specialization for a particular input callsite shape and set of input arguments. In this case, it will be a single argument of type Wrapper.

  BB 0 (0x7f1be817b070):
    line: 7 (pc 0)
    Successors: 1
    Dominance children: 1

  BB 1 (0x7f1be817b0f8):
    line: 7 (pc 0)
      sp_getarg_o       r0(2), liti16(0)

What comes next is the code to store that "Used later" string. The ops look fine, but do you notice something odd?

      const_s           r2(1), lits(Used later)
      hllboxtype_s      r3(2)
      [Annotation: INS Deopt One (idx 0 -> pc 46; line 9)]
      box_s             r1(2),   r2(1),   r3(2)

Yup, there’s a deopt annotation moved on to that box_s. Huh? Well, let’s look at what comes next.

      [Annotation: INS Deopt One (idx 1 -> pc 56; line 9)]
      sp_getspeshslot   r7(0), sslot(2)
    Successors: 2
    Predecessors: 0
    Dominance children: 2

  BB 2 (0x7f1be8356d38):
    line: 7 (pc 0)
      [Annotation: FH Start (0)]
      [Annotation: Inline Start (0)]
      [Annotation: INS Deopt Inline (idx 5 -> pc 20; line 8)]
      set               r9(1),   r0(2)
      [Annotation: INS Deopt Inline (idx 6 -> pc 42; line 9)]
      sp_p6ogetvt_o    r11(1),   r9(1), liti16(8), sslot(4)
      [Annotation: FH End (0)]
      set               r3(5),  r11(1)
    Successors: 3
    Predecessors: 3
    Dominance children: 3

Recall that in the unoptimized code we next did $w.x by a findmeth instruction, which came after a decont of $w, and then we did an invocation of that method. What’s happened to all of that lot?

First, since $w is the argument we are producing a specialization for, we thus know it’s Wrapper, and we know that’s not a container type, so the decont can go. Since we also know its type and we know the method name, we can just resolve that method once. The resolution of it is then stored in a “spesh slot”, which you can think of as a constants table for this particular specialization. What follows is, instead of the invocation, the code for the method x() { $!x }, which has been inlined. (The sp_p6ogetvt_o instruction is what attribute lookup has been optimized into.)

Oh, and about that Deopt annotation on the box_s? That’s just because code got deleted and it got shifted. We’ll look at the consequences of that later.

Here is the rest of the code:

  BB 3 (0x7f1be817b218):
    line: 9 (pc 72)
      [Annotation: Inline End (0)]
      [Annotation: FH Goto (0)]
      [Annotation: INS Deopt One (idx 2 -> pc 72; line 9)]
      [Annotation: INS Deopt One (idx 4 -> pc 78; line 9)]
      sp_guardconc      r3(5), sslot(0), litui32(72)
      const_s           r2(2), lits(C)
      hllboxtype_s      r4(4)
      box_s             r5(3),   r2(2),   r4(4)
      PHI               r5(5),   r5(3)
      return_o          r5(5)
    Predecessors: 2
    Dominance children: 6

Well, that’s pretty different from what we started out with too. What on earth has happened? Where did our if statement go?!

The sp_guardconc instruction is a guard. It checks, in this case, that we have a concrete instance of C in register r3(5). It was inserted because the gathered statistics said that, so far, it had been such 100% of the time. The guard will deoptimize – that is, fall back to the interpreter – if it fails, but otherwise proceed. Since we have guarded that, then the istype will become a constant. That means we know which way the branch would go, and can delete the other part of the branch. A type check, a conditional branch, and a branch all go away, to be replaced by a single cheap guard.

But what about that “Used later” string?

Notice how we executed:

      box_s             r1(2),   r2(1),   r3(2)

But its result value, r1(2), is completely unused anywhere in the code that we have left after optimization. The instruction was, however, retained, for the sake of deoptimization. In the original code, the value was written prior to a guard that might deoptimize. Were we to throw it away, then after we deoptimized the interpreter would try to read a value that wasn’t written, and crash in some interesting way.
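
The following Python sketch models why the unused value has to stay. It is only an analogy, not how MoarVM implements deoptimization: the Deopt exception, the register dictionary and the function names are all invented for illustration. The specialized path keeps just the guard, and on failure hands its registers to a stand-in for the interpreter, which then reads the value written before the guard.

```python
class C:
    """Stands in for the class C from the NQP example."""


class Deopt(Exception):
    """Hypothetical signal carrying the state the interpreter resumes with."""

    def __init__(self, resume_at, registers):
        super().__init__(resume_at)
        self.resume_at = resume_at
        self.registers = registers


def resume_in_interpreter(resume_at, registers):
    # Stand-in for interpreting the unoptimized bytecode from `resume_at`:
    # the full type check and both branches are still present here.
    assert resume_at == "type check"
    if isinstance(registers["x"], C):
        return "C"
    return registers["var"]  # reads the value written before the guard


def specialized(x):
    # "var" is not used on the fast path, but must still be written,
    # because the interpreter may read it after a deoptimization.
    registers = {"var": "Used later", "x": x}
    if type(x) is not C:  # the guard: cheap check, bail out on failure
        raise Deopt("type check", registers)
    return "C"  # the type check folded to a constant, other branch deleted


def run(x):
    try:
        return specialized(x)
    except Deopt as deopt:
        return resume_in_interpreter(deopt.resume_at, deopt.registers)
```

Dropping the "var" entry from the registers would make the fallback path fail with a KeyError, which is the Python-level counterpart of the crash described above.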

The original approach

The original approach taken to this problem was to:

  1. Whenever we see a Deopt annotation, take its index as our current deopt point
  2. Whenever we see a write, label it with the current deopt point
  3. Whenever we see a read, check if the deopt point of the write is not equal to the deopt point of the read. If that is the case, mark the write as needing to be retained for deopt purposes.

Effectively, if a value written before a deopt point might be read after a deopt point, then we retain it. That was originally done by bumping its usage count. In my last post here, I described how we switched to setting a “needed for deopt” flag instead. But in the grand scheme of things, that changed nothing much about the algorithm described above; only step 3 was changed.

Note that this algorithm works in the case of loops – where we might encounter a value being read in a PHI node prior to seeing it being written – because the lack of a deopt point recorded on the writer will make it unequal to the current deopt point.
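
A minimal sketch of the three steps in Python, for a flat list of instructions. The Ins class and the function name are invented for illustration, and the order in which one instruction's annotation, reads and write are handled is an assumption rather than something the description above specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

_UNSEEN = object()  # stands for "writer not seen yet" (the loop case)


@dataclass
class Ins:
    """Hypothetical instruction: one optional write, some reads."""
    op: str
    write: Optional[str] = None
    reads: Tuple[str, ...] = ()
    deopt_idx: Optional[int] = None  # index from a Deopt annotation, if any


def retained_for_deopt(instructions):
    current = -1      # deopt point in effect before any annotation is seen
    written_at = {}   # value -> deopt point current when it was written
    retained = set()
    for ins in instructions:
        if ins.deopt_idx is not None:        # step 1
            current = ins.deopt_idx
        for value in ins.reads:              # step 3
            if written_at.get(value, _UNSEEN) != current:
                retained.add(value)
        if ins.write is not None:            # step 2
            written_at[ins.write] = current
    return retained


code = [
    Ins("const_s", write="r2(1)"),
    Ins("box_s", write="r1(2)", reads=("r2(1)",)),
    Ins("decont", write="r4(2)", reads=("r0(2)",), deopt_idx=0),
    Ins("set", write="r5(4)", reads=("r1(2)",)),
]
retained = retained_for_deopt(code)
```

Here r1(2) is written before deopt point 0 and read after it, so it is retained, while r2(1) is written and read under the same deopt point and is not.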

Correct, but imprecise

The problem with this approach isn’t with correctness, but rather with precision. A deopt retention algorithm is correct if it doesn’t throw away anything that is needed after a deoptimization. Of course, the simplest possible algorithm would be to mark everything as required, allowing no instruction deletions! The method described above is also correct, and marks fewer things. And for a while, it was enough. However, it came to be a blocker for various other optimizations we wish to do.

There are two particular problems that motivated looking for a more precise way to handle deopt usage. First of all, many instructions that may be replaced with a guard and deoptimize are actually replaced with something else or even deleted. For example, decont will often be replaced by a set because we know that it’s not a container type. A set can never trigger deoptimization. However, we had no way to update our deopt usage information based on this change. Therefore, something written before the set that used to be a decont and read after it, but otherwise not needed, would be kept alive because the decont could have had a guard inserted, even though we know it did not.

A larger problem is that even when we might insert a guard, we might later be able to prove it is not needed. Consider:

my int $i = $str.chars;

The chars method will be tiny, so we can inline it. Here’s the code that we currently produce; I’ve shown the end of the inlining of the chars method together with the assignment into $i.

      chars            r15(1),  r14(1)
      hllboxtype_i     r13(1)
      [Annotation: INS Deopt Inline (idx 7 -> pc 134; line -1)]
      box_i            r13(2),  r15(1),  r13(1)
      [Annotation: FH End (0)]
      set               r2(5),  r13(2)
    Successors: 3
    Predecessors: 4
    Dominance children: 3

  BB 3 (0x7efe0479b5f0):
    line: 1 (pc 100)
      [Annotation: Inline End (0)]
      [Annotation: FH Goto (0)]
      [Annotation: INS Deopt One (idx 3 -> pc 100; line 1)]
      sp_guardconc      r2(5), sslot(2), litui32(100)
      set               r2(6),   r2(5)
      set               r5(1),  r15(1)
      bindlex         lex(idx=0,outers=0,$i),   r5(1)

Since $i is a native integer lexical, we don’t need to box the native integer result of the chars op at all here. And you can see that we have done a rewrite such that r15(1) is used to assign to $i, not the boxed result. However, the box_i instruction is retained. Why?

The immediate reason is that it’s used by the guard instruction. And indeed, I will do some work in the future to eliminate that. It’s not a hugely difficult problem. But doing that still wouldn’t have been enough. Why? Because there is a deopt point on the guard, and the boxed value is written before it and used after it. This example convinced me it was time to improve our deopt handling: it was directly in the way of optimizations that could provide a significant benefit.

A more precise algorithm

It took me three attempts to reach a solution to this. The first simple thing that I tried follows from the observation that everything written after the last deopt instruction in the optimized code can never possibly be used for deoptimization purposes. This was far from a general solution, but it did help a bit with very small functions that are free of control flow and have no or perhaps just some very early guards. This was safe, easy to reason about, easy to implement – but ultimately not powerful enough. However, it was helpful in letting me frame the problem and start to grapple with it, plus it gave me a set of cases that a more powerful solution should be able to take care of.

Attempt number two was to do a repeat of the initial deopt analysis process, but after the optimizations had taken place. Thus, cases where a Deopt annotation was on an instruction that was turned into something that could never deoptimize would not be counted. This quickly fell apart, however, since in the case where entire branches of a conditional were deleted then reads could disappear entirely. They simply weren’t there to analyze any more. So, this was an utter failure, but it did drive home that any analysis that was going to work had to build up a model of deoptimization usages before we performed any significant optimizations, and then manipulate that model safely even in the light of a mutated program graph.

Attempt three took much longer to come up with and implement, though thankfully was rather more successful. The new algorithm proceeds as follows.

  1. When we see a write instruction – and provided it has at least one reader – place it into a set OW of writes with still outstanding reads.
  2. When we see a deopt point, take the set OW and record the index of this deopt point on each of those writes. This means that we are now associating the writes with which deopt points are keeping them alive.
  3. Whenever we see a read, mark it as processed. Check if the writer has now had all of its reads processed. If so, remove it from the set OW.
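
The three steps can be sketched in Python for straight-line code. This is an illustration only: the Ins class and function name are invented, the read counts are computed in a separate up-front loop for simplicity, and treating a deopt point before the reads of its own instruction is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Ins:
    """Hypothetical instruction: one optional write, some reads."""
    op: str
    write: Optional[str] = None
    reads: Tuple[str, ...] = ()
    deopt_idx: Optional[int] = None


def associate_deopt_points(instructions):
    # How many reads of each value have not been processed yet.
    remaining = {}
    for ins in instructions:
        for value in ins.reads:
            remaining[value] = remaining.get(value, 0) + 1

    outstanding = set()  # the set OW
    deopt_users = {}     # value -> deopt points keeping it alive
    for ins in instructions:
        if ins.deopt_idx is not None:                    # step 2
            for value in outstanding:
                deopt_users.setdefault(value, []).append(ins.deopt_idx)
        for value in ins.reads:                          # step 3
            remaining[value] -= 1
            if remaining[value] == 0:
                outstanding.discard(value)
        if ins.write is not None and remaining.get(ins.write, 0) > 0:
            outstanding.add(ins.write)                   # step 1
    return deopt_users


# A simplified version of the chars example from earlier in the post.
code = [
    Ins("chars", write="r15(1)", reads=("r14(1)",)),
    Ins("box_i", write="r13(2)", reads=("r15(1)",)),
    Ins("sp_guardconc", reads=("r13(2)",), deopt_idx=3),
    Ins("set", write="r5(1)", reads=("r15(1)",)),
    Ins("bindlex", reads=("r5(1)",)),
]
deopt_users = associate_deopt_points(code)
```

Both r15(1) and r13(2) end up associated with deopt point 3, while r5(1), written after the guard, is associated with nothing.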

This algorithm works in a single pass through the program graph. However, what about loops? In a loop, no matter what order we traverse the graph in, we will always see some reads that happen before writes, and that breaks the algorithm that I described above.

After some amount of scribbling graphs and staring at them, I hit upon a way to solve it that left me with a single pass through the graph, rather than having to iterate to a fixed point. When we see a read whose writer was not yet processed, we put it into a set of reads that are to be processed later. (As an aside, we know thanks to the SSA form that all such instructions are PHI (merge) instructions, and that these are always placed at the start of a basic block.) We will then process a pending read when we have processed all of the basic blocks that are its predecessors – which means that by then all possible writes will have been processed.

The result is that we now have a list of all of the deopt points that make use of a particular write. Then, after optimization, we can go through the graph again and see which deopt points actually had guards or other potentially deoptimizing instructions placed at them. For all the cases where we have no such instruction under that Deopt annotation, we can delete the deopt usage. That way, if all normal usages and deopt usages of a value are gone, and the writing instruction is pure, we can delete that instruction.
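
As a rough illustration of this pruning step, the Python sketch below takes a map from values to the deopt points keeping them alive and drops the points where no deoptimizing instruction was placed. The function names and the dictionary-based representation are assumptions made for the example, not MoarVM's data structures.

```python
def prune_deopt_usages(deopt_users, live_deopt_points):
    """Keep only the deopt points that still have a guard (or another
    potentially deoptimizing instruction) after optimization."""
    return {
        value: [idx for idx in points if idx in live_deopt_points]
        for value, points in deopt_users.items()
    }


def can_delete_writer(value, normal_uses, deopt_users, is_pure):
    """A writer can go when it is pure and its value has neither normal
    usages nor deopt usages left."""
    no_normal_uses = normal_uses.get(value, 0) == 0
    no_deopt_uses = not deopt_users.get(value)
    return bool(is_pure and no_normal_uses and no_deopt_uses)


deopt_users = {"r13(2)": [3, 7]}
normal_uses = {}  # the boxed value has no ordinary readers left

# Only deopt point 7 ended up with a guard: the value is still needed.
still_needed = prune_deopt_usages(deopt_users, {7})

# No guards were placed at all: the box_i could now be deleted.
unneeded = prune_deopt_usages(deopt_users, set())
```

With the first pruning result the writer of r13(2) has to stay, and with the second it becomes a candidate for deletion, provided the writing instruction is pure.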

This further means that once we gain the ability to delete guards that we can prove are not required any longer – perhaps because of new information we have after inlining – we will also be able to delete the deopt usages associated with them.

Last but not least, the specializer’s log output also includes which deopt points are keeping a value alive, so we will be able to inspect the graph in cases where we aren’t entirely sure and understand what’s happening.

Future work

With this done, it will make sense to work on guard elimination, so that is fairly high on my list of upcoming tasks. Another challenge is that while the new deopt algorithm is far more precise, it’s also far more costly. Its use of the DU chains means we have to run it as a second pass after the initial facts and usage pass. Further, the algorithm to eliminate unrequired deopt usages is a two pass algorithm; with some engineering we can likely find a way to avoid having to make the first of those. The various sets are represented as linked lists too, and we can probably do better than that.

One other interesting deoptimization improvement to explore in the future is to observe that any pure instructions leading up to a deopt point can be replayed by the interpreter. Therefore, we can deopt not to the instruction mapping to the place where we put the deopt point, but to the start of a run of pure instructions before that. That would in some cases allow us to delete more code in the optimized version.
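
Since this is future work, the following is only a sketch of the idea under simple assumptions: instructions are modelled as a flat list of purity flags, and the function name is invented. It finds how far back the resume position could be moved.

```python
def earliest_resume_index(is_pure, deopt_index):
    """Given per-instruction purity flags and the index at which we would
    normally resume, step back over the run of pure instructions directly
    before it and return the start of that run."""
    index = deopt_index
    while index > 0 and is_pure[index - 1]:
        index -= 1
    return index


# impure, pure, pure, then the instruction at the deopt point
flags = [False, True, True, False]
resume = earliest_resume_index(flags, 3)
```

In this example the interpreter could resume at index 1 and replay the two pure instructions itself, so the optimized code would not need to have executed them.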

Next time, I’ll be looking at how increasingly aggressive inlining caused chaos with our context introspection, and how I made things better. Thanks go to The Perl Foundation for making this work possible.

6guts: Better usage information in the MoarVM specializer

Published by jnthnwrthngtn on 2018-07-20T00:21:39

I’ve been doing lots of work on the MoarVM specializer of late, and will be writing a few posts here to explain it. This work has been covered by my grant from The Perl Foundation.

This post covers the recent addition of DU (Define-Use) chains. I’ll explain what they are, what kind of optimizations they have helped with so far, and how they can help us ensure the specializer is free of certain kinds of bug.

A little background

The MoarVM specializer helps programs run faster by ripping out as much unrequired generality as it knows how. This involves a bunch of different analyses, and those are aided by the program being turned into SSA (Static Single Assignment) form. This happens not at the Perl 6 program level, but rather at the bytecode level. MoarVM’s interpreter is a register machine, so something like:

($a + $b) * $c

Could, assuming these are all native integer variables, compile into something like:

getlex r0, '$a'
getlex r1, '$b'
add_i r0, r0, r1
getlex r1, '$c'
mul_i r0, r0, r1

Notice how registers r0 and r1 are re-used for multiple things. Now imagine that these registers hold objects and we are trying to track types and other such information. Since a register may be used for completely different things over its lifetime, we can’t just associate information with the register.

Transforming the bytecode into SSA form helps. We give each use of the register a version number:

getlex r0(1), '$a'
getlex r1(1), '$b'
add_i r0(2), r0(1), r1(1)
getlex r1(2), '$c'
mul_i r0(3), r0(2), r1(2)

Now we can associate information with each version of the register, greatly easing analysis of the program.
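
For straight-line code like this, the versioning can be sketched in a few lines of Python. This is an illustration rather than MoarVM's SSA construction, which also has to deal with control flow and PHI nodes; the tuple layout and function name are assumptions.

```python
def to_ssa(instructions):
    """Version registers in straight-line code.

    Each instruction is (op, destination register, operands). An operand
    that names a register already written gets that register's current
    version; anything else is treated as a literal and left alone.
    """
    version = {}
    result = []
    for op, dest, operands in instructions:
        # Reads see the versions in effect before this instruction writes.
        reads = [
            f"{operand}({version[operand]})" if operand in version else operand
            for operand in operands
        ]
        version[dest] = version.get(dest, 0) + 1
        result.append((op, f"{dest}({version[dest]})", reads))
    return result


code = [
    ("getlex", "r0", ["'$a'"]),
    ("getlex", "r1", ["'$b'"]),
    ("add_i", "r0", ["r0", "r1"]),
    ("getlex", "r1", ["'$c'"]),
    ("mul_i", "r0", ["r0", "r1"]),
]
ssa = to_ssa(code)
```

Running it on the five instructions above gives the same versions as the listing: the add_i writes r0(2) from r0(1) and r1(1), and the mul_i writes r0(3) from r0(2) and r1(2).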


When a program is in SSA form, every versioned register has one definition: the instruction that writes it. Since it can never be written again anywhere in the SSA form of the bytecode, the writer is a single instruction. There are no MoarVM instructions that write more than one register, and so an instruction defines at most one value, and every versioned register has precisely one instruction that defines it.

So, defines are easy. For as long back as I can remember, we’ve stored a reference to the writing instruction of each versioned register, so whenever we see a read of it then we can always quickly find the defining instruction.


Until recently, we stored a counter of how many times each versioned register was used. We made an initial pass through the graph representing the bytecode to be optimized, bumping the usage count each time we saw a versioned register being used. Then, as we optimized, we could update those counts.

Usage information is especially useful taken together with knowledge of which instructions are pure – that is to say, they produce a result, but don’t have any effects besides that. If the usage count of such an instruction drops to zero, then we can delete it.

For example, if we have an attribute has str $!value in a class, it would be compiled into something like this:

  wval              r4(2), liti16(1), liti16(36) (P6opaque: Str)
  getattr_s         r5(1),   r0(2),   r4(2), lits($!value)

The wval instruction grabs the type object of the class that declares the attribute. This is used together with the attribute name to do a lookup (since parent and child classes may have attributes of the same name, but they are different attributes since they are in different classes). Provided we know the type of r0(2) – which is holding self – then we might optimize it into:

  wval              r4(2), liti16(1), liti16(36) (P6opaque: Str)
  sp_p6oget_s       r5(1),   r8(3), liti16(8)

Where the 8 is an offset in bytes indicating where the attribute lives in the memory of the object. We’ve turned a lookup by name into pointer chasing, which will later JIT into some pretty simple machine code (not quite as simple as we want yet, but still vastly faster than the normal lookup path).

But wait! What about that wval there? We don’t need it now. And so, after the various optimizations have taken place, we do Dead Instruction Elimination. So long as the usage count has dropped to zero – that is, nothing else is using the value – we can delete the instruction, meaning the end result is just:

  sp_p6oget_s       r5(1),   r8(3), liti16(8)
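
The counting scheme and the elimination step can be sketched as follows in Python. The tuple representation, the function name and the trailing return_o (added so that r5(1) has a user) are assumptions made for the example.

```python
def eliminate_dead(instructions, pure_ops):
    """Delete pure instructions whose written value has a usage count of 0.

    Each instruction is (op, written value or None, values read).
    """
    usage = {}
    for _, _, reads in instructions:
        for value in reads:
            usage[value] = usage.get(value, 0) + 1

    kept = []
    # Walk backwards so that deleting an instruction can make the
    # instructions feeding it dead as well.
    for op, write, reads in reversed(instructions):
        if op in pure_ops and usage.get(write, 0) == 0:
            for value in reads:
                usage[value] -= 1
            continue
        kept.append((op, write, reads))
    kept.reverse()
    return kept


optimized = [
    ("wval", "r4(2)", ()),               # no longer read by anything
    ("sp_p6oget_s", "r5(1)", ("r8(3)",)),
    ("return_o", None, ("r5(1)",)),      # hypothetical user of r5(1)
]
result = eliminate_dead(optimized, pure_ops={"wval", "sp_p6oget_s"})
```

The wval is removed because nothing reads r4(2) any more, while the attribute read stays because its result is still used.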

Deoptimization complication

So far, so relatively simple. Alas, there are complications. Some values might become unused after we optimize the code, but we still can’t delete them. We use statistics to drive optimization, and do a great deal of speculation. For example, if we see 99% of the time a particular type shows up in the program, we optimize it assuming that type. But what if the 1% case shows up? Or what if we saw a certain type 100% of the time so far, but there’s a different one in the future? In that case, we drop back to the normal interpreter to handle it. For that to work out, however, we must make sure that the values the interpreter needs are still available after this deoptimization has taken place.

Up until recently, whenever we detected that a value might be needed if deoptimization happens, we simply gave its usage count an extra bump. This meant that even if we deleted all of its uses in the graph, we’d still not delete it. (This was a very coarse-grained analysis. I’ll discuss that more in a future post.)

You can’t count on this

The usage count was a fine enough approach to start out with, but it gradually came to be insufficient.

One bug that it’s quite possible to make is to forget to increment or decrement the usage count. The cases where it ended up too high could prevent us from deleting an instruction we didn’t need, leading to worse code. This wasn’t very serious, though a bit sub-optimal. The other way round – failing to increment the count – is of course more dangerous, since it may lead to an instruction being deleted that we really need. This didn’t happen often – we’re relatively careful – but it’d be nice if we had a way to verify it wasn’t happening at all. However, the +1 for the sake of deoptimization would have frustrated doing such an analysis.

A further issue is that while finding the place that a given versioned register was defined was easy, there was no cheap way to find its usages. Having such information would make some optimizations we already did easier and more effective, as well as make it easier to do some that we’re keen to add in the near future.

Beyond that, a single number was uninformative for those of us working on the optimizer. We could see the number, but what was it telling us? Why was the register still in use?

Adding use chains

So, instead of storing a count, we’ve started storing a linked list that points to each instruction that uses the versioned register. Often we only care about used, used precisely once, or unused; in fact, the only place we need the exact count is for debug output. Therefore, we can answer all the common questions we could with the usage count without having to traverse the chain. Building the chain is easy: everywhere we used to bump the counter, we now add an entry into the chain.

This chain works well for instructions that use the value, but what about the deoptimization usage? This was handled by storing that piece of information as a separate flag. It could then be displayed alongside the real usage count in the debug output, so we could quickly understand which registers were in use only for the purpose of deoptimization.
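
A small Python sketch of such a structure follows. The class and method names are invented for illustration and do not mirror MoarVM's C structures; the point is that the common questions are answered by looking at the head of the list, and only the debug count walks the whole chain.

```python
class Use:
    """One entry in a use chain (a singly linked list)."""

    def __init__(self, user, next_use=None):
        self.user = user
        self.next_use = next_use


class Value:
    """Hypothetical stand-in for the facts kept per versioned register."""

    def __init__(self, name):
        self.name = name
        self.first_use = None
        self.needed_for_deopt = False  # the separate deopt usage flag

    def add_use(self, user):
        # Done wherever the usage counter used to be bumped.
        self.first_use = Use(user, self.first_use)

    def remove_use(self, user):
        previous, node = None, self.first_use
        while node is not None:
            if node.user == user:
                if previous is None:
                    self.first_use = node.next_use
                else:
                    previous.next_use = node.next_use
                return True
            previous, node = node, node.next_use
        return False

    def is_used(self):
        return self.first_use is not None

    def is_used_once(self):
        return self.first_use is not None and self.first_use.next_use is None

    def writer_is_deletable(self):
        return self.first_use is None and not self.needed_for_deopt

    def usage_count(self):
        # Only needed for debug output, so traversing is acceptable.
        count, node = 0, self.first_use
        while node is not None:
            count, node = count + 1, node.next_use
        return count


boxed = Value("r4(4)")
boxed.add_use("return_o")
boxed.add_use("sp_guardconc")
```

After removing both users, writer_is_deletable answers True only if the deopt flag is not set, which is how a value can be seen to be alive purely for deoptimization.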

Checking the chains

Along with this, I implemented a chain checker. It goes through the instruction graph and the use chains, and makes sure that the two are consistent with each other.

This isn’t done by default – it costs something – but is available as a flag MoarVM developers can turn on when implementing new optimizations to aid with verifying they are, at least in this regard, correct.

Improving elimination of set instructions

Often in Perl 6, code has to deal with both values and values held in Scalar containers – that is to say, it’s polymorphic over the two cases. In the case that we have a Scalar container, we have to remove the value from it. This is an incredibly common operation in Perl 6, and there is a single op – decont – that checks if we have a container and takes the value out of it if so. Code generation conservatively inserts quite a lot of these.

Often, we simply have a value, and so there’s nothing to do. And often, the specializer can tell there will be nothing to do. Thus, something like this:

  decont            r5(2),   r0(2)
  findmeth          r4(2),   r5(2), lits(chars)

Is turned into this:

  set               r5(2),   r0(2)
  findmeth          r4(2),   r5(2), lits(chars)

Where set simply sets the value of one register into another. For this and various other reasons, it’s quite common that – after optimizations – we end up with code chock full of set instructions. They’re cheap, but they certainly aren’t free – on two counts. Firstly, there’s the execution cost of them. Secondly, they make the optimized code larger than it needs to be. This both makes less efficient use of the CPU’s code cache once we JIT the optimized result, and can push the code over the inlining size limit, so that it might miss out on further powerful optimizations.

We did have some code to try and get rid of set instructions. It was less than awesome on multiple counts. Firstly, it still left quite a few behind that we could see by inspection of the code could go away. Secondly, it could make a mess of the SSA form. Since it was one of the very last optimizations we did, that wasn’t a big deal, but it did make the debug output confusing, plus we will be adding more optimizations to this second pass in the future. Thirdly, it was somewhat ad hoc, mostly written to handle peephole patterns that commonly showed up.

The usage chains provide a way to do better. The new set elimination algorithm covers the previous cases and new ones, and yet only does two fairly straightforward things.

Firstly, it checks whether the writer of the set’s second operand has only one usage, which is that set instruction, and no deopt usages. If so, and if there are no interfering uses of different versions of the register that the set writes, then it can have the writing instruction changed to write to the register that the set would, and the set instruction can then be deleted.

Failing that, it uses the use chain to check if there is a single user of the versioned register that the set instruction writes to. Again, given no conflicts, it can eliminate the set instruction by arranging for the user of the set instruction to instead use the value that the set would read. So in our case:

  set               r5(2),   r0(2)
  findmeth          r4(2),   r5(2), lits(chars)

We’d end up with:

  findmeth          r4(2),   r0(2), lits(chars)
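
Both strategies can be sketched in Python over a flat instruction list. This is a simplification and not the MoarVM implementation: it assumes there are no interfering uses of other versions of the same register, it finds users by scanning rather than by following use chains, and it refuses the second strategy when the set's result is needed for deopt, which is an extra assumption. All names are invented for the example.

```python
def eliminate_sets(instructions, deopt_used=frozenset()):
    """Each instruction is (op, written value or None, values read)."""
    code = [[op, write, list(reads)] for op, write, reads in instructions]
    i = 0
    while i < len(code):
        op, dst, reads = code[i]
        if op != "set":
            i += 1
            continue
        src = reads[0]
        writers = [ins for ins in code if ins[1] == src]
        src_users = [ins for ins in code if src in ins[2]]
        dst_users = [ins for ins in code if dst in ins[2]]
        if writers and len(src_users) == 1 and src not in deopt_used:
            # Strategy 1: the set is the only user of its source, so the
            # source's writer can write the set's target directly.
            writers[0][1] = dst
            del code[i]
        elif len(dst_users) == 1 and dst not in deopt_used:
            # Strategy 2: the set's result has a single user, which can
            # read the set's source instead.
            user = dst_users[0]
            user[2] = [src if value == dst else value for value in user[2]]
            del code[i]
        else:
            i += 1
    return [(op, write, tuple(reads)) for op, write, reads in code]


small = eliminate_sets([
    ("set", "r5(2)", ("r0(2)",)),
    ("findmeth", "r4(2)", ("r5(2)",)),
])
```

On the two-instruction example the first strategy does not apply, since the writer of r0(2) is not part of the snippet, and the second one rewrites the findmeth to read r0(2).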

To give a practical example of this, here is how the optimized code of the chars method called on a Scalar holding a Str looks without the set elimination:

  sp_getarg_o       r1(2), liti16(0)
  set               r8(2),   r1(2)
  set               r1(3),   r8(2)
  [Annotation: Logged (bytecode offset 24)]
  sp_p6oget_o       r8(3),   r1(3), liti16(16)
  [Annotation: INS Deopt One (idx 0 -> pc 30; line 2838)]
  sp_guardconc      r8(3), sslot(1), litui32(30)
  set              r11(2),   r8(3)
  set               r0(2),  r11(2)
  [Annotation: Line Number: SETTING::src/core/Str.pm6:2838]
  takedispatcher    r3(2)
  sp_p6oget_s       r5(1),   r0(2), liti16(8)
  chars             r6(1),   r5(1)
  hllboxtype_i      r4(3)
  [Annotation: INS Deopt One (idx 1 -> pc 134; line 2839)]
  box_i             r4(4),   r6(1),   r4(3)
  return_o          r4(4)

Notice the four set instructions in there. With the new set elimination algorithm, we end up with:

  sp_getarg_o       r1(3), liti16(0)
  [Annotation: Logged (bytecode offset 24)]
  sp_p6oget_o       r8(3),   r1(3), liti16(16)
  [Annotation: INS Deopt One (idx 0 -> pc 30; line 2838)]
  sp_guardconc      r8(3), sslot(1), litui32(30)
  [Annotation: Line Number: SETTING::src/core/Str.pm6:2838]
  takedispatcher    r3(2)
  sp_p6oget_s       r5(1),   r8(3), liti16(8)
  chars             r6(1),   r5(1)
  hllboxtype_i      r4(3)
  [Annotation: INS Deopt One (idx 1 -> pc 134; line 2839)]
  box_i             r4(4),   r6(1),   r4(3)
  return_o          r4(4)

Elimination of box/unbox pairs

Another interesting use of DU chains is to eliminate boxing of native values into objects only to unbox them again a short time later. This can happen due to the compiler not being smart enough, but if it happens across two subs or methods, and especially when we have multiple dispatch and polymorphic method dispatch happening, there’s not so much we could do better at that phase.

However, MoarVM does inlining, including speculative inlining. We can therefore see between boundaries that we cannot at compile time. Recall how chars produced this boxing code, as it is declared to return an Int:

  chars             r6(1),   r5(1)
  hllboxtype_i      r4(3)
  box_i             r4(4),   r6(1),   r4(3)

What if we were to write:

my int $chars = $str.chars;

Then the boxing happens just over the boundary. It turns out that there’s quite a lot to do in order to get rid of the boxing instruction, but with use chains we can already make a start. When we encounter a box, we look if any of its users are an unbox. After inlining, we’d see that there are such cases. Therefore, that unbox instruction can be rewritten to use r6(1) – the unboxed value.

That much works now. For reasons I’ll dig into in my next post, that’s not yet quite enough to eliminate the box_i instruction. So in this case, the saving is minor. Once we can get rid of the boxing operation, however, it will be a notable saving in such cases.
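The rewrite described above can be modelled with a toy instruction list. This is only a sketch in Python, not MoarVM's C implementation: the `Ins` class, the helper names and the `unbox_i` instruction with its `r7(1)` target are assumptions made for illustration, standing in for what would appear after inlining.

```python
from dataclasses import dataclass

@dataclass
class Ins:
    op: str          # instruction name
    result: str      # register version written by the instruction
    operands: list   # register versions read by the instruction

def build_users(instructions):
    """Map each register version to the instructions reading it (its use chain)."""
    users = {}
    for ins in instructions:
        for reg in ins.operands:
            users.setdefault(reg, []).append(ins)
    return users

def forward_unboxed_values(instructions):
    """Turn every unbox of a freshly boxed value into a set from the native register."""
    users = build_users(instructions)
    rewritten = 0
    for ins in instructions:
        if ins.op != "box_i":
            continue
        native_reg = ins.operands[0]
        for user in users.get(ins.result, []):
            if user.op == "unbox_i":
                user.op = "set"
                user.operands = [native_reg]
                rewritten += 1
    return rewritten

code = [
    Ins("chars", "r6(1)", ["r5(1)"]),
    Ins("hllboxtype_i", "r4(3)", []),
    Ins("box_i", "r4(4)", ["r6(1)", "r4(3)"]),
    Ins("unbox_i", "r7(1)", ["r4(4)"]),  # hypothetical: visible only after inlining
]
count = forward_unboxed_values(code)
```

After the pass, the former unbox reads `r6(1)` directly, while the `box_i` itself is still present, which matches the "minor saving" described above.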

Coming in the future: native ref/deref pairs

One current performance challenge we have is that if we call a method and pass it a variable declared with a native type:

my int $foo = $a + $b;

Then we don’t know if that method is declared as taking an rw parameter or not. Therefore, we must not pass a native integer value, but instead form a reference that points to where $foo lives, so we can update it. Of course, in most cases the parameter is not declared is rw.

After inlining, we’ll be able to see this, and so will be able to use the use chain to discover when a formed reference is used for nothing more than to do a dereference. Then we can eliminate that reference taking process entirely.

In summary

Adding use chains has allowed us to detect and fix a small number of usage handling bugs, given us a way to prevent such bugs happening in the future, allowed us to improve an existing optimization, provided for efficiently implementing a new one, and will be an important part of improving the performance of code using native types in the future. Furthermore, it means those of us working on MoarVM have more detailed information about why an operation has to take place in the optimized code, so we can better understand if we have missed opportunities.

However, that’s not the end of the usage story. It turned out that a single flag for deopt usage would not suffice. Next time, I’ll look at why, and what I’ve done to address that.

brrt to the future: Perl 6 on MoarVM has had a JIT for a few years

Published by Bart Wiegmans on 2018-07-05T15:32:00

Dear readers, I recently came across the following comment on the internet (via Perl 6 weekly):

perl6 on MoarVM is starting to grow a JIT

 Which is very interesting because:
  1. Someone had at least heard of our efforts, so that's a win.
  2. But they had left with the impression that it was still in a beginning phase.
Clearly, we have some PR work to do. Consider this my attempt.

When people are talking about a 'JIT' colloquially and especially in the context of dynamic languages, they usually refer to a system that has both a dynamic specialization functionality, as well as a machine-code emitting functionality. MoarVM has had support for both since 2014. Historically we've called the specializer 'spesh' and the machine code emitter 'jit'. Maybe for communicating with the rest of the world it is better to call both 'the JIT' frontend and backend, respectively.

Without further ado, let me list the history of the MoarVM JIT compiler:
  1. The April 2014 release of MoarVM introduced the dynamic specialization framework spesh (the 'frontend').
  2. In June 2014, I started working in earnest on the machine code emitter (the part that we call the JIT), and halfway through that month had compiled the first function.
  3. In July 2014 we saw the introduction of inlining and on-stack-replacement in spesh.
  4. Also July 2014 saw the introduction of the invokish mechanism by which control flow can be returned to the interpreter.
  5. In August 2014 the JIT backend was completed and merged into MoarVM master.
  6. In June 2015, I started work on a new JIT compiler backend, first by patching the assembler library we use (DynASM), then proceeded with a new intermediate representation and instruction selection.
  7. After that progress slowed until in March 2017 I completed the register allocator. (I guess register allocation was harder than I thought it would be). A little later, the register allocator would support call argument placement.
  8. The new 'expression' JIT backend was merged in August 2017. Since then, many contributors have stepped up to develop expression JIT templates (that are used to generate machine code for MoarVM opcodes).
  9. In August 2017, Jonathan started working on reorganizing spesh, starting with moving specialization to a separate thread, central optimization planning, improving optimizations and installing argument guards (which makes the process of selecting the correct specialized variant more efficient).
  10. Somewhere after this - I'm not exactly sure when - nine implemented inlining the 'NativeCall' foreign function interface into JIT compiled code.
  11. Most recently Jonathan has started work to write specializer plugins - ways for rakudo to inform MoarVM on how to cache the result of method lookups, which should help increase the effectiveness of optimization for Perl 6.
  12. Around the same time I reworked the way that the interpreter handles control flow changes for JIT compiled code (e.g. in exception handling).
We (mostly Jonathan and I) have also given presentations on the progress of the JIT compiler:
  1. At YAPC::EU 2014 and FOSDEM 2015, Jonathan gave a presentation on the MoarVM dynamic specialization system.
  2. At YAPC::EU 2015 (Granada) I also gave a presentation. Sadly I can no longer find the presentation online or offline.
  3. At the Swiss Perl Workshop in 2017 Jonathan gave a presentation on deoptimization and how it is used for supporting speculative optimizations.
  4. At The Perl Conference in Amsterdam (2017) I gave a presentation on how to implement JIT expression templates in MoarVM.
I'm sure there is more out there that I haven't yet linked to. And there's still a lot of exciting work going on and still more work to do. However, I hope that after this post there can be no doubt about the reality of a Perl6 implementation backed by a specializing JIT compiler ;-)

PS: You might read this and be reasonably surprised that Rakudo Perl 6 is not, after all this, very fast yet. I have a - not entirely serious - explanation for that:
  1. All problems in computer science can be solved with a layer of indirection.
  2. Many layers of indirection make programs slow.
  3. Perl 6 solves many computer science problems for you ;-) 
In the future, we'll continue to solve those problems, just faster.

PPS: Also, it is important to note that many of the practical speed improvements that Rakudo users have come to enjoy did not come from VM improvements per se, but from better use of the VM by core library routines, for which many volunteers are responsible.

    my Timotimo \this: No Major Breakthroughs

    Published by Timo Paulssen on 2018-06-15T13:45:32

    Sadly, the time since the last post on this blog hasn't been fruitful with regards to the profiling project. There have been slight improvements to the profiler inside MoarVM, like handling profiles with a very deep call graph better, making the first GC run show up again, capturing allocations from optional parameters properly, and hopefully finally making programs that have multiple threads running no longer crash during the profile dumping phase. A recently merged branch by esteemed colleague brrt will allow me to properly fix one nasty issue that remains in the profiler that relates to inlining.

    Even though I can't show off lovely screenshots of the profiler UI (if you consider Coder's Art™ lovely), I can briefly go over the changes that have happened and what's next on the list. And of course I'm still very much interested in finishing the grant work!


    Missed Optional Parameters

    The first change I'd like to talk about is the one that was causing allocations from boxing optional parameters to go missing from the profile. Optional parameters are implemented as an op that accesses the passed arguments to see if something was present or not. Then it either runs code to put the default value in - if no argument was present - or it skips over that code. Additionally, it handles arguments that were passed as native ints, nums, or strings.

    If an object was expected by the code that uses the parameter, this op will also create a box for the value, for example an Int object. The crucial mistake was in the instrumentation by the profiler.

    Finding everything that is allocated is done by putting a little "take note of this object" op after every op that may create an object. This op then checks if the object was probably allocated by the last instruction, or if it was probably already logged earlier. If it was just allocated, that allocation is recorded for the profile.

    The problem in this case lies in the placement of the logging op: It was placed right after the instruction that grabs the argument. However, that made it land in the place that gets skipped over if an argument was present. So either no argument was passed, and the logging op was just asked to log that a null was allocated, or an argument was passed that was perhaps boxed, and the logging op was skipped over. Oops!

    Fixing this was simply a matter of following the skip and putting the logging op in the right place.

    Multi-threaded Programs Crashing Mysteriously

    If you used the profiler on code that runs multiple threads, you may have seen very suspicious looking internal error messages like "const_iX NYI" pop up. This was caused by the instrumentation aspect of the profiler, more specifically what it did when the instrumentation was no longer needed. Allow me to explain:

    Instrumentation in this context refers to creating a version of the program bytecode that does some extra work in the right places. For the profiler this includes putting ops in the code that record that a function was called or exited, and ops that record allocations of objects.

    This instrumentation happens lazily, i.e. when a function is entered the first time, it runs into the "instrumentation barrier", which pauses the program and creates the instrumented code right then and there. The instrumented code then gets installed and the program continues. This is implemented by having a global "instrumentation level" that just gets increased by 1 every time functions should go through an instrumentation step. This is done when profiling starts, and it is done when profiling ends.
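A minimal model of the instrumentation barrier, written in Python purely for illustration. The class and method names are assumptions, the bytecode is a list of strings, and the sketch reflects the behaviour the post describes before the fix (a level bump when profiling ends makes functions revert to their original code).

```python
class Function:
    def __init__(self, name, bytecode):
        self.name = name
        self.original = bytecode
        self.active = bytecode
        self.seen_level = 0  # instrumentation level this function was last prepared for

class VM:
    def __init__(self):
        self.instrumentation_level = 0
        self.profiling = False

    def start_profiling(self):
        self.profiling = True
        self.instrumentation_level += 1

    def stop_profiling(self):
        self.profiling = False
        self.instrumentation_level += 1

    def enter(self, fn):
        """Entering a function: cross the barrier if the global level changed."""
        if fn.seen_level != self.instrumentation_level:
            if self.profiling:
                fn.active = ["prof_enter"] + fn.original + ["prof_exit"]
            else:
                fn.active = fn.original
            fn.seen_level = self.instrumentation_level
        return fn.active

vm = VM()
is_prime = Function("is-prime", ["op_a", "op_b"])
vm.start_profiling()
instrumented = vm.enter(is_prime)
vm.stop_profiling()
restored = vm.enter(is_prime)
```

The point of the counter is that no function has to be touched eagerly: each one notices the changed level the next time it is entered.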

    Here's where the problem lies: Profiling is turned on before user code runs, which just happens to always be in single-threaded territory. However, profiling gets turned off as soon as the main thread is finished. This is done by increasing the instrumentation level by 1 again. Every function that is entered from now on will have to go through instrumentation again, which will restore the original bytecode in this case.

    Other threads might still continue running, though. The first example that made this problem clear was finding the 1000th prime by grepping over a hypered range from 0 to infinity. Crucially, after finding the 1000th prime, some workers were still busy with their batch of numbers.

    Here's where the instrumentation barrier becomes a problem. One of the remaining workers calls into a function, for example is-prime, for the first time since the instrumentation level was changed. It will have its instrumented bytecode replaced by the original bytecode. However, the other threads, which may still be inside is-prime in this example, will not know about this. They keep happily interpreting the bytecode when all of a sudden the bytecode changes.

    Since the uninstrumented bytecode is shorter than the instrumented bytecode, the worst case is that it reads code past the end of the bytecode segment, but the more common case is that the instruction pointer just suddenly points either at the wrong instruction, or in the middle of an instruction.

    Instructions usually start with the opcode, a 16 bit number usually between 0 and 1000. The next part is often a 16 bit number holding the index of a register, which is usually a number below about 40, but quite often below 10. If the instruction pointer accidentally treats the register number as an opcode, it will therefore often land on ops with low numbers. Opcode 0 is no_op, i.e. "do nothing". The next three ops are const_i8 through const_i32, which all just throw the exception that I mentioned in the first paragraph: "const_iX NYI". Two spots ahead is the op "const_n32", which also throws an NYI error.

    And there you have it, mystery solved. But what's the solution to the underlying problem? In this case, I took the easy way out. All the profiling ops first check if profiling is currently turned on or not anyway, so leaving the instrumented code in after profiling has ended is not dangerous. That's why MoarVM now just keeps instrumentation the same after profiling ends. After all, the next thing is usually dumping the profile data and exiting anyway.
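The misdecoding described above can be reproduced with a toy encoding. This Python sketch assumes every instruction is exactly one 16 bit opcode followed by one 16 bit register index, which is a simplification, and the opcode numbers 500, 731, 642 and 510 are made up. Only the low opcode numbers named in the text appear in the table.

```python
import struct

# Low opcode numbers as described in the text (opcode 4 is not named there).
LOW_OPS = {0: "no_op", 1: "const_i8", 2: "const_i16", 3: "const_i32", 5: "const_n32"}

def encode(instructions):
    """Pack (opcode, register) pairs as little-endian 16 bit words."""
    return b"".join(struct.pack("<HH", op, reg) for op, reg in instructions)

def decode_opcodes(bytecode, start):
    """Read every other 16 bit word from `start`, treating each as an opcode."""
    ops = []
    for offset in range(start, len(bytecode) - 1, 4):
        (word,) = struct.unpack_from("<H", bytecode, offset)
        ops.append(word)
    return ops

stream = encode([(500, 1), (731, 3), (642, 5), (510, 0)])
aligned = decode_opcodes(stream, 0)      # instruction pointer where it should be
misaligned = decode_opcodes(stream, 2)   # instruction pointer two bytes off
names = [LOW_OPS.get(op, "other") for op in misaligned]
```

With the pointer two bytes off, the register indices 1, 3, 5 and 0 are read as opcodes, which land on const_i8, const_i32, const_n32 and no_op.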

    The Next Steps

    The MoarVM branch that brrt recently merged is very helpful for a very specific situation that can throw the profiler off and cause gigantic profile files: When a block has its bytecode inlined into the containing routine, and the block that was inlined had a "return" in it, it knows that it has to "skip" over the inner block, since blocks don't handle returns.

    However, the block is still counted as a routine that gets entered and left. The long and short of it is that returning from the inner block jumps directly to the exit, but having the block inlined frees us from doing the whole "create a call frame, and tear it down afterwards" dance. That dance would have contained telling the profiler that a frame was exited "abnormally"; since the regular "prof_exit" op that would have recorded the exit will be skipped over, tearing down the frame would have contained the logging.

    In this particular case, though, no exit would be logged! This makes the call graph - think of it like a flame graph - look very strange. Imagine a function being called in a loop, and returning from an inner block as described above. It would miss all of the exits, so every time the function is called again, it will look like the function called itself, never returning to the loop. Every time around the loop, the call will seem to be nested deeper and deeper. Since the profiler keeps around the whole call graph, the file will just keep growing with every single iteration.
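To see why missing exits make the profile grow, here is a small Python model of a call-graph recorder. It is a sketch under assumptions: the `Node` and `Profiler` classes are hypothetical, and the function `f` stands in for the routine whose exit is never logged.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.calls = 0

class Profiler:
    def __init__(self):
        self.root = Node("<root>")
        self.current = self.root

    def enter(self, name):
        # Calls to the same routine from the same place share one node.
        child = self.current.children.get(name)
        if child is None:
            child = Node(name, self.current)
            self.current.children[name] = child
        child.calls += 1
        self.current = child

    def exit(self):
        self.current = self.current.parent

    def node_count(self):
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count

def run_loop(iterations, log_exit):
    prof = Profiler()
    prof.enter("loop")
    for _ in range(iterations):
        prof.enter("f")
        if log_exit:
            prof.exit()
    return prof.node_count()

healthy = run_loop(1000, log_exit=True)
broken = run_loop(1000, log_exit=False)
```

With exits logged, the graph stays at three nodes however long the loop runs. Without them, each iteration adds a node nested one level deeper.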

    Now, how does brrt's code change this situation? It will make it very easy to figure out how many inlines deep a "return from this routine" op is, so that the profiler can accurately log the right number of exits.

    On the UI side of things, I want to bring the routine overview list into a good state that will finally be worth showing. The list of GC runs will also be interesting, especially since the profiler recently learned to log how each individual thread performed its GC run, but the current HTML frontend doesn't know how to display that yet.

    Hopefully the wait for the next post on my blog won't be as long as this time!
      - Timo

    brrt to the future: Controlled Stack Hacking for the MoarVM JIT Compiler

    Published by Bart Wiegmans on 2018-06-10T16:29:00

    Hi readers! Today I have a story about a recently-merged set of patches that allows MoarVM to use the on-stack return pointer to reduce the overhead of exception handling and other VM features for JIT compiled code. Maybe you'll find it interesting.

    As you might know, MoarVM is a bytecode interpreter. In some situations, MoarVM internals need to know the current position in the execution of the program. For instance in exception handling, all exceptions thrown within a block are caught by the associated CATCH block or propagated if no such block exists. Such blocks are indicated as a range within the bytecode, and we find the associated CATCH block by comparing the current position with the known ranges.
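The range comparison can be sketched as follows. This is an illustrative Python model, not MoarVM's data structure: the `Handler` tuple, the half-open ranges and the rule that the smallest covering range wins are all assumptions.

```python
from typing import NamedTuple, Optional

class Handler(NamedTuple):
    start: int   # first bytecode offset covered (inclusive)
    end: int     # end of the covered range (exclusive)
    target: int  # offset of the associated CATCH block

def find_handler(handlers, position) -> Optional[Handler]:
    """Return the innermost handler whose range covers `position`, if any."""
    best = None
    for h in handlers:
        if h.start <= position < h.end:
            if best is None or (h.end - h.start) < (best.end - best.start):
                best = h
    return best

handlers = [Handler(0, 100, 200), Handler(20, 40, 300)]
inner = find_handler(handlers, 30)
outer = find_handler(handlers, 50)
unhandled = find_handler(handlers, 150)
```

Everything here hinges on having a trustworthy `position`, which is exactly what the rest of the post is about for JIT compiled code.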

    This is relatively straightforward to implement for the interpreter, because the interpreter must maintain a 'current position' pointer simply to function. (MoarVM stores a pointer to this pointer in a thread context object so that it is available throughout the VM). For the JIT that is another matter, because the control flow is handled implicitly by the CPU. The instruction pointer register (called %rip on amd64) cannot be read directly. Moreover, as soon as you enter a function that might want to use the current address (like the functions responsible for exception handling), you've left the 'program' code and entered VM code.

    So what we used to do instead is take the address of a position within the bytecode (as indicated by a label in the bytecode, a somewhat involved process) and store that in a per-frame field called the jit_entry_label. This field is necessary to support another MoarVM feature  - we use the interpreter as a trampoline (in the first or second sense of that definition). Because the interpreter is not recursive, JIT compiled code needs to return to the interpreter to execute a subroutine that was invoked (as opposed to calling an interpreter function, as perl5 does for exception handling). The primary purpose of this label is to continue where we left off after returning from another invoked program. But it can be used just as well for finding where we are in the execution of the program.
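The trampoline idea can be illustrated with a driver loop that never recurses. This Python sketch is an assumption-laden model: routines are plain functions returning an action, and the `label` entry in each frame plays the role that jit_entry_label plays in the text, recording where to continue after a call returns.

```python
def run(entry):
    """Non-recursive driver: routines return here whenever they want to invoke."""
    stack = [{"code": entry, "label": 0}]
    result = None
    while stack:
        frame = stack[-1]
        action, payload = frame["code"](frame, result)
        if action == "invoke":
            stack.append({"code": payload, "label": 0})
            result = None
        else:  # "return"
            stack.pop()
            result = payload
    return result

def callee(frame, _returned):
    return ("return", 41)

def caller(frame, returned):
    if frame["label"] == 0:
        frame["label"] = 1          # where to continue after the call
        return ("invoke", callee)
    return ("return", returned + 1)

answer = run(caller)
```

Here `caller` is entered twice: once from the start, and once at its stored label with the callee's result available.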

    Only problem then is that we need to keep it up to date, which we did. On the entry of every basic block (uninterrupted sequence of code), we stored the current position in this field. Basic blocks are quite common: every conditional statement, loop or other control flow change needs one, and every exception-handler scope change also needed a little snippet storing the current position. This was annoying.

    Furthermore, there are numerous MoarVM instructions that might change the control flow (or might not). For instance, the instruction responsible for converting an object to a boolean value might need to invoke the Bool method specific to that object's class - or, if no such method exists, fall back to a default implementation. We call such instructions invokish. When compiling code that contains such invokish instructions, we installed 'control guards' to check if the VM had in fact invoked another routine, and if so, to return to the interpreter to execute that routine. This too added quite a bit of overhead.

    I keep writing in the past tense because all of that is now gone, and that happened due to a simple realization. When we call a function (in C or assembly), we place the return address (the machine instruction after the call instruction) on the stack. We can read this value from the stack and use it wherever we want to know about the current position.

    I initially had implemented that using a stack walker function similar to the one in the link, except that I implemented it in assembly instead. (When writing this post I learned of the GCC __builtin_return_address and MSVC _ReturnAddress intrinsic functions, which presumably do the same thing). Unfortunately, that strategy didn't really work - it relies on the frame base pointer (%rbp) being placed right 'on top' of the return address pointer on the stack. Even with special compiler flags intended to preserve that behaviour, this assumption turned out to be unreliable.

    Fortunately I realized later that it was also unnecessary. Because the JIT compiler controls the layout of the compiled code frame, it also controls exactly where the return address will be stored when we compile a (C) function call. That means that we can simply take a pointer to this address and store that in the thread context structure. From that address, we can read exactly the current position in the compiled code, without having to explicitly store it so often. Furthermore, we can also write to this location, changing the address the function will return to. Effectively, this is a controlled 'on-stack goto', an idiom more often used for exploits than for good purposes - clearly this is an exception! We use this to force a return to the interpreter (with proper stack frame cleanup) for 'invokish' instructions that end up invoking. We can change control to go directly to an exception handler if it is in the same frame. This makes all the earlier control 'guard' fragments redundant, allowing us to remove them entirely. Thus, an invokish instruction that doesn't actually invoke now carries no extra cost.

    How much does this save? It depends a lot on the exact program, but I estimate about 5% of compiled code size, and from a hopelessly optimal (and fairly unrealistic) benchmark which I lifted from this blog post, approximately 10% of runtime. In real code, the effect is definitely nowhere near what jnthn++ or samcv++ achieved lately, but it's still nice. Also nice is that the code is quite a bit simpler than it was before.

    Anyway, that's all I have to tell today. Have fun hacking, and until next time!

    6guts: Faster dispatches with MoarVM specializer plugins

    Published by jnthnwrthngtn on 2018-06-09T00:01:42

    One of the goals for the current round of my Perl Foundation Performance and Reliability grant is to speed up private method calls in roles, as well as assignments in to Scalar containers. What I didn’t expect at the time I wrote the grant application is that these two would lead to a single new mechanism in MoarVM to make them possible.

    The Scalar container assignment improvements are still to come; currently I have a plan and hope to make good progress on it next week. I do, however, have a range of dispatch-related performance improvements to show, including the private method case.


    MoarVM runs programs faster by analyzing how they run and producing specialized versions of parts of the program based on that information. It takes note of which code is run often (frequently called methods and hot loops), which types a block of code is called with, what types are returned from calls, what code a closure points to, and more. Note that it observes the runtime behavior, and so is not dependent on whether the program has type annotations or not.

    Calls are one of the most important things that the optimizer considers, be they method calls, subroutine calls or invoking a received closure. Method calls are especially interesting, because with a call like $obj.meth($arg), the method to be called depends on the exact type of $obj. Often, we end up producing a version of the code that is specialized for a particular type of $obj. We can therefore resolve the method once in this specialization, saving the method lookup overhead.

    But there’s more. Once we know exactly what method we’ll be calling, and if the method is fairly small, we can inline it into the caller, thus eliminating the call overhead too. Further, since we are inlining a specialized version of the code and have already proved that we meet the conditions for using that specialization, we can eliminate type checks on parameters. Inlining is even more powerful than that: it opens the door to a wider range of analyses that would not be possible without it, which lead to further program optimizations.

    The problem

    We can do this kind of optimization with method calls because MoarVM understands about method calls. It knows that if it is holding the type of the invocant constant, then the result of the dispatch can also be considered a constant.

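    Unfortunately, there’s more than one case of method calling in Perl 6. While the majority of calls take the familiar $obj.meth($arg) form, we also have private method calls (self!foo($bar, $baz)), qualified calls ($obj.Foo::meth()) and duck calls ($obj.?foo).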

    In the first case, if the call is in a class, then we can resolve it at compilation time, since private methods aren’t virtual. Such calls are thus pretty fast. But what if the private method call is in a role? Well, then it was far slower. It took a method call on the meta-object, which then did a hash lookup to find the method, followed by invoking that method. This work was done by a call to a dispatch:<!> utility method. It was the same story for qualified calls and duck calls.

    So, let’s extend MoarVM to understand these kinds of calls?

    So if normal method calls are faster because MoarVM understands them, surely we can do better by teaching it to understand these other forms of calling too? Perhaps we could add some new ops to the VM to represent these kinds of calls?

    Maybe, but all of them come with their own rules. And those rules are already implemented in the metamodel, so we’d be doing some logic duplication. We make normal method calls fast by precomputing a method cache, which is just a hash table, and have the specializer do its lookups in that. While such an approach might work for private methods, it gets decidedly trickier in the other two cases. Plus those precomputed hashes take up a lot of space. There are hundreds of exception types in CORE.setting and every one of them has a precomputed hash table of all of its methods, with those methods from base classes denormalized into it. This means hundreds of hashes containing mappings for all of the methods that are inherited from Mu, Any, and Exception. We do lazily deserialize these, which helps, but it’s still fairly costly. Introducing more such things, when I already want rid of that one, didn’t feel like a good direction.

    Let’s make MoarVM teachable

    Earlier in the post, I wrote this:

    It [the optimizer] knows that if it is holding the type of the invocant [of a method call] constant, then the result of the dispatch can also be held constant.

    And this is the key. The important thing isn’t that the specializer knows the precise semantics of the method dispatch. The important thing is that it knows the relationship between the arguments to a dispatch (e.g. the type that we’re calling the method on) and the result of the dispatch.

    This, along with considering the challenges of optimizing Scalar assignments, led me to the idea of introducing a mechanism in MoarVM where we can tell it about these relationships. This enables the specializer to insert guards as needed and then simply use the calculated result of the dispatch.

    Specializer plugins

    The new mechanism is known as “spesh plugins”, and I merged it into MoarVM’s master branch today. It works in a few steps. The first is that one registers a spesh plugin. Here’s the one for helping optimize private method calls:

    nqp::speshreg('perl6', 'privmeth', -> $obj, str $name {
        nqp::speshguardtype($obj, $obj.WHAT);
        $obj.HOW.find_private_method($obj, $name)
    });

    The registration provides the language the plugin is for, the name of the plugin, and a callback. The callback takes an object and a method name. The second line is the key to how the mechanism works. It indicates that the result that will be returned from this plugin will be valid provided the type of $obj precisely matches (that is, with no regard to subtyping relationships) the type of the $obj we are currently considering. Therefore, it establishes a relationship between the invocant type and the private method call result.

    Then, we just need to compile a private method call like:

    self!foo($bar, $baz)

    Into:

    nqp::speshresolve('privmeth', self, 'foo')(self, $bar, $baz)

    Taking care to only evaluate self once (obviously not a problem for self, but in general it can be any expression, and may have side-effects).

    And that’s it. So what happens at runtime?

    When the interpreter encounters this call for the first time, it calls the plugin. It then stores the result along with the conditions. On later calls made in the interpreter, it uses this mapping table to quite quickly map the invocant type into the appropriate result. It’s a little cache. (Aside: this is a little more involved because we want lookups without locking, but also need to cope with multiple threads creating resolution races. Thanks to a generalized free-at-safepoint mechanism in MoarVM, this isn’t so hard.)
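The per-call-site cache can be modelled roughly like this. It is a Python sketch with invented names (`PluginRegistry`, `guard_type`, the `site` key) and it supports only a single exact-type guard per resolution, ignores threading entirely, and uses a `_bar` method to stand in for a private method. It is not MoarVM's implementation.

```python
class PluginRegistry:
    def __init__(self):
        self.plugins = {}
        self.tables = {}      # call site -> list of (guarded type, result)
        self.plugin_runs = 0

    def register(self, name, callback):
        self.plugins[name] = callback

    def resolve(self, site, plugin_name, obj, *args):
        table = self.tables.setdefault(site, [])
        for guarded, result in table:
            if type(obj) is guarded:   # exact match, subtyping is ignored
                return result
        # Cache miss: run the plugin and remember the guard it established.
        guarded_types = []
        guard_type = lambda o: guarded_types.append(type(o))
        result = self.plugins[plugin_name](guard_type, obj, *args)
        self.plugin_runs += 1
        table.append((guarded_types[0], result))  # assumes exactly one guard was set
        return result

def privmeth(guard_type, obj, name):
    guard_type(obj)                        # result is valid only for this exact type
    return getattr(type(obj), "_" + name)  # stand-in for a private method lookup

class C1:
    def _bar(self):
        return 1

class C2:
    def _bar(self):
        return 2

reg = PluginRegistry()
reg.register("privmeth", privmeth)
results = [reg.resolve("site-1", "privmeth", obj, "bar")(obj)
           for obj in (C1(), C2(), C1(), C2())]
```

Four calls through the same site run the plugin only twice, once per invocant type. Later calls are answered from the table.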

    So that’s nice, and on its own would already be an improvement over what it replaced. But we haven’t even got to the exciting part yet! Each time we use this mapping, it records which mapping was used for the benefit of the optimizer. This information is stored in such a way that the specializer can work out which mappings are used with a particular set of parameter types to the method. So, in:

    role R {
        method foo() { self!bar() }
    }
    class C1 does R {
        method !bar() { 1 }
    }
    class C2 does R {
        method !bar() { 2 }
    }

    The method foo might be invoked with invocants of type C1 and C2. Thus the mapping table for the call self!bar will have two entries. We may (if the code is hot) produce two specializations of method foo, and if we do, then we will also be able to see that there is only ever one target of the private method call in each case. Thus, we can inline the appropriate !bar into the matching specialization of foo.


    Writing a module PM.pm6 that contains:

    role R {
        method m() { self!p }
        method !p() { 42 }
    }
    class C does R {
    }
    for ^10_000_000 {
        C.m
    }

    And then running it with perl6 -I. -e 'use PM' used to run in 5.5s on my development machine. That’s only 1.8 million iterations of the loop per second, which means each is eating a whopping 1,650 CPU cycles assuming a 3GHz CPU.

    With the new spesh plugin mechanism, it runs in 0.83s, over 6.5x faster. It’s over 12 million iterations of the loop per second, or around 250 CPU cycles per iteration. That’s still a good bit higher than would be good, but it’s a heck of a lot better.
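The cycle counts follow directly from the timings. A quick check in Python, where the function name and defaults are just for illustration and the 3GHz clock is the assumption stated in the text:

```python
def per_iteration(seconds, iterations=10_000_000, clock_hz=3_000_000_000):
    """Return (iterations per second, CPU cycles per iteration)."""
    rate = iterations / seconds
    cycles = clock_hz / rate
    return rate, cycles

before_rate, before_cycles = per_iteration(5.5)
after_rate, after_cycles = per_iteration(0.83)
speedup = 5.5 / 0.83
```

This gives roughly 1.8 million iterations per second at about 1,650 cycles each before the change, and about 12 million per second at about 250 cycles each after it.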

    Note that due to the way roles are handled in non-precompiled code, the use of the spesh plugin will not happen at present in a role in a script, thus why in this case I put the code into a module. This restriction can be lifted later.

    But wait, there’s more

    I also wrote a spesh plugin for qualified dispatches, like $obj.Foo::meth(). This one guards on two of its inputs, and has an error case to handle. Notice how we can avoid replicating this logic inside of MoarVM itself and just write it in NQP code.

    nqp::speshreg('perl6', 'qualmeth', -> $obj, str $name, $type {
        nqp::speshguardtype($obj, $obj.WHAT);
        if nqp::istype($obj, $type) {
            # Resolve to the correct qualified method.
            nqp::speshguardtype($type, $type.WHAT);
            $obj.HOW.find_method_qualified($obj, $type, $name)
        }
        else {
            # We'll throw an exception; return a thunk that will delegate to the
            # slow path implementation to do the throwing.
            -> $inv, *@pos, *%named {
                $inv.'dispatch:<::>'($name, $type, |@pos, |%named)
            }
        }
    });

    This gave an even more dramatic speedup. The program:

    role R1 {
        method m() { 1 }
    }
    role R2 {
        method m() { 2 }
    }
    class C does R1 does R2 {
        method m() {
            self.R1::m() + self.R2::m()
        }
    }
    for ^10_000_000 {
        C.m
    }

    Used to take 13.3s. With the spesh plugin in effect, it now takes 1.07s, a factor of more than 12x improvement.

    And even a little more…

    I also wondered if I could get $obj.?foo duck dispatches to do better using a spesh plugin too. The answer turned out to be yes. First of all, here’s the plugin:

    sub discard-and-nil(*@pos, *%named) { Nil }
    nqp::speshreg('perl6', 'maybemeth', -> $obj, str $name {
        nqp::speshguardtype($obj, $obj.WHAT);
        my $meth := $obj.HOW.find_method($obj, $name);
        nqp::isconcrete($meth)
            ?? $meth
            !! &discard-and-nil
    });

    There’s a couple of cases I decided to measure here. The first is the one where we wrote code with a .? call to handle the general case (for example, in a module), but then the program using the module always (or > 99% of the time) gives an object where we can call the method.

    class C {
    }
    class D {
        method m() { 42 }
    }
    for ^10_000_000 {
        (rand > 0.999 ?? C !! D).?m()
    }

    The rand call, compare, and conditional are all costs in this code besides the call I wanted to measure, so it’s not such a direct measurement of the real speedup of .?. Still, this program went from taking 10.9s before to 4.29s with the spesh plugin in place – an improvement of 2.5x. It achieves this by doing a speculative inline of the method m anyway, and then using deoptimization to fall back to the interpreter to handle the 0.1% of cases where we get C and not D. (It then, at the end of the loop body, falls back into the hot code again.) Note that the inlining and deopt just naturally fell out of things the specializer already knew how to do.
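The shape of "speculatively inline, deoptimize on a miss" can be shown with a hand-written model. This Python sketch is only an analogy: the guard is an explicit type check, the "inlined" body is a constant, deoptimization is modelled as calling the generic path, and Nil is modelled as None. All names are invented.

```python
class C:
    pass

class D:
    def m(self):
        return 42

def generic_maybe_call(obj):
    """Slow path: look the method up, return None if it is missing."""
    meth = getattr(type(obj), "m", None)
    return meth(obj) if meth is not None else None

stats = {"fast": 0, "deopt": 0}

def specialized_maybe_call(obj):
    if type(obj) is D:      # guard inserted for the common case
        stats["fast"] += 1
        return 42           # inlined body of D.m
    stats["deopt"] += 1     # guard failed: fall back to the generic path
    return generic_maybe_call(obj)

results = [specialized_maybe_call(o) for o in (D(), D(), C(), D())]
```

As long as the guard almost always holds, nearly every call takes the short path, and the rare C invocant still gets the correct answer.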

    But had this come at the cost of making really polymorphic cases slower? Here’s another benchmark:

    class C {
    }
    class D {
        method m() { 42 }
    }
    for ^10_000_000 {
        (rand > 0.5 ?? C !! D).?m()
    }

    This one goes from 7.60s to 4.92s, a 1.5x speedup. Spesh can’t just punt this to doing a deopt for the uncommon case, because there is no uncommon case. Still, the guard table scan comes out ahead.

    (By the way, I think a lot of the slowness in this code – though I didn’t think of it when I wrote the benchmark – is that rand returns a Num, but 0.5 and 0.999 are Rats, so it is doing a costly type coercion before comparing.)
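
    The type mismatch mentioned above is easy to check. The following is a small sketch, not part of the original benchmarks, showing the types involved and how a Num literal (written with an exponent) keeps both sides of the comparison the same type. The variable name $hits is just for illustration.

    ```perl6
    say rand.^name;     # Num
    say 0.5.^name;      # Rat
    say 0.5e0.^name;    # Num

    # Comparing against a Num literal keeps both operands the same type.
    my $hits = 0;
    for ^1000 {
        $hits++ if rand > 0.5e0;
    }
    say 0 <= $hits <= 1000;   # True
    ```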

    And what next?

    Next I’ll be taking on Scalar containers and assignment, seeing what I can do with spesh plugins there, and hoping my ideas lead to results as positive as those seen here.

    Also, this isn’t the final word on the various benchmarks in this post either. I know full well that the current spesh plugin implementation is inserting some redundant guards, and a bit of effort on that front can probably get us another win.

    samcv: Secure Hashing for MoarVM to Prevent DOS Attacks

    Published on 2018-05-16T07:00:00

    Hashes are very useful data structures and underlie many internal representations in Perl 6 as well as being used as themselves. These data structures are very nice since they offer O(1) insertion time and O(1) lookup time on average. Hashes have long been considered an essential feature for Perl, much loved by users. Though when exploited, hashes can cause servers to grind to a halt. New in Rakudo Perl 6 2018.05 will be a feature called hash randomization which does much to help protect against this attack. In this article I explain some hashing basics as well as how the attack against non-randomized hashing can work.

    Table of Contents

    Hashing Basics

    Some hashing basics: when we use a hash, we take a string and come up with an (ideally unique) integer to represent the string. Similar to how md5 or sha1 sums take an arbitrary amount of data and condense it into a shorter number which can identify it, we do a similar thing for strings.

    my %hash; %hash<foo> = 10

    In this code, MoarVM takes the string foo and performs a hashing function on it using a series of bitwise operations. The goal is to create a shorter number which allows us to put the foo key into one of the 8 buckets that MoarVM initializes when a hash is created.

    8 Hash buckets

    Our hashing code sets up a predefined number of buckets. When a bucket fills up to 10 items, the number of buckets is doubled. In normal operation the hashes will be randomly distributed, so it would take ≈47 keys added (≈47 is the average number of items to result in one bucket being filled to 10 items) before we have to expand the buckets the first time.

    When the buckets are expanded, we will now have 16 buckets. In normal operation our previous ≈47 items should be evenly distributed into those 16 buckets.
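
    As a rough illustration of how a key ends up in a bucket, here is a toy sketch. The names toy-hash and bucket-index are made up for this example, the hash function is deliberately simplistic and is not MoarVM's, and selecting the bucket by masking the low bits is an assumption (a common choice when the bucket count is a power of two).

    ```perl6
    # Toy string hash (NOT MoarVM's real function): mix code points into a 32-bit value.
    sub toy-hash(Str $key --> Int) {
        my $h = 0;
        for $key.ords -> $c {
            $h = ($h * 31 + $c) +& 0xFFFFFFFF;
        }
        $h
    }

    # Assumption: with a power-of-two bucket count, the low bits select the bucket.
    sub bucket-index(Str $key, Int $buckets --> Int) {
        toy-hash($key) +& ($buckets - 1)
    }

    say toy-hash('foo');           # 101574
    say bucket-index('foo', 8);    # 6
    say bucket-index('foo', 16);   # 6
    ```

    Doubling the bucket count only adds one more bit of the hash to the index, which is why keys with identical hashes can never be separated by growing the table.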

    The Attack

    Without a random hash seed it is easy for an attacker to generate strings which will result in the same hash. This devolves to O(n²) time for the hash lookup. This O(n²) is actually O(string_length * num_collisions). When we have hash collisions, that means that no matter how many times we double the number of buckets we have, the strings which have hash collisions will always remain in the same bucket as each other. To locate the correct string, MoarVM must go down the chain and compare each hash value with the one we’re looking for. Since they are all the same, we must fall back to also checking each string itself manually until we find the correct string in that bucket.

    Hash collision
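
    To get a feeling for the cost, the chain walk can be modelled in a few lines. This is a simplified sketch rather than MoarVM's implementation: it assumes every key in @chain has the same hash value, so each lookup has to fall back to string comparisons. The sub name comparisons-needed is hypothetical.

    ```perl6
    # Count string comparisons needed to find $wanted in a chain of colliding keys.
    sub comparisons-needed(@chain, Str $wanted --> Int) {
        my $n = 0;
        for @chain -> $key {
            $n++;
            last if $key eq $wanted;
        }
        $n
    }

    my @chain = (^1000).map: { "key$_" };
    say comparisons-needed(@chain, 'key0');     # 1
    say comparisons-needed(@chain, 'key999');   # 1000

    # Looking up each of the n colliding keys once costs 1 + 2 + ... + n comparisons.
    say [+] 1..1000;                            # 500500
    ```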

    This attack is done by creating a function that essentially is our hashing function backward (for those curious see here for an example of code which does forward and backward hashing for Chrome V8 engine’s former hashing function). We hash our target string, t. We then use random 3 character sequences (in our case graphemes) and plug them into our backward hashing function along with the hash for our target t. The backward hash and the random character sequence are stored in the dictionary and the process is repeated until we have a very large number of backward hashes and random 3 grapheme prefixes.

    We can then use this dictionary to construct successively longer strings (or short if we so desire) which have the same hash as our target string t. This is a simplification of how the Meet-In-The-Middle attack works.

    This has been fixed in most programming languages (Python, Ruby, Perl), and several CVEs have been issued over the years for this exploit (See CVEs for PHP, OCaml, Perl, Ruby and Python).

    Assuming everything is fine for the next release I will also merge changes which introduce a stronger hashing function called SipHash. SipHash is meant to protect against an attacker discovering a hash secret remotely. While randomizing the seed makes this attack much harder, a determined attacker could discover the hash secret, and if that is done they can easily perform a meet-in-the-middle attack. SipHash was designed to solve the vulnerability of the hash function itself to meet-in-the-middle attacks. The randomization of the hash secret and a non-vulnerable hashing function work together to avert hash collision denial of service attacks.

    While the hash secret randomization will be out in Rakudo 2018.05, SipHash is planned to be introduced in Rakudo 2018.06.

    Randomness Source

    On Linux and Unix we prefer function calls rather than reading from /dev/urandom. There are some very important reasons for this.

    Relying on an external file existing is potentially problematic. If we are in a chroot and /dev is not mounted we will not have access to /dev/urandom. /dev/urandom is not special, it can be deleted by accident (or on purpose) or a sparse data file mounted in its place undetectable by programs. Trusting it simply because of its path is not ideal. Also, if we exhaust the maximum number of open file descriptors we will be unable to open /dev/urandom as well.

    System Functions

    On Windows we use the CryptGenRandom function, which has been provided by advapi32.dll since Windows XP.

    Linux, FreeBSD, OpenBSD and MacOS all use system-provided random calls (if available) to get the data rather than having to open /dev/urandom. All these OSes guarantee these calls to be non-blocking, though MacOS’s documentation does not comment on it. This is mostly important in very early userspace, which bit Python when a developer accidentally changed the randomness source, causing systems which relied on very early Python scripts to stop booting while waiting for the randomness source to initialize.

    If the function doesn’t exist we fall back to using /dev/urandom. If opening or reading it fails, on BSDs we will use the arc4random() function. In many BSDs this is seeded from the system’s random entropy pool, providing us with a backup in case /dev/urandom doesn’t exist.

    On Linux we call the getrandom() system call, which was added in kernel 3.17, directly instead of using the glibc wrapper, since the glibc wrapper was added much later than the kernel support.

    On MacOS, Solaris and FreeBSD we use getrandom() while on OpenBSD we use getentropy().

    User Facing Changes

    From Rakudo Perl 6 2018.05, the order that keys are returned will be random between each perl6 instance.

    perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
    (d f c a b e)
    perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
    (e f a d c b)

    This will also affect iterating a hash without sorting: for %hash { }

    What Do I Have To Do?

    Users and module developers should make sure that they explicitly sort hashes and not rely on a specific order being constant. If you have a module, take a look at the code and see where you are iterating on a hash’s keys and whether or not the order of processing the hash’s keys affects the output of the program.

    # This should be okay since we are putting the hash into another hash, order
    # does not matter.
    for %hash.keys -> $key {
        %stuff{$key} = $i++;
    }
    # This can potentially cause issues, depending on where `@stuff` is used.
    for %hash.keys -> $key {
        @stuff.push: $key;
    }
    # This should be OK since we are using is-deeply and comparing a hash with another
    # hash
    is-deeply my-cool-hash-returning-function($input), %( foo => 'text', bar => 'text', baz => 'text');
    # Probably best to avoid using `is`. The `is` test function converts the input to a string before
    # checking for equality, but works since we stringify it in sorted order.
    is %hash,  %( foo => 'text', bar => 'text', baz => 'text');
    # NO. Keys are not guaranteed to be in the same order on each invocation
    is %hash.keys, <a b c d>;
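
    For the cases where order does matter, sorting explicitly makes the result independent of the hash seed. A minimal sketch, with made-up example data:

    ```perl6
    my %hash = a => 1, b => 2, c => 3, d => 4;

    # Sort the keys before relying on their order.
    my @keys = %hash.keys.sort;
    say @keys;                        # [a b c d]

    # Sort the pairs by key when both key and value are needed.
    for %hash.sort(*.key) -> $pair {
        say "{$pair.key} => {$pair.value}";
    }
    ```

    In a test file the last failing example could then be written as is %hash.keys.sort, <a b c d>;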

    Module Developers

    Module developers should check out the git master of Rakudo, or if 2018.05 has been released, use that to run the tests of your module. Make sure to run the tests multiple times, ideally at least 10 times or use a loop:

    while prove -e 'perl6 -Ilib'; do true; done

    This loop will run again and again until it encounters a test failure, in which case it will stop.

    You must run your tests many times because the hash order will be different on each run. For hashes with a small number of items, it may not fail on every run. Make sure that you also look at the source to identify items that need fixing; don’t just rely on the tests to tell you if you must make changes to your module.

    Further Reading

    Hardening Perl’s Hash Function, an article about changes Perl 5 has made to harden hashing.

    gfldex: Deconstructing Simple Grammars

    Published by gfldex on 2018-05-10T14:51:34

    Last year I wrote an egg timer that was parsing command line arguments similar to GNU sleep. I was happy with the stringent form of the parser as follows.

    my Seconds $to-wait = @timicles»\
        .split(/<number>/, :v)\
        .map(-> [$,Rat(Any) $count, Str(Any) $unit] --> Seconds { %unit-multipliers{$unit} * $count })\
        .sum;

    It does a few simple things and does them one after another. A grammar with an action class would be overkill. I wasn’t happy with using split’s ability to return the needle with the parts. It certainly does not improve readability.

    After quite a few iterations (and stepping on a bug), I came up with a way to use Str.match instead. If I convert each Match-object into a Hash I can use deconstruction in a signature of a pointy block.

    my Seconds $to-wait = @timicles»\
        .match(/<number> <suffix>+/)».hash\ # the +-quantifier is a workaround
        .map(-> % ( Rat(Any) :$number, Str(Any) :$suffix ) { %unit-multipliers{$suffix} * $number })\
        .sum;

    Instead of using positionals I can use named arguments that correspond to the named regexes inside the match arguments.
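
    The deconstruction part can be tried in isolation. The sketch below uses plain hashes standing in for the result of calling .hash on a Match, so it does not exercise the regex side at all. The data in @parsed and the multiplier table are made up for the example.

    ```perl6
    my %unit-multipliers = s => 1, m => 60, h => 3600;

    # Stand-ins for Match.hash results: one hash per command line argument.
    my @parsed = %( number => 1.5, suffix => 'm' ), %( number => 30, suffix => 's' );

    # The sub-signature pulls the named entries out of each hash.
    my $seconds = @parsed.map(-> % ( :$number, :$suffix ) {
        %unit-multipliers{$suffix} * $number
    }).sum;

    say $seconds;   # 120
    ```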

    Even in such a small piece of code things fall into place. Hyper-method-calls get rid of simple loops. The well crafted built-in types allow signature deconstruction to actually work without loads of temporary variables. It’s almost as if certain language designers were aiming to make a most elegant language.

    Rakudo Star Release 2018.04

    Published on 2018-05-07T00:00:00

    Perl 6 Inside Out: 🔬 75. my $x = $x in Perl 6

    Published by andrewshitov on 2018-04-10T08:51:58

    What happens if you try to create a new variable and immediately initialise it by itself, as shown in the following test code:

    my $x = $x;

    This does not work (which is expected), but Perl 6 is so kind to the user that it gives an error message prepared especially for this case:

    ===SORRY!=== Error while compiling:
    Cannot use variable $x in declaration to initialize itself
    ------> my $x = $⏏x;
      expecting any of:

    Let us find the place in the code where the error message is triggered. This case is captured in the Grammar of Perl 6, at the place where variable is parsed:

    token variable {
        . . .
        | <sigil>
          [ $<twigil>=['.^'] <desigilname=desigilmetaname>
            | <twigil>? <desigilname> ]
          [ <?{ !$*IN_DECL && $*VARIABLE && $*VARIABLE eq
            $<sigil> ~ $<twigil> ~ $<desigilname> }>
              {
                  self.typed_panic: 'X::Syntax::Variable::Initializer',
                  name => $*VARIABLE
              }
          ]?
        . . .
    }

    The condition to throw an exception is a bit wordy, but you can clearly see here that the whole variable name is checked, including both sigil and potential twigil.

    The exception itself is located in src/core/Exception.pm6 (notice that file extensions were changed from .pm to .pm6 recently), and it is used only for the above case:

    my class X::Syntax::Variable::Initializer does X::Syntax {
        has $.name = '<anon>';
        method message() {
            "Cannot use variable $!name in declaration to initialize itself"
        }
    }

    And that’s all for today. Rakudo Perl 6 sources can be really transparent sometimes! 🙂

    Perl 6 Inside Out: 🦋 74. Typed hashes in Perl 6

    Published by andrewshitov on 2018-04-08T09:35:41

    In Perl 6, you can restrict the content of a variable container by specifying its type, for example:

    my Int $i;

    There is only one value in a scalar variable. You can extend the concept to arrays and let their elements hold only integers, as is done in the next example:

    > my Int @i;
    > @i.push(42);
    > @i.push('Hello');
    Type check failed in assignment to @i;
    expected Int but got Str ("Hello")
      in block <unit> at <unknown file> line 1

    Hashes keep pairs, so you can specify the type of both keys and values. The syntax is not deducible from the above examples.

    First, let us announce the type of the value:

    my Str %s;

    Now, it is possible to have strings as values:

    > %s<Hello> = 'World'
    > %s<42> = 'Fourty-two'

    But it’s not possible to save integers:

    > %s<x> = 100
    Type check failed in assignment to %s;
    expected Str but got Int (100)
      in block <unit> at <unknown file> line 1

    (By the way, notice that in the case of %s<42> the key is a string.)

    To specify the type of the second dimension, namely, of the hash keys, give the type in curly braces:

    my %r{Rat};

    Such a variable is also referred to as an object hash.

    Having this, Perl expects you to have Rat keys for this variable:

    > %r<22/7> = pi
    > %r
    {22/7 => 3.14159265358979}

    Attempts to use integers or strings, for example, fail:

    > %r<Hello> = 1
    Type check failed in binding to parameter 'key';
    expected Rat but got Str ("Hello")
      in block <unit> at <unknown file> line 1
    > %r{23} = 32
    Type check failed in binding to parameter 'key';
    expected Rat but got Int (23)
      in block <unit> at <unknown file> line 1

    Finally, you can specify the types of both keys and values:

    my Str %m{Int};

    This variable can be used for translating month numbers to month names but not vice versa:

    > %m{3} = 'March'
    > %m<March> = 3
    Type check failed in binding to parameter 'key';
    expected Int but got Str ("March")
      in block <unit> at <unknown file> line 1
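
    The constraints of such a hash can also be inspected at run time. A short sketch, using the .of and .keyof methods and a caught type check failure (the variable $key is only there for the example):

    ```perl6
    my Str %m{Int};
    %m{3} = 'March';

    say %m.of;               # (Str)
    say %m.keyof;            # (Int)
    say %m.keys[0].^name;    # Int, the key keeps its type

    # A failed type check throws an exception, which try turns into $!.
    my $key = 'March';
    try { %m{$key} = 3 }
    say $!.defined;          # True
    say %m.elems;            # 1
    ```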


    Perl 6 Inside Out: 🔬73. Keys, values, etc. of hashes in Perl 6

    Published by andrewshitov on 2018-04-07T09:46:26

    Today, we will take a look at a few methods of the Hash class that return all hash keys or values or both:

    > my %h = H => 'Hydrogen', He => 'Helium', Li => 'Lithium';
    {H => Hydrogen, He => Helium, Li => Lithium}
    > %h.keys;
    (H Li He)
    > %h.values;
    (Hydrogen Lithium Helium)
    > %h.kv;
    (H Hydrogen Li Lithium He Helium)

    While you may want to go directly to the src/core/Hash.pm6 file to see the definitions of the methods, you will not find them there. The Hash class is a child of Map, and all these methods are defined in src/core/Map.pm6. Getting keys and values is simple:

    multi method keys(Map:D:) {
        . . .
    }
    multi method values(Map:D:) {
        . . .
    }

    For the kv method, more work has to be done:

    multi method kv(Map:D:) {
        Seq.new(class :: does Rakudo::Iterator::Mappy {
            has int $!on-value;
            method pull-one() is raw {
                . . .
            }
            method skip-one() {
                . . .
            }
            method push-all($target --> IterationEnd) {
                . . .
            }
        }.new(self))
    }

    As you see, the method returns a sequence that is built using an anonymous class implementing the Rakudo::Iterator::Mappy role. We already saw how this approach is used in combination with defining pull-one and push-all methods.
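
    The same pattern can be reproduced in user code with the public Iterator role. The class below is a hypothetical, much simplified stand-in for the Mappy-based iterator: it walks a list of pairs and alternates between returning a key and a value, using an $!on-value flag in the same spirit as the attribute shown above.

    ```perl6
    # Hypothetical iterator alternating keys and values, similar in spirit to kv.
    class KVIterator does Iterator {
        has @.items;
        has int $!index;
        has int $!on-value;

        method pull-one() {
            return IterationEnd if $!index >= @!items.elems;
            if $!on-value {
                $!on-value = 0;
                return @!items[$!index++].value;
            }
            $!on-value = 1;
            @!items[$!index].key
        }
    }

    my $seq = Seq.new(KVIterator.new(items => (H => 'Hydrogen', He => 'Helium')));
    say $seq.List;   # (H Hydrogen He Helium)
    ```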

    Let us look at another set of methods, pairs and antipairs. One of them is simple and straightforward:

    multi method pairs(Map:D:) {
        . . .
    }

    Another one is using an intermediate class:

    multi method antipairs(Map:D:) {
        Seq.new(class :: does Rakudo::Iterator::Mappy {
            method pull-one() {
                . . .
            }
            method push-all($target --> IterationEnd) {
                . . .
            }
        }.new(self))
    }

    Both methods produce results of the same structure:

    > %h.antipairs
    (Hydrogen => H Lithium => Li Helium => He)
    > %h.pairs
    (H => Hydrogen Li => Lithium He => Helium)


    Perl 6 Inside Out: 🔬72. Superscripts in Perl 6

    Published by andrewshitov on 2018-04-05T08:33:58

    In Perl 6, you can use superscript indices to calculate powers of numbers, for example:

    > 2⁵
    > 7³

    It also works with more than one digit in the superscript:

    > 10¹²

    You can guess that the above cases are equivalent to the following:

    > 2**5
    > 7**3
    > 10**12

    But the question is: How on Earth does it work? Let us find it out.

    For the Numeric role, the following operation is defined:

    proto sub postfix:<ⁿ>(Mu $, Mu $) is pure {*}
    multi sub postfix:<ⁿ>(\a, \b) { a ** b }

    Aha, that is what we need, and the superscript notation is converted to the simple ** operator here.

    You can visualise what exactly is passed to the operation by printing the operands:

    multi sub postfix:<ⁿ>(\a, \b) {
        nqp::say('# a = ' ~ a);
        nqp::say('# b = ' ~ b);
        a ** b
    }

    In this case, you’ll see the following output for the test examples above:

    > 2⁵
    # a = 2
    # b = 5
    > 10¹²
    # a = 10
    # b = 12

    Now, it is time to understand how the postfix that extracts superscripts works. Its name, ⁿ, written in superscript, should not mislead you. This is not a magic trick of the parser; it is just the name of the symbol, and it can be found in the Grammar:

    token postfix:sym<ⁿ> {
        <sign=[⁻⁺¯]>? <dig=[⁰¹²³⁴⁵⁶⁷⁸⁹]>+ <O(|%autoincrement)>
    }

    You see, this symbol is a sequence of superscripted digits with an optional sign before them. (Did you think of a sign before we reached this moment in the Grammar?)

    Let us try negative powers, by the way:

    > say 4⁻³
    # a = 4
    # b = -3

    Also notice that the whole construct is treated as a postfix operator. It can also be applied to variables, for example:

    > my $x = 9
    > say $x²
    # a = 9
    # b = 2

    So, a digit in superscript is not a part of the variable’s name.
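
    The equivalence with ** is easy to verify in an unmodified Rakudo. A quick sketch covering the cases discussed so far:

    ```perl6
    say 2⁵ == 2 ** 5;        # True
    say 10¹² == 10 ** 12;    # True
    say 4⁻³ == 4 ** -3;      # True

    my $x = 9;
    say $x² == $x ** 2;      # True
    ```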

    OK, the final part of the trilogy, the code in Actions, which parses the index:

    method postfix:sym<ⁿ>($/) {
        my $Int := $*W.find_symbol(['Int']);
        my $power := nqp::box_i(0, $Int);
        for $<dig> {
            $power := nqp::add_I(
               nqp::mul_I($power, nqp::box_i(10, $Int), $Int),
               nqp::box_i(nqp::index("⁰¹²³⁴⁵⁶⁷⁸⁹", $_), $Int),
               $Int);
        }
        $power := nqp::neg_I($power, $Int)
            if $<sign> eq '⁻' || $<sign> eq '¯';
        make QAST::Op.new(:op<call>, :name('&postfix:<ⁿ>'),
                          $*W.add_numeric_constant($/, 'Int', $power));
    }

    As you can see here, it scans the digits and updates the $power variable by adding the value at the next decimal position (it is selected in the code above).
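
    The same scan can be written in plain Perl 6, which makes it easier to experiment with. This is only a sketch of the idea and not the compiler's code. The sub name superscript-to-int is made up, and the sign handling is folded into the loop for brevity.

    ```perl6
    sub superscript-to-int(Str $s --> Int) {
        my $digits = "⁰¹²³⁴⁵⁶⁷⁸⁹";
        my $sign   = 1;
        my $power  = 0;
        for $s.comb -> $char {
            if $char eq '⁻' || $char eq '¯' { $sign = -1; next }
            next if $char eq '⁺';
            # The offset of the character in the string is the digit's value.
            $power = $power * 10 + $digits.index($char);
        }
        $sign * $power
    }

    say superscript-to-int('¹²');   # 12
    say superscript-to-int('⁻³');   # -3
    ```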

    The available characters are listed in a string, and to get a digit’s value, its offset in the string is used. The $<dig> match contains a digit, as you can see in the Grammar:

    <dig=[⁰¹²³⁴⁵⁶⁷⁸⁹]>+



    Perl 6 Inside Out: 🔬71. Implementing Int.sleep() in Perl 6

    Published by andrewshitov on 2018-04-04T08:32:37

    Hello! Yesterday, I was giving my Perl 6 Intro course at the German Perl Workshop in Gummersbach. It was a great pleasure to prepare and run this one-day course, and, while it was difficult to cover everything, we touched all main aspects of the Perl 6 language: from variables to regexes and parallel computing. Of course, it was only a top-level overview, and there was not enough time to do all the exercises. You can do them at home, here’s the Perl 6 Intro – Exercises PDF file.

    Among the rest, we tried to implement the sleep method for integers. The rationale behind that is that it is possible to say:

    > 10.rand

    But not:

    > 10.sleep
    No such method 'sleep' for invocant of type 'Int'
      in block <unit> at <unknown file> line 1

    OK, so let’s first implement the simplest form of sleep for Ints only. Go to src/core/Int.pm6 and add the following:

    my class Int does Real {
        method sleep() {
            . . .
        }
    }


    There is no declaration of the $!value attribute in this file, but we know that it can be found somewhere in Perl6/Metamodel/BOOTSTRAP.nqp:

    # class Int is Cool {
    # has bigint $!value is box_target;
    Int.HOW.add_parent(Int, Cool);
    Int.HOW.add_attribute(Int, BOOTSTRAPATTR.new(:name<$!value>, :type(bigint),
                          :box_target(1), :package(Int)));
    Int.HOW.set_boolification_mode(Int, 6);

    Compile and run. The desired code works now:

    > 3.sleep
    # sleeping 3 seconds
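
    If you want to try the idea without recompiling Rakudo, a similar method can be added from user code. The sketch below uses augment, which requires the MONKEY-TYPING pragma; it is an alternative to the approach in this post and not what was done in the course. It simply delegates to the built-in sleep sub.

    ```perl6
    use MONKEY-TYPING;

    augment class Int {
        # Userland variant: delegate to the built-in sleep sub.
        method sleep() { sleep self }
    }

    my $start = now;
    1.sleep;
    say "slept for {now - $start} seconds";
    ```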

    What can be changed here? The first idea is to allow non-integer numbers as the delay duration. As Int does the Real role, just move the method to src/core/ and get the value using the Num method instead of reading $!value directly (there is no such attribute in the Real role):

    my role Real does Numeric {
        method sleep() {
            . . .
        }
    }

    Now it also works with rationals and floating-point numbers:

    > 2.sleep
    > 3.14.sleep
    > pi.sleep

    Before wrapping it up, let us take a look at the body of the sleep subroutine. It is defined in src/core/Date.pm6:

    proto sub sleep(|) {*}
    multi sub sleep(--> Nil) { sleep(*) }
    multi sub sleep($seconds --> Nil) {
        # 1e9 seconds is a large enough value that still makes VMs sleep
        # larger values cause nqp::sleep() to exit immediately (esp. on 32-bit)
        if nqp::istype($seconds,Whatever) || $seconds == Inf {
            nqp::sleep(1e9) while True;
        }
        elsif $seconds > 1e9 {
            nqp::sleep($_) for gather {
                1e9.take xx ($seconds / 1e9);
                take $seconds - 1e9 * ($seconds / 1e9).Int;
            }
        }
        elsif $seconds > 0e0 {
            nqp::sleep($seconds.Num);
        }
    }

    The code is very clear and does not need any comments.
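
    Still, the chunking in the middle branch is worth a small experiment. The following sketch reimplements the idea with a configurable maximum so that it can be tried with small numbers. The sub name sleep-chunks and the $max parameter are made up, and unlike the original it skips a zero-length remainder.

    ```perl6
    # Split a long delay into pieces no longer than $max.
    sub sleep-chunks($seconds, $max = 1e9) {
        my $whole = ($seconds / $max).Int;
        gather {
            take $max for ^$whole;
            my $rest = $seconds - $max * $whole;
            take $rest if $rest > 0;
        }
    }

    say sleep-chunks(25, 10).List;   # (10 10 5)
    say sleep-chunks(20, 10).List;   # (10 10)
    ```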

    And maybe just to see why our modified Rakudo printed the time after sleep in the tests above, let’s refer to the documentation of NQP to see that its sleep function’s return value is the number of seconds:

    ## sleep
    * `sleep(num $seconds --> num)`
    Sleep for the given number of seconds (no guarantee is made
    how exact the time sleeping is spent.)
    Returns the passed in number.


    my Timotimo \this: Tangentially related work

    Published by Timo Paulssen on 2018-03-26T17:30:30

    Hi there, it's already been three weeks since my first blog post on my TPF grant work. In between then and now the nice folks over at Edument made public the work I've been doing on the side for a couple of months. Fortunately, my grant work benefits a whole lot from this as well. Being able to debug (set breakpoints, inspect variable contents, single-step execution) the profile dumping code in nqp (re-used in Rakudo), as well as the heap snapshot loading code in App::MoarVM::HeapAnalyzer::Model lets me more easily figure out why things might be wrong or even crashing.

    Since Edument's product is Cro, the reactive framework for implementing and consuming microservices, I use simple Cro applications as test subjects for the profilers, as well as the debugger.


    Yesterday I took the first close look at the Cro app that powers the new profiler frontends, by running it under the heap profiler. I was rather surprised to find that even while no requests were being made, the heap snapshot file kept growing at a hundred megabytes every few seconds. Something was clearly amiss.

    To understand why this happens you must know that MoarVM will take a heap snapshot after every GC run. That means something must be causing the GC to run frequently even if no actual work is being done.

    Fortunately, I know that Rakudo's ThreadPoolScheduler has a built-in supervisor that has an eye on the performance of the thread pool. It runs on its own thread and wakes up a hundred times every second.

    My recent profiler work to make multi-threaded applications run properly under the regular profiler let me have a closer look at what was being allocated. Turns out a lot of objects related to iterating over a Range object were being created. A single function was using range iteration, but that accounted for a huge chunk of allocations. Looking at what functions allocate the different types of objects, you can see that pretty much 100% of all Scalar allocations were caused by iterating over a Range.

    The Instrumented Profiler shows a table of which routines allocate how many Objects of type Scalar
    Scalars may only hold 3 pointers in them on top of the common object header that's 16 bytes big, but it surely adds up! (measurement taken over a 60 second period)

    So just changing the for ^worker-list.elems into an equivalent loop loop got allocations down significantly.
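
    For readers who have not seen the two forms side by side, here is a small sketch. The array contents are made up, and the example only shows that both loops visit the same elements; the allocation difference described above is a property of the Rakudo version profiled at the time.

    ```perl6
    my @worker-list = 1, 2, 3;

    # Range-based iteration over the indices.
    my $total-for = 0;
    for ^@worker-list.elems -> $i {
        $total-for += @worker-list[$i];
    }

    # Equivalent loop statement with a native int counter.
    my $total-loop = 0;
    loop (my int $i = 0; $i < @worker-list.elems; $i++) {
        $total-loop += @worker-list[$i];
    }

    say $total-for == $total-loop;   # True
    ```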

    There was, of course, more left to do. The math we were doing to figure out how active the process has been since we last woke up (including a moving average) caused some boxing and unboxing. A call to a helper method was allocating empty hashes for named arguments every time (something we can often optimize away completely, but in this case we couldn't be sure that it'd be safe). And finally, the getrusage op was creating a fresh int array every time.

    I was initially reluctant to make the code less readable in the interest of performance. However, the supervisor allocating absolutely nothing was almost achieved. Lizmat inlined the getrusage-total helper sub that caused boxing and unboxing on every call, and I decided to inline the prod-affinity-workers helper method as well – this one was only used in a single place, anyway.

    The last piece of the puzzle was the getrusage op that allocated integer arrays every time it was called. To get the last drop of performance out of the supervisor thread, I changed the implementation of nqp::getrusage to take an already allocated array.

    After all this work, the ThreadPoolScheduler will not allocate anything at all if nothing is happening.

    I hope I have given you a little insight into what working on performance can look like.

    Now that I've shaved this yak, I can properly profile and analyze Cro applications and anything that runs tasks with the start keyword, like the backend of the heap analyzer!

    I hope to see you back on my blog when the next blog post hits the 'net!
      - Timo

    brrt to the future: Some things I want

    Published by Bart Wiegmans on 2018-02-27T16:38:00

    Lately I've been fixing a few bugs here and there in the JIT compiler, as well as trying to advance the JIT expression optimizer. The story of those bugs is interesting but in this post I want to focus on something else, namely some support infrastructure that I think we should have that would make working on MoarVM and spesh/jit in particular much nicer.

    There are a bunch of things related to runtime control of spesh and the JIT:
    Then there's more ambitious stuff, that still falls under 'housekeeping':
    And there's more ambitious stuff that would fall under optimizations and general functionality improvements:
    There is definitely more out there I want to do, but this should be enough to keep me busy for a while. And if anyone else wants to take a try at any of these, they'd be very welcome to :-).

    Rakudo Star Release 2018.01

    Published on 2018-01-29T00:00:00

    gfldex: Expensive Egg-Timers

    Published by gfldex on 2017-12-31T13:28:01

    If you use a CLI you might have done something along these lines.

    sleep 1m 30s; do-the-next-thing

    I have a script called OK that will display a short text in a hopeful green and morse code O-K via the PC speaker. By doing so I turn my computer into an expensive egg-timer.

    As of late I found myself waiting for longer periods of time and was missing a count-down so I could estimate how much more time I can waste playing computer games. The result is a program called count-down.

    Since I wanted to mimic the behaviour of sleep as closely as possible I had a peek into its source-code. That made me realise how lucky I am to be allowed to use Perl 6. If I strip all the extra bits a count-down needs I’m at 33 lines of code compared to 154 lines of GNU sleep. The boilerplate I have is mostly for readability. Like defining a subset called Seconds and a regex called number.

    Errors in the arguments to the script will be caught by the where clause in MAIN’s signature. Since there are no further multi candidates for MAIN that might interfere, the usage message will be displayed automatically if arguments are not recognized. Pretty much all lines in the C implementation deal with argument handling and the fact that they can’t trust their arguments until the last bit of handling is done. With a proper signature a Perl 6 Routine can fully trust its arguments and no further error handling is needed. Compared to the C version (that does a lot less) the code can be read linearly from top to bottom and is much more expressive. After changing a few identifiers I didn’t feel the need for comments anymore. Even some unclear code, like the splitting on numbers and keeping the values, becomes clear in the next lines where I sum up a list of seconds.
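
    The point about trusting arguments can be shown with a much smaller routine than count-down itself. The sketch below is not the code of the script; the subset Timicle, the sub to-seconds and the multiplier table are made up to show a signature doing the validation.

    ```perl6
    my %unit-multipliers = s => 1, m => 60, h => 3600, d => 86400;

    # Hypothetical subset: digits followed by exactly one unit letter.
    subset Timicle of Str where /^ \d+ <[smhd]> $/;

    sub to-seconds(Timicle $t --> Int) {
        # No further checks needed: the signature already validated $t.
        my $m = $t.match(/^ (\d+) (<[smhd]>) $/);
        $m[0].Int * %unit-multipliers{$m[1].Str}
    }

    say to-seconds('90s');   # 90
    say to-seconds('2m');    # 120

    my $bad = 'soon';
    say (try to-seconds($bad)).defined;   # False, the signature rejected the argument
    ```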

    Now I can comfortably count down the rest of a year that was made much better by a better Perl 6. I wish you all a happy 2018.

    Perl 6 Advent Calendar: Bonus Xmas – Concurrent HTTP Server implementation and the scripter’s approach

    Published by ramiroencinas on 2017-12-25T01:52:24

    First of all, I want to highlight Jonathan Worthington‘s work with Rakudo Perl6 and IO::Socket::Async. Thanks Jon!


    I like to make scripts; write well-organized sequences of actions, get results and do things with them.

    When I began with Perl6 I discovered a spectacular ecosystem, where I could put my ideas into practice in the way that I like: script manner. One of these ideas was to implement a small HTTP server to play with it. Looking at other projects and modules related to Perl6, HTTP and sockets I discovered that the authors behind them were programmers with great experience in Object-Oriented programming.

    Perl6 paradigms

    Perl6 supports the three most popular programming paradigms:

    I think that the Object-Oriented paradigm is fine when you design an application or service that will grow, will do many and varied things and will have many changes. But I don’t like things that grow too much and will have many changes; that’s why I like scripts, for their native procedural approach, because it promotes simplicity and effectiveness quickly. I like small (step by step) things that do great things quickly.

    The Functional paradigm is awesome in my opinion; you can take a function and use it like a var, among other amazing things.

    Perl6 Supplies are like a V12 engine

    Shortly after I started with Perl6, I started a translation to the Spanish language. Looking at the documentation of Perl6 I discovered the great concurrent potential that Perl6 has. The concurrent aspect of Perl6 was more powerful than I thought.

    The idea I had of the HTTP server with Perl6 began with the Perl6 Supplies (asynchronous data streams with multiple subscribers), specifically with the class IO::Socket::Async. All socket management, data transmission and concurrency are practically automatic and easy to understand. It was perfect for making and playing with a small concurrent but powerful service.

    Based on the examples of the IO::Socket::Async documentation I started to implement a small HTTP server with pseudoCGI support in the mini-http-cgi-server project, and it worked as I expected. As I got what I wanted, I was satisfied and I left this project for a while. I didn’t like things to grow too much.

    But then, preparing a talk for the Madrid Perl Workshop 2017 (thanks to Madrid Perl Mongers and Barcelona Perl Mongers guys for the event support), I had enough motivation to do something more practical, something where web front-end coders could do their job well and communicate with the back-end where Perl6 is awaiting. On the one hand, the typical public html static structure, and on the other hand a Perl6 module including several webservices waiting for the web requests from the front-end guys.

    Then Wap6 was born (Web App Perl6).

    The Wap6 structure

    I like the structure for a web application that Wap6 implements:

    The public folder contains the friendly front-end stuff, like static html, javascript, css, etc., that is, the front-end developer space. The webservices folder contains the back-end stuff: a Perl6 module including a function per webservice.

    This same folder level contains the solution entry point, a Perl6 script that, among other things like initializing server parameters, contains the mapping between routes and webservices:

    my %webservices =
      '/ws1' => ( &ws1, 'html' ),
      '/ws2' => ( &ws2, 'json' )
    ;

    As you can see, not only are the routes mapped to the corresponding webservice, but the return content-type of the webservice (like HTML or JSON) is also specified. That is, you type http://domain/ws1 in the web browser and the ws1 function returns the response data with the corresponding content-type, as we will see later.

    All the routes to the webservices are in the %webservices hash, which is passed to the main function wap with other useful named params:

    wap(:$server-ip, :$server-port, :$default-html, :%webservices);

    The core of Wap6

    The wap function is located outside, in the core lib module that Wap6 uses, and contains the concurrent and elegant V12 engine:

    react {
      whenever IO::Socket::Async.listen($server-ip,$server-port) -> $conn {
        whenever $conn.Supply(:bin) -> $buf {
          my $response = response(:$buf, :$current-dir, :$default-html, :%webservices);
          $conn.write: $response.encode('UTF-8');
        }
      }
    }

    This is a three-part (react – whenever – IO::Socket::Async) reactive, concurrent and asynchronous context. When a transmission arrives from the web client ($conn), it is placed in a new Supply $buf of bin type ($conn.Supply(:bin)), and $buf, with other things like the %webservices hash, is sent to the response function that runs the HTTP logic. Finally, the return value of the response function is written back to the web client.

    The response function (located outside, in the core lib too) contains the HTTP parser stuff: it splits the incoming data (the HTTP entity) into headers and body, it performs validations, it takes basic HTTP header information like the method (GET or POST) and the URI (Uniform Resource Identifier), it determines if the requested resource is a webservice (from the webservices folder) or a static file (from the public folder), it gets the data from the resource (from the static file or webservice) and it returns that data to the wap function, which writes the response to the web client, as we have seen before.
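
    The header/body split and the extraction of method and URI described above can be tried in isolation. This is only a minimal sketch, not the Wap6 parser: the request text is made up, and it assumes the request has already been decoded to a string (Wap6 itself receives binary data).

```perl6
# Hypothetical raw request, as it might look once decoded from the socket
my $request = "GET /ws1?name=camelia HTTP/1.1\r\nHost: localhost\r\n\r\n";

# Headers and body are separated by an empty line
my ($head, $body) = $request.split("\r\n\r\n", 2);

# The first header line carries method, URI and protocol version
my ($method, $uri, $version) = $head.lines[0].words;

say "method=$method uri=$uri";   # method=GET uri=/ws1?name=camelia
```

    A GET request like this one has an empty body; for a POST the second part of the split would hold the data the article later calls $body.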

    The Webservices

    The response function validates $buf and extracts the HTTP method from the request header, which can be GET or POST (I don’t think that in the future it will support more HTTP methods). In the case of the GET method, it puts the URL params (if any) into $get-params. In the case of the POST method, it puts the request body into $body.

    Then it’s time to check if the web client has requested a webservice. $get-params includes the URI and is extracted with the URI module; finally the result is placed in $path:

    given $path {
      when %webservices{"$_"}:exists {
        my ( &ws, $direct-type ) = %webservices{"$_"};
        my $type = content-type(:$direct-type);
        return response-headers(200, $type) ~ &ws(:$get-params, :$body);
      }
    }

    If $path exists in the %webservices hash, the client wants a webservice. Then it extracts the corresponding webservice callable function &ws from the %webservices hash (yes, I also love the Functional paradigm :-) ) and the corresponding content-type. Then it calls the webservice function &ws with the $get-params and the request $body parameters. Finally it returns the HTTP response entity, which concatenates the response headers (status 200 and the content-type) with the data returned by the webservice.

    The callable webservice &ws can be ws1, located in the Perl6 module from the webservices folder:

    sub ws1 ( :$get-params, :$body ) is export {
      if $get-params { return 'From ws1: ' ~ $get-params; }
      if $body { return 'From ws1: ' ~ $body; }
    }

    In this demo context the webservice simply returns the input, that is, the $get-params (when GET) or the $body (when POST).
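
    To see the route lookup and the call through the stored routine working together, here is a small self-contained sketch. The dispatch sub and both webservices are hypothetical stand-ins, not the actual Wap6 code, and the stored list is unpacked explicitly with @(...) before the destructuring assignment.

```perl6
# Hypothetical stand-ins for the webservices module
sub ws1(:$get-params, :$body) { 'From ws1: ' ~ ($get-params // $body // '') }
sub ws2(:$get-params, :$body) { '{"from":"ws2"}' }

my %webservices =
    '/ws1' => ( &ws1, 'html' ),
    '/ws2' => ( &ws2, 'json' ),
;

# Hypothetical dispatcher, not the actual Wap6 response function
sub dispatch(Str $path, :$get-params, :$body) {
    return 'no such webservice' unless %webservices{$path}:exists;
    # @(...) unpacks the stored list before the destructuring assignment
    my ( &ws, $type ) = @(%webservices{$path});
    $type ~ ' | ' ~ &ws(:$get-params, :$body);
}

say dispatch('/ws1', get-params => 'a=1');   # html | From ws1: a=1
say dispatch('/nope');                        # no such webservice
```

    Storing the routine itself (&ws1) rather than its name is what lets the server call it without any lookup by string.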

    When the client requests a static file

    After discarding all the other possibilities, if the client requests a static file hosted in the public folder, like html, js, css, etc., then:

    given $path {
      default {
        my $filepath = "$current-dir/public/$path";
        my $type = content-type(:$filepath);
        return response-headers(200, $type) ~ slurp "$current-dir/public/$path";
      }
    }

    It returns the response headers including the matched content-type and the requested file contents with slurp.
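
    How a content-type could be matched from a file path is not shown in the article. One plausible approach, sketched here under that assumption, is a lookup by file extension; the table and the guess-type name are hypothetical and the real content-type function may work differently.

```perl6
# Hypothetical extension-to-MIME table; the real content-type function may differ
my %mime =
    html => 'text/html',
    css  => 'text/css',
    js   => 'application/javascript',
    json => 'application/json',
;

sub guess-type(Str $filepath) {
    # IO::Path.extension returns the part after the last dot
    %mime{ $filepath.IO.extension.lc } // 'application/octet-stream';
}

say guess-type('public/index.html');   # text/html
say guess-type('public/logo.xyz');     # application/octet-stream
```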

    And that’s all folks! A concurrent web server in the script-procedural manner: Wap6.


    I’m happy with the results of Wap6. I don’t intend for it to grow a lot, but I’m always tempted to continue adding more features: SSL support (completed!), session management (in progress), cookies, file uploads, etc.

    Perl6 has put on the table a very powerful way to perform concurrent network operations: IO::Socket::Async, a masterpiece. Also, with Perl6 you can mix the Object-Oriented, Procedural and Functional paradigms as you wish. With these capabilities you can design a concurrent asynchronous service and implement it quickly.

    If you want a more serious approach to HTTP services and concurrency in the Perl6 ecosystem, take a look at Cro. It represents a great opportunity to establish Perl6 as a powerful entity in the HTTP services space. Jonathan Worthington wrote about it on the 9th on this same Advent Calendar.

    Meanwhile, I will continue playing with Wap6, in the script manner, contributing to the Perl6 ecosystem and learning from the best coders in the world, I mean: Perl and Perl6 coders, of course :-)

    Perl 6 Advent Calendar: Day 24 – Solving a Rubik’s Cube

    Published by coke on 2017-12-24T00:33:51


    I have a speed cube on my wish list for Christmas, and I'm really excited about it. :) I wanted to share that enthusiasm with some Perl 6 code.

    I graduated from high school in '89, so I'm just the right age to have had a Rubik's cube through my formative teen years. I remember trying to show off on the bus and getting my time down to just under a minute. I got a booklet from a local toy store back in the 80s that showed an algorithm on how to solve the cube, which I memorized. I don't have the booklet anymore. I've kept at it over the years, but never at a competitive level.

    In the past few months, YouTube has suggested a few cube videos to me based on my interest in the standupmaths channel; seeing the world record come in under 5 seconds makes my old time of a minute seem ridiculously slow.

    Everyone I've spoken to who can solve the cube has been using a different algorithm than I learned, and the one discussed on standupmaths is yet a different one. The advanced version of this one seems to be commonly used by those who are regularly setting world records, though.

    Picking up this algorithm was not too hard; I found several videos, especially one describing how to solve the last layer. After doing this for a few days, I transcribed the steps to a few notes showing the list of steps, and the crucial parts for each step: desired orientation, followed by the individual turns for that step. I was then able to refer to a single page of my notebook instead of a 30-minute video, and after a few more days, had memorized the steps: being able to go from the notation to just doing the moves is a big speed up.

    After a week, I was able to solve it reliably using the new method in under two minutes; a step back, but not bad for a week's effort in my off hours. Since then (a few weeks now), I've gotten down to under 1:20 pretty consistently. Again, this is the beginner method, without any advanced techniques, and I'm at the point where I can do the individual algorithm steps without looking at the cube. (I still have a long way to go to be competitive though.)


    A quick note about the notation for moves – given that you're holding the cube with a side on the top, and one side facing you, the relative sides are:

    L (Left) R (Right) U (Up) D (Down) F (Front) B (Back)

    If you see a lone letter in the steps, like B, that means to turn that face clockwise (relative to the center of the cube, not you). If you add a ʼ to the letter, that means counter clockwise, so Rʼ would have the top piece coming down, while an R would have the bottom piece coming up.

    Additionally, you might have to turn a slice twice, which is written as U2. (It doesn't matter if it's clockwise or not, since it's 180º from the starting point.)


    The beginner's algorithm I'm working with has the following basic steps:

    1. White cross
    2. White corners
    3. Second layer
    4. Yellow cross
    5. Yellow edges
    6. Yellow corners
    7. Orient yellow corners

    If you're curious as to what the individual steps are in each, you'll be able to dig through the Rubik's wiki or the YouTube video linked above. More advanced versions of this algorithm (CFOP by Jessica Fridrich) allow you to combine steps, have specific "shortcuts" to deal with certain cube states, or solve any color as the first side, not just white.

    Designing a Module

    As I began working on the module, I knew I wanted to get to a point where I could show the required positions for each step in a way that was natural to someone familiar with the algorithm, and to have the individual steps also be natural, something like:


    I also wanted to be able to dump the existing state of the cube; for now as text, but eventually being able to tie it into a visual representation as well.

    We need to be able to tell if the cube is solved. We also need to be able to inspect pieces relative to the current orientation, and be able to change our orientation.

    Since I was going to start with the ability to render the state of the cube, and then quickly add the ability to turn sides, I picked an internal structure that made that fairly easy.

    The Code

    The latest version of the module is available on github. The code presented here is from the initial version.

    Perl 6 lets you create Enumerations so you can use actual words in your code instead of lookup values, so let's start with some we'll need:

    enum Side «:Up('U') :Down('D') :Front('F') :Back('B') :Left('L') :Right('R')»;
    enum Colors «:Red('R') :Green('G') :Blue('B') :Yellow('Y') :White('W') :Orange('O')»;

    With this syntax, we can use Up directly in our code, and its associated value is U.

    We want a class so we can store attributes and have methods, so our class definition has:

    class Cube::Three {
        has %!Sides;
        submethod BUILD() {
            %!Sides{Up} = [White xx 9];
            %!Sides{Front} = [Red xx 9];
            # ... and likewise for the other four sides
        }
    }

    We have a single attribute, a Hash called %!Sides; each key corresponds to one of the Enum sides. The value is a 9-element array of Colors. Each element of the array corresponds to a position on the cube. With white on top and red in front as the default, the colors and cell positions are shown here with the numbers & colors. (White is Up, Red is Front)

             W0 W1 W2
             W3 W4 W5
             W6 W7 W8
    G2 G5 G8 R2 R5 R8 B2 B5 B8 O2 O5 O8
    G1 G4 G7 R1 R4 R7 B1 B4 B7 O1 O4 O7
    G0 G3 G6 R0 R3 R6 B0 B3 B6 O0 O3 O6
             Y0 Y1 Y2
             Y3 Y4 Y5
             Y6 Y7 Y8

    The first methods I added were to do clockwise turns of each face.

    method F {
        self!rotate-clockwise(Front);
        self!fixup-sides([Pair.new(Up, [6,7,8]), Pair.new(Right, [2,1,0]), Pair.new(Down, [2,1,0]), Pair.new(Left, [6,7,8])]);
        self;
    }

    This public method calls two private methods (denoted with the !); one rotates a single Side clockwise, and the second takes a list of Pairs, where the key is a Side, and the value is a list of positions. If you imagine rotating the top of the cube clockwise, you can see that the positions are being swapped from one to the next.

    Note that we return self from the method; this allows us to chain the method calls as we wanted in the original design.

    The clockwise rotation of a single side shows a raw Side being passed, and uses array slicing to change the order of the pieces in place.

    # 0 1 2    6 3 0
    # 3 4 5 -> 7 4 1
    # 6 7 8    8 5 2
    method !rotate-clockwise(Side \side) {
        %!Sides{side}[0,1,2,3,5,6,7,8] = %!Sides{side}[6,3,0,7,1,8,5,2];
    }

    To add the rest of the notation for the moves, we add some simple wrapper methods:

    method F2 { self.F.F; }
    method Fʼ { self.F.F.F; }

    F2 just calls the move twice; Fʼ cheats: 3 rights make a left.
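
    The "3 rights make a left" claim is easy to check on a bare array. The sketch below reuses the slice assignment from !rotate-clockwise, but on a copy of a plain nine-element array rather than on the class attribute; the free-standing rotate-clockwise sub is a hypothetical helper for illustration only.

```perl6
# A face as nine cells, numbered as in the diagram above
my @face = 0..8;

# Same slice assignment as in !rotate-clockwise, applied to a copy
sub rotate-clockwise(@f) {
    my @r = @f;
    @r[0,1,2,3,5,6,7,8] = @f[6,3,0,7,1,8,5,2];
    @r;
}

my @once  = rotate-clockwise(@face);
my @three = rotate-clockwise(rotate-clockwise(@once));
my @four  = rotate-clockwise(@three);

say @once;    # [6 3 0 7 4 1 8 5 2]  one clockwise quarter turn
say @three;   # [2 5 8 1 4 7 0 3 6]  same as one counter-clockwise turn
say @four eqv @face ?? 'four turns restore the face' !! 'mismatch';
```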

    At this point, I had to make sure that my turns were doing what they were supposed to, so I added a gist method (which is called when an object is output with say).

          W Y W
          Y W Y
          W Y W
    G B G R O R B G B O R O
    B G B O R O G B G R O R
    G B G R O R B G B O R O
          Y W Y
          W Y W
          Y W Y

    The source for the gist is:

    method gist {
        my $result;
        $result = %!Sides{Up}.rotor(3).join("\n").indent(6);
        $result ~= "\n";
        for 2,1,0 -> $row {
            for (Left, Front, Right, Back) -> $side {
                my @slice = (0,3,6) >>+>> $row;
                $result ~= ~%!Sides{$side}[@slice].join(' ') ~ ' ';
            }
            $result ~= "\n";
        }
        $result ~= %!Sides{Down}.rotor(3).join("\n").indent(6);
    }

    A few things to note:

    The gist is great for stepwise inspection, but for debugging, we need something a little more compact:

    method dump {
        (gather for (Up, Front, Right, Back, Left, Down) -> $side {
            take %!Sides{$side}.join('');
        }).join('|');
    }

    This iterates over the sides in a specific order, and then uses the gather take syntax to collect string representations of each side, then joining them all together with a |. Now we can write tests like:

    use Test; use Cube::Three;
    my $a = Cube::Three.new;
    is $a.R.U2...R....U2.L.U..U.L.dump,
    'corners rotation';

    This is actually the method used in the final step of the algorithm. With this debug output, I can take a pristine cube, do the moves myself, and then quickly transcribe the resulting cube state into a string for testing.
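
    The building blocks behind gist and dump (rotor, the hyper operator, and gather/take) can be tried on their own. This is just an illustrative sketch with made-up data: the nine-cell @side and the two-entry %sides hash are hypothetical and do not describe a legal cube.

```perl6
my @side = <W W W R R R B B B>;        # hypothetical 9-cell side

# rotor(3) cuts the flat list into rows of three
say @side.rotor(3).join("\n");

# the hyper operator adds the row offset to every index at once
say (0,3,6) >>+>> 2;                    # (2 5 8)

# gather/take collects one string per side, joined with | as in dump
my %sides = Up => <W W W W W W W W W>, Front => <R R R R R R R R R>;
my $dump = (gather for <Up Front> -> $name {
    take %sides{$name}.join('');
}).join('|');
say $dump;                              # WWWWWWWWW|RRRRRRRRR
```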

    While the computer doesn't necessarily need to rotate the cube, it will make it easier to follow the algorithm directly if we can rotate the cube, so we add one for each of the six possible turns, e.g.:

    method rotate-F-U {
        # In addition to moving the side data, have to
        # re-orient the indices to match the new side.
        my $temp = %!Sides{Up};
        %!Sides{Up} = %!Sides{Front};
        %!Sides{Front} = %!Sides{Down};
        %!Sides{Down} = %!Sides{Back};
        %!Sides{Back} = $temp;
        self;
    }

    As we turn the cube from Front to Up, we rotate the Left and Right sides in place. Because the orientation of the cells changes as we change faces, as we copy the cells from face to face, we also may have to rotate them to ensure they end up facing in the correct direction. As before, we return self to allow for method chaining.

    As we start testing, we need to make sure that we can tell when the cube is solved; we don't care about the orientation of the cube, so we verify that the center color matches all the other colors on the face:

    method solved {
        for (Up, Down, Left, Right, Back, Front) -> $side {
            return False unless
                %!Sides{$side}.all eq %!Sides{$side}[4];
        }
        return True;
    }

    For every side, we use a Junction of all the colors on a side to compare to the center cell (always position 4). We fail early, and then succeed only if we made it through all the sides.
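
    The junction comparison can be seen on a single face outside the class. A minimal sketch, with a hypothetical face-is-uniform helper and plain strings instead of the Colors enum:

```perl6
my @solved-face = 'W' xx 9;
my @mixed-face  = flat 'W' xx 8, 'Y';

# hypothetical helper: does every cell match the center cell (position 4)?
sub face-is-uniform(@face) { so @face.all eq @face[4] }

say face-is-uniform(@solved-face);   # True
say face-is-uniform(@mixed-face);    # False
```

    The so collapses the junction into a plain Bool; in boolean context, as in the unless above, that happens implicitly.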

    Next I added a way to scramble the cube, so we can consider implementing a solve method.

    method scramble {
        my @random = <U D F R B L>.roll(100).squish[^10];
        for @random -> $method {
            my $actual = $method ~ ("", "2", "ʼ").pick(1);
            self."$actual"();
        }
    }

    This takes the six base method names, picks a bunch of random values, then squishes them (ensures that there are no dupes in a row), and then picks the first 10 values. We then potentially add on a 2 or a ʼ. Finally, we use the indirect method syntax to call the individual methods by name.
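
    What squish does is easiest to see on a fixed list. The second half of this sketch is a hypothetical stand-alone version of the move picker that only builds the move names and does not call any cube methods.

```perl6
# squish only collapses adjacent repeats; later repeats survive
say <U U D D D F U>.squish;                  # (U D F U)

# Hypothetical stand-alone move picker: ten faces, no face twice in a row
my @moves = <U D F R B L>.roll(100).squish[^10].map: { $_ ~ ('', '2', 'ʼ').pick };
say @moves.join(' ');
```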

    Finally, I'm ready to start solving! And this is where things got complicated. The first steps of the beginner method are often described as intuitive. Which means it's easy to explain… but not so easy to code. So, spoiler alert, as of the publish time of this article, only the first step of the solve is complete. For the full algorithm for the first step, check out the linked github site.

    method solve {
        self.solve-top-cross;
    }

    method solve-top-cross {
        sub completed {
            %!Sides{Up}[1,3,5,7].all eq 'W' &&
            %!Sides{Front}[5] eq 'R' &&
            %!Sides{Right}[5] eq 'B' &&
            %!Sides{Back}[5] eq 'O' &&
            %!Sides{Left}[5] eq 'G';
        }
        MAIN:
        while !completed() {
            # Move white-edged pieces in second row up to top
            # Move incorrectly placed pieces in the top row to the middle
            # Move pieces from the bottom to the top
        }
    }

    Note the very specific checks to see if we're done; we use a lexical sub to wrap up the complexity – and while we have a fairly internal check here, we see that we might want to abstract this to a point where we can say "is this edge piece in the right orientation". To start with, however, we'll stick with the individual cells.

    The guts of solve-top-cross are 100+ lines long at the moment, so I won't go through all the steps. Here's the "easy" section:

    my @middle-edges =
        [Front, Right],
        [Right, Back],
        [Back, Left],
        [Left, Front],
    ;
    for @middle-edges -> $edge {
        my $side7 = $edge[0];
        my $side1 = $edge[1];
        my $color7 = %!Sides{$side7}[7];
        my $color1 = %!Sides{$side1}[1];
        if $color7 eq 'W' {
            # find number of times we need to rotate the top:
            my $turns = (
                @ordered-sides.first($side1, :k) -
                @ordered-sides.first(%expected-sides{~$color1}, :k)
            ) % 4;
            self.U for 1..$turns;
            # ... rotate the individual side into place on the top ...
            self.Uʼ for 1..$turns;
            next MAIN;
        } elsif $color1 eq 'W' {
            my $turns = (
                @ordered-sides.first($side7, :k) -
                @ordered-sides.first(%expected-sides{~$color7}, :k)
            ) % 4;
            self.Uʼ for 1..$turns;
            # ... rotate the individual side into place on the top ...
            self.U for 1..$turns;
            next MAIN;
        }
    }

    When doing this section on a real cube, you'd rotate the cube without regard to the side pieces, and just get the cross in place. To make the algorithm a little more "friendly", we keep the centers in position for this; we rotate the Up side into place, then rotate the individual side into place on the top, then rotate the Up side back into the original place.

    One of the interesting bits of code here is the .first(..., :k) syntax, which says to find the first element that matches, and then return the position of the match. We can then look things up in an ordered list so we can calculate the relative positions of two sides.
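
    Here is the position lookup and the modular difference in isolation. The ordering of @ordered-sides below is an assumption made for the sake of the example (the article does not show the real list), and plain strings stand in for the Side enum.

```perl6
my @ordered-sides = <Front Right Back Left>;     # hypothetical ordering

my $from = @ordered-sides.first('Back',  :k);    # 2
my $to   = @ordered-sides.first('Right', :k);    # 1

say ($from - $to) % 4;    # 1
say ($to - $from) % 4;    # 3, since % never goes negative for a positive divisor
```

    Because the remainder is always between 0 and 3, the result can be used directly as a number of quarter turns.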

    Note that the solving method only calls the public methods to turn the cube; while we use raw introspection to get the cube state, we only use "legal" moves to do the solving.

    With the full version of this method, we now solve the white cross with this program:

    #!/usr/bin/env perl6
    use Cube::Three;
    my $cube = Cube::Three.new;
    $cube.scramble;
    say $cube;
    say '';
    $cube.solve;
    say $cube;

    which generates this output given this set of moves (Fʼ L2 B2 L Rʼ Uʼ R Fʼ D2 B2). First is the scramble, and then is the version with the white cross solved.

          W G G
          Y W W
          Y Y Y
    O O B R R R G B O Y Y B
    R G O B R R G B G W O B
    Y B B R O W G G G W W O
          W W O
          Y Y O
          B R R

          Y W W
          W W W
          G W R
    O G W O R Y B B G R O G
    Y G G R R B R B Y R O G
    O O R Y O W O O R W Y B
          G G B
          B Y Y
          Y B B

    This sample prints out the moves used to do the scramble, shows the scrambled cube, "solves" the puzzle (which, as of this writing, is just the white cross), and then prints out the new state of the cube.

    Note that as we get further along, the steps become less "intuitive", and, in my estimation, much easier to code. For example, the last step requires checking the orientation of four pieces, rotating the cube if necessary, and then doing a 14-step set of moves (shown in the test above).

    Hopefully my love of cubing and Perl 6 has you looking forward to your next project!

    I'll note in the comments when the module's solve is finished, for future readers.

    Perl 6 Advent Calendar: Day 23 – The Wonders of Perl 6 Golf

    Published by AlexDaniel on 2017-12-23T00:00:05

    Ah, Christmas! What could possibly be better than sitting around the table with your friends and family and playing code golf! … Wait, what?

    Oh, right, it’s not Christmas yet. But you probably want to prepare yourself for it anyway!

    If you haven’t noticed already, there’s a great website for playing code golf. The cool thing about it is that it’s not just for perl 6! At the time of writing, 6 other langs are supported. Hmmm…

    Anyway, as I’ve got some nice scores there, I thought I’d share some of the nicest bits from my solutions. All the trickety-hackety, unicode-cheatery and mind-blowety. While we are at it, maybe we’ll even see that perl 6 is quite concise and readable even in code golf. That is, if you have a hard time putting your Christmas wishes on a card, maybe a line of perl 6 code will do.

    I won’t give full solutions to not spoil your Christmas fun, but I’ll give enough hints for you to come up with competitive solutions.

    All I want for Christmas is for you to have some fun. So get yourself rakudo to make sure you can follow along. Later we’ll have some pumpkin pie and we’ll do some caroling. If you have any problems running perl 6, perhaps join #perl6 channel on freenode to get some help. That being said, the site itself gives you a nice editor to write and eval your code, so there should be no problem.

    Some basic examples

    Let’s take Pascal’s Triangle task as an example. I hear ya, I hear! Math before Christmas, that’s cruel. Cruel, but necessary.

    There’s just one basic trick you have to know. If you take any row from the Pascal’s Triangle, shift it by one element and zip-sum the result with the original row, you’ll get the next row!

    So if you had a row like:

    1 3 3 1

    All you do is just shift it to the right:

    0 1 3 3 1

    And sum it with the original row:

    1 3 3 1 0
    + + + + +
    0 1 3 3 1
    = = = = =
    1 4 6 4 1

    As simple as that! So let’s write that in code:

    for ^16 { put (+combinations($^row,$_) for 0..$row) }

    You see! Easy!

    … oh… Wait, that’s a completely different solution. OK, let’s see:

    .put for 1, { |$_,0 Z+ 0,|$_ } … 16


    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1
    1 5 10 10 5 1
    1 6 15 20 15 6 1
    1 7 21 35 35 21 7 1
    1 8 28 56 70 56 28 8 1
    1 9 36 84 126 126 84 36 9 1
    1 10 45 120 210 252 210 120 45 10 1
    1 11 55 165 330 462 462 330 165 55 11 1
    1 12 66 220 495 792 924 792 495 220 66 12 1
    1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1
    1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1
    1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1

    Ah-ha! There we go. So what happened there? Well, in perl 6 you can create sequences with a very simple syntax: 2, 4, 8 … ∞. Normally you’ll let it figure out the sequence by itself, but you can also provide a code block to calculate the values. This is awesome! In other languages you’d often need to have a loop with a state variable, and here it does all that for you! This feature alone probably needs an article or 𝍪.

    The rest is just a for loop and a put call. The only trick here is to understand that it is working with lists, so when you specify the endpoint for the sequence, it is actually checking for the number of elements. Also, you need to flatten the list with |.
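
    If the sequence version feels too magical, the shift-and-zip-sum step can be written out with ordinary variables. A minimal sketch of the same idea, without the sequence operator:

```perl6
# One step: pad the row on each side and add the two lists pairwise
my @row  = 1, 3, 3, 1;
my @next = (|@row, 0) Z+ (0, |@row);
say @next;      # [1 4 6 4 1]

# The same step in a plain loop, starting from the top of the triangle
my @r = 1,;
for ^5 {
    put @r;
    my @tmp = (|@r, 0) Z+ (0, |@r);
    @r = @tmp;
}
```

    The | in front of each array flattens it into the surrounding list, which is the same job it does inside the golfed block.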

    If you remove whitespace and apply all tricks mentioned in this article, this should get you to 26 characters. That’s rather competitive.

    Similarly, other tasks often have rather straightforward solutions. For example, for Evil Numbers you can write something like this:

    .base(2).comb(~1) %% 2 && .say for ^50

    Remove some whitespace, apply some tricks, and you’ll be almost there.
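
    Spelled out, the golfed line counts the one-bits of the binary representation and keeps the numbers where that count is even (the ~1 there is just a quote-free way to write the string '1'). A longer, ungolfed sketch of the same check:

```perl6
for 9, 7 -> $n {
    my $bits = $n.base(2);               # binary representation as a string
    my $ones = $bits.comb('1').elems;    # how many 1 digits it contains
    say "$n = $bits has $ones one-bits, evil: { $ones %% 2 }";
}
# 9 = 1001 has 2 one-bits, evil: True
# 7 = 111 has 3 one-bits, evil: False
```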

    Let’s take another example: Pangram Grep. Here we can use set operators:

    'a'..'z' ⊆ .lc.comb && .say for @*ARGS
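
    A pangram test with a set operator boils down to asking whether the letters a to z are a subset of the characters in the line. A small sketch with two made-up input lines (the parentheses around the range are only there for readability):

```perl6
my @lines = 'The quick brown fox jumps over the lazy dog', 'Hello, world';

for @lines {
    # every letter a..z must be an element of the line's characters
    say $_ if ('a'..'z') ⊆ .lc.comb;
}

# ASCII spelling of the same operator
say so ('a'..'c') (<=) 'abc'.comb;    # True
```

    Only the first line is printed by the loop, since "Hello, world" is missing most of the alphabet.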

    Basically, almost all perl 6 solutions look like real code. It’s the extra -1 character oomph that demands extra eye pain, but you didn’t come here to listen about conciseness, right? It’s time to get dirty.


    Let’s talk numbers! 1 ² ③ ٤ ⅴ ߆… *cough*. You see, in perl 6 any numeric character (that has a corresponding numeric value property) can be used in the source code. The feature was intended to allow us to have some goodies like ½ and other neat things, but this means that instead of writing 50 you can write a single character with that numeric value, such as ㊿. Some golfing platforms will count the number of bytes when encoded in UTF-8, so it may seem like you’re not winning anything. But what about 1000000000000 and 𖭡? In any case, the site is unicode-aware, so the length of any of these characters will be 1.

    So you may wonder, which numbers can you write in that manner? There you go:

    -0.5 0.00625 0.025 0.0375 0.05 0.0625 0.083333 0.1
    0.111111 0.125 0.142857 0.15 0.166667 0.1875 0.2
    0.25 0.333333 0.375 0.4 0.416667 0.5 0.583333 0.6
    0.625 0.666667 0.75 0.8 0.833333 0.875 0.916667 1
    1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 10
    11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
    28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    45 46 47 48 49 50 60 70 80 90 100 200 300 400 500
    600 700 800 900 1000 2000 3000 4000 5000 6000 7000
    8000 9000 10000 20000 30000 40000 50000 60000 70000
    80000 90000 100000 200000 216000 300000 400000
    432000 500000 600000 700000 800000 900000 1000000
    100000000 10000000000 1000000000000

    This means, for example, that in some cases you can save 1 character when you need to negate the result. There are many ways you can use this, and I’ll only mention one particular case. The rest you figure out yourself, as well as how to find the actual character that can be used for any particular value (hint: loop all 0x10FFFF characters and check their .univals).
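
    The hint about .univals can be tried on a small scale with .unival, which gives the numeric value of a single character or codepoint. The codepoint range in the last line is an arbitrary slice chosen to keep the scan fast; it is not the full search the hint suggests.

```perl6
say '½'.unival;          # 0.5
say '⑤'.unival;          # 5
say 'a'.unival;          # NaN, no numeric value

# Scan a small block of codepoints (arbitrary range) for the value 50
say (0x2000..0x33FF).grep(*.unival == 50).map(*.chr);
```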

    For example, when golfing you want to get rid of unnecessary whitespace, so maybe you’ll want to write something like:

    say 5max3 # ERROR

    It does not work, of course, and we can’t really blame the compiler for not untangling that mess. However, check this out:

    say ⑤max③ # OUTPUT: «5␤»

    Woohoo! This will work in many other cases.


    If there is a good golfing language, that’s not Perl 6. I mean, just look at this:

    puts 10<30?1:2 # ruby
    say 10 <30??1!!2 # perl 6

    Not only TWO more characters are needed for the ternary, but also some obligatory whitespace around < operator! What’s wrong with them, right? How dare they design a language with no code golf in mind⁉

    Well, there are some ways we can work around it. One of them is operator chaining. For example:

    say 5>3>say(42)

    If 5 is ≤ 3, then there’s no need to do the other comparison, so it won’t run it. This way we can save at least one character. On a slightly related note, remember that junctions may also come in handy:

    say 'yes!' if 5==3|5

    And of course, don’t forget about unicode operators.

    Typing is hard, let’s use some of the predefined strings!

    You wouldn’t believe how useful this is sometimes. Want to print the names of all chess pieces? OK:

    say ('♔'…'♙')».uniname».words»[2]

    This saves just a few characters, but there are cases when it can halve the size of your solution. But don’t stop there, think of error messages, method names, etc. What else can you salvage?

    Base 16? Base 36? Nah, Base 0x10FFFF!

    One of the tasks tells us to print φ to the first 1000 decimal places. Well, that’s very easy!

    say 1.6180339887498948482045868343656381177203091798057628621354486227052604628189024497072072041893911374847540880753868917521266338622235369317931800607667263544333890865959395829056383226613199282902678806752087668925017116962070322210432162695486262963136144381497587012203408058879544547492461856953648644492410443207713449470495658467885098743394422125448770664780915884607499887124007652170575179788341662562494075890697040002812104276217711177780531531714101170466659914669798731761356006708748071013179523689427521948435305678300228785699782977834784587822891109762500302696156170025046433824377648610283831268330372429267526311653392473167111211588186385133162038400522216579128667529465490681131715993432359734949850904094762132229810172610705961164562990981629055520852479035240602017279974717534277759277862561943208275051312181562855122248093947123414517022373580577278616008688382952304592647878017889921990270776903895321968198615143780314997411069260886742962267575605231727775203536139362


    Okay, that takes a bit more than 1000 characters… Of course, we can try to calculate it, but that is not exactly in the Christmas spirit. We want to cheat.

    If we look at the docs about polymod, there’s a little hint:

    my @digits-in-base37 = 9123607.polymod(37 xx *); # Base conversion

    Hmmm… so that gives us digits for any arbitrary base. How high can we go? Well, it depends on what form we would like to store the number in. Given that the site counts codepoints, we can use base 0x10FFFF (i.e. using all available codepoints). Or, in this case we will go with base 0x10FFFE, because:

    ☠☠☠⚠⚠⚠ WARNING! WARNING! WARNING! ⚠⚠⚠☠☠☠
    ☠☠☠⚠⚠⚠ WARNING! WARNING! WARNING! ⚠⚠⚠☠☠☠

    When applied to our constant, it should give something like this:


    How do we reverse the operation? During one of the squashathons I found a ticket about a feature that I didn’t know about previously. Basically, the ticket says that Rakudo is doing stuff that it shouldn’t, which is of course something we will abuse next time. But for now we’re within the limits of relative sanity:

    say 1.,:1114110[o򲔐𦔏򄠔񟯶󐚉񯓦򝼤񋩟󅾜󖾩񆔈򡔙򝤉񎗎񕧣񡉽󎖪󽡂􂳚񖨸򆀍􋵔󴈂𨬎򭕴򢑬񛉿򰏷𰑕󜆵򾩴ந񘚡𐂇򘮇񢻳𺐅࿹𪏸񄙍򞏡򈘏󬥝𫍡𱀉򌝓򭀢񤄓􋯱󜋝񟡥𖏕񖾷򇋹🼟򠍍񿷦𧽘嗟󬯞񿡥𸖉񿒣򄉼󣲦󉦩󸾧󎓜𦅂񰃦񲍚􍰍𧮁񦲋򶟫𰌡򡒶䨀𗋨𛰑򾎹򄨠󑓮򁇐𵪶𱫞񱛦󿥐򌯎񾖾򳴪򕩃󧨑𥵑򦬽񡇈򌰘񿸶񿜾寡򔴩񻊺񛕄񌍌󶪼􁇘񶡁󃢖򗔝񽑖򮀓󘓥󼿶󢽈򰯬끝󡯮磪󂛕򩻛񲽤򊥍􆃂뎛𘝞򊕆𝧒񰕺𭙪򺗝󲝂󊹛𺬛𛒕񿢖󵹱󮃞󟝐񱷳􋻩𿞸񫵗򣥨򚘣򶝠򯫞󌋩򑠒򅳒𔇆񘦵򌠐𢕍򡀋𪱷𢍟񗈼򙯬񨚑񙦅󘶸󜹕򷒋񤍠󻁾.ords]

    Note that the string has to be in reverse. Other than that it looks very nice. 192 characters including the decoder.

    This isn’t a great idea for printing constants that are otherwise computable, but given the length of the decoder and relatively dense packing rate of the data, this comes in handy in other tasks.
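
    The round trip is easier to follow in a small base. The sketch below uses the base-37 example from the polymod docs and rebuilds the number by hand instead of using the radix syntax, so it only illustrates the arithmetic behind the trick, not the golfed decoder itself.

```perl6
my $n    = 9123607;
my $base = 37;

# polymod yields the least significant digit first
my @digits = $n.polymod($base xx *);
say @digits;

# Rebuild the number: digit i is worth $base ** i
my $back = [+] @digits.kv.map(-> $i, $d { $d * $base ** $i });
say $back == $n;    # True
```

    Since polymod hands back the least significant digit first while a written number starts with the most significant one, the digits have to be reversed somewhere along the way, which is why the encoded string above is stored in reverse.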

    All good things must come to an end; horrible things – more so

    That’s about it for the article. For more code golf tips I’ve started this repository:

    Hoping to see you around! Whether you use Perl 6 or not, I’d love to see all of my submissions beaten.


    Perl 6 Advent Calendar: Day 22 – Features of Perl 6.d

    Published by liztormato on 2017-12-22T00:00:49

    So there we are. Two years after the first official release of Rakudo Perl 6. Or 6.c to be more precise. Since Matt Oates already touched on the performance improvements since then, Santa thought to counterpoint this with a description of the new features for 6.d that have been implemented since then. Because there have been many, Santa had to make a selection.

    Tweaking objects at creation

    Any class that you create can now have a TWEAK method. This method will be called after all other initializations of a new instance of the class have been done, just before it is returned by .new. A simple, if a bit contrived, example: class A has one attribute with a default value of 42, but the value should change if that default is explicitly specified at object creation:

    class A {
        has $.value = 42;
        method TWEAK(:$value = 0) { # default prevents warning
            # change the attribute if the default value is specified
            $!value = 666 if $value == $!value;
        }
    }
    # no value specified, it gets the default attribute value
    dd A.new;              # A.new(value => 42)
    # value specified, but it is not the default
    dd A.new(value => 77); # A.new(value => 77)
    # value specified, and it is the default
    dd A.new(value => 42); # A.new(value => 666)

    Concurrency Improvements

    The concurrency features of Rakudo Perl 6 saw many improvements under the hood. Some of these were exposed as new features. Most prominent are Lock::Async (a non-blocking lock that returns a Promise) and atomic operators.

    In most cases, you will not need to use these directly, but it is probably good to know about atomic operators if you’re writing programs that use concurrency features. A common source of logic errors, especially if you’ve been using threads in Pumpking Perl 5, is that there is no implicit locking on shared variables in Rakudo Perl 6. For example:

    my int $a;
    await (^5).map: {
        start { ++$a for ^100000 }
    }
    say $a; # something like 419318

    So why doesn’t that show 500000? The reason is that we had 5 threads incrementing the same variable at the same time. Since incrementing consists of a read step, an increment step and a write step, it became very easy for one thread to do its read step at the same time as another thread, and thus lose an increment. Before we had atomic operators, the correct way of writing the above code would be:

    my int $a;
    my $l = Lock.new;
    await (^5).map: {
        start {
            for ^100000 {
                $l.protect( { ++$a } )
            }
        }
    }
    say $a; # 500000

    This would give you the correct answer, but would be at least 20x as slow.

    Now that we have atomic variables, the above code becomes:

    my atomicint $a;
    await (^5).map: {
        start { ++⚛$a for ^100000 }
    }
    say $a; # 500000

    Which is very much like the original (incorrect) code. And this is at least 6x as fast as the correct code using Lock.protect.
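    For comparison, here is a minimal sketch of the same kind of counter guarded by Lock::Async, mentioned at the start of this section. It assumes that protect on a Lock::Async is used the same way as on a Lock; the loop count is reduced and the variable names are only for illustration.

```perl6
# Sketch: a shared counter guarded by Lock::Async.
# Waiting for the lock does not tie up a thread the way Lock does.
my $lock  = Lock::Async.new;
my $count = 0;

await (^5).map: {
    start {
        for ^1000 {
            $lock.protect({ $count++ });
        }
    }
}
say $count;   # 5000
```

    For a plain integer counter the atomic operators shown above remain the simpler choice; a lock is more useful when several statements have to happen together.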

    Unicode goodies

    So many, so many. For instance, you can now use ≤, ≥ and ≠ as Unicode versions of <=, >= and != (complete list).

    You can now also create a grapheme by specifying the Unicode name of the grapheme, e.g.:

    say "BUTTERFLY".parse-names; # 🦋

    or create the Unicode name string at runtime:

    my $t = "THUMBS UP SIGN, EMOJI MODIFIER FITZPATRICK TYPE";
    print "$t-$_".parse-names for 3..6; # 👍🏼👍🏽👍🏾👍🏿

    Or collate instead of just sort:

    # sort by codepoint value
    say <ä a o ö>.sort; # (a o ä ö)
    # sort using Unicode Collation Algorithm
    say <ä a o ö>.collate; # (a ä o ö)

    Or use unicmp instead of cmp:

    say "a" cmp "Z"; # More
    say "a" unicmp "Z"; # Less

    Or use any Unicode digits in Match variables ($١ for $1), negative numbers (-١ for -1), and radix bases (:۳("22") for :3("22")).

    It’s not for nothing that Santa considers Rakudo Perl 6 to have the best Unicode support of any programming language in the world!

    Skipping values

    You can now call .skip on Seq and Supply to skip a number of values that were being produced. Together with .head and .tail this gives you ample manipulexity with Iterables and Supplies.

    By the way, .head now also takes a WhateverCode so you can indicate you want all values except the last N (e.g. .head(*-3) would give you all values except the last three). The same goes for .tail (e.g. .tail(*-3) would give you all values except the first three).
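    A small sketch of how these three methods relate, using a ten-element array chosen only for illustration:

```perl6
my @values = 1..10;

say @values.head(*-3);   # (1 2 3 4 5 6 7)   all but the last three
say @values.tail(*-3);   # (4 5 6 7 8 9 10)  all but the first three
say @values.skip(3);     # (4 5 6 7 8 9 10)  skip the first three
```

    On a list of known length .tail(*-3) and .skip(3) give the same values; .skip is the one to reach for when the length is not known in advance.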

    Some additions to the Iterator role make it possible for iterators to support the .skip functionality even better. If an iterator can be more efficient in skipping a value than to actually produce it, it should implement the skip-one method. Derived from this are the skip-at-least and skip-at-least-pull-one methods that can be provided by an iterator.

    An example of the usage of .skip to find out the 1000th prime number:

    say (^Inf).grep(*.is-prime)[999]; # 7919

    Versus:

    say (^Inf).grep(*.is-prime).skip(999).head; # 7919

    The latter is slightly more CPU efficient, but more importantly much more memory efficient, as it doesn’t need to keep the first 999 prime numbers in memory.

    Of Bufs and Blobs

    Buf has become much more like an Array, as it now supports .push, .append, .pop, .unshift, .prepend, .shift and .splice. It also has become more like Str with the addition of a subbuf-rw (analogous with .substr-rw), e.g.:

    my $b = Buf.new(100..105);
    $b.subbuf-rw(2,3) = Blob.new(^5);
    say $b.perl; # Buf.new(100,101,0,1,2,3,4,105)

    You can now also .allocate a Buf or Blob with a given number of elements and a pattern. Or change the size of a Buf with .reallocate:

    my $b = Buf.allocate(10,(1,2,3));
    say $b.perl; # Buf.new(1,2,3,1,2,3,1,2,3,1)
    $b.reallocate(5);
    say $b.perl; # Buf.new(1,2,3,1,2)
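    The Array-like methods listed at the start of this section can be tried out in the same way. A short sketch with a made-up three-byte Buf:

```perl6
my $buf = Buf.new(1, 2, 3);

$buf.push(4);       # add a byte at the end
$buf.unshift(0);    # add a byte at the start
say $buf.list;      # (0 1 2 3 4)

say $buf.pop;       # 4
say $buf.shift;     # 0
say $buf.list;      # (1 2 3)
```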

    Testing, Testing, Testing!

    The plan subroutine of the Test module now also takes an optional :skip-all parameter to indicate that all tests in the file should be skipped. Or you can call bail-out to abort the test run, marking it as failed. Or set the PERL6_TEST_DIE_ON_FAIL environment variable to a true value to indicate you want the test to end as soon as the first test has failed.

    What’s Going On

    You can now introspect the number of CPU cores in your computer by calling Kernel.cpu-cores. The amount of CPU used since the start of the program is available in Kernel.cpu-usage, while you can easily check the name of the Operating System with VM.osname.
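    As a quick sketch, the three calls can be used directly on the type objects; the values in the comments are only examples and will differ per machine:

```perl6
say Kernel.cpu-cores;   # e.g. 8
say Kernel.cpu-usage;   # CPU time used so far, in microseconds
say VM.osname;          # e.g. linux
```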

    And as if that is not enough, there is a new Telemetry module which you need to load when needed, just like the Test module. The Telemetry module provides a number of primitives that you can use directly, such as:

    use Telemetry;
    say T<wallclock cpu max-rss>; # (138771 280670 82360)

    This shows the number of microseconds since the start of the program, the number of microseconds of CPU used, and the number of Kilobytes of memory that were in use at the time of call.

    If you want to get a report of what has been going on in your program, you can use snap and have a report appear when your program is done. For instance:

    use Telemetry;
    snap;
    Nil for ^10000000;  # something that takes a bit of time

    The result will appear on STDERR:

    Telemetry Report of Process #60076
    Number of Snapshots: 2
    Initial/Final Size: 82596 / 83832 Kbytes
    Total Time:           0.55 seconds
    Total CPU Usage:      0.56 seconds
    No supervisor thread has been running
    wallclock  util%  max-rss
       549639  12.72     1236
    --------- ------ --------
       549639  12.72     1236
    wallclock  Number of microseconds elapsed
        util%  Percentage of CPU utilization (0..100%)
      max-rss  Maximum resident set size (in Kbytes)

    If you want a state of your program every .1 of a second, you can use the snapper:

    use Telemetry;
    snapper;
    Nil for ^10000000;  # something that takes a bit of time

    The result:

    Telemetry Report of Process #60722
    Number of Snapshots: 7
    Initial/Final Size: 87324 / 87484 Kbytes
    Total Time:           0.56 seconds
    Total CPU Usage:      0.57 seconds
    No supervisor thread has been running
    wallclock  util%  max-rss
       103969  13.21      152
       101175  12.48
       101155  12.48
       104097  12.51
       105242  12.51
        44225  12.51        8
    --------- ------ --------
       559863  12.63      160
    wallclock  Number of microseconds elapsed
        util%  Percentage of CPU utilization (0..100%)
      max-rss  Maximum resident set size (in Kbytes)

    And many more options are available here, such as getting the output in .csv format.

    The MAIN thing

    You can now modify the way MAIN parameters are handled by setting options in %*SUB-MAIN-OPTS. The default USAGE message is now available inside the MAIN as the $*USAGE dynamic variable, so you can change it if you want to.

    Embedding Perl 6

    Two new features make embedding Rakudo Perl 6 easier to handle:

    The &*EXIT dynamic variable can now be set to specify the action to be taken when exit() is called.

    Setting the environment variable RAKUDO_EXCEPTIONS_HANDLER to "JSON" will throw Exceptions in JSON, rather than text, e.g.:

    $ RAKUDO_EXCEPTIONS_HANDLER=JSON perl6 -e '42 = 666'
    {
      "X::Assignment::RO" : {
        "value" : 42,
        "message" : "Cannot modify an immutable Int (42)"
      }
    }

    Bottom of the Gift Bag

    While rummaging through the still quite full gift bag, Santa found the following smaller prezzies:

    Time to catch a Sleigh

    Santa would like to stay around to tell you more about what’s been added, but there simply is not enough time to do that. If you really want to keep up-to-date on new features, you should check out the Additions sections in the ChangeLog that is updated with each Rakudo compiler release.

    So, catch you again next year!

    Best wishes from


    Perl 6 Advent Calendar: Day 21 – Sudoku with Junctions and Sets

    Published by scimon on 2017-12-21T00:00:31

    There are a number of core elements in Perl6 that let you do things concisely and powerfully. Two of these are Junctions and Sets, which share a number of characteristics but are also wildly different. To demonstrate their power, I’m going to look at how they can be used with a simple problem: Sudoku puzzles.

    Sudoku : A refresher

    For those of you who don’t know, a Sudoku puzzle is a 9 by 9 grid that comes supplied with some cells filled in with numbers between 1 and 9. The goal is to fill in all the cells with numbers between 1 and 9 so that no row, column or sub square has more than one of any of the numbers in it.

    There are a few ways to represent a Sudoku puzzle, my personal favourite being a 9 by 9 nested array, for example:

    my @game = [

    In this situation the cells with no value assigned are given a 0, this way all the cells have an Integer value assigned to them. The main thing to bear in mind with this format is you need to reference cells using @game[$y][$x] rather than @game[$x][$y].

    Junctions : Quantum Logic Testing

    One of the simplest ways to use Junctions in Perl6 is in a logical test. The Junction can represent a selection of values you are wanting to test against. For example :

    if ( 5 < 1|10 < 2 ) { say "Spooky" } else { say "Boo" }

    So, not only does this demonstrate operator chaining (something that experienced programmers may already be looking confused about), but the any Junction ( 1|10 ) evaluates to True for both 5 < 10 and 1 < 2. Junctions are already extremely powerful used this way, but it’s when you assign a variable container to them that it gets really interesting.
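    Before going further, here is a small sketch of the four Junction types applied to a single made-up Sudoku row, where 0 again stands for an empty cell:

```perl6
my @row = 5, 3, 0, 0, 7, 0, 0, 0, 0;

say so any(@row)  == 0;    # True:  at least one cell is empty
say so none(@row) == 0;    # False: so the row is not full
say so one(@row)  == 7;    # True:  exactly one 7
say so one(@row)  == 0;    # False: more than one empty cell
say so all(@row)  <  10;   # True:  every value is below 10
```

    The so collapses each Junction result to a single Boolean, which is how the tests in the rest of this article use them.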

    One of the tests we’d like to be able to make on our Sudoku puzzle is to see if it’s full, by which I mean every cell has been assigned a value greater than 0. A full puzzle may not be completed correctly, but there’s a guess in each cell. Another way of putting that would be that none of the cells has a value of 0. Thus we can define a Junction and store it in a scalar variable, which we can then test at any point to see if the puzzle is full.

    my $full-test = none( (^9 X ^9).map(-> ($x,$y) {
        @game[$y][$x];
    } ) );
    say so $full-test == 0;

    In this case the game still has a number of 0’s in it, so testing whether $full-test equals 0 evaluates to False. Note that without the so to cast the result to a Boolean you’ll get a breakdown of which cells are equal to 0; only if all of these are False will the Junction evaluate to True.

    Note also the use of the ^9 and X operators to generate two Ranges from 0 to 8 and then the cross product of these two lists of 9 numbers to make a list of all the possible X,Y co-ordinates of the puzzle. It’s this kind of powerful simplicity that is one of the reasons I love Perl6. But I digress.
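    To see what that cross product produces, here is a sketch on a 2 by 2 grid, which is small enough to print in full:

```perl6
say (^3).list;          # (0 1 2)
say (^2 X ^2);          # ((0 0) (0 1) (1 0) (1 1))
say (^9 X ^9).elems;    # 81

# Each pair can be unpacked straight into two variables.
for (^2 X ^2) -> ($x, $y) {
    say "x=$x y=$y";
}
```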

    The strength of this method is that once you’ve defined the Junction you don’t need to modify it. If you change the values stored in the Array the Junction will look at the new values instead (note this only holds true for updating individual cells, if you swap out a whole sub array with a new one you’ll break the Junction).

    So that’s a simple use of a Junction to store a multi-variable test you can reuse. But it gets more interesting when you realise that the values in a Junction can themselves be Junctions.

    Let’s look at a more complex test: a puzzle is complete if for every row, column and square in the puzzle there is only one of each number. In order to make this test we’re going to need three helper functions.

    subset Index of Int where 0 <= * <= 8;
    sub row( Index $y ) {
        return (^9).map( { ( $_, $y ) } );
    }
    sub col( Index $x ) {
        return (^9).map( { ( $x, $_ ) } );
    }
    multi sub square( Index $sq ) {
        my $x = $sq % 3 * 3;
        my $y = $sq div 3 * 3;
        return square( $x, $y ); # plain sub call, there is no self inside a sub
    }
    multi sub square( Index $x, Index $y ) {
        my $tx = $x div 3 * 3;
        my $ty = $y div 3 * 3;
        return ( (0,1,2) X (0,1,2) ).map( -> ( $dx, $dy ) {
            ( $tx + $dx, $ty + $dy )
        } );
    }

    So here we define an Index as a value between 0 and 8 and then define our subs to return a List of Lists, with each sub list being a pair of X and Y indices. Note that our square function can accept one or two positional arguments. In the single argument version we number the sub squares with 0 being in the top left, then going left to right, with 8 being the bottom right. The two argument version gives us the list of cells in the square for a given cell (including itself).
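    The numbering of the sub squares comes down to a little integer arithmetic. A sketch that prints the top left cell of each sub square, using the same expressions as the single argument version:

```perl6
# Top-left cell (x, y) of each of the nine sub squares, numbered 0..8.
for ^9 -> $sq {
    my $x = $sq % 3 * 3;      # column of the square: 0, 3 or 6
    my $y = $sq div 3 * 3;    # row of the square:    0, 3 or 6
    say "square $sq starts at ($x, $y)";
}
```

    Square 5, for example, is the right-hand square of the middle band, so it starts at (6, 3).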

    So with these in place we can define our one() lists for each row, column and square. Once we have them we can then put them into an all() junction.

    my $complete-all = all( (^9).map( {
        |(
            one( row( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) ),
            one( col( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) ),
            one( square( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) )
        )
    } ) );

    Once we have that testing to see if the puzzle is complete is quite simple.

    say [&&] (1..9).map( so $complete-all == * );

    Here we test each possible cell value of 1 through 9 against the Junction; in each case this will be True if all the one() Junctions contain only one of the value. Then we use the [] reduction meta-operator to chain these results to give a final True / False value (True if all the results are True and False otherwise). Again this test can be reused as you add values to the cells and will only return True when the puzzle has been completed and is correct.
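    If the reduction meta-operator is new to you, this sketch shows it on a few simple lists; it places the operator in the brackets between every pair of elements:

```perl6
say [+]  1..9;                        # 45
say [&&] True, True, False;           # False
say [&&] (1..9).map({ $_ > 0 });      # True
```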

    Once again we’ve got a complex test boiled down to a single line of code. Our $complete-all variable needs to be defined once and is then valid for the rest of the session.

    This sort of nested junction test can reach many levels. A final example is if we want to test if a current puzzle is valid, by which I mean it’s not complete but it doesn’t have any duplicate numbers in any row, column or square. Once again we can make a Junction for this: each row, column or square is valid if one or none of the cells is set to each of the possible values. Thus our creation of the Junction is similar to the $complete-all one.

    my $valid-all = all( (^9).map( {
        |(
            one(
                none( row( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) ),
                one( row( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) )
            ),
            one(
                none( col( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) ),
                one( col( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) )
            ),
            one(
                none( square( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) ),
                one( square( $_ ).map( -> ( $x, $y ) { @game[$y][$x] } ) )
            )
        )
    } ) );

    The test for validity is basically the same as the test for completeness.

    say [&&] (1..9).map( so $valid-all == * );

    Except in this case our puzzle is valid and so we get a True result.

    Sets : Collections of Objects

    Whilst Junctions are useful to test values, they aren’t as useful if we want to try solving the puzzle. But Perl6 has another type of collection that can come in very handy. Sets (and their related types Bags and Mixes) let you collect items and then apply mathematical set operations to them to find how different Sets interact with each other.

    As an example we’ll define a possible function that returns the values that are possible for a given cell. If the cell has a value set we will return the empty list.

    sub possible( Index $x, Index $y, @game ) {
        return () if @game[$y][$x] > 0;
        (
            (1..9) (-) set(
                ( row($y).map( -> ( $x, $y ) { @game[$y][$x] } ).grep( * > 0 ) ),
                ( col($x).map( -> ( $x, $y ) { @game[$y][$x] } ).grep( * > 0 ) ),
                ( square($x,$y).map( -> ( $x, $y ) { @game[$y][$x] } ).grep( * > 0 ) )
            )
        ).keys.sort;
    }

    Here we find the difference between the numbers 1 through 9 and the Set made up of the values of the row, column and square the given cell is in. We ignore cells with a 0 value using grep. As Sets store their details as unordered key / value pairs we get the keys and then sort them for consistency. Note that here we’re using the ASCII (-) version of the operator; we could also use the Unicode version instead.

    We could define the set as the union of each of the results from row, col and square and the result would be the same. Also we’re using the two argument version of square in this case.
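    Stripped of the grid lookups, the core of that function is a single set difference. A minimal sketch with a made-up list of values already seen by a cell:

```perl6
# Hypothetical values already present in a cell's row, column and square.
my @seen = 5, 3, 7, 3;

my $free = (1..9) (-) set(@seen);
say $free.keys.sort;                     # (1 2 4 6 8 9)

# The same difference with the Unicode operator.
say ((1..9) ∖ set(@seen)).keys.sort;     # (1 2 4 6 8 9)
```

    Duplicates in the list do not matter, since a Set only records whether a value is present.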

    It should be noted that this is the simplest definition of possible values; there’s no additional logic going on, but even this simple result lets us do the simplest of solving algorithms. In this case we loop around every cell in the grid and if it’s got 1 possible value we can set the value to that. We’ll loop round, get a list of cells to set, then loop through the list and set the values. If the list of ones to set is empty or the puzzle is complete then we stop.

    my @updates;
    repeat {
        @updates = (^9 X ^9).map( -> ($x,$y) {
            ($x,$y) => possible($x,$y,@game)
        } ).grep( *.value.elems == 1 );
        for @updates -> $pair {
            my ( $x, $y ) = $pair.key;
            @game[$y][$x] = $pair.value[0];
        }
    } while ( @updates.elems > 0 &&
              ! [&&] (1..9).map( so $complete-all == * ) );

    So we make a list of Pairs where the key is the x,y coordinates and the value is the possible values. Then we remove all those that don’t have one value. This is continued until there are no cells found with a single possible value or the puzzle is complete.

    Another way of finding solutions is to look for values that appear in only one set of possibilities in a given row, column or square. For example, if we have the following possibilities for the cells of a row:

    (1,2,3),(2,3,4),(),(),(4,5),(),(),(2,3,4),()

    1 and 5 only appear in the row once each. We can make use of the symmetric set difference operator and operator chaining to get this.

    say (1,2,3) (^) (2,3,4) (^) () (^) () (^) (4,5) (^) () (^) () (^) (2,3,4) (^) ()
    set(1 5)

    Of course in that case we can use the reduction meta-operator on the list instead.

    say [(^)] (1,2,3),(2,3,4),(),(),(4,5),(),(),(2,3,4),()
    set(1 5)

    So in that case the algorithm is simple (in this case I’ll just cover rows, the column and square code is basically the same).

    my @updates;
    for ^9 -> $idx {
        my $only = [(^)] row($idx).map( -> ( $x,$y ) {
            possible($x,$y,@game)
        } );
        for $only.keys -> $val {
            for row($idx) -> ($x,$y) {
                if $val (elem) possible($x,$y,@game) {
                    @updates.push( ($x,$y) => $val );
                }
            }
        }
    }

    We can then loop through the updates array similar to above. Combining these two algorithms can solve a large number of Sudoku puzzles by themselves and simplify others.

    Note we have to make two passes, firstly we get the numbers we’re looking for and then we have to look through each row and find where the number appears. For this we use the (elem) operator. Sets can also be referenced using Associative references for example:

    say set(1,5){1}
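    A short sketch of both ways of asking a Set about membership, using the set(1,5) from above:

```perl6
my $candidates = set(1, 5);

say 5 (elem) $candidates;   # True
say 2 (elem) $candidates;   # False
say 5 ∈ $candidates;        # True, Unicode form of (elem)

say $candidates{1};         # True
say $candidates{2};         # False
```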

    A note on Objects

    So for all the examples so far I’ve used basic integers. But there’s nothing stopping you using Objects in your Junctions and Sets. There are a few things to bear in mind though: Sets use the === identity operator for their tests. Most objects will fail an identity check unless you have cloned them or have defined the WHICH method in a way that will allow them to be compared.

    For the Sudoku puzzle you may want to create a CellValue class that stores whether the number was one of the initial values in the puzzle. If you do this though you’ll need to override WHICH and make it return the Integer value of the Cell. As long as you are fine with an identity check being technically invalid in this case (two different CellValues may have the same value but they won’t be the same object) then you can put them in Sets.

    I hope you’ve found this interesting. Junctions and Sets are two of the many different parts of Perl6 that give you the power to do complex tasks simply. If you’re interested in the code here, there’s an Object based version available that you can install with:

    zef install Game::Sudoku

    Strangely Consistent: Has it been three years?

    Published by Carl Mäsak

    007, the toy language, is turning three today. Whoa.

    On its one-year anniversary, I wrote a blog post to chronicle it. It seriously doesn't feel like two years since I wrote that post.

    On and off, in between long stretches of just being a parent, I come back to 007 and work intensely on it. I can't remember ever keeping a side project alive for three years before. (Later note: Referring to the language here, not my son.) So there is that.

    So in a weird way, even though the language is not as far along as I would expect it to be after three years, I'm also positively surprised that it still exists and is active after three years!

    In the previous blog post, I proudly announce that "We're gearing up to an (internal) v1.0.0 release". Well, we're still gearing up for v1.0.0, and we are closer to it. The details are in the roadmap, which has become much more detailed since then.

    Noteworthy things that happened in these past two years:

    Things that I'm looking forward to right now:

    I tried to write those in increasing order of difficulty.

    All in all, I'm quite eager to one day burst into #perl6 or #perl6-dev and actually showcase examples where macros quite clearly do useful, non-trivial things. 007 is and has always been about producing such examples, and making them run in a real (if toy) environment.

    And while we're not quite over that hump yet, we're perceptibly closer than we were two years ago.

    Belated addendum: Thanks and hugs to sergot++, to vendethiel++, to raiph++ and eritain++, for sharing the journey with me so far.

    Rakudo Star Release 2017.10

    Published on 2017-11-09T00:00:00

    gfldex: Racing Rakudo

    Published by gfldex on 2017-11-05T17:39:33

    In many racing sports telemetry plays a big role in getting faster.  Thanks to a torrent of commits by lizmat you can use telemetry now too!

    perl6 -e 'use Telemetry; snapper(½); my @a = (‚aaaa‘..‚zzzz‘).pick(1000); say @a.sort.[*-1 / 2];'
    Telemetry Report of Process #30304 (2017-11-05T17:24:38Z)
    No supervisor thread has been running
    Number of Snapshots: 31
    Initial Size:        93684 Kbytes
    Total Time:          14.74 seconds
    Total CPU Usage:     15.08 seconds
    wallclock  util%  max-rss  gw      gtc  tw      ttc  aw      atc
       500951  53.81     8424
       500557  51.92     9240
       548677  52.15    12376
       506068  52.51      196
       500380  51.94     8976
       506552  51.74     9240
       500517  52.45     9240
       500482  52.33     9504
       506813  51.67     6864
       502634  51.63
       500520  51.78     6072
       500539  52.13     7128
       503437  52.29     7920
       500419  52.45     8976
       500544  51.89     8712
       500550  49.92     6864
       602948  49.71     8712
       500548  50.33
       500545  49.92      320
       500518  49.92
       500530  49.92
       500529  49.91
       500507  49.92
       506886  50.07
       500510  49.93     1848
       500488  49.93
       500511  49.93
       508389  49.94
       508510  51.27      264
        27636  58.33
    --------- ------ -------- --- -------- --- -------- --- --------
     14738710  51.16   130876
    wallclock  Number of microseconds elapsed
        util%  Percentage of CPU utilization (0..100%)
      max-rss  Maximum resident set size (in Kbytes)
           gw  The number of general worker threads
          gtc  The number of tasks completed in general worker threads
           tw  The number of timer threads
          ttc  The number of tasks completed in timer threads
           aw  The number of affinity threads
          atc  The number of tasks completed in affinity threads

    The snapper function takes an interval at which data is collected. On termination of the program the table above is shown.

    The module comes with plenty of subs to collect the same data yourself and file your own report, which may be sensible in long running processes. Or you can call the report sub by hand every now and then.

    use Telemetry;
    react {
        whenever Supply.interval(60) {
            say report;
        }
    }

    If the terminal won’t cut it, you can use HTTP to fetch telemetry data.

    Documentation isn’t finished, nor is the module. So stay tuned for more data.

    Main Development Branch Renamed from "nom" to "master"

    Published on 2017-10-27T00:00:00