pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2018-11-20T14:11:16


Weekly changes in and around Perl 6: 2018.47 Piensa en Perl 6

Published by liztormato on 2018-11-19T20:31:28

Luis F. Uceta has created a Spanish version of Think Perl 6 (by Laurent Rosenfeld, based on a book by Allen Downey), teaching you the art of programming using Perl 6. Luis is looking for people to proofread / comment on Piensa en Perl 6. So if you have mastered Spanish and you want your name in a book as a proofreader, this is your chance! Oh, and of course, you can download either book for free!

Rakudo Star 2018.10 in Chocolatey

brian d foy tells us that Rakudo Star 2018.10 is now available on Windows using Chocolatey. Not sure who to thank for this, but kudos to brian d foy for letting us know it’s out there!

London Perl Workshop

Next weekend will see the 20th anniversary of the London Perl Workshop, a free one-day workshop of all things Perl in London, with pre- and post-workshop social events. The full schedule includes the following presentations about Perl 6:

There is still some room in the program, so if you’re going to be in London and you want to do a Perl presentation, it is not too late to submit a talk!

The European Perl Conference 2019

Andrew Shitov plugged the European Perl Conference in a lightning talk at the Barcelona Perl Workshop the other day. Main points: it’s in Riga, Latvia from Wed 7 to Fri 9 August 2019, with workshops on Mon 5 and Tue 6 August. The cost will be approximately 150 euros. Start planning your trips!

An interpreter in 4 minutes

Andrew Shitov also did a lightning talk on how to create a simple interpreter. Check it out if you want to get an idea of how simple the use of grammars can be!
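As a taste of why grammars make this kind of thing easy, here is a toy sketch (my own illustration, not Andrew’s code from the talk) that parses and evaluates a single addition:

```raku
# A minimal grammar: one rule for the expression, one token for numbers
grammar Sum {
    rule TOP     { <number> '+' <number> }
    token number { \d+ }
}

# Actions build the result while the grammar parses
class SumActions {
    method number($/) { make +$/ }                                  # Str -> Int
    method TOP($/)    { make $<number>[0].made + $<number>[1].made }
}

say Sum.parse('2 + 40', actions => SumActions).made;  # 42
```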

Did you mean X?

Jo Christian Oterhals shows how he really uses Perl 6 code at work to improve the functionality of his company’s web site (Reddit comments).

Progress with RED ORM

Fernando Correa de Oliveira shows a really nice new feature of RED that allows you to express a SQL statement as Perl 6 code in a .map. Very cool abstractions there!

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in comments

Three weeks worth, as promised last week:

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

Some nice speed improvements this week. Aleks-Daniel Jakimenko-Aleksejev is still busy shaking out the final blockers for the 2018.11 release of the Rakudo Perl 6 Compiler. That will be the first compiler release that defaults to 6.d. Hope to be able to tell you all about that next week. Until then!

Perl 6 Inside Out: 📺 Perl 6 grammars to create a simple interpreter

Published by andrewshitov on 2018-11-17T20:29:36

Here’s a screencast of a lightning talk that I gave at the ‘Perl & Friends’ event in Barcelona today. It’s a simplified version of the previous talk 🙂

Jo Christian Oterhals: Perl 6 small stuff #13: Did you mean X?

Published by Jo Christian Oterhals on 2018-11-13T20:41:19

Given that a while back I wrote an article suggesting that Perl 6 should change its name to Century, you’d probably expect me to have an opinion about the Raku debacle. Well, I don’t. For me, the time has come to share something other than interesting peculiarities of Perl 6, namely real code I use in my work.

Too soon?

I work as a kind of Jack-of-all-trades at a Norwegian media company, mostly wrangling numbers. Now and then I need to program a few ad hoc scripts to help extract interesting data from various databases, web pages, etc. Some of these end up in production, deployed in the cloud. But a surprising amount of what I do is writing ad hoc code to solve a one-time specific problem. Code that is neither particularly optimised nor especially robust, given that it will be used once and never leave my MacBook.

What follows is one example of the latter.

One day I noticed that quite a few of the searches our clients made in one of our databases returned zero results. It was easy to see that many of them were caused by incorrect spelling [1]. Wanting to investigate this a little further, I pulled six months’ worth of queries from our log database. Since this was not meant as anything but an experiment, I reduced the amount of data by removing multi-word queries and keeping only one-word queries that contained alphabetic characters. I also normalized the words, i.e. I dropped accented versions of characters. That reduced the dataset to a manageable size. I stored the resulting dataset in a simple CSV file.

File: st2.csv (containing 726,609 queries; a subset shown here)
Gaza,94279,editorial
Kaulitz,1667,editorial
senkvald,0,editorial
senkveld,1608,editorial
plenklipper,0,editorial

The first column is the word that was searched for, and the second column the number of results (the third is irrelevant to the discussion here).

Looking at this data I started wondering whether I could make qualified guesses as to what words the misspellers actually thought they wrote. In short: Could I create data that could form the basis of a Google-like “Did you mean X” feature?

I knew that I could use the Levenshtein-Damerau algorithm (among others) to compare one word to another and figure out how similar they were. But it seemed like overkill to compare each word to all the words in, say, a dictionary. A program doing that would take forever to run.

But: If you look at the example CSV above you see that the word “senkvald” has zero results while “senkveld” has 1608 results. That made me realize that for each mis-spelled query there’d likely be a correctly spelled query that returned results. I could limit the comparisons to queries with results.

What I needed was a program that created a list of mis-spelled zero-hit words and a list of words with hits; just that optimisation alone would speed things up. In addition I thought that the speed would increase even further if I utilized Perl 6’s parallelisation features.

Now, bring out your torches and pitchforks, because I’m sure you’ll see lots of stuff to improve and optimize here. But in this case it’s a little beside the point, because some of the stuff I’ve done here is done because it’s simple to do and easy to read, and not because it’s the fastest way to do these things [2].

#!/usr/bin/env perl6
use v6.c;
use Text::Levenshtein::Damerau;

my %term;
my %alphabet-hits;
my %alphabet-zero-hits;
my @results;
my @promises;
my @zero-hit;
my @non-zero-hit;
my $max-no-comparisons = 0;   # total number of planned comparisons
my $PARALLELS = 4;
my $MAX-DISTANCE = 0.3;
my $MIN-COMP-LENGTH = 4;
my $MAX-COMPARISONS = 1_000_000;
constant @ALPHABET = (("a".."z"), (<æ ø å>)).flat;

$*ERR.out-buffer = 0;
$*OUT.out-buffer = 0;

read-words;
similarity;
sort-and-spurt;

sub read-words {
    my $file = open "st2.csv";
    for $file.lines -> $line {
        my @elms = $line.split(",");
        %term{lc(@elms[0])} += @elms[1];
    }
    $file.close;
    note "Read search queries.";

    for %term.keys -> $t {
        @zero-hit.push($t)     if %term{$t} == 0;
        @non-zero-hit.push($t) if %term{$t} > 0;
    }
    @zero-hit     = @zero-hit.grep({ $_.chars >= $MIN-COMP-LENGTH }).sort;
    @non-zero-hit = @non-zero-hit.grep({ $_.chars >= $MIN-COMP-LENGTH }).sort;
    note "Processed " ~ %term.keys.elems ~ " terms.";

    # Pre-sort both word lists into per-letter buckets so each comparison
    # only has to look at words starting with the same character.
    for @ALPHABET -> $char {
        %alphabet-hits{$char} =
            @non-zero-hit.grep({ $_.starts-with($char) });
        %alphabet-zero-hits{$char} =
            @zero-hit.grep({ $_.starts-with($char) });
        $max-no-comparisons +=
            %alphabet-hits{$char}.elems * %alphabet-zero-hits{$char}.elems;
    }
    note "Split dictionaries into alphabetized chunks.";
}

sub sort-and-spurt {
    for @results.sort -> $o {
        my ($a, $b) = $o.split(',');
        my $near-equal =
            ($a.split('').unique.sort cmp $b.split('').unique.sort
                == Same ?? '*' !! '');
        say "$o,$near-equal";
    }
    note "Sorted and output finished.";
}

sub similarity {
    my $processed-comparisons = 0;
    my $zero-chunk-size = (@ALPHABET.elems / $PARALLELS).Int;
    my @alphabet-copy = @ALPHABET;
    for 1..$PARALLELS -> $t {
        # The last worker picks up any leftover letters.
        $zero-chunk-size++
            if $t == $PARALLELS && $zero-chunk-size < @alphabet-copy.elems;
        my @alpha-chunk = @alphabet-copy.splice(0, $zero-chunk-size);
        push @promises, start compare(@alpha-chunk, $t);
    }
    note "Finished waiting: " ~ await Promise.allof(@promises);

    sub compare(@alphabet-subset, $parallel-no) {
        note "Parallel $parallel-no: " ~ @alphabet-subset.elems;
        for @alphabet-subset -> $start-char {
            for %alphabet-zero-hits{$start-char}.flat -> $z {
                for %alphabet-hits{$start-char}.flat -> $nz {
                    my $distance = dld($z, $nz);
                    my $fraction = $z.chars > $nz.chars
                        ?? $distance / $z.chars !! $distance / $nz.chars;
                    if $fraction < $MAX-DISTANCE {
                        @results.push: $z ~ "," ~ $nz ~ "," ~ $fraction;
                    }
                    note "Compared $processed-comparisons of $max-no-comparisons."
                        if $processed-comparisons %% 10_000;
                    return False if ++$processed-comparisons > $MAX-COMPARISONS;
                }
            }
        }
    }
}

This script did the job! I got a list of around 17,000 suggestions that could be useful for implementing a “Did you mean X” type of routine. But what was unexpectedly, and more immediately, useful is that we could use it internally to improve how we tag our content. Instead of improving things automatically, it first and foremost improves manual work.

Anyway, back to the interesting stuff (at least to me) this uncovered about Perl 6:

  1. I used the keyword note a lot; note is like say, only that it outputs to STDERR. A little like warn in Perl 5. I could direct the output of the program to a file, while still being able to see the debug/error info.
  2. Output is often buffered. Most of the time I want it not to be. I want to see the results immediately, so I used this: $*OUT.out-buffer = 0; — This is equivalent to Perl 5’s $|++, except that the Perl 6 version is unexpectedly verbose but at the same time self-explanatory.
  3. constant @ALPHABET = (("a".."z"), (<æ ø å>)).flat; — as I had normalized all of the keywords and only kept keywords with Norwegian alphabetic characters, I had to generate a list of Norwegian characters. I couldn’t use a range, i.e. "a".."å", as that would contain all kinds of characters. So I combined two lists, the range "a".."z" and the list <æ ø å>, into one by using the .flat method.
  4. my @alpha-chunk = @alphabet-copy.splice(0, $zero-chunk-size); — I must have used this in other variants before, but splice felt like a new discovery. Splice is kind of like a super pop. It removes and returns not only a single element, but an entire chunk of an array. In other words, the original array is shortened accordingly, while you can assign the removed elements to a new array. This was useful for splitting the array into pieces I could use when parallelising the comparison task.
  5. And parallelisation is dead simple! All I needed was the start keyword. I did this: push @promises, start compare(@alpha-chunk, $t); — the subroutine compare is parallelised by the start keyword. You don’t need the push, but you should: the @promises array contains all the, well, promises. The line note "Finished waiting: " ~ await Promise.allof(@promises); ensures that all the parallels have finished running before moving on. That was all there was to it! The $PARALLELS variable manages how many parallels you want to run. I chose 4 as that’s the number of cores in my MacBook’s processor.
  6. A sub within a sub: Since compare was a subroutine that was only to be used by the similarity sub, I used Perl 6’s ability to have a sub within a sub. So the compare sub is only visible from within similarity. I probably did this only to impress myself.
  7. The similarity calculation my $distance = dld($z, $nz); my $fraction = $z.chars > $nz.chars ?? $distance / $z.chars !! $distance / $nz.chars; is a combination of Levenshtein-Damerau and my own stuff. The L-D algorithm — simplified — compares two strings and calculates how many substitutions you need in word 2 for it to become word 1 (think of how many keyboard presses you’d need). You get an integer with the number of substitutions as the result. But two substitutions in a short word is more significant than in a long word. So I create a factor: the number of substitutions divided by the length of the longer of the two words. This factor will be a floating point number between 0 and 1. The lower the number, the more similar the words are.
  8. In the line my $near-equal = ($a.split('').unique.sort cmp $b.split('').unique.sort == Same ?? '*' !! ''); I reduce each word to a sorted array of the unique characters in each word. I.e. ABBA is reduced to the array <A B>, but so is BABA. My assumption was that similar words containing identical characters are more similar than the other matches, even though the edit distance may be larger. I.e. ABBA and BABA are more similar than ABBA and ABDA — <A B> versus <A B D>. I discovered that if you compare two arrays for similarity, i.e. whether their elements contain the same values in the same order, the comparison returns Same. In Perl 5 you get 0 as a result. In Perl 6 you can check for 0 as well as Same and both will work. But I have to say Same is more self-explanatory.
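Several of the idioms above can be tried in a few self-contained lines. This is a standalone sketch with toy values (not the author’s data), using .comb instead of split('') to get the character lists:

```raku
# Point 2: unbuffered output, the Perl 6 spelling of Perl 5's $|++
$*OUT.out-buffer = 0;

# Point 3: combine a range and a list into one flat alphabet
constant @LETTERS = (("a".."z"), <æ ø å>).flat;

# Point 4: splice removes (and returns) a chunk, shortening the original array
my @work = @LETTERS;
my @chunks;
while @work { @chunks.push(@work.splice(0, 10)) }

# Point 5: one start block per chunk; await collects all the results
my @promises = @chunks.map: -> @chunk { start @chunk.elems };
note "Chunk sizes: " ~ await(@promises).join(",");   # point 1: note goes to STDERR

# Point 8: two words with the same unique character set compare as Same
say ("ABBA".comb.unique.sort cmp "BABA".comb.unique.sort) == Same;  # True
```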

So it’s safe to say that I learned a lot by programming this. It was equally nice that the results were useful!

NOTES
[1]
Of course quite a few of the zero-hit searches were caused by the fact that we didn’t have matching content, but those were beyond my scope.

[2] But I probably should have been more concerned with speed. Even though I’ve said earlier that speed depends on your point of view, slow is slow. Even with the parallelisation, this script took almost five hours to finish. It may be faster in the latest version of Rakudo; for this I used Rakudo Star 2018.06.

POSTSCRIPT
I had to do it, didn’t I? Since I’m flirting with two ladies at the moment, both Camelia and Julia, I couldn’t resist implementing the same thing in Julia version 1.0.0.

One thing I didn’t expect: Although Julia supports parallelisation, arguably just as easy if not easier to implement than with Perl 6, Julia parallelisation was unusable for this. If I understood the documentation correctly, Julia is good at parallelising code that deals with IO; but if you try to parallelise non-IO code, it’s queued up and run in a single thread.

A win for Perl 6 there, in other words.

However, the Julia code was 2/3 the size of the Perl 6 version. And execution speed? Well, this being a Perl 6 blog, I’m almost ashamed to admit that the non-parallelised Julia version finished the comparisons in 12 minutes, approximately 20 times faster than Perl 6.

When it comes to the size of the code, much of the difference is probably caused by the fact that, having already written this once, I knew where I could make things easier. But the thinking behind the Julia version is the same, and how I filter and flip and flop the data is the same. That’s why I don’t have a good explanation for the speed difference (just to double-check I also made a Perl 5 version, single-threaded like the Julia version; it used more or less the same amount of time as my original Perl 6 version). One of these days I have to run the profiler on the Perl 6 version. I guess that much of the time is spent calculating the Levenshtein-Damerau distance, but I believe the for loops themselves are taking time as well.

While we’re talking about Levenshtein, it’s worth noting that Julia’s Levenshtein implementation automatically calculates a distance factor. I made my own in the Perl 6 version. The sharp-eyed reader will see that in my code the lower the factor is, the greater the similarity. The Julia version is exactly the opposite.
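In other words, the two scores are complements of each other; assuming the same underlying edit distance, converting between them is a one-liner. Illustrative numbers only:

```raku
# Hypothetical values: edit distance 2, longer word 8 characters
my $distance   = 2;
my $max-length = 8;
my $fraction   = $distance / $max-length;  # my Perl 6 score: lower = more similar
my $similarity = 1 - $fraction;            # Julia-style score: higher = more similar
say $fraction;    # 0.25
say $similarity;  # 0.75
```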

All the optimisation stuff is a story for another day, if ever. Meanwhile, here’s the Julia version. Thanks for bothering to read all the way to the end!

#!/usr/bin/env julia
using Base.Unicode
using StringDistances

alphabet = vcat('a':'z', ['æ','ø','å'])
dictionary = Dict{String,Int}()
zero_words = []
hit_words = []
similarity_count = 0
total_similarities = 0

function read_words()
    open("st2.csv") do file
        for ln in eachline(file)
            c, d = split(ln, ",")
            e = parse(Int, d)
            try
                dictionary[lowercase(c)] += e
            catch
                dictionary[lowercase(c)] = e
            end
        end
    end
    for (k, v) in enumerate(dictionary)
        (word, value) = v
        if value == 0
            push!(zero_words, word)
        else
            push!(hit_words, word)
        end
    end
    sort!(zero_words)
    sort!(hit_words)
    for c in alphabet
        hw = size(filter(p->startswith(p, c), hit_words))[1]
        zw = size(filter(p->startswith(p, c), zero_words))[1]
        global total_similarities += (hw * zw)
    end
end

function similarity()
    for k1 in zero_words
        for k2 in filter(p->startswith(p, string(first(k1))), hit_words)
            x = compare(DamerauLevenshtein(), k1, k2)
            if x > 0.7
                println("$k1\t$k2\t$x")
            end
            global similarity_count += 1
            if similarity_count % 10000 == 0
                perc = (similarity_count / total_similarities) * 100
                println(stderr, "Progress... $perc % [ $similarity_count of $total_similarities]")
                flush(stdout)
            end
        end
    end
end

read_words()
similarity()

Weekly changes in and around Perl 6: 2018.45/46 Post Diwali

Published by liztormato on 2018-11-12T21:59:40

It doesn’t happen often (anymore) that a Perl 6 Weekly is not published for a given week. In the past week, yours truly was more or less at the center of a large discussion about how Perl 6 should be (nick)named. This drained yours truly of the energy to write the Perl 6 Weekly. If you haven’t heard about this new episode in the naming debate yet and you do want to know about it, then please see the summary at the far end of this Perl 6 Weekly. Meanwhile, a lot of other things happened. So let’s get on with that!

Rakudo Star 2018.10

Steve Mynott has released the latest version of Rakudo Star, based on the Rakudo 2018.10 Compiler release (Reddit comments). This marks the end of an era in more than one way: this is the last Rakudo Star release that is based on the 6.c language definition of Perl 6. It is also the last Rakudo Star release that Steve Mynott committed to doing. So we’re on the lookout for a new Rakudo Star release manager to perform the release.

Yours truly would like to thank Steve Mynott on behalf of the Perl 6 community for this work. He definitely deserves a lot of kudos for having done this for basically the past 3 years (2016.04 .. 2018.10)!

1000 Rosettacode Entries

thundergnat informed us that as of 11 November, Perl 6 has 1000 entries on Rosettacode, and is now tied for third place on the leaderboard. Which should be easy to fix 🙂

Squashathon

Already more than a week ago, but there was another Squashathon, which was focused on fixing documentation issues. Thanks again to Aleks-Daniel Jakimenko-Aleksejev for organising. And the winner was chsanch! The next Squashathon will be on 3 December 2018. The target of that Squashathon has not been set yet: suggestions welcome!

PCiG videos

The videos of the Perl Conference in Glasgow have become available for your perusal. These are the Perl 6 related presentations:

Other things to watch

Simon Proctor gave a talk about writing Perl 6 Command Line Scripts at the last London Perl Mongers meeting.

Not quite a video, but a nice screencast of how to create a compiler with Perl 6 has also become available (from the last Amsterdam Perl Mongers meeting). In it, Andrew Shitov shows the basics and some advanced grammar usages. Which he will most likely use in his next book.

Perl 6 calendar

Andrew Shitov has created a Perl 6 Calendar for 2019 that you can actually put on your wall at the office or at home. Each month highlighting a fascinating Perl 6 feature. A great present to give your friends, co-workers or yourself! And if you were planning on getting Andrew Shitov‘s Using Perl 6 book, you can now get them both using a special XMas offer! (FaceBook comments).

Perl 6 At A Glance for free

The very first Perl 6 book, Perl 6 At A Glance (also by Andrew Shitov by the way), is now available for free as a series of blog posts, as a PDF, or as an EPUB. Great to see such a source of information become available for free! (FaceBook comments: 1, 2).

Les regex et grammaires de Perl 6

Laurent Rosenfeld has written a large tutorial about regular expressions and grammars for the French developpez.com website. Good to see Perl 6 tutorial material in languages other than English! Keep them coming!

Advent Calendar submissions

Zoffix Znet reminds us that to have a successful Perl 6 Advent Calendar, people need to write blog entries for it. So please claim a day in the schedule and start writing about what you like to do with Perl 6!

Running Perl 6 in the browser

Paweł Murias has created a very nice interactive way of running Perl 6 code in the browser, called 6pad. Unfortunately, at the moment of writing you will need a Chrome browser to be able to use it because other browsers do not support bigints (yet).

Where did I leave my AT-KEYs

Timo Paulssen showcased some more features of the upcoming MoarVM Performance tool. Yours truly can’t wait to see that land in master!

How to make Perl more classy

Elizabeth Mattijsen had the 7th instalment of her series of blog posts about the differences and similarities between Perl 5 and Perl 6 published on opensource.com (Reddit, FaceBook comments).

Perl 6 Appetizer

Mauro Panigada wrote a very nice introductory blog post titled: Perl 6 Appetizer. In this blog post, he highlights several features of Perl 6, such as Grammars and the handling of command line arguments (Reddit comments).

Showcase Perl 6 Features

ogniloud asked people on Reddit what they would find interesting and/or fun about Perl 6. And this was the result.

TPF Grant Reports

The past weeks also saw three TPF Grant reports:

Perl 6 2019 Coding Contest

Moritz Lenz is seeking task masters for the 2019 Perl 6 Coding Contest. In the first phase of setting up this contest, he is looking for volunteers who come up with coding tasks collaboratively. But there are other ways you can contribute as well, such as pledging a prize, creating a website for the contest or ironing out the rules.

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in other comments

Yours truly did not have the energy anymore to work on this section. Next week should contain a selection of comments of the then past 3 weeks.

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

This concludes the part of the Perl 6 Weekly that yours truly recommends everybody should read. It was an enormous amount of information to sift through and categorize. Please let yours truly know if you think something has fallen through the cracks. If so, I will mention it next week. Until then!

What follows below is an attempt by yours truly to objectively describe the events related to the naming discussion of Perl 6 in the past week.

On Raku

People have asked to give Perl 6 another name for many, many years. All this time, Larry Wall did not want to change the name of Perl 6. As far as yours truly knows, he mentioned at the Perl Conference in Amsterdam (2017) that there could potentially be an alias for Perl 6 for marketing purposes in markets where “perl” is a four-letter word in its worst meaning. Zoffix Znet started this discussion again early October 2018. While Larry Wall was on vacation, he checked in on the 3rd of November for a little while to let Zoffix Znet know that the alias would be “Raku”. Earlier, on the 25th of October, Larry Wall had explained why “Raku” was at the top of his list of aliases.

Zoffix Znet took it from there very expediently (EDIT: this was uncalled for) and used the bump in the language specification to merge in the use of an alias. Except that in a lot of cases (EDIT: at least one case), “Perl 6” was simply omitted, effectively making it a rename of “Perl 6” to “Raku”. And it was welcomed as such by a number of prominent Perl 5 developers. This upset a number of people very much, amongst which yours truly. And that caused quite a few discussions on the various IRC channels, and blog posts and commentaries on sites such as FaceBook, Reddit and Hacker News. This in turn was perceived as an attack on his person by Zoffix Znet, which led to his departure from the IRC channels and the cancelling of planned blog posts. Since then, a status quo appears to have arisen, while waiting for a reply by Larry Wall, who was on vacation / helping family evacuate from wildfires in Southern California.

What follows below are places on the Web that mentioned the situation as it developed. Some of this is not nice reading, some posts have been removed (or maybe will be removed after publishing it here). Clearly a lot of people were upset about a lot of things. Read at your own risk for your mental health:

Blog Posts + associated comments

Comments

It’s Raku! /r/perl6, /r/perl, Perl 6 remains Perl 6?, A warning to the community, On Hacker News, Moving forward, Lack of Clarity on Authorized Activities?.

FaceBook

Rename indeed, Marketing down the drain, Slave master, 6.d is coming, The new p5p nest, No Weekly, Here it was, Perl 6 after all, More and more embarrassing, Documentation?, Positive on Raku, Extremely unhappy, What is Raku, PerlCon 2019 I, PerlCon 2019 II, On Raku, Thanks, On Raku (Again), Tagging on StackOverflow, Changing the Perl 6 group photo, Just an alias?, use raku?, Smashed, No Advent, Back from vacation

Twitter

This is really only a selection of the most prominent tweets about the alias “Raku” (or have a look at all “Perl 6” related tweets):

Its’ Raku!, Kinda meh, Formerly known as…, Now raku?, Tumbleweed, Sebastian Riedel: Raku!, Alternative name, Raku vs Perl 6, A convenient lie,
A new name for Perl 6, Things are settled, Can we get one too?, Official alternate name, Release announcement, DBA as, Three camps, Holding hostage, PerlCon 2019 cancelled?, Fully spelt out (not), Like the new name, Another name for Perl 6, Nicknamed “Raku”, Fumble?, Deleted comments, Also officially known as, Taking a stand, / >♥ ♥< \, A warning to the Perl 6 Community, This is worse, Kill off Perl 5?, Completely unrelated?, Just call it “Raku”, Implies Raku Perl 5, Brainfuck, Raku sounds good, Not first, Don’t buy the hype, Makes p5p look almost functional, Une annonce, Whirlpool, Perceived as rebranding, Confusion, Offical alias, Don’t make big announcements.

I hope that this is the last time I have to write about “Raku”, the alias for Perl 6.

Perl 6 Inside Out: 🎁 Christmas sale!

Published by andrewshitov on 2018-11-11T18:58:30

Only until the end of the year, you can buy the Perl 6 Calendar 2019 and Using Perl 6 book as a pack!


rakudo.org: Rakudo Star Release 2018.10

Published on 2018-11-11T00:00:00

Perlgeek.de: Perl 6 Coding Contest 2019: Seeking Task Makers

Published by Moritz Lenz on 2018-11-10T23:00:01

I want to revive Carl Mäsak's Coding Contest as a crowd-sourced contest.

The contest will be in four phases:

For the first phase, development of tasks, I am looking for volunteers who come up with coding tasks collaboratively. Sadly, these volunteers, including myself, will be excluded from participating in the second phase.

I am looking for tasks that ...

This is non-trivial, so I'd like to have others to discuss things with, and to come up with some more tasks.

If you want to help with task creation, please send an email to moritz.lenz@gmail.com, stating your intentions to help, and your freenode IRC handle (optional).

There are other ways to help too:

In these cases you can use the same email address to contact me, or use IRC (moritz on freenode) or twitter.

my Timotimo \this: Rakudo Perl 6 performance analysis tooling: Progress Report

Published by Timo Paulssen on 2018-11-10T00:22:45


I have started properly working on the grant later than I had hoped, and have not been able to invest as much time per week as I had initially planned. However, recently more and more parts of the tool have come together to form a useful product.

Recently, core developer Stefan 'nine' Seifert has been able to make good use of the tool to assist in analyzing and optimizing the performance of a re-write of MoarVM's bytecode compiler in NQP code. Here are quotes from Stefan on IRC, along with links to the public IRC logs:

https://colabti.org/irclogger/irclogger_log/perl6-dev?date=2018-10-09#l56

I now managed to get a profile of the MAST stage using a patched --profile-stage, but the profile is too large even for the QT frontend :/
timotimo++ # moarperf works beautifully with this huge profile :)

https://colabti.org/irclogger/irclogger_log/perl6-dev?date=2018-10-23#l66

timotimo++ # the profiler is indespensable in getting nqp-mbc merge worthy.

Below you can find a copy of the "Deliverables and Inchstones" section of the original grant proposal with explanations of the current status/progress and any additional features I've been able to implement, along with a couple of screenshots to illustrate what I mean.

The "web frontend for heap snapshot analyzer" and "bunch of examples" sections have had their sub-headings removed, because work on that area has not yet started.

Deliverables and Inchstones

A blog with progress reports.

There have been 6 reports, mostly due to a long starting delay and some additional intervention from Life Itself:

A web frontend for the heap snapshot analyzer

I have pushed this off to the future, opting to prioritize the instrumented profiler frontend instead.

A new web frontend for the instrumented profiler

Rakudo Perl 6 performance analysis tooling: Progress Report

User-facing documentation on using the profiling facilities and interpreting the data

This has not yet happened, except for the blog post for the release that explains how to run the profiler, and the README.

Build a bunch of examples that show different performance issues and describe how to figure them out

There are currently big changes going on in MoarVM's optimizer relating to removing unnecessary boxing of intermediate values in calculations and such, which may make any examples moot real soon. I've delayed this part of the work a bit for that reason.

In Summary ...

I was positively surprised when I last opened the original tasks list and saw what was already done, and what could be finished with just a little bit of work!

I hope you'll give moarperf a try some time soon and let me know what you think! Here's a link to the project on github: MoarPerf project on github

Thanks for staying around and reading my blog posts :)
  - Timo

my Timotimo \this: Where did I leave my AT-KEYs?

Published by Timo Paulssen on 2018-11-09T21:17:24


Even though it's only been a week and a half since my last report, there's already enough new stuff to report on! Let's dig right in.

[header image: Photo by NeONBRAND / Unsplash]

Overview Page

[screenshot]

Is there terribly much to say about this? It now shows the same data that was already available in the old profiler's overview page. It's the go-to page when you've changed your code in the hopes of making it faster, or changed your version of rakudo in hopes of not having to change your code in order to make it faster.

Here's a screenshot from the other profiler for comparison:

[screenshot]

The main addition over the previous version is the "start times of threads" piece at the top left. In multi-threaded programs it shows you when more threads were added, for example if you use start blocks on the default ThreadPoolScheduler.
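For example, a toy program like this (my own snippet, not from the post) makes the scheduler spin up worker threads as the start blocks are dispatched, which is exactly what that graph visualizes:

```raku
# Eight start blocks on the default ThreadPoolScheduler;
# the profiler's thread chart would show workers being added over time.
my @promises = (1..8).map: { start (^100_000).grep(*.is-prime).elems };
say await(@promises).sum;
```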

The GC Performance section gives you not only the average time spent doing minor and major collections, but also the minimum and maximum time.

The rest is pretty much the same, except the new version puts separators in numbers with more than three digits, which I find much easier on the eyes than eight-digit numbers without any hints to guide the eye.

GC Run List

[screenshot]

The graphs at the top of the GC tab have changed a bit! There's now the option to show only major, only minor, or all collections in the graph, and there are three different display modes for the "Amounts of data" graphs.

The one shown by default gives bars split into three colors to signify how much of the nursery's content has been freed (green), kept around for another round (orange), or promoted to the old generation (red). That's the mode you can see in the screenshot above.

[screenshot]

The second mode is "Combined Chart", which just stacks the amounts in kilobytes on top of each other. That means when more threads get added, the bars grow. In this example screenshot you can barely even see orange or red in the bars, because this program is very light on long-lived allocations.

[screenshot]

The third mode is "Split Charts", which has one chart for each "color". Since they all have their own scales, you can more easily see differences from run to run, even if some of the charts appear tiny in the "combined" or "combined relative" charts.

Routines List

The routines overview – and actually all lists of routines in the program – have a new little clickable icon now. Here it is:

[screenshot]

The icon I'm talking about is the little up-right-arrow in a little box after a routine's name. When you click on it, the row turns blue. Huh, that doesn't sound so useful? That's because the button brings you to the routines list and scrolls to and highlights the routine you've clicked on. If you're already right there, you will not notice a lot of change, of course.

However, it gets more interesting in the callers or callees lists:

[screenshot]

Even better, since it actually uses links to actual URLs, the browser's back/forward buttons work with this.

Other useful places you can find this navigation feature are the allocations list and the call graph explorer:

[screenshot]

[screenshot]

Where are my AT-KEYs at?

If you have a very big profile, a routine you're interested in may be called in many, many places. Here's a profile of "zef list". Loading up the call graph for this routine may just explode my computer:

[screenshot]

Note the number of Sites: 27 thousand. Not good.

But what if you're already in the call graph explorer anyway, and you just want to find your way towards functions that call your routine?

Enter the search box:

[screenshot]

As you can see, when you input something in the search bar, hand icons will point you towards your destination in the call graph.

I'm looking to add many more kinds of searches; for example, I can imagine it would be interesting to see at a glance "which branches will ever reach non-core code". Searching for files ought to be interesting, too.

Another idea I've had is that when you've entered a search term, it should be possible to exclude specific results, for example if there are many routines with the same name, but some of them are not the ones you mean. For example, "identity" is in pretty much every profile, since that's what many "return"s will turn into (when there's neither a decont nor a type check needed). However, Distributions (which is what zef deals in) also have an "identity" attribute, which is about name, version, author, etc.

At a much later point, perhaps even after the grant has finished, there could also be search queries that depend on the call tree's shape, for example "all instances where &postcircumfix:{ } is called by &postcircumfix:{ }".

That's it?

Don't worry! I've already got an extra blog post in the works which will be a full report on overall completion of the grant. There'll be a copy of the original list (well, tree) of the "deliverables and inchstones" along with screenshots and short explanations.

I hope you're looking forward to it! I still need to replace the section that says "search functionality is currently missing" with a short and sweet description of what you read in the previous section :)

With that I wish you a good day and a pleasant weekend
  - Timo

Perl 6 Inside Out: 📺 Creating a compiler with Perl 6

Published by andrewshitov on 2018-11-06T22:08:32

A screencast recorded during a talk at the Amsterdam Perl mongers meeting on 6 November 2018.

Perl 6 Inside Out: 📘 Variables in Perl 6: Introspection

Published by andrewshitov on 2018-10-31T16:04:13

Thanks to the mechanism of introspection, it is easy to tell the type of the data living in a variable (a variable in Perl 6 is often referred to as a container). To do that, call the predefined WHAT method on the variable. Even if it is a bare scalar, Perl 6 treats it internally as an object; thus, you may call methods on it.

For scalars, the result depends on the real type of data residing in a variable. Here is an example (parentheses are part of the output):

my $scalar = 42;
my $hello-world = "Hello, World";

say $scalar.WHAT;      # (Int)
say $hello-world.WHAT; # (Str)

For variables whose names start with the sigils @ and %, the WHAT method returns the type objects (Array) and (Hash).

Try with arrays:

my @list = 10, 20, 30;
my @squares = 0, 1, 4, 9, 16, 25;

say @list.WHAT;    # (Array)
say @squares.WHAT; # (Array)

Now with hashes:

my %hash = 'Language' => 'Perl';
my %capitals = 'France' => 'Paris';

say %hash.WHAT;     # (Hash)
say %capitals.WHAT; # (Hash)

The thing returned by a WHAT call is a so-called type object. In Perl 6, you should use the === operator to compare type objects.

For instance:

my $value = 42;
say "OK" if $value.WHAT === Int;

There’s an alternative way to check the type of an object residing in a container — the isa method. Call it on an object, passing the type name as an argument, and get the answer:

my $value = 42;
say "OK" if $value.isa(Int);
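A third spelling worth knowing (my addition, not part of the original post) is smartmatching against the type with ~~, which, unlike isa, also accounts for roles:

```perl6
my $value = 42;
say "OK" if $value ~~ Int;  # OK
say 42 ~~ Numeric;          # True: Int does the Numeric role
say 42.isa(Numeric);        # False: isa only follows class inheritance
```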

Perl 6 Inside Out: 📘 Int type in Perl 6

Published by andrewshitov on 2018-10-31T16:04:08

The Int type is intended to hold integer values of arbitrary size. For example, no digit is lost in the following assignment:

my Int $x =
    12389147319583948275874801735817503285431532;
say $x;

A special syntax exists for writing integer literals in a base other than 10:

say :16<D0CF11E0>; # 3503231456

Also, it is allowable to use the underscore character to separate digits so that big numbers can be read more easily:

my Int $x = 735_817_503_285_431_532;

Of course, when you print the value, all the underscores are gone.

On the Int object, you may call some other handy methods, for example, to convert a number to a character or to check if the integer in hand is prime (yes, is-prime is a built-in method!).

my Int $a = 65;
say $a.chr; # A

my Int $i = 17;
say $i.is-prime; # True

say 42.is-prime; # False

Zoffix Znet: Perl 6 Advent Calendar 2018 Call for Authors

Published on 2018-10-31T00:00:00

Write a blog post about Perl 6

Weekly changes in and around Perl 6: 2018.44 Diwali Approaching

Published by liztormato on 2018-10-29T19:20:37

Zoffix Znet has been very busy again. Not only did he create and release a new 6.d teaser document, he also did most of the work of making 6.d the default language version of Rakudo in the bleeding edge version of the code. Looks like a release on Diwali 2018 is getting more and more certain. The biggest breakage so far has been the (too) late usage of the version pragma.

Rakudo 2018.10 Compiler Release

Aleks-Daniel Jakimenko-Aleksejev and Samantha McVey have done it again: release MoarVM and the Rakudo Compiler. Claudio Ramirez took this as an immediate cue to update the directly installable Linux distributions. Rakudo Star 2018.10 will be based on this release. Kudos again to all involved! Note for the curious: this is the last release of the Rakudo compiler that implements version 6.c of Perl 6 by default.

Rakudo running on AIX 7.1

Someone named ItchyPlant performed a lot of research and work to get Rakudo working on AIX 7.1/2. Kudos! Yet another operating system and a whole family of hardware now also support Rakudo!

Perl 6 Media Group

Zoffix Znet is looking for people to participate in a Perl 6 Core Media Group to improve consistency in Perl 6 marketing / messaging. If you're interested, please participate in the discussion. Yours truly hopes for a lot of participants!

Full Screen Ahead

Timo Paulssen shows off the improvements to the new MoarVM profiler user interface in his latest TPF Grant Report. It’s especially great to hear that this work is already paying off by helping Stefan Seifert with his work on the bytecode writing refactor (which should be in a mergeable state soon). You can directly follow this work in the associated repo: comments, suggestions and Pull Requests are welcomed!

Blin is Toast

Well, actually quite the opposite! Aleks-Daniel Jakimenko-Aleksejev is re-imagining the toaster functionality (basically spectesting the whole Perl 6 ecosystem) with Blin, which is capable of checking the ecosystem (1251 modules at the time of writing) in about 60 minutes (on a 24 core machine) for any version of Rakudo Perl 6. But you can also use Blin on a single module to see which commit introduced a regression! A word of caution: only run this on throw-away virtual machines!

Small stuff #12

Jo Christian Oterhals added another episode to his “small stuff” series called Cobol precision, revisited (or… even more fun with FatRats)

Math Matrix

Herbert Breunung continued his series of blog posts about Math::Matrix with Part 5: Patient with docs.

Hackerrank solutions (part 2)

Patrick Spek has published part 2 of his Python 3 and Perl 6 solutions to Hackerrank challenges (Reddit comments). Always nice to see Perl 6 and Python 3 solutions side by side.

Exportation Exploration

Joshua Yeshouroun wrote a blog post about his (dis)taste of modules that export symbols willy-nilly, followed by an update (Reddit comments). Recommended for those of you trying to grok the EXPORT semantics of Perl 6.

Meanwhile on Codegolf

Jo King created a nice Perl 6 solution to the Written Digits Sequence problem using Unicode introspection.

How phasers work

Elizabeth Mattijsen had the sixth article about migrating Perl 5 code to Perl 6 published: How phasers work in Perl 6. Which generated a lot of tweets (reaching up to 99000 people) (Reddit comments).

perl11.org

Someone posted a link to the perl11.org website on Hacker News, which set off a barrage of comments (r/perl, r/programming, r/programmingcirclejerk). Some Perl 6 specific comments on Hacker News:

Go 2 Transition Proposal

Ian Lance Taylor wrote a proposal on how to make incompatible changes from Go 1 to Go 2 while breaking as little as possible. It mentions Perl 6 specifically (Reddit, Hacker News comments).

Other Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in other comments

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

Ginormous. That is the word yours truly used earlier today about this Perl 6 Weekly. Having 6.d now be the default language implementation is something that will need to sink in over the coming weeks. All great stuff. Hope there will be more great stuff next week. Well, pretty sure of that. So please check in again next week for more Perl 6 news!

my Timotimo \this: Full Screen Ahead!

Published by Timo Paulssen on 2018-10-28T18:50:54


Whew, it's been a long time since the last report already! Let's see what's new.

[hero image: Photo by Jack Anstey / Unsplash]

Improvements to the Routine Overview

The first tab you'll usually go to is the routine overview. It gives you a list of every routine that was encountered in the program. In a bunch of columns it shows you the name of the routine, along with the file it comes from and the line it's defined on. It tells you how often the routine was entered and how many of these entries were into or inside of jitted code.

The last column shows how much time the routine accounted for, and it is split up twice. Once into inclusive vs exclusive time, which tells you how much total time was spent from entry to exit vs how much time was spent only inside the routine itself, rather than in any routines called by it. The second split is into total time vs time divided by the number of entries. That lets you more easily figure out which routines are slow by themselves vs which routines take a lot of time because they are called very, very often.

Usually it's best to start with the routines that take the most time in total, but routines that take longer per call may have more obvious optimization opportunities in them in my experience.
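As a toy illustration of that split (my own numbers, not actual profiler output), a routine's exclusive time is its inclusive time minus the inclusive time of its callees:

```perl6
# Suppose f runs 10ms from entry to exit and calls only g, which
# accounts for 7ms of that; g itself calls nothing.
my %inclusive = f => 10, g => 7;
my %callees   = f => <g>, g => ();
my %exclusive = %inclusive.keys.map: -> $r {
    $r => %inclusive{$r} - ([+] %callees{$r}.map({ %inclusive{$_} }))
};
say %exclusive<f>;  # 3: the time spent in f's own code
```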

It is for that reason that being able to sort by different columns is extra important, and that wasn't available in the new tool for a whole while. That changed just yesterday, though! It still looks quite odd, because it's just a bunch of buttons that don't even react to which sorting mode is currently active, but that will change soon enough.

[screenshot]

The routines overview of course lets you expand the individual routines to get at more details. The details that were already in the previous blog post, were Callees, Paths, and Allocations. All three of them have changed a bit, and one more has been added recently.

All that happened in the Callees tab is that it is now also sortable.

The Paths tab has got a little toggle button called "Minimal" that reduces every cell in the tree to a little square. On top of that, the cells are now colored by the routine's filename - in the normal view the lower border of the cells shows the color, in the minimal view the cells themselves are colored in.

Here are two screenshots comparing the regular and minimal views:

[screenshot]

[screenshot]

The Allocations tab now shows the total amount of memory allocated by the given routine for the given type. Of course this doesn't directly correspond to memory usage, since the profiler can't tell you how long the given objects survive.

[screenshot]

The new tab that was recently added is a very useful one. It's the Callers tab. It lets you see which routines have called the given routine, how often they called it, and to what percentage the routine got inlined into the callers. Here's a screenshot:

[screenshot]

Allocations Tab

There's a whole new top-level tab, too. It's titled "Allocations" and does very much what you would expect.

The tab contains a big table of all types the profiler saw in the program. Every row shows the Type name along with potentially a link to the docs website, the size of each object and how much memory in total was allocated for it, and how often that type of object was allocated in total.

On the left there is a button that lets you expand the type's column to get extra details:

[screenshot]

None of these things are very surprising, but they are useful.

Overview Tab

The overview tab used to just have a placeholder text. However, it now displays at least a little bit of information. Have a look!

[screenshot]

It's still very fresh and I expect it to look completely different in a week or two.

What's with the "Full Screen" pun?

This post starts with a silly little pun on "Full Steam Ahead". What does it mean for the profiler tool?

The answer is very simple: it lets you use the whole width of your browser window. Bootstrap – which is the library of CSS styles and primitives I use so that I don't have to spend three fourths of the entire development time fiddling with CSS rules so that things look okay or even work at all – is convinced that the outermost Container element of the site shouldn't be wider than a specific width, probably because things can look kind of "lost" on a page that's not filled side-to-side with stuff. If you have big tables full of data or wide graphs with tiny bars, it sure is good to be able to expand the page sideways.

Here's a little animation I made a few weeks ago that shows the fullscreen button in action for one example.

You're my favourite customer!

Stefan 'nine' Seifert has recently been working on replacing the "mast" stage in the Perl 6 compiler.

In the simplest terms, it used to take the abstract syntax tree to rewrite it into a very flat tree consisting of lists of moar instructions. Those were then fed into MoarVM as objects, which were then translated into actual moar bytecode, literally as bytes.

The main drawback of the earlier approach was memory use. On top of all the QAST node objects that were the input to the mast stage, a huge batch of new objects were created before it was all passed to moar's bytecode compiler as one set. The more alive objects there are, the higher the maximum memory usage gets, and the longer it takes the GC to go through everything to decide what's garbage and what needs to stick around.

The new approach starts with the QAST nodes again, but immediately starts writing out bytes into buffers. That has a whole bunch of implications immediately: Every Object consists of at least a header that's a couple of bytes big. Buffers (and native arrays) are just a single object with a big consecutive blob of memory they "own". No matter how big the blob grows, the garbage collector only has to look at the object header and can ignore the blob itself! Not only does it save memory by not having as many individual objects, it also saves the garbage collector time!
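The core idea can be sketched in a few lines of Perl 6 (my own illustration with made-up opcode bytes, not MoarVM's actual API):

```perl6
# One growable Buf instead of one object per instruction: no matter
# how many bytes we append, the GC only sees a single object header.
my Buf $bytecode .= new;
for 0x01, 0x02, 0x10 -> $op {  # made-up opcode bytes
    $bytecode.append($op);
}
say $bytecode.elems;  # 3
```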

Stefan told me on IRC that the new profiler tool already helped him a lot in making the new implementation faster.

I now managed to get a profile of the MAST stage using a patched --profile-stage, but the profile is too large even for the QT frontend :/
timotimo++ # moarperf works beautifully with this huge profile :)

https://colabti.org/irclogger/irclogger_log/perl6-dev?date=2018-10-09#l56

timotimo++ # the profiler is indespensable in getting nqp-mbc merge worthy.

https://colabti.org/irclogger/irclogger_log/perl6-dev?date=2018-10-23#l66

This obviously makes yours truly very happy :)

There's still lots of "paper cuts" to be ironed out from a User Experience standpoint, but I'm very glad that the tool is already making work possible that used to be impossible.

That's all the improvements I have to report for now, but I'll be continuing work on the UI and backend and write up another report (hopefully) soon!

BTW, a big conference on React.js (the tech I use to make the frontend tick) just happened and the react core dev team unveiled a few new APIs and tricks that I expect will let me make the existing code a lot simpler and write new code faster than before!

Also, during the conference Chris Trevino presented a new charting library based around the "Grammar of Graphics" philosophy(?) that is also the basis of vega and vega-lite which as I understand it is The Big Cheese among data visualization experts. I've been having some gripes with recharts, so I'll be checking this out soon!

Hope to see y'all in the next one!
  - Timo

Jo Christian Oterhals: Perl 6 small stuff #12: Cobol precision, revisited (or… even more fun with FatRats)

Published by Jo Christian Oterhals on 2018-10-22T12:00:58

Let me start by saying that this will be the last article exploring peculiarities of Perl 6’s maths functions. I promise. At least for a while. With that said, let’s dive back into rational numbers vs floating point one last time.

Binary representation of the floating point number 0.15625. CC © Wikimedia Commons.

Back in number 4 of this series, I tried to figure out the precision of Perl 6's rational numbers compared to Cobol's fixed precision types. That may sound like a strange thing to do, but I had just read an article about why Cobol was still preferred by financial institutions (read it if you haven't already — this article will be easier to understand if you do, especially why the particular algorithm below is used to check precision). It had to do with how Cobol could control floating point precision to avoid rounding errors that, accumulated over time, could cost banks, stock exchanges, the IRS, etc., millions of dollars. The author proved that by comparing Cobol to a little Python snippet, a script that broke in all the expected ways.

Since Perl 6 uses the Rat (rational) data type under the hood for lots of things, even without explicit typing on the programmer's part, my hypothesis was that Perl 6 wouldn't have these kinds of errors at all. So I made a Perl 6 implementation of the error-provoking algorithm outlined in the Cobol article. And wouldn't you know it: Perl 6 didn't have the errors mentioned in the Cobol article. I proudly spread that conclusion in my Having fun with Rats post.

As it turns out, I was a little too trigger-happy. In my article I only tested the algorithm to 20 iterations, since that was what the Cobol article did. When tinkering with the code afterwards, I found that my Perl 6 code broke at around 28 iterations. So although it fared better than Python, and went beyond the Cobol example, even Perl 6 wasn't foolproof.
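For the curious, here is a small sketch (mine, not code from the original post) of what that breakage looks like when the same recurrence is forced into floating point with Num literals:

```perl6
# Muller's recurrence in double precision: rounding errors steer the
# sequence toward the attracting fixed point 100 instead of the exact
# limit 5 that the rational versions approach.
sub rec($y, $z) { 108e0 - (815e0 - 1500e0/$z)/$y }
my @x = 4e0, 4.25e0;
for 2..31 -> $i {
    @x.push: rec(@x[$i-1], @x[$i-2]);
}
say @x[*-1];  # essentially 100, nowhere near 5
```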

That made me think: What if I implemented it using explicit typing, this time forcing the use of FatRats? So I did:

#!/usr/bin/env perl6
sub rec(FatRat $y, FatRat $z) {
    return 108 - (815 - 1500/$z)/$y;
}
sub default-behavior(Int:D $N) {
    my @x = FatRat.new(4), FatRat.new(17, 4);
    for 2..$N+1 -> $i {
        my $n = rec(@x[$i-1], @x[$i-2]);
        @x.append($n);
    }
    return @x;
}
my @y = default-behavior(2000);
for @y.kv -> $i, $p {
    say $i.fmt("%02d") ~ " | " ~ @y[$i].fmt("%3.20f");
    say join " / ", @y[$i].nude;
}

As you can see I ran the code up to 2000 iterations. And Perl 6 didn’t break once. I figure I could continue much longer, but that wouldn’t prove my point any more than it already had: FatRats seemingly handle everything.

At around 2000 iterations, the floating point representation was an even 5. The FatRat had increased to something beyond readable:

2001 | 5.00000000000000000000
10887262270271520844470244368473590369823879678381735770804366536872861066757834267230923438499734851331955108971818900795255377964985420406353488530653796076376888302887871648059636404237386779810073204677465433988288796371463779266725835301768375910295365966893549342070113935097170738554786589822697649361389182715711445880206813457196212233603705832810491425136584469585820510373307587755977621707298634317089590138855983741462833338843271340817765566670690233393031687431055903364260747271559744687940769179892188275670612186517021074968728089773868903232112413409395441640658761326225416663244033887317133246740410946847989349983326675880888985445567168787598735624717364724914686476958266124806775762813974242476406932409145206237566267025023985919713838560053160634848162125063714368558880445253361406701423828947352575925454715484530130867655804116616415778049891344295072286232982121196487495343247308360407488300676499377578310332816377975448193225536813649463486171661227928324492008794414474774496406886870868799498753785243478428165856592662574587026062708505894963955557204003803392615937919983143327086591424359681275817919140507987737463394218763462404759849484338109892177212883565977961986616331952406228280515381736751779687947623033277314097434771424807829329022829618524311375471611964574251871376643107540529697952215346315210658268262117062285956468458853775876425932095612817 / 
2177452454054304168894048873694718073964775935676347154160873307374572213351566853446184687699946970266391021794363780159051075592997084081270697706130759215275377660577574329611927280847477355962014640935493086797657759274292755853345167060353675182059073193378709868414022787019434147710957317964539529872277836543142289176041362691439242446720741166562098285027316893917164102074661517551195524341459726863417918027771196748292566667768654269212275864367729012474591108985007522990289641915425059624720960922390197391499416012195286133802412263320144198333588435944700149853436940773893604588831895901210854703334389818287374956988979446767663405991767869404053875825243861633344428650812796466524645611445569036116412001976972688038699652154919003410804565289198476051171009210210514432258823939333189758824739429056132928366378633982114880626729252784764043010629666202312909247751673764726461337254048195958324091876383541819502316785741383639145866071856825051549231447161710399611381911575660385601546360041084280896676911889810584537683240864405819307049463732925606631546639518426651121208176285007436520251461854127484076350233519219480398658641001242539048046529742114192549782403287806556840860371337966097173350218236381329518702831401361914811597448759832549227759610938548970537888455950617834972888158093407922606309451101643311842471881838667105291086022959240750179576618885386564

I can’t say that I’m anything but impressed. So go on. Reimplement your bank systems in Perl 6 🤪.

Since these articles are nothing without a comparison to something else, I figured I’d do it again. Since I had so much success using Perl 5 with the use bignum pragma in my last article (Numberphile calculator test), I decided, just for fun, to implement this in Perl 5 as well.

#!/usr/bin/env perl
use 5.18.0;
use bignum;
use strict;

my $cnt = 0;

sub rec {
    my ($y, $z) = (shift, shift);
    return 108 - (815 - 1500/$z)/$y;
}

sub default_behavior {
    my $N = shift;
    my @x = ( 4, 4.25 );
    for my $i (2..$N+1) {
        my $n = rec($x[$i-1], $x[$i-2]);
        push(@x, $n);
    }
    return @x;
}

my @y = default_behavior(30);
for my $i (0..30) {
    say $i . " | " . $y[$i];
}

This time Perl 5 is not as impressive. This code croaks around iteration 26. Remove the use bignum pragma and it croaks around 13.[1]

Still: for most of us this is just theoretical stuff, so in reality you'll make do with both Perls, I guess. At least I know I will.

NOTE
[1]
Julia, which was part of the comparison mix in my last article, was good until iteration 52 using the BigFloat type. To have a fair comparison with Perl 6 I also tried Julia's rational type, specified to the highest precision — Rational{UInt128}. Julia gave up at around the same iteration using that as well. So Julia fared better than Perl 5 but nowhere near Perl 6. But as I noted above about the two Perls, the difference is theoretical. For all practical purposes even Julia will do.

Weekly changes in and around Perl 6: 2018.43 Coding Contest

Published by liztormato on 2018-10-22T09:24:50

Moritz Lenz is testing the waters to see if there is room for a crowd-sourced Perl 6 Coding Contest, and he's inviting comments / suggestions / participation. This is all modeled after Carl Mäsak's initial Perl 6 Coding Contest from 2010 (Twitter comments).

Perl 6 Deep Dive Free Download

Only 13 hours to go to get your free download of “Perl 6 Deep Dive” by Andrew Shitov. It’s filled with practical examples, exploring all aspects of Perl 6 with different programming paradigms (object oriented, functional, and reactive). On Twitter it had some weird previews: 1, 2.

Sparky

Alexey Melezhik introduces Sparky, a powerful pocket size task runner server. A more flexible and more maintainable alternative to using crontab, written in Perl 6!

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New modules:

Updated Modules:

Winding Down

While preparations for the 2018.10 Rakudo compiler release are in full swing (which should also be the basis for the 2018.10 Rakudo Star distribution), this week was mostly about cleanups and fixing. And of course some nice new modules and some nice updates. Please check in again next week for more Perl 6 news! Take care!

Weekly changes in and around Perl 6: 2018.42 A better Atom for 6

Published by liztormato on 2018-10-15T21:15:32

Ahmad M. Zawawi has completed the first version of Perl 6 language support for the Atom IDE, based on an App::Perl6LangServer module that can be used by any editor / IDE that supports Microsoft's Language Server Protocol. If you're a fan of the Atom editor / Atom IDE, this will make it a lot easier to work with Perl 6 in it. So now Perl 6 doesn't have one IDE, but two (the other one being the Comma IDE, of course).

Perl[56] on the 35c3

Daniel Böhmer is trying to get a Perl Assembly together for the 35th Chaos Communication Congress in Leipzig, Germany. So if you would like to hang out together with other Perl people at the CCC, contact Daniel to make this happen! (FaceBook comments).

6.d review completed

Zoffix Znet has completed his pre-release review of 6.d spec (Twitter comments 1, 2). A truly impressive piece of work at almost 3500 commits reviewed and more than 400 corrections and improvements. Kudos!

JIT Grant Proposal

Bart Wiegmans has submitted a Perl Foundation grant request titled: MoarVM JIT Compiler Expression Backend Maturation. Focus of this grant will be on JITting floating point operations, improving generated code quality and improved handling of “irregular” instructions such as div. Comments as always welcome! (Reddit and grant comments).

Stupid Numeric Ticks

scruss got a little bit carried away trying to do as much as possible with Unicode numbers in a blog post titled: 𒐳 / ༳ == ( ⑽ – 𐹭 ) * ( 𒐲 / 𐅉 ), of course. Why does one do this? Well, because one can! In any case, a nice example of the flexibility of Perl 6 (Hacker News, Reddit comments).

Markatu – a lightweight markup language

Brian Duggan dove into the world of markup languages by creating a markup language called Markatu, inspired by markdown’s brevity and slim’s flexibility. In the article he describes the (Perl 6) techniques he used to create this markup language, and also lists the source that he created to render the HTML of the article (Twitter, Reddit comments).

OSCON 2018

Jim Brandt reports on the Perl booth at OSCON in Portland, OR this year. Caution: contains explicit pictures of swag.

No Perl DevRoom at FOSDEM 2019

The organizers of the Perl DevRoom at FOSDEM have been told that there will not be a Perl DevRoom at the coming FOSDEM. This is a big disappointment, but in view of the number of groups wanting a DevRoom (about 140) and the number of rooms available (about 30), Perl has had a good run in the past years (FaceBook comments).

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on FaceBook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New Modules:

Updated Modules:

Winding Down

A busy week again, with some sad news with regards to FOSDEM. But please don’t cancel your travel plans to Brussels because of that: one could also see this as an opportunity to spread knowledge about Perl to other tracks. So if you see an opportunity of submitting a presentation about Perl (be it Perl 5 or Perl 6), please do take that opportunity to get out of the echo chamber! And on that note yours truly wishes everybody a good week. Until next week, for more Perl 6 news!

Jo Christian Oterhals: Not bad! My only issue with it is that depending on your point of view the Pearly Gate is either…

Published by Jo Christian Oterhals on 2018-10-11T11:22:24

Not bad! My only issue with it is that depending on your point of view the Pearly Gate is either the beginning of something or the end. How do we avoid the latter?

Jo Christian Oterhals: Bill Ricker: I didn’t know about the peculiarities of perl’s rules for conversion.

Published by Jo Christian Oterhals on 2018-10-09T08:45:25

Bill Ricker: I didn’t know about the peculiarities of perl’s rules for conversion. It’s great to learn something new about 5 as well since I’m still using Perl 5 for tasks at work. True story: There’s much I don’t know, because for a few years now I have just assumed that the version of Perl that ships with Mac OS X is v5.6. So it’s just recently I’ve understood that Apple has upgraded and made 5.18 available. I have a lot of catching up to do!

Zoffix Znet: A Request to Larry Wall to Create a Language Name Alias for Perl 6

Published on 2018-10-07T00:00:00

The culmination of the naming discussion

6guts: Speeding up object creation

Published by jnthnwrthngtn on 2018-10-06T22:59:11

Recently, a Perl 6 object creation benchmark result did the rounds on social media. This Perl 6 code:

class Point {
    has $.x;
    has $.y;
}
my $total = 0;
for ^1_000_000 {
    my $p = Point.new(x => 2, y => 3);
    $total = $total + $p.x + $p.y;
}
say $total;

Now (on HEAD builds of Rakudo and MoarVM) runs faster than this roughly equivalent Perl 5 code:

use v5.10;

package Point;

sub new {
    my ($class, %args) = @_;
    bless \%args, $class;
}

sub x {
    my $self = shift;
    $self->{x}
}

sub y {
    my $self = shift;
    $self->{y}
}

package main;

my $total = 0;
for (1..1_000_000) {
    my $p = Point->new(x => 2, y => 3);
    $total = $total + $p->x + $p->y;
}
say $total;

(Aside: yes, I know there’s no shortage of libraries in Perl 5 that make OO nicer than this, though those I tried also made it slower.)

This is a fairly encouraging result: object creation, method calls, and attribute access are the operational bread and butter of OO programming, so it’s a pleasing sign that Perl 6 on Rakudo/MoarVM is getting increasingly speedy at them. In fact, it’s probably a bit better at them than this benchmark’s raw numbers show, since:

While dealing with Int values got faster recently, it’s still really making two Int objects every time around that loop and having to GC them later. An upcoming new set of analyses and optimizations will let us get rid of that cost too. And yes, startup will get faster with time also. In summary, while Rakudo/MoarVM is now winning that benchmark against Perl 5, there’s still lots more to be had. Which is a good job, since the equivalent Python and Ruby versions of that benchmark still run faster than the Perl 6 one.

In this post, I’ll look at the changes that allowed this benchmark to end up faster. None of the new work was particularly ground-breaking; in fact, it was mostly a case of doing small things to make better use of optimizations we already had.

What happens during construction?

Theoretically, the default new method in turn calls bless, passing the named arguments along. The bless method then creates an object instance, followed by calling BUILDALL. The BUILDALL method goes through the set of steps needed for constructing the object. In the case of a simple object like ours, that involves checking if an x and y named argument were passed, and if so assigning those values into the Scalar containers of the object attributes.

For those keeping count, that’s at least 3 method calls (new, bless, and BUILDALL).

However, there’s a cheat. If bless wasn’t overridden (which would be an odd thing to do anyway), then the default new can just call BUILDALL directly. Therefore, new looks like this:

multi method new(*%attrinit) {
    nqp::if(
      nqp::eqaddr(
        (my $bless := nqp::findmethod(self,'bless')),
        nqp::findmethod(Mu,'bless')
      ),
      nqp::create(self).BUILDALL(Empty, %attrinit),
      $bless(|%attrinit)
    )
}
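The bless shortcut can be modelled outside of Rakudo too. Below is a Python sketch of the idea only, not Rakudo’s implementation: a function-identity check stands in for nqp::eqaddr on the two findmethod results, and the class/method names merely mirror the Perl 6 ones.

```python
class Mu:
    @classmethod
    def bless(cls, **attrinit):
        obj = object.__new__(cls)
        obj.BUILDALL(**attrinit)
        return obj

    @classmethod
    def new(cls, **attrinit):
        # eqaddr-style check: is this class's bless still Mu's bless?
        if cls.bless.__func__ is Mu.bless.__func__:
            obj = object.__new__(cls)      # fast path: skip the bless call
            obj.BUILDALL(**attrinit)
            return obj
        return cls.bless(**attrinit)       # bless was overridden: honor it

    def BUILDALL(self, **attrinit):
        # Assign any named arguments that match declared attributes.
        for name in getattr(self, 'ATTRS', ()):
            if name in attrinit:
                setattr(self, name, attrinit[name])

class Point(Mu):
    ATTRS = ('x', 'y')

class LoggedPoint(Point):
    @classmethod
    def bless(cls, **attrinit):            # an override forces the slow path
        obj = super().bless(**attrinit)
        obj.blessed = True
        return obj
```

Point.new takes the fast path; LoggedPoint.new goes through its custom bless.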

The BUILDALL method was originally a little “interpreter” that went through a per-object build plan stating what needs to be done. However, for quite some time now we’ve instead compiled that build plan into a per-class BUILDALL method.

Slurpy sadness

To figure out how to speed this up, I took a look at how the specializer was handling the code. The answer: not so well. There were certainly some positive moments in there. Of note:

However, the new method was getting only a “certain” specialization, which is usually only done for very heavily polymorphic code. That wasn’t the case here; this program clearly constructs overwhelmingly one type of object. So what was going on?

In order to produce an “observed type” specialization – the more powerful kind – it needs to have data on the types of all of the passed arguments. And for the named arguments, that was missing. But why?

Logging of passed argument types is done on the callee side, since:

The argument logging was done as the interpreter executed each parameter processing instruction. However, when there’s a slurpy, then it would just swallow up all the remaining arguments without logging type information. Thus the information about the argument types was missing and we ended up with a less powerful form of specialization.

Teaching the slurpy handling code about argument logging felt a bit involved, but then I realized there was a far simpler solution: log all of the things in the argument buffer at the point an unspecialized frame is being invoked. We’re already logging the entry to the call frame at that point anyway. This meant that all of the parameter handling instructions got a bit simpler too, since they no longer had logging to do.
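The logging change described above can be sketched in miniature. This Python model is an illustration only (CallSiteStats and its methods are invented names, not MoarVM API): every value in the argument buffer is logged at frame entry, before any parameter, slurpy or not, gets to swallow it, and a specialization is only attempted once one signature dominates.

```python
from collections import Counter

class CallSiteStats:
    """Log observed argument types at frame entry, before any
    parameter processing (so a slurpy cannot hide them)."""
    def __init__(self):
        self.observed = Counter()

    def log_entry(self, args, kwargs):
        # Record the type of everything in the argument buffer.
        sig = (tuple(type(a).__name__ for a in args)
               + tuple((k, type(v).__name__)
                       for k, v in sorted(kwargs.items())))
        self.observed[sig] += 1

    def stable_signature(self, threshold=0.99):
        # Only produce an "observed type" specialization when one
        # signature shows up in at least `threshold` of the calls.
        total = sum(self.observed.values())
        sig, n = self.observed.most_common(1)[0]
        return sig if n / total >= threshold else None

# The benchmark's new(x => 2, y => 3) call, a hundred times over:
stats = CallSiteStats()
for _ in range(100):
    stats.log_entry((object(),), {'x': 2, 'y': 3})
sig = stats.stable_signature()
```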

Conditional elimination

Having new specialized in a more significant way was an immediate win. Of note, this part:

      nqp::eqaddr(
        (my $bless := nqp::findmethod(self,'bless')),
        nqp::findmethod(Mu,'bless')
      ),

This was quite interesting. Since we were now specializing on the type of self, the findmethod could be resolved into a constant. The resolution of a method on the constant Mu was also a constant. Therefore, the result of the eqaddr (same memory address) comparison of two constants should also have been turned into a constant…except that wasn’t happening! This time, it was simple: we just hadn’t implemented folding of that yet. So, that was an easy win, and once done it meant the optimizer could see that the if would always go a certain way, and thus optimize the whole thing into the chosen branch. Thus the new method was specialized into something like:

multi method new(*%attrinit) {
    nqp::create(self).BUILDALL(Empty, %attrinit)
}

Further, the create could be optimized into a “fast create” op, and the relatively small BUILDALL method could be inlined into new. Not bad.
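The folding-then-branch-elimination step described above is a classic constant-folding pass. Here is a small Python sketch over a toy expression tree (the node shapes are invented for illustration; MoarVM works on its own spesh IR, not on tuples like these): an eqaddr of two constants folds to a constant, which then lets the if collapse to a single branch.

```python
def fold(node):
    """Tiny constant folder over ('const', v), ('eqaddr', a, b)
    and ('if', cond, then, else) nodes."""
    kind = node[0]
    if kind == 'eqaddr':
        a, b = fold(node[1]), fold(node[2])
        if a[0] == 'const' and b[0] == 'const':
            return ('const', a[1] is b[1])   # identity test, like eqaddr
        return ('eqaddr', a, b)
    if kind == 'if':
        cond = fold(node[1])
        if cond[0] == 'const':
            # Branch decided at optimization time: keep only that arm.
            return fold(node[2] if cond[1] else node[3])
        return ('if', cond, fold(node[2]), fold(node[3]))
    return node

# Both findmethod lookups resolved to the same constant method:
meth = object()
tree = ('if', ('eqaddr', ('const', meth), ('const', meth)),
        ('const', 'fast-create-path'), ('const', 'call-bless-path'))
folded = fold(tree)
```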

Generating simpler code

At this point, things were much better, but still not quite there. I took a look at the BUILDALL method compilation, and realized that it could emit faster code.

The %attrinit is a Perl 6 Hash object, which is for the most part a wrapper around the lower-level VM hash object, which is the actual hash storage. We were, curiously, already pulling this lower-level hash out of the Hash object and using the nqp::existskey primitive to check if there was a value, but were not doing that for actually accessing the value. Instead, an .AT-KEY('x') method call was being done. While that was being handled fairly well – inlined and so forth – it also does its own bit of existence checking. I realized we could just use the nqp::atkey primitive here instead.

Later on, I also realized that we could do away with nqp::existskey and just use nqp::atkey. Since a VM-level null is something that never exists in Perl 6, we can safely use it as a sentinel to mean that there is no value. That got us down to a single hash lookup.
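The single-lookup trick translates directly to any hash API that supports a default value. A Python sketch (the attribute names and helper are made up for illustration): a unique sentinel plays the role of the VM-level null, so one get replaces the existskey-then-atkey pair.

```python
_ABSENT = object()   # stands in for the VM-level null sentinel

def build_from(obj, attrinit, attrs=('x', 'y')):
    """One dict lookup per attribute: atkey with a sentinel default,
    instead of existskey followed by atkey."""
    for name in attrs:
        value = attrinit.get(name, _ABSENT)
        if value is not _ABSENT:
            setattr(obj, name, value)
    return obj

class Obj:
    pass

point = build_from(Obj(), {'x': 2})
```

The sentinel must be an object that can never be a legitimate value, which is exactly why the VM-level null (something that never exists in Perl 6 space) is safe to use there.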

By this point, we were just about winning the benchmark, but only by a few percent. Were there any more easy pickings?

An off-by-one

My next surprise was that the new method didn’t get inlined. It was within the size limit. There was nothing preventing it. What was going on?

Looking closer, it was even worse than that. Normally, when something is too big to be inlined, but we can work out what specialization will be called on the target, we do “specialization linking”, indicating which specialization of the code to call. That wasn’t happening. But why?

Some debugging later, I sheepishly fixed an off-by-one in the code that looks through a multi-dispatch cache to see if a particular candidate matches the set of argument types we have during optimization of a call instruction. This probably increased the performance of quite a few calls involving passing named arguments to multi-methods – and meant new was also inlined, putting us a good bit ahead on this benchmark.

What next?

The next round of performance improvements – both to this code and plenty more besides – will come from escape analysis, scalar replacement, and related optimizations. I’ve already started on that work, though expect it will keep me busy for quite a while. I will, however, be able to deliver it in stages, and am optimistic I’ll have the first stage of it ready to talk about – and maybe even merge – in a week or so.

brrt to the future: A future for fork(2)

Published by Bart Wiegmans on 2018-10-04T05:18:00

Hi hackers. Today I want to write about new functionality that I've been developing for MoarVM that has very little to do with the JIT compiler. But it is still about VM internals so I guess it will fit.

Many months ago, jnthn wrote a blog post on the relation between perl 5 and perl 6. And as a longtime and enthusiastic perl 5 user - most of the JIT's compile time support software is written in perl 5 for a reason - I wholeheartedly agree with the 'sister language' narrative. There is plenty of room for all sorts of perls yet, I hope. Yet one thing kept itching me:
Moreover, it’s very much the case that Perl 5 and Perl 6 make different trade-offs. To pick one concrete example, Perl 6 makes it easy to run code across multiple threads, and even uses multiple threads internally (for example, performing optimization and JIT compilation on a background thread). Which is great…except the only winning move in a game involving both threads and fork() is not to play. Sometimes one just can’t have their cake and eat it, and if you’re wanting a language that more directly gives you your POSIX, that’s probably always going to be a strength of Perl 5 over Perl 6.
(Emphasis mine). You see, I had never realized that MoarVM couldn't implement fork(), but it's true. In POSIX systems, a fork()'d child process inherits the full memory space, as-is, from its parent process. But it inherits only the forking thread. This means that any operations performed by any other thread, including operations that might need to be protected by a mutex (e.g. malloc()), will be interrupted and unfinished (in the child process). This can be a problem. Or, in the words of the linux manual page on the subject:

       *  The child process is created with a single thread—the one that
          called fork(). The entire virtual address space of the parent is
          replicated in the child, including the states of mutexes,
          condition variables, and other pthreads objects; the use of
          pthread_atfork(3) may be helpful for dealing with problems that
          this can cause.

       *  After a fork() in a multithreaded program, the child can safely
          call only async-signal-safe functions (see signal-safety(7)) until
          such time as it calls execve(2).

Note that the set of signal-safe functions is not that large, and excludes all memory management functions. As noted by jnthn, MoarVM is inherently multithreaded from startup, and will happily spawn as many threads as needed by the program. It also uses malloc() and friends rather a lot. So it would seem that perl 6 cannot implement POSIX fork() (in which both parent and child program continue from the same position in the program) as well as perl 5 can.

I was disappointed. As a longtime (and enthusiastic) perl 5 user, fork() is one of my favorite concurrency tools. Its best feature is that parent and child processes are isolated by the operating system, so each can modify its own internal state without concern for concurrency. This can make it practical to introduce concurrency after development, rather than designing it in from the start. Either process can crash while the other continues. It is also possible (and common) to run a child process with reduced privileges relative to the parent process, which isn't possible with threads. And it is possible to prepare arbitrarily complex state for the child process, unlike spawn() and similar calls.

Fortunately all hope isn't necessarily lost. The restrictions listed above only apply if there are multiple threads active at the moment that fork() is executed. That means that if we can stop all threads (except for the one planning to fork) before actually calling fork(), then the child process can continue safely. That is well within the realm of possibility, at least as far as system threads are concerned.

The process itself isn't very exciting to talk about, actually - it involves sending stop signals to system threads, waiting for these threads to join, verifying that the forking thread is really the only active thread, and restarting threads after fork(). Because of locking, it is a bit subtle (there may be another user thread that is also trying to fork), but not actually very hard. When I finally merged the code, it turned out that an ancient (and forgotten) thread list modification function was corrupting the list by not being aware of generational garbage collection... oops. But that was simple enough to fix (thanks to timotimo++ for the hint).
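The stop-park-fork-restart dance can be sketched with ordinary threading primitives. This Python model is purely illustrative (ForkSafeVM and its members are invented names, and the "fork" is just a callback run at the safe point rather than a real fork(2)): background threads are asked to stop, park at a barrier, and only then does the forking thread proceed.

```python
import threading
import time

class ForkSafeVM:
    """Sketch: park all system threads before a fork-like safe point."""
    def __init__(self, n_threads=3):
        self.stop_request = threading.Event()
        self.resume = threading.Event()
        self.parked = threading.Barrier(n_threads + 1)
        self.threads = [threading.Thread(target=self._worker)
                        for _ in range(n_threads)]
        for t in self.threads:
            t.start()

    def _worker(self):
        while not self.stop_request.is_set():
            time.sleep(0.001)        # pretend to do background work
        self.parked.wait()           # report: this thread has stopped
        self.resume.wait()           # block until the fork is done

    def fork(self, at_safe_point):
        self.stop_request.set()      # ask system threads to stop
        self.parked.wait()           # all of them are now parked
        result = at_safe_point()     # only the forking thread runs here
        self.resume.set()            # let the workers continue
        for t in self.threads:
            t.join()
        return result

vm = ForkSafeVM(3)
result = vm.fork(lambda: "safe point reached")
```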

And now the following one-liner should work on platforms that support fork() (using development versions of MoarVM and Rakudo):

perl6 -e 'use nqp; my $i = nqp::fork(); say $i;'

The main limitation of this work is that it won't work if there are any user threads active. (If there are any, nqp::fork() will throw an exception). The reason why is simple: while it is possible to adapt the system threads so that I can stop them on demand, user threads may be doing arbitrary work, hold arbitrary locks and may be blocked (possibly indefinitely) on a system call. So jnthn's comment at the start of this post still applies - threads and fork() don't work together.

In fact, many programs may be better off with threads. But I think that languages in the perl family should let the user make that decision, rather than the VM. So I hope that this will find some use somewhere. If not, it was certainly fun to figure out. Happy hacking!


PS: For the curious, I think there may in fact be a way to make fork() work under a multithreaded program, and it relies on the fact that MoarVM has a multithreaded garbage collector. Basically, stopping all threads before calling fork() is not so different from stopping the world during the final phase of garbage collection. And so - in theory - it would be possible to hijack the synchronization mechanism of the garbage collector to pause all threads. During interpretation, and in JIT compiled code, each thread periodically checks if garbage collection has started. If it has, it will synchronize with the thread that started GC in order to share the work. Threads that are currently blocked (on a system call, or on acquiring a lock) mark themselves as such, and when they are resumed always check the GC status. So it is in fact possible to force MoarVM into a single active thread even with multiple user threads active. However, that still leaves cleanup to deal with, after returning from fork() in the child process. Also, this will not work if a thread is blocked on NativeCall. Altogether I think abusing the GC in order to enable a fork() may be a bit over the edge of insanity :-)

Jo Christian Oterhals: Good points as always.

Published by Jo Christian Oterhals on 2018-09-30T10:18:43

Good points as always. I didn’t state it explicitly in the article, but I noted in the comment above that I believe the sqrt(2)*sqrt(2) behavior is a trick on Perl 5’s part (as is any calculations of sqrt(n) * n). For all practical purposes — at least mine — it is of no consequence whether the result is exact or an approximation close to perfect. The only reason for testing this was whether Perl 6 passed Matt Parker’s fun calculator tests.

As for trying FatRat: The only thing I wanted to check here was whether FatRat had its own implementation of sin, using an internal “FatNum” type for the calculation. It did not. Julia has the Float128 type and Perl 5 can use bignum should that level of precision be necessary.

Personally I have no use for bignums, other than for demonstration purposes like this. Take my Cobol vs Perl 6 article where I implemented a version of Muller’s Recurrence. There I praised Perl 6’s Rats. But someone commented that even Rats will break given a high enough number of iterations. I have never read the Perl 6 specification, so I don’t know whether this is on the road map. But it seems to me there could be some use for a use bignum type of functionality in a case like that, for instance by using FatRats instead of Rats under the hood when declaring stuff like my $a = 1/9. The same goes for Nums.

At least it could have been useful in a just-for-fun article such as this :-)

6guts: Eliminating unrequired guards

Published by jnthnwrthngtn on 2018-09-29T19:59:28

MoarVM’s optimizer can perform speculative optimization. It does this by gathering statistics as the program is interpreted, and then analyzing them to find out what types and callees typically show up at given points in the program. If it spots there is at least a 99% chance of a particular type showing up at a particular program point, then it will optimize the code ahead of that point as if that type would always show up.

Of course, statistics aren’t proofs. What about the 1% case? To handle this, a guard instruction is inserted. This cheaply checks if the type is the expected one, and if it isn’t, falls back to the interpreter. This process is known as deoptimization.

Just how cheap are guards?

I just stated that a guard cheaply checks if the type is the expected one, but just how cheap is it really? There are both direct and indirect costs.

The direct cost is that of the check. Here’s a (slightly simplified) version of the JIT compiler code that produces the machine code for a type guard.

/* Load object that we should guard */
| mov TMP1, WORK[obj];
/* Get type table we expect and compare it with the object's one */
MVMint16 spesh_idx = guard->ins->operands[2].lit_i16;
| get_spesh_slot TMP2, spesh_idx;
| cmp TMP2, OBJECT:TMP1->st;
| jne >1;
/* We're good, no need to deopt */
| jmp >2;
|1:
/* Call deoptimization handler */
| mov ARG1, TC;
| mov ARG2, guard->deopt_offset;
| mov ARG3, guard->deopt_target;
| callp &MVM_spesh_deopt_one_direct;
/* Jump out to the interpreter */
| jmp ->exit;
|2:

Where get_spesh_slot is a macro like this:

|.macro get_spesh_slot, reg, idx;
| mov reg, TC->cur_frame;
| mov reg, FRAME:reg->effective_spesh_slots;
| mov reg, OBJECTPTR:reg[idx];
|.endmacro

So, in the case that the guard matches, it’s 7 machine instructions (note: it’s actually a couple more because of something I omitted for simplicity). Thus there’s the cost of the time to execute them, plus the space they take in memory and, especially, the instruction cache. Further, one is a conditional jump. We’d expect it to be false most of the time, and so the CPU’s branch predictor should get a good hit rate – but branch predictor usage isn’t entirely free of charge either. Effectively, it’s not that bad, but it’s nice to save the cost if we can.

The indirect costs are much harder to quantify. In order to deoptimize, we need to have enough state to recreate the world as the interpreter expects it to be. I wrote on this topic not so long ago, for those who want to dive into the detail, but the essence of the problem is that we may have to retain some instructions and/or forgo some optimizations so that we are able to successfully deoptimize if needed. Thus, the presence of a guard constrains what optimizations we can perform in the code around it.

Representation problems

A guard instruction in MoarVM originally looked like:

sp_guard r(obj) sslot uint32

Where r(obj) is an object register to read containing the object to guard, the sslot is a spesh slot (an entry in a per-block constant table) containing the type we expect to see, and the uint32 indicates the target address after we deoptimize. Guards are inserted after instructions for which we had gathered statistics and determined there was a stable type. Things guarded include return values after a call, reads of object attributes, and reads of lexical variables.

This design has carried us a long way, however it has a major shortcoming. The program is represented in SSA form. Thus, an invoke followed by a guard might look something like:

invoke r6(5), r4(2)
sp_guard r6(5), sslot(42), litui32(64)

Where r6(5) has the return value written into it (and thus is a new SSA version of r6). We hold facts about a value (if it has a known type, if it has a known value, etc.) per SSA version. So the facts about r6(5) would be that it has a known type – the one that is asserted by the guard.

The invoke itself, however, might be optimized by performing inlining of the callee. In some cases, we might then know the type of result that the inlinee produces – either because there was a guard inside of the inlined code, or because we can actually prove the return type! However, since the facts about r6(5) were those produced by the guard, there was no way to talk about what we know of r6(5) before the guard and after the guard.

More awkwardly, while in the early days of the specializer we only ever put guards immediately after the instructions that read values, more recent additions might insert them at a distance (for example, in speculative call optimizations and around spesh plugins). In this case, we could not safely set facts on the guarded register, because those might lead to wrong optimizations being done prior to the guard.

Changing of the guards

Now a guard instruction looks like this:

sp_guard w(obj) r(obj) sslot uint32

Or, concretely:

invoke r6(5), r4(2)
sp_guard r6(6), r6(5), sslot(42), litui32(64)

That is to say, it introduces a new SSA version. This means that we get a way to talk about the value both before and after the guard instruction. Thus, if we perform an inlining and we know exactly what type it will return, then that type information will flow into the input – in our example, r6(5) – of the guard instruction. We can then notice that the property the guard wants to assert is already upheld, and replace the guard with a simple set (which may itself be eliminated by later optimizations).

This also solves the problem with guards inserted away from the original write of the value: we get a new SSA version beyond the guard point. This in turn leads to more opportunities to avoid repeated guards beyond that point.

Quite a lot of return value guards on common operations simply go away thanks to these changes. For example, in $a + $b, where $a and $b are Int, we will be able to inline the + operator, and we can statically see from its code that it will produce an Int. Thus, the guard on the return type in the caller of the operator can be eliminated. This saves the instructions associated with the guard, and potentially allows for further optimizations to take place since we know we’ll never deoptimize at that point.
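The guard-to-set rewrite can be sketched as a small pass over instruction tuples. This Python model is illustrative only (the tuple shapes and eliminate_guards are invented, not spesh's real IR): when the facts already prove what the guard would assert, the guard becomes a plain copy, and either way the output SSA version gains the asserted fact.

```python
def eliminate_guards(instructions, facts):
    """instructions: tuples like ('guard', dst, src, type) or others.
    facts: dict mapping SSA version -> known type."""
    out = []
    for ins in instructions:
        if ins[0] == 'guard':
            _, dst, src, expected = ins
            if facts.get(src) == expected:
                ins = ('set', dst, src)   # guard provably true: plain copy
            facts[dst] = expected         # beyond this point, type is known
        out.append(ins)
    return out

# Inlining proved r6(5) is an Int, so the guard writing r6(6) goes away:
facts = {'r6(5)': 'Int'}
optimized = eliminate_guards([('guard', 'r6(6)', 'r6(5)', 'Int')], facts)
```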

In summary

MoarVM does lots of speculative optimization. This enables us to optimize in cases where we can’t prove a property of the program, but statistics tell us that it mostly behaves in a certain way. We make this safe by adding guards, and falling back to the general version of the code in cases where they fail.

However, guards have a cost. By changing our representation of them, so that we model the data coming into the guard and after the guard as two different SSA versions, we are able to eliminate many guard instructions. This not only reduces duplicate guards, but also allows for elimination of guards when the broader view afforded by inlining lets us prove properties that we weren’t previously able to.

In fact, upcoming work on escape analysis and scalar replacement will allow us to start seeing into currently opaque structures, such as Scalar containers. When we are able to do that, then we’ll be able to prove further program properties, leading to the elimination of yet more guards. Thus, this work is not only immediately useful, but also will help us better exploit upcoming optimizations.

6guts: Faster box/unbox and Int operations

Published by jnthnwrthngtn on 2018-09-28T22:43:55

My work on Perl 6 performance continues, thanks to a renewal of my grant from The Perl Foundation. I’m especially focusing on making common basic operations faster, the idea being that if those go faster, then programs composed out of them should too. This appears to be working out well: I’ve not been directly trying to make the Text::CSV benchmark run faster, but that’s resulted from my work.

I’ll be writing a few posts on various of the changes I’ve done. This one will take a look at some related optimizations around boxing, unboxing, and common mathematical operations on Int.

Boxing and unboxing

Boxing is taking a natively typed value and wrapping it into an object. Unboxing is the opposite: taking an object that wraps a native value and getting the native value out of it.

In Perl 6, these are extremely common. Num and Str are boxes around num and str respectively. Thus, unless dealing with natives explicitly, working with floating point numbers and strings will involve lots of box and unbox operations.

There’s nothing particularly special about Num and Str. They are normal objects with the P6opaque representation, meaning they can be mixed into. The only thing that makes them slightly special is that they have attributes marked as being a box target. This marks the attribute as the one to write to or read from in a box or unbox operation.

Thus, a box operation is something like:

  1. Allocate an instance of the box type
  2. Find where in the object the box target attribute lives
  3. Write the native value there

While unbox is:

  1. Find where in the object the box target attribute lives
  2. Read the native value from there

Specialization of box and unbox

box is actually two operations: an allocation and a store. We know how to fast-path allocations and JIT them relatively compactly, however that wasn’t being done for box. So, step one was to decompose this higher-level op into the allocation and the write into the allocated object. The first step could then be optimized in the usual way allocations are.

In the unspecialized path, we first find out where to write the native value to, and then write it. However, when we’re producing a specialized version, we almost always know the type we’re boxing into. Therefore, the object offset to write to can be calculated once, and a very simple instruction to do a write at an offset into the object produced. This JITs extremely well.

There are a couple of other tricks. Binds into a P6opaque generally have to check that the object wasn’t mixed in to, however since we just allocated it then we know that can’t be the case and can skip that check. Also, a string is a garbage-collectable object, and when assigning one GC-able object into another one, we need to perform a write barrier for the sake of generational GC. However, since the object was just allocated, we know very well that it is in the nursery, and so the write barrier will never trigger. Thus, we can omit it.

Unboxing is easier to specialize: just calculate the offset, and emit a simpler instruction to load the value from there.
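The offset-at-specialization-time idea can be mimicked with raw byte buffers. A Python sketch for illustration only (BoxLayout and its fixed 8-byte slots are invented; P6opaque layouts are more involved): the offset is computed once when the layout is built, so each box or unbox is a single read or write at a known offset.

```python
import struct

class BoxLayout:
    """Compute the box-target offset once, at 'specialization time'."""
    def __init__(self, fields, box_target):
        self.offsets = {}
        off = 0
        for name in fields:
            self.offsets[name] = off
            off += 8                       # every slot is 8 bytes here
        self.size = off
        self.box_offset = self.offsets[box_target]

    def box(self, value):
        obj = bytearray(self.size)         # 'fast create': zeroed memory
        # Direct write at the precomputed offset; no lookup per box.
        struct.pack_into('<q', obj, self.box_offset, value)
        return obj

    def unbox(self, obj):
        # Direct read at the precomputed offset.
        return struct.unpack_from('<q', obj, self.box_offset)[0]

layout = BoxLayout(('value',), 'value')
boxed = layout.box(42)
```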

I’ve also started some early work (quite a long way from merge) on escape analysis, which will allow us to eliminate many box object allocations entirely. It’s a great deal easier to implement this if allocations, reads, and writes to an object have a uniform representation. By lowering box and unbox operations into these lower level operations, this eases the path to implementing escape analysis for them.

What about Int?

Some readers might have wondered why I talked about Num and Str as examples of boxed types, but not Int. It is one too – but there’s a twist. Actually, there are two twists.

The first is that Int isn’t actually a wrapper around an int, but rather an arbitrary precision integer. When we first implemented Int, we had it always use a big integer library. Of course, this is slow, so later on we made it so any number fitting into a 32-bit range would be stored directly, and only allocate a big integer structure if it’s outside of this range.

Thus, boxing to a big integer means range-checking the value to box. If it fits into the 32-bit range, then we can write it directly, and set the flag indicating that it’s a small Int. Machine code to perform these steps is now spat out directly by the JIT, and we only fall back to a function call in the case where we need a big integer. Once again, the allocation itself is emitted in a more specialized way too, and the offset to write to is determined once at specialization time.

Unboxing is similar. Provided we’re specializing on the type of the object to unbox, then we calculate the offset at specialization time. Then, the JIT produces code to check if the small Int flag is set, and if so just reads and sign extends the value into a 64-bit register. Otherwise, it falls back to the function call to handle the real big integer case.

For boxing, however, there was a second twist: we have a boxed integer cache, so for small integers we don’t have to repeatedly allocate objects boxing them. So boxing an integer is actually:

  1. Check if it’s in the range of the box cache
  2. If so, return it from the cache
  3. Otherwise, do the normal boxing operation

When I first did these changes, I omitted the use of the box cache. It turns out, however, to have quite an impact in some programs: one benchmark I was looking at suffered quite a bit from the missing box cache, since it now had to do a lot more garbage collection runs.

So, I reinstated use of the cache, but this time with the JIT doing the range checks in the produced machine code and reading directly out of the cache in the case of a hit. Thus, in the cache hit case, we now don’t even make a single function call for the box operation.

Faster Int operations

One might wonder why we picked 32-bit integers as the limit for the small case of a big integer, and not 64-bit integers. There’s two reasons. The most immediate is that we can then use the 32 bits that would be the lower 32 of a 64-bit pointer to the big integer structure as our “this is a small integer” flag. This works reliably as pointers are always aligned to at least a 4-byte boundary, so a real pointer to a big integer structure would never have the lowest bits set. (And yes, on big-endian platforms, we swap the order of the flag and the value to account for that!)

The second reason is that there’s no portable way in C to detect if a calculation overflowed. However, out of the basic math operations, if we have two inputs that fit into a 32-bit integer, and we do them at 64-bit width, we know that the result can never overflow the 64-bit integer. Thus we can then range check the result and decide whether to store it back into the result object as 32-bit, or to store it as a big integer.

Since Int is immutable, all operations result in a newly allocated object. This allocation – you’ll spot a pattern by now – is open to being specialized. Once again, finding the boxed value to operate on can also be specialized, by calculating its offset into the input objects and result object. So far, so familiar.

However, there’s a further opportunity for improvement if we are JIT-compiling the operations to machine code: the CPU has flags for if the last operation overflowed, and we can get at them. Thus, for two small Int inputs, we can simply:

  1. Grab the values
  2. Do the calculation at 32-bit width
  3. Check the flags, and store it into the result object if no overflow
  4. If it overflowed, fall back to doing it wider and storing it as a real big integer

I’ve done this for addition, subtraction, and multiplication. Those looking for a MoarVM specializer/JIT task might like to consider doing it for some of the other operations. :-)
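The small-Int addition flow above can be sketched in Python, with the caveat that Python has no overflow flags, so the range check on a wider result stands in for them (the ('small', n) / ('big', n) tuples are an invented representation, not MoarVM's): since two values in 32-bit range can never overflow a 64-bit addition, computing wide and range-checking the result is always safe.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_int(a, b):
    """a, b: ('small', n) with n in 32-bit range, or ('big', n)."""
    if a[0] == 'small' and b[0] == 'small':
        result = a[1] + b[1]           # at "64-bit width": cannot overflow
        if INT32_MIN <= result <= INT32_MAX:
            return ('small', result)   # store back as a small Int
    # Fall back: store the result as a real big integer.
    return ('big', a[1] + b[1])
```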

In summary

Boxing, unboxing, and math on Int all came with various indirections for the sake of generality (coping with mixins, subclassing, and things like IntStr). However, when we are producing type-specialized code, we can get rid of most of the indirections, resulting in being able to perform them faster. Further, when we JIT-compile the optimized result into machine code, we can take some further opportunities, reducing function calls into C code as well as taking advantage of access to the overflow flags.

samcv: Adding and Improving File Encoding Support in MoarVM

Published on 2018-09-26T07:00:00

Encodings. They may seem to some as horribly boring and bland. Nobody really wants to have to worry about the details. Encodings are how we communicate, transmit data. Their purpose is to be understood by others. When they work, nobody should even know they are there. When they don’t — and everyone can probably remember a time when they have tried to open a poorly encoded or corrupted file and been disappointed — they cannot be ignored.

Here I talk about the work I have done in the past and work still in progress to improve the support of current encodings, add new encodings, and add new features and new options to allow Perl 6 and MoarVM to support more encodings and in a way which better achieves the goals encodings were designed to solve.

Which Encodings Should We Add?

I started looking at the most common encodings on the Web. We supported the two most common, UTF-8 and ISO-8859-1, but we did not support windows-1251 (Cyrillic) or Shift-JIS. This seemed like the makings of a new project, so I got to work.

I decided to start with windows-1251 since it was an 8-bit encoding, while Shift-JIS was a variable-length one- or two-byte encoding (the "shift" in the name comes from the second byte's shift). While the encoding itself was simpler than Shift-JIS, I soon realized some issues with both windows-1251 and our already supported windows-1252 encoding.

One of the first major issues I found was that with windows-1252 you could create files which could not be decoded by other programs. An example of this is codepoint 0x81 (129), which does not have a mapping in the windows-1252 specification. This would cause many programs to not detect the file as windows-1252, or to fail, saying the file was corrupted and could not be decoded.

While our non-8-bit encodings would throw an error if asked to encode text which did not exist in that codemapping, our 8-bit encodings would happily pass through invalid codepoints, as long as they fit in 8 bits.

As I said at the start, encodings are a way for us to communicate, to exchange data and to be understood. This to me indicated a major failing. While the solution could be to never attempt to write codepoints that don’t exist in the target encoding, that was not an elegant solution and would just be a workaround.

To remedy this I had to add new ops, such as decodeconf to replace the decode op, and encodeconf to replace encode (plus many others). These allow us to specify a configuration for our encodings, letting Perl 6 tell MoarVM whether it wants to encode strictly according to the standard or be more lenient.

I added a new :strict option to open, encode and a few others to let you specify whether encoding and decoding should be strict or not. For the sake of backwards compatibility it still defaults to lenient encoding and decoding. Strict is planned to become the default for Perl 6 specification 6.e.
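
Python's codec layer draws the same strict/lenient distinction, which makes a convenient illustration of why the option matters. The byte 0x81 below is exactly the unmapped windows-1252 codepoint mentioned above (Python calls the encoding cp1252):

```python
data = bytes([0x81])                 # no mapping in windows-1252

# Strict decoding refuses the invalid byte...
try:
    data.decode('cp1252')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
assert not strict_ok

# ...while a lenient mode substitutes U+FFFD and carries on.
assert data.decode('cp1252', errors='replace') == '\ufffd'
```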

Replacements

Some of our other encodings had support for 'replacements'. If you tried to encode something that would not fit in the target encoding, this allowed you to supply a string (one or more characters) which would be substituted instead of causing MoarVM to throw an exception.

Once I had the strictness nailed down, I was able to add support for replacements, so one can write data in strict mode while still ensuring all compatible characters get written properly. You no longer have to choose between being lenient (and writing incompatible files) and being strict (and having the encode or file write fail when you really need it not to).
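
Python exposes the same replacement mechanism through codec error handlers, which can serve as a reference point. The 'tm-fallback' handler below is a made-up example showing a multi-character replacement string:

```python
import codecs

text = 'naïve™'

# The stock 'replace' handler substitutes a single '?' when encoding:
assert text.encode('ascii', errors='replace') == b'na?ve?'

# A custom handler can supply a multi-character replacement string
# (the handler name and logic here are purely illustrative):
def tm_fallback(err):
    bad = err.object[err.start:err.end]
    return ('(TM)' if bad == '\u2122' else '?', err.end)

codecs.register_error('tm-fallback', tm_fallback)
assert text.encode('ascii', errors='tm-fallback') == b'na?ve(TM)'
```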

Shift-JIS

Encodings are not without difficulty. As with the previous encodings I talked about, Shift-JIS would not be without decisions that had to be made. With Shift-JIS, the question became, "which Shift-JIS?".

You see, there are dozens of different extensions to Shift-JIS, in addition to the original Shift-JIS encoding. As a variable-length encoding, most of the one-byte codepoints had been taken, while there were many hundreds of open and unallocated codepoints in the two-byte range. This saw the wide proliferation of manufacturer-specific extensions, which other manufacturers might adopt while adding their own extensions to the format.

Which of the dozens of different Shift-JIS encodings should we support? I eventually decided on windows-932, the most widely used Shift-JIS variant: it is the standard used by browsers when they encounter Shift-JIS text, and it encompasses a great many symbols. This would allow us to support the majority of Shift-JIS encoded documents out there. Most, but not all, of the characters in it are compatible with the other extensions, so it seemed like the sensible choice.

The only notable exception is that windows-932 is not totally compatible with the original Shift-JIS encoding. While the original Shift-JIS encoding (and some extensions to it) maps ASCII's backslash to the yen symbol ¥, windows-932 keeps it as the backslash. This was to retain better compatibility with ASCII and other formats.
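
Python ships this mapping as the cp932 codec, which makes the backslash decision easy to verify:

```python
# 0x5C decodes to a plain ASCII backslash in windows-932, not a yen sign:
assert b'\x5c'.decode('cp932') == '\\'

# while the two-byte range carries the Japanese characters;
# for example, 0x82A0 is hiragana 'a':
assert b'\x82\xa0'.decode('cp932') == '\u3042'
```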

UTF-16

While MoarVM had a UTF-16 encoder and decoder, it was not fully featured.

  • You could encode a string into a utf16 buffer:

    • "hello".encode('utf16') #> utf16:0x<68 65 6c 6c 6f>

  • You could decode a utf16 buffer into a string:

    • utf16.new(0x68, 0x65, 0x6C, 0x6C, 0x6F).decode('utf16') #> hello

That was about it. You couldn’t open a filehandle as UTF-16 and write to it. You couldn’t even do $fh.write: "hello".encode('utf16') as the write function did not know how to deal with writing a 16 bit buffer (it expected an 8-bit buffer).

In addition there was no streaming UTF-16 decoder, so there was no way to read a UTF-16 file. So I set out to work, first allowing us to write 16-bit buffers to a file and then making it possible to call .print on a filehandle and write text to it.

At this point I knew that I would have to confront the BOM in the room, but decided to first implement the streaming decoder and ensure all of our file handle operations worked.

The BOM in the Room

You may be noticing a theme here. You do not just implement an encoding. It does not exist in a vacuum. While there may be standards, there may be multiple standards. And those standards may or may not be followed in practice or it may not be totally straightforward.

Endianness defines the order of the bytes in a number. Big-endian machines store a 16-bit number with the larger part first, similar to how we write numbers, while little-endian machines put the smallest part first. This only matters for numbers that span more than one byte. UTF-16 can be either big endian or little endian, meaning big- and little-endian files contain the same bytes, but in a different order.
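
A quick demonstration with Python's struct module shows the two byte orders for the same 16-bit number:

```python
import struct

# The same 16-bit number, stored in the two byte orders:
assert struct.pack('>H', 0x1234) == b'\x12\x34'   # big endian: larger part first
assert struct.pack('<H', 0x1234) == b'\x34\x12'   # little endian
```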

Since swapping the two bytes of a 16-bit integer produces a different integer rather than an invalid one, the creators of UTF-16 knew it was not possible to determine with certainty the endianness of a 16-bit number. And so the byte order mark was created: a codepoint added at the start of a file to signify which endianness the file is in. Since they didn't want to break already existing software, they used a "zero width no-break space" (ZWNBSP) and designated that a ZWNBSP with its bytes reversed would be invalid in UTF-16, a "non-character".

If the BOM is not removed, it is not visible, since it is zero width. If your program opens a file and sees the byte order mark, it knows the file is in the correct endianness. If it sees the "non-character" instead, it knows it has to swap the bytes as it reads the file.

The standard states that utf16 should always use a BOM, and that when reading a file as utf16 the BOM should be used to detect the endianness. The BOM is not passed through, as it is assumed not to be part of the actual data. If it doesn't exist, the decoder is supposed to use the endianness that the context suggests. So you get a format where you may lose the first codepoint of your file if that codepoint happens to be a zero width no-break space.

For utf16be (big endian) and utf16le (little endian) the standard states the byte order mark should not exist, and that any zero width no-break space at the start of a file is just that, and should be passed through.

Standards and practice are not identical, and in practice many programs will remove a leading zero width no-break space even if set explicitly to utf16be or utf16le. I was unsure which route to take. I looked at how Perl 5 handled it, and after thinking for a long time I decided that we would follow the spec. Even though other software will get rid of any detected byte order mark, I think it's important that if a user specifies utf16be or utf16le, they always get back the same codepoints that they wrote to the file.
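
Python's codecs follow the same reading of the spec, so they make a handy reference point: 'utf-16' adds and strips a BOM, while 'utf-16-be' treats a leading ZWNBSP as ordinary data and passes it through:

```python
s = '\ufeffdata'          # data that happens to start with a ZWNBSP

# utf-16 writes its own BOM and strips it again on decode; the
# codepoint that is part of the data survives the round trip:
assert s.encode('utf-16').decode('utf-16') == s

# utf-16-be adds no BOM; the leading ZWNBSP is passed through as-is:
assert s.encode('utf-16-be') == b'\xfe\xff' + 'data'.encode('utf-16-be')
assert s.encode('utf-16-be').decode('utf-16-be') == s
```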

Current Work

I am currently working on adding support so that when you write a file in 'utf16' it will add a byte order mark, as utf16 is supposed to always have one. Until this is added you can write a file with Perl 6 on one computer using the 'utf16' encoding which will not decode correctly using the same 'utf16' setting if the two computers are of different endianness.

Since there was no functionality for writing utf-16 to a file previously, there should be no backwards compatibility issues to worry about there and we can set it to add a byte order mark if the utf16 encoder is used.

Release 2018.09

In the last release utf16, utf16le and utf16be should work, aside from utf16 not adding a byte order mark on file write. Anyone who uses utf16 and finds any issues, or has comments, is free to email me or open an issue on the MoarVM or Rakudo issue trackers.

brrt to the future: Template Compiler Update

Published by Bart Wiegmans on 2018-09-04T17:30:00

Hi everybody. After samcv++ released MoarVM again, it was finally time to introduce some updates to the expression template compiler.

As you may or may not know, the expression JIT backend maps MoarVM opcodes to its internal representation via expression templates. At runtime (during the JIT compilation phase), these templates are combined to form an expression tree. The expression template language remains somewhat underdocumented, but quite a few brave developers have still ventured to add templates. With the new template compiler, a few things will change. Consider this example of a template that uses a macro:

(macro: ^foo (,bar)
(let (($obj ...))
(add $obj ,bar)))
(template: foobar
(let (($obj ...))
(^foo $obj)))

Prior to these patches, this would expand to something like:

(template: foobar
(let (($obj ...))
(let (($obj ...)) # name conflict is here
(add $obj $obj)))) # which $obj?

Although this code would previously fail during compilation, having to deal with such conflicts in remote code segments is annoying. But now named declarations are resolved prior to expanding the inner macro, which means that the $obj for macro ^foo is resolved to the let within that macro, and the outer $obj in the template refers to the outer declaration, as if you'd written:

(template: foobar
(let (($obj1 ...))
(let (($obj2 ...))
(add $obj2 $obj1))))
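
The renaming can be modelled with a toy s-expression walker in Python (a sketch of the idea, not the actual template compiler): each let scope hands out fresh names as it is entered, and a symbol resolved in the outer scope keeps its new name when spliced into the macro body.

```python
import itertools

fresh = itertools.count(1)

def hygienic(tree, env):
    """Alpha-rename $names declared by (let ...); trees are nested lists."""
    if isinstance(tree, str):
        return env.get(tree, tree)          # resolve, or pass through
    if tree and tree[0] == 'let':
        inner = dict(env)                   # new scope; outer env untouched
        bindings = []
        for name, init in tree[1]:
            renamed = f'{name}{next(fresh)}'
            inner[name] = renamed
            bindings.append([renamed, hygienic(init, env)])
        return ['let', bindings] + [hygienic(t, inner) for t in tree[2:]]
    return [hygienic(t, env) for t in tree]

# The template's own let is resolved first: its $obj becomes $obj1 ...
outer = hygienic(['let', [['$obj', '...']], '$obj'], {})
arg = outer[2]                              # '$obj1', the macro argument
# ... and only then is it spliced into the macro body, whose own let
# introduces a distinct $obj2, so the two can no longer collide:
body = hygienic(['let', [['$obj', '...']], ['add', '$obj', arg]], {})
assert body == ['let', [['$obj2', '...']], ['add', '$obj2', '$obj1']]
```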

I believe that this makes writing templates safer and simpler, catching more errors at compile time and fewer at runtime. Happy hacking!

my Timotimo \this: Faster FASTA, please

Published by Timo Paulssen on 2018-08-31T16:52:11

Faster FASTA, please

The other day on Stack Overflow: User Beuss asked "What is the best way for dealing with very big files?". The question features the task of parsing a FASTA file, which is a common file format in bioinformatics. The format is extremely simple. It's based on lines, where a line starting with a > gives an identifier which is then followed by any number of lines containing protein sequences or nucleic acid sequences. The simplest you'll see will contain the letters A, C, G, and T, though there are far more letters that have meaning, and extra symbols. The "parsing" part of the question was limited to efficiently splitting the file up into its individual sequences and storing each sequence in a hash keyed on its identifier.
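
To make the task concrete, here is a minimal Python sketch of the same split-and-store job (the post itself works in Perl 6; this is just for orientation):

```python
def parse_fasta(text):
    """Map identifier -> concatenated sequence; '>' lines open a record."""
    seqs, current = {}, None
    for line in text.splitlines():
        if line.startswith('>'):
            current = line[1:]
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {k: ''.join(v) for k, v in seqs.items()}

sample = ">id1\nACGT\nACGT\n>id2\nGGTA\n"
assert parse_fasta(sample) == {'id1': 'ACGTACGT', 'id2': 'GGTA'}
```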

The end of the question asks, specifically, "Did I reach the maximum performances for the current version of Perl6 ?". This immediately caught my eye, of course.

In this post I'd like to take you through my process of figuring out the performance characteristics and potential gains for a program like this.

Let's start with the original code suggested by Beuss and make a few tiny modifications: The original Stack Overflow code was lacking the class seq, but as far as I could tell it needed nothing but two attributes. There was a line that would output every id it came across, but I/O like that is really slow, so I just removed it. I also added very simple timing code around the three stages: Slurping, splitting, and parsing. Here's the result:

my class seq {
  has $.id;
  has $.seq;
}
my class fasta {
  has Str $.file is required;
  has %!seq;

  submethod TWEAK() {
    my $id;
    my $s;
    my $now = now;

    say "Slurping ...";
    my $f = $!file.IO.slurp;
    say "slurped in { now - $now }";
    $now = now;

    say "Splitting file ...";
    my @lines = $f.split(/\n/);
    say "split in { now - $now }";
    $now = now;

    say "Parsing lines ...";
    for @lines -> $line {
      if $line !~~ /^\>/ {
          $s ~= $line;
      }
      else {
        if $id.defined {
          %!seq{$id} = seq.new(id => $id, seq => $s);
        }
        $id = $line;
        $id ~~ s:g/^\>//;
        $s = "";
      }
    }
    %!seq{$id} = seq.new(id => $id, seq => $s);
    say "parsed in { now - $now }";
  }
}

sub MAIN()
{
    my $f = fasta.new(file => "genome.fa");
}

And let's generate an example genome.fa file to test it out with. This one-liner will give you a genome.fa file that's 150_000_200 characters long, has 2_025_704 lines in total, 192_893 of which are lines with identifiers, and the remaining 1_832_811 lines are sequence lines with 80 characters each.

perl6 -e 'srand(2); my $f = "genome.fa".IO.open(:w); while $f.tell < 150_000_000 { $f.put(">" ~ flat("A".."Z", "a".."z", "0".."9", "_", "-").roll((5..7).pick).join); $f.put(<A C G T>.roll(80).join()) for ^(3..16).pick }'

This script has not been optimized for performance at all ;)

Okay, we're just about ready to go with this. Let's have a look at how long only the first two stages take by just hitting Ctrl-C after it outputs "Parsing lines ...":

Slurping ...
slurped in 1.4252846
Splitting file ...
split in 30.75685953
Parsing lines ...

Huh. That's pretty darn slow, isn't it? 67k lines split per second? We should really be able to do better than that. Let's zoom in on the slurping and splitting:

say "Slurping ...";
my $f = $!file.IO.slurp;

say "Splitting file ...";
my @lines = $f.split(/\n/);

My experience with Rakudo has taught me many times that currently our regexes are much more expensive than they have to be. Even though this regex is extremely simple, the regex engine is currently an all-or-nothing deal.

Let's use the built-in method lines on Str instead and see how that fares:

Slurping ...
slurped in 1.4593975
Splitting file ...
split in 2.9614959
Parsing lines ...
parsed in 32.9007177

Cool, that's already 10x the performance for just the splitting! If I had let the program run to completion before, the whole program's run time would have been 50% slurping and splitting and 50% parsing. But looking at the parsing part, there are two more regexes in there, too:

for @lines -> $line {
  if $line !~~ /^\>/ {
      # ...
  }
  else {
    # ...
    $id ~~ s:g/^\>//;
    $s = "";
  }
}

Can we do the same thing without regex here, too? Sure! $line !~~ /^\>/ is equivalent to the much, much faster not $line.starts-with(">"), and since we already know in that branch of the if statement that the line starts with > we can replace $id ~~ s:g/^\>// with just $id .= substr(1). Let's see what happens to the performance now:

Slurping ...
slurped in 1.463816
Splitting file ...
split in 2.9924887
Parsing lines ...
parsed in 3.8784822

Cool. It's about 8.5x the speed. In total, it used to take about 1m10s; now it takes 8.6s.

Second implementation

Let's switch gears for a bit. Before I came into the discussion, Stack Overflow user Christoph already came up with a good answer. They also immediately had the instinct to cut out the regex/grammar engine to get a speed-up. The first suggested piece of code looks like this:

my %seqs = slurp('genome.fa', :enc<latin1>).split('>')[1..*].map: {
    .[0] => .[1..*].join given .split("\n");
}

It works like this: It splits the whole file by the > character. Now every chunk after the split is a string consisting of the ID line and all FASTA sequence lines that come after it and before the next ID line - except of course if there's a > in the middle of some line. [1]

Since the file itself starts with a > character [2] we have to skip the very first entry, as it would just be an empty string. The code does that with array slice syntax [1..*]. Then in the block it splits the individual strings that each start with the ID line followed by the sequence data into lines.

I like this answer a lot. It's short and sweet, but it's not golfed to the point of being unreadable. Let's see how it performs!

time perl6 christoph-fasta.p6 
38.29user 0.53system 0:38.85elapsed 99%CPU (0avgtext+0avgdata 1040836maxresident)k

Whoops, that's very slow compared to our optimized code from above! Since this program is mostly methods from built-in classes, we'll most likely have to find more efficient versions of what we've got in the code right now.

The slurp and split invocations in the beginning are probably as fast as we can get, but what about using [1..*] to skip the first element?

split returns a Seq object, which is the Perl 6 user's way to work with iterators. One important feature of Seq is that it throws away values after they have been consumed. However, with array accesses like [1] or [4, 5, 2, 1] it can't do that: the code doesn't know whether you'll ask for lower indices later in the list, and if earlier values were thrown away, that last example would lead to an error. So it caches the values, literally by calling the cache method on the Seq.
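
Python generators behave analogously, which may help build intuition: a generator is one-shot, so skipping is cheap, but indexed access in arbitrary order only works on materialized (cached) values.

```python
import itertools

squares = (n * n for n in range(10))       # lazy and one-shot, like a Seq

# Skipping the first element consumes and discards it; no cache needed:
rest = itertools.islice(squares, 1, None)
assert next(rest) == 1

# Indexed access in arbitrary order requires a cached list:
cached = list(n * n for n in range(10))
assert [cached[4], cached[5], cached[2], cached[1]] == [16, 25, 4, 1]
```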

Surely there's a way to skip a single element without having to memoize the resulting list? Turns out that there is: The skip method is one of the few methods on the Seq class itself! Let's go ahead and replace the first [1..*] with a call to skip. Another thing we can do is replace .[0] with .head and the other .[1..*] with a .skip(1) as well. For these to work we'll have to add our own .cache call on the .split, though. Here's the code we end up with:

my %seqs = slurp('genome.fa', :enc<latin-1>).split('>').skip(1).map: {
    .head => .skip(1).join given .split("\n").cache;
}

And here's the run time:

time perl6 christoph-fasta-no-circumfix.p6 
12.18user 0.57system 0:12.79elapsed 99%CPU (0avgtext+0avgdata 1034176maxresident)k

That's already better, but nowhere near where I'd like it to be. However, I couldn't yet come up with a way to make this variant any faster.

Third Implementation

Stack Overflow user Christoph also had a second implementation in their answer. It's based on finding the next interesting character with the .index method on strings. Let's see how it compares! Here's the code in full:

my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my $pos = 0;
loop {
    $pos = $data.index('>', $pos) // last;

    my $ks = $pos + 1;
    my $ke = $data.index("\n", $ks);

    my $ss = $ke + 1;
    my $se = $data.index('>', $ss) // $data.chars;

    my @lines;

    $pos = $ss;
    while $pos < $se {
        my $end = $data.index("\n", $pos);
        @lines.push($data.substr($pos..^$end));
        $pos = $end + 1
    }

    %seqs{$data.substr($ks..^$ke)} = @lines.join;
}

And a first timing run:

time perl6 christoph-two.p6 
15.65user 0.44system 0:16.05elapsed 100%CPU (0avgtext+0avgdata 1011608maxresident)k

Now that doesn't look too bad. It's already faster than the previous implementation's first version, but a bit slower than the final version I presented just above.

Let's grab a profile with the profiler and see what we can find!

Opening up the routines tab and sorting by exclusive time, we immediately see something rather suspicious:

(profiler screenshot: the routines tab, sorted by exclusive time)

Here we can see that the Range construction operator ..^ is responsible for a big chunk of time – almost a third – and the substr method is responsible for almost a quarter. The new method just below that isn't actually interesting, as it's just always called by the ..^ operator, and as such has its time captured in the operator's inclusive time.

So what are we using ..^ for? The code uses the form of substr that passes a Range instead of a start and length argument. If you have a start and an end position like in this case, it's a whole lot nicer to look at. Unfortunately, it seems to suffer from a large amount of overhead.

Let's rewrite the code to use the .substr($start, $amount) form instead. The transformation is very simple:

@lines.push($data.substr($pos..^$end));
# becomes
@lines.push($data.substr($pos, $end - $pos));
# and
%seqs{$data.substr($ks..^$ke)} = @lines.join;
# becomes
%seqs{$data.substr($ks, $ke - $ks)} = @lines.join;

And now we can time the result:

time perl6 christoph-two-no-range.p6 
8.34user 0.44system 0:08.72elapsed 100%CPU (0avgtext+0avgdata 1010172maxresident)k

Great result! We've shaved off around 45% of the run time with just our first find!

What else can we do? Let's see if sprinkling in some native types helps this script's performance. Let's make all the integer variables typed int, a native 64-bit integer, instead of the potentially infinitely big Int. We can also turn the @lines array into a native string array, saving us one layer of indirection for every entry we have. Here's the full code:

my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;
loop {
    $pos = $data.index('>', $pos) // last;

    my int $ks = $pos + 1;
    my int $ke = $data.index("\n", $ks);

    my int $ss = $ke + 1;
    my int $se = $data.index('>', $ss) // $data.chars;

    my str @lines;

    $pos = $ss;
    while $pos < $se {
        my int $end = $data.index("\n", $pos);
        @lines.push($data.substr($pos, $end - $pos));
        $pos = $end + 1
    }

    %seqs{$data.substr($ks, $ke - $ks)} = @lines.join;
}

And here's the timing:

time perl6 christoph-two-no-range-native-int.p6 
6.29user 0.36system 0:06.60elapsed 100%CPU (0avgtext+0avgdata 1017040maxresident)k

Oh, huh. That's not quite an amazing improvement, but let's see if we can push it further by turning $data into a native string! Surely that'll give us a little speed-up?

time perl6 christoph-two-no-range-native-int-str.p6 
7.16user 0.36system 0:07.45elapsed 100%CPU (0avgtext+0avgdata 1017076maxresident)k

Isn't that interesting? Turns out that in order to call a method on a native string, rakudo has to create a temporary Str object to "box" the value into something that can have methods and such. That means that every method call on $data will create one shiny Str object for us. That's not quite what we want ☺

Conveniently, there are also sub forms of index and substr. We can either rewrite the method calls to sub calls and move the invocant (this is how we refer to the thing the method is called on) to be the first argument, or we can use the convenient "use sub but with method syntax" feature Perl 6 has. It looks like $data.&substr($ks, $ke - $ks) and all it does is put the invocant as the first argument of the sub and the rest of the arguments follow.

Unfortunately, there aren't actually candidates for these subs that will take native strings and ints, and so we'll end up with the same problem!

Eliminating boxed objects, Str, Int, Num, and similar things, is actually on the agenda for MoarVM. Recent improvements to the dynamic specializer "spesh" by jnthn have been laying the foundation on top of which improving this situation should be doable.

Illegal Performance Gains

So is this the most we can get? Not quite. That was actually a pun, because there's a thing called NQP, which stands for "Not Quite Perl". It's both a separate language with much stricter rules that rakudo itself is written in, and the namespace under which most of the low-level operations that the VM knows are available. These ops are not part of the Perl 6 Language Specification, and the rakudo developers do not guarantee that any code you write using NQP ops will continue working on newer versions of rakudo.

What it does allow us to do is find out where the performance ceiling is, roughly. I'll first write the code to use NQP ops, and then I'll explain what I mean by that.

use nqp;

my Mu $seqs := nqp::hash();
my str $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;

my str @lines;

loop {
    $pos = nqp::index($data, '>', $pos);

    last if $pos < 0;

    my int $ks = $pos + 1;
    my int $ke = nqp::index($data, "\n", $ks);

    my int $ss = $ke + 1;
    my int $se = nqp::index($data ,'>', $ss);

    if $se < 0 {
        $se = nqp::chars($data);
    }

    $pos = $ss;
    my int $end;

    while $pos < $se {
        $end = nqp::index($data, "\n", $pos);
        nqp::push_s(@lines, nqp::substr($data, $pos, $end - $pos));
        $pos = $end + 1
    }

    nqp::bindkey($seqs, nqp::substr($data, $ks, $ke - $ks), nqp::join("", @lines));
    nqp::setelems(@lines, 0);
}

Let's go through it piece by piece. The first line is new, it's use nqp;. It's a synonym for use MONKEY-GUTS; which is a bold declaration meaning in essence "I know what I'm doing, and I deserve whatever I've got coming to me".

We'll use a low-level hash object taken from the nqp world by binding to a scalar variable. We use the type constraint Mu here, because nqp types aren't part of the Perl 6 type hierarchy, and thus will not go through a type check for Any, which is the default type constraint for scalar variables. Also, it does not do the Associative role, which is why we can't bind it to a variable with a % sigil.

Next, we'll pull out the @lines array, so that we don't have to allocate a new one for every round through the loop. We don't have to use nqp::list_s() here like with the hash, because the native string array you get from my str @foo has barely any overhead if we use nqp ops on it rather than methods.

I've removed usage of the // operator, though I am not actually sure how much overhead it has.

The signature of nqp::index is the same as the three-argument sub version of index, and the same is true for the nqp::substr op. There's also an nqp::join op that will only accept a native (or literal) string as first argument and a native string array as the second argument.

You'll also notice that the $end variable is now outside of the inner loop. That has a relatively simple reason: A block that introduces lexical variables cannot be inlined into the outer block. That means that the inner block has to be invoked as a closure, so that it has access to all of the relevant variables. This adds the combined overhead of invoking a code object and of taking a closure. The Garbage Collector has to sweep all of those closures up for us. It'll be best not to generate them in the first place.

We use the nqp op nqp::push_s to add to the @lines array because the regular nqp::push op works with "objects", rather than native strings.

Then there's something that has no corresponding piece of code in the previous version: nqp::setelems(@lines, 0). Since we keep the @lines array around instead of building a new one every time, we have to empty it out. That's what nqp::setelems does, and it's very cheap.

A profile of this code tells us that all that's left being allocated is Str objects, exactly 192_899 of them. This comes from the fact that the hash wants to store objects, not native strings.

Let's see what the run time is!

time perl6 christoph-two-no-range-native-nqp.p6 
2.04user 0.33system 0:02.27elapsed 104%CPU (0avgtext+0avgdata 1004752maxresident)k

Whew! Our fastest implementation so far took 6.6s, now we're down to 2.3s, which is close to a third of the time the second-fastest version takes.

What's a "performance ceiling"?

Everything our code does will at some point end up using nqp:: ops to actually do work. Have a look at the substr method of Str, the index method of Str, the push method of strarray, and the ASSIGN-KEY method of Hash, which sits behind postcircumfix:<{ }>. In between our code and these ops there are often multiple layers of methods that make things more comfortable to work with, or check values for validity.

Rakudo's static optimizer already works towards simpler code consisting of fewer calls. For example it would replace a call to infix:<+> with a low-level nqp::add_i op if it knows only native ints are involved. MoarVM's dynamic specializer has a lot more knowledge to work with, as it watches the code during execution, and can speculatively inline sub and method calls. After inlining, more optimizations are available, such as removing boxing that was necessary for passing arguments into, and returning the result out of the routine – this is currently being worked on!

If the MoarVM specializer were flawless, it ought to be able to generate the equivalent of what I coded up by hand. It will not do something as bold as keeping that one array around between rounds and just clearing it at the right moment, as it is currently not able to prove that the array doesn't get stashed away somewhere. But all in all, most of what I did should be achievable with more intelligence in the specializer.

The term "performance ceiling" is still not quite accurate, though. There's still a lot of optimization potential in the JIT compiler, for example. Bart Wiegmans just blogged about a benchmark where recent improvements to MoarVM got us to the point where the code was only 30% slower than an equivalent implementation in C. That was mostly due to the focus of the code being floating point operations, which likely take so long individually that imperfect code-gen is less of a problem.

But this is about the most we can get from the current version of rakudo, unless we find a better algorithm to do what we want.

The Elephant in the RAM

One thing that you've surely noticed is that this program uses about one gigabyte of memory at its highest point (that's what "maxresident" means). Sadly, this is a property of MoarVM's string implementation. In order for grapheme-level access to be fast ("linear time"), we upgrade strings to 32 bits per grapheme if needed, rather than storing strings as utf8 internally. Our strings also support 8 bits per character storage, which I had expected to be used here, but something in the machinery upgrades the string data from 8 bits to 32 bits, even though all character values ought to fit.

In the medium to far future, we'll also get strings that sacrifice linear time access for storage efficiency, but we're not at that point just yet.

Is there something else we could do to get around this? Sure! Instead of saving the string we cut out of the source file with substr and concatenated with join, we could save the start and end value of every piece of string in a native int array. We could implement a Hash subclass or compose in a role that grabs the data from the source file whenever the user asks for it. Native int arrays are much faster to GC than string arrays, and if you instead hold an index into a single giant int array in the hash, you can reduce the pressure on the GC even further!
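
The offsets idea can be sketched in Python (the LazySeqs class is hypothetical, purely to illustrate the design): keep (start, end) pairs into the slurped file and cut the actual string out only on lookup.

```python
class LazySeqs:
    """Keep (start, end) offsets into the slurped file instead of
    copied substrings; slice the data out only when a key is asked for."""
    def __init__(self, data):
        self.data = data
        self.spans = {}                      # id -> list of (start, end)

    def add(self, key, spans):
        self.spans[key] = spans

    def __getitem__(self, key):
        return ''.join(self.data[s:e] for s, e in self.spans[key])

data = ">id1\nACGT\nAC\n"
store = LazySeqs(data)
store.add('id1', [(5, 9), (10, 12)])         # the two sequence lines
assert store['id1'] == 'ACGTAC'
```

Lookups pay a join each time, but the hash itself holds only small integers, which is exactly what makes it cheap for the GC to scan.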

That's a task for another post, though, as this one rapidly approaches 4k words.

Yet another task would be to write the same program in perl 5, python, and/or ruby. It should be interesting to compare performance characteristics among those. Surely our fastest code is still slower than at least one of these, but having a target in sight could help figure out which parts exactly are slower than they should be.

I personally don't code any Ruby or Perl 5, so I'd be happy if someone would contribute those implementations!

Parting QWORDS

Thanks for sticking with me for the duration of this huge post, and thanks to raiph for requesting this article. It may have been a bit more work than I had anticipated, but it was fun, and hopefully it is interesting for you readers!

And here's the QWORDS I promised: 0x7fffffffd950, 0x7ffff3a17c88, and 0x7ffff62dd700.

Have a good time, and see you in the next one!
  - Timo


  1. This ought to be a relatively simple fix. Split by \n> instead of >, and handle the very first blob differently, because it now starts with a > still left in it. ↩︎

  2. We're ignoring the possibility of adding comments to the beginning of the file, or anywhere in the file, really. ↩︎

brrt to the future: A Curious Benchmark

Published by Bart Wiegmans on 2018-08-25T15:57:00

Hi hackers! I recently saw a curious benchmark passed around on the #moarvm channel. (I understand that this originates from Ovid during his Future of Perl 5 and 6 presentation, but I haven't seen it myself, so if I'm wrong, don't hesitate to correct me). It's curious because it runs on perl5 and perl6, and because the difference is… significant:

# reciprocal.pl
my $x = 0;
$x += 1/$_ for 1..50_000_000;
print "$x\n";

Perl 5 (5.24.1), on my laptop (Intel i7-4700HQ CPU @ 2.40GHz, 16GB ram), takes approximately 2.7s to print the value 18.3047492382933. Perl 6, on the same script, takes just shy of 2m18s, or nearly 140s. This is 51 times as much. Perl 5 is not known as a particularly fast language, but in this case perl 6 is really very slow indeed.

Let's try and do a little better. As pointed out by timotimo, the benchmark above is a pessimal case for rational numbers, which Perl 6 uses by default. Perl 5 uses floating point throughout. So we can do better by explicitly using floating point calculations in Perl 6:

# reciprocal.pl6
my $x = 0e0;
$x += 1e0/$_.Num for 1..50_000_000;
say $x;

This takes approximately 30s on my machine, approximately 5 times faster, but still over 11 times slower than perl5. (NB: for all these numbers I'm posting here, I didn't run exhaustive tests and calculate the statistics, but I feel like this is reliable).

We can typically avoid some overhead in Perl 6 if we avoid using scalar containers by means of binding. We can avoid the dynamic lookup of $_ by replacing the for with a while loop. And we can skip the cast from Int to Num by using a Num iterator value. That gives us the following code:

# reciprocal-while.pl6
my $x := 0e0;
my $i := 0e0;
while (($i := $i + 1e0) < 5e7) {
    $x := $x + 1e0/$i;
}
say $x;

This reduces the run time to approximately 26.5s. So instead of well over 11 times slower than perl 5, perl 6 is now a little less than 10 times slower.
I tried using native types, but that increased the run time to up to 36s. Native type performance (except for natively typed arrays) has so far not met expectations for Perl 6. (I understand that this is due to excessive boxing and unboxing, unfortunately). So it seems that I failed to make a Perl 6 program that performs comparably to perl 5.
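The post doesn't show the native-typed variant that was tried, but it might have looked roughly like this (my reconstruction; note that natively typed variables can't be bound with :=, so it falls back to assignment):

```raku
# reciprocal-native.pl6 (reconstructed sketch, not the author's exact code)
my num $x = 0e0;
my num $i = 0e0;
while ($i = $i + 1e0) < 5e7 {
    $x = $x + 1e0 / $i;
}
say $x;
```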

And yet…

With all due respect to the perl 5 (and perl 6) developers, I think MoarVM ought to be able to do better. MoarVM has support for native-typed values and operators, the perl interpreter does not. It has a dynamic specializer and JIT compiles to native code, the perl interpreter does not. I should expect MoarVM to do better on this benchmark. (Otherwise, what have I been doing with my life, the last few years?)

Let's try the secret weapon: NQP. NQP stands for Not Quite Perl and it is the language the rakudo compiler is built in. It acts like a 'bootstrap' language and compatibility layer between the various backends of rakudo perl 6, such as MoarVM, the JVM and JavaScript, and in the near future Truffle. I like to think that it relates to MoarVM in the same way that C relates to contemporary CPUs - it is a sort of low-level, high-level language. Although NQP fully supports Perl 6 classes, regular expressions and grammars, it has no support for ranges or C-style for loops, so it does tend to look a bit primitive:

# reciprocal-boxed.nqp
my $x := 0e0;
my $i := 1e0;
while ($i < 5e7) {
    $x := $x + 1e0/$i;
    $i := $i + 1e0;
}
nqp::say("$x");

This version uses boxed objects and takes (on my machine) approximately 12.5s. If you recall, perl 5 took 2.7s, so this is just a bit less than 5 times slower than perl 5. Still not very satisfying though.

Let's improve this version a little bit and add native types here. The only difference between this code and the code above it, is that we have explicitly opted to use num values for $x and $i.

# reciprocal-native.nqp
my num $x := 0e0;
my num $i := 1e0;
while ($i < 5e7) {
    $x := $x + 1e0/$i;
    $i := $i + 1e0;
}
nqp::say("$x");

This code, with JIT enabled, consistently runs in approximately 0.28s on my machine. That is not a typing error. It prints the correct result. I emphatically want you to try this at home: simply save it as reciprocal-native.nqp and run time nqp reciprocal-native.nqp. (With JIT disabled, it runs in 2.4s, which is (finally) a bit faster than perl 5 in the interpreter).

Just out of curiosity, I tried comparing this result with the following C code:
 
#include <stdio.h>

int main(int argc, char **argv) {
    double x = 0.0;
    double i;
    for (i = 1.0; i < 5e7; i += 1.0)
        x += 1.0/i;
    printf("%f", x);
}

On my machine, this takes approximately 0.22s per run, which means that NQP on MoarVM, using native types, is a little more than 1.3x slower than compiled C, excluding the compilation time of the C program. The NQP program does include JIT compilation.

For the record, the C compiled code is much simpler and faster than that generated by the MoarVM JIT - 75 bytes for C 'main' vs 1139 bytes for the NQP 'mainline' (149 bytes for the hot loop). So there is much to improve left, for sure. (EDIT: I originally wrote this in a rather confusing way, but the C code was always the shorter version).

So what does that tell us? Is Perl 6 50 times slower than Perl 5, or is NQP just 30% slower than C? There is actually much more to learn from the relative performance of these programs.

After I wrote this (but before publishing), I reran the benchmarks with the new branch postrelease-opts. The results were rather different and encouraging (units in seconds):

test                     master    postrelease-opts
reciprocal.pl            140       51
reciprocal.pl6           31        6.6
reciprocal-while.pl6     26        4.6
reciprocal-boxed.nqp     12        3.6
reciprocal-native.nqp    0.27      0.25

Congratulations to jnthn++ and timotimo++ for this fantastic result. As I understand it, we expect to merge this branch after the release of MoarVM 2018.08.

As always, benchmarks should be taken with a grain of salt - results do not always generalize cleanly to production code. But I hope this has given some insights into the performance characteristics of perl 6, and I think it highlights some areas of interest.

PS. The postrelease-opts branch has taken the place of the master branch as the mainline for development, as we aim to get master fit for a release. Personally, I'd rather have master open for development and a release branch for cutting a stable release. I expect most users get MoarVM from a specific revision via MOAR_REVISION anyway, so having master be the 'development' branch should cause little harm.

my Timotimo \this: The first public release!

Published by Timo Paulssen on 2018-08-15T14:52:11

Hello esteemed readers, and thank you for checking in on my progress. Not a full month ago I showed off the GC tab in the previous post, titled "Wow check out this garbage". Near the end of that post I wrote this:

I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.

Well, turns out I got a whole lot of work done since then. Not only is the Routine tab pretty, but there's also a Call Graph explorer beside it. This post has the details, and at the end I'll give you the link to the github repository where you can get the code and try it out for yourself!

Routine Tab

Routine Overview

Now, the simplest part of the Routine tab is its overview function. Every routine that has been called while the profiler was recording shows up here. Here's a screenshot so you can understand what I'm referring to:

Selection_036

For comparison, here's the old profiler's Routine Tab

Selection_037

There are actually a lot of differences. Going from the old to the new, the Interp / Spesh / Jit column has disappeared, and with it the OSR badges. Also, the table headers are no longer clickable (to change the sorting order) and there is no filter search field. Both of these features will make a comeback in the new profiler's UI as well, though!

There are also additions, though: There's now a column labelled "Sites", some of the filename + line texts are clickable (they take you directly to github, though opening the files locally in an editor is a feature on my wish-list), and there's a column of mysterious buttons. On top of that, you can now see not only the exclusive / inclusive times for each routine, but also how much time that is when divided by the number of entries.

I wonder what happens when I click one of these!

Expanding

Selection_038

Neat, clicking the button expands an extra section below the routine. It has three tabs: Callees, Paths, and Allocations. Let's go through them one by one.

Callees

Listed here are all routines that got called by the parent routine (in this case ACCEPTS from the Regex source file). They are ordered by inclusive time, as opposed to the outer list which is ordered by exclusive time. [1]

Since there is now a parent/child relationship between the routines, the entries column also shows entries per parent entry. That's simply the entries of the child divided by the entries of the parent. This number can tell you how often another routine is mentioned, or what the probability of the child being called by the parent is.

Paths, and what's this about "Sites"?

Selection_039

The next tab you can open in the expanded view is called "Paths". Here you can see a vaguely tree-shaped table with four rows. These rows actually correspond to the Sites column that I have not explained yet. That's because the number of Sites corresponds directly to the number of rows in this table, or the number of leaves in the tree it displays.

The same Routine can behave in different ways depending on where it was called from. Normally, a difference in arguments passed is the main cause of different behaviour, but it's very likely that each unique location in the program will call the Routine the same way every time. Such a "location" is sometimes called a "Call Site", i.e. the site where a call resides. A Site in the profiler's nomenclature refers to one specific path from the outermost routine of the profile to the given Routine. In the screenshot above, all paths to ACCEPTS go through to-json either once, twice, or three times. And every path goes through str-escape.

The names in the table/tree (trable?) are all clickable. I can tell you right now, that they bring you right over to the Call Graph. More on that later, though.

Allocations

Selection_040

This is a small one, at least for ACCEPTS. It has one row per type, and splits the number of objects created for each type into before spesh and after spesh/jit. Coincidentally, the ACCEPTS method is already a good example for having the split: BOOTHash is just a lower-level hash class used in some internals. Notably, BOOTHash is used to pass named arguments to methods. Of course, many method invocations don't actually pass named arguments at all, and many methods don't care about named arguments either. Thankfully, spesh is competent at spotting this situation and removes all traces of the hash ever existing. The Scalar on the other hand seems to be used, so it stays even after spesh optimized the code.

There's also a Sites column here. This lets you spot cases where one or two sites differ strikingly from the others.

Call Graph Tab

Selection_041

Here's the new Call Graph tab. It's similar to the old one with one major omission. The new version of the call graph explorer currently lacks a flame graph (or icicle graph in this case). It will return later, of course.

Until then, there's a few improvements to be enjoyed. One of them is that the breadcrumbs navigation now works reliably, whereas in the previous profiler it tended to lose elements near the beginning or in the middle sometimes. On top of that, your browser's back and forward buttons will work between nodes in the call graph as well as the different tabs!

Something that's completely new is the Allocations section at the bottom of the page, shown in the screenshot below:

Selection_042

Here you can see that the Routine in question (it's the body of a for loop) allocates Str and Scalar objects. The Str objects seem to be optimized away by spesh again.

There's still two unexplained buttons here, though. The one at the bottom is labelled "Load inclusive allocations". Clicking on it reveals a second table of allocations, which is quite a lot bigger:

Selection_043

This view is about everything allocated by anything in the call graph from the current node downwards (towards the leaves, not towards the root. You know, because trees hang from the ceiling, right?). For something so close to the root in a rather deep call graph, you'll get quite a big list, and it's not very helpful in finding where exactly individual types are being allocated.

That's where the second button comes in. It says "Show allocations for all children" on it, and clicking it expands every row in the table of routines:

Selection_044

This way you can drill down from the root towards nodes that interest you.

What's missing?

Here's a little overview of what I'd like to include in the near and medium future:

Where can I get it?

I just uploaded all the code to my github. You can find it here. To use it, you will have to run npm install in the source folder to get all the javascript dependencies, and npm run build to compile all the javascript bundles. It will run a watcher process that will update as soon as any sources change, but if you're not working on the moarperf source code, you can just kill it after it has done its thing once.

Next, you'll have to install the dependencies of the Perl 6 program that acts as the back-end. You can do that with zef install --deps-only . in the same folder as the META6.json. Please note that I haven't prepared the app for actually being installed, so don't do that yet :)

You can then start the backend with perl6 -Ilib service.p6 /path/to/profile.sql, where profile.sql is a profile generated with a commandline like perl6 --profile --profile-filename=profile.sql my_script.p6. Passing the filename to service.p6 on the commandline is optional, you can also enter it in the web frontend. By default, it is reachable on http port 20000 on localhost, but you can set MOARPERF_HOST and MOARPERF_PORT in the environment using the env or export commands of your shell.

A word of warning, though: It still has a bunch of rough edges. In some parts, loading data isn't implemented cleanly yet, which can lead to big red blocks with frowny faces. A refresh and/or doing things more slowly can help in that case. There's places where "there is no data" looks a lot like "something broke". Little usability problems all around.

I would appreciate a bit of feedback either on the github issue tracker, on #perl6 on the freenode IRC server, on reddit if someone's posted this article in /r/perl6, or via mail to the perl6-users mailing list or to timo at this website's domain name.

Thanks again to The Perl Foundation for funding my grant, and to you for reading my little report.

Have fun with the program!
  - Timo


  1. The reasoning behind the different default sorting modes is that in the first step you are likely more interested in routines that are expensive by themselves. Sorting by inclusive time just puts the entry frame first, and then gives you the outermost frames that quite often hardly do any work themselves. When you've found a routine that has a lot of exclusive time, the next step - at least for me - is to look at which routines below it take up the most time in total. That's why I prefer the inner routines list to be sorted by inclusive time. Once the user can change sorting in the UI I'll also offer a way to set the selected sorting as the default, I think. ↩︎

  2. The difference between managed and unmanaged bytes is that the managed bytes are what lands in the actual nursery, whereas unmanaged bytes refers to all kinds of extra data allocated for an Object. For example, an Array, Hash, or String would have a memory buffer somewhere in memory, and a header object pointing to that buffer in the nursery or gen2 memory pools. The managed size doesn't change for objects of the same type, which is why it's fine to put them in the allocations tabs. ↩︎

Zoffix Znet: The 100 Day Plan: The Update on Perl 6.d Preparations

Published on 2018-08-09T00:00:00

Info on how 6.d release prep is going

rakudo.org: Rakudo Star Release 2018.06

Published on 2018-08-06T00:00:00

Zoffix Znet: Introducing: Perl 6 Marketing Assets Web App

Published on 2018-08-05T00:00:00

Get your Perl 6 flyers and brochures

Zoffix Znet: Introducing: Newcomer Guide to Contributing to Core Perl 6

Published on 2018-08-02T00:00:00

Info on the new guide for newcomers

6guts: Redesigning Rakudo’s Scalar

Published by jnthnwrthngtn on 2018-07-26T23:54:45

What’s the most common type your Perl 6 code uses? I’ll bet you that in most programs you write, it’ll be Scalar. That might come as a surprise, because you pretty much never write Scalar in your code. But in:

my $a = 41;
my $b = $a + 1;

Then both $a and $b point to Scalar containers. These in turn hold the Int objects. Contrast it with:

my $a := 42;
my $b := $a + 1;

Where there are no Scalar containers. Assignment in Perl 6 is an operation on a container. Exactly what it does depends on the type of the container. With an Array, for example, it iterates the data source being assigned, and stores each value into the target Array. Assignment is therefore a copying operation, unlike binding, which is a referencing operation. Making assignment the shorter thing to type makes it more attractive, and having the more attractive thing decrease the risk of action at a distance is generally a good thing.
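A small example of that copying-versus-referencing difference in action (values picked arbitrarily):

```raku
my @a = 1, 2, 3;
my @b = @a;     # assignment: copies the elements into @b's own containers
my @c := @a;    # binding: @c is just another name for the same Array

@a[0] = 42;
say @b[0];      # 1  - the copy is unaffected
say @c[0];      # 42 - the bound name sees the change
```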

Having Scalar be first-class is used in a number of features:

And probably some more that I forgot. It’s powerful. It’s also torture for those of us building Perl 6 implementations and trying to make them run fast. The frustration isn’t so much the immediate cost of allocating all of those Scalar objects – that of course costs something, but modern GC algorithms can throw away short-lived objects pretty quickly – but rather the difficulties it introduces for program analysis.

Despite all the nice SSA-based analysis we do, tracking the contents of Scalar containers is currently beyond that. Rather than any kind of reasoning to prove properties about what a Scalar holds, we instead handle it through statistics, guards, and deoptimization at the point that we fetch a value from a Scalar. This still lets us do quite a lot, but it’s certainly not ideal. Guards are cheap, but not free.

Looking ahead

Over the course of my current grant from The Perl Foundation, I’ve been working out a roadmap for doing better with optimization in the presence of Scalar containers. Their presence is one of the major differences between full Perl 6 and the restricted NQP (Not Quite Perl), and plays a notable part in the performance difference between the two.

I’ve taken the first big step towards improving this situation by significantly re-working the way Scalar containers are handled. I’ll talk about that in this post, but first I’d like to provide an idea of the overall direction.

In the early days of MoarVM, when we didn’t have specialization or compilation to machine code, it made sense to do various bits of special-casing of Scalar. As part of that, we wrote code handling common container operations in C. We’ve by now reached a point where the C code that used to be a nice win is preventing us from performing the analyses we need in order to do better optimizations. At the end of the day, a Scalar container is just a normal object with an attribute $!value that holds its value. Making all operations dealing with Scalar containers really be nothing more than some attribute lookups and binds would allow us to solve the problem in terms of more general analyses, which stand to benefit many other cases where programs use short-lived objects.

The significant new piece of analysis we’ll want to do is escape analysis, which tells us which objects have a lifetime bounded to the current routine. We understand “current routine” to incorporate those that we have inlined.

If we know that an object’s usage lies entirely within the current routine, we can then perform an optimization known as scalar replacement, which funnily enough has nothing much to do with Scalar in the Perl 6 sense, even if it solves the problems we’re aiming to solve with Scalar! The idea is that we allocate a local variable inside of the current frame for each attribute of the object. This means that we can then analyze them like we analyze other local variables, subject them to SSA, and so forth. This for one gets rid of the allocation of the object, but also lets us replace attribute lookups and binds with a level of indirection less. It will also let us reason about the contents of the once-attributes, so that we can eliminate guards that we previously inserted because we only had statistics, not proofs.

So, that’s the direction of travel, but first, Scalar and various operations around it needed to change.

Data structure redesign

Prior to my recent work, a Scalar looked something like:

class Scalar {
    has $!value;        # The value in the Scalar
    has $!descriptor;   # rw-ness, type constraint, name
    has $!whence;       # Auto-vivification closure
}

The $!descriptor held the static information about the Scalar container, so we didn’t have to hold it in every Scalar (we usually have many instances of the same “variable” over a program’s lifetime).

The $!whence was used when we wanted to do some kind of auto-vivification. The closure attached to it was invoked when the Scalar was assigned to, and then cleared afterwards. In an array, for example, the callback would bind the Scalar into the array storage, so that element – if assigned to – would start to exist in the array. There are various other forms of auto-vivification, but they all work in roughly the same way.
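That's the machinery behind everyday auto-vivification, for example:

```raku
my @a;
say @a.elems;    # 0 - nothing exists yet
@a[3] = "hi";    # assigning to the Scalar fires its callback,
                 # which binds it into the array's storage
say @a.elems;    # 4 - element 3 now exists (0..2 remain unset)
```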

This works, but closures aren’t so easy for the optimizer to deal with (in short, a closure has to have an outer frame to point to, and so we can’t inline a frame that takes a closure). Probably some day we’ll find a clever solution to that, but since auto-vivification is an internal mechanism, we may as well make it one that we can see a path to making efficient in the near term future.

So, I set about considering alternatives. I realized that I wanted to replace the $!whence closure with some kind of object. Different types of object would do different kinds of vivification. This would work very well with the new spesh plugin mechanism, where we can build up a set of guards on objects. It also will work very well when we get escape analysis in place, since we can then potentially remove those guards after performing scalar replacement. Thus after inlining, we might be able to remove the “what kind of vivification does this assignment cause” checking too.

So this seemed workable, but then I also realized that it would be possible to make Scalar smaller by:

This not only makes Scalar smaller, but it means that we can use a single guard check to indicate the course of action we should take with the container: a normal assignment, or a vivification.

The net result: vivification closures go away giving more possibility to inline, assignment gets easier to specialize, and we get a memory saving on every Scalar container. Nice!

C you later

For this to be really worth it from an optimization perspective, I needed to eliminate various bits of C special-case code around Scalar and replace it with standard MoarVM ops. This implicated:

The first 3 became calls to code registered to perform the operations, using the 6model container API. The second two cases were handled by replacing the calls to C extops with desugars, which is a mechanism that takes something that is used as an nqp::op and rewrites it, as it is compiled, into a more interesting AST, which is then in turn compiled. Happily, this meant I could make all of the changes I needed to without having to go and do a refactor across the CORE.setting. That was nice.

So, now those operations were compiled into bytecode operations instead of ops that were really just calls to C code. Everything was far more explicit. Good! Alas, the downside is that the code we generate gets larger in size.

Optimization with spesh plugins

I talked about specializer plugins in a recent post, where I used them to greatly speed up various forms of method dispatch. However, they are also applicable to optimizing operations on Scalar containers.

The change to decontainerizing return values was especially bad at making the code larger, since it had to do quite a few checks. However, with a spesh plugin, we could just emit a use of the plugin, followed by calling whatever the plugin produces.

Here’s a slightly simplified version of the plugin I wrote, annotated with some comments about what it is doing. The key thing to remember about a spesh plugin is that it is not doing an operation, but rather it’s setting up a set of conditions under which a particular implementation of the operation applies, and then returning that implementation.

nqp::speshreg('perl6', 'decontrv', sub ($rv) {
    # Guard against the type being returned; if it's a Scalar then that
    # is what we guard against here (nqp::what would normally look at
    # the type inside such a container; nqp::what_nd does not do that).
    nqp::speshguardtype($rv, nqp::what_nd($rv));

    # Check if it's an instance of a container.
    if nqp::isconcrete_nd($rv) && nqp::iscont($rv) {
        # Guard that it's concrete, so this plugin result only applies
        # for container instances, not the Scalar type object.
        nqp::speshguardconcrete($rv);

        # If it's a Scalar container then we can optimize further.
        if nqp::eqaddr(nqp::what_nd($rv), Scalar) {
            # Grab the descriptor.
            my $desc := nqp::speshguardgetattr($rv, Scalar, '$!descriptor');
            if nqp::isconcrete($desc) {
                # Has a descriptor, so `rw`. Guard on type of value. If it's
                # Iterable, re-containerize. If not, just decont.
                nqp::speshguardconcrete($desc);
                my $value := nqp::speshguardgetattr($rv, Scalar, '$!value');
                nqp::speshguardtype($value, nqp::what_nd($value));
                return nqp::istype($value, $Iterable) ?? &recont !! &decont;
            }
            else {
                # No descriptor, so it's already readonly. Return as is.
                nqp::speshguardtypeobj($desc);
                return &identity;
            }
        }

        # Otherwise, full slow-path decont.
        return &decontrv;
    }
    else {
        # No decontainerization to do, so just produce identity.
        return &identity;
    }
});

Where &identity is the identity function, &decont removes the value from its container, &recont wraps the value in a new container (so an Iterable in a Scalar stays as a single item), and &decontrv is the slow-path for cases that we do not know how to optimize.

The same principle is also used for assignment, however there are more cases to analyze there. They include:

Vivifying hash assignments are not yet optimized by the spesh plugin, but will be in the near future.

The code selected by the plugin is then executed to perform the operation. In most cases, there will only be a single specialization selected. In that case, the optimizer will inline that specialization result, meaning that the code after optimization is just doing the required set of steps needed to do the work.

Next steps

Most immediately, a change to such a foundational part of the Rakudo Perl 6 implementation has had some fallout. I’m most of the way through dealing with the feedback from toaster (which runs all the ecosystem module tests), being left with a single issue directly related to this work to get to the bottom of. Beyond that, I need to spend some time re-tuning array and hash access to better work with these changes.

Then will come the step that this change was largely in aid of: implementing escape analysis and scalar replacement, which for much Perl 6 code will hopefully give a quite notable performance improvement.

This brings me to the end of my current 200 hours on my Perl 6 Performance and Reliability Grant. Soon I will submit a report to The Perl Foundation, along with an application to continue this work. So, all being well, there will be more to share soon. In the meantime, I’m off to enjoy a week’s much needed vacation.

6guts: Dynamic lookups and context introspection with inlining

Published by jnthnwrthngtn on 2018-07-22T23:27:41

Inlining is one of the most important optimizations that MoarVM performs. Inlining lets us replace a call to some Block, Sub, or Method with the code that is inside of it. The most immediate benefit is to eliminate the overhead of calling, but that’s just the start. Inlined code has often already been specialized for a certain set of argument types. If we have already proven those argument types in the caller, then there’s no need to re-check them. Inlining can also expose pairs of operations that can cancel, such as box/unbox, and bring the point a control exception is thrown into the same body of code where it is caught, which may allow the exception throw to be rewritten to a far cheaper goto.

In a language like Perl 6, where every operator is a call to a multiple dispatch subroutine, inlining can be a significant win. In the best cases, inlining can lead to smaller code, because the thing that is inlined ends up being smaller than the bytecode for the call sequence. Of course, often it leads to bigger code, and so there’s limits to how much of it we really want to do. But still, we’ve been gradually pushing on with increasing the range of things that we’re able to inline.

The problem with inlining is that the very call boundaries it does away with may carry semantic significance for the program. In this post, I’ll talk about a couple of operations that became problematic as we ramped up our inlining capabilities, and discuss a new abstraction I recently added to MoarVM – the frame walker – which provides a common foundation for solving the problem.

A little inlining history

Inlining first showed up in MoarVM back in 2014, not too many months after the type-specializing optimizer was added. MoarVM has done speculative optimizations from the start, performing deoptimization (falling back to the interpreter) in the case that an unexpected situation shows up. But what if we had to deoptimize in code that had been inlined? Then we’d have to pretend we never did the inlines! Therefore, MoarVM can uninline too – that is, untangle the results of inlining and produce a call stack as if we’d been running the unoptimized code all along.

MoarVM has also from the start supported nested inlines – that is, inlining things that themselves contained inlines. However, the initial implementation of inlining was restricted in what it could handle. The first implementation could not inline anything with exception handlers, although that was supported within a couple of months. It also could not inline closures. Only subs in the outermost scope or from the CORE.setting, along with simple method calls, were possible to inline, because those were the only cases where we had enough information about what was being called, which is a decided prerequisite for inlining it.

Aside from bug fixes, things stayed the same until 2017. The focus in that time largely switched away from performance and towards the Perl 6.c release. Summer of 2017 brought some very large changes to how dynamic optimization worked in MoarVM, moving optimization to a background thread, along with changing and extending the statistics that were collected. A new kind of call optimization became possible, whereby if we could not prove what we were going to call, but the statistics showed a pattern, then we could insert a guard and speculatively optimize the call. Speculative inlining fell neatly out of that. Suddenly, a bunch more things could be considered for inlining.

Further work lifted some of the inlining restrictions. Deoptimization learned how to cope if we deoptimized in the middle of processing named arguments, so we could optimize code where that situation occurred. It became possible to inline many closures, by rewriting the lexical lookup operations into an indirection through the code object of the code that we had inlined. It also became possible to inline code involving lexical throws of exceptions and their handlers. Since that is how return works in Perl 6, that again made quite a few more things possible to inline. A more fine-grained analysis allowed us to do some amount of cross-language inlining, meaning bits of the Rakudo internals written in NQP could be inlined into the Perl 6 code calling them, including closure cloning. I’ll add at this point that while it’s easy to write a list of these improvements now, realizing various of them was quite challenging.

Now it’s summer 2018, and my work has delivered some more advances. Previously, we would only do an inlining if we already had produced a specialized version of the callee. This usually worked out, and we sorted by maximum call stack depth and specialized deepest first to help with that. However, sometimes that was not enough, and we missed inlining opportunities. So, during the last month, I added support for producing code to inline on-demand. I also observed that we were only properly doing speculative (that is, based on statistics) inlines of calls made that were expected to return an object, but not those in void context. (If that sounds like an odd oversight, it’s because void calls were previously rare. It was only during the last month, when I improved code-gen to spot a lot more opportunities to emit void context calls, that we got a lot more of them and I spotted the problem.)

More is better, no?

Being able to inline a wider range of calls is a good thing. However, it also made it far more likely that we would run into constructs that don’t cope well with inlining. We’ve got a long way by marking ops that we know won’t cope well with it as :noinline (and then gradually liberalizing that over time where it was beneficial). The improvements over the previous month created a more difficult problem, however. We have a number of ops that allow for introspection and walking of the call stack. These are used to implement Perl 6 features such as the CALLER:: pseudo-package. However, they are also the way that $/ can be set by things like match.
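To make the user-visible side of that concrete, here is a small illustrative example (the variable and sub names are mine, not from the post) of the kind of feature that relies on call stack introspection. CALLER:: lets a sub look up a dynamic variable in its caller's scope, which only works if the VM can still "see" that caller frame even when it has been inlined:

```perl6
sub tell-on-caller() {
    # Look up the dynamic variable $*greeting in our caller's scope
    say CALLER::<$*greeting>;
}

sub greeter() {
    my $*greeting = 'hello from greeter';
    tell-on-caller();   # even if greeter() were inlined into its own
                        # caller, the VM must still present this frame
                        # as tell-on-caller's CALLER
}

greeter();
```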

Marking the ctx op as :noinline got us a long way. However, we ran into trouble because once a context handle has been obtained, one can start traversing from it to callers or outers, and then start some kind of lookup relative to that point. But what if the caller was an inline? Then we don’t have a call frame to reference in the context object that we return.

A further problem was that a non-introspection form of dynamic lookup, which traverses the lexical chain hanging off each step of the dynamic chain, also was not aware of inlines. In theory, this would have become a problem when we started doing inlining of closures last year. However, since it is used in a tiny number of places, and those places didn’t permit inlining, we didn’t notice until this month, when inlining started to cover more cases.

Normal dynamic lookup, used for $*foo style variables, has been inline-aware for about as long as we’ve had inlining. However, this greatly complicated the lookup code. Replicating such inline compensation code in a bunch of places was clearly a bad idea. It’s a tricky problem, since we’re effectively trying to model a callstack that doesn’t really exist by using information telling us what it would look like if it did. It’s the same problem that deoptimization has to solve, except this time we’re just imagining what the call stack would look like unoptimized, not actually trying to recreate it. It’s certainly not a problem we want solved repeatedly around MoarVM’s implementation.
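For reference, this is the kind of user code that exercises that lookup path (names are mine, for illustration). Resolving `$*foo` walks the dynamic chain, checking the lexicals of each caller frame, and that walk is exactly what must compensate for inlined frames:

```perl6
my $*log-level = 'debug';

sub log-message($msg) {
    # $*log-level is resolved by walking the callers' frames, checking
    # each one's lexical scope - the traversal that has to be
    # inline-aware inside MoarVM
    say "[$*log-level] $msg";
}

log-message('starting up');   # finds $*log-level in the mainline frame
```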

A new frame walker

To help tackle all of these problems, I introduced a new abstraction: the specialization-aware frame walker. It provides an iterator over the call stack as if no inlining had taken place, figuring out as much as it needs to in order to recreate the information that a particular operation wants.

First, I used it to make caller-dynamic lookup inline-aware. That went pretty well, and immediately fixed one of the module regressions that was “caused” by the recent inlining improvements.

Next, I used it to refactor the normal dynamic lookup. That needed careful work teasing out the details of dynamic lookup and caching of dynamic variable lookups from the inlining traversal. However, the end result was far simpler code, with much less duplication, since the JITted inline, interpreted inline, and non-inline paths largely collapsed and were handled by the frame walker.

Next up, contexts. Alas, this would be trickier.

Embracing laziness

Previously, context traversal had worked eagerly. When we asked for a context object representing the caller or outer of a particular context we already had, we immediately walked one frame in the appropriate direction and produced a result. Of course, this did not go well if there were inlines, since it always walked one real frame, but that may have multiple inlined frames within it.

One possibility was to make the ctx op immediately deoptimize the whole call stack, so that we then had a chain of real call frames to traverse. However, when I looked at some of the places using ctx, it became clear this would have some very negative performance consequences: every regex match causing a global deopt was not going to work!

Another option, that would largely preserve the existing design, was to store extra information about the inline we were inside of at the time we walked to a caller. This, however, had the weakness that we might do a deoptimization between the two points, thus invalidating the information. That was probably also possible to fix up, but the complexity of doing so put me off that approach.

Instead, I switched to a model where “move to caller” and “move to outer” would be stored as displacements to apply when the context object was used in order to obtain information. The frame walker could make these movements, before doing whatever lookup was required. Thus, even if a deoptimization were to take place between obtaining the context and using it, we could still do a correct traversal.

Too much laziness

This helped, but wasn’t quite enough either. If a context handle was taken and used immediately, things worked out fine. However, if the situation was like this:

+-------------------------+
| Frame we used ctx op on | (Frame 1)
+-------------------------+
             |
           Caller
             |
             v
+-------------------------+
|    Frame with inlines   | (Frame 2)
+-------------------------+

And then we used the handle some time later, things worked out less well. The problem was that I used the current return address of Frame 2 in order to understand which inline(s) we were inside of. However, if we’d executed more code in Frame 2, then it could have made another call. The current return address could thus point to the wrong inline. Oops.

However, since the ctx operation at the start of the lookup is never inlined, and a given call frame can only ever be called from one location in the caller, there’s a solution. If the ctx op is used to get a first-class reference to a frame on the call stack, we walk down the call stack and make sure that each frame called from inlined code preserves enough location information that we can later reconstruct what we need. It only needs to walk down the call stack until it sees a point where another ctx operation already preserved that information, so in programs with lots of use of the ctx op, we can avoid doing full stack walks each time, and just walk over recently created frames.

In closing

With those changes, the various Perl 6 modules exhibiting lookup problems since this month’s introduction of more aggressive inlining were fixed. Along the way, a common mechanism was introduced allowing us to walk a call stack as if no inlines had taken place. There’s at least one more place that we can use this: in order to make stack trace output not be sensitive to inlining. I’ll get to that in the coming weeks, or it might make a nice task for somebody looking to get themselves (more) involved with MoarVM development. Last but not least, I’d like to once again thank The Perl Foundation for organizing the funding that made this work possible.

brrt to the future: Perl 6 on MoarVM has had a JIT for a few years

Published by Bart Wiegmans on 2018-07-05T15:32:00

Dear readers, I recently came across the following comment on the internet (via Perl 6 weekly):

perl6 on MoarVM is starting to grow a JIT

 Which is very interesting because:
  1. Someone had at least heard of our efforts, so that's a win.
  2. But they had left with the impression that it was still in a beginning phase.
Clearly, we have some PR work to do. Consider this my attempt.

When people talk about a 'JIT' colloquially, especially in the context of dynamic languages, they usually refer to a system that has both dynamic specialization functionality and machine-code emitting functionality. MoarVM has had support for both since 2014. Historically we've called the specializer 'spesh' and the machine code emitter 'jit'. Maybe, for communicating with the rest of the world, it is better to call them the JIT 'frontend' and 'backend', respectively.

Without further ado, let me list the history of the MoarVM JIT compiler:
  1. The April 2014 release of MoarVM introduced the dynamic specialization framework spesh (the 'frontend').
  2. In June 2014, I started working in earnest on the machine code emitter (the part that we call the JIT), and halfway through that month had compiled the first function.
  3. In July 2014 we saw the introduction of inlining and on-stack-replacement in spesh.
  4. Also July 2014 saw the introduction of the invokish mechanism by which control flow can be returned to the interpreter.
  5. In August 2014 the JIT backend was completed and merged into MoarVM master.
  6. In June 2015, I started work on a new JIT compiler backend, first by patching the assembler library we use (DynASM), then proceeded with a new intermediate representation and instruction selection.
  7. After that progress slowed until in March 2017 I completed the register allocator. (I guess register allocation was harder than I thought it would be.) A little later, the register allocator would support call argument placement.
  8. The new 'expression' JIT backend was merged in August 2017. Since then, many contributors have stepped up to develop expression JIT templates (that are used to generate machine code for MoarVM opcodes).
  9. In August 2017, Jonathan started working on reorganizing spesh, starting with moving specialization to a separate thread, central optimization planning, improving optimizations and installing argument guards (which makes the process of selecting the correct specialized variant more efficient).
  10. Somewhere after this - I'm not exactly sure when - nine implemented inlining the 'NativeCall' foreign function interface into JIT compiled code.
  11. Most recently Jonathan has started work to write specializer plugins - ways for rakudo to inform MoarVM on how to cache the result of method lookups, which should help increase the effectiveness of optimization for Perl 6.
  12. Around the same time I reworked the way that the interpreter handles control flow changes for JIT compiled code (e.g. in exception handling).
We (mostly Jonathan and I) have also given presentations on the progress of the JIT compiler:
  1. At YAPC::EU 2014 and FOSDEM 2015, Jonathan gave a presentation on the MoarVM dynamic specialization system.
  2. At YAPC::EU 2015 (Granada) I also gave a presentation. Sadly I can no longer find the presentation online or offline.
  3. At the Swiss Perl Workshop in 2017 Jonathan gave a presentation on deoptimization  and how it is used for supporting speculative optimizations.
  4. At The Perl Conference in Amsterdam (2017) I gave a presentation on how to implement JIT expression templates in MoarVM.
I'm sure there is more out there that I haven't yet linked to. And there's still a lot of exciting work going on and still more work to do. However, I hope that after this post there can be no doubt about the reality of a Perl6 implementation backed by a specializing JIT compiler ;-)


PS: You might read this and be reasonably surprised that Rakudo Perl 6 is not, after all this, very fast yet. I have a - not entirely serious - explanation for that:
  1. All problems in computer science can be solved with a layer of indirection.
  2. Many layers of indirection make programs slow.
  3. Perl 6 solves many computer science problems for you ;-) 
In the future, we'll continue to solve those problems, just faster.

PPS: Also, it is important to note that many of the practical speed improvements that Rakudo users have come to enjoy did not come from VM improvements per se, but from better use of the VM by core library routines, for which many volunteers are responsible.

    brrt to the future: Controlled Stack Hacking for the MoarVM JIT Compiler

    Published by Bart Wiegmans on 2018-06-10T16:29:00

    Hi readers! Today I have a story about a recently-merged set of patches that allows MoarVM to use the on-stack return pointer to reduce the overhead of exception handling and other VM features for JIT compiled code. Maybe you'll find it interesting.

    As you might know, MoarVM is a bytecode interpreter. In some situations, MoarVM internals need to know the current position in the execution of the program. For instance in exception handling, all exceptions thrown within a block are caught by the associated CATCH block or propagated if no such block exists. Such blocks are indicated as a range within the bytecode, and we find the associated CATCH block by comparing the current position with the known ranges.
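    As a user-level illustration (using nothing beyond standard Perl 6), the handler the VM locates by comparing bytecode positions is just an ordinary CATCH block:

```perl6
sub risky() { die 'something went wrong' }

sub demo() {
    # The CATCH block covers the bytecode range of this sub's body;
    # the VM finds it by comparing the current position to that range
    risky();
    CATCH {
        default { say "caught: {.message}" }
    }
}

demo();   # prints: caught: something went wrong
```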

    This is relatively straightforward to implement for the interpreter, because the interpreter must maintain a 'current position' pointer simply to function. (MoarVM stores a pointer to this pointer in a thread context object so that it is available throughout the VM). For the JIT that is another matter, because the control flow is handled implicitly by the CPU. The instruction pointer register (called %rip on amd64) cannot be read directly. Moreover, as soon as you enter a function that might want to use the current address (like the functions responsible for exception handling), you've left the 'program' code and entered VM code.

    So what we used to do instead is take the address of a position within the bytecode (as indicated by a label in the bytecode, a somewhat involved process) and store that in a per-frame field called the jit_entry_label. This field is necessary to support another MoarVM feature  - we use the interpreter as a trampoline (in the first or second sense of that definition). Because the interpreter is not recursive, JIT compiled code needs to return to the interpreter to execute a subroutine that was invoked (as opposed to calling an interpreter function, as perl5 does for exception handling). The primary purpose of this label is to continue where we left off after returning from another invoked program. But it can be used just as well for finding where we are in the execution of the program.

    The only problem then is that we need to keep it up to date, which we did. On entry to every basic block (an uninterrupted sequence of code), we stored the current position in this field. Basic blocks are quite common - every conditional statement, loop or other control flow change introduces one - and every exception-handler scope change also needed a little snippet storing the current position. This was annoying.

    Furthermore, there are numerous MoarVM instructions that might change the control flow (or might not). For instance, the instruction responsible for converting an object to a boolean value might need to invoke the Bool method specific to that object's class - or, if no such method exists, fall back to a default implementation. We call such instructions invokish. When compiling code that contains such invokish instructions, we installed 'control guards' to check if the VM had in fact invoked another routine, and if so, to return to the interpreter to execute that routine. This too added quite a bit of overhead.
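    Here is a sketch (class and method names are mine, for illustration) of Perl 6 code that triggers the invokish case: boolifying an object whose class defines its own Bool method forces a method invocation, while boolifying a plain value does not.

```perl6
class Flaky {
    has $.count;
    # Coercing a Flaky to Bool invokes this method - so a simple
    # `if $obj { }` in JIT compiled code may need to re-enter the
    # interpreter to run it (the "invokish" case)
    method Bool() { $.count > 0 }
}

my $obj = Flaky.new(count => 3);
say 'truthy' if $obj;   # invokes Flaky.Bool
say so 42;              # plain number: no method invocation needed
```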

    I keep writing in the past tense because all of that is now gone, and that happened due to a simple realization. When we call a function (in C or assembly), we place the return address (the machine instruction after the call instruction) on the stack. We can read this value from the stack and use it wherever we want to know about the current position.

    I initially had implemented that using a stack walker function similar to the one in the link, except that I implemented it in assembly instead. (When writing this post I learned of the GCC __builtin_return_address and MSVC _ReturnAddress intrinsic functions, which presumably do the same thing). Unfortunately, that strategy didn't really work - it relies on the frame base pointer (%rbp) being placed right 'on top' of the return address pointer on the stack. Even with special compiler flags intended to preserve that behaviour, this assumption turned out to be unreliable.

    Fortunately I realized later that it was also unnecessary. Because the JIT compiler controls the layout of the compiled code frame, it also controls exactly where the return address will be stored when we compile a (C) function call. That means that we can simply take a pointer to this address and store that in the thread context structure. From that address, we can read exactly the current position in the compiled code, without having to explicitly store it so often. Furthermore, we can also write to this location, changing the address the function will return to. Effectively, this is a controlled 'on-stack goto', an idiom more often used for exploits than for good purposes - clearly this is an exception! We use this to force a return to the interpreter (with proper stack frame cleanup) for 'invokish' instructions that end up invoking. We can change control to go directly to an exception handler if it is in the same frame. This makes all the earlier control 'guard' fragments redundant, allowing us to remove them entirely. Thus, an invokish instruction that doesn't actually invoke now carries no extra cost.

    How much does this save? It depends a lot on the exact program, but I estimate about 5% of compiled code size, and from a hopelessly optimal (and fairly unrealistic) benchmark which I lifted from this blog post, approximately 10% of runtime. In real code, the effect is definitely nowhere near what jnthn++ or samcv++ achieved lately, but it's still nice. Also nice is that the code is quite a bit simpler than it was before.

    Anyway, that's all I have to tell today. Have fun hacking, and until next time!

    samcv: Secure Hashing for MoarVM to Prevent DOS Attacks

    Published on 2018-05-16T07:00:00

    Hashes are very useful data structures and underlie many internal representations in Perl 6 as well as being used as themselves. These data structures are very nice since they offer O(1) insertion time and O(1) lookup time on average. Hashes have long been considered an essential feature for Perl, much loved by users. Though when exploited, hashes can cause servers to grind to a halt. New in Rakudo Perl 6 2018.5 will be a feature called hash randomization which does much to help protect against this attack. In this article I explain some hashing basics as well as how the attack against non-randomized hashing can work.

    Hashing Basics

    Some hashing basics: when we use a hash, we take a string and come up with a unique integer to represent the string. Similar to how md5 or sha1 sums take an arbitrary amount of data and condense it into a shorter number which can identify it, we do a similar thing for strings.

    my %hash; %hash<foo> = 10

    In this code, MoarVM takes the string foo and performs a hashing function on it using a series of bitwise operations. The goal is to create a shorter number which allows us to put the foo key into one of the 8 buckets that MoarVM initializes when a hash is created.
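    As an illustration only - this is a toy djb2-style hash, not MoarVM's actual hashing function - a string hash of this general shape looks like:

```perl6
# A toy string hash for illustration; MoarVM's real hash function is
# different (and, since 2018.05, mixed with a random seed)
sub toy-hash(Str $s, Int $buckets = 8 --> Int) {
    my $h = 5381;
    for $s.ords -> $code {
        # a series of arithmetic/bitwise operations over each codepoint
        $h = (($h * 33) + $code) +& 0xFFFFFFFF;
    }
    $h % $buckets;   # reduce the hash to one of the buckets
}

say toy-hash('foo');   # a bucket index in 0..7
```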

    8 Hash buckets

    Our hashing code sets up a predefined number of buckets. When a bucket fills up to hold 10 items, we double the number of buckets. In normal operation the hashes will be randomly distributed, so it would take ≈47 keys added (≈47 is the average number of items needed for one bucket to fill up to 10 items) before we have to expand the buckets the first time.

    When the buckets are expanded, we will now have 16 buckets. In normal operation our previous ≈47 items should be evenly distributed into those 16 buckets.

    The Attack

    Without a random hash seed it is easy for an attacker to generate strings which will all hash to the same value. This devolves hash lookup to O(n²) time - more precisely, O(string_length × num_collisions). When we have hash collisions, no matter how many times we double the number of buckets, the colliding strings will always remain in the same bucket as each other. To locate the correct string, MoarVM must go down the chain and compare each hash value with the one we’re looking for. Since they are all the same, we must fall back to also checking each string itself manually until we find the correct string in that bucket.

    Hash collision

    This attack is done by creating a function that is essentially our hashing function backward (for those curious, see here for an example of code which does forward and backward hashing for the Chrome V8 engine’s former hashing function). We hash our target string, t. We then take random 3-character sequences (in our case graphemes) and plug them into our backward hashing function along with the hash for our target t. The backward hash and the random character sequence are stored in a dictionary, and the process is repeated until we have a very large number of backward hashes and random 3-grapheme prefixes.

    We can then use this dictionary to construct successively longer strings (or short if we so desire) which are the same hash as our target string t. This is a simplification of how the Meet-In-The-Middle attack works.

    This has been fixed in most programming languages (Python, Ruby, Perl), and several CVEs have been issued over the years for this exploit (see the CVEs for PHP, OCaml, Perl, Ruby and Python).

    Assuming everything is fine for the next release, I will also merge changes which introduce a stronger hashing function called SipHash. SipHash is meant to protect against an attacker discovering a hash secret remotely. While randomizing the seed makes this attack much harder, a determined attacker could discover the secret, and once that is done they can easily perform a meet-in-the-middle attack. SipHash was designed to solve the vulnerability of the hash function itself to meet-in-the-middle attacks. The randomization of the hash secret together with a non-vulnerable hashing function work together to avert hash collision denial of service attacks.

    While the hash secret randomization will be out in Rakudo 2018.05, SipHash is planned to be introduced in Rakudo 2018.06.

    Randomness Source

    On Linux and Unix we prefer function calls rather than reading from /dev/urandom. There are some very important reasons for this.

    Relying on an external file existing is potentially problematic. If we are in a chroot and /dev is not mounted we will not have access to /dev/urandom. /dev/urandom is not special, it can be deleted by accident (or on purpose) or a sparse data file mounted in its place undetectable by programs. Trusting it simply because of its path is not ideal. Also, if we exhaust the maximum number of open file descriptors we will be unable to open /dev/urandom as well.

    System Functions

    On Windows we use pCryptGenRandom, which is provided by advapi32.dll since Windows XP.

    Linux, FreeBSD, OpenBSD and MacOS all use system provided random calls (if available) to get the data rather than having to open /dev/urandom. All these OSes guarantee these calls to be non-blocking, though MacOS’s documentation does not comment on it. This is mostly important in very early userspace, which bit Python when a developer accidentally changed the randomness source, causing systems which relied on very early Python scripts to stop booting while waiting for the randomness source to initialize.

    If the function doesn’t exist we fall back to using /dev/urandom. If opening or reading it fails, on BSDs we will use the arc4random() function. In many BSDs this is seeded from the system’s random entropy pool, providing us with a backup in case /dev/urandom doesn’t exist.

    On Linux we use the getrandom() system call, which was added in kernel 3.17, calling it directly instead of going through the glibc wrapper, since the wrapper was added to glibc much later than the system call was added to the kernel.

    On MacOS, Solaris and FreeBSD we use getrandom() while on OpenBSD we use getentropy().

    User Facing Changes

    From Rakudo Perl 6 2018.05, the order that keys are returned will be random between each perl6 instance.

    perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
    (d f c a b e)
    perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
    (e f a d c b)

    This will also affect iterating a hash without sorting: for %hash { }
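    If your code depends on a stable order, the fix is simply to sort explicitly (a minimal sketch; the variable names are mine):

```perl6
my %hash = <a 1 b 1 c 1>;

# Unordered: iteration order may differ between perl6 runs after 2018.05
for %hash.kv -> $k, $v { }

# Ordered: sort the keys explicitly wherever order matters
for %hash.keys.sort -> $k {
    say "$k => %hash{$k}";   # always a, then b, then c
}
```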

    What Do I Have To Do?

    Users and module developers should make sure that they explicitly sort hashes and not rely on a specific order being constant. If you have a module, take a look at the code and see where you are iterating on a hash’s keys and whether or not the order of processing the hash’s keys affects the output of the program.

    # This should be okay since we are putting the hash into another hash, order
    # does not matter.
    for %hash.keys -> $key {
        %stuff{$key} = $i++;
    }
    # This can potentially cause issues, depending on where `@stuff` is used.
    for %hash.keys -> $key {
        @stuff.push: $key;
    }
    # This should be OK since we are using is-deeply and comparing a hash with another
    # hash
    is-deeply my-cool-hash-returning-function($input), %( foo => 'text', bar => 'text', baz => 'text');
    # Probably best to avoid using `is`. The `is` test function converts the input to a string before
    # checking for equality; it happens to work here since hashes stringify with their keys in sorted order.
    is %hash,  %( foo => 'text', bar => 'text', baz => 'text');
    # NO. Keys are not guaranteed to be in the same order on each invocation
    is %hash.keys, <a b c d>;

    Module Developers

    Module developers should check out the git master of Rakudo, or if 2018.05 has been released, use that to run the tests of your module. Make sure to run the tests multiple times, ideally at least 10 times or use a loop:

    while prove -e 'perl6 -Ilib'; do true; done

    This loop will run again and again until it encounters a test failure, in which case it will stop.

    You must run your tests many times because the hash order will be different on each run. For hashes with a small number of items, a test may not fail on every run. Make sure that you also look at the source to identify items that need fixing; don’t just rely on the tests to tell you if you must make changes to your module.

    Further Reading

    Hardening Perl’s Hash Function, article by Booking.com about changes Perl 5 has made to harden hashing.

    gfldex: Deconstructing Simple Grammars

    Published by gfldex on 2018-05-10T14:51:34

    Last year I wrote an egg timer that was parsing command line arguments similar to GNU sleep. I was happy with the stringent form of the parser as follows.

    my Seconds $to-wait = @timicles»\
        .split(/<number>/, :v)\
        .map(-> [$,Rat(Any) $count, Str(Any) $unit] --> Seconds { %unit-multipliers{$unit} * $count })\
        .sum;

    It does a few simple things, one after another. A grammar with an action class would be overkill. I wasn’t happy with using split’s ability to return the needle along with the parts. It certainly does not improve readability.

    After quite a few iterations (and stepping on a bug), I came up with a way to use Str.match instead. If I convert each Match-object into a Hash I can use deconstruction in a signature of a pointy block.

    my Seconds $to-wait = @timicles»\
        .match(/<number> <suffix>+/)».hash\ # the +-quantifier is a workaround
        .map(-> % ( Rat(Any) :$number, Str(Any) :$suffix ) { %unit-multipliers{$suffix} * $number })\
        .sum;

    Instead of using positionals I can use named arguments that correspond to the named regexes inside the match arguments.

    Even in such a small piece of code things fall into place. Hyper-method-calls get rid of simple loops. The well-crafted builtin types allow signature deconstruction to actually work without loads of temporary variables. It’s almost as if the language designers were aiming to make a most elegant language.

    rakudo.org: Rakudo Star Release 2018.04

    Published on 2018-05-07T00:00:00

    rakudo.org: Rakudo Star Release 2018.01

    Published on 2018-01-29T00:00:00

    gfldex: Expensive Egg-Timers

    Published by gfldex on 2017-12-31T13:28:01

    If you use a CLI you might have done something along the line.

    sleep 1m 30s; do-the-next-thing

    I have a script called OK that will display a short text in a hopeful green and morse code O-K via the PC speaker. By doing so I turn my computer into an expensive egg-timer.

    As of late I found myself waiting for longer periods of time and was missing a count-down so I could estimate how much more time I can waste playing computer games. The result is a program called count-down.

    Since I wanted to mimic the behaviour of sleep as closely as possible I had a peek into its source code. That made me realise how lucky I am to be allowed to use Perl 6. If I strip all the extra bits a count-down needs, I’m at 33 lines of code compared to 154 lines of GNU sleep. The boilerplate I have is mostly for readability, like defining a subset called Seconds and a regex called number.

    Errors in the arguments to the script will be caught by the where clause in MAIN’s signature. Since there are no further multi candidates for MAIN that might interfere, the usage message will be displayed automatically if arguments are not recognized. Pretty much all lines in the C implementation deal with argument handling and the fact that they can’t trust their arguments until the last bit of handling is done. With a proper signature, a Perl 6 routine can fully trust its arguments and no further error handling is needed. Compared to the C version (which does a lot less), the code reads linearly from top to bottom and is much more expressive. After changing a few identifiers I didn’t feel the need for comments anymore. Even some unclear code, like the splitting on numbers while keeping the values, becomes clear in the next lines where I sum up a list of seconds.
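    A minimal sketch of that approach - my own reconstruction under the post's description, not the actual count-down source - might look like:

```perl6
subset Seconds of Numeric;
my regex number { \d+ [ '.' \d+ ]? }
my %unit-multipliers = s => 1, m => 60, h => 3600, d => 86400;

# The where clause rejects malformed arguments before the body runs,
# so MAIN can fully trust @timicles; on a mismatch the auto-generated
# usage message is printed instead.
sub MAIN(*@timicles where .all ~~ /^ <number> <[smhd]> $/) {
    my Seconds $to-wait = @timicles.map(
        { %unit-multipliers{.substr(*-1)} * .substr(0, *-1) }
    ).sum;
    say "counting down $to-wait seconds";
    # the real script would now sleep / display the countdown
}
```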

    Now I can comfortably count down the rest of a year that was made much better by a much improved Perl 6. I wish you all a happy 2018.

    Perl 6 Advent Calendar: Bonus Xmas – Concurrent HTTP Server implementation and the scripter’s approach

    Published by ramiroencinas on 2017-12-25T01:52:24

    First of all, I want to highlight Jonathan Worthington‘s work with Rakudo Perl6 and IO::Socket::Async. Thanks Jon!

    ***

    I like to make scripts: writing well-organized sequences of actions, getting results and doing things with them.

    When I began with Perl6 I discovered a spectacular ecosystem, where I could put my ideas into practice in the way that I like: the script manner. One of these ideas was to implement a small HTTP server to play with. Looking at other projects and modules related to Perl6, HTTP and sockets, I discovered that the authors behind them were programmers with great experience in Object-Oriented programming.

    Perl6 paradigms

    Perl6 supports the three most popular programming paradigms: Object-Oriented, Procedural and Functional.

    I think that the Object-Oriented paradigm is fine when you design an application or service that will grow, will do many and varied things and will see many changes. But I don’t like things that grow too much and change a lot; that’s why I like scripts, with their native procedural approach, because they promote simplicity and effectiveness quickly. I like small (step by step) things that do great things quickly.

    The Functional paradigm is awesome in my opinion; you can take a function and use it like a var, among other amazing things.

    Perl6 Supplies are like a V12 engine

    Shortly after I started with Perl6, I began the translation of perl6intro.com into Spanish. Looking at the documentation of Perl6 I discovered the great concurrent potential that Perl6 has. The concurrent aspect of Perl6 was more powerful than I thought.

    The idea I had of the HTTP server with Perl6 began with the Perl6 Supplies (asynchronous data streams with multiple subscribers), specifically with the class IO::Socket::Async. All socket management, data transmission and concurrency is practically automatic and easy to understand. It was perfect for making and playing with a small but powerful concurrent service.

    Based on the examples of the IO::Socket::Async documentation I started to implement a small HTTP server with pseudoCGI support in the mini-http-cgi-server project, and it worked as I expected. As I got what I wanted, I was satisfied and I left this project for a while. I didn’t like things to grow too much.

    But then, preparing a talk for the Madrid Perl Workshop 2017 (thanks to Madrid Perl Mongers and Barcelona Perl Mongers guys for the event support), I had enough motivation to do something more practical, something where web front-end coders could do their job well and communicate with the back-end where Perl6 is awaiting. On the one hand, the typical public html static structure, and on the other hand a Perl6 module including several webservices waiting for the web requests from the front-end guys.

    Then Wap6 was born (Web App Perl6).

    The Wap6 structure

    I like the structure for a web application that Wap6 implements:

    The public folder contains the friendly front-end stuff, like static html, javascript, css, etc., that is, the front-end developer space. The webservices folder contains the back-end stuff: a Perl6 module including a function per webservice.

    This same folder level contains the solution entry point, a Perl6 script that, among other things like initialization server parameters, contains the mapping between routes and webservices:

    my %webservices =
      '/ws1' => ( &ws1, 'html' ),
      '/ws2' => ( &ws2, 'json' )
    ;
    

    As you can see, not only are the routes mapped to the corresponding webservices, but the return content-type of each webservice (like HTML or JSON) is also specified. That is, you type http://domain/ws1 in the web browser and the ws1 function returns the response data with the corresponding content-type, as we will see later.

    All the routes to the webservices are in the %webservices hash, which is passed to the main function wap with other useful named params:

    wap(:$server-ip, :$server-port, :$default-html, :%webservices);

    The core of Wap6

    The wap function is located outside, in the core lib module that Wap6 uses, and contains the concurrent and elegant V12 engine:

    react {   
      whenever IO::Socket::Async.listen($server-ip,$server-port) -> $conn {
        whenever $conn.Supply(:bin) -> $buf {
          my $response = response(:$buf, :$current-dir, :$default-html, :%webservices);
          $conn.write: $response.encode('UTF-8');
          $conn.close;
        }
      }
    }
    

    This is a three-part (react – whenever – IO::Socket::Async) reactive, concurrent and asynchronous context. When a transmission arrives from the web client ($conn), it is placed in a new Supply $buf of bin type ($conn.Supply(:bin)), and $buf, along with other things like the %webservices hash, is sent to the response function that runs the HTTP logic. Finally, the return from the response function is written back to the web client.

    The response function (located outside, in the core lib too) contains the HTTP parser stuff: it splits the incoming data (the HTTP entity) into headers and body, performs validations, takes basic HTTP header information like the method (GET or POST) and the URI (Uniform Resource Identifier), determines whether the requested resource is a webservice (from the webservices folder) or a static file (from the public folder), gets the data from the resource (static file or webservice) and returns it back to the wap function, which writes the response to the web client, as we have seen before.
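As a rough illustration of that parsing step (with hypothetical variable names, not Wap6's actual code), an HTTP entity can be split into its parts like this:

```perl6
# Illustrative only: split a raw HTTP request into its parts.
my $request = "GET /ws1?x=1 HTTP/1.1\r\nHost: example\r\n\r\n";
my ($head, $body) = $request.split("\r\n\r\n", 2);   # headers vs body
my ($request-line, @headers) = $head.lines;          # first line, then headers
my ($method, $uri) = $request-line.words;
say "$method $uri";  # OUTPUT: «GET /ws1?x=1␤»
```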

    The Webservices

    The response function validates $buf and extracts the HTTP method from the request header, which can be GET or POST (I don’t think it will support more HTTP methods in the future). In the case of GET, it puts the URL params (if any) into $get-params. In the case of POST, it puts the request body into $body.

    Then it’s time to check if the web client has requested a webservice. The URI is extracted with the URI module and the result is placed in $path:

    given $path {
      when %webservices{"$_"}:exists {
        my ( &ws, $direct-type ) = %webservices{"$_"};
        my $type = content-type(:$direct-type);
        return response-headers(200, $type) ~ &ws(:$get-params, :$body);
      }
      ..
    }
    

    If $path exists in the %webservices hash, the client wants a webservice. Then it extracts the corresponding webservice callable function &ws from the %webservices hash (yes, I also love the Functional paradigm :-) ) and the corresponding content-type. Then it calls the webservice function &ws with the $get-params and the request $body parameters. Finally it returns the HTTP response entity, which concatenates the response headers with the webservice result.

    The callable webservice &ws can be ws1, located in the Perl6 module from webservices folder:

    sub ws1 ( :$get-params, :$body ) is export {
      if $get-params { return 'From ws1: ' ~ $get-params; }
      if $body { return 'From ws1: ' ~ $body; }
    }
    

    In this demo context the webservice simply returns the input, that is, the $get-params (when GET) or the $body (when POST).

    When the client requests a static file

    After discarding all the other possibilities, if the client requests a static file hosted in the public folder, like html, js, css, etc., then:

    given $path {
    ..
      default {
        my $filepath = "$current-dir/public/$path";
        my $type = content-type(:$filepath);
        return response-headers(200, $type) ~ slurp "$current-dir/public/$path";
      }
    }
    

    It returns the response headers including the matched content-type and the requested file contents with slurp.

    And that’s all, folks! A concurrent web server in the script-procedural manner: Wap6.

    Epilogue

    I’m happy with the results of Wap6. I don’t pretend that it grows a lot, but I’m always tempted to continue adding more features: SSL support (completed!), session management (in progress), cookies, file uploads, etc.

    Perl6 has put on the table a very powerful way to perform concurrent network operations: IO::Socket::Async, a masterpiece. Also, with Perl6 you can mix the Object-Oriented, Procedural and Functional paradigms as you wish. With these capabilities you can design a concurrent asynchronous service and implement it quickly.

    If you want a more serious approach to HTTP services and concurrency in the Perl6 ecosystem, take a look at Cro; it represents a great opportunity to establish Perl6 as a powerful entity in the HTTP services space. Jonathan Worthington wrote about it on December 9th in this same Advent Calendar.

    Meanwhile, I will continue playing with Wap6, in the script manner, contributing to the Perl6 ecosystem and learning from the best coders in the world, I mean: Perl and Perl6 coders, of course :-)

    Perl 6 Advent Calendar: Day 24 – Solving a Rubik’s Cube

    Published by coke on 2017-12-24T00:33:51

    Intro

    I have a speed cube on my wish list for Christmas, and I'm really excited about it. :) I wanted to share that enthusiasm with some Perl 6 code.

    I graduated from high school in '89, so I'm just the right age to have had a Rubik's cube through my formative teen years. I remember trying to show off on the bus and getting my time down to just under a minute. I got a booklet from a local toy store back in the 80s that showed an algorithm on how to solve the cube, which I memorized. I don't have the booklet anymore. I've kept at it over the years, but never at a competitive level.

    In the past few months, YouTube has suggested a few cube videos to me based on my interest in the standupmaths channel; seeing the world record come in under 5 seconds makes my old time of a minute seem ridiculously slow.

    Everyone I've spoken to who can solve the cube has been using a different algorithm than I learned, and the one discussed on standupmaths is yet a different one. The advanced version of this one seems to be commonly used by those who are regularly setting world records, though.

    Picking up this algorithm was not too hard; I found several videos, especially one describing how to solve the last layer. After doing this for a few days, I transcribed the steps to a few notes showing the list of steps, and the crucial parts for each step: desired orientation, followed by the individual turns for that step. I was then able to refer to a single page of my notebook instead of a 30-minute video, and after a few more days, had memorized the steps: being able to go from the notation to just doing the moves is a big speed up.

    After a week, I was able to solve it reliably using the new method in under two minutes; a step back, but not bad for a week's effort in my off hours. Since then (a few weeks now), I've gotten down to under 1:20 pretty consistently. Again, this is the beginner method, without any advanced techniques, and I'm at the point where I can do the individual algorithm steps without looking at the cube. (I still have a long way to go to be competitive though.)

    Notation

    A quick note about the notation for moves – given that you're holding the cube with a side on the top, and one side facing you, the relative sides are:

    L (Left) R (Right) U (Up) D (Down) F (Front) B (Back)

    If you see a lone letter in the steps, like B, that means to turn that face clockwise (relative to the center of the cube, not you). If you add a ʼ to the letter, that means counter clockwise, so Bʼ would have the top piece coming down, while an R would have the bottom piece coming up.

    Additionally, you might have to turn a slice twice, which is written as U2 (it doesn't matter whether it's clockwise or not, since it's 180º from the starting point).

    Algorithm

    The beginner's algorithm I'm working with has the following basic steps:

    1. White cross
    2. White corners
    3. Second layer
    4. Yellow cross
    5. Yellow edges
    6. Yellow corners
    7. Orient yellow corners

    If you're curious as to what the individual steps are in each, you'll be able to dig through the Rubik's wiki or the YouTube video linked above. More advanced versions of this algorithm (CFOP by Jessica Fridrich) allow you to combine steps, have specific "shortcuts" to deal with certain cube states, or solve any color as the first side, not just white.

    Designing a Module

    As I began working on the module, I knew I wanted to get to a point where I could show the required positions for each step in a way that was natural to someone familiar with the algorithm, and to have the individual steps also be natural, something like:

     F.R.U.Rʼ.Uʼ.Fʼ 

    I also wanted to be able to dump the existing state of the cube; for now as text, but eventually being able to tie it into a visual representation as well.

    We need to be able to tell if the cube is solved; we need to be able to inspect pieces relative to the current orientation, and we need to be able to change our orientation.

    Since I was going to start with the ability to render the state of the cube, and then quickly add the ability to turn sides, I picked an internal structure that made that fairly easy.

    The Code

    The latest version of the module is available on github. The code presented here is from the initial version.

    Perl 6 lets you create Enumerations so you can use actual words in your code instead of lookup values, so let's start with some we'll need:

    enum Side «:Up('U') :Down('D') :Front('F') :Back('B') :Left('L') :Right('R')»;
    enum Colors «:Red('R') :Green('G') :Blue('B') :Yellow('Y') :White('W') :Orange('O')»;

    With this syntax, we can use Up directly in our code, and its associated value is U.
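A quick standalone demonstration of how these enums behave:

```perl6
enum Side «:Up('U') :Down('D') :Front('F')»;

say Up;        # OUTPUT: «Up␤»
say Up.value;  # OUTPUT: «U␤»

# Enum values also work as hash keys (they stringify to their names):
my %sides;
%sides{Up} = 'White';
say %sides{Up};  # OUTPUT: «White␤»
```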

    We want a class so we can store attributes and have methods, so our class definition has:

    class Cube::Three {
        has %!Sides;
        ...
        submethod BUILD() {
            %!Sides{Up} = [White xx 9];
            %!Sides{Front} = [Red xx 9];
            ...
        }
    }

    We have a single attribute, a Hash called %!Sides; each key corresponds to one of the Enum sides. The value is a 9-element array of Colors. Each element on the array corresponds to a position on the cube. With white on top and red in front as the default, the colors and cell positions are shown here with the numbers & colors. (White is Up, Red is Front)

             W0 W1 W2
             W3 W4 W5
             W6 W7 W8
    G2 G5 G8 R2 R5 R8 B2 B5 B8 O2 O5 O8
    G1 G4 G7 R1 R4 R7 B1 B4 B7 O1 O4 O7
    G0 G3 G6 R0 R3 R6 B0 B3 B6 O0 O3 O6
             Y0 Y1 Y2
             Y3 Y4 Y5
             Y6 Y7 Y8
    

    The first methods I added were to do clockwise turns of each face.

    method F {
        self!rotate-clockwise(Front);
        self!fixup-sides([
            Pair.new(Up,    [6,7,8]),
            Pair.new(Right, [2,1,0]),
            Pair.new(Down,  [2,1,0]),
            Pair.new(Left,  [6,7,8]),
        ]);
        self;
    }

    This public method calls two private methods (denoted with the !); one rotates a single Side clockwise, and the second takes a list of Pairs, where the key is a Side, and the value is a list of positions. If you imagine rotating the top of the cube clockwise, you can see that the positions are being swapped from one to the next.

    Note that we return self from the method; this allows us to chain the method calls as we wanted in the original design.

    The clockwise rotation of a single side shows a raw Side being passed, and uses array slicing to change the order of the pieces in place.

    # 0 1 2    6 3 0
    # 3 4 5 -> 7 4 1
    # 6 7 8    8 5 2
    method !rotate-clockwise(Side \side) {
        %!Sides{side}[0,1,2,3,5,6,7,8] = %!Sides{side}[6,3,0,7,1,8,5,2];
    }

    To add the rest of the notation for the moves, we add some simple wrapper methods:

    method F2 { self.F.F; }
    method Fʼ { self.F.F.F; }

    F2 just calls the move twice; Fʼ cheats: 3 rights make a left.

    At this point, I had to make sure that my turns were doing what they were supposed to, so I added a gist method (which is called when an object is output with say).

    say Cube::Three.new.U2.D2.F2.B2.R2.L2;
          W Y W
          Y W Y
          W Y W
    G B G R O R B G B O R O
    B G B O R O G B G R O R
    G B G R O R B G B O R O
          Y W Y
          W Y W
          Y W Y
    

    The source for the gist is:

    method gist {
        my $result;
        $result = %!Sides{Up}.rotor(3).join("\n").indent(6);
        $result ~= "\n";
        for 2,1,0 -> $row {
            for (Left, Front, Right, Back) -> $side {
                my @slice = (0,3,6) >>+>> $row;
                $result ~= ~%!Sides{$side}[@slice].join(' ') ~ ' ';
            }
            $result ~= "\n";
        }
        $result ~= %!Sides{Down}.rotor(3).join("\n").indent(6);
        $result;
    }

    A few things to note:

    The gist is great for stepwise inspection, but for debugging, we need something a little more compact:

    method dump {
        gather for (Up, Front, Right, Back, Left, Down) -> $side {
            take %!Sides{$side}.join('');
        }.join('|');
    }

    This iterates over the sides in a specific order, and then uses the gather take syntax to collect string representations of each side, then joining them all together with a |. Now we can write tests like:

    use Test; use Cube::Three;
    my $a = Cube::Three.new();
    is $a.R.U2...R....U2.L.U..U.L.dump,
    'WWBWWWWWB|RRRRRRRRW|BBRBBBBBO|OOWOOOOOO|GGGGGGGGG|YYYYYYYYY',
    'corners rotation';

    This is actually the method used in the final step of the algorithm. With this debug output, I can take a pristine cube, do the moves myself, and then quickly transcribe the resulting cube state into a string for testing.

    While the computer doesn't necessarily need to rotate the cube, it will make it easier to follow the algorithm directly if we can rotate the cube, so we add one for each of the six possible turns, e.g.:

    method rotate-F-U {
        self!rotate-clockwise(Right);
        self!rotate-counter-clockwise(Left);
        # In addition to moving the side data, have to
        # re-orient the indices to match the new side.
        my $temp = %!Sides{Up};
        %!Sides{Up} = %!Sides{Front};
        self!rotate-counter-clockwise(Up);
        %!Sides{Front} = %!Sides{Down};
        self!rotate-clockwise(Front);
        %!Sides{Down} = %!Sides{Back};
        self!rotate-clockwise(Down);
        %!Sides{Back} = $temp;
        self!rotate-counter-clockwise(Back);
        self;
    }

    As we turn the cube from Front to Up, we rotate the Left and Right sides in place. Because the orientation of the cells changes as we change faces, as we copy the cells from face to face, we also may have to rotate them to ensure they end up facing in the correct direction. As before, we return self to allow for method chaining.

    As we start testing, we need to make sure that we can tell when the cube is solved; we don't care about the orientation of the cube, so we verify that the center color matches all the other colors on the face:

    method solved {
        for (Up, Down, Left, Right, Back, Front) -> $side {
            return False unless
                %!Sides{$side}.all eq %!Sides{$side}[4];
        }
        return True;
    }

    For every side, we use a Junction of all the colors on a side to compare to the center cell (always position 4). We fail early, and then succeed only if we made it through all the sides.
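The same junction trick works on any list; a tiny standalone example:

```perl6
# .all builds a junction that compares every element at once
my @face = <W W W W W W W W W>;
say so @face.all eq @face[4];  # OUTPUT: «True␤»

@face[8] = 'Y';                # one sticker off-colour
say so @face.all eq @face[4];  # OUTPUT: «False␤»
```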

    Next I added a way to scramble the cube, so we can consider implementing a solve method.

    method scramble {
        my @random = <U D F R B L>.roll(100).squish[^10];
        for @random -> $method {
            my $actual = $method ~ ("", "2", "ʼ").pick(1);
            self."$actual"();
        }
    }

    This takes the six base method names, picks a bunch of random values, then squishes them (ensures that there are no dupes in a row), and then picks the first 10 values. We then potentially add on a 2 or a ʼ. Finally, we use the indirect method syntax to call the individual methods by name.
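Each of those pieces can be tried on its own; a small standalone sketch:

```perl6
say <U U D D D F>.squish;   # OUTPUT: «(U D F)␤» -- runs collapsed
say <U D F R B L>.roll(5);  # five random picks, repeats allowed

# Indirect method call: the method name is only known at run time.
my $method = 'uc';
say 'abc'."$method"();      # OUTPUT: «ABC␤»
```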

    Finally, I'm ready to start solving! And this is where things got complicated. The first steps of the beginner method are often described as intuitive. Which means it's easy to explain… but not so easy to code. So, spoiler alert, as of the publish time of this article, only the first step of the solve is complete. For the full algorithm for the first step, check out the linked github site.

    method solve {
        self.solve-top-cross;
    }
    method solve-top-cross {
        sub completed {
            %!Sides{Up}[1,3,5,7].all eq 'W' &&
            %!Sides{Front}[5] eq 'R' &&
            %!Sides{Right}[5] eq 'B' &&
            %!Sides{Back}[5] eq 'O' &&
            %!Sides{Left}[5] eq 'G';
        }
        ...
        MAIN:
        while !completed() {
            # Move white-edged pieces in second row up to top
            # Move incorrectly placed pieces in the top row to the middle
            # Move pieces from the bottom to the top
        }
    }

    Note the very specific checks to see if we're done; we use a lexical sub to wrap up the complexity – and while we have a fairly internal check here, we see that we might want to abstract this to a point where we can say "is this edge piece in the right orientation". To start with, however, we'll stick with the individual cells.

    The guts of solve-top-cross are 100+ lines long at the moment, so I won't go through all the steps. Here's the "easy" section:

    my @middle-edges =
        [Front, Right],
        [Right, Back],
        [Back, Left],
        [Left, Front],
    ;
    for @middle-edges -> $edge {
        my $side7 = $edge[0];
        my $side1 = $edge[1];
        my $color7 = %!Sides{$side7}[7];
        my $color1 = %!Sides{$side1}[1];
        if $color7 eq 'W' {
            # find number of times we need to rotate the top:
            my $turns = (
                @ordered-sides.first($side1, :k) -
                @ordered-sides.first(%expected-sides{~$color1}, :k)
            ) % 4;
            self.U for 1..$turns;
            self."$side1"();
            self.Uʼ for 1..$turns;
            next MAIN;
        } elsif $color1 eq 'W' {
            my $turns = (
                @ordered-sides.first($side7, :k) -
                @ordered-sides.first(%expected-sides{~$color7}, :k)
            ) % 4;
            self.Uʼ for 1..$turns;
            self."$side1"();
            self.U for 1..$turns;
            next MAIN;
        }
    }

    When doing this section on a real cube, you'd rotate the cube without regard to the side pieces, and just get the cross in place. To make the algorithm a little more "friendly", we keep the centers in position for this; we rotate the Up side into place, then rotate the individual side into place on the top, then rotate the Up side back into the original place.

    One of the interesting bits of code here is the .first(..., :k) syntax, which says to find the first element that matches, and then return the position of the match. We can then look things up in an ordered list so we can calculate the relative positions of two sides.
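A standalone sketch of the same idea, using plain strings instead of the module's Side enums:

```perl6
my @ordered-sides = <Front Right Back Left>;
say @ordered-sides.first('Back', :k);  # OUTPUT: «2␤»

# Relative distance between two sides, in quarter turns:
say (@ordered-sides.first('Left', :k)
   - @ordered-sides.first('Right', :k)) % 4;  # OUTPUT: «2␤»
```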

    Note that the solving method only calls the public methods to turn the cube; while we use raw introspection to get the cube state, we only use "legal" moves to do the solving.

    With the full version of this method, we now solve the white cross with this program:

    #!/usr/bin/env perl6
    use Cube::Three;
    my $cube = Cube::Three.new();
    $cube.scramble;
    say $cube;
    say '';
    $cube.solve;
    say $cube;

    which generates this output given this set of moves (Fʼ L2 B2 L Rʼ Uʼ R Fʼ D2 B2). First is the scramble, and then is the version with the white cross solved.

          W G G
          Y W W
          Y Y Y
    O O B R R R G B O Y Y B
    R G O B R R G B G W O B
    Y B B R O W G G G W W O
          W W O
          Y Y O
          B R R
    
          Y W W
          W W W
          G W R
    O G W O R Y B B G R O G
    Y G G R R B R B Y R O G
    O O R Y O W O O R W Y B
          G G B
          B Y Y
          Y B B
    

    This sample prints out the moves used to do the scramble, shows the scrambled cube, "solves" the puzzle (which, as of this writing, is just the white cross), and then prints out the new state of the cube.

    Note that as we get further along, the steps become less "intuitive" and, in my estimation, much easier to code. For example, the last step requires checking the orientation of four pieces, rotating the cube if necessary, and then doing a 14-step set of moves (shown in the test above).

    Hopefully my love of cubing and Perl 6 have you looking forward to your next project!

    I'll note in the comments when the module's solve is finished, for future readers.

    Perl 6 Advent Calendar: Day 23 – The Wonders of Perl 6 Golf

    Published by AlexDaniel on 2017-12-23T00:00:05

    Ah, Christmas! What could possibly be better than sitting around the table with your friends and family and playing code golf! … Wait, what?

    Oh, right, it’s not Christmas yet. But you probably want to prepare yourself for it anyway!

    If you haven’t noticed already, there’s a great website for playing code golf: https://code-golf.io/. The cool thing about it is that it’s not just for perl 6! At the time of writing, 6 other langs are supported. Hmmm…

    Anyway, as I’ve got some nice scores there, I thought I’d share some of the nicest bits from my solutions. All the trickety-hackety, unicode-cheatery and mind-blowety. While we are at it, maybe we’ll even see that perl 6 is quite concise and readable even in code golf. That is, if you have a hard time putting your Christmas wishes on a card, maybe a line of perl 6 code will do.

    I won’t give full solutions to not spoil your Christmas fun, but I’ll give enough hints for you to come up with competitive solutions.

    All I want for Christmas is for you to have some fun. So get yourself rakudo to make sure you can follow along. Later we’ll have some pumpkin pie and we’ll do some caroling. If you have any problems running perl 6, perhaps join #perl6 channel on freenode to get some help. That being said, https://code-golf.io/ itself gives you a nice editor to write and eval your code, so there should be no problem.

    Some basic examples

    Let’s take Pascal’s Triangle task as an example. I hear ya, I hear! Math before Christmas, that’s cruel. Cruel, but necessary.

    There’s just one basic trick you have to know. If you take any row from the Pascal’s Triangle, shift it by one element and zip-sum the result with the original row, you’ll get the next row!

    So if you had a row like:

    1 3 3 1

    All you do is just shift it to the right:

    0 1 3 3 1

    And sum it with the original row:

    1 3 3 1
    + + + +
    0 1 3 3 1
    =
    1 4 6 4 1

    As simple as that! So let’s write that in code:

    for ^16 { put (+combinations($^row,$_) for 0..$row) }

    You see! Easy!

    … oh… Wait, that’s a completely different solution. OK, let’s see:

    .put for 1, { |$_,0 Z+ 0,|$_ } … 16

    Output:

    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1
    1 5 10 10 5 1
    1 6 15 20 15 6 1
    1 7 21 35 35 21 7 1
    1 8 28 56 70 56 28 8 1
    1 9 36 84 126 126 84 36 9 1
    1 10 45 120 210 252 210 120 45 10 1
    1 11 55 165 330 462 462 330 165 55 11 1
    1 12 66 220 495 792 924 792 495 220 66 12 1
    1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1
    1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1
    1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1

    Ah-ha! There we go. So what happened there? Well, in perl 6 you can create sequences with a very simple syntax: 2, 4, 8 … ∞. Normally you’ll let it figure out the sequence by itself, but you can also provide a code block to calculate the values. This is awesome! In other languages you’d often need to have a loop with a state variable, and here it does all that for you! This feature alone probably needs an article or 𝍪.
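Two quick standalone examples of the sequence operator, one deduced and one with an explicit code block:

```perl6
say (2, 4, 8 … 32);      # OUTPUT: «(2 4 8 16 32)␤» -- deduced geometric
say (1, 1, * + * … 21);  # OUTPUT: «(1 1 2 3 5 8 13 21)␤» -- Fibonacci block
```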

    The rest is just a for loop and a put call. The only trick here is to understand that it is working with lists, so when you specify the endpoint for the sequence, it is actually checking for the number of elements. Also, you need to flatten the list with |.

    If you remove whitespace and apply all tricks mentioned in this article, this should get you to 26 characters. That’s rather competitive.

    Similarly, other tasks often have rather straightforward solutions. For example, for Evil Numbers you can write something like this:

    .base(2).comb(~1) %% 2 && .say for ^50

    Remove some whitespace, apply some tricks, and you’ll be almost there.

    Let’s take another example: Pangram Grep. Here we can use set operators:

    'a'..'z' ⊆ .lc.comb && .say for @*ARGS

    Basically, almost all perl 6 solutions look like real code. It’s the extra -1 character oomph that demands extra eye pain, but you didn’t come here to listen about conciseness, right? It’s time to get dirty.

    Numbers

    Let’s talk numbers! 1 ² ③ ٤ ⅴ ߆… *cough*. You see, in perl 6 any numeric character (that has a corresponding numeric value property) can be used in the source code. The feature was intended to allow us to have some goodies like ½ and other neat things, but this means that instead of writing 50 you can write a single character that carries that numeric value. Some golfing platforms will count the number of bytes when encoded in UTF-8, so it may seem like you’re not winning anything. But what about 1000000000000 and 𖭡? In any case, code-golf.io is unicode-aware, so the length of any of these characters will be 1.

    So you may wonder, which numbers can you write in that manner? There you go:

    -0.5 0.00625 0.025 0.0375 0.05 0.0625 0.083333 0.1
    0.111111 0.125 0.142857 0.15 0.166667 0.1875 0.2
    0.25 0.333333 0.375 0.4 0.416667 0.5 0.583333 0.6
    0.625 0.666667 0.75 0.8 0.833333 0.875 0.916667 1
    1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 10
    11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
    28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    45 46 47 48 49 50 60 70 80 90 100 200 300 400 500
    600 700 800 900 1000 2000 3000 4000 5000 6000 7000
    8000 9000 10000 20000 30000 40000 50000 60000 70000
    80000 90000 100000 200000 216000 300000 400000
    432000 500000 600000 700000 800000 900000 1000000
    100000000 10000000000 1000000000000

    This means, for example, that in some cases you can save 1 character when you need to negate the result. There are many ways you can use this, and I’ll only mention one particular case. The rest you figure out yourself, as well as how to find the actual character that can be used for any particular value (hint: loop all 0x10FFFF characters and check their .univals).
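For instance, one way such a loop over all codepoints might look (slow, but it only has to run once):

```perl6
# Find every character whose numeric value is exactly 50.
say (^0x10FFFF).grep(*.unival == 50)».chr;
```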

    For example, when golfing you want to get rid of unnecessary whitespace, so maybe you’ll want to write something like:

    say 5max3 # ERROR

    It does not work, of course, and we can’t really blame the compiler for not untangling that mess. However, check this out:

    say⑤max③ # OUTPUT: «5␤»

    Woohoo! This will work in many other cases.

    Conditionals

    If there is a good golfing language, that’s not Perl 6. I mean, just look at this:

    puts 10<30?1:2 # ruby
    say 10 <30??1!!2 # perl 6

    Not only are TWO more characters needed for the ternary, but also some obligatory whitespace around the < operator! What’s wrong with them, right? How dare they design a language with no code golf in mind⁉

    Well, there are some ways we can work around it. One of them is operator chaining. For example:

    say 5>3>say(42)

    If 5 is ≤ 3, then there’s no need to do the other comparison, so it won’t run it. This way we can save at least one character. On a slightly related note, remember that junctions may also come in handy:

    say 'yes!' if 5==3|5

    And of course, don’t forget about unicode operators like ≤, ≥ and ≠.

    Typing is hard, let’s use some of the predefined strings!

    You wouldn’t believe how useful this is sometimes. Want to print the names of all chess pieces? OK:

    say "♔♕♖♗♘♙".comb».uniname».words»[2]
    # KING QUEEN ROOK BISHOP KNIGHT PAWN

    This saves just a few characters, but there are cases when it can halve the size of your solution. But don’t stop there, think of error messages, method names, etc. What else can you salvage?

    Base 16? Base 36? Nah, Base 0x10FFFF!

    One of the tasks tells us to print φ to the first 1000 decimal places. Well, that’s very easy!

    say 1.6180339887498948482045868343656381177203091798057628621354486227052604628189024497072072041893911374847540880753868917521266338622235369317931800607667263544333890865959395829056383226613199282902678806752087668925017116962070322210432162695486262963136144381497587012203408058879544547492461856953648644492410443207713449470495658467885098743394422125448770664780915884607499887124007652170575179788341662562494075890697040002812104276217711177780531531714101170466659914669798731761356006708748071013179523689427521948435305678300228785699782977834784587822891109762500302696156170025046433824377648610283831268330372429267526311653392473167111211588186385133162038400522216579128667529465490681131715993432359734949850904094762132229810172610705961164562990981629055520852479035240602017279974717534277759277862561943208275051312181562855122248093947123414517022373580577278616008688382952304592647878017889921990270776903895321968198615143780314997411069260886742962267575605231727775203536139362

    Yes!!!

    Okay, that takes a bit more than 1000 characters… Of course, we can try to calculate it, but that is not exactly in the Christmas spirit. We want to cheat.

    If we look at the docs about polymod, there’s a little hint:

    my @digits-in-base37 = 9123607.polymod(37 xx *); # Base conversion

    Hmmm… so that gives us digits for any arbitrary base. How high can we go? Well, it depends on what form we would like to store the number in. Given that code-golf.io counts codepoints, we can use base 0x10FFFF (i.e. using all available codepoints). Or, in this case we will go with base 0x10FFFE, because:

    ☠☠☠⚠⚠⚠ WARNING! WARNING! WARNING! ⚠⚠⚠☠☠☠
    THIS WILL MAKE YOUR COMPUTER IMPLODE!
    UNICODE STRINGS ARE SUBJECT TO NORMALIZATION SO YOUR
    DATA WILL NOT BE PRESERVED. HIDE YOUR KIDS, HIDE YOUR
    WIFE. HIDE YOUR KIDS, HIDE YOUR WIFE. HIDE YOUR KIDS,
    HIDE YOUR WIFE. AND HIDE YOUR HUSBAND.
    ☠☠☠⚠⚠⚠ WARNING! WARNING! WARNING! ⚠⚠⚠☠☠☠

    When applied to our constant, it should give something like this:

    󻁾񤍠򷒋󜹕󘶸񙦅񨚑򙯬񗈼𢍟𪱷򡀋𢕍򌠐񘦵𔇆򅳒򑠒󌋩򯫞򶝠򚘣򣥨񫵗𿞸􋻩񱷳󟝐󮃞󵹱񿢖𛒕𺬛󊹛󲝂򺗝𭙪񰕺𝧒򊕆𘝞뎛􆃂򊥍񲽤򩻛󂛕磪󡯮끝򰯬󢽈󼿶󘓥򮀓񽑖򗔝󃢖񶡁􁇘󶪼񌍌񛕄񻊺򔴩寡񿜾񿸶򌰘񡇈򦬽𥵑󧨑򕩃򳴪񾖾򌯎󿥐񱛦𱫞𵪶򁇐󑓮򄨠򾎹𛰑𗋨䨀򡒶𰌡򶟫񦲋𧮁􍰍񲍚񰃦𦅂󎓜󸾧󉦩󣲦򄉼񿒣𸖉񿡥󬯞嗟𧽘񿷦򠍍🼟򇋹񖾷𖏕񟡥󜋝􋯱񤄓򭀢򌝓𱀉𫍡󬥝򈘏򞏡񄙍𪏸࿹𺐅񢻳򘮇𐂇񘚡ந򾩴󜆵𰑕򰏷񛉿򢑬򭕴𨬎󴈂􋵔򆀍񖨸􂳚󽡂󎖪񡉽񕧣񎗎򝤉򡔙񆔈󖾩󅾜񋩟򝼤񯓦󐚉񟯶򄠔𦔏򲔐o

    How do we reverse the operation? During one of the squashathons I found a ticket about a feature that I didn’t know about previously. Basically, the ticket says that Rakudo is doing stuff that it shouldn’t, which is of course something we will abuse next time. But for now we’re within the limits of relative sanity:

    say 1.,:1114110[o򲔐𦔏򄠔񟯶󐚉񯓦򝼤񋩟󅾜󖾩񆔈򡔙򝤉񎗎񕧣񡉽󎖪󽡂􂳚񖨸򆀍􋵔󴈂𨬎򭕴򢑬񛉿򰏷𰑕󜆵򾩴ந񘚡𐂇򘮇񢻳𺐅࿹𪏸񄙍򞏡򈘏󬥝𫍡𱀉򌝓򭀢񤄓􋯱󜋝񟡥𖏕񖾷򇋹🼟򠍍񿷦𧽘嗟󬯞񿡥𸖉񿒣򄉼󣲦󉦩󸾧󎓜𦅂񰃦񲍚􍰍𧮁񦲋򶟫𰌡򡒶䨀𗋨𛰑򾎹򄨠󑓮򁇐𵪶𱫞񱛦󿥐򌯎񾖾򳴪򕩃󧨑𥵑򦬽񡇈򌰘񿸶񿜾寡򔴩񻊺񛕄񌍌󶪼􁇘񶡁󃢖򗔝񽑖򮀓󘓥󼿶󢽈򰯬끝󡯮磪󂛕򩻛񲽤򊥍􆃂뎛𘝞򊕆𝧒񰕺𭙪򺗝󲝂󊹛𺬛𛒕񿢖󵹱󮃞󟝐񱷳􋻩𿞸񫵗򣥨򚘣򶝠򯫞󌋩򑠒򅳒𔇆񘦵򌠐𢕍򡀋𪱷𢍟񗈼򙯬񨚑񙦅󘶸󜹕򷒋񤍠󻁾.ords]

    Note that the string has to be in reverse. Other than that it looks very nice. 192 characters including the decoder.

    This isn’t a great idea for printing constants that are otherwise computable, but given the length of the decoder and the relatively dense packing rate of the data, this comes in handy in other tasks.
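The packing trick itself is language-independent. Here is a small illustrative Python sketch of the same encode/decode round trip (the sample numbers are arbitrary; mapping each digit through chr() would then give the dense string, subject to the normalization caveat in the warning above):

```python
def to_digits(n, base):
    # repeated divmod, like Perl 6's n.polymod(base xx *):
    # least significant digit first
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits or [0]

def from_digits(digits, base):
    # fold the digit list back into an integer
    n = 0
    for d in reversed(digits):
        n = n * base + d
    return n

packed = to_digits(9123607, 37)
assert from_digits(packed, 37) == 9123607
```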

    All good things must come to an end; horrible things – more so

    That’s about it for the article. For more code golf tips I’ve started this repository: https://github.com/AlexDaniel/6lang-golf-cheatsheet

    Hoping to see you around on https://code-golf.io/! Whether using Perl 6 or not, I’d love to see all of my submissions beaten.

    🥧

    Perl 6 Advent Calendar: Day 22 – Features of Perl 6.d

    Published by liztormato on 2017-12-22T00:00:49

    So there we are. Two years after the first official release of Rakudo Perl 6. Or 6.c to be more precise. Since Matt Oates already touched on the performance improvements since then, Santa thought to counterpoint this with a description of the new features for 6.d that have been implemented since then. Because there have been many, Santa had to make a selection.

    Tweaking objects at creation

    Any class that you create can now have a TWEAK method. This method will be called after all other initializations of a new instance of the class have been done, just before it is returned by .new. A simple, somewhat contrived example in which a class A has one attribute with a default value of 42, but which should change the value if that default is explicitly specified at object creation:

    class A {
        has $.value = 42;
        method TWEAK(:$value = 0) { # default prevents warning
            # change the attribute if the default value is specified
            $!value = 666 if $value == 42;
        }
    }
    # no value specified, it gets the default attribute value
    dd A.new;              # A.new(value => 42)
    
    # value specified, but it is not the default
    dd A.new(value => 77); # A.new(value => 77)
    
    # value specified, and it is the default 
    dd A.new(value => 42); # A.new(value => 666)
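The "did the caller pass the default explicitly?" logic can be sketched in Python with a sentinel default. This is a loose analogue of the intended behaviour, not of Perl 6's object construction protocol:

```python
_UNSET = object()  # sentinel: tells "not passed" apart from "passed 42"

class A:
    def __init__(self, value=_UNSET):
        explicit = value is not _UNSET
        self.value = value if explicit else 42
        # the "tweak" step: change the attribute only when the default
        # value 42 was explicitly specified by the caller
        if explicit and self.value == 42:
            self.value = 666

print(A().value, A(value=77).value, A(value=42).value)  # 42 77 666
```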

    Concurrency Improvements

    The concurrency features of Rakudo Perl 6 saw many improvements under the hood. Some of these were exposed as new features. Most prominent are Lock::Async (a non-blocking lock that returns a Promise) and atomic operators.

    In most cases, you will not need to use these directly, but it is probably good that you know about atomic operators if you’re engaged in writing programs that use concurrency features. An often occurring logic error, especially if you’ve been using threads in Pumpking Perl 5, is that there is no implicit locking on shared variables in Rakudo Perl 6. For example:

    my int $a;
    await (^5).map: {
        start { ++$a for ^100000 }
    }
    say $a; # something like 419318

    So why doesn’t that show 500000? The reason is that we had 5 threads incrementing the same variable at the same time. And since an increment consists of a read step, an increment step and a write step, it became very easy for one thread to do the read step at the same time as another thread, thus losing an increment. Before we had atomic operators, the correct way of writing the above code would be:

    my int $a;
    my $l = Lock.new;
    await (^5).map: {
        start {
            for ^100000 {
                $l.protect( { ++$a } )
            }
        }
    }
    say $a; # 500000

    This would give you the correct answer, but would be at least 20x as slow.
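For comparison, here is a rough Python equivalent of the Lock-protected version (same thread and iteration counts as above); the with lock: block serializes the read-increment-write sequence:

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        with lock:              # protect the read-increment-write sequence
            counter += 1

threads = [threading.Thread(target=work) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 500000
```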

    Now that we have atomic variables, the above code becomes:

    my atomicint $a;
    await (^5).map: {
        start { ++⚛$a for ^100000 }
    }
    say $a; # 500000

    Which is very much like the original (incorrect) code. And this is at least 6x as fast as the correct code using Lock.protect.

    Unicode goodies

    So many, so many. For instance, you can now use ≤, ≥, ≠ as Unicode versions of <=, >= and != (complete list).

    You can now also create a grapheme by specifying the Unicode name of the grapheme, e.g.:

    say "BUTTERFLY".parse-names; # 🦋

    or create the Unicode name string at runtime:

    my $t = "THUMBS UP SIGN, EMOJI MODIFIER FITZPATRICK TYPE";
    print "$t-$_".parse-names for 3..6; # 👍🏼👍🏽👍🏾👍🏿

    Or collate instead of just sort:

    # sort by codepoint value
    say <ä a o ö>.sort; # (a o ä ö)
    # sort using Unicode Collation Algorithm
    say <ä a o ö>.collate; # (a ä o ö)

    Or use unicmp instead of cmp:

    say "a" cmp "Z"; # More
    say "a" unicmp "Z"; # Less

    Or that you can now use any Unicode digits in Match variables ($١ for $1), negative numbers (-١ for -1), and radix bases (:۳("22") for :3("22")).

    It’s not for nothing that Santa considers Rakudo Perl 6 to have the best Unicode support of any programming language in the world!

    Skipping values

    You can now call .skip on Seq and Supply to skip a number of values that were being produced. Together with .head and .tail this gives you ample manipulexity with Iterables and Supplies.

    By the way, .head now also takes a WhateverCode so you can indicate you want all values except the last N (e.g. .head(*-3) would give you all values except the last three). The same goes for .tail (e.g. .tail(*-3) would give you all values except the first three).
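In slicing terms, the two WhateverCode forms behave like this rough Python analogue (eager list slices rather than lazy sequences):

```python
values = list(range(10))
print(values[:-3])  # like .head(*-3): everything except the last three
print(values[3:])   # like .tail(*-3): everything except the first three
```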

    Some additions to the Iterator role make it possible for iterators to support the .skip functionality even better. If an iterator can be more efficient in skipping a value than to actually produce it, it should implement the skip-one method. Derived from this are the skip-at-least and skip-at-least-pull-one methods that can be provided by an iterator.

    An example of the usage of .skip to find out the 1000th prime number:

    say (^Inf).grep(*.is-prime)[999]; # 7919

    Versus:

    say (^Inf).grep(*.is-prime).skip(999).head; # 7919

    The latter is slightly more CPU efficient, but more importantly much more memory efficient, as it doesn’t need to keep the first 999 prime numbers in memory.
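The same memory argument applies to any lazy pipeline; a Python sketch with a generator and itertools.islice also avoids keeping the first 999 primes around:

```python
from itertools import count, islice

def is_prime(n):
    # simple trial division, enough for small n
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = (n for n in count() if is_prime(n))   # lazy, like (^Inf).grep(*.is-prime)
print(next(islice(primes, 999, None)))         # skip 999 primes, take the next: 7919
```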

    Of Bufs and Blobs

    Buf has become much more like an Array, as it now supports .push, .append, .pop, .unshift, .prepend, .shift and .splice. It has also become more like Str with the addition of .subbuf-rw (analogous to .substr-rw), e.g.:

    my $b = Buf.new(100..105);
    $b.subbuf-rw(2,3) = Blob.new(^5);
    say $b.perl; # Buf.new(100,101,0,1,2,3,4,105)
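Python's bytearray offers a comparable writable sub-range via slice assignment, where the replacement may likewise change the length:

```python
b = bytearray(range(100, 106))   # like Buf.new(100..105)
b[2:5] = bytes(range(5))         # like $b.subbuf-rw(2,3) = Blob.new(^5)
print(list(b))                   # [100, 101, 0, 1, 2, 3, 4, 105]
```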

    You can now also .allocate a Buf or Blob with a given number of elements and a pattern. Or change the size of a Buf with .reallocate:

    my $b = Buf.allocate(10,(1,2,3));
    say $b.perl; # Buf.new(1,2,3,1,2,3,1,2,3,1)
    $b.reallocate(5);
    say $b.perl; # Buf.new(1,2,3,1,2)
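The allocate-with-pattern and reallocate behaviour can be mimicked with itertools (an illustrative sketch, not an equivalent API):

```python
from itertools import cycle, islice

buf = list(islice(cycle((1, 2, 3)), 10))  # like Buf.allocate(10, (1,2,3))
print(buf)                                # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
buf = buf[:5]                             # like $b.reallocate(5)
print(buf)                                # [1, 2, 3, 1, 2]
```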

    Testing, Testing, Testing!

    The plan subroutine of Test.pm now also takes an optional :skip-all parameter to indicate that all tests in the file should be skipped. Or you can call bail-out to abort the test run marking it as failed. Or set the PERL6_TEST_DIE_ON_FAIL environment variable to a true value to indicate you want the test to end as soon as the first test has failed.

    What’s Going On

    You can now introspect the number of CPU cores in your computer by calling Kernel.cpu-cores. The amount of CPU used since the start of the program is available in Kernel.cpu-usage, while you can easily check the name of the Operating System with VM.osname.

    And as if that is not enough, there is a new Telemetry module which you need to load when needed, just like the Test module. The Telemetry module provides a number of primitives that you can use directly, such as:

    use Telemetry;
    say T<wallclock cpu max-rss>; # (138771 280670 82360)

    This shows the number of microseconds since the start of the program, the number of microseconds of CPU used, and the number of Kilobytes of memory that were in use at the time of call.

    If you want to get a report of what has been going on in your program, you can use snap and have a report appear when your program is done. For instance:

    use Telemetry;
    snap;
    Nil for ^10000000;  # something that takes a bit of time

    The result will appear on STDERR:

    Telemetry Report of Process #60076
    Number of Snapshots: 2
    Initial/Final Size: 82596 / 83832 Kbytes
    Total Time:           0.55 seconds
    Total CPU Usage:      0.56 seconds
    No supervisor thread has been running
    
    wallclock  util%  max-rss
       549639  12.72     1236
    --------- ------ --------
       549639  12.72     1236
    
    Legend:
    wallclock  Number of microseconds elapsed
        util%  Percentage of CPU utilization (0..100%)
      max-rss  Maximum resident set size (in Kbytes)

    If you want a state of your program every .1 of a second, you can use the snapper:

    use Telemetry;
    snapper;
    Nil for ^10000000;  # something that takes a bit of time

    The result:

    Telemetry Report of Process #60722
    Number of Snapshots: 7
    Initial/Final Size: 87324 / 87484 Kbytes
    Total Time:           0.56 seconds
    Total CPU Usage:      0.57 seconds
    No supervisor thread has been running
    
    wallclock  util%  max-rss
       103969  13.21      152
       101175  12.48
       101155  12.48
       104097  12.51
       105242  12.51
        44225  12.51        8
    --------- ------ --------
       559863  12.63      160
    
    Legend:
    wallclock  Number of microseconds elapsed
        util%  Percentage of CPU utilization (0..100%)
      max-rss  Maximum resident set size (in Kbytes)

    And many more options are available here, such as getting the output in .csv format.

    The MAIN thing

    You can now modify the way MAIN parameters are handled by setting options in %*SUB-MAIN-OPTS. The default USAGE message is now available inside the MAIN as the $*USAGE dynamic variable, so you can change it if you want to.

    Embedding Perl 6

    Two new features make embedding Rakudo Perl 6 easier to handle:
    the &*EXIT dynamic variable now can be set to specify the action to be taken when exit() is called.

    Setting the environment variable RAKUDO_EXCEPTIONS_HANDLER to "JSON" will throw Exceptions in JSON, rather than text, e.g.:

    $ RAKUDO_EXCEPTIONS_HANDLER=JSON perl6 -e '42 = 666'
    {
      "X::Assignment::RO" : {
        "value" : 42,
        "message" : "Cannot modify an immutable Int (42)"
      }
    }

    Bottom of the Gift Bag

    While rummaging through the still quite full gift bag, Santa found the following smaller prezzies:

    Time to catch a Sleigh

    Santa would like to stay around to tell you more about what’s been added, but there simply is not enough time to do that. If you really want to keep up-to-date on new features, you should check out the Additions sections in the ChangeLog that is updated with each Rakudo compiler release.

    So, catch you again next year!

    Best wishes from

    🎅🏾

    Perl 6 Advent Calendar: Day 21 – Sudoku with Junctions and Sets

    Published by scimon on 2017-12-21T00:00:31

    There are a number of core elements in Perl6 that give you powerful tools to do things in a concise way. Two of these are Junctions and Sets, which share a number of characteristics but are also wildly different. In order to demonstrate their power I’m going to look at how they can be used on a simple problem: Sudoku puzzles.

    Sudoku : A refresher

    So, for those of you who don’t know, a Sudoku puzzle is a 9 by 9 grid that comes supplied with some cells filled in with numbers between 1 and 9. The goal is to fill in all the cells with numbers between 1 and 9 so that no row, column or sub square has more than one of any number in it.

    There are a few ways to represent a Sudoku puzzle, my personal favourite being a 9 by 9 nested array, for example:

    my @game = [
        [4,0,0,0,0,0,0,0,0],
        [0,9,0,3,4,6,0,5,0],
        [5,8,0,0,9,0,0,0,6],
        [0,4,0,8,1,3,0,0,9],
        [0,0,0,5,0,4,0,0,0],
        [8,0,0,6,2,9,0,4,0],
        [3,0,0,0,5,0,0,6,2],
        [0,5,0,9,3,2,0,8,0],
        [0,0,0,0,0,0,0,0,1]
    ];
    

    In this situation the cells with no value assigned are given a 0, this way all the cells have an Integer value assigned to them. The main thing to bear in mind with this format is you need to reference cells using @game[$y][$x] rather than @game[$x][$y].

    Junctions : Quantum Logic Testing

    One of the simplest ways to use Junctions in Perl6 is in a logical test. The Junction can represent a selection of values you are wanting to test against. For example :

    if ( 5 < 1|10 < 2 ) { say "Spooky" } else { say "Boo" }
    Spooky

    So, not only does this demonstrate operator chaining (something that experienced programmers may already be looking confused about) but the any Junction ( 1|10 ) evaluates to True for both 5 < 10 and 1 < 2. In this way Junctions can be extremely powerful already, it’s when you assign a variable container to them that it gets really interesting.

    One of the tests we’d like to be able to make on our Sudoku puzzle is to see if it’s full: every cell has been assigned a value greater than 0. A full puzzle may not be completed correctly, but there’s a guess in each cell. Another way of putting that is that none of the cells has a value of 0. Thus we can define a Junction, store it in a scalar variable, and test it at any point to see if the puzzle is full.

    my $full-test = none( (^9 X ^9).map(-> ($x,$y) { 
        @game[$y][$x];
    } ) );
    say so $full-test == 0;
    False

    In this case the game still has a number of 0’s in it, so seeing if $full-test equals 0 evaluates to False. Note that without the so to cast the result to a Boolean, you’ll get a breakdown of the cells that are equal to 0; only if all of these are False will the Junction evaluate to True.

    Note also the use of the ^9 and X operators to generate two Ranges from 0 to 8 and then the cross product of these two lists of 9 values to make a list of all the possible X,Y co-ordinates of the puzzle. It’s this kind of powerful simplicity that is one of the reasons I love Perl6. But I digress.

    The strength of this method is that once you’ve defined the Junction you don’t need to modify it. If you change the values stored in the Array the Junction will look at the new values instead (note this only holds true for updating individual cells, if you swap out a whole sub array with a new one you’ll break the Junction).
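Stripped of the reusable-Junction aspect, the "is the grid full" test is just a predicate over all the cells. A Python rendering of the same check, reusing the example grid above:

```python
game = [
    [4,0,0,0,0,0,0,0,0],
    [0,9,0,3,4,6,0,5,0],
    [5,8,0,0,9,0,0,0,6],
    [0,4,0,8,1,3,0,0,9],
    [0,0,0,5,0,4,0,0,0],
    [8,0,0,6,2,9,0,4,0],
    [3,0,0,0,5,0,0,6,2],
    [0,5,0,9,3,2,0,8,0],
    [0,0,0,0,0,0,0,0,1],
]

def is_full(game):
    # True when no cell is 0: the condition the none() Junction expresses
    return all(game[y][x] != 0 for x in range(9) for y in range(9))

print(is_full(game))  # False
```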

    So that’s a simple use of a Junction to store a multi-variable test you can reuse. But it gets more interesting when you realise that the values in a Junction can themselves be Junctions.

    Let’s look at a more complex test: a puzzle is complete if for every row, column and square in the puzzle there is only one of each number. In order to make this test we’re going to need three helper functions.

    subset Index of Int where 0 <= * <= 8; 
    sub row( Index $y ) {
        return (^9).map( { ( $_, $y ) } ); 
    } 
    sub col( Index $x ) {
         return (^9).map( { ( $x, $_ ) } ); 
    } 
    multi sub square( Index $sq ) {
        my $x = $sq % 3 * 3;
        my $y = $sq div 3 * 3;
        return square( $x, $y );
    } 
    multi sub square( Index $x, Index $y ) {
         my $tx = $x div 3 * 3;
         my $ty = $y div 3 * 3;
         return ( (0,1,2) X (0,1,2) ).map( -> ( $dx, $dy ) { 
            ( $tx + $dx, $ty + $dy ) 
        } );
    }

    So here we define an Index as a value between 0 and 8 and then define our subs to return a List of Lists, each sub list being a pair of X and Y indices. Note that our square function can accept one or two positional arguments. In the single-argument form we number the sub squares 0 to 8, with 0 in the top left, going left to right, and 8 in the bottom right. The two-argument version gives us the list of cells in the square containing a given cell (including itself).
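A direct Python transliteration of the three helpers (same coordinate conventions) may make the index arithmetic easier to verify:

```python
def row(y):
    return [(x, y) for x in range(9)]

def col(x):
    return [(x, y) for y in range(9)]

def square(x, y):
    tx, ty = x // 3 * 3, y // 3 * 3   # top-left corner of the sub square
    return [(tx + dx, ty + dy) for dy in range(3) for dx in range(3)]

print(square(4, 4))  # the middle square: (3,3) .. (5,5)
```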

    So with these in place we can define our one() lists for each row, column and square. Once we have them we can then put them into an all() Junction.

    my $complete-all = all(
         (^9).map(
            {
                |(
                    one( row( $_ ).map( -> ( $x, $y ) { 
                        @game[$y][$x] 
                    } ) ),
                    one( col( $_ ).map( -> ( $x, $y ) { 
                        @game[$y][$x] 
                    } ) ),
                    one( square( $_ ).map( -> ( $x, $y ) { 
                        @game[$y][$x] 
                    } ) )
                )
            }
        )
    );

    Once we have that testing to see if the puzzle is complete is quite simple.

    say [&&] (1..9).map( { so $complete-all == $_ } );
    False

    Here we test each possible cell value of 1 through 9 against the Junction; in each case this will be True if all the one() Junctions contain only one of the value. Then we use the [&&] reduction meta-operator to chain these results into a final True / False value (True if all the results are True and False otherwise). Again, this test can be reused as you add values to the cells and will only return True when the puzzle has been completed correctly.

    Once again we’ve got a complex test boiled down to a single line of code. Our $complete-all variable needs to be defined once and is then valid for the rest of the session.
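The completeness condition — every row, column and square holds each of 1 through 9 exactly once — can also be phrased without Junctions. A Python sketch (the shifted-pattern grid in the demo is just one known-valid solution):

```python
def units(game):
    """Yield every row, column and 3x3 square as a list of 9 values."""
    for i in range(9):
        yield game[i]                                        # row i
        yield [game[y][i] for y in range(9)]                 # column i
        tx, ty = i % 3 * 3, i // 3 * 3                       # square i
        yield [game[ty + dy][tx + dx] for dy in range(3) for dx in range(3)]

def complete(game):
    return all(sorted(u) == list(range(1, 10)) for u in units(game))

# a known-valid grid built from the classic row-shift pattern
solved = [[(i * 3 + i // 3 + j) % 9 + 1 for j in range(9)] for i in range(9)]
print(complete(solved))  # True
```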

    This sort of nested Junction test can reach many levels. A final example: testing whether a current puzzle is valid, by which I mean it’s not necessarily complete but has no duplicate numbers in any row, column or square. Once again we can make a Junction for this: each row, column or square is valid if, for each possible value, one or none of its cells is set to that value. Thus our creation of the Junction is similar to the $complete-all one.

    my $valid-all = all(
        (^9).map(
            {
                |(
                    one( 
                        none( row( $_ ).map( -> ( $x, $y ) {
                            @game[$y][$x]
                        } ) ),
                        one( row( $_ ).map( -> ( $x, $y ) {
                            @game[$y][$x]
                        } ) ) 
                    ), 
                    one( 
                        none( col( $_ ).map( -> ( $x, $y ) {
                            @game[$y][$x] 
                        } ) ),
                        one( col( $_ ).map( -> ( $x, $y ) { 
                            @game[$y][$x]
                        } ) ) 
                    ),
                    one( 
                        none( square( $_ ).map( -> ( $x, $y ) {
                            @game[$y][$x]
                        } ) ),
                        one( square( $_ ).map( -> ( $x, $y ) {
                            @game[$y][$x]
                        } ) ) 
                    )
                )
            }
        )
    );

    The test for validity is basically the same as the test for completeness.

    say [&&] (1..9).map( { so $valid-all == $_ } );
    True

    Except in this case our puzzle is valid and so we get a True result.

    Sets : Collections of Objects

    Whilst Junctions are useful to test values, they aren’t as useful if we want to try solving the puzzle. But Perl6 has another type of collection that can come in very handy. Sets (and their related types Bags and Mixes) let you collect items and then apply mathematical set operations to them to find how different Sets interact with each other.

    As an example we’ll define a possible function that returns the values that are possible for a given cell. If the cell has a value set we will return the empty list.

    sub possible( Index $x, Index $y, @game ) {
        return () if @game[$y][$x] > 0;
    
        ( 
            (1..9) 
                (-)
            set(
                ( row($y).map( -> ( $x, $y ) { 
                    @game[$y][$x] 
                } ).grep( * > 0 ) ),
                ( col($x).map( -> ( $x, $y ) { 
                    @game[$y][$x] 
                } ).grep( * > 0 ) ),
                ( square($x,$y).map( -> ( $x, $y ) { 
                    @game[$y][$x] 
                } ).grep( * > 0 ) )
            )
        ).keys.sort;
     }

    Here we find the difference between the numbers 1 through 9 and the Set made up of the values of the row, column and square the given cell is in. We ignore cells with a 0 value using grep. As Sets store their contents as unordered key / value pairs, we get the keys and sort them for consistency. Note that here we’re using the ASCII (-) version of the operator; we could also use the Unicode version (∖) instead.

    We could define the set as the union of each of the results from row, col and square and the result would be the same. Also we’re using the two argument version of square in this case.
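For comparison, the same computation with ordinary Python sets is a difference between 1..9 and everything visible from the cell (a sketch reusing the example grid from the top of the post):

```python
game = [
    [4,0,0,0,0,0,0,0,0],
    [0,9,0,3,4,6,0,5,0],
    [5,8,0,0,9,0,0,0,6],
    [0,4,0,8,1,3,0,0,9],
    [0,0,0,5,0,4,0,0,0],
    [8,0,0,6,2,9,0,4,0],
    [3,0,0,0,5,0,0,6,2],
    [0,5,0,9,3,2,0,8,0],
    [0,0,0,0,0,0,0,0,1],
]

def seen(x, y):
    """Non-zero values already placed in the cell's row, column and square."""
    tx, ty = x // 3 * 3, y // 3 * 3
    cells = {(i, y) for i in range(9)} \
          | {(x, j) for j in range(9)} \
          | {(tx + dx, ty + dy) for dx in range(3) for dy in range(3)}
    return {game[j][i] for (i, j) in cells} - {0}

def possible(x, y):
    if game[y][x] > 0:          # already filled in: nothing to suggest
        return []
    return sorted(set(range(1, 10)) - seen(x, y))

print(possible(1, 0))  # [1, 2, 3, 6, 7]
```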

    It should be noted that this is the simplest definition of possible values; there’s no additional logic going on, but even this simple result lets us do the simplest of solving algorithms. In this case we loop over every cell in the grid and, if it has exactly 1 possible value, we set the cell to that value. We loop round, get a list of cells to set, then loop through the list and set the values. If the list of cells to set is empty or the puzzle is complete then we stop.

    my @updates;
    repeat {
        @updates = (^9 X ^9).map( -> ($x,$y) { 
            ($x,$y) => possible($x,$y,@game) 
        } ).grep( *.value.elems == 1 );
        for @updates -> $pair { 
            my ( $x, $y ) = $pair.key; 
            @game[$y][$x] = $pair.value[0];
        }
    } while ( @updates.elems > 0 &&
              ! [&&] (1..9).map( { so $complete-all == $_ } ) );

    So we make a list of Pairs where the key is the x,y coordinates and the value is the list of possible values. Then we keep only those with exactly one possible value. This continues until no cells with a single possible value are found or the puzzle is complete.

    Another way of finding solutions is to look for values that only appear in one cell’s possibilities within a given row, column or square. For example if we have the following possibilities:

    (1,2,3),(2,3,4),(),(),(4,5),(),(),(2,3,4),()

    1 and 5 only appear in the row once each. We can make use of the symmetric set difference operator and operator chaining to get this.

    say (1,2,3) (^) (2,3,4) (^) () (^) () (^) (4,5) (^) () (^) () (^) (2,3,4) (^) ()
    set(1 5)

    Of course in that case we can use the reduction meta-operator on the list instead.

    say [(^)] (1,2,3),(2,3,4),(),(),(4,5),(),(),(2,3,4),()
    set(1 5)
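One caveat when translating this: the result above keeps the values that occur in exactly one of the possibility lists, while a naive pairwise XOR-reduce over Python sets would keep values occurring an odd number of times (2, 3 and 4 each appear three times here). A Counter makes the exactly-once semantics explicit:

```python
from collections import Counter

groups = [{1,2,3}, {2,3,4}, set(), set(), {4,5}, set(), set(), {2,3,4}, set()]
counts = Counter(v for g in groups for v in g)
only = {v for v, n in counts.items() if n == 1}  # values in exactly one group
print(sorted(only))  # [1, 5]
```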

    So in that case the algorithm is simple (in this case I’ll just cover rows, the column and square code is basically the same).

    my @updates;
    for ^9 -> $idx {
        my $only = [(^)] row($idx).map( -> ( $x,$y ) { 
            possible($x,$y,@game) 
        } );
        for $only.keys -> $val {
            for row($idx) -> ($x,$y) {
                if $val (elem) possible($x,$y,@game) {
                    @updates.push( ($x,$y) => $val );
                }
            }
        }
    }

    We can then loop through the updates array as above. Combining these two algorithms can solve a large number of Sudoku puzzles by themselves and simplify others.

    Note we have to make two passes: first we get the numbers we’re looking for, and then we look through each row to find where each number appears. For this we use the (elem) operator. Sets can also be referenced using Associative subscripts, for example:

    say set(1,5){1}
    True

    A note on Objects

    So for all the examples so far I’ve used basic integers. But there’s nothing stopping you using Objects in your Junctions and Sets. There are a few things to bear in mind though, Sets use the === identity operator for their tests. Most objects will fail an identity check unless you have cloned them or have defined the WHICH method in a way that will allow them to be compared.

    For the Sudoku puzzle you may want to create a CellValue class that stores whether the number was one of the initial values in the puzzle. If you do this, though, you’ll need to override WHICH and make it return the Integer value of the Cell. As long as you are fine with an identity check being technically invalid in this case (two different CellValues may have the same value but they won’t be the same object) then you can put them in Sets.

    I hope you’ve found this interesting. Junctions and Sets are two of the many parts of Perl6 that give you the power to do complex tasks simply. If you’re interested in the code here, there’s an object-based version available that you can install with:

    zef install Game::Sudoku

    Strangely Consistent: Has it been three years?

    Published by Carl Mäsak

    007, the toy language, is turning three today. Whoa.

    On its one-year anniversary, I wrote a blog post to chronicle it. It seriously doesn't feel like two years since I wrote that post.

    On and off, in between long stretches of just being a parent, I come back to 007 and work intensely on it. I can't remember ever keeping a side project alive for three years before. (Later note: Referring to the language here, not my son.) So there is that.

    So in a weird way, even though the language is not as far along as I would expect it to be after three years, I'm also positively surprised that it still exists and is active after three years!

    In the previous blog post, I proudly announced that "We're gearing up to an (internal) v1.0.0 release". Well, we're still gearing up for v1.0.0, and we are closer to it. The details are in the roadmap, which has become much more detailed since then.

    Noteworthy things that happened in these past two years:

    Things that I'm looking forward to right now:

    I tried to write those in increasing order of difficulty.

    All in all, I'm quite eager to one day burst into #perl6 or #perl6-dev and actually showcase examples where macros quite clearly do useful, non-trivial things. 007 is and has always been about producing such examples, and making them run in a real (if toy) environment.

    And while we're not quite over that hump yet, we're perceptibly closer than we were two years ago.

    Belated addendum: Thanks and hugs to sergot++, to vendethiel++, to raiph++ and eritain++, for sharing the journey with me so far.