Raku RSS Feeds

Roman Baumer (Freenode: rba #raku or ##raku-infra) / 2019-12-15T15:19:17

Raku Advent Calendar: Day 15 – Santa had too much eggnog

Published by Matthew Stephen Stuckwisch on 2019-12-15T00:01:00

Santa had too much eggnog

We’re just over a week from Christmas and Santa is sending his elves the final present lists. Unfortunately, Santa had a bit too much eggnog and so the list that he sent to his elves was … not the greatest. Take a look at some of it:

 - 4 bsaeball gluvs
 - 2 batts
 - 2 ballz
 - 3 fancee dols
 - 1 dressss
 - 1 bbaskebtall

Santa somehow managed to keep a nice format that we could mostly process with regexen, so the elves started hammering away at a nice grammar:

grammar Santa'sList {
  rule TOP        {       <kid's-list>+    }
  rule kid's-list {     <name>     <gift>+ }
  rule gift       { '-' <quantity> <item>  }
  token name      { <-[\n]>+  }
  token quantity  { <.digit>+ }
  token item      { <.alpha>+ % \h+ }
}

While the elves figured they could work out what Santa meant in an action object, they decided it would be more interesting to create a token that they could reuse not just in the grammar, but in any random regex — these elves are crafty!

They wanted to make a new token that they’d call <fuzzy> that could somehow capture Santa’s drunken scribblings (can we call his typed list a scribbling?). But regex syntax doesn’t actually allow for any kind of fuzzy matching. Here Raku’s engine comes to the rescue. First they created a code block inside of the token. Code blocks are normally defined with just { 🦋 }, but because they needed to define the success of a match, they opted instead for the <?{ 🦋 }> conditional block, which will not only run the code, but will also fail the match if the block returns a false-y value.

  token fuzzy {
    (<.alpha>+ % \h+)
    <?{
      # code will go here
    }>
  }
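As an aside, the behaviour of such a conditional block is easy to demonstrate outside a grammar. This standalone sketch (our own hypothetical token, not part of the elves' list) only matches when the code inside <?{ }> returns something truthy:

```raku
# A token that matches digits, but only succeeds when they form an
# even number; inside the code block, $/ is the match so far.
my token even-number { \d+ <?{ $/ %% 2 }> }

say so '42' ~~ / ^ <&even-number> $ /;   # True
say so '41' ~~ / ^ <&even-number> $ /;   # False
```

If the block returns a false-y value, the engine treats it like any other failed assertion and backtracks or fails the match.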

Before they started writing their code, they did two other things. First they named the capture to be a bit easier to maintain down the road. And secondly, they realized they needed to actually get the list of possible toys into the token somehow. So they added a signature to the token to pass it in.

  token fuzzy(**@toys) {
    $<santa's-text>=(<.alpha>+ % \h+)
    <?{
      # code will go here
    }>
  }

Now they could begin the code itself. They would take Santa’s text, and compare it to each of the possible toys, and decide which one was the closest match:

  token fuzzy(**@toys) {
    $<santa's-text>=(<.alpha>+ % \h+)
    <?{
      my $best = @toys
                   .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                   .sort( *.tail )
                   .tail;   # keep the highest-scoring (toy, score) pair
      say "Santa meant to write {$best[0]}";
    }>
  }

The Q-gram function they used creates N-grams for each word, and compares them to see how many they have in common. With testing they found that the best value for N (the length of each substring) was about half the average length. The way that Raku works, writing the Q-gram function was super easy:

  #| Generate space-padded N-grams of length n for string t.
  sub ngrams (\t, \n) {
    my \s = (' ' x n - 1)  ~ t ~  (' ' x n - 1);
    do for ^(t.chars + n - 1) { s.substr: $_, n }
  }

  #| Calculate Q-gram score using bag operations
  sub qgram (\a, \b) {
    my \q  = (a.chars + b.chars) div 4;
    my \aₙ = ngrams(a,q).BagHash;
    my \bₙ = ngrams(b,q).BagHash;

    (aₙ ∩ bₙ) / (aₙ ∪ bₙ)      # Jaccard similarity coefficient
  }

Raku let the elves calculate N-grams in just two clean lines of code, and then use those to calculate the Jaccard-index between the two strings in just four more easy to read lines of code.
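To see concretely what these produce, here is a worked example with our own numbers, using the definitions above: comparing Santa's "ballz" against the real toy name "balls". Both strings have five characters, so q = (5 + 5) div 4 = 2, and each string is split into space-padded 2-grams:

```raku
# ngrams('ballz', 2) → (' b', 'ba', 'al', 'll', 'lz', 'z ')
# ngrams('balls', 2) → (' b', 'ba', 'al', 'll', 'ls', 's ')
#
# Four 2-grams appear in both bags and eight appear in at least one,
# so the Jaccard index is 4/8 = 0.5 — well above the cut-off used later.
say qgram('ballz', 'balls');
```

Because the comparison works on bags of substrings rather than exact strings, transposed or mistyped letters only cost a few grams instead of failing the whole comparison.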

Putting this back into their grammar, they ended up with the following:

grammar Santa'sList {
  rule TOP        {       <kid's-list>+    }
  rule kid's-list {     <name>     <gift>+ }
  rule gift       { '-' <quantity> <item>  }
  token name      { <-[\n]>+  }
  token quantity  { <.digit>+ }
  token item      { <fuzzy(@gifts)> }
  token fuzzy     { … }
  sub ngrams      { … }
  sub qgram       { … }
}

That’s a pretty handy format, but an important problem remains. How do they get access to the best-matched text? If they were to match and request, say, $<kid's-list>[0]<gift>[0]<item>, they would only get Santa’s original illegible mess. They could use an action object, but that requires doing a parse with actions, which means the fuzzy token is tied to the vagaries of grammar parsing. That works fine here, but… it’s less reusable.

But elves are good at packing and wrapping. They decide to make a package that wraps the fuzzy token so that both Santa’s original and the corrected version are easily accessible in a DWIM manner. This ‘package’ can’t be declared with package or module, though, because the wrapping process requires using the special sub EXPORT. Their basic process looks like the following:

sub EXPORT {
  # Make the fuzzy token in the elves' factory
  my token fuzzy (*@words) { … }

  # Wrap it in wrapping paper (apply a role) so it's prettier (easier to use)
  &fuzzy.wrap( … );

  # Ship out (export) the wrapped version
  %( '&fuzzy' => &fuzzy )
}

Any other special tools the elves need can be included in the EXPORT block, for example, the Q- and N-gram functions. So how will they actually do the wrapping? First, they design the paper, that is, a parameterized role that will override .Str to give the clean/corrected value, but also provide a .fuzz method to allow access to the original value:

  role Fuzzy[$clear,$fuzz] {
    method Str  { $clear }
    method fuzz { $fuzz  }
  }

Now, the wrapped function could look something like the following:

    sub (|) {
      my $match = callsame;

      # Failed match evals to false, and is just passed along.
      # Successful match gets the Fuzzy role mixed in.
      $match
        ?? $match but Fuzzy[$match.??, $match.??]
        !! $match
    }

There’s a small problem: the results of the calculations they ran inside of the token aren’t available. One solution they thought of involved adding new parameters to the fuzzy token with the trait is raw so that the values could be passed back, but that felt like something the old C++ elves would do. No, Santa’s Raku elves had a better idea: dynamic variables. They made two of them, and refactored the original fuzzy token to assign to them:

  my token fuzzy(**@toys) {
    $<santa's-text>=(<.alpha>+ % \h+)
    <?{
      my $best = @toys
                  .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                  .sort( *.tail )
                  .tail;   # keep the highest-scoring (toy, score) pair
      $*clear = $best[0];
      $*fuzz  = ~$<santa's-text>;
    }>
  }

    sub (|) {
      my $*fuzz;
      my $*clear;

      my $match = callsame;   # sets $match to result of the original

      $match
        ?? $match but Fuzzy[$*clear, $*fuzz]
        !! $match
    }

They did a test with some values and all went well, until an item wasn’t found:

"I like the Norht Pole" ~~ /I like the $<dir>=<fuzzy: <North South>> Pole/;
say $<dir>.clear;   # --> "North"
say $<dir>.fuzz;    # --> "Norht"

"I like the East Pole" ~~ /I like the $<dir>=<fuzzy: <North South>> Pole/;
say $<dir>.clear;   # --> "North"
say $<dir>.fuzz;    # --> "East"

What happened? The elves realized that their token was matching no matter what. This is because the <?{ 🦋 }> block will only fail if it returns a falsey value. The last statement, being an assignment of a string, will virtually always be truthy. To fix this, they added a simple conditional to the end of the block to fail if the Q-gram score wasn’t sufficiently high.

  my token fuzzy(**@toys) {
    $<santa's-text>=(<.alpha>+ % \h+)
    <?{
      my $best = @toys
                   .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                   .sort( *.tail )
                   .tail;   # keep the highest-scoring (toy, score) pair

      $*clear = $best[0];
      $*fuzz  = ~$<santa's-text>;

      # Arbitrary but effective score cut-off.
      $best[1] > 0.33
    }>
  }

With that, they were done, and able to process Santa’s horrible typing.

Of course, there were a lot of improvements that the elves could still make to make their fuzzy token more useful. After they had made use of it (and taken the eggnog away from Santa so they wouldn’t need it), they polished it up so that it could bring joy to everyone.

With that, I can also announce the release of Regex::FuzzyToken. To use it, just do like the elves and in a grammar or any other code, say use Regex::FuzzyToken and the token fuzzy will be imported into your current scope. It has a few extra features, so take a look at its readme for information on some of its options.

While not everyone will use or have need of a fuzzy token, I hope that this shows off some interesting possibilities when creating tokens that might be better defined programmatically, as well as other cool Raku features like Bag operators, dynamic variables, and parameterized roles.

Raku Advent Calendar: Day 14 – Thinking Beyond Types: an Introduction to Rakudo's MOP

Published by kaiepi on 2019-12-14T00:01:00

It’s Christmas season! Christmas would not be Christmas without the caroling that’s part of the festivities, so let’s make it possible to sing some.

We could simply make a carol one giant string, but that’s not good enough. Being a song, carols often have a chorus that’s repeated in between verses. If we were to store it as one string, we’d be repeating ourselves. On top of that, people are not perfect; they may forget a verse of a carol, or even just a single line of a verse. We need a type to represent a carol. This could be a type of song, but since we only care about carols, it’s a bit early to abstract this out.

Now, to make this more interesting, let’s handle this without making instances of any type of any kind. All behaviour for all Christmas carols will be handled by type objects. This will be enforced using the Uninstantiable REPR.

At first, we might have a Christmas::Carol role:

role Christmas::Carol is repr<Uninstantiable> {
    proto method name(::?CLASS:U: --> Str:D)        {*}
    proto method verse(::?CLASS:U: Int:D --> Seq:D) {*}
    proto method chorus(::?CLASS:U: --> Seq:D)      {*}
    proto method lyrics(::?CLASS:U: --> Seq:D)      {*}

    method sing(::?CLASS:U: --> ::?CLASS:U) {
        .say for @.lyrics;
        self
    }
}

This would then be done by a class representing a specific carol:

class Christmas::Carol::JingleBells does Christmas::Carol {
    multi method name(::?CLASS:U: --> 'Jingle Bells') { }

    multi method verse(::?CLASS:U: 1 --> Seq:D) {
        lines q:to/VERSE/
        Dashing through the snow
        In a one-horse open sleigh
        O'er the fields we go
        Laughing all the way
        Bells on bobtails ring
        Making spirits bright
        What fun it is to ride and sing
        A sleighing song tonight
        VERSE
    }
    multi method verse(::?CLASS:U: 2 --> Seq:D) {
        lines q:to/VERSE/
        A day or two ago
        I thought I'd take a ride
        And soon, Miss Fanny Bright
        Was seated by my side
        The horse was lean and lank
        Misfortune seemed his lot
        He got into a drifted bank
        And then we got upset
        VERSE
    }
    multi method verse(::?CLASS:U: 3 --> Seq:D) {
        lines q:to/VERSE/
        A day or two ago
        The story I must tell
        I went out on the snow
        And on my back I fell
        A gent was riding by
        In a one-horse open sleigh
        He laughed as there I sprawling lie
        But quickly drove away
        VERSE
    }
    multi method verse(::?CLASS:U: 4 --> Seq:D) {
        lines q:to/VERSE/
        Now the ground is white
        Go it while you're young
        Take the girls tonight
        And sing this sleighing song
        Just get a bobtailed bay
        Two forty as his speed
        Hitch him to an open sleigh
        And crack, you'll take the lead
        VERSE
    }

    multi method chorus(::?CLASS:U: --> Seq:D) {
        lines q:to/CHORUS/
        Jingle bells, jingle bells
        Jingle all the way
        Oh, what fun it is to ride
        In a one-horse open sleigh, hey
        Jingle bells, jingle bells
        Jingle all the way
        Oh, what fun it is to ride
        In a one-horse open sleigh
        CHORUS
    }

    multi method lyrics(::?CLASS:U: --> Seq:D) {
        gather for 1..4 {
            take $_ for @.verse($_);
            take "";
            take $_ for @.chorus;
            take "" if $_ != 4;

There’s a problem with this approach, though. What happens if you want to hold onto a collection of Christmas carols to carol around the neighbourhood with?

use Christmas::Carol::JingleBells;
use Christmas::Carol::JingleBellRock;
use Christmas::Carol::DeckTheHalls;
use Christmas::Carol::SilentNight;
# And so on...

That’s no good! You don’t need to know who wrote a Christmas carol in order to sing it. On top of that, no one thinks of Christmas carols in terms of symbols; they think of them in terms of their name. To represent them effectively, we need to make it so we can look up a Christmas carol using its name, while also making it possible to introduce new carols that can be looked up this way at the same time. How can we do this?

The way we’ll be using here requires a bit of explanation on how types in Raku work.

The Metaobject Protocol

Classes may contain three different types of method declarations. The two you most commonly see are public and private methods:

class Example {
    method !private-method(|) { ... }
    method public-method(|)   { ... }
}

There is a third type of method declaration you can make, which is exclusive to classes (this is a lie, but it is the case when writing Raku with Rakudo alone), by prefixing the method’s name with ^. Calls to these are typically made using the .^ dispatch operator, which you often see when you need to introspect an object (.^name, .^methods, etc.). However, these don’t behave at all like you would otherwise expect a method to. Let’s take a look at what the invocant and parameters are when we invoke a method of this type using the .^ dispatch operator:

class Example {
    method ^strange-method(\invocant: |params --> List:D) {
        (invocant, params)
    }
}

say Example.^strange-method;
# (Perl6::Metamodel::ClassHOW+{<anon>}.new \(Example))

Whoa whoa whoa, WTF? Why is the class in which this type of method is declared passed as its first parameter instead of being its invocant? What even is that object that ended up as its invocant instead, and where is it coming from?

Before this can be explained, first we’ll need to understand a little bit about what the metaobject protocol (MOP) is. The MOP is a feature specific to Rakudo through which the behaviour of all objects that can exist in Raku is implemented. These behaviours are implemented based on kinds (types of types), such as classes, roles, and grammars. The behaviour for any type is driven by what is called a higher-order working (HOW), which is a metaobject. These are typically instances of a metaclass of some sort. For instance, HOWs for classes are created by the Metamodel::ClassHOW metaclass.

The HOW for any given object can be introspected. How is this done? How, you ask? How? HOW!? By calling HOW on it, of course!

role Foo { }
say Foo.HOW.^name; # OUTPUT: Metamodel::ParametricRoleGroupHOW

Methods of HOWs are called metamethods, and these are what are used to handle the various behaviours that types can support. Some examples of behaviours of kinds that metamethods handle include type names, attributes, methods, inheritance, parameterization, and typechecking. Since most of these are not features specific to any one kind, these are often mixed into metaclasses by metaroles. For instance, the Metamodel::Naming metarole is what handles naming for any type that can be named.
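You can poke at this yourself; in the following sketch the class name is our own, while the metarole names are Rakudo's:

```raku
class Example { }

# The HOW of a class is a Metamodel::ClassHOW instance, and it does
# the Metamodel::Naming metarole, which is what implements naming:
say Example.HOW ~~ Metamodel::Naming;   # True
say Example.HOW.name(Example);          # Example
```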

So that third type of method declaration from earlier? That doesn’t actually declare a method for a class; instead, it declares a metamethod that gets mixed into that class’ HOW, similarly to how metaroles are used. The .^ dispatch operator is just sugar for invoking a metamethod of an object using that object as its first argument, which metamethods accept in most cases. For instance, these two metamethod calls are equivalent:

say Int.^name;         # OUTPUT: Int
say Int.HOW.name(Int); # OUTPUT: Int

Metamethods are the tool we’ll be using to implement Christmas carols purely as types.

Spreading the Joy

To start, instead of having a Christmas::Carol role that gets done by Christmas carol classes, let’s make our carols roles mixed into a Christmas::Carol class instead. Through this class, we will stub the methods a Christmas carol should have, like it was doing as a role, but in addition will hold on to a dictionary of Christmas carols by their name.

We can store carols using an add_carol metamethod:

my constant %CAROLS = %();
method ^add_carol(Christmas::Carol:U, Str:D $name, Mu $carol is raw --> Mu) {
    %CAROLS{$name} := $carol;
}

Now we can mark roles as being carols like so:

role Christmas::Carol::JingleBells { ... }
Christmas::Carol.^add_carol: 'Jingle Bells', Christmas::Carol::JingleBells;

This isn’t a great API for people to use though. A trait could make it so this could be handled from a role’s declaration. Let’s make an is carol trait for this:

multi sub trait_mod:<is>(Mu \T, Str:D :carol($name)!) {
    Christmas::Carol.^add_carol: $name, T
}

Now we can define a role as being a carol like this instead:

role Christmas::Carol::JingleBells is carol('Jingle Bells') { ... }

To make it so we can fetch carols by name, we can simply make our Christmas::Carol class parametric. This can be done by giving it a parameterize metamethod which, given a name, will create a Christmas::Carol mixin using any carol we know about by that name:

method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
    self.mixin: $this, %CAROLS{$name}
}

Now we can retrieve our Christmas carols by parameterizing Christmas::Carol using a carol name. What will the name of the mixin type returned be, though?

say Christmas::Carol['Jingle Bells'].^name;
# OUTPUT: Christmas::Carol+{Christmas::Carol::JingleBells}

That’s a bit ugly. Let’s reset the mixin’s name during parameterization:

method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
    my Christmas::Carol:U $carol := self.mixin: $this, %CAROLS{$name};
    $carol.^set_name: 'Christmas::Carol[' ~ $name.perl ~ ']';
    $carol
}

This gives our Jingle Bells carol a name of Christmas::Carol["Jingle Bells"] instead. Much better.

Let’s add one last metamethod: carols. This will return a list of names for the carols known by Christmas::Carol:

method ^carols(Christmas::Carol:U --> List:D) {
    %CAROLS.keys.list
}

With that, our Christmas::Carol class is complete:

class Christmas::Carol is repr<Uninstantiable> {
    proto method name(::?CLASS:U: --> Str:D)        {*}
    proto method chorus(::?CLASS:U: --> Seq:D)      {*}
    proto method verse(::?CLASS:U: Int:D --> Seq:D) {*}
    proto method lyrics(::?CLASS:U: --> Seq:D)      {*}

    method sing(::?CLASS:U: --> ::?CLASS:U) {
        .say for @.lyrics;
        self
    }

    my constant %CAROLS = %();

    method ^add_carol(Christmas::Carol:U, Str:D $name, Mu $carol is raw --> Mu) {
        %CAROLS{$name} := $carol;
    }

    method ^carols(Christmas::Carol:U --> List:D) {
        %CAROLS.keys.list
    }

    method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
        my Christmas::Carol:U $carol := self.mixin: $this, %CAROLS{$name};
        $carol.^set_name: 'Christmas::Carol[' ~ $name.perl ~ ']';
        $carol
    }
}

multi sub trait_mod:<is>(Mu \T, Str:D :carol($name)!) {
    Christmas::Carol.^add_carol: $name, T
}

Now, this is great and all, but how is this an improvement on our original code? By defining carols this way, we no longer need to know the symbol for a carol in order to sing it, and we no longer need to know which module declared the carol in the first place. So long as we know that the Christmas::Carol class exists, we know all of the carols that the modules we import happen to be aware of.

This means there can be a module defining a collection of carols:

use Christmas::Carol::JingleBells;
use Christmas::Carol::JingleBellRock;
use Christmas::Carol::DeckTheHalls;
use Christmas::Carol::SilentNight;
unit module Christmas::Carol::Collection;

From another module, we can make another collection using this, and define more carols:

use Christmas::Carol::Collection;
use Christmas::Carol::JingleBells::BatmanSmells;
unit module Christmas::Carol::Collection::Plus;

We can then import this and easily sing all of the original module’s carols, in addition to the ones this new module adds, by name:

use Christmas::Carol;
use Christmas::Carol::Collection::Plus;

Christmas::Carol[$_].sing for Christmas::Carol.^carols;

At this point, you may be wondering: “Couldn’t you just write code that does the same thing as this using instances?”. You’d be right! What this shows is that while there is a protocol for working with types when using Rakudo, the behaviour of any given type isn’t particularly unique; it’s driven mainly by metaobjects that you can have complete control over.

Just with metamethod declarations in classes alone, you can augment or override behaviours of any type that supports inheritance. This is far from the extent of what the MOP allows you to do with types! Alas, the more advanced features for working with the MOP would require more explanation, and would be best left for another time.

Raku Advent Calendar: Day 13 – A Little R&R

Published by tmtvl on 2019-12-13T00:00:03

A Little R&R



Raku is a really nice language. Versatile, expressive, fast, dwimmy. The only problem I sometimes have with it is that it can be a little slow. Fortunately that can easily be solved by the NativeCall interface, which makes it easy to call C code in your Raku program. Now, as nice as C is, it is a rather old language with some limitations. A newer language that can fill its niche is known as Rust. I’ll show some examples of having Raku talk to Rust.


Rust code can be called from other languages using the FFI standard. FFI stands for “Foreign Function Interface” and allows you to export a Rust library as a standard shared library (a .so file on Linux or .dll on Windows). This is done by adding the following section in your Cargo.toml:

[lib]
crate-type = ["cdylib"]

After adding this section you will find the libraries in your target/debug or target/release folder when you build the library with Cargo. Also be sure to add the libc dependency to get access to standard C types.


We can use the same primitive types as in C: numbers (and chars) and arrays.

Numbers and chars


#[no_mangle]
pub extern fn addition(a: u32, b: u32) -> u32 {
        a + b
}


use NativeCall;
sub addition(uint32, uint32) returns uint32 is native('foo') { * }

Note the #[no_mangle]: this keeps the name of the function unchanged in the final library file. While Rust has standardized name mangling (contrary to C++, where the name mangling is platform-dependent), it is still nice to call a function by its original name.

Arrays and strings


use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[no_mangle]
pub unsafe extern fn count_chars(s: *const c_char) -> u32 {
        CStr::from_ptr(s).to_str().unwrap().chars().count() as u32
}

#[no_mangle]
pub extern fn lunch() -> *mut c_char {
        let c_string = CString::new("🌮🍚").expect("CString::new failed");
        c_string.into_raw()
}

#[no_mangle]
pub unsafe extern fn free_lunch(ptr: *mut c_char) {
        let _ = CString::from_raw(ptr);
}


sub count_chars(Str is encoded('utf8')) returns uint32 is native ('foo') { * }
sub lunch() returns CArray[uint8] is native('foo') { * }
sub free_lunch(CArray[uint8]) is native('foo') { * }

Rust has first class support for UTF-8, making it a great fit for Raku. Using CString also guarantees to add a null byte at the end of the string, so you can get the significant bytes by looping until the AT-POS() value equals 0… if, that is, you choose to return an array rather than populate it.
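A sketch of that byte-walking approach, assuming lunch() returns a NUL-terminated UTF-8 buffer as above (the helper name is ours):

```raku
use NativeCall;

# Hypothetical helper: collect bytes from a CArray[uint8] until the
# NUL terminator, then decode them as UTF-8.
sub carray-to-str(CArray[uint8] $bytes --> Str) {
    my @significant;
    my $i = 0;
    while $bytes[$i] != 0 {
        @significant.push: $bytes[$i];
        $i++;
    }
    Blob.new(@significant).decode('utf8');
}

# my $meal = carray-to-str(lunch());
# free_lunch(...);   # hand the memory back to Rust when done
```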



use std::mem::swap;

#[repr(C)]
pub struct Point {
    x: f32,
    y: f32,
}

impl Point {
        fn print(&self) {
                println!("x: {}, y: {}", self.x, self.y);
        }
}

#[no_mangle]
pub unsafe extern "C" fn flip(p: *mut Point) {
    swap(&mut (*p).x, &mut (*p).y);
}


class Point is repr('CStruct') {
    has num32 $.x;
    has num32 $.y;
}

sub flip(Pointer[Point]) is native('./') { * }

sub flipper {
    my Point $p .= new(x => 3.Num, y => 4.Num);
    say "x: ", $p.x, ", y: ", $p.y;
    flip(nativecast(Pointer[Point], $p));
    say "x: ", $p.x, ", y: ", $p.y;
}

Rust separates objects into structs (which we are all familiar with), and traits, which are kind of like roles in Raku.



pub extern "C" fn multithread(count: i32) {
    let threads: Vec<_> = (1..8)
        .map(|id| {
            thread::spawn(move || {
                println!("Starting thread {}", id);
                let mut x = 0;
                for y in 0..count {
                    x += y;
                    println!("Thread {}, {}/{}: {}", id, y, count, x);

    for t in threads {
        t.join().expect("Could not join a thread!");


sub multithread(int32) is native('./') { * }

sub multi-multithread {
    my @numbers = (3_000..50_000).pick(10);

    my @promises;

    for @numbers -> $n {
        push @promises, start {
            multithread($n);
        };
    }

    await Promise.allof(@promises);
}

Rust and Raku both have first-class concurrency support. This allows you to easily tweak your programs to get the highest possible performance.

Closing statement

These were some examples of interactions between Rust and Raku, two of the most promising languages when looking to the future. If you found this interesting, be sure to check out Andrew Shitov’s A Language a Day articles. Thanks for reading and happy holidays.

Raku Advent Calendar: Day 12 – Making a simple bot in Raku

Published by tyil on 2019-12-12T00:01:37

Making IRC bots is incredibly simple in Raku, thanks to IRC::Client. It allows you to create a very simple bot in about 20 lines of code. There’s a plugin system that allows easy re-use of code between multiple bots, and adding customized features can be as easy as dropping in an anonymous class.

So, let’s get to it!

Get your dependencies installed

Raku uses zef as the standard module installer, and if you’re reading this, I’m assuming you have it available to you. Install IRC::Client with zef, and you should be good to get started.

zef install IRC::Client

Setting up the bot

To set up the bot, we’ll need to have a nickname to use, a server to connect to and a list of channels to join. To make it easier to run this as a program from your shell, I’ll be using a MAIN sub as well.

use IRC::Client;

sub MAIN () {
        nick     => 'raku-advent',
        host     => '',
        channels => < #advent >,
    ).run;
}

Let’s save this in a file called bot.pl6, and run it.

perl6 bot.pl6

This will run, and if you’re in the channel you specified in channels, you should see the bot joining in a short moment. However, the program itself doesn’t seem to provide any output. It would be highly convenient, especially during development, to show what it’s doing. This is possible by enabling debug mode: add debug => True to the new method call, making it look as follows.
        nick     => 'raku-advent',
        host     => '',
        channels => < #advent >,
        debug    => True,
    ).run;

If you restart the application now, you will see there’s a lot of output all of a sudden, showcasing the IRC commands the bot is receiving and sending in response. Now all we need to do is add some functionality.

Making the bot work

As described earlier, functionality of the bot is added in using plugins. These can be any class that implements the right method names. For now, we’ll stick to irc-to-me, which is a convenience method which is triggered whenever the bot is spoken to in a private message, or directly addressed in a channel.

The simplest example to get started with here is to simply have it respond with the message you sent to the bot. Let’s do this by adding an anonymous class as a plugin to the new method call.
        nick     => 'raku-advent',
        host     => '',
        channels => < #advent >,
        debug    => True,
        plugins  => [
            class {
                multi method irc-to-me ($e) {
                    $e.text
                }
            },
        ],
    ).run;

When you restart the bot and talk to it on IRC, you will see it responding to you with the same message you sent it.

 <@tyil> raku-advent: hi
 <raku-advent> tyil, hi
 <@tyil:> raku-advent: how are you doing
 <raku-advent> tyil, how are you doing

Adding some real features

So, you’ve seen how easy it is to get started with a simple IRC bot in just over a dozen lines. Let’s add two features that you may want your bot to support.

For convenience sake, I will only cover the class implementing the features, not the entire block.


First off, let’s make the bot able to show the time it’s been running for. For this, I’ll make it respond to people asking it for “uptime”. We can use the irc-to-me convenience method for this again. After all, we probably don’t want it to respond every time someone discusses uptime, only when the bot is asked directly about it.

In Raku, there’s a special variable called $*INIT-INSTANT, which contains an Instant of the moment the program started. We can use this to easily get the Duration that the program has been running for.

class {
  multi method irc-to-me ($ where *.text eq 'uptime') {
    my $response = "I've been alive for";
    my ($seconds, $minutes, $hours, $days, $weeks) =
      (now - $*INIT-INSTANT).polymod(60, 60, 24, 7);

    $response ~= " $weeks weeks" if $weeks;
    $response ~= " $days days" if $days;
    $response ~= " $hours hours" if $hours;
    $response ~= " $minutes minutes" if $minutes;
    $response ~= " $seconds seconds" if $seconds;

    $response ~ '.';
  }
}

Now, whenever you ask the bot for uptime, it will respond with a human friendly uptime notification.

 <@tyil> uptime
 <@tyil> raku-advent: uptime
 <raku-advent> tyil, I've been alive for 5 minutes 8 seconds.
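The heavy lifting in that handler is done by polymod, which repeatedly divides by each base you hand it and returns the remainders, smallest unit first:

```raku
# 2 days, 3 hours, 4 minutes and 5 seconds, expressed as seconds:
my $seconds = 5 + 4 * 60 + 3 * 60 * 60 + 2 * 24 * 60 * 60;
say $seconds.polymod(60, 60, 24, 7);   # (5 4 3 2 0)
```

The trailing 0 is the number of weeks, which is why the handler above only appends the units that come out non-zero.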

User points

Most channels have a bot that keeps track of user points, or karma as it’s sometimes referred to. There’s a module already that does this for us, called IRC::Client::Plugin::UserPoints. We don’t have to do much apart from installing it and adding it to the list of plugins.

zef install IRC::Client::Plugin::UserPoints

Once this finishes, the module can be used in your code. You will need to import it with a use statement, which you can put directly under the use IRC::Client line.

use IRC::Client;
use IRC::Client::Plugin::UserPoints;

Now, in the list of plugins, add it as a new entry.

plugins => [
    class { … },
],

This plugin makes the bot respond to !scores, !sum and whenever a nick is
given points using a ++ suffix, for instance, tyil++.

 <@tyil> raku++
 <@tyil> raku++
 <@tyil> !scores
 <raku-advent> tyil, « raku » points: main: 2

Finding plugins

All plugins for IRC::Client that are shared with the community have the prefix IRC::Client::Plugin::, so you can search for that on to find plugins to use. Of course, you can easily add your own plugins to the ecosystem as well!

Winding down

As you can see, with some very simple code you can add some fun or important
tools to your IRC community using the Raku programming language. Try it out and
have some fun, and share your ideas with others!

Raku Advent Calendar: Day 11 – Packaging with Libarchive

Published by ctilmes on 2019-12-11T00:01:04

Distributing physical gifts involves wrapping them up into packages, but suppose you want to distribute digital gifts. How can you use Raku to help you wrap them up? Enter Libarchive!

Simple wrapping files into a package

Let’s wrap up just two files, myfile1 and myfile2 into a single file. (Libarchive just as easily creates tar files, cpio, rar, even iso9660 images for cds or dvds.)

use Libarchive::Simple;

given archive-write('') {
    .add: 'myfile1', 'myfile2';
    .close;
}

This very simple syntax looks a little weird to those unfamiliar with it… here is a more ‘traditional’ way of writing the same thing:

use Libarchive::Write;

my $handle ='');
$handle.add('myfile1', 'myfile2');
$handle.close;

What is the difference? Libarchive::Simple provides a few shorthand routines for accessing the various Libarchive functionalities. One of these is archive-write(), which is identical to

The second example takes the return from new() and stores it in the variable $handle. Then we call two methods on that variable to add the files, and close the file.

The given statement makes this even simpler by topicalizing that variable, that is, storing it in the topic variable $_. Since $_ can be used as the default object for method calls, we don’t need to explicitly refer to it when calling methods.

.add('myfile1') is equivalent to $_.add('myfile1')

But what happened to the parentheses? Another little shorthand when calling methods — rather than surrounding your arguments to a method with parentheses, you can just precede them with a colon:

.add: 'myfile1';

Nice! I love programming with Raku!

Package a bunch of files by smartmatching

A handy routine to help in your packaging is dir(). It will return a lazy list of IO::Path objects for a directory. By happy coincidence, Libarchive add can take IO::Path just as easily as a filename.

given archive-write('') {
    .add: 'mydir', dir('mydir');
    .close;
}

Note we’ve added the directory itself first, then used dir() to get a list of the files inside mydir, which also get added. If you don’t include the directory itself, it won’t be part of the package. That works fine most of the time, depending on your format and your unpackaging program, but it is good practice to include the directory to make sure it gets created the way you want it to.

dir has an extra feature — it can filter the directory by smartmatching the string with a :test argument. Let's include only jpeg files, allowing them to end in either .jpg or .jpeg:

given archive-write('') {
    .add: 'mydir', dir('mydir', test => /:i '.' jpe?g $/);
    .close;
}

Ecosystem modules like File::Find or Concurrent::File::Find can easily generate even more complicated lists of files for including by recursively adding an entire hierarchy to the package.
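For instance, the recursive case might look like this with File::Find. This is a sketch under a couple of assumptions: the archive name 'mytree.tar' is a placeholder, and find() is the routine exported by the ecosystem module File::Find, which returns IO::Path objects that .add accepts directly.

```raku
use Libarchive::Simple;
use File::Find;

# Recursively package a whole directory tree: add the directory entry
# itself first, then everything found beneath it.
given archive-write('mytree.tar') {
    .add: 'mydir';               # the directory entry itself
    .add: find(dir => 'mydir');  # all files and subdirectories, recursively
    .close;
}
```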

Create your files on the fly while packaging

You aren’t limited to adding existing files. You can use the write() method to generate a file for the package on the fly. You can specify content as a Str, a Blob, or even an IO::Handle or IO::Path to get the content from.

given archive-write('') {
    .write: 'myfile', q:to/EOF/;
        A special file for a special friend!
        EOF
    .close;
}

Here we’re using a special Raku quoting construct called the heredoc.

The q:to/EOF/ says to use the lines following, up until the EOF marker, as the content of a file named ‘myfile’ included in the package file. As a friendly benefit, the amount of indentation of the terminator is automatically removed from each of the quoted lines. How convenient!
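The indentation handling is easy to see in a standalone sketch (the variable name is illustrative):

```raku
# The terminator line's indentation (four spaces here) is stripped
# from every quoted line, so the text ends up flush left.
my $note = q:to/EOF/;
    A special file for a special friend!
    Merry Christmas!
    EOF

print $note;   # both lines print without leading spaces
```

This is why you can keep heredoc bodies nicely indented with the surrounding code without that indentation leaking into the file contents.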

Stream your package instead of making files

Making files with your packages is great and all, but I’ve got a web site I want to return my custom CD images from — why bother with a temporary file? Just stream the output on the fly!

For this example, we’re going to stream the package as an iso9660 file (the image used for CD writers) to STDOUT, but you can stream to other programs too.

given archive-write($*OUT, format => 'iso9660') {
    .add: 'myfile1', 'myfile2', 'mydir', dir('mydir');
    .close;
}

Usually the format can be inferred from the suffix on a specified filename, but since we are streaming there is no filename, so the format must be specified. $*OUT is a special filehandle that is automatically opened for you for writing to STDOUT.

Burn that image to a CD and mount it and you’ll see the specified files. So easy!

Libarchive has so many cool features it would take days to go over them all, but I hope this little intro has whetted your appetite for packaging things up. Raku has fantastic syntax, features and expressivity that make it so easy to interface with libraries like this.

Have fun packaging your own things, be they physical or digital!

Rakudo Weekly: 2019.49 Almost Starring

Published by liztormato on 2019-12-09T20:06:50

Patrick Spek has made the first release candidate of Rakudo Star 2019.11 available for download. If you are working with Raku from Rakudo Star distributions, then this is the moment to test the distribution so that you can be sure that nothing was missed! So please, download and test it! Which of course you can also do if you’re not generally a user of Rakudo Star 🙂

Adventing continued!

Official Raku Advent Calendar:

Sterling Hanenkamp‘s Async & Concurrency Advent:

Weekly Challenge Advent (the ones with Raku content):

Renaming Progress

Rewriting Legacy Code

Jeff Goff has continued his series of blog posts about porting OLE::Storage_Lite with two blog posts: The Sorceror and A New Hope!

The future of SparrowHub

Alexey Melezhik is inviting feedback about the future of SparrowHub (the central repository for Sparrow plugins).

Weekly Challenge

Here are the Raku entries for Challenge #37:

Of course, Challenge #38 is up for your perusal!

Core Developments

Questions about Raku

Meanwhile on Twitter

Meanwhile on Facebook

If you’re interested in developments there, go to the Perl 6 group homepage.

Meanwhile on perl6-users

Comments about Raku

New Raku Modules

Updated Raku Modules

Winding down

A bit of a quiet week, with a lot of people either writing their advent post, or reading the many advent posts. This will most likely stay the same this coming week. Nonetheless, yours truly advises you to check in again next week for more news about the Raku Programming Language and its community!

Rakudo Weekly: 2019.48 Released Advent

Published by liztormato on 2019-12-02T17:46:00

Thanks to the tireless efforts of release managers Aleks-Daniel Jakimenko-Aleksejev and Samantha McVey, this week finally saw a new Rakudo Compiler release again: 2019.11. For packagers, this is the first release that is fully relocatable. Kudos to the 65 contributors to this release! And kudos to Claudio Ramirez to immediately supply packages for many Linux distributions that now also support relocatable builds!

This also brings an end to the era in which Aleks-Daniel Jakimenko-Aleksejev has done 27 (twenty-seven!) Rakudo compiler releases in the past 2+ years. On behalf of the Raku Community, I would like to thank Aleks-Daniel for his tireless efforts, again and again! Fortunately, Alexander Kiryuhin has stepped up to take their place.

Comma Complete

Jonathan Worthington also announced a new release: Comma Complete, the preferred IDE for Raku for many! This is the full-featured paid version of Comma, which also comes in a Comma Community edition that can be downloaded for free (feature comparison).

Adventing started

This year saw the start of two Raku based Advent blog sites, and one mixed Advent blog: the official Raku Advent Calendar (by many different authors), Sterling Hanenkamp‘s Async & Concurrency Advent and the Weekly Challenge Advent. So these are the blog posts so far:

And the blog posts by Sterling Hanenkamp:

And the Weekly Challenge ones that cover Raku:

Renaming Progress

Squashathon Time Again

Last month’s Squashathon somehow fell between the cracks, but this weekend (6/7/8 December) there will be another Squashathon! This time focused on fixing the Raku FAQ, which currently has such gems as: “What’s the difference between Raku, Rakudo and Raku?” (courtesy of the automatic renaming process).

Seeking Feedback on Rakudo.js

Paweł Murias is seeking feedback on his work on the Rakudo Javascript backend, so that all of his work on his long running grant can be considered completed. Please have a look at the Javascript backend and give us your feedback!

A Raku CoC

A Raku Code of Conduct was originally conceived as part of the design specifications, but never implemented as such. Now a Pull Request is underway to upgrade it to current best practices and the Raku era. Comments / Suggestions / Additions are welcome!

Rewriting Legacy Code

Jeff Goff has continued his blog posts about porting OLE::Storage_Lite: Rewriting Legacy Code for Raku II: Electric Boogaloo. Still looking forward to seeing the first module upload to the ecosystem!

Weekly Challenge

Here are the Raku entries for Challenge #36:

Of course, Challenge #37 is up for your perusal!

Core Developments

Questions about Raku

Meanwhile on Twitter

Meanwhile on Facebook

If you’re interested in developments there, go to the Perl 6 group homepage.

Meanwhile on perl6-users

Comments about Raku

New Raku Modules

Updated Raku Modules

Winding down

And there it is, a new Rakudo Compiler release of the Raku Programming Language. Whee! A lot of things happening in the past week, when the release got unblocked. Looks like a lot of things are going to happen this week as well: potentially 15 Raku blog posts, to start with. Hope you’ll all find time for all this goodness. Check in again next week for more reports on those goodies!

Rakudo Weekly: 2019.47 Late Again Or

Published by liztormato on 2019-11-25T19:00:01

If you intended to write a blog post for this year’s Raku Advent Calendar, you’re too late! Well, sort of. You can still add yourself as a fallback should one of the other participants not be able to write a blog post after all!

Or you can do as Sterling Hanenkamp did and run your own Advent Calendar, in their case about Async & Concurrency! The more the merrier!

Renaming Progress

The Raku Beginner Tutorial

Nikos Vaggalis has published a nice blog about Yanzhan Yang‘s Raku Tutorial videos, also mentioning Laurent Rosenfeld‘s Think Perl 6 book. Curiously, there also appears to be a French version of the same blog.


FOSDEM 2020

Andrew Shitov reports that Raku will have a booth at FOSDEM 2020 on 1 and 2 February in Brussels, Belgium. Well, at least on one of these days. Stay tuned!

Rewriting Legacy Code

Jeff Goff has started a series of blog posts about rewriting legacy code to Raku, more specifically porting OLE::Storage_Lite. Looking forward to seeing the first module upload to the ecosystem! (/r/rakulang comments).

Weekly Challenge

Mohammad S Anwar elaborates on his ideas about the Weekly Challenge in 2020. Here are the Raku entries for Challenge #35:

Of course, Challenge #36 is up for your perusal!

Core Developments

Not a lot was going on last week feature-wise, while awaiting the Rakudo compiler release:

Questions about Raku

Meanwhile on Twitter

Meanwhile on Facebook

Alas, the Perl 6 Facebook group has still not been renamed. If you’re interested in developments there, please navigate from the Perl 6 group homepage.

Meanwhile on perl6-users

Comments about Raku

New Raku Modules

Updated Raku Modules

Winding down

Just before closing this issue of the Rakudo Weekly, a renaming snag was found in the Rakudo Compiler Release process that needed some thoughts and discussion. Fortunately, there is a 2019.11 MoarVM release already! So please check in again next week for news about the compiler release, and other news related to Rakudo and the Raku Programming Language, of course!

Jo Christian Oterhals: By the way, you could replace … * with Inf or the unicode infinity symbol ∞ to make it more…

Published by Jo Christian Oterhals on 2019-11-24T19:25:11

By the way, you could replace … * with Inf or the unicode infinity symbol ∞ to make it more readable, i.e.

my @a = 1, 1, * + * … ∞;

or

my @a = 1, 1, * + * … Inf;

Jo Christian Oterhals: As I understand this, * + * … * means the following:

Published by Jo Christian Oterhals on 2019-11-24T10:20:11

As I understand this, * + * … * means the following:

First, * + * sums the two previous elements in the list, and … * tells it to do this an infinite number of times; i.e.

1, 1, (1 + 1)

1, 1, 2, (1 + 2)

1, 1, 2, 3, (2 + 3)

1, 1, 2, 3, 5, (3 + 5)

1, 1, 2, 3, 5, 8, (5 + 8), etc.

The three dots (…) make the sequence lazy, i.e. an element is not generated before you ask for it. This can be good for large lists that are computationally heavy.
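Put together, the whole construct yields a lazy, infinite Fibonacci sequence that only computes elements on demand:

```raku
my @fib = 1, 1, * + * … *;   # lazy, infinite Fibonacci sequence
say @fib[^8];                # (1 1 2 3 5 8 13 21)
say @fib[20];                # 10946, computed only when asked for
```

Because the list is lazy, asking for element 20 forces computation only up to that point; the rest of the infinite sequence is never materialized.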

Rakudo Weekly: 2019.46 Guidance

Published by liztormato on 2019-11-18T19:41:02

Naoum Hankache has taken the famous website, which currently provides the same introduction in 13 different languages, to the Raku era at (/r/rakulang comments). So if your native language is Bulgarian, Chinese, Dutch, French, German, Indonesian, Italian, Japanese, Portuguese, Spanish, Russian or Turkish, you can learn the basics about the Raku Programming Language in your native language!

Should your native language not be there yet, realize that your fellow native language users will be indebted to you when you create a translation for your native language!

Only one slot left for Advent!

If you intend to write a blog post for this year’s Raku Advent Calendar, you have to make yourself known soon! As there is only one slot left as of this writing!

Renaming Progress

Apart from the work on the Raku Guide:

Weekly Challenge

Here are the Raku entries for Challenge #34:

Of course, Challenge #35 is up for your perusal!

Core Developments

Not a lot was going on last week feature-wise, while awaiting a MoarVM release:

Questions about Raku

Meanwhile on Twitter

Meanwhile on Facebook

Alas, the Perl 6 Facebook group has still not been renamed. If you’re interested in developments there, please navigate from the Perl 6 group homepage.

Meanwhile on perl6-users

Comments about Raku

New Raku Modules

Updated Raku Modules

Winding down

A quiet week yet again. And alas, still no 2019.11 Rakudo Compiler release of the Raku Programming Language. Sounds familiar? Yes, it is. But the coming week will probably see a Rakudo compiler release! See you then!

Rakudo Weekly: 2019.45 Red Alert

Published by liztormato on 2019-11-11T17:11:27

In the past months, Fernando Corrêa de Oliveira has been working on Red, an ORM for Raku, with a twist! It pushes the meta-programming features of Raku to new heights. But now Fernando needs feedback from you, about whether or not to provide aliases for often occurring idioms. Your thoughts will be greatly appreciated!

Only 3 slots left for Advent!

If you intend to write a blog post for this year’s Raku Advent Calendar, you have to make yourself known soon! As there are only 3 slots left as of this writing!

Some more video tutorials

Yanzhan Yang just started a series of tutorials about writing OpenGL in Raku! In the past week he published 3 video tutorials:

Renaming Progress

Why ⚛ ?

Andrew Shitov did some research into the how and why of atomicint and the use of ⚛ operators, and whether they should or should not be necessary.

DC Baltimore Perlyglot Workshop 2020

The two-day technical conference in Baltimore, MD (on 18/19 April 2020) is now open for presentation proposals. So time to get your Raku proposals in!

Weekly Challenge

Julien Fiegehenn did a lightning talk about the Weekly Challenge at the Barcelona Perl And Friends 2019!

And here are the Raku entries for Challenge #33:

Of course, Challenge #34 is up for your perusal!

Core Developments

Not a lot was going on last week feature-wise, while extended multi-platform and ecosystem testing was still going on in preparation for the Rakudo Compiler release:

Questions about Raku

Meanwhile on Twitter

Meanwhile on Facebook

Alas, the Perl 6 Facebook group has still not been renamed back, so deeplinks will break soon. So, if you’re interested in developments there, please navigate from the Perl 6 group homepage.

Meanwhile on perl6-users

Comments about Raku

New Raku Modules

Updated Raku Modules

Winding down

A quiet week. And alas, still no 2019.11 Rakudo Compiler release of the Raku Programming Language. The final, really final, final tweaks are being done as this is written. More news about this in the next Rakudo Weekly. See you then!

my Timotimo \this: Introducing: The Heap Snapshot UI

Published by Timo Paulssen on 2019-10-25T23:12:36

Introducing: The Heap Snapshot UI

Hello everyone! In the last report I said that just a little bit of work on the heap snapshot portion of the UI should result in a useful tool.


Here's my report for the first useful pieces of the Heap Snapshot UI!

Last time you already saw the graphs showing how the number of instances of a given type or frame grow and shrink over the course of multiple snapshots, and how new snapshots can be requested from the UI.

The latter now looks a little bit different:

[screenshot]

Each snapshot now has a little button of its own; the snapshots sit together on one line instead of one per line, and the progress bar has been replaced with a percentage and a little "spinner".

There are multiple ways to get started navigating the heap snapshot. Everything is reachable from the "Root" object (this is the norm for reachability-based garbage collection schemes). You can just click through from there and see what you can find.

Another way is to look at the Type & Frame Lists, which show every type or frame along with the number of instances that exist in the heap snapshot, and the total size taken up by those objects.

Type & Frame Lists


Clicking on a type, or the name or filename of a frame leads you to a list of all objects of that type, all frames with the given name, or all frames from the given file. They are grouped by size, and each object shows up as a little button with the ID:

[screenshot]

Clicking any of these buttons leads you to the Explorer.


The Explorer

Here's a screenshot of the explorer to give you an idea of how the parts explained next fit together:

[screenshot]

The explorer is split into two identical panels, which allows you to compare two objects, or to explore in multiple directions from one given object.

There's an "Arrow to the Right" button on the left pane and an "Arrow to the Left" button on the right pane. These buttons make the other pane show the same object that the one pane currently shows.

On the left of each pane there's a "Path" display. Clicking the "Path" button in the explorer will calculate the shortest path to reach the object from the root. This is useful when you've got an object that you would expect to have already been deleted by the garbage collector, but for some reason is still around. The path can give the critical hint to figure out why it's still around. Maybe one phase of the program has ended, but something is still holding on to a cache that was put in as an optimization, and that still has your object in it? That cache in question would be on the path for your object.

The other half of each panel shows information about the object: Displayed at the very top is whether it is an object, a type object, an STable, or a frame.

Below that there is an input field where you can enter any ID belonging to a Collectable (the general term encompassing objects, type objects, STables, and frames) to have a look.

The "Kind" field needs to have the number values replaced with human-readable text, but it's not the most interesting thing anyway.

The "Size" of the Collectable is split into two parts. One is the fixed size that every instance of the given type has. The other is any extra data an instance of this type may have attached to it, that's not a Collectable itself. This would be the case for arrays and hashes, as well as buffers and many "internal" objects.

Finally, the "References" field shows how many Collectables are referred to by the Collectable in question (outgoing references) and how many Collectables reference this object in question.

Below that there are two buttons, Path and Network. The former was explained further above, and the latter will get its own little section in this blog post.

Finally, the bottom of the panel is dedicated to a list of all references - outgoing or incoming - grouped by what the reference means, and what type it references.

[screenshot]

In this example you see that the frame of the function display from elementary2d.p6 on line 87 references a couple of variables ($_, $tv, &inv), the frame that called this frame (step), an outer frame (MAIN), and a code object. The right pane shows the incoming references. For incoming references, the name of the reference isn't available (yet), but you can see that 7 different objects are holding a reference to this frame.

Network View

The newest part of the heap snapshot UI is the Network View. It allows the user to get a "bird's eye" view of many objects and their relations to each other.

Here's a screenshot of the network view in action:

[screenshot]

The network view is split into two panes. The pane on the left lists all types present in the network currently. It allows you to give every type a different symbol, a different color, or optionally make it invisible. In addition, it shows how many of each type are currently in the network display.

The right pane shows the objects, sorted by how far they are from the root (object 0, the one in Layer 0, with the frog icon).

Each object has one three-piece button. On the left of the button is the icon representing the type, in the middle is the object ID for this particular object, and on the right is an icon for the "relation" this object has to the "selected" object:

This view was generated for object 46011 (in layer 4, with a hamburger as the icon). This object gets the little "map marker pin" icon to show that it's the "center" of the network. In layers for distances 3, 2, and 1 there is one object each with a little icon showing two map marker pins connected with a squiggly line. This means that the object is part of the shortest path to the root. The third kind of icon is an arrow pointing from the left into a square that's on the right. Those are objects that refer to the selected object.

There is also an icon that's the same but the arrow goes outwards from the square instead of inwards. Those are objects that are referenced by the selected object. However, there is currently no button to have every object referenced by the selected object put into the network view. This is one of the next steps I'll be working on.

Customizing the colors and visibility of different types can give you a view like this:

[screenshot]

And here's a view with more objects in it:

[screenshot]

Interesting observations from this image:

Next Steps

You have no doubt noticed that the buttons for collectables are very different between the network view and the type/frame lists and the explorer. The reason for that is that I only just started with the network view and wanted to display more info for each collectable (namely the icons to the left and right) and wanted them to look nicer. In the explorer there are sometimes thousands of objects in the reference list, and having big buttons like in the network view could be difficult to work with. There'll probably have to be a solution for that, or maybe it'll just work out fine in real-world use cases.

On the other hand, I want the colors and icons for types to be available everywhere, so that it's easier to spot common patterns across different views and to mark things you're interested in so they stand out in lists of many objects. I was also thinking of a "bookmark this object" feature for similar purposes.

Before most of that, the network viewer will have to become "navigable", i.e. clicking on an object should put it in the center, grab the path to the root, grab incoming references, etc.

There also need to be ways to handle references you're not (or no longer) interested in, especially when you come across an object that has thousands of them.

But until then, all of this should already be very useful!

Here's the section about the heap snapshot profiler from the original grant proposal:

Looking at the list, it seems like the majority of intended features are already available or will be very soon!

Easier Installation

Until now the user had to download nodejs and npm along with a whole load of javascript libraries in order to compile and bundle the javascript code that powers the frontend of moarperf.

Fortunately, it was relatively easy to get travis-ci to do the work automatically and upload a package with the finished javascript code and the backend code to github.

You can now visit the releases page on github to grab a tarball with all the files you need! Just install all backend dependencies with zef install --deps-only . and run service.p6!

And with that I'm already done for this report!

It looks like the heap snapshot portion of the grant is quite a bit smaller than the profiler part, although a lot of work happened in moarvm rather than the UI. I'm glad to see rapid progress on this.

I hope you enjoyed this quick look at the newest pieces of moarperf!
  - Timo

Weekly changes in and around Perl 6: 2019.42 Question

Published by liztormato on 2019-10-21T14:14:51

The Perl 6 Weekly has been renamed to the Rakudo Weekly and can be found at:

Thank you for visiting the Perl 6 Weekly the past years. It was a blast!

Please adjust your RSS readers and email notifications accordingly. Thank you!

Hope to see you there!

Weekly changes in and around Perl 6: 2019.41 New Wineskins

Published by liztormato on 2019-10-15T16:46:05

Larry Wall emerged, almost like a deus ex machina, to give his approval of changing the name of the Perl 6 Programming Language to “Raku”. This stirred up quite some reactions on the interwebs:

Also see quite some reactions on Twitter below.

London Perl Workshop

Next weekend, on Saturday 19 October, it’s time for the London Perl Workshop again. The following presentations with Perl 6 content appear to have been planned:

An excellent opportunity to learn more about Perl 6 / Raku. And possibly the last time for a pre-Brexit visit to the UK!

Even More Video Tutorials

Yanzhan Yang, a serial tech video uploader, yet again has posted more Perl 6 introductory videos on YouTube:

Getting more amazing every week!

Andrew Shitov

It was a stressful week for Andrew Shitov: at first he was worried about the future of his Perl 6 compiler book. Then, after Larry Wall showed his approval of the renaming, he suggested having a RakuCon in May 2020 on Cyprus at the venue previously intended for the 2020 Perl Conference. Quickly followed by a blog post explaining how he sees the future of Raku, including building a new, faster Raku compiler (/r/perl6 comments). Followed by the publication of the first Raku book: Using Raku and making that available for free download (/r/perl6 comments). And to top it off, produced a little tutorial about the difference between is rw and is raw. Yours truly can only hope that future weeks will be less stressful.

Perl Weekly Challenge #29

Blog posts with Perl 6 solutions for Challenge #29:

Challenge #30 is up for your perusal.

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on Facebook

Thanks to the quickly irreparable actions of a certain individual outside of the Perl 6 community, the Perl 6 Facebook group has changed all of its URLs and/or is only accessible to people with Facebook logins. And since it is the intention to change back to the original URLs (which will most likely take a month thanks to Facebook policies), it doesn’t seem to make sense to put deep links here now.

So, if you’re interested in happenings on Facebook, check out the Perl 6 Group, and navigate from there.

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New modules:

Updated modules:

Winding Down

The past year has been very stressful for yours truly. And the past week even more so. Some stress will remain in the near future, but it looks like stress levels will be allowed to go down in the further future, with “Perl” and “Raku” each going their own way.

This is the last Perl 6 Weekly that yours truly will write. Next week I will just announce the location of the new Rakudo Weekly blog here.

Why “Rakudo” and not “Raku”, you might ask? Well, originally the “Perl 6 Weekly” was about all implementations of Perl 6. But for the past 4 years, it has effectively been only about the “Rakudo” implementation. Now that “Perl 6” is being renamed to “Raku”, it seems like a good opportunity to not squat on the language name in a blog that is effectively dedicated to a single implementation of the Raku Programming Language.

So see you next week for directions to the new blog!

Weekly changes in and around Perl 6: 2019.40 Quick Syntaxing

Published by liztormato on 2019-10-07T14:23:14

JJ Merelo is the proud writer of the latest Perl 6 book: Perl 6 Quick Syntax Reference: A Pocket Guide to the Language, the Core Modules, and the Community. A book packed with useful information and a must-have for any developer new to Perl 6. Highly recommended! (Safari).

Sub as method

Sterling Hanenkamp shows how to use a sub as a method (/r/perl6 comments).

Perspective and FALLBACK

Greg Donald has published two blog posts in the past week: a personal one about his perspective as an outsider, and a technical one about adding an attribute to a class at runtime.

Even More Video Tutorials

Yanzhan Yang, a serial tech video uploader, yet again has posted more Perl 6 introductory videos on YouTube:

Getting more amazing every week!

Perl Weekly Challenge #28

Blog posts with Perl 6 solutions for Challenge #28:

Challenge #29 is up for your perusal.

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on Facebook

Perl 6 in comments

Perl 6 Modules

New modules:

Updated modules:

Winding Down

A quiet week, with one more week to go on voting on the rename of Perl 6. See you all next week with more news about the Perl 6 Programming Language!

Weekly changes in and around Perl 6: 2019.39 With A Lump

Published by liztormato on 2019-09-30T18:56:54

Jonathan Worthington explains why he got a lump in his throat when he approved changing the name of “Perl 6” to “Raku”. Feelings that yours truly (like many others) only can share. (/r/perl, /r/perl6 comments). Andrew Shitov also shared his feelings about renaming the Perl 6 Programming Language (/r/perl6, Facebook comments).

Video Tutorials

Yanzhan Yang, a serial tech video uploader, has posted some more Perl 6 introductory videos on YouTube:


Faster Continuous Integration

Tony O’Dell has created a walk through on how to do Continuous Integration with Circle CI and Travis CI, using a Docker image.

The power of base

Shred_Alert describes the magic of base.

Perl Weekly Challenge #27

Blog posts with Perl 6 solutions for Challenge #27:

Simon Proctor is the champion of week 27! And as usual, Challenge #28 is up for your perusal.

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on Facebook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New modules:

Updated modules:

Winding Down

It appears there will be no Rakudo 2019.09 compiler release. Which makes sense since it’s almost October. Check out the Perl 6 Weekly next week for more news about this and many other things!

Weekly changes in and around Perl 6: 2019.38 For Else Itch

Published by liztormato on 2019-09-23T15:27:14

Damian Conway had an itch, and he scratched it in “Itch.scratch()“. An extensive treatise on how to extend the Perl 6 Programming Language, giving the for loop an else block to be executed only if no iterations were done in the for loop. In less than 25 lines! (Reddit comments).

Video Tutorials

Yanzhan Yang has posted a number of Perl 6 introductory videos on YouTube, maybe the first of many to come:

Yanzhan Yang appears to be a serial tech video uploader (Reddit comments).

Atomic Units

Steve Roe explains his thoughts on seamlessly supporting atomic units in the Physics::Measure Perl 6 module.

Perl 6 at LPW

These presentations about Perl 6 are currently planned to be given at the next London Perl Workshop:

It’s not too late to submit your presentation!

Perl Foundation News

The Perl Foundation is nominating Pete Krawczyk as Treasurer. Many thanks to Dan Wright for having filled this position for so many years.

And there is a grant proposal for curating the Perl 6 documentation, a continuation of earlier work by JJ Merelo.

Please leave your comments, if you have any of course!

What’s in a name?

Sven Gregori investigated several open source projects with naming issues, Perl 6 just being one of them (/r/perl, /r/perl6 comments).

Perl Weekly Challenge #26

Blog posts with Perl 6 solutions for Challenge #26:

Yet Ebreo is the champion of week 25! And as usual, Challenge #27 is up for your perusal.

Core Developments

Questions about Perl 6

Meanwhile on Twitter

Meanwhile on Facebook

Meanwhile on perl6-users

Perl 6 in comments

Perl 6 Modules

New modules (some of them were missed previously because of not having been uploaded to CPAN):

Updated modules:

Winding Down

A quiet week yet again, while work has started on finalizing the next Rakudo compiler release. Hope to be able to report on that next week. Until then, program safely and have fun!

p6steve: Atomic Units?

Published by p6steve on 2019-09-17T20:49:39

One of the most exciting parts of blogging about and hacking on perl6* is that there’s a community out there and there’s (always) more than one way to do it!

For Physics::Measure I have been working on a design for a ‘nano-slang’ that can provide a shortcut for the usual new Class declaration… quite long winded for my target audience of physics undergraduates and high school students.

#Instances the usual way

my Unit $u .=new(name => 'm', unitsof => 'Distance'); #Unit m

my Distance $a .=new(value => 10, units => $u);           #Distance 10 m

So, duly inspired by Rakudo perl6 going atomic ⚛ applying unicode to some threading constructs, I started with the notion of the ‘libra’ operator ♎ as shorthand to declare and load Measure instances.

#Introducing the libra operator ♎ as shorthand to declare and load

my $b ♎ '50 m';   #Distance 50 m

$b ♎ '3 yards';     #Distance 3 yards

As you can see, the gap created between ♎ and ; is a slang zone that can consume strings and numeric literals. Here’s something a bit more elaborate:

#Normalization with the .norm method

my $p ♎ '27 kg m^2 / s^3';   #Power 27 kg m^2 / s^3

$p .= norm;                              #Power 27 W

A few design ideas drew me in this direction:

# Resistance

['Ω', 'Ohm:s',],       'kg m^2 / A^2 s^3',

['kilohm:s',],          'kilo Ohm',

['megohm:s',],       'mega Ohm',


Others have proposed a much more direct approach to generate and combine Measure objects – by the use of a simple postfix syntax – thank you!

Something like:

say 500g; # --> => 500, prefix => "")

say 2kg;  # --> => 2000, prefix => "kg")

Watch this space! Or even better zef Physics::Measure and give it a try…


* soon to be Rakudo?!

my Timotimo \this: Progressing with progress.

Published by Timo Paulssen on 2019-09-12T19:50:18

Progressing with progress.

It has been a while since the last progress report, hasn't it?

Over the last few months I've been focusing on the MoarVM Heap Snapshot Profiler. The new format that I explained in the last post, "Intermediate Progress Report: Heap Snapshots", is available in the master branch of MoarVM, and it has learned a few new tricks, too.

The first thing I usually did when opening a Heap Snapshot in the heapanalyzer (the older command-line based one) was to select a Snapshot, ask for the summary, and then for the top objects by size, top objects by count, top frames by size, and/or top frames by count to see if anything immediately catches my eye. In order to make more sense of the results, I would repeat those commands for one or more other Snapshots.

Snapshot  Heap Size          Objects  Type Objects  STables  Frames  References  
========  =================  =======  ============  =======  ======  ==========  
0         46,229,818 bytes   331,212  686           687      1,285   1,146,426   
25        63,471,658 bytes   475,587  995           996      2,832   1,889,612   
50        82,407,275 bytes   625,958  1,320         1,321    6,176   2,741,066   
75        97,860,712 bytes   754,075  1,415         1,416    6,967   3,436,141   
100       113,398,840 bytes  883,405  1,507         1,508    7,837   4,187,184   

Snapshot  Heap Size          Objects    Type Objects  STables  Frames  References  
========  =================  =========  ============  =======  ======  ==========  
125       130,799,241 bytes  1,028,928  1,631         1,632    9,254   5,036,284   
150       145,781,617 bytes  1,155,887  1,684         1,685    9,774   5,809,084   
175       162,018,588 bytes  1,293,439  1,791         1,792    10,887  6,602,449 

Realizing that the most common use case should be simple to achieve, I first implemented a command summary all and later a command summary every 10 to get the heapanalyzer to give the summaries of multiple Snapshots at once, and to be able to get summaries (relatively) quickly even if there's multiple hundreds of snapshots in one file.

Sadly, this still requires the parser to go through the entire file to do the counting and adding up. That's obviously not optimal. Even though this is an Embarrassingly Parallel task that can use every CPU core in the machine, it's still a whole lot of work just for the summary.
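In Raku terms, that parallel counting pass might look roughly like the sketch below. This is purely illustrative: &parse-summary and @snapshot-chunks are hypothetical stand-ins for the heapanalyzer's real internals, not its actual API.

# Hypothetical sketch: fan per-chunk counting out over all CPU cores with .race.
sub parse-summary($chunk) { ... }        # stub: would count objects, frames, references

my @summaries = @snapshot-chunks
    .race(:degree($*KERNEL.cpu-cores))   # one worker per core
    .map(&parse-summary);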

For this reason I decided to shift the responsibility for this task to MoarVM itself, to be done while the snapshot is taken. In order to record everything that goes into the Snapshot, MoarVM already differentiates between Object, Type Object, STable, and Frame, and it stores all references anyway. I figured it shouldn't have a performance impact to just add up the numbers and make them available in the file.

The result is that the summary table as shown further above is available only milliseconds after loading the heap snapshot file, rather than after an explicit request and sometimes a lengthy wait period.

The next step was to see if top objects by size and friends could be made faster in a similar way.

I decided that adding an optional "statistics collection" feature inside of MoarVM's heap snapshot profiler would be worthwhile. If it turns out that the performance impact of summing up sizes and counts on a per-type and per-frame basis makes capturing a snapshot too slow, it could be turned off.

Frontend work

> snapshot 50
Loading that snapshot. Carry on...
> top frames by size
Wait a moment, while I finish loading the snapshot...

Name                                  Total Bytes    
====================================  =============  
finish_code_object (World.nqp:2532)   201,960 bytes  
moarop_mapper (QAST.nqp:1764)         136,512 bytes  
!protoregex (QRegex.nqp:1625)         71,760 bytes   
new_type (Metamodel.nqp:1345)         40,704 bytes   
statement (Perl6-Grammar.nqp:951)     35,640 bytes   
termish (Perl6-Grammar.nqp:3641)      34,720 bytes   
<anon> (Perl6-BOOTSTRAP.c.nqp:1382)   29,960 bytes   
EXPR (Perl6-Grammar.nqp:3677)         27,200 bytes   
<mainline> (Perl6-BOOTSTRAP.c.nqp:1)  26,496 bytes   
<mainline> (NQPCORE.setting:1)        25,896 bytes   
EXPR (NQPHLL.nqp:1186)                25,760 bytes   
<anon> (<null>:1)                     25,272 bytes   
declarator (Perl6-Grammar.nqp:2189)   23,520 bytes   
<anon> (<null>:1)                     22,464 bytes   
<anon> (<null>:1)                     22,464 bytes   

Showing the top objects or frames for a single snapshot is fairly straightforward in the commandline based UI, but how would you display how a type or frame develops its value across many snapshots?

Instead of figuring out the best way to display this data in the commandline, I switched focus to the Moarperf Web Frontend. The most obvious way to display data like this is a Line Graph, I believe. So that's what we have now!


And of course you also get to see the data from each snapshot's Summary in graph format:


And now for the reason behind this blog post's Title.

Progress Notifications

Using Jonathan's module Concurrent::Progress (with a slight modification) I sprinkled the code to parse a snapshot with matching calls of .increment-target and .increment. The resulting progress reports (throttled to at most one per second) are then forwarded to the browser via the WebSocket connection that already delivers many other bits of data.

The result can be seen in this tiny screencast:


The recording is rather choppy because the heapanalyzer code was using every last drop of performance out of my CPU while it was trying to capture my screen.

There's obviously a lot still missing from the heap snapshot analyzer frontend GUI, but I feel like this is a good start, and even provides useful features already. The graphs for the summary data are nicer to read than the table in the commandline UI, and it's only in this UI that you can get a graphical representation of the "highscore" lists.

I think a lot of the remaining features will already be useful after just the initial pieces are in place, so a little work should go a long way.

Bits and Bobs

I didn't spend the whole time between the last progress report and now to work directly on the features shown here. Apart from Life Intervening™, I worked on fixing many frustrating bugs related to both of the profilers in MoarVM. I added a small subsystem I call VMEvents that allows user code to be notified when GC runs happen and other interesting bits from inside MoarVM itself. And of course I've been assisting other developers by answering questions and looking over their contributions. And of course there's the occasional video-game-development related experiment, for example with the GTK Live Coding Tool.

Finally, here's a nice little screencap of that same tool displaying a hilbert curve:


That's already everything I have for this time. A lot has (had to) happen behind the scenes to get to this point, but now there was finally something to look at (and touch, if you grab the source code and go through the needlessly complicated build process yourself).

Thank you for reading and I hope to see you in the next one!
  - Timo

Jo Christian Oterhals: You’re right.

Published by Jo Christian Oterhals on 2019-08-25T18:51:13

You’re right. I’ll let the article stand as it is and reflect my ignorance as it was when I wrote it :-)

Jo Christian Oterhals: Perl 6 small stuff #21: it’s a date! …or: learn from an overly complex solution to a simple task

Published by Jo Christian Oterhals on 2019-07-31T13:23:17

Perl 6 small stuff #21: it’s a date! …or: learn from an overly complex solution to a simple task

This week’s Perl Weekly Challenge (#19) has two tasks. The first is to find all months with five weekends in the years from 1900 through 2019. The second is to program an implementation of word wrap using the greedy algorithm.

Both are pretty straight-forward tasks, and the solutions to them can (and should) be as well. This time, however, I’m also going to do the opposite and incrementally turn the easy solution into an unnecessarily complex one. Because in this particular case we can learn more by doing things the unnecessarily hard way. So this post will take a look at Dates and date manipulation in Perl 6, using PWC #19 task 1 as an example:

Write a script to display months from the year 1900 to 2019 where you find 5 weekends i.e. 5 Friday, 5 Saturday and 5 Sunday.

Let’s start by finding five-weekend months the easy way:

#!/usr/bin/env perl6
say join "\n", grep *.day-of-week == 5, map { Date.new(|$_, 1) }, do 1900..2019 X 1,3,5,7,8,10,12;

The algorithm for figuring this out is simple. Given the prerequisite that there must be five occurrences of not only Saturday and Sunday but also Friday, you A) *must* have 31 days to cram five weekends into. And when you know that you’ll also see that B) the last day of the month MUST be a Sunday and C) the first day of the month MUST be a Friday (you don’t have to check for both; if A is true and B is true, C is automatically true too).
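The reasoning in B and C is easy to sanity-check on a month we know has five weekends; March 2019, for instance, has 31 days and begins on a Friday (plain core Raku, no assumptions):

my $first = Date.new(2019, 3, 1);
say $first.day-of-week;    # 5, i.e. Friday
say $first.days-in-month;  # 31, so March 2019 holds five full Friday-to-Sunday weekends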

The code above implements B and employs a few tricks. You read it from right to left (unless you write it from left to right, like this… say do 1900..2019 X 1,3,5,7,8,10,12 ==> map { Date.new(|$_, 1) } ==> grep *.day-of-week == 5 ==> join "\n"; )

Using the X operator I create a cross product of all the years in the range 1900–2019 and the months 1, 3, 5, 7, 8, 10, 12 (31-day months). In return I get a sequence containing all year-month pairs of the period.
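A smaller cross product shows the shape of that sequence; each element is a year/month pair:

.say for 2018..2019 X 1, 3;
# (2018 1)
# (2018 3)
# (2019 1)
# (2019 3)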

The map function iterates through the Seq. There it instantiates a Date object. A little song and dance is necessary: as Date.new takes three unnamed integer parameters, year, month and day, I have to do something to what I have — a Pair with year and month. I therefore use the | operator to “explode” the pair into two integer parameters for year and month.

You can always use this for calling a subroutine with fixed parameters, using an array of parameter values rather than having separate variables for each parameter. The code below exemplifies the usage:

my @list = 1, 2, 3;
sub explode-parameters($one, $two, $three) { 
    …do something…
}
# traditional call 
explode-parameters(@list[0], @list[1], @list[2]);
# …or using | 
explode-parameters(|@list);
Back to the business at hand — the .grep picks out the months where the 1st is a Friday, and those are our 5-weekend months. So the output of the one-liner above looks something like this:


This is a solution as good as any, and if a solution was all we wanted, we could have stopped here. But using this task as an example I want to explore ways to utilise the Date class. Example: the one-liner above does the job, but strictly speaking it doesn’t output the months but the first day of those months. Correcting this is easy, because the Date class supports something called formatters, which can use sprintf syntax. To do this you utilise the named parameter “formatter” when instantiating the object.

say join "\n", grep *.day-of-week == 5, map { Date.new(|$_, 1, formatter => { sprintf "%04d/%02d", .year, .month }) }, do 1900..2019 X 1,3,5,7,8,10,12;

Every time a routine pulls a stringified version of the date, the formatter object is invoked. In our case the output has been changed to…


Formatters are powerful. Look into them.
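A standalone example: the same date prints differently depending on the formatter it was constructed with. put stringifies the Date, which invokes the formatter:

my $christmas = Date.new(2019, 12, 25,
    formatter => { sprintf "%02d.%02d.%04d", .day, .month, .year });
put $christmas;    # 25.12.2019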

Now to the overly complex solution. This is the unthinking programmer’s solution, as we don’t suppose anything. The program isn’t told that 5 weekend months only can occur on 31 day months. It doesn’t know that the 1st of such months must be a Friday. All it knows is that if the last day of the month is not Sunday, it figures out the date of the last Sunday (this is not very relevant when counting three-day weekends, but could be if you want to find Saturday+Sunday weekends, or only Sundays).

#!/usr/bin/env perl6
my $format-it = sub ($self) {
    sprintf "%04d month %02d", .year, .month given $self;
};

sub MAIN(Int :$from-year = 1900, Int :$to-year where * > $from-year = 2019, Int :$weekend-length where * ~~ 1..3 = 3) {
    my $date-loop = Date.new($from-year, 1, 1, formatter => $format-it);
    while ($date-loop.year <= $to-year) {
        my $date = $date-loop.later(day => $date-loop.days-in-month);
        $date = $date.truncated-to('week').pred if $date.day-of-week != 7;
        my @weekend = do for 0..^$weekend-length -> $w {
            $date.earlier(day => $w).weekday-of-month;
        }
        say $date-loop if ([+] @weekend) / @weekend == 5;
        $date-loop = $date-loop.later(:1month);
    }
}
This code can solve the task not only for three-day weekends, but also for weekends consisting of Saturday + Sunday, as well as only Sundays. You control that with the command line parameter weekend-length=[1..3].

This code finds the last Sunday of each month and counts whether it has occurred five times that month. It does the same for Saturday (if weekend-length=2) and Friday (if weekend-length=3). Like this:

my @weekend = do for 0..^$weekend-length -> $w { 
    $date.earlier(day => $w).weekday-of-month;
}

The code then calculates the average weekday-of-month for these three days like this:

say $date-loop if ([+] @weekend) / @weekend == 5;

This line uses the reduction operator [+] on the @weekend list to find the sum of all elements. That sum is divided by the number of elements. If the result is 5, then you have a five-weekend month.
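In isolation, the average only reaches 5 when every counted weekday occurred five times that month:

my @weekend = 5, 5, 5;               # Friday, Saturday and Sunday each seen five times
say ([+] @weekend) / @weekend == 5;  # True: a five-weekend month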

As for fun stuff to do with the Date object:

.later(day|month|year => Int) — adds the given number of time units to the current date. There’s also an earlier method for subtracting.

.days-in-month — tells you how many days there are in the current month of the Date object. The value may be 31, 30, 29 (February, leap year) or 28 (February).

.truncated-to(week|month|day|year) — rolls the date back to the first day of the week, month, day or year.

.weekday-of-month — figures out what day of week the current date is and calculates how many of that day there has been so far in that month.
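A quick tour of those methods on a single date (2019-12-15 is a Sunday):

my $d = Date.new(2019, 12, 15);
say $d.days-in-month;                # 31
say $d.later(day => 10);             # 2019-12-25
say $d.truncated-to('week');         # 2019-12-09, the Monday of that week
say $d.weekday-of-month;             # 3, i.e. the third Sunday of December 2019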

Apart from this you’ll see that I added the formatter in a different way this time. This is probably cleaner looking and easier to maintain.

In the end this post maybe isn’t about dates and date manipulation at all, but rather a call for all of us to use the documentation even more. I often think that Perl 6 should have a function for x, y or z — .weekday-of-month is one such example — and the documentation tells me that it actually does!

It’s very easy to pick up Perl 6 and program it as you would have programmed Perl 5 or other languages you know well. But the documentation has lots of info on things you didn’t have before, and that will make programming easier and more fun once you’ve learned about them.

I guess you don’t need an excuse to delve into the docs, but if you do, the Perl Weekly Challenge is an excellent excuse for spending time in the docs!

Jo Christian Oterhals: You have several options besides do. You could use parentheses instead, like this:

Published by Jo Christian Oterhals on 2019-08-01T07:34:44

You have several options besides do. You could use parentheses instead, like this:

say join "\n", grep *.day-of-week == 5, map { |$_, 1 }, (1900..2019 X 1,3,5,7,8,10,12);

In this case I just thought the code looked better without parentheses, so I chose to use do instead. The docs have a couple of sentences about this option here.

BTW, I chose the colon variant form also because I wanted to avoid parentheses. This freedom to choose is the essence of Perl’s credo “There’s more than one way to do it”, I guess.

Rant: This freedom of choice is the best thing about Perl 6, but it has a downside too. Code can be harder to understand if you’re not familiar with the variants another programmer has chosen. This is part of the reason the Perls have a reputation of being write-only languages.

It won’t happen, but personally I’d like to see someone analyse real-world Perl 6 code (public repositories on GitHub, for instance) and figure out which forms are used most. The analysis could then be used to clean house — figuring out which forms to keep and which should be removed. Perl 6 would become a smaller — and arguably easier — language while staying just as powerful as it is today.

p6steve: Mind the gap

Published by p6steve on 2019-07-28T20:38:23

Observant readers looking at the dateline on the left will have noticed a gap of nearly two years between my last blog and today. As a fascinated programming hobbyist, I have only limited time in the day to devote to coding – and a startup venture put the brakes on anything but the day job for a while.

In the meantime, I have had to choose between coding and blogging – and while the ideas have been flowing ahead, I am quite keen on the idea that ‘actions speak louder than words’.

The actions have covered:

So, today, I am delighted to announce the alpha release of Physics::Measure (try zef install[email protected] for now – hopefully zef install Physics::Measure within a day or two).

Hopefully the synopsis at makes sense. I encourage all feedback with an open mind…

Now back to the blogging!


gfldex: Tracing whats missing

Published by gfldex on 2019-07-07T21:35:09

I have a logfile of the following form that I would like to parse.

[ 2016.03.09 20:40:28 ] (MessageType) Some message text that depends on the <MessageType>

Since the form of the text depends on the message type I need a rule to identify the message type and a rule to parse the message body itself. To aid my struggle through the Grammar in question I use Grammar::Tracer from jnthn’s Grammar::Debugger module. It’s a fine module that will tell you up to which point a match was OK and at which point the Grammar gave up parsing. In the case of a successful match it shows part of the substring that was successfully parsed. If parsing a rule or token fails it will tell you, but won’t show the offending string. The whole purpose of Grammar wrangling is to identify the bits that won’t match and change the Grammar until they go away. Not showing the offending string is not overly helpful.

But fear not as Grammars are classes and as such can have methods. Let’s define one and add it to a chain of options.

method parse-fail {
    # self is a subclass of Grammar
    say self.postmatch.substr(0, 100);
    exit 0;
}

rule body-line { '[' <timestamp> ']' [ <body-notify> | <body-question> | <body-info> | <body-warning> || <parse-fail> ] }

So when none of the known message types match, the Grammar stops and shows the string that still needs to be handled. With that I could parse all 8768 files until I got them all covered. Also, this is much faster than running with Grammar::Tracer.
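A self-contained toy version of the pattern, with made-up message types, shows how the || sequential alternation hands anything unrecognized to parse-fail (the grammar and input here are invented for illustration):

grammar Log {
    method parse-fail {
        # print the start of the text no alternative could match, then stop
        say 'Cannot handle: ' ~ self.postmatch.substr(0, 100);
        exit 0;
    }
    token TOP        { <body-line>+ }
    rule  body-line  { '[' <timestamp> ']' [ <body-info> || <parse-fail> ] }
    token timestamp  { <-[ \] ]>+ }
    token body-info  { '(Info)' \N* }
}

Log.parse("[ 2016.03.09 20:40:28 ] (Info) all good\n[ 2016.03.09 20:40:29 ] (Oops) what is this\n");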

It seems to be very useful to have folk implement a language they would like to use to implement that language.

gfldex: Whatever whenever does

Published by gfldex on 2019-05-31T10:24:59

Jnthn answered the question why $*IN.lines blocks in a react block. What isn’t explained is what whenever actually does before it starts blocking.

react {
    whenever $*IN.lines { .say }
}

Looking at the syntax of a whenever block, we see that whenever takes a variable immediately followed by a block. The only place where a structure like that can be defined is Grammar.nqp.

rule statement_control:sym<whenever> {
    <sym><.kok> {}
    [
    || <?{
            nqp::getcomp('perl6').language_version eq '6.c'
          || $*WHENEVER_COUNT >= 0
       }>
    || <.typed_panic('X::Comp::WheneverOutOfScope')>
    ]
    { $*WHENEVER_COUNT++ }
    <xblock($PBLOCK_REQUIRED_TOPIC)>
}

Here the grammar just checks a few things without actually generating any code. So we head to Actions.nqp.

method statement_control:sym<whenever>($/) {
    my $xblock := $<xblock>.ast;
    make QAST::Op.new(
        :op<call>, :name<&WHENEVER>, :node($/),
        $xblock[0], block_closure($xblock[1])
    );
}
The whenever block is converted to a call to sub WHENEVER which we find in Supply.pm6.

sub WHENEVER(Supply() $supply, &block) {

There we go. A whenever block takes its first argument of any type and calls .Supply on it, as long as Any is a parent of that type. In the case of $*IN that type will typically be whatever IO::Handle.lines returns.

Seq.new(self!LINES-ITERATOR($close))

To turn a Seq into a Supply Any.Supply calls self.list.Supply. Nowhere in this fairly long chain of method lookups (this can’t be fast) are there any threads to be found. If we want to fix this we need to sneak a Channel into $*IN.lines which does exactly that.

$*IN.^can('lines')[1].wrap(my method {
    my $channel = Channel.new;
    start {
        for callsame() {
            last if $channel.closed;
            $channel.send($_);
        }
        LEAVE $channel.close unless $channel.closed;
    }
    $channel
});

Or if we want to be explicit:

use Concurrent::Channelify;

react {
    whenever signal(SIGINT) {
        say "Got signal";
    }
    whenever $*IN.lines⇒ {
        say "got line";
    }
}

We already use ⚛ to indicate atomic operations. Maybe using prefix:<∥> to indicate concurrency makes sense. Anyway, we got lucky once again that Rakudo is implemented (mostly) in Perl 6, so we can find out where we need to poke it whenever we want to change it.

my Timotimo \this: A Close Look At Controlling The MoarVM Profiler

Published by Timo Paulssen on 2019-05-22T14:41:13

A Close Look At Controlling The MoarVM Profiler

This is slightly tangential to the Rakudo Perl 6 Performance Analysis Tooling grant from The Perl Foundation, but it does interact closely with it, so this is more or less a progress report.

The other day, Jonathan Worthington of Edument approached me with a little paid job: profiling very big codebases can be tedious, especially if you're interested in only one specific fraction of the program. That's why I got the assignment to make the profiler "configurable". On top of that, the Comma IDE will get native access to this functionality.

The actual request was just to allow specifying whether individual routines should be included in the profile via a configuration file. That would have been possible with just a simple text file that contains one filename/line number/routine name per line. However, I have been wanting something in MoarVM that allows much more involved configuration for many different parts of the VM, not just the profiler.

Obligatory cat photo.

That's why I quickly developed a small and simple "domain-specific language" for this and similar purposes.

The language had a few requirements:

There's also some things that aren't necessary:

While thinking about what exactly I should build – before I eventually settled on building a "programming language" for this task – I bounced back and forth between the simplest thing that could possibly work (for example, a text file with a list of file/line/name entries) and the most powerful thing that I can implement in a sensible timeframe (for example, allowing a full NQP script). A very important realization was that as long as I require the first line to identify what "version" of configuration program it is, I could completely throw away the current design and put something else instead, if the need ever arises. That allowed me to actually commit to a design that looked at least somewhat promising. And so I got started on what I call confprog.

Here's an example program. It doesn't do very much, but shows what it's about in general:

version = 1
entry profiler_static:
log = sf.name;
profile = sf.name eq "calculate-strawberries";
profile |= sf.cu.filename eq "CustomCode.pm6";

The entry decides which stage of profiling this program is applied to. In this case, the profiler_static means we're seeing a routine for the first time, before it is actually entered. That's why only the information every individual invocation of the frame in question shares is available via the variable sf, which stands for Static Frame. The Static Frame also allows access to the Compilation Unit (cu) that it was compiled into, which lets us find the filename.

The first line that actually does something assigns a value to the special variable log. This will output the name of the routine the program was invoked for.

The next line will turn on profiling only if the name of the routine is "calculate-strawberries". The line after that will also turn on profiling if the filename the routine is from is "CustomCode.pm6".

Apart from profiler_static, there are a couple more entry points available.

The syntax is still subject to change, especially before the whole thing is actually in a released version of MoarVM.

There is a whole lot of other things I could imagine being of interest in the near or far future. One place I'm taking inspiration from is where "extended Berkeley Packet Filter" (eBPF for short) programs are being used in the linux kernel and related pieces of software:

Oversimplifying a bit, BPF was originally meant for tcpdump so that the kernel doesn't have to copy all data over to the userspace process so that the decision what is interesting or not can be made. Instead, the kernel receives a little piece of code in the special BPF language (or bytecode) and can calculate the decision before having to copy anything.

eBPF programs can now also be used as a complex ruleset for sandboxing processes (with "seccomp"), to decide how network packets are to be routed between sockets (that's probably for Software Defined Networks?), what operations a process may perform on a particular device, whether a trace point in the kernel or a user-space program should fire, and so on.

So what's the status of confprog? I've written a parser and compiler that feeds confprog "bytecode" (which is mostly the same as regular moarvm bytecode) to MoarVM. There's also a preliminary validator that ensures the program won't do anything weird, or crash, when run. It is much too lenient at the moment, though. Then there's an interpreter that actually runs the code. It can already take an initial value for the "decision output value" (great name, isn't it) and it will return whatever value the confprog has set when it runs. The heap snapshot profiler is currently the only part of MoarVM that will actually try to run a confprog, and it uses the value to decide whether to take a snapshot or not.

Next up on the list of things to work on:

Apart from improvements to the confprog programming language, the integration with MoarVM lacks almost everything, most importantly installing a confprog for the profiler to decide whether a frame should be profiled (which was the original purpose of this assignment).

After that, and after building a bit of GUI for Comma, the regular grant work can resume: Flame graphs are still not visible on the call graph explorer page, and heap snapshots can't be loaded into moarperf yet, either.

Thanks for sticking with me through this perhaps a little dry and technical post. I hope the next one will have a little more excitement! And if there's interest (which you can signal by sending me a message on irc, or posting on reddit, or reaching me via twitter @loltimo, on the Perl 6 discord server etc) I can also write a post on how exactly the compiler was made, and how you can build your own compiler with Perl 6 code. Until then, you can find Andrew Shitov's presentations about making tiny languages in Perl 6 on youtube.

I hope you have a wonderful day; see you in the next one!
  - Timo

PS: I would like to give a special shout-out to Nadim Khemir for the wonderful Data::Dump::Tree module which made it much easier to see what my parser was doing. Here's some example output from another simple confprog program:

[6] @0
0 = .Label .Node @1
│ ├ $.name = heapsnapshot.Str
│ ├ $.type = entrypoint.Str
│ ├ $.position is rw = Nil
│ └ @.children = [0] @2
1 = .Op .Node @3
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = [2] @4
│   ├ 0 = .Var .Node @5
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy  @6
│   └ 1 = String Value ("we're in") @7
2 = .Op .Node @8
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = [2] @9
│   ├ 0 = .Var .Node @10
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy  §6
│   └ 1 = .Op .Node @12
│     ├ $.op = getattr.Str
│     ├ $.type is rw = CPType String :stringy  §6
│     └ @.children = [2] @14
│       ├ 0 = .Var .Node @15
│       │ ├ $.name = sf.Str
│       │ └ $.type = CPType MVMStaticFrame  @16
│       └ 1 = name.Str
3 = .Op .Node @17
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = [2] @18
│   ├ 0 = .Var .Node @19
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy  §6
│   └ 1 = String Value ("i am the confprog and i say:") @21
4 = .Op .Node @22
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = [2] @23
│   ├ 0 = .Var .Node @24
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy  §6
│   └ 1 = String Value ("  no heap snapshots for you my friend!") @26
5 = .Op .Node @27
  ├ $.op = =.Str
  ├ $.type is rw = Nil
  └ @.children = [2] @28
    ├ 0 = .Var .Node @29
    │ ├ $.name = snapshot.Str
    │ └ $.type = CPType Int :numeric :stringy  @30
    └ 1 = Int Value (0) @31

Notice how it shows the type of most things, like name.Str, as well as cross-references for things that appear multiple times, like the CPType String. Particularly useful is giving your own classes methods that specify exactly how they should be displayed by DDT. Love It!

gfldex: Nil shall warn or fail but not both

Published by gfldex on 2019-05-14T20:57:37

As announced earlier I went to write a module to make Nil.list behave a little better. There are basically two ways Nil could be turned into a list. One should warn the same way as Nil.Str does and the other should end the program loudly. Doing both at the same time however does not make sense.

There are a few ways this could be done. One is augmenting Nil with a list method and have this method check a dynamic variable to pick the desired behaviour. That would be slow and might hurt if Nil.list is called in a loop. The other is by using a custom sub EXPORT and a given switch.

# lib/NoNilList/Warning.pm6
use NoNilList 'Warning';

# lib/NoNilList/Fatal.pm6
use NoNilList 'Fatal';

# lib/NoNilList.pm6
sub EXPORT($_?) {
    given $_ {
        when 'Warning' {
            # augment Nil with a warning .list
        }
        when 'Fatal' {
            # augment Nil with a failing .list
        }
        default {
            die 'Please use NoNilList::Warning or NoNilList::Fatal.';
        }
    }
    %() # Rakudo complains without this
}

Now use NoNilList; will yield a compile-time error with a friendly hint how to avoid it.

I left the augmenting part out because it does not work. I thought I stepped on #2779 again but was corrected that this is actually a different bug. Jnthn++ fixed part of that new bug (Yes, Perl 6 bugs are so advanced they come in multiple parts.) and proposed the use of the MOP instead. That resulted in #2897. The tricky bit is that I have to delay augmentation of Nil to after the check on $_ because augment is a declarator and as such executed at compile time — in a module that can be months before the program starts to run. Both an augment in an EVAL string and the MOP route would lead there. I wanted to use this module as my debut on 6PAN. That will have to wait for another time.

If you find a bug please file it. It will lead to interesting discoveries for sure.

Strangely Consistent: Refactoring the universe

Published by Carl Mäsak

I'm here to share a thing I'm working on, and chew gum; and I'm all out of gum. The purpose of this post is both to break a disagreeable silence that has befallen this blog, and to be able to geek out about a niche topic here in writing, partially sparing friends and family.

I'm currently refactoring 007's type system in a branch. Basically, ever since 007 was created, a type in 007-land has corresponded to a class/role in Perl 6, the host system that's implementing 007.

Here's what the implementation of Int looks like currently:

class Val::Int does Val {
    has Int $.value;

    method truthy {
        ?$.value;
    }
}

And here's what it looks like as I'm doing the refactor:

constant TYPE is export = {};

    # ...
    TYPE<Int> = make-type "Int", :backed;
    # ...

So, instead of corresponding to types on the host level, all the 007 types are about to correspond to values. The former implementation was the one that felt obvious at the time (four-plus years ago), but it's become blindingly, painstakingly obvious that it really needs to be the latter.

Here's why: as soon as you want to implement class declarations in 007, in the former model you also need to bend over backwards and come up with an entirely new type in the host system. The Perl 6 code to do that looks like this:

return EVAL qq[class :: \{
    method attributes \{ () \}
    method ^name(\$) \{ "{$name}" \}
\}];

Which is... even someone as EVAL-positive as I wishes for a less clunky solution.

In the new model, a new class comes down to calling make-type and dropping the result in that TYPE hash. (Wait. Or not even dropping it in the TYPE hash. That hash is only for things used by 007 itself, not for user types.)

This is a refactor I've tried once before, back in 2017, but I failed back then because the code got too complicated and ran too slow. This time around I have a much better feeling.

By the way, there's also an is-type subroutine, and similarly a make-int and an is-int subroutine, and so on for every registered type. I figure why not wrap those simple things up in very short function names. So far that turns out to have been a very good decision. "Fold the language of your domain model into your code", and so on.
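A sketch of what such wrappers might look like (the sub names come from the post; the bodies and the hash-based value representation are my assumptions, not 007's actual code):

```raku
# Stand-in for the post's make-type: a type is just a named record here.
sub make-type($name, :$backed) { %(:$name, :$backed) }

constant TYPE is export = {};
TYPE<Int> = make-type "Int", :backed;

# Thin wrappers that fold the domain language into the code.
sub make-int($value) { %(type => TYPE<Int>, :$value) }
sub is-int($v)       { $v ~~ Map and $v<type> === TYPE<Int> }

say is-int(make-int(42));   # True
say is-int("forty-two");    # False
```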

This is one of the things I'm doing better this time around; last time one of the problems was that each line I touched got longer and messier because there were more layers of indirection to dig through. Concerns were scattered all over the place. This time, it feels like the codebase is getting simpler thanks to those named subroutines. Maybe it can be likened to putting all your database-specific code in one place.

I sometimes get slight vertigo due to the bootstrapping aspects of this type system. One example: Object is an instance of Type, but the base class of Type is Object — a circularity. But, it turns out, if you have absolute power over the object graph, you can always bend things to your will:

    TYPE<Type> = make-type "Type", slots => { name => "Type" };
    TYPE<Object> = make-type "Object";
        # Bootstrap: now that we have Object, let's make it the base of Type and Object
        TYPE<Type>.slots<base> = TYPE<Object>;
        TYPE<Object>.slots<base> = TYPE<Object>;
    # ...

I'm slightly reminded of a thing Gilad Bracha wrote once (which I can't find the exact quote for, unfortunately): that if mutual dependencies and recursive definitions are something that stump you, what you need is a healthy dose of letrec. It's twisty, yes, but it's basically a solved problem.

Like last time, I'm tackling the big task in small steps, one type at a time. I feel I've learned this from Martin Fowler's concept of asset capture. The idea is to end up back with a running system with passing tests often. I do this by replacing one old thing at a time by a new thing. Sounds obvious, but I'm actually not sure I would have been sensible enough on my own to tackle it this way, had I not known about asset capture.

One drawback is that you're sort of running the old system and the new system in parallel, as the old one is being phased out. Only once the whole thing has been asset-captured can complexity associated with the old system be completely removed.

It's a pleasant way to work. To me it's been at least a partial answer to the problem of the big rewrite. If we're free to refactor the insides, we can successively arrive at a point where the new better thing has completely replaced the old thing. The way there is allowed to be a little bit more complex (on the inside) than either endpoint. Importantly, you keep a running system throughout.

I don't have a concluding thought, except to say that I just managed to asset-capture arrays. Which is harder than it sounds, because arrays are everywhere in the compiler and the runtime.

gfldex: MONKEY see no Nil

Published by gfldex on 2019-05-03T23:14:50

In a for loop Nil is turned into a List with one element that happens to be Any. This really bugged me so I went to find out why. As it turns out the culprit is the very definition of Nil is Cool. To be able to turn any single value into a List, Cool implements method list(), which takes a single value and turns it into a List with that one value. Nil indicates the absence of a value and turning it into a value doesn’t make sense. Luckily we can change that.


use MONKEY-TYPING;

augment class Nil {
    method list() {
        note 'Trying to turn Nil into a list.';
        note Backtrace.new.list.tail;  # point at the culprit
        Empty  # return an empty list instead of (Any,)
    }
}

sub niler() { Nil }

for niler() { say 'oi‽' }

We can’t just warn because that would show the wrong point in the stack trace. So we note (which also goes to $*ERR) and pull the last value out of the Backtrace.

Interestingly, Failure throws both in .list and in .iterator. Nil implements push, append, unshift and prepend by immediately dying. Adding more to nothing is deadly, but turning nothing first into something vaguely undefined and then allowing more stuff to be added to it is inconsistent at best. Which leads me to believe that Nil.list as it is specced today is just an oversight.

At least I can now write a simple module to protect my code from surprising Nils.

gfldex: Parallel permutations

Published by gfldex on 2019-04-27T15:31:45

Jo Christian Oterhals asked for a parallel solution for challenge 2. I believe he had trouble finding one himself, because his code sports quite a few for loops. By changing those to method call chains, we can use .hyper to run at least some code concurrently.

use v6.d;

constant CORES = $*KERNEL.cpu-cores;

# workaround for #1210
sub prefix:<[max]>(%h){ %h.sort(-*.value).first }

my %dict = "/usr/share/dict/words".IO.lines.map({ .lc => True });

my %seen;
%dict.keys».&{ %seen{.comb.sort.join}++; };

with [max] %seen {
    say .value, .key.comb.hyper(:batch(1024), :degree(CORES)).permutations».join.grep({ %dict{$_}:exists }).Str
}

My approach is a little different than Jo’s. I don’t try to keep all combinations around but just count the anagrams for each entry in the word list. Then I find a word with the most anagrams (there are more candidates with the same amount that I skip) and reconstruct the anagrams for that word.

The only operation where any form of computation happens is the generation of permutations. Anything else is just too memory-bound to get a boost from spinning up threads. With the .hyper-call the program is a tiny wee bit faster than with just one thread on my Threadripper box. A system with slower cores/smaller caches should benefit a little more. The main issue is that the entire word list fits into the 3rd level cache. With a bigger dataset a fast system might benefit as well.

In many cases multi-core systems are fairy dust, which makes the wallets of chip makers sparkle. Wrangling Hashes seems to be one of those cases.

Rakudo Star Release 2019.03

Published on 2019-03-30T00:00:00

my Timotimo \this: Intermediate Progress Report: Heap Snapshots

Published by Timo Paulssen on 2019-03-22T22:22:00


Hello dear readers,

this time I don't have something finished to show off, but nevertheless I can tell you all about the work I've recently started.

The very first post on this blog already had a little bit about the heap snapshot profiler. It was about introducing a binary format to the heap snapshot profiler so that snapshots can be written out immediately after they were taken and by moarvm itself rather than by NQP code. This also made it possible to load snapshot data faster: the files contain an index that allows a big chunk of data to be split up exactly in the middle and then read and decoded by two threads in parallel.

The new format also resulted in much smaller output files, and of course reduced memory usage of turning on the profiler while running perl6 code. However, since it still captures one heap snapshot every time the GC runs, every ordinary program that runs for longer than a few minutes will accumulate quite an amount of data. Heap snapshot files (which are actually collections of multiple snapshots) can very easily outgrow a gigabyte. There would have to be another change.


Enter Compression

The new format already contained the simplest thing that you could call compression. Instead of simply writing every record to the file as it comes, records that have smaller numbers would be stored with a shorter representation. This saved a lot of space already, but not nearly as much as off-the-shelf compression techniques would.
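The idea of "smaller numbers get a shorter representation" can be sketched as a LEB128-style varint. This is just the general technique; MoarVM's actual scheme may differ:

```raku
# Encode an unsigned integer 7 bits at a time, low bits first;
# the high bit of each byte says whether more bytes follow.
sub encode-varint(UInt $n is copy --> Buf) {
    my $buf = buf8.new;
    repeat {
        my $byte = $n +& 0x7F;
        $n +>= 7;
        $buf.push: $byte +| ($n ?? 0x80 !! 0);
    } while $n;
    $buf
}

say encode-varint(5).elems;    # 1 byte instead of a fixed 8
say encode-varint(300).elems;  # 2
```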

There had to be another way to get compression than just coming up with my own compression scheme! Well, obviously I could have just implemented something that already exists. However, at the time I was discouraged by the specific requirements of the heap snapshot analyzer - the tool that reads the files to let the user interactively explore the data within it:

Normally, compression formats are not built to support easily seeking to any given spot in the uncompressed data. There was of course the possibility to compress each snapshot individually, but that would mean a whole snapshot could either only be read in with a single thread, or the compression would have to go through the whole blob and when the first splittable piece was decompressed, a thread could go off and parse it. I'm not entirely sure why I didn't go with that, perhaps I just didn't think of it back then. After all, it's already been more than a year, and my brain compresses data by forgetting stuff.

Anyway, recently I decided I'd try a regular compression format for the new moarvm heap snapshot file format. There's already a Perl 6 module named Compress::Zlib, which I first wanted to use. Writing the data out from moarvm was trivial once I linked it to zlib. Just replace fopen with gzopen, fwrite with gzwrite, fclose with gzclose and you're almost done! The compression ratio wasn't too shabby either.

When I mentioned this in the #moarvm channel on IRC, I was asked why I use zlib instead of zstd. After all, zstd usually (or always?) outperforms zlib in both compression/decompression speed and output size. The only answer I had for that was that I hadn't used the zstd C library yet, and there was not yet a Perl 6 module for it.

Figuring out zstd didn't go as smoothly as zlib, not by a long shot. But first I'd like to explain how I solved the issue of reading the file with multiple threads.

Restructuring the data

In the current binary format, there are areas for different kinds of objects that occur once per snapshot. Those are collectables and references. On top of that there are objects that are shared across snapshots: Strings that are referenced from all the other kinds of objects (for example filenames, or descriptions for references like "hashtable entry"), static frames (made up of a name like "push", an ID, a filename, and a line number), and types (made up of a repr name like VMArray, P6opaque, or CStruct and a type name like BagHash or Regex).

That resulted in a file format that has one object after the other in the given area. The heap snapshot analyzer itself then goes through these areas and splits the individual values apart, then shoves them off into a queue for another thread to actually store. Storage inside the heap analyzer consists of one array for each part of these objects. For example, there is one array of integers for all the descriptions and one array of integers for all the target collectables. The main benefit of that is not having to go through all the objects when the garbage collector runs.

The new format on the other hand puts every value of each attribute in one long string before continuing with the next attribute.

Here's how the data for static frames was laid out in the file in the previous version:

"sframes", count, name 1, uuid 1, file 1, line 1, name 2, uuid 2, file 2, line 2, name 3, uuid 3, file 3, line 3, … index data

The count at the beginning tells us how many entries we should be expecting. For collectables, types, reprs, and static frames this gives us the exact number of bytes to look for, too, since every entry has the same size. References on the other hand have a simple "compression" applied to them, which doesn't allow us to just figure out the total size by knowing the count. To offset this, the total size lives at the end in a place that can easily be found by the parser. Strings are also variable in length, but there's only a few hundred of them usually. References take up the most space in total; having four to five times as many references as there are collectables is pretty normal.

Here's how the same static frame data is laid out in the upcoming format:

"names", length, zstd(name 1, name 2, name 3, …), index data, "uuids", length, zstd(uuid 1, uuid 2, uuid 3, …), index data, "files", length, zstd(file 1, file 2, file 3, …), index data, "lines", length, zstd(line 1, line 2, line 3, …), index data

As you can see, the values for each attribute now live in the same space. Each attribute blob is compressed individually, each has a little piece of index data at the end and a length field at the start. The length field is actually supposed to hold the total size of the compressed blob, but if the heap snapshot is being output to a destination that doesn't support seeking (moving back in the file and overwriting an earlier piece of data) we'll just leave it zeroed out and rely on zstd's format being able to tell when one blob ends.

There are some benefits to this approach that I'd like to point out:

The last point, in fact, will let me put some extra fun into the files. First of all, I currently start the files with a "plain text comment" that explains what kind of file it is and how it's structured internally. That way, if someone stumbles upon this file in fifty years, they can get started finding out the contents right away!

On a perhaps more serious note, I'll put in summaries of each snapshot that MoarVM itself can just already generate while it's writing out the snapshot itself. Not only things like "total number of collectables", but also "top 25 types by size, top 25 types by count". That will actually make the heapanalyzer not need to touch the actual data until the user is interested in more specific data, like a "top 100" or "give me a thousand objects of type 'Hash'".
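Such a summary is cheap to build while streaming the snapshot out. A toy version over hypothetical collectable records (types and sizes invented for the example):

```raku
# Hypothetical collectable records: (type, size) pairs.
my @collectables = <Hash 96>, <Hash 96>, <Str 32>, <Hash 96>;

my (%count, %size);
for @collectables -> ($type, $size) {
    %count{$type}++;
    %size{$type} += $size;
}

say "total: {@collectables.elems} collectables";
say %count.sort(-*.value).head(1);  # top type by count
say %size.sort(-*.value).head(2);   # top types by size
```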

On top of that, why not allow the user to "edit" heap snapshots, put some comments in before sharing it to others, or maybe "bookmarking" specific objects?

All of these things will be easier to do with the new format - that's the hope at least!

Did the compression work?

I didn't actually have the patience to exhaustively measure all the details, but here's a rough ratio for comparison: one dataset I've got results in a 1.1 gigabyte file with the current binary format, a 147 megabyte file when using gzip, and a 99 megabyte file using zstd (at the maximum "regular" compression level - I haven't checked yet whether the CPU usage is prohibitive at that level, though).
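In round numbers, the quoted sizes work out to these compression ratios:

```raku
my %mb = raw => 1.1 * 1024, gzip => 147, zstd => 99;  # sizes in megabytes
say "gzip shrinks it {(%mb<raw> / %mb<gzip>).round(.1)}x";  # about 7.7x
say "zstd shrinks it {(%mb<raw> / %mb<zstd>).round(.1)}x";  # about 11.4x
```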

It seems like this is a viable way forward! Allowing capture to run 10x as long is a nice thing for sure.

What comes next?

The profiler view itself in Moarperf isn't done yet, of course. I may not put in more than the simplest of features if I start on the web frontend for the heap analyzer itself.

On the other hand, there's another task that's been waiting: Packaging moarperf for simpler usage. Recently we got support for a relocatable perl6 binary merged into master. That should make it possible to create an AppImage of moarperf. A docker container should also be relatively easy to build.

We'll see what my actual next steps will be - or will have been I suppose - when I post the next update!

Thanks for reading and take care
  - Timo

brrt to the future: Reverse Linear Scan Allocation is probably a good idea

Published by Bart Wiegmans on 2019-03-21T15:52:00

Hi hackers! First of all, I want to thank everybody who gave such useful feedback on my last post.  For instance, I found out that the similarity between the expression JIT IR and the Testarossa Trees IR is quite remarkable, and that they have a fix for the problem that is quite different from what I had in mind.

Today I want to write something about register allocation, however. Register allocation is probably not my favorite problem, on account of being both messy and thankless. It is a messy problem because - aside from being NP-hard to solve optimally - hardware instruction sets and software ABIs introduce all sorts of annoying constraints. And it is a thankless problem because the cases in which a good register allocator is useful - for instance, when there are lots of intermediate values used over a long stretch of code - are fairly rare. Much more common are the cases in which either there are trivially sufficient registers, or ABI constraints force a spill to memory anyway (e.g. when calling a function, almost all registers can be overwritten).

So, on account of this being not my favorite problem, and also because I promised to implement optimizations in the register allocator, I've been researching whether there is a way to do better. And what better place to look than one of the fastest dynamic language implementations around, LuaJIT? So that's what I did, and this post is about what I learned from that.

Truth be told, LuaJIT is not at all a learners' codebase (and I don't think its author would claim it is). It uses a rather terse style of C and lots and lots of preprocessor macros. I had somewhat gotten used to the style from hacking dynasm though, so that wasn't so bad. What was more surprising is that some of the steps in code generation that are distinct and separate in the MoarVM JIT - instruction selection, register allocation and emitting bytecode - were all blended together in LuaJIT. Over multiple backend architectures, too. And what's more - all these steps were done in reverse order - from the end of the program (trace) to the beginning. Now that's interesting...

I have no intention of combining all phases of code generation like LuaJIT has. But processing the IR in reverse seems to have some interesting properties. To understand why that is, I'll first have to explain how linear scan allocation currently works in MoarVM, as it is most commonly described:

  1. First, the live ranges of program values are computed. Like the name indicates, these represent the range of the program code in which a value is both defined and may be used. Note that for the purpose of register allocation, the notion of a value shifts somewhat. In the expression DAG IR, a value is the result of a single computation. But for the purposes of register allocation, a value includes all its copies, as well as values computed from different conditional branches. This is necessary because when we actually start allocating registers, we need to know when a value is no longer in use (so we can reuse the register) and how long a value will remain in use.
  2. Because a value may be computed from distinct conditional branches, it is necessary to compute the holes in the live ranges. Holes exists because if a value is defined in both sides of a conditional branch, the range will cover both the earlier (in code order) branch and the later branch - but from the start of the later branch to its definition that value doesn't actually exist. We need this information to prevent the register allocator from trying to spill-and-load a nonexistent value, for instance.
  3. Only then can we allocate and assign the actual registers to instructions. Because we might have to spill values to memory, and because values now can have multiple definitions, this is a somewhat subtle problem. Also, we'll have to resolve all architecture specific register requirements in this step.
In the MoarVM register allocator, there's a fourth step and a fifth step. The fourth step exists to ensure that instructions conform to x86 two-operand form (Rather than return the result of an instruction in a third register, x86 reuses one of the input registers as the output register. E.g. all operators are of the form a = op(a, b)  rather than a = op(b, c). This saves on instruction encoding space). The fifth step inserts instructions that are introduced by the third step; this is done so that each instruction has a fixed address in the stream while the stream is being processed.

Altogether this is quite a bit of complexity and work, even for what is arguably the simplest correct global register allocation algorithm. So when I started thinking of the reverse linear scan algorithm employed by LuaJIT, the advantages became clear.
There are downsides as well, of course. Not knowing exactly how long a value will be live while processing it may cause the algorithm to make worse choices in which values to spill. But I don't think that's really a great concern, since figuring out the best possible value is practically impossible anyway, and the most commonly cited heuristic - evict the value that is live furthest in the future, because this will release a register over a longer range of code, reducing the chance that we'll need to evict again - is still available. (After all, we do always know the last use, even if we don't necessarily know the first definition).
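The "we always know the last use" property is what makes the reverse scan attractive. A toy illustration of why (this is not MoarVM's allocator, just the core observation):

```raku
# Instruction index => values it *uses* (reads), for a tiny pseudo-program:
#   0: a = 1    1: b = 2    2: c = a + b    3: d = c + b
my @uses = [], [], [<a b>], [<c b>];

# Scanning backwards, the first time we meet a value is by definition
# its last use - no separate liveness pass needed for that.
my %last-use;
for @uses.keys.reverse -> $pos {
    %last-use{$_} //= $pos for @uses[$pos].list;
}
say %last-use.sort;  # a's register frees after 2; b's and c's after 3
```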

Altogether, I'm quite excited about this algorithm; I think it will be a real simplification over the current implementation. Whether that will work out remains to be seen of course. I'll let you know!

brrt to the future: Something about IR optimization

Published by Bart Wiegmans on 2019-03-17T06:23:00

Hi hackers! Today I want to write about optimizing IR in the MoarVM JIT, and also a little bit about IR design itself.

One of the (major) design goals for the expression JIT was to have the ability to optimize code over the boundaries of individual MoarVM instructions. To enable this, the expression JIT first expands each VM instruction into a graph of lower-level operators. Optimization then means pattern-matching those graphs and replacing them with more efficient expressions.

As a running example, consider the idx operator. This operator takes two inputs (base and element) and a constant parameter scale and computes base+element*scale. This represents one of the operands of an  'indexed load' instruction on x86, typically used to process arrays. Such instructions allow one instruction to be used for what would otherwise be two operations (computing an address and loading a value). However, if the element of the idx operator is a constant, we can replace it instead with the addr instruction, which just adds a constant to a pointer. This is an improvement over idx because we no longer need to load the value of element into a register. This saves both an instruction and valuable register space.
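In plain arithmetic terms (with hypothetical operand values), the rewrite keeps the computed address identical while removing the need to hold element in a register:

```raku
my ($base, $scale) = 0x1000, 8;
my $element = 4;                 # known constant at compile time

say $base + $element * $scale;   # idx:  base + element*scale
say $base + 32;                  # addr: offset folded into one constant
```

Both lines print the same address; the second form is what the addr rewrite buys.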

Unfortunately this optimization introduces a bug. (Or, depending on your point of view, brings an existing bug out into the open). The expression JIT code generation process selects instructions for subtrees (tiles) of the graph in a bottom-up fashion. These instructions represent the value computed or work performed by that subgraph. (For instance, a tree like (load (addr ? 8) 8) becomes mov ?, qword [?+8]; the question marks are filled in during register allocation). Because an instruction always represents a tree, and because the graph is an arbitrary directed acyclic graph, the code generator projects that graph as a tree by visiting each operator node only once. So each value is computed once, and that computed value is reused by all later references.

It is worth going into some detail into why the expression graph is not a tree. Aside from transformations that might be introduced by optimizations (e.g. common subexpression elimination), a template may introduce a value that has multiple references via the let: pseudo-operator. See for instance the following (simplified) template:

(let: (($foo (load (local))))
    (add $foo (sub $foo (const 1))))

Both ADD and SUB refer to the same LOAD node

In this case, both references to $foo point directly to the same load operator. Thus, the graph is not a tree. Another case in which this occurs is during linking of templates into the graph. The output of an instruction is used, if possible, directly as the input for another instruction. (This is the primary way that the expression JIT can get rid of unnecessary memory operations). But there can be multiple instructions that use a value, in which case an operator can have multiple references. Finally, instruction operands are inserted by the compiler and these can have multiple references as well.

If each operator is visited only once during code generation, then this may introduce a problem when combined with another feature - conditional expressions. For instance, if two branches of a conditional expression both refer to the same value (represented by name $foo) then the code generator will only emit code to compute its value when it encounters the first reference. When the code generator encounters $foo for the second time in the other branch, no code will be emitted. This means that in the second branch, $foo will effectively have no defined value (because the code in the first branch is never executed), and wrong values or memory corruption is then the predictable result.

This bug has always existed for as long as the expression JIT has been under development, and in the past the solution has been not to write templates which have this problem. This is made a little easier by a feature of the let: operator, in that it inserts a do operator which orders the values that are declared to be computed before the code that references them. So this is in fact non-buggy:

(let: (($foo (load (local))))  # code to compute $foo is emitted here
  (if (...)
    (add $foo (const 1))       # $foo is just a reference
    (sub $foo (const 2))))     # and here as well

The DO node is inserted for the LET operator. It ensures that the value of the LOAD node is computed before the reference in either branch

Alternatively, if a value $foo is used in the condition of the if operator, you can also be sure that it is available in both sides of the condition.

All these methods rely on the programmer being able to predict when a value will be first referenced and hence evaluated. An optimizer breaks this by design. This means that if I want the JIT optimizer to be successful, my options are:

  1. Fix the optimizer so as to not remove references that are critical for the correctness of the program
  2. Modify the input tree so that such references are either copied or moved forward
  3. Fix the code generator to emit code for a value, if it determines that an earlier reference is not available from the current block.
In other words, I first need to decide where this bug really belongs - in the optimizer, the code generator, or even the IR structure itself. The weakness of the expression IR is that expressions don't really impose a particular order. (This is unlike the spesh IR, which is instruction-based, and in which every instruction has a 'previous' and 'next' pointer). Thus, there really isn't a 'first' reference to a value, before the code generator introduces the concept. This property is in fact quite handy for optimization (for instance, we can evaluate operands in whatever order is best, rather than being fixed by the input order) - so I'd really like to preserve it. But it also means that the property we're interested in - a value is computed before it is used, in all possible code flow paths - isn't really expressible by the IR. And there is no obvious local invariant that can be maintained to ensure that this bug does not happen, so any correctness check may have to check the entire graph, which is quite impractical.

I hope this post explains why this is such a tricky problem! I have some ideas for how to get out of this, but I'll reserve those for a later post, since this one has gotten quite long enough. Until next time!

my Timotimo \this: Always Wear Safety Equipment When Inline Scalaring

Published by Timo Paulssen on 2019-02-21T20:44:09


MoarVM just recently got a nice optimization merged into the master branch. It's called "partial escape analysis" and comes with a specific optimization named "scalar replacement". In simple terms, it allows objects whose lifetime can be proven to end very soon to be created on the stack instead of in the garbage-collector-managed heap. More than that, the "partial" aspect of the escape analysis even covers cases where the object can escape out of our grasp, but will not always do so.

Allocation on the stack is much cheaper than allocation in the garbage collected heap, because freeing data off of the stack is as easy as leaving the function behind.
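For intuition, here's the shape of code that benefits. Whether spesh scalar-replaces this exact case is not something the post claims; it just illustrates "an object whose lifetime ends immediately":

```raku
sub norm-sum(@pairs) {
    my $sum = 0e0;
    for @pairs -> ($re, $im) {
        # This Complex never escapes the loop body, so an escape-analysing
        # optimizer may build it on the stack instead of the heap.
        $sum += Complex.new($re, $im).abs;
    }
    $sum
}

say norm-sum([(3e0, 4e0), (6e0, 8e0)]);  # 15
```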


Like a big portion of optimizations in MoarVM's run-time optimizer/specializer ("spesh"), this analysis and optimization usually relies on some prior inlining of code. That's where the pun in the post's title comes from.

This is a progress report for the profiler front-end, though. So the question I'd like to answer in this post is how the programmer would check if these optimizations are being utilized in their own code.

A Full Overview

The first thing the user gets to see in the profiler's frontend is a big overview page summarizing lots of results from all aspects of the program's run. Thread creations, garbage collector runs, frame allocations made unnecessary by inlining, etc.

That page just got a new entry on it, that sums up all allocations and also shows how many allocations have been made unnecessary by scalar replacement, along with a percentage. Here's an example screenshot:

Always Wear Safety Equipment When Inline Scalaring

That on its own is really only interesting if you've run a program twice, perhaps changing the code in between, to compare how allocation/optimization behavior changed. However, there's also more detail to be found on the Allocations page:

Per-Type and Per-Routine Allocations/Replacements

The "Allocations" tab already gave details on which types were allocated most, and which routines did most of the allocating for any given type. Now there is an additional column that gives the number of scalar replaced objects as well:

Always Wear Safety Equipment When Inline Scalaring

Here's a screenshot showing the details of the Num type expanded to display the routines:

Always Wear Safety Equipment When Inline Scalaring

Spesh Performance Overview

One major part of Spesh is its "speculative" optimizations. They have the benefit of allowing optimizations even when something can't be proven. When some assumption is broken, a deoptimization happens, which is effectively a jump from inside an optimized bit of code back to the same position in the unoptimized code. There's also "on-stack-replacement", which is effectively a jump from inside unoptimized code to the same position in optimized code. The details are, of course, more complicated than that.
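As a rough illustration (plain C, with all names invented for this sketch, not anything MoarVM actually does), the OSR check behaves as if the unoptimized loop polled for an optimized version on every iteration and jumped into it mid-loop:

```c
#include <assert.h>

/* Invented sketch of the on-stack replacement idea: the unoptimized
 * loop polls on every iteration whether an optimized version is
 * available, and if so jumps into it mid-loop. */
static long optimized_body(long i, long n, long acc) {
    for (; i < n; i++)          /* stand-in for the specialized code */
        acc += i;
    return acc;
}

long interpreted_loop(long n, long osr_ready_at) {
    long acc = 0;
    for (long i = 0; i < n; i++) {
        if (i == osr_ready_at)  /* "has spesh produced optimized code?" */
            return optimized_body(i, n, acc);
        acc += i;               /* unoptimized per-iteration work */
    }
    return acc;
}
```

Either way the result is the same; what changes is how much of the loop runs under optimized code.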

How can you find out which routines in your program (in libraries you're calling, or the "core setting" of all builtin classes and routines) are affected by deoptimization or by OSR? There is now an extra tab in the profiler frontend that gives you the numbers:

Always Wear Safety Equipment When Inline Scalaring

This page also has the first big attempt at putting hopefully helpful text directly next to the data. Below the table there are these sections:

Specializer Performance

MoarVM comes with a dynamic code optimizer called "spesh". It makes your code faster by observing at run time which types are used where, which methods end up being called in certain situations where there are multiple potential candidates, and so on. This is called specialization, because it creates versions of the code that take shortcuts based on assumptions it made from the observed data.


Assumptions, however, are there to be broken. Sometimes the optimized and specialized code finds that an assumption no longer holds. Parts of the specialized code that detect this are called "guards". When a guard detects a mismatch, the running code has to be switched from the optimized code back to the unoptimized code. This is called a "deoptimization", or "deopt" for short.

Deopts are a natural part of a program's life, and at low numbers they usually aren't a problem. For example, code that reads data from a file would read from a buffer most of the time, but at some point the buffer would be exhausted and new data would have to be fetched from the filesystem. This could mean a deopt.

If, however, the profiler points out a large amount of deopts, there could be an optimization opportunity.

On-Stack Replacement (OSR)

Regular optimization activates when a function is entered, but programs often have loops that run for a long time without the containing function ever being entered again.

On-Stack Replacement is used to handle cases like this. Every round of the loop in the unoptimized code will check if an optimized version can be entered. This has the additional effect that a deoptimization in such code can quickly lead back into optimized code.

Situations like these can cause high numbers of deopts along with high numbers of OSRs.

I'd be happy to hear your thoughts on how clear this text is to you, especially if you're not very knowledgeable about the topics discussed. Check github for the current version of this text - it should be at - and submit an issue to the github repo, or find me on IRC, on the perl6-users mailing list, on reddit, on mastodon/the fediverse, twitter, or wherever.

brrt to the future: A short post about types and polymorphism

Published by Bart Wiegmans on 2019-01-14T13:34:00

Hi all. I usually write somewhat long-winded posts, but today I'm going to try and make an exception. Today I want to talk about the expression template language used to map the high-level MoarVM instructions to low-level constructs that the JIT compiler can easily work with:

This 'language' was designed back in 2015 subject to three constraints:
Recently I've been working on adding support for floating point operations, and this means working on the type system of the expression language. Because floating point instructions operate on a distinct set of registers from integer instructions, a floating point operator is not interchangeable with an integer (or pointer) operator.

This type system is enforced in two ways. First, by the template compiler, which attempts to check if you've used all operands correctly. This operates during development, which is convenient. Second, by instruction selection, as there will simply not be any instructions available that have the wrong combinations of types. Unfortunately, that happens at runtime, and such errors are so annoying to debug that they motivated the development of the first type checker.

However, this presents two problems. One of the advantages of the expression IR is that, by virtue of having a small number of operators, it is fairly easy to analyze. Having a distinct set of operators for each type would undo that. But more importantly, there are several MoarVM instructions that are generic, i.e. that operate on integer, floating point, and pointer values. (For example, the set, getlex and bindlex instructions are generic in this way). This makes it impossible to know in advance whether their values will be integers, pointers, or floats.

This is no problem for the interpreter since it can treat values as bags-of-bits (i.e., it can simply copy the union MVMRegister type that holds all values of all supported types). But the expression JIT works differently - it assumes that it can place any value in a register, and that it can reorder and potentially skip storing them to memory. (This saves work when the value would soon be overwritten anyway). So we need to know what register class that is, and we need to have the correct operators to manipulate a value in the right register class.

To summarize, the problem is:
There are two ways around this, and I chose to use both. First, we know as a fact for each local or lexical value in a MoarVM frame (subroutine) what type it should have. So even a generic operator like set can be resolved to a specific type at runtime, at which point we can select the correct operators. Second, we can introduce generic operators of our own. This is possible so long as we can select the correct instruction for an operator based on the types of the operands.

For instance, the store operator takes two operands, an address and a value. Depending on the type of the value (reg or num), we can always select the correct instruction (mov or movsd). It is however not possible to select different instructions for the load operator based on the type required, because instruction selection works from the bottom up. So we need a special load_num operator, but a store_num operator is not necessary. And this is true for a lot more operators than I had initially thought. For instance, aside from the (naturally generic) do and if operators, all arithmetic operators and comparison operators are generic in this way.
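This asymmetry can be sketched in plain C (a hypothetical toy selector, not the real expression JIT). Working bottom up, the selector already knows the type of a store's value operand, so it can pick the instruction itself; a load's required type comes from its consumer, which the bottom-up pass has not visited yet, hence the separate load_num operator:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch, not the real expression JIT: why store can stay
 * generic while load cannot. */
typedef enum { T_REG, T_NUM } val_type;

/* A store's value operand has a known type bottom-up, so one generic
 * store operator suffices: the selector chooses mov vs movsd here. */
const char *select_store(val_type value_type) {
    return value_type == T_NUM ? "movsd" : "mov";
}

/* A load's required type is only known from its consumer, so the type
 * must be encoded in the operator itself (load vs load_num). */
const char *select_load(int is_load_num) {
    return is_load_num ? "movsd" : "mov";
}
```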

I realize that, despite my best efforts, this has become a rather long-winded post anyway.....

Anyway. For the next week, I'll be taking a slight detour, and I aim to generalize the two-operand form conversion that is necessary on x86. I'll try to write a blog about it as well, and maybe it'll be short and to the point. See you later!

brrt to the future: New years post

Published by Bart Wiegmans on 2019-01-06T13:15:00

Hi everybody! I recently read jnthn's Perl 6 new year's resolutions post, and I realized that this was an excellent example to emulate. So here I will attempt to share what I've been doing in 2018 and what I'll be doing in 2019.

In 2018, aside from the usual refactoring, bugfixing and the like:
So 2019 starts with me trying to complete the goals specified in that grant request. I've already partially completed one goal (as explained in the interim report) - ensuring that register encoding works correctly for SSE registers in DynASM. Next up is actually ensuring support for SSE (and floating point) registers in the JIT, which is surprisingly tricky, because it means introducing a type system where there wasn't really one previously. I will have more to report on that in the near future.

After that, the first thing on my list is the support for irregular register requirements in the register allocator, which should open up the possibility of supporting various instructions.

I guess that's all for now. Speak you later!

6guts: My Perl 6 wishes for 2019

Published by jnthnwrthngtn on 2019-01-02T01:35:51

This evening, I enjoyed the New Year’s fireworks display over the beautiful Prague skyline. Well, the bit of it that was between me and the fireworks, anyway. Rather than having its fireworks display at midnight, Prague puts it at 6pm on New Year’s Day. That makes it easy for families to go to, which is rather thoughtful. It’s also, for those of us with plans to dig back into work the following day, a nice end to the festive break.

Prague fireworks over Narodni Divadlo

So, tomorrow I’ll be digging back into work, which of late has involved a lot of Perl 6. Having spent so many years working on Perl 6 compiler and runtime design and implementation, it’s been fun to spend a good amount of 2018 using Perl 6 for commercial projects. I’m hopeful that will continue into 2019. Of course, I’ll be continuing to work on plenty of Perl 6 things that are for the good of all Perl 6 users too. In this post, I’d like to share some of the things I’m hoping to work on or achieve during 2019.

Partial Escape Analysis and related optimizations in MoarVM

The MoarVM specializer learned plenty of new tricks this year, delivering some nice speedups for many Perl 6 programs. Many of my performance improvement hopes for 2019 center around escape analysis and optimizations stemming from it.

The idea is to analyze object allocations, and find pieces of the program where we can fully understand all of the references that exist to the object. The points at which we can cease to do that is where an object escapes. In the best cases, an object never escapes; in other cases, there are a number of reads and writes performed to its attributes up until its escape.

Armed with this, we can perform scalar replacement, which involves placing the attributes of the object into local registers up until the escape point, if any. As well as reducing memory operations, this means we can often prove significantly more program properties, allowing further optimization (such as getting rid of dynamic type checks). In some cases, we might never need to allocate the object at all; this should be a big win for Perl 6, which by its design creates lots of short-lived objects.
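By way of illustration (hand-written C, not anything the optimizer emits), here is what scalar replacement does to a non-escaping allocation:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative only: scalar replacement done by hand.  The optimizer
 * proves the object never escapes, so its attributes can live in
 * locals (ultimately registers) and the allocation disappears. */
typedef struct { int x, y; } point;

/* Before: heap-allocates a point that never escapes the function. */
int sum_allocated(int x, int y) {
    point *p = malloc(sizeof *p);
    p->x = x;
    p->y = y;
    int total = p->x + p->y;
    free(p);
    return total;
}

/* After: the same computation with the attributes scalar-replaced. */
int sum_replaced(int x, int y) {
    int p_x = x, p_y = y;   /* former attributes, now plain locals */
    return p_x + p_y;
}
```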

There will be various code-generation and static optimizer improvements to be done in Rakudo in support of this work also, which should result in its own set of speedups.

Expect to hear plenty about this in my posts here in the year ahead.

Decreasing startup time and base memory use

The current Rakudo startup time is quite high. I’d really like to see it fall to around half of what it currently is during 2019. I’ve got some concrete ideas on how that can be achieved, including changing the way we store and deserialize NFAs used by the parser, and perhaps also dealing with the way we currently handle method caches to have less startup impact.

Both of these should also decrease the base memory use, which is also a good bit higher than I wish.

Improving compilation times

Some folks – myself included – are developing increasingly large applications in Perl 6. For the current major project I’m working on, runtime performance is not an issue by now, but I certainly feel myself waiting a bit on compiles. Part of it is parse performance, and I’d like to look at that; in doing so, I’d also be able to speed up handling of all Perl 6 grammars.

I suspect there are some good wins to be had elsewhere in the compilation pipeline too, and the startup time improvements described above should also help, especially when we pre-compile deep dependency trees. I’d also like to look into if we can do some speculative parallel compilation.

Research into concurrency safety

In Perl 6.d, we got non-blocking await and react support, which greatly improved the scalability of Perl 6 concurrent and parallel programs. Now many thousands of outstanding tasks can be juggled across just a handful of threads (the exact number chosen according to demand and CPU count).

For Perl 6.e, which is still a good way off, I’d like to having something to offer in terms of making Perl 6 concurrent and parallel programming safer. While we have a number of higher-level constructs that eliminate various ways to make mistakes, it’s still possible to get into trouble and have races when using them.

So, I plan to spend some time this year quietly exploring and prototyping in this space. Obviously, I want something that fits in with the Perl 6 language design, and that catches real and interesting bugs – probably by making things that are liable to occasionally explode in weird ways instead reliably do so in helpful ways, such that they show up reliably in tests.

Get Cro to its 1.0 release

In the 16 months since I revealed it, Cro has become a popular choice for implementing HTTP APIs and web applications in Perl 6. It has also attracted code contributions from a couple of dozen contributors. This year, I aim to see Cro through to its 1.0 release. That will include (to save you following the roadmap link):

Comma Community, and lots of improvements and features

I founded Comma IDE in order to bring Perl 6 a powerful Integrated Development Environment. We’ve come a long way since the Minimum Viable Product we shipped back in June to the first subscribers to the Comma Supporter Program. In recent months, I’ve used Comma almost daily on my various Perl 6 projects, and by this point honestly wouldn’t want to be without it. Like Cro, I built Comma because it’s a product I wanted to use, which I think is a good place to be in when building any product.

In a few months time, we expect to start offering Comma Community and Comma Complete. The former will be free of charge, and the latter a commercial offering under a subscribe-for-updates model (just like how the supporter program has worked so far). My own Comma wishlist is lengthy enough to keep us busy for a lot more than the next year, and that’s before considering things Comma users are asking for. Expect plenty of exciting new features, as well as ongoing tweaks to make each release feel that little bit nicer to use.

Speaking, conferences, workshops, etc.

This year will see me giving my first keynote at a European Perl Conference. I’m looking forward to being in Riga again; it’s a lovely city to wander around, and I remember having some pretty nice food there too. The keynote will focus on the concurrent and parallel aspects of Perl 6; thankfully, I’ve still a good six months to figure out exactly what angle I wish to take on that, having spoken on the topic many times before!

I also plan to submit a talk or two for the German Perl Workshop, and will probably find the Swiss Perl Workshop hard to resist attending once more. And, more locally, I’d like to try and track down other Perl folks here in Prague, and see if I can help some kind of Prague.pm to happen again.

I need to keep my travel down to sensible levels, but might be able to fit in the odd other bit of speaking during the year, if it’s not too far away.


While I want to spend most of my time building stuff rather than talking about it, I’m up for the occasional bit of teaching. I’m considering pitching a 1-day Perl 6 concurrency workshop to the Riga organizers. Then we’ll see if there’s enough folks interested in taking it. It’ll cost something, but probably much less than any other way of getting a day of teaching from me. :-)

So, down to work!

Well, a good night’s sleep first. :-) But tomorrow, another year of fun begins. I’m looking forward to it, and to working alongside many wonderful folks in the Perl community.

Rakudo Star Release 2018.10

Published on 2018-11-11T00:00:00

Perl 6 Coding Contest 2019: Seeking Task Makers

Published by Moritz Lenz on 2018-11-10T23:00:01

I want to revive Carl Mäsak's Coding Contest as a crowd-sourced contest.

The contest will be in four phases:

For the first phase, development of tasks, I am looking for volunteers who come up with coding tasks collaboratively. Sadly, these volunteers, including myself, will be excluded from participating in the second phase.

I am looking for tasks that ...

This is non-trivial, so I'd like to have others to discuss things with, and to come up with some more tasks.

If you want to help with task creation, please send an email to [email protected], stating your intentions to help, and your freenode IRC handle (optional).

There are other ways to help too:

In these cases you can use the same email address to contact me, or use IRC (moritz on freenode) or twitter.

6guts: Speeding up object creation

Published by jnthnwrthngtn on 2018-10-06T22:59:11

Recently, a Perl 6 object creation benchmark result did the rounds on social media. This Perl 6 code:

class Point {
    has $.x;
    has $.y;
}

my $total = 0;
for ^1_000_000 {
    my $p = Point.new(x => 2, y => 3);
    $total = $total + $p.x + $p.y;
}
say $total;

Now (on HEAD builds of Rakudo and MoarVM) runs faster than this roughly equivalent Perl 5 code:

use v5.10;

package Point;

sub new {
    my ($class, %args) = @_;
    bless \%args, $class;
}

sub x {
    my $self = shift;
    $self->{x}
}

sub y {
    my $self = shift;
    $self->{y}
}

package main;

my $total = 0;
for (1..1_000_000) {
    my $p = Point->new(x => 2, y => 3);
    $total = $total + $p->x + $p->y;
}
say $total;

(Aside: yes, I know there’s no shortage of libraries in Perl 5 that make OO nicer than this, though those I tried also made it slower.)

This is a fairly encouraging result: object creation, method calls, and attribute access are the operational bread and butter of OO programming, so it’s a pleasing sign that Perl 6 on Rakudo/MoarVM is getting increasingly speedy at them. In fact, it’s probably a bit better at them than this benchmark’s raw numbers show, since:

While dealing with Int values got faster recently, it’s still really making two Int objects every time around that loop and having to GC them later. An upcoming new set of analyses and optimizations will let us get rid of that cost too. And yes, startup will get faster with time also. In summary, while Rakudo/MoarVM is now winning that benchmark against Perl 5, there’s still lots more to be had. Which is a good job, since the equivalent Python and Ruby versions of that benchmark still run faster than the Perl 6 one.

In this post, I’ll look at the changes that allowed this benchmark to end up faster. None of the new work was particularly ground-breaking; in fact, it was mostly a case of doing small things to make better use of optimizations we already had.

What happens during construction?

Theoretically, the default new method in turn calls bless, passing the named arguments along. The bless method then creates an object instance, followed by calling BUILDALL. The BUILDALL method goes through the set of steps needed for constructing the object. In the case of a simple object like ours, that involves checking if an x and y named argument were passed, and if so assigning those values into the Scalar containers of the object attributes.

For those keeping count, that’s at least 3 method calls (new, bless, and BUILDALL).

However, there’s a cheat. If bless wasn’t overridden (which would be an odd thing to do anyway), then the default new could just call BUILDALL directly anyway. Therefore, new looks like this:

multi method new(*%attrinit) {
    nqp::if(
      nqp::eqaddr(
        (my $bless := nqp::findmethod(self,'bless')),
        nqp::findmethod(Mu,'bless')
      ),
      nqp::create(self).BUILDALL(Empty, %attrinit),
      $bless(|%attrinit)
    )
}
The BUILDALL method was originally a little “interpreter” that went through a per-object build plan stating what needs to be done. However, for quite some time now we’ve instead compiled a per-class BUILDPLAN method.

Slurpy sadness

To figure out how to speed this up, I took a look at how the specializer was handling the code. The answer: not so well. There were certainly some positive moments in there. Of note:

However, the new method was getting only a “certain” specialization, which is usually only done for very heavily polymorphic code. That wasn’t the case here; this program clearly constructs overwhelmingly one type of object. So what was going on?

In order to produce an “observed type” specialization – the more powerful kind – it needs to have data on the types of all of the passed arguments. And for the named arguments, that was missing. But why?

Logging of passed argument types is done on the callee side, since:

The argument logging was done as the interpreter executed each parameter processing instruction. However, when there’s a slurpy, then it would just swallow up all the remaining arguments without logging type information. Thus the information about the argument types was missing and we ended up with a less powerful form of specialization.

Teaching the slurpy handling code about argument logging felt a bit involved, but then I realized there was a far simpler solution: log all of the things in the argument buffer at the point an unspecialized frame is being invoked. We’re already logging the entry to the call frame at that point anyway. This meant that all of the parameter handling instructions got a bit simpler too, since they no longer had logging to do.

Conditional elimination

Having new being specialized in a more significant way was an immediate win. Of note, this part:

        (my $bless := nqp::findmethod(self,'bless')),

Was quite interesting. Since we were now specializing on the type of self, then the findmethod could be resolved into a constant. The resolution of a method on the constant Mu was also a constant. Therefore, the result of the eqaddr (same memory address) comparison of two constants should also have been turned into a constant…except that wasn’t happening! This time, it was simple: we just hadn’t implemented folding of that yet. So, that was an easy win, and once done meant the optimizer could see that the if would always go a certain way and thus optimize the whole thing into the chosen branch. Thus the new method was specialized into something like:

multi method new(*%attrinit) {
    nqp::create(self).BUILDALL(Empty, %attrinit)
}

Further, the create could be optimized into a “fast create” op, and the relatively small BUILDALL method could be inlined into new. Not bad.

Generating simpler code

At this point, things were much better, but still not quite there. I took a look at the BUILDALL method compilation, and realized that it could emit faster code.

The %attrinit is a Perl 6 Hash object, which is for the most part a wrapper around the lower-level VM hash object, which is the actual hash storage. We were, curiously, already pulling this lower-level hash out of the Hash object and using the nqp::existskey primitive to check if there was a value, but were not doing that for actually accessing the value. Instead, an .AT-KEY('x') method call was being done. While that was being handled fairly well – inlined and so forth – it also does its own bit of existence checking. I realized we could just use the nqp::atkey primitive here instead.

Later on, I also realized that we could do away with nqp::existskey and just use nqp::atkey. Since a VM-level null is something that never exists in Perl 6, we can safely use it as a sentinel to mean that there is no value. That got us down to a single hash lookup.
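The single-lookup idea can be sketched in plain C (the lookup helper here is invented for illustration, not the real nqp::atkey):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch with invented helpers: when NULL can never be a legitimate
 * stored value, a single lookup that returns NULL for "absent"
 * replaces an exists-check followed by a fetch. */
typedef struct { const char *key; const char *value; } entry;

const char *lookup(const entry *table, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
    return NULL;   /* sentinel, playing the role of the VM-level null */
}
```

So instead of an exists-check followed by a fetch, the generated code can do one probe and test the result against the sentinel.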

By this point, we were just about winning the benchmark, but only by a few percent. Were there any more easy pickings?

An off-by-one

My next surprise was that the new method didn’t get inlined. It was within the size limit. There was nothing preventing it. What was going on?

Looking closer, it was even worse than that. Normally, when something is too big to be inlined, but we can work out what specialization will be called on the target, we do “specialization linking”, indicating which specialization of the code to call. That wasn’t happening. But why?

Some debugging later, I sheepishly fixed an off-by-one in the code that looks through a multi-dispatch cache to see if a particular candidate matches the set of argument types we have during optimization of a call instruction. This probably increased the performance of quite a few calls involving passing named arguments to multi-methods – and meant new was also inlined, putting us a good bit ahead on this benchmark.

What next?

The next round of performance improvements – both to this code and plenty more besides – will come from escape analysis, scalar replacement, and related optimizations. I’ve already started on that work, though expect it will keep me busy for quite a while. I will, however, be able to deliver it in stages, and am optimistic I’ll have the first stage of it ready to talk about – and maybe even merge – in a week or so.

brrt to the future: A future for fork(2)

Published by Bart Wiegmans on 2018-10-03T22:18:00

Hi hackers. Today I want to write about a new functionality that I've been developing for MoarVM that has very little to do with the JIT compiler. But it is still about VM internals so I guess it will fit.

Many months ago, jnthn wrote a blog post on the relation between perl 5 and perl 6. And as a longtime and enthusiastic perl 5 user - most of the JIT's compile time support software is written in perl 5 for a reason - I wholeheartedly agree with the 'sister language' narrative. There is plenty of room for all sorts of perls yet, I hope. Yet one thing kept itching me:
Moreover, it’s very much the case that Perl 5 and Perl 6 make different trade-offs. To pick one concrete example, Perl 6 makes it easy to run code across multiple threads, and even uses multiple threads internally (for example, performing optimization and JIT compilation on a background thread). Which is great…except the only winning move in a game involving both threads and fork() is not to play. Sometimes one just can’t have their cake and eat it, and if you’re wanting a language that more directly gives you your POSIX, that’s probably always going to be a strength of Perl 5 over Perl 6.
(Emphasis mine). You see, I had never realized that MoarVM couldn't implement fork(), but it's true. In POSIX systems, a fork()'d child process inherits the full memory space, as-is, from its parent process. But it inherits only the forking thread. This means that any operations performed by any other thread, including operations that might need to be protected by a mutex (e.g. malloc()), will be interrupted and unfinished (in the child process). This can be a problem. Or, in the words of the linux manual page on the subject:

       *  The child process is created with a single thread—the one that
          called fork(). The entire virtual address space of the parent is
          replicated in the child, including the states of mutexes,
          condition variables, and other pthreads objects; the use of
          pthread_atfork(3) may be helpful for dealing with problems that
          this can cause.

       *  After a fork() in a multithreaded program, the child can safely
          call only async-signal-safe functions (see signal-safety(7)) until
          such time as it calls execve(2).

Note that the set of signal-safe functions is not that large, and excludes all memory management functions. As noted by jnthn, MoarVM is inherently multithreaded from startup, and will happily spawn as many threads as needed by the program. It also uses malloc() and friends rather a lot. So it would seem that perl 6 cannot implement POSIX fork() (in which both parent and child program continue from the same position in the program) as well as perl 5 can.

I was disappointed. As a longtime (and enthusiastic) perl 5 user, fork() is one of my favorite concurrency tools. Its best feature is that parent and child processes are isolated by the operating system, so each can modify its own internal state without concern for concurrency. This can make it practical to introduce concurrency after development, rather than designing it in from the start. Either process can crash while the other continues. It is also possible (and common) to run a child process with reduced privileges relative to the parent process, which isn't possible with threads. And it is possible to prepare arbitrarily complex state for the child process, unlike spawn() and similar calls.
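That isolation property is easy to demonstrate with a minimal POSIX sketch (plain C, nothing MoarVM-specific): the child's writes after fork() are invisible to the parent.

```c
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minimal POSIX sketch of fork() isolation: after fork(), parent and
 * child each have their own copy of memory, so the child's writes are
 * invisible to the parent. */
int forked_child_is_isolated(void) {
    int value = 1;
    pid_t pid = fork();
    if (pid == 0) {        /* child: mutate our copy, report it back */
        value = 42;
        _exit(value);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return value == 1                      /* parent's copy untouched */
        && WIFEXITED(status)
        && WEXITSTATUS(status) == 42;      /* child saw its own write */
}
```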

Fortunately all hope isn't necessarily lost. The restrictions listed above only apply if there are multiple threads active at the moment that fork() is executed. That means that if we can stop all threads (except for the one planning to fork) before actually calling fork(), then the child process can continue safely. That is well within the realm of possibility, at least as far as system threads are concerned.

The process itself isn't very exciting to talk about, actually - it involves sending stop signals to system threads, waiting for these threads to join, verifying that the forking thread really is the only active thread, and restarting threads after fork(). Because of locking, it is a bit subtle (there may be another user thread that is also trying to fork), but not actually very hard. When I finally merged the code, it turned out that an ancient (and forgotten) thread list modification function was corrupting the list by not being aware of generational garbage collection... oops. But that was simple enough to fix (thanks to timotimo++ for the hint).

And now the following oneliner should work on platforms that support fork() (using development versions of MoarVM and Rakudo):

perl6 -e 'use nqp; my $i = nqp::fork(); say $i;'

The main limitation of this work is that it won't work if there are any user threads active. (If there are any, nqp::fork() will throw an exception). The reason why is simple: while it is possible to adapt the system threads so that I can stop them on demand, user threads may be doing arbitrary work, hold arbitrary locks and may be blocked (possibly indefinitely) on a system call. So jnthn's comment at the start of this post still applies - threads and fork() don't work together.

In fact, many programs may be better off with threads. But I think that languages in the perl family should let the user make that decision, rather than the VM. So I hope that this will find some use somewhere. If not, it was certainly fun to figure out. Happy hacking!

PS: For the curious, I think there may in fact be a way to make fork() work under a multithreaded program, and it relies on the fact that MoarVM has a multithreaded garbage collector. Basically, stopping all threads before calling fork() is not so different from stopping the world during the final phase of garbage collection. And so - in theory - it would be possible to hijack the synchronization mechanism of the garbage collector to pause all threads. During interpretation, and in JIT compiled code, each thread periodically checks if garbage collection has started. If it has, it will synchronize with the thread that started GC in order to share the work. Threads that are currently blocked (on a system call, or on acquiring a lock) mark themselves as such, and when they are resumed always check the GC status. So it is in fact possible to force MoarVM into a single active thread even with multiple user threads active. However, that still leaves cleanup to deal with, after returning from fork() in the child process. Also, this will not work if a thread is blocked on NativeCall. Altogether I think abusing the GC in order to enable a fork() may be a bit over the edge of insanity :-)

6guts: Eliminating unrequired guards

Published by jnthnwrthngtn on 2018-09-29T19:59:28

MoarVM’s optimizer can perform speculative optimization. It does this by gathering statistics as the program is interpreted, and then analyzing them to find out what types and callees typically show up at given points in the program. If it spots there is at least a 99% chance of a particular type showing up at a particular program point, then it will optimize the code ahead of that point as if that type would always show up.

Of course, statistics aren’t proofs. What about the 1% case? To handle this, a guard instruction is inserted. This cheaply checks if the type is the expected one, and if it isn’t, falls back to the interpreter. This process is known as deoptimization.

Just how cheap are guards?

I just stated that a guard cheaply checks if the type is the expected one, but just how cheap is it really? There’s both direct and indirect costs.

The direct cost is that of the check. Here’s a (slightly simplified) version of the JIT compiler code that produces the machine code for a type guard.

/* Load object that we should guard */
| mov TMP1, WORK[obj];
/* Get type table we expect and compare it with the object's one */
MVMint16 spesh_idx = guard->ins->operands[2].lit_i16;
| get_spesh_slot TMP2, spesh_idx;
| cmp TMP2, OBJECT:TMP1->st;
| jne >1;
/* We're good, no need to deopt */
| jmp >2;
|1:
/* Call deoptimization handler */
| mov ARG1, TC;
| mov ARG2, guard->deopt_offset;
| mov ARG3, guard->deopt_target;
| callp &MVM_spesh_deopt_one_direct;
/* Jump out to the interpreter */
| jmp ->exit;
|2:

Where get_spesh_slot is a macro like this:

|.macro get_spesh_slot, reg, idx;
| mov reg, TC->cur_frame;
| mov reg, FRAME:reg->effective_spesh_slots;
| mov reg, OBJECTPTR:reg[idx];
|.endmacro

So, in the case that the guard matches, it’s 7 machine instructions (note: it’s actually a couple more because of something I omitted for simplicity). Thus there’s the cost of the time to execute them, plus the space they take in memory and, especially, the instruction cache. Further, one is a conditional jump. We’d expect it to be false most of the time, and so the CPU’s branch predictor should get a good hit rate – but branch predictor usage isn’t entirely free of charge either. Effectively, it’s not that bad, but it’s nice to save the cost if we can.

The indirect costs are much harder to quantify. In order to deoptimize, we need to have enough state to recreate the world as the interpreter expects it to be. I wrote on this topic not so long ago, for those who want to dive into the detail, but the essence of the problem is that we may have to retain some instructions and/or forgo some optimizations so that we are able to successfully deoptimize if needed. Thus, the presence of a guard constrains what optimizations we can perform in the code around it.

Representation problems

A guard instruction in MoarVM originally looked like:

sp_guard r(obj) sslot uint32

Where r(obj) is an object register to read containing the object to guard, the sslot is a spesh slot (an entry in a per-block constant table) containing the type we expect to see, and the uint32 indicates the target address after we deoptimize. Guards are inserted after instructions for which we had gathered statistics and determined there was a stable type. Things guarded include return values after a call, reads of object attributes, and reads of lexical variables.

This design has carried us a long way, however it has a major shortcoming. The program is represented in SSA form. Thus, an invoke followed by a guard might look something like:

invoke r6(5), r4(2)
sp_guard r6(5), sslot(42), litui32(64)

Where r6(5) has the return value written into it (and thus is a new SSA version of r6). We hold facts about a value (if it has a known type, if it has a known value, etc.) per SSA version. So the facts about r6(5) would be that it has a known type – the one that is asserted by the guard.

The invoke itself, however, might be optimized by performing inlining of the callee. In some cases, we might then know the type of result that the inlinee produces – either because there was a guard inside of the inlined code, or because we can actually prove the return type! However, since the facts about r6(5) were those produced by the guard, there was no way to talk about what we know of r6(5) before the guard and after the guard.

More awkwardly, while in the early days of the specializer we only ever put guards immediately after the instructions that read values, more recent additions might insert them at a distance (for example, in speculative call optimizations and around spesh plugins). In this case, we could not safely set facts on the guarded register, because those might lead to wrong optimizations being done prior to the guard.

Changing of the guards

Now a guard instruction looks like this:

sp_guard w(obj) r(obj) sslot uint32

Or, concretely:

invoke r6(5), r4(2)
sp_guard r6(6), r6(5), sslot(42), litui32(64)

That is to say, it introduces a new SSA version. This means that we get a way to talk about the value both before and after the guard instruction. Thus, if we perform an inlining and we know exactly what type it will return, then that type information will flow into the input – in our example, r6(5) – of the guard instruction. We can then notice that the property the guard wants to assert is already upheld, and replace the guard with a simple set (which may itself be eliminated by later optimizations).

This also solves the problem with guards inserted away from the original write of the value: we get a new SSA version beyond the guard point. This in turn leads to more opportunities to avoid repeated guards beyond that point.

Quite a lot of return value guards on common operations simply go away thanks to these changes. For example, in $a + $b, where $a and $b are Int, we will be able to inline the + operator, and we can statically see from its code that it will produce an Int. Thus, the guard on the return type in the caller of the operator can be eliminated. This saves the instructions associated with the guard, and potentially allows for further optimizations to take place since we know we’ll never deoptimize at that point.

In summary

MoarVM does lots of speculative optimization. This enables us to optimize in cases where we can’t prove a property of the program, but statistics tell us that it mostly behaves in a certain way. We make this safe by adding guards, and falling back to the general version of the code in cases where they fail.

However, guards have a cost. By changing our representation of them, so that we model the data coming into the guard and after the guard as two different SSA versions, we are able to eliminate many guard instructions. This not only reduces duplicate guards, but also allows for elimination of guards when the broader view afforded by inlining lets us prove properties that we weren’t previously able to.

In fact, upcoming work on escape analysis and scalar replacement will allow us to start seeing into currently opaque structures, such as Scalar containers. When we are able to do that, then we’ll be able to prove further program properties, leading to the elimination of yet more guards. Thus, this work is not only immediately useful, but also will help us better exploit upcoming optimizations.

6guts: Faster box/unbox and Int operations

Published by jnthnwrthngtn on 2018-09-28T22:43:55

My work on Perl 6 performance continues, thanks to a renewal of my grant from The Perl Foundation. I’m especially focusing on making common basic operations faster, the idea being that if those go faster then programs composed out of them should too. This appears to be working out well: I’ve not been directly trying to make the Text::CSV benchmark run faster, but that’s resulted from my work.

I’ll be writing a few posts on various of the changes I’ve made. This one takes a look at some related optimizations around boxing, unboxing, and common mathematical operations on Int.

Boxing and unboxing

Boxing is taking a natively typed value and wrapping it into an object. Unboxing is the opposite: taking an object that wraps a native value and getting the native value out of it.

In Perl 6, these are extremely common. Num and Str are boxes around num and str respectively. Thus, unless dealing with natives explicitly, working with floating point numbers and strings will involve lots of box and unbox operations.

There’s nothing particularly special about Num and Str. They are normal objects with the P6opaque representation, meaning they can be mixed into. The only thing that makes them slightly special is that they have an attribute marked as being a box target. This marks the attribute as the one to write to or read from in a box or unbox operation.

Thus, a box operation is something like:

  1. Allocate an instance of the box type
  2. Find the box target attribute to write to
  3. Write the native value into it

While unbox is:

  1. Find the box target attribute to read from
  2. Read the native value out of it

Specialization of box and unbox

box is actually two operations: an allocation and a store. We know how to fast-path allocations and JIT them relatively compactly, however that wasn’t being done for box. So, step one was to decompose this higher-level op into the allocation and the write into the allocated object. The first step could then be optimized in the usual way allocations are.

In the unspecialized path, we first find out where to write the native value to, and then write it. However, when we’re producing a specialized version, we almost always know the type we’re boxing into. Therefore, the object offset to write to can be calculated once, and a very simple instruction to do a write at an offset into the object produced. This JITs extremely well.

There are a couple of other tricks. Binds into a P6opaque generally have to check that the object wasn’t mixed in to, however since we just allocated it then we know that can’t be the case and can skip that check. Also, a string is a garbage-collectable object, and when assigning one GC-able object into another one, we need to perform a write barrier for the sake of generational GC. However, since the object was just allocated, we know very well that it is in the nursery, and so the write barrier will never trigger. Thus, we can omit it.

Unboxing is easier to specialize: just calculate the offset, and emit a simpler instruction to load the value from there.

I’ve also started some early work (quite a long way from merge) on escape analysis, which will allow us to eliminate many box object allocations entirely. It’s a great deal easier to implement this if allocations, reads, and writes to an object have a uniform representation. By lowering box and unbox operations into these lower level operations, this eases the path to implementing escape analysis for them.

What about Int?

Some readers might have wondered why I talked about Num and Str as examples of boxed types, but not Int. It is one too – but there’s a twist. Actually, there’s two twists.

The first is that Int isn’t actually a wrapper around an int, but rather an arbitrary precision integer. When we first implemented Int, we had it always use a big integer library. Of course, this is slow, so later on we made it so any number fitting into a 32-bit range would be stored directly, and only allocate a big integer structure if it’s outside of this range.

Thus, boxing to a big integer means range-checking the value to box. If it fits into the 32-bit range, then we can write it directly, and set the flag indicating that it’s a small Int. Machine code to perform these steps is now spat out directly by the JIT, and we only fall back to a function call in the case where we need a big integer. Once again, the allocation itself is emitted in a more specialized way too, and the offset to write to is determined once at specialization time.

Unboxing is similar. Provided we’re specializing on the type of the object to unbox, then we calculate the offset at specialization time. Then, the JIT produces code to check if the small Int flag is set, and if so just reads and sign extends the value into a 64-bit register. Otherwise, it falls back to the function call to handle the real big integer case.

For boxing, however, there was a second twist: we have a boxed integer cache, so for small integers we don’t have to repeatedly allocate objects boxing them. So boxing an integer is actually:

  1. Check if it’s in the range of the box cache
  2. If so, return it from the cache
  3. Otherwise, do the normal boxing operation

When I first did these changes, I omitted the use of the box cache. It turns out, however, to have quite an impact in some programs: one benchmark I was looking at suffered quite a bit from the missing box cache, since it now had to do a lot more garbage collection runs.

So, I reinstated use of the cache, but this time with the JIT doing the range checks in the produced machine code and reading directly out of the cache in the case of a hit. Thus, in the cache hit case, we now don’t even make a single function call for the box operation.

Faster Int operations

One might wonder why we picked 32-bit integers as the limit for the small case of a big integer, and not 64-bit integers. There’s two reasons. The most immediate is that we can then use the 32 bits that would be the lower 32 of a 64-bit pointer to the big integer structure as our “this is a small integer” flag. This works reliably as pointers are always aligned to at least a 4-byte boundary, so a real pointer to a big integer structure would never have the lowest bits set. (And yes, on big-endian platforms, we swap the order of the flag and the value to account for that!)

The second reason is that there’s no portable way in C to detect if a calculation overflowed. However, out of the basic math operations, if we have two inputs that fit into a 32-bit integer, and we do them at 64-bit width, we know that the result can never overflow the 64-bit integer. Thus we can then range check the result and decide whether to store it back into the result object as 32-bit, or to store it as a big integer.

Since Int is immutable, all operations result in a newly allocated object. This allocation – you’ll spot a pattern by now – is open to being specialized. Once again, finding the boxed value to operate on can also be specialized, by calculating its offset into the input objects and result object. So far, so familiar.

However, there’s a further opportunity for improvement if we are JIT-compiling the operations to machine code: the CPU has flags for if the last operation overflowed, and we can get at them. Thus, for two small Int inputs, we can simply:

  1. Grab the values
  2. Do the calculation at 32-bit width
  3. Check the flags, and store it into the result object if no overflow
  4. If it overflowed, fall back to doing it wider and storing it as a real big integer

I’ve done this for addition, subtraction, and multiplication. Those looking for a MoarVM specializer/JIT task might like to consider doing it for some of the other operations. :-)

In summary

Boxing, unboxing, and math on Int all came with various indirections for the sake of generality (coping with mixins, subclassing, and things like IntStr). However, when we are producing type-specialized code, we can get rid of most of the indirections, resulting in being able to perform them faster. Further, when we JIT-compile the optimized result into machine code, we can take some further opportunities, reducing function calls into C code as well as taking advantage of access to the overflow flags.

samcv: Adding and Improving File Encoding Support in MoarVM

Published on 2018-09-26T00:00:00

Encodings. They may seem to some as horribly boring and bland. Nobody really wants to have to worry about the details. Encodings are how we communicate, transmit data. Their purpose is to be understood by others. When they work, nobody should even know they are there. When they don’t — and everyone can probably remember a time when they have tried to open a poorly encoded or corrupted file and been disappointed — they cannot be ignored.

Here I talk about the work I have done in the past and work still in progress to improve the support of current encodings, add new encodings, and add new features and new options to allow Perl 6 and MoarVM to support more encodings and in a way which better achieves the goals encodings were designed to solve.


Which Encodings Should We Add?

I started looking at the most common encodings on the Web. We supported the two most common, UTF-8 and ISO-8859-1, but we did not support windows-1251 (Cyrillic) or Shift-JIS. This seemed like the makings of a new project, so I got to work.

I decided to start with windows-1251, since it is an 8-bit encoding while Shift-JIS is a variable length encoding using one or two bytes (the "shift" in the name comes from the second byte's shift). While the encoding itself was simpler than Shift-JIS, I soon ran into some issues with both windows-1251 and our already supported windows-1252 encoding.

One of the first major issues I found was that with windows-1252 you could create files which could not be decoded by other programs. An example of this is codepoint 0x81 (129), which has no mapping in the windows-1252 specification. This would cause many programs either to not detect the file as windows-1252, or to fail, saying the file was corrupted and could not be decoded.

While our non-8bit encodings would throw an error if asked to encode text which did not exist in that codemapping, our 8 bit encodings would happily pass through invalid codepoints, as long as they fit in 8 bits.

As I said at the start, encodings are a way for us to communicate, to exchange data and to be understood. This to me indicated a major failing. While the solution could be to never attempt to write codepoints that don’t exist in the target encoding, that was not an elegant solution and would just be a workaround.

To remedy this I had to add new ops, such as decodeconf to replace the decode op, and encodeconf to replace encode (plus many others). These allow us to specify a configuration for our encodings, so Perl 6 can tell MoarVM whether it wants to encode strictly according to the standard or be more lenient.

I added a new :strict option to open, encode and a few others to let you specify whether encoding should be strict or not. For the sake of backwards compatibility it still defaults to lenient encoding and decoding. Strict is planned to become the default in Perl 6 specification 6.e.


Some of our other encodings had support for 'replacements'. If you tried to encode something that would not fit in the target encoding, this allowed you to supply a string of one or more characters which would be substituted instead of causing MoarVM to throw an exception.

Once I had the strictness nailed down, I was able to add support for replacements, so one could write data in strict mode while still ensuring all compatible characters get written properly. You no longer have to choose between being lenient and writing incompatible files, or being strict and having the encode or file write fail when you really need it not to.


Encodings are not without difficulty. As with the previous encodings I talked about, Shift-JIS would not be without decisions that had to be made. With Shift-JIS, the question became, "which Shift-JIS?".

You see, there are dozens of different extensions to Shift-JIS, in addition to the original Shift-JIS encoding. As a variable length encoding, most of the one byte codepoints had been taken, while there were many hundreds of open and unallocated codepoints in the two byte range. This led to a wide proliferation of manufacturer-specific extensions, which other manufacturers might adopt while adding their own extensions to the format.

Which of the dozens of different Shift-JIS encodings should we support? I eventually decided on windows-932 because that is the standard that is used by browsers when they encounter Shift-JIS text and encompasses a great many symbols. It is the most widely used Shift-JIS format. This would allow us to support the majority of Shift-JIS encoded documents out there, and the one which web browsers use on the web. Most but not all the characters in it are compatible with the other extensions, so it seemed like it was the sensible choice.

The only notable exception to this is that windows-932 is not totally compatible with the original Shift-JIS encoding. While the original Shift-JIS encoding (and some extensions to it) maps ASCII's backslash to the yen symbol ¥, windows-932 keeps it as a backslash. This retains better compatibility with ASCII and other formats.


While MoarVM had a UTF-16 encoder and decoder, it was not fully featured.

  • You could encode a string into a utf16 buffer:

    • "hello".encode('utf16') #> utf16:0x<68 65 6c 6c 6f>

  • You could decode a utf16 buffer into a string:

    • utf16.new(0x68, 0x65, 0x6C, 0x6C, 0x6F).decode('utf16') #> hello

That was about it. You couldn’t open a filehandle as UTF-16 and write to it. You couldn’t even do $fh.write: "hello".encode('utf16') as the write function did not know how to deal with writing a 16 bit buffer (it expected an 8-bit buffer).

In addition there was no streaming UTF-16 decoder, so there was no way to read a UTF-16 file. So I set to work, first allowing us to write 16-bit buffers to a file, and then being able to .print to a filehandle and write text to it.

At this point I knew that I would have to confront the BOM in the room, but decided to first implement the streaming decoder and ensure all of our file handle operations worked.

The BOM in the Room

You may be noticing a theme here. You do not just implement an encoding. It does not exist in a vacuum. While there may be standards, there may be multiple standards. And those standards may or may not be followed in practice or it may not be totally straightforward.

Endianness defines the order of the bytes in a number. Big endian machines store a 16-bit number with the most significant byte first, similar to how we write numbers, while little endian machines put the least significant byte first. This only matters when encoding numbers that are more than one byte. UTF-16 can be either big endian or little endian, meaning big and little endian files will contain the same byte values, but in a different order.

Since swapping the two bytes of a 16-bit integer yields a different valid integer rather than an invalid one, the creators of UTF-16 knew it was not possible to determine the endianness of a 16-bit number with certainty. And so the byte order mark was created: a codepoint added at the start of a file to signify the file's endianness. Since they didn't want to break already existing software, they used a "Zero width no-break space" (ZWNBSP) and designated that a ZWNBSP with its bytes reversed would be invalid in UTF-16, a "non-character".

If the BOM is not removed, it is not visible, since it is zero width. If a program opens a file and sees the byte order mark, it knows the file is in its own endianness. If it sees the "non-character" instead, it knows it has to swap the bytes as it reads the file.

The standard states that utf16 should always use a BOM, and that when reading a file as utf16 the BOM should be used to detect the endianness. The BOM is not passed through, as it is assumed not to be part of the actual data. If it doesn't exist, then the system is supposed to decode using the endianness that the context suggests. So you get a format where you may lose the first codepoint of your file if that codepoint happens to be a zero width no-break space.

For utf16be (big endian) and utf16le (little endian) the standard states the byte order mark should not exist, and that any zero width no-break space at the start of a file is just that, and should be passed through.

Standards and practice are not identical and in practice many programs will remove a leading zero width non-break space even if set explicitly to utf16be or utf16le. I was unsure which route I should take. I looked at how Perl 5 handled it and after thinking for a long time I decided that we would follow the spec. Even though other software will get rid of any detected byte order mark, I think it’s important that if a user specifies utf16be or utf16le, they will always get back the same codepoints that they write to a file.

Current Work

I am currently working on adding support so that when you write a file as 'utf16' it will add a byte order mark, as utf16 is supposed to always have one. Until this is added, you can write a file with Perl 6 on one computer using the 'utf16' encoding which will not decode correctly with the same 'utf16' setting on a computer of the other endianness.

Since there was no functionality for writing utf-16 to a file previously, there should be no backwards compatibility issues to worry about there and we can set it to add a byte order mark if the utf16 encoder is used.

Release 2018.09

In the last release utf16, utf16le and utf16be should work, aside from utf16 not adding a byte order mark on file write. Anyone who uses utf16 and finds any issues or has comments is free to email me or make an issue on the MoarVM or Rakudo issue trackers.

Rakudo Star Release 2018.06

Published on 2018-08-06T00:00:00

6guts: Redesigning Rakudo’s Scalar

Published by jnthnwrthngtn on 2018-07-26T23:54:45

What’s the most common type your Perl 6 code uses? I’ll bet you that in most programs you write, it’ll be Scalar. That might come as a surprise, because you pretty much never write Scalar in your code. But in:

my $a = 41;
my $b = $a + 1;

Then both $a and $b point to Scalar containers. These in turn hold the Int objects. Contrast it with:

my $a := 42;
my $b := $a + 1;

Where there are no Scalar containers. Assignment in Perl 6 is an operation on a container. Exactly what it does depends on the type of the container. With an Array, for example, it iterates the data source being assigned, and stores each value into the target Array. Assignment is therefore a copying operation, unlike binding, which is a referencing operation. Making assignment the shorter thing to type makes it more attractive, and having the more attractive thing decrease the risk of action at a distance is generally a good thing.

Having Scalar be first-class is used in a number of features:

And probably some more that I forgot. It’s powerful. It’s also torture for those of us building Perl 6 implementations and trying to make them run fast. The frustration isn’t so much the immediate cost of allocating all of those Scalar objects – that of course costs something, but modern GC algorithms can throw away short-lived objects pretty quickly – but rather the difficulties it introduces for program analysis.

Despite all the nice SSA-based analysis we do, tracking the contents of Scalar containers is currently beyond that. Rather than any kind of reasoning to prove properties about what a Scalar holds, we instead handle it through statistics, guards, and deoptimization at the point that we fetch a value from a Scalar. This still lets us do quite a lot, but it’s certainly not ideal. Guards are cheap, but not free.

Looking ahead

Over the course of my current grant from The Perl Foundation, I’ve been working out a roadmap for doing better with optimization in the presence of Scalar containers. Their presence is one of the major differences between full Perl 6 and the restricted NQP (Not Quite Perl), and plays a notable part in the performance difference between the two.

I’ve taken the first big step towards improving this situation by significantly re-working the way Scalar containers are handled. I’ll talk about that in this post, but first I’d like to provide an idea of the overall direction.

In the early days of MoarVM, when we didn’t have specialization or compilation to machine code, it made sense to do various bits of special-casing of Scalar. As part of that, we wrote code handling common container operations in C. We’ve by now reached a point where the C code that used to be a nice win is preventing us from performing the analyses we need in order to do better optimizations. At the end of the day, a Scalar container is just a normal object with an attribute $!value that holds its value. Making all operations dealing with Scalar containers really be nothing more than some attribute lookups and binds would allow us to solve the problem in terms of more general analyses, which stand to benefit many other cases where programs use short-lived objects.

The significant new piece of analysis we’ll want to do is escape analysis, which tells us which objects have a lifetime bounded to the current routine. We understand “current routine” to incorporate those that we have inlined.

If we know that an object’s usage lies entirely within the current routine, we can then perform an optimization known as scalar replacement, which funnily enough has nothing much to do with Scalar in the Perl 6 sense, even if it solves the problems we’re aiming to solve with Scalar! The idea is that we allocate a local variable inside of the current frame for each attribute of the object. This means that we can then analyze them like we analyze other local variables, subject them to SSA, and so forth. This for one gets rid of the allocation of the object, but also lets us replace attribute lookups and binds with a level of indirection less. It will also let us reason about the contents of the once-attributes, so that we can eliminate guards that we previously inserted because we only had statistics, not proofs.

So, that’s the direction of travel, but first, Scalar and various operations around it needed to change.

Data structure redesign

Prior to my recent work, a Scalar looked something like:

class Scalar {
    has $!value;        # The value in the Scalar
    has $!descriptor;   # rw-ness, type constraint, name
    has $!whence;       # Auto-vivification closure
}

The $!descriptor held the static information about the Scalar container, so we didn’t have to hold it in every Scalar (we usually have many instances of the same “variable” over a program’s lifetime).

The $!whence was used when we wanted to do some kind of auto-vivification. The closure attached to it was invoked when the Scalar was assigned to, and then cleared afterwards. In an array, for example, the callback would bind the Scalar into the array storage, so that element – if assigned to – would start to exist in the array. There are various other forms of auto-vivification, but they all work in roughly the same way.

This works, but closures aren’t so easy for the optimizer to deal with (in short, a closure has to have an outer frame to point to, and so we can’t inline a frame that takes a closure). Probably some day we’ll find a clever solution to that, but since auto-vivification is an internal mechanism, we may as well make it one that we can see a path to making efficient in the near term future.

So, I set about considering alternatives. I realized that I wanted to replace the $!whence closure with some kind of object. Different types of object would do different kinds of vivification. This would work very well with the new spesh plugin mechanism, where we can build up a set of guards on objects. It also will work very well when we get escape analysis in place, since we can then potentially remove those guards after performing scalar replacement. Thus after inlining, we might be able to remove the “what kind of vivification does this assignment cause” checking too.

So this seemed workable, but then I also realized that it would be possible to make Scalar smaller by:

This not only makes Scalar smaller, but it means that we can use a single guard check to indicate the course of action we should take with the container: a normal assignment, or a vivification.

The net result: vivification closures go away giving more possibility to inline, assignment gets easier to specialize, and we get a memory saving on every Scalar container. Nice!

C you later

For this to be really worth it from an optimization perspective, I needed to eliminate various bits of C special-case code around Scalar and replace it with standard MoarVM ops. This entailed:

The first three became calls to code registered to perform the operations, using the 6model container API. The last two cases were handled by replacing the calls to C extops with desugars, which is a mechanism that takes something that is used as an nqp::op and rewrites it, as it is compiled, into a more interesting AST, which is then in turn compiled. Happily, this meant I could make all of the changes I needed to without having to go and do a refactor across the CORE.setting. That was nice.

So, now those operations were compiled into bytecode operations instead of ops that were really just calls to C code. Everything was far more explicit. Good! Alas, the downside is that the code we generate gets larger in size.

Optimization with spesh plugins

I talked about specializer plugins in a recent post, where I used them to greatly speed up various forms of method dispatch. However, they are also applicable to optimizing operations on Scalar containers.

The change to decontainerizing return values was especially bad at making the code larger, since it had to do quite a few checks. However, with a spesh plugin, we could just emit a use of the plugin, followed by calling whatever the plugin produces.

Here’s a slightly simplified version of the plugin I wrote, annotated with some comments about what it is doing. The key thing to remember about a spesh plugin is that it is not doing an operation, but rather it’s setting up a set of conditions under which a particular implementation of the operation applies, and then returning that implementation.

nqp::speshreg('perl6', 'decontrv', sub ($rv) {
    # Guard against the type being returned; if it's a Scalar then that
    # is what we guard against here (nqp::what would normally look at
    # the type inside such a container; nqp::what_nd does not do that).
    nqp::speshguardtype($rv, nqp::what_nd($rv));

    # Check if it's an instance of a container.
    if nqp::isconcrete_nd($rv) && nqp::iscont($rv) {
        # Guard that it's concrete, so this plugin result only applies
        # for container instances, not the Scalar type object.
        nqp::speshguardconcrete($rv);

        # If it's a Scalar container then we can optimize further.
        if nqp::eqaddr(nqp::what_nd($rv), Scalar) {
            # Grab the descriptor.
            my $desc := nqp::speshguardgetattr($rv, Scalar, '$!descriptor');
            if nqp::isconcrete($desc) {
                # Has a descriptor, so `rw`. Guard on type of value. If it's
                # Iterable, re-containerize. If not, just decont.
                my $value := nqp::speshguardgetattr($rv, Scalar, '$!value');
                nqp::speshguardtype($value, nqp::what_nd($value));
                return nqp::istype($value, $Iterable) ?? &recont !! &decont;
            }
            else {
                # No descriptor, so it's already readonly. Return as is.
                return &identity;
            }
        }

        # Otherwise, full slow-path decont.
        return &decontrv;
    }
    else {
        # No decontainerization to do, so just produce identity.
        return &identity;
    }
});
Where &identity is the identity function, &decont removes the value from its container, &recont wraps the value in a new container (so an Iterable in a Scalar stays as a single item), and &decontrv is the slow-path for cases that we do not know how to optimize.
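The decision table can be modeled roughly in Python (a hypothetical sketch under invented names, not Rakudo's code; `list` stands in for Iterable, and the non-Scalar-container slow path is omitted):

```python
# Hypothetical model of the decont-return decision described above.
class Scalar:
    def __init__(self, value, descriptor=None):
        self.value = value
        self.descriptor = descriptor    # None models "no descriptor" (readonly)

def decont_rv(rv):
    if not isinstance(rv, Scalar):
        return rv                       # identity: nothing to decontainerize
    if rv.descriptor is None:
        return rv                       # readonly Scalar: return as-is
    if isinstance(rv.value, list):      # list stands in for Iterable here
        return Scalar(rv.value)         # recont: keep the Iterable itemized
    return rv.value                     # decont: take the value out

assert decont_rv(42) == 42
assert decont_rv(Scalar(5, descriptor="rw")) == 5
```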

The same principle is also used for assignment, however there are more cases to analyze there. They include:

Vivifying hash assignments are not yet optimized by the spesh plugin, but will be in the near future.

The code selected by the plugin is then executed to perform the operation. In most cases, there will only be a single specialization selected. In that case, the optimizer will inline that specialization result, meaning that the code after optimization is just doing the required set of steps needed to do the work.

Next steps

Most immediately, a change to such a foundational part of the Rakudo Perl 6 implementation has had some fallout. I’m most of the way through dealing with the feedback from toaster (which runs all the ecosystem module tests), being left with a single issue directly related to this work to get to the bottom of. Beyond that, I need to spend some time re-tuning array and hash access to work better with these changes.

Then will come the step that this change was largely in aid of: implementing escape analysis and scalar replacement, which for much Perl 6 code will hopefully give a quite notable performance improvement.

This brings me to the end of my current 200 hours on my Perl 6 Performance and Reliability Grant. Soon I will submit a report to The Perl Foundation, along with an application to continue this work. So, all being well, there will be more to share soon. In the meantime, I’m off to enjoy a week’s much needed vacation.

samcv: Secure Hashing for MoarVM to Prevent DOS Attacks

Published on 2018-05-16T00:00:00

Hashes are very useful data structures and underlie many internal representations in Perl 6, as well as being used directly. These data structures are very nice since they offer O(1) insertion time and O(1) lookup time on average. Hashes have long been considered an essential feature for Perl, much loved by users. When exploited, though, hashes can cause servers to grind to a halt. New in Rakudo Perl 6 2018.05 is a feature called hash randomization, which does much to help protect against this attack. In this article I explain some hashing basics, as well as how the attack against non-randomized hashing can work.

Hashing Basics

Some hashing basics: when we use a hash, we take a string and compute an integer to represent it. Much as md5 or sha1 sums take an arbitrary amount of data and condense it into a shorter number which can identify it, we do a similar thing for strings.

my %hash; %hash<foo> = 10

In this code, MoarVM takes the string foo and performs a hashing function on it using a series of bitwise operations. The goal is to create a shorter number which allows us to put the foo key into one of the 8 buckets that MoarVM initializes when a hash is created.
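The idea can be illustrated with a toy string hash in Python (purely illustrative; this is not MoarVM's actual hash function):

```python
def toy_hash(key):
    # Toy string hash built from simple arithmetic/bitwise steps
    # (not MoarVM's real hash function).
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def bucket_index(key, n_buckets=8):
    # n_buckets is a power of two, so masking the low bits picks the bucket.
    return toy_hash(key) & (n_buckets - 1)

print(bucket_index("foo"))   # some bucket in 0..7
```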

8 Hash buckets

Our hashing code sets up a predefined number of buckets. When a bucket fills up to 10 items, the number of buckets is doubled. In normal operation the hashes will be randomly distributed, so it would take ≈47 keys added (≈47 is the average number of items needed for one bucket to fill up to 10 items) before we have to expand the buckets the first time.

When the buckets are expanded, we will now have 16 buckets. In normal operation our previous ≈47 items should be evenly distributed into those 16 buckets.
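That ≈47 figure can be estimated with a quick Monte Carlo sketch in Python (assuming each key lands in a uniformly random bucket, as in normal operation):

```python
import random

def keys_until_split(rng, n_buckets=8, limit=10):
    # Insert uniformly random keys until some bucket reaches `limit` items.
    counts = [0] * n_buckets
    n = 0
    while max(counts) < limit:
        counts[rng.randrange(n_buckets)] += 1
        n += 1
    return n

rng = random.Random(42)
avg = sum(keys_until_split(rng) for _ in range(2000)) / 2000
print(round(avg, 1))   # roughly the number of keys before the first doubling
```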

The Attack

Without a random hash seed it is easy for an attacker to generate strings which will all result in the same hash. This devolves hash lookup to O(n²) time (really O(string_length × num_collisions)). When we have hash collisions, no matter how many times we double the number of buckets, the colliding strings will always remain in the same bucket as each other. To locate the correct string, MoarVM must go down the chain and compare each hash value with the one we’re looking for. Since they are all the same, we must fall back to comparing each string itself until we find the correct one in that bucket.
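The degradation is easy to see with a toy chained hash table in Python (illustrative only, not MoarVM's implementation): when an adversarial hash sends every key to one bucket, a lookup walks the entire chain.

```python
class ChainedTable:
    # Minimal chained hash table that counts key comparisons on lookup.
    def __init__(self, n_buckets=8, hash_fn=hash):
        self.buckets = [[] for _ in range(n_buckets)]
        self.hash_fn = hash_fn
        self.comparisons = 0

    def insert(self, key, value):
        self.buckets[self.hash_fn(key) % len(self.buckets)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self.hash_fn(key) % len(self.buckets)]:
            self.comparisons += 1
            if k == key:
                return v
        return None

# Adversarial hash: every key collides, so one bucket holds everything.
t = ChainedTable(hash_fn=lambda k: 0)
for i in range(1000):
    t.insert("key%d" % i, i)
t.comparisons = 0
t.lookup("key999")
print(t.comparisons)   # 1000: the lookup degraded to a linear scan
```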

Hash collision

This attack is done by creating a function that is essentially our hashing function run backward (for those curious, see here for an example of code which does forward and backward hashing for the Chrome V8 engine’s former hashing function). We hash our target string, t. We then take random 3-character sequences (in our case graphemes) and plug them into our backward hashing function along with the hash of our target t. The backward hash and the random character sequence are stored in a dictionary, and the process is repeated until we have a very large number of backward hashes and random 3-grapheme prefixes.

We can then use this dictionary to construct successively longer strings (or as short as we desire) which have the same hash as our target string t. This is a simplification of how the Meet-In-The-Middle attack works.
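Here is a scaled-down demonstration in Python against a toy multiplicative hash with a tiny 12-bit output (an illustrative sketch; the target string "gifts" and all names are arbitrary, and real attacks target much larger hash spaces with precomputed dictionaries):

```python
from itertools import product
from string import ascii_lowercase

MASK = 0xFFF                       # tiny 12-bit hash space so the demo is fast
MUL  = 31
INV  = pow(MUL, -1, MASK + 1)      # modular inverse of 31 mod 2^12

def toy_hash(s, h=0):
    for c in s:
        h = (h * MUL + ord(c)) & MASK
    return h

def unhash(h, suffix):
    # Run the hash backward over `suffix`: the intermediate hash a prefix
    # must reach so that prefix + suffix hashes to h.
    for c in reversed(suffix):
        h = ((h - ord(c)) * INV) & MASK
    return h

target = toy_hash("gifts")

# Dictionary of backward hash -> 3-letter suffix, as in the attack above.
suffixes = {unhash(target, "".join(s)): "".join(s)
            for s in product(ascii_lowercase, repeat=3)}

# Meet in the middle: find a prefix whose forward hash appears in the dict.
for p in product(ascii_lowercase, repeat=3):
    prefix = "".join(p)
    if toy_hash(prefix) in suffixes:
        collision = prefix + suffixes[toy_hash(prefix)]
        break

print(collision)                   # a different string with the same toy hash
```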

This has been fixed in most programming languages (Python, Ruby, Perl), and several CVEs have been issued over the years for this exploit (see CVEs for PHP, OCaml, Perl, Ruby and Python).

Assuming everything is fine for the next release, I will also merge changes which introduce a stronger hashing function called SipHash. SipHash is meant to protect against an attacker discovering a hash secret remotely. While randomizing the seed makes this attack much harder, a determined attacker could discover the hash secret, and once that is done they can easily perform a meet-in-the-middle attack. SipHash was designed to address the vulnerability of the hash function itself to meet-in-the-middle attacks. The randomization of the hash secret and a non-vulnerable hashing function work together to avert hash-collision denial of service attacks.

While the hash secret randomization will be out in Rakudo 2018.05, SipHash is planned to be introduced in Rakudo 2018.06.

Randomness Source

On Linux and Unix we prefer function calls over reading from /dev/urandom. There are some very important reasons for this.

Relying on an external file existing is potentially problematic. If we are in a chroot and /dev is not mounted, we will not have access to /dev/urandom. /dev/urandom is not special: it can be deleted by accident (or on purpose), or a sparse data file can be mounted in its place, undetectable by programs. Trusting it simply because of its path is not ideal. Also, if we exhaust the maximum number of open file descriptors, we will be unable to open /dev/urandom at all.

System Functions

On Windows we use pCryptGenRandom, which has been provided by advapi32.dll since Windows XP.

Linux, FreeBSD, OpenBSD and MacOS all use system provided random calls (if available) to get the data rather than having to open /dev/urandom. All these OSes guarantee these calls to be non-blocking, though MacOS’s documentation does not comment on it. This is mostly important in very early userspace; it bit Python when a developer accidentally changed the randomness source, causing systems which relied on very early Python scripts to stop booting while they waited for the randomness source to initialize.

If the function doesn’t exist we fall back to using /dev/urandom. If opening or reading it fails, on BSDs we will use the arc4random() function. On many BSDs this is seeded from the system’s random entropy pool, providing us with a backup in case /dev/urandom doesn’t exist.
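The fallback pattern looks roughly like this in Python (a sketch of the same idea, not MoarVM's C code; `os.getrandom` is available where the platform provides the system call):

```python
import os

def secure_random_bytes(n):
    # Prefer the getrandom() system call where Python exposes it;
    # fall back to reading /dev/urandom otherwise.
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(n)
        except OSError:
            pass
    with open("/dev/urandom", "rb") as f:
        return f.read(n)

seed = secure_random_bytes(8)
print(len(seed))   # 8
```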

On Linux we use the getrandom() system call, which was added in kernel 3.17. We invoke the system call directly rather than going through the glibc wrapper, since the wrapper was added to glibc much later than the call was added to the kernel.

On MacOS, Solaris and FreeBSD we use getrandom() while on OpenBSD we use getentropy().

User Facing Changes

From Rakudo Perl 6 2018.05, the order in which keys are returned will differ between perl6 instances.

perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
(d f c a b e)
perl6 -e 'my %hash = <a 1 b 1 c 1 d 1 e 1 f 1>; say %hash.keys'
(e f a d c b)

This will also affect iterating over a hash without sorting: for %hash { }

What Do I Have To Do?

Users and module developers should make sure that they explicitly sort hashes and not rely on a specific order being constant. If you have a module, take a look at the code and see where you are iterating on a hash’s keys and whether or not the order of processing the hash’s keys affects the output of the program.

# This should be okay since we are putting the hash into another hash, order
# does not matter.
for %hash.keys -> $key {
    %stuff{$key} = $i++;
}
# This can potentially cause issues, depending on where `@stuff` is used.
for %hash.keys -> $key {
    @stuff.push: $key;
}
# This should be OK since we are using is-deeply and comparing a hash with
# another hash.
is-deeply my-cool-hash-returning-function($input), %( foo => 'text', bar => 'text', baz => 'text');
# Probably best to avoid using `is`. The `is` test function converts the input
# to a string before checking for equality, but it works since hashes
# stringify in sorted order.
is %hash, %( foo => 'text', bar => 'text', baz => 'text');
# NO. Keys are not guaranteed to be in the same order on each invocation.
is %hash.keys, <a b c d>;

Module Developers

Module developers should check out the git master of Rakudo, or if 2018.05 has been released, use that to run the tests of your module. Make sure to run the tests multiple times, ideally at least 10 times or use a loop:

while prove -e 'perl6 -Ilib'; do true; done

This loop will run again and again until it encounters a test failure, in which case it will stop.

You must run your tests many times because the hash order will be different on each run. For hashes with a small number of items, it may not fail on every run. Make sure that you also look at the source to identify items that need fixing; don’t just rely on the tests to tell you whether you must make changes to your module.

Further Reading

Hardening Perl’s Hash Function, an article about the changes Perl 5 has made to harden its hashing.

Rakudo Star Release 2018.04

Published on 2018-05-07T00:00:00