pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2017-01-21T08:19:12


gfldex: Once a Week

Published by gfldex on 2017-01-20T21:34:12

Rakudo is still changing quickly for the betterment of mankind (I’m quite sure of that). Once in a while commits will break code that is or was working. Checking for regressions is boring because most of the time it wont happen. As the army of bots that populate #perl6 shows, we like to offload the housekeeping to software and for testing we use travis.

Since a while travis provides build in cronjobs for rebuilding repos automatically. It’s a little hidden.

teavis-cron

I missed the cron settings at first because one needs to run a build before it even shows up in the settings tab.

With cron jobs setup to test my modules once a week I will get some angry e-mails every time Rakudo breaks my code.

After clicking the same buttons over and over again I got bored and found the need to automate that step away as well. A meta module that doesn’t do anything but having dependencies to my real modules would save me the clicking and would produce exactly one angry e-mail, what will become more important as the Perl 6 years pass by. The .travis.yml looks like this:

language: perl6
sudo: false
perl6:
- latest
install:
- rakudobrew build-zef
- zef --debug install .

And the META6.json is:

{
"name" : "gfldex-meta-zef-test",
"source-url" : "git://github.com/gfldex/gfldex-meta-zef-test",
"perl" : "v6",
"build-depends" : [ ],
"provides" : {
},
"depends" : [ "Typesafe::XHTML::Writer", "Rakudo::Slippy::Semilist", "Operator::defined-alternation", "Concurrent::Channelify", "Concurrent::File::Find", "XHTML::Writer", "Typesafe::HTML" ],
"description" : "Meta package to test installation with zef",
"test-depends" : [ ],
"version" : "0.0.1",
"authors" : [
"Wenzel P. P. Peppmeyer"
]
}

And lo and behold I found a bug. Installing XHTML::Writer on a clean system didn’t work because zef uses Build.pm differently then panda. I had to change the way Rakudos lib path is set because zef keeps dependency modules in temporary directories until the last test passed.

my @include = flat "$where/lib/Typesafe", $*REPO.repo-chain.map(*.path-spec);
my $proc = run 'perl6', '-I' «~« @include, "$where/bin/generate-function-definition.p6", "$where/3rd-party/xhtml1-strict.xsd", out => $out-file, :bin;

Please note that Build.pm will be replaced with something reasonable as soon as we figured out what reasonable means for portable module building.

There is a little bug in zef that makes the meta module fail to test. Said bug is being hunted right now.

All I need to do to get a new module tested weekly by travis is to add it to the meta module and push the commit. Laziness is a programmers virtue indeed.


Weekly changes in and around Perl 6: 2017.03 🙆‍♀️ (woman gesturing OK)

Published by liztormato on 2017-01-16T21:22:27

In the past week, Samantha McVey has landed emoji support in Rakudo Perl 6 on the MoarVM backend. The code of the title of this week:

say "\c[woman gesturing OK] (woman gesturing OK)";

If you see some strange characters in the title before the parenthesis open, your system doesn’t support Unicode 9 emojis yet. Oh, and should you wonder, all emojis are still only 1 character, thanks to NFG!

say "\c[woman gesturing OK]".chars;    # 1

Javascript Backend Milestone

Paweł Murias has reached another milestone in the development of the Javascript backend for Rakudo Perl 6. Again, a step closer to being able to run Perl 6 code in the browser. And for those of you remembering when Jonathan Worthington reached a similar milestone for the JVM backend: from here on out, it’s going to be a lot easier. I can only wish more power to Paweł!

Perl 6 – The Musical

JJ Merelo has made his latest Perl 6 project public. The introduction:

This book is about learning programming using a promising, and almost completely new, language: Perl 6. But it is only Perl 6 specific in a minority of the content. Most chapters that deal with Perl 6 could be rewritten using any other language, preferably a new, cool language such as Go or Rust. I, or someone, might do it some day. But for the time being, let us be content with Perl 6. Which is also new and cool.

He, and his family, will also talk about it at FOSDEM.

Blog Posts

Meanwhile on FaceBook

I’m adding strong typing to Perl6::Parser in order to help catch some stubborn bugs, and noticing runtimes *apparently* decreasing as I add more Array[Perl6::Element] return types. Yay team! ~10 seconds off Perl6::Parser‘s test suite. Incidentally I’m going to add some ‘find’ methods and start threading elements to have parents and siblings as well as children to make the structure easier to walk. And it pays off within 20 minutes of adding the last ‘returns‘ clauses by uncovering a hidden test-suite bug.
Jeffrey Goff

Meanwhile on StackOverflow

People are starting to ask more and more Perl 6 questions (and get answers!). The past week saw:

Core Developments

Ecosystem Additions

Winding Down

This needed quite a lot of tea. See you next week!


Perlgeek.de: Perl 6 By Example: Stateful Silent Cron

Published by Moritz Lenz on 2017-01-14T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


In the last two installments we've looked at silent-cron, a wrapper around external programs that silences them in case their exit status is zero. But to make it really practical, it should also silence occasional failures.

External APIs fail, networks become congested, and other things happen that prevent a job from succeeding, so some kind of retry mechanism is desirable. In case of a cron job, cron already takes care of retrying a job on a regular basis, so silent-cron should just suppress occasional errors. On the other hand, if a job fails consistently, this is usually something that an operator or developer should look into, so it's a problem worth reporting.

To implement this functionality, silent-cron needs to store persistent state between separate runs. It needs to record the results from the current run and then analyze if the failure history qualifies as "occasional".

Persistent Storage

The storage backend needs to write and retrieve structured data, and protect concurrent access to the state file with locking. A good library for such a storage backend is SQLite, a zero-maintenance SQL engine that's available as a C library. It's public domain software and in use in most major browsers, operating systems and even some airliners.

Perl 6 gives you access to SQLite's functionality through DBIish, a generic database interface with backend drivers for SQLite, MySQL, PostgreSQL and Oracle DB. To use it, first make sure that SQLite3 is installed, including its header files. On a Debian-based Linux system, for example, you can achieve this with apt-get install libsqlite3-dev. If you are using the Rakudo Star distribution, DBIish is already available. If not, you can use one of the module installers to retrieve and install it: panda install DBIish or zef install DBIish.

To use the DBIish's SQLite backend, you first have to create a database handle by selecting the backend and supplying connection information:

use DBIish;
my $dbh = DBIish.connect('SQLite', :database('database-file.sqlite3'));

Connecting to a database file that does not yet exist creates that file.

One-off SQL statements can be executed directly on the database handle:

$dbh.do('INSERT INTO player (name) VALUES ?', 'John');

The ? in the SQL is a placeholder that is passed out-of-band as a separate argument to the do method, which avoids potential errors such as SQL injection vulnerabilities.

Queries tend to work by first preparing a statement which returns a statement handle. You can execute a statement once or multiple times, and retrieve result rows after each execute call:

my $sth = $dbh.prepare('SELECT id FROM player WHERE name = ?');

my %ids;
for <John Jack> -> $name {
    $sth.execute($name);
    %ids{ $name } = $sth.row[0];
}
$sth.finish;

Developing the Storage Backend

We shouldn't just stuff all the storage handling code into sub MAIN, we should instead carefully consider the creation of a useful API for the storage backend. At first, we need only two pieces of functionality: insert the result of a job execution; and retrieve the most recent results.

Since silent-cron can be used to guard multiple cron jobs on the same machine, we might need something to distinguish the different jobs so that one of them succeeding doesn't prevent error reporting for one that is constantly failing. For that we introduce a job name, which can default to the command (including arguments) being executed but which can be set explicitly on the command line.

The API for the storage backend could look something like this:

my $repo = ExecutionResultRepository.new(
    jobname   => 'refresh cache',
    statefile => 'silent-cron.sqlite3',
);
$repo.insert($result);
my @last-results = $repo.tail(5);

This API isn't specific to the SQLite backend at all; a storage backend that works with plain text files could have the exact same API.

Let's implement this API. First we need the class and the two attributes that should be obvious from the usage example above:

class ExecutionResultRepository {
    has $.jobname   is required;
    has $.statefile is required;
    # ... more code

To implement the insert method, we need to connect to the database and create the relevant table if it doesn't exist yet.

has $!db;
method !db() {
    return $!db if $!db;
    $!db = DBIish.connect('SQLite', :database($.statefile));
    self!create-schema();
    return $!db;
}

This code uses a private attribute $!db to cache the database handle and a private method !db to create the handle if it doesn't exist yet.

Private methods are declared like ordinary methods, except that the name starts with an exclamation mark. To call one, substitute the method call dot for the exclamation mark, in other words, use self!db() instead of self.db().

The !db method also calls the next private method, !create-schema, which creates the storage table and some indexes:

method !create-schema() {
    $!db.do(qq:to/SCHEMA/);
        CREATE TABLE IF NOT EXISTS $table (
            id          INTEGER PRIMARY KEY,
            jobname     VARCHAR NOT NULL,
            exitcode    INTEGER NOT NULL,
            timed_out   INTEGER NOT NULL,
            output      VARCHAR NOT NULL,
            executed    TIMESTAMP NOT NULL DEFAULT (DATETIME('NOW'))
        );
    SCHEMA
    $!db.do(qq:to/INDEX/);
        CREATE INDEX IF NOT EXISTS {$table}_jobname_exitcode ON $table ( jobname, exitcode );
    INDEX
    $!db.do(qq:to/INDEX/);
        CREATE INDEX IF NOT EXISTS {$table}_jobname_executed ON $table ( jobname, executed );
    INDEX
}

Multi-line string literals are best written with the heredoc syntax. qq:to/DELIMITER/ tells Perl 6 to finish parsing the current statement so that you can still close the method call parenthesis and add the statement-ending semicolon. The next line starts the string literal, which goes on until Perl 6 finds the delimiter on a line on its own. Leading whitespace is stripped from each line of the string literal by as much as the closing delimiter is indented.

For example

print q:to/EOS/;
    Not indented
        Indented four spaces
    EOS

Produces the output

Not indented
    Indented four spaces

Now that we have a working database connection and know that the database table exists, inserting a new record becomes easy:

method insert(ExecutionResult $r) {
    self!db.do(qq:to/INSERT/, $.jobname, $r.exitcode, $r.timed-out, $r.output);
        INSERT INTO $table (jobname, exitcode, timed_out, output)
        VALUES(?, ?, ?, ?)
    INSERT
}

Selecting the most recent records is a bit more work, partially because we need to convert the table rows into objects:

method tail(Int $count) {
    my $sth = self!db.prepare(qq:to/SELECT/);
        SELECT exitcode, timed_out, output
          FROM $table
         WHERE jobname = ?
      ORDER BY executed DESC
         LIMIT $count
    SELECT
    $sth.execute($.jobname);
    $sth.allrows(:array-of-hash).map: -> %h {
        ExecutionResult.new(
            exitcode  => %h<exitcode>,
            timed-out => ?%h<timed_out>,
            output    => %h<output>,
        );
    }
}

The last statement in the tail method deserves a bit of extra attention. $sth.allrows(:array-of-hash) produces the database rows as a list of hashes. This list is lazy, that is, it's generated on-demand. Lazy lists are a very convenient feature because they allow you to use iterators and lists with the same API. For example when reading lines from a file, you can write for $handle.lines -> $line { ... }, and the lines method doesn't have to load the whole file into memory; instead it can read a line whenever it is accessed.

$sth.allrows(...) is lazy, and so is the .map call that comes after it. map transforms a list one element at a time by calling the code object that's passed to it. And that is done lazily as well. So SQLite only retrieves rows from the database file when elements of the resulting list are actually accessed.

Using the Storage Backend

With the storage API in place, it's time to use it:

multi sub MAIN(*@cmd, :$timeout, :$jobname is copy,
               :$statefile='silent-cron.sqlite3', Int :$tries = 3) {
    $jobname //= @cmd.Str;
    my $result = run-with-timeout(@cmd, :$timeout);
    my $repo = ExecutionResultRepository.new(:$jobname, :$statefile);
    $repo.insert($result);

    my @runs = $repo.tail($tries);

    unless $result.is-success or @runs.grep({.is-success}) {
        say "The last @runs.elems() runs of @cmd[] all failed, the last execution ",
            $result.timed-out ?? "ran into a timeout"
                              !! "exited with code $result.exitcode()";

        print "Output:\n", $result.output if $result.output;
    }
    exit $result.exitcode // 2;
}

Now a job that succeeds a few times, and then fails up to two times in a row doesn't produce any error output, and only the third failed execution in a row produces output. You can override that on the command line with --tries=5.

Summary

We've discussed DBIish, a database API with pluggable backend, and explored using it with SQLite to store persistent data. In the process we also came across lazy lists and a new form of string literals called heredocs.

Subscribe to the Perl 6 book mailing list

* indicates required

Weekly changes in and around Perl 6: 2017.02 Dogfooding and Powerbotting

Published by liztormato on 2017-01-09T22:05:32

Seems everybody likes robots nowadays. The #perl6 channels on irc.freenode.org have quite a few of them. One of the oldest bots, dalek (written in Perl 5) announces commits to various repositories on the channel. Well, announced, because it has been decommissioned this week. A new, shiny bot called Geth (written in Perl 6) has replaced it (thanks to Zoffix Znet).

Meanwhile, the bisectable bot seems to have been inspirational to the Rust community. Too bad that post was not from a Rust developer.

If you’re interested in knowing what bots frequent the #perl6 channels, finding out about them is a bit troublesome. They’re not all officially documented, but there is a list that appears to be maintained. Maybe next week I can post a proper URL 🙂 .

Sparrowdo Blog

Alexey Melezhik announced the start of a blog about Sparrowdo, the lightweight and very flexible configuration management system written on Perl 6. It’s good to see such a tool being in active development!

FOSDEM

In a few weeks it’s FOSDEM time again: on 4 and 5 February, it will all be happening in Brussels, Belgium. There will be a Perl DevRoom on Sunday. With quite a few Perl 6 related presentations:

Of course, there will also be a Perl booth, where you can get your tuits, stickers, buttons, leaflets and other swag for free. And stuffed camels, stuffed Camelias and books (even a Perl 6 book) at special FOSDEM prices!

OSCON

It should also be noted that Jeffrey Goff will be given a half-day Perl 6 tutorial titled “Fundamentals of Perl 6 – From Zero to Scripting”. Too bad that’s the only Perl element at OSCON (8-11 May). But one can say we’re working on the future!

The Perl Conference in DC

The Perl Conference in DC (18-23 June) (formerly known as YAPC::NA) has presented its Call for Presentations. So please start submitting your Perl 6 presentations! 🙂

Meanwhile on FaceBook

The Perl 6 FaceBook Group has grown to 360 members. In the past week, a few people posted feedback as to how Rakudo Perl 6 has become faster and faster. Some quotes:

Over the last week Perl6::Parser‘s test suite magically sped up from 120 seconds total to 100 seconds total, and all I did was rebuild perl6. (cusr time went up a bit as did csys, but that’s another 16% speed increase in a matter of days! Good work to the #perl6 core team!
Jeffrey Goff

Found some random Perl 6 toy-code I wrote a few years ago, at the time when no compiler existed that would compile it. I forget what the issue was, but it was plum broken. Wouldn’t ya know? It worked first time with Rakudo this morning. That says something mighty fine about the development & developers of Perl6 and Rakudo. Nice work, all.
Paul Bennett

I switched to Wunderlist for managing my to do lists last month. But it didn’t quite handle recurring tasks the ways I wanted to. Good news: it has a public API. Better news: there is a Python library to work with it, and it works GREAT with Perl 6’s Inline::Python. Basically there are six lines of straightforward boilerplate at the beginning of my code, and after that you can call into the Python library almost exactly like it was native Perl 6.
Solomon Foster

That’s the type of stuff we really like to hear!

Meanwhile on Twitter

The Perl 6 News Feed on Twitter now has more than 120 followers, and has seen quite a lot of tweets. If you really want to be at the cutting edge of Perl 6 news, that’s a way to get it!

Meanwhile on GitHub

Rakudo Perl 6 was added to the Programming Languages Showcase. It’s always good to get a little more exposure!

Perl Foundation Grants

If you’re interested to get a grant from The Perl Foundation to do some Perl 6 development work you would otherwise not be able to do, you have until 15 January to send in your proposal!

Core Developments

Blog Posts

Ecosystem Additions

A nice catch again!

Winding Down

Phew! If this is any indication of the amount of news every week in 2017, I think I will need quite a lot more tea! Check in again next week to see if I did need more tea 🙂


Death by Perl6: Hello Web! with Purée Perl 6

Published by Tony O'Dell on 2017-01-09T18:19:56

Let's build a website.

Websites are easy to build. They are dozens of frameworks out there to use, perl has Mojo and Catalyst as their major frameworks and other languages also have quite a few decent options. Some of them come with boilerplate templates and you just go from there. Others don't and you spend your first few hours learning how to actually set up the framework and reading about how to share your DB connection between all of your controllers and blah, blah, blah. Let's look at one of P6's web frameworks.

Enter Hiker

Hiker doesn't introduce a lot of (if any) new ideas. It does use paradigms you're probably used to and it aims to make the initialization of creating your website very straight forward and easy, that way you can get straight to work sharing your content with the English.

The Framework

Hiker is intended to make things fast and easy from the development side. Here's how it works. If you're not into the bleep blop and just want to get started, skip to the Boilerplate heading.

Application Initialization

  1. Hiker reads from the subdirectories we'll look at later. The controllers and models are classes.
  2. Looking at all controllers, initializes a new object for that class, and then checks for their .path attribute
    1. If Hiker can't find the path attribute then it doesn't bind anything and produces a warning
  3. After setting up the controller routes, it instantiates a new object for the model as specified by the controller (.model)
    1. If none is given by the controller then nothing is instantiated or bound and nothing happens
    2. If a model is required by the controller but it cannot be found then Hiker refuses to bind
  4. Finally, HTTP::Server::Router is alerted to all of the paths that Hiker was able to find and verify

The Request

  1. If the path is found, then the associated class' .model.bind is called.
    1. The response (second parameter of .model.bind($req, $res)) has a hash to store information: $res.data
  2. The controller's .handler($req, $res) is then executed
    1. The $res.data hash is available in this context
  3. If the handler returns a Promise then Hiker waits for that to be kept (and expects the result to be True or False)
    1. If the response is already rendered and the Promise's status is True then the router is alerted that no more routes should be explored
    2. If the response isn't rendered and the Promise's result is True, then .render is called automagically for you
    3. If the response isn't rendered and the Promise's result is False, then the next matching route is called

Boilerplate

Ensure you have Hiker installed:

$ zef install Hiker
$ rakudobrew rehash #this may be necessary to get the bin to work

Create a new directory where you'd like to create your project's boilerplate and cd. From here we'll initialize some boilerplate and look at the content of the files.

somedir$ hiker init  
==> Creating directory controllers
==> Creating directory models
==> Creating directory templates
==> Creating route MyApp::Route1: controllers/Route1.pm6
==> Creating route MyApp::Model1: models/Model1.pm6
==> Creating template templates/Route1.mustache
==> Creating app.pl6

Neato burrito. From the output you can see that Hiker created some directories - controllers, models, templates - for us so we can start out organized. In those directories you will find a few files, let's take a look.

The Model

use Hiker::Model; 

class MyApp::Model1 does Hiker::Model {  
  method bind($req, $res) {
    $res.data<who> = 'web!';
  }
}  

Pretty straight forward. MyApp::Model1 is instantiated during Hiker initialization and .bind is called whenever the controller's corresponding path is requested. As you can see here, this Model just adds to the $res.data hash the key value pair of who => 'web!'. This data will be available in the Controller as well as available in the template files (if the controller decides to use that).

The Controller

use Hiker::Route; 

class MyApp::Route1 does Hiker::Route {  
  has $.path     = '/';
  has $.template = 'Route1.mustache';
  has $.model    = 'MyApp::Model1';

  method handler($req, $res) {
    $res.headers<Content-Type> = 'text/plain';
  }
}  

As you can see above, the Hiker::Route has a lot of information in a small space and it's a class that does a Hiker role called Hiker::Route. This let's our framework know that we should inspect that class for the path, template, model so it can handle those operations for us - path and template are the only required attributes.

As discussed above, our Route can return a Promise if there is some asynchronous operation that is to be performed. In this case all we're going to do is set the header's to indicated the Content Type and then, automagically, render the template file. Note: if you return a Falsey value from the handler method, then the router will not auto render and it will attempt to find the next route. This is so that you can cascade paths in the event that you want to chain them together, do some type of decision making real time to determine whether that's the right class for the request, or perform some other unsaid dark magic. In the controller above we return a Truethy value and it auto renders.

By specifying the Model in the Route, you're able to re-use the same Model class across multiple routes.

The Path

Quick notes about .path. You can pass a ('/staticpath'), maybe a path with a placeholder ('/api/:placeholder'), or if you're path is a little more complicated then you can pass in a regex (/ .* /). Check out the documentation for HTTP::Server::Router (repo).

The Template

The template is specified by the controller's .template attribute and Hiker checks for that file in the ./templates folder. The default template engine is Template::Mustache (repo). See that module's documentation for more info.

Running the App

Really pretty straight forward from the boilerplate:

somedir$ perl6 app.pl6  

Now you can visit http://127.0.0.1:8080/ in your favorite Internet Explorer and find a nice 'Hello web!' message waiting to greet you. If you visit any other URI you'll receive the default 'no route found' message from HTTP::Server::Router.

The Rest

The module is relatively young. With feedback from the community, practical applications, and some extra feature expansion, Hiker could be pretty great and it's a good start to taking the tediousness out of building a website in P6. I'm open to feedback and I'd love to hear/see where you think Hiker can be improved, what it's missing to be productive, and possibly anything else [constructive or otherwise] you'd like to see in a practical, rapid development P6 web server.

Perlgeek.de: Perl 6 By Example: Testing Silent Cron

Published by Moritz Lenz on 2017-01-07T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


The previous blog post left us with a bare-bones silent-cron implementation, but without tests. I probably sound like a broken record for bringing this up time and again, but I really want some tests when I start refactoring or extending my programs. And this time, getting the tests in is a bit harder, so I think it's worth discussing how to do it.

Refactoring

As a short reminder, this is what the program looks like:

#!/usr/bin/env perl6

sub MAIN(*@cmd, :$timeout) {
    my $proc = Proc::Async.new(|@cmd);
    my $collector = Channel.new;
    for $proc.stdout, $proc.stderr -> $supply {
        $supply.tap: { $collector.send($_) }
    }
    my $promise = $proc.start;
    my $waitfor = $promise;
    $waitfor = Promise.anyof(Promise.in($timeout), $promise)
        if $timeout;
    $ = await $waitfor;

    $collector.close;
    my $output = $collector.list.join;

    if !$timeout || $promise.status ~~ Kept {
        my $exitcode = $promise.result.exitcode;
        if $exitcode != 0 {
            say "Program @cmd[] exited with code $exitcode";
            print "Output:\n", $output if $output;
        }
        exit $exitcode;
    }
    else {
        $proc.kill;
        say "Program @cmd[] did not finish after $timeout seconds";
        sleep 1 if $promise.status ~~ Planned;
        $proc.kill(9);
        $ = await $promise;
        exit 2;
    }
}

There's logic in there for executing external programs with a timeout, and then there's logic for dealing with two possible outcomes. In terms of both testability and for future extensions it makes sense to factor out the execution of external programs into a subroutine. The result of this code is not a single value, we're potentially interested in the output it produced, the exit code, and whether it ran into a timeout. We could write a subroutine that returns a list or a hash of these values, but here I chose to write a small class instead:

class ExecutionResult {
    has Int $.exitcode = -1;
    has Str $.output is required;
    has Bool $.timed-out = False;
    method is-success {
        !$.timed-out && $.exitcode == 0;
    }
}

We've seen classes before, but this one has a few new features. Attributes declared with the . twigil automatically get an accessor method, so

has Int $.exitcode;

is roughly the same as

has Int $!exitcode;
method exitcode() { $!exitcode }

So it allows a user of the class to access the value in the attribute from the outside. As a bonus, you can also initialize it from the standard constructor as a named argument, ExecutionResult.new( exitcode => 42 ). The exit code is not a required attribute, because we can't know the exit code of a program that has timed out. So with has Int $.exitcode = -1 we give it a default value that applies if the attribute hasn't been initialized.

The output is a required attribute, so we mark it as such with is required. That's a trait. Traits are pieces of code that modify the behavior of other things, here of an attribute. They crop up in several places, for example in subroutine signatures (is copy on a parameter), variable declarations and classes. If you try to call ExecutionResult.new() without specifying an output, you get such an error:

The attribute '$!output' is required, but you did not provide a value for it.

Mocking and Testing

Now that we have a convenient way to return more than one value from a hypothetical subroutine, let's look at what this subroutine might look like:

sub run-with-timeout(@cmd, :$timeout) {
    my $proc = Proc::Async.new(|@cmd);
    my $collector = Channel.new;
    for $proc.stdout, $proc.stderr -> $supply {
        $supply.tap: { $collector.send($_) }
    }
    my $promise = $proc.start;
    my $waitfor = $promise;
    $waitfor = Promise.anyof(Promise.in($timeout), $promise)
        if $timeout;
    $ = await $waitfor;

    $collector.close;
    my $output = $collector.list.join;

    if !$timeout || $promise.status ~~ Kept {
        say "No timeout";
        return ExecutionResult.new(
            :$output,
            :exitcode($promise.result.exitcode),
        );
    }
    else {
        $proc.kill;
        sleep 1 if $promise.status ~~ Planned;
        $proc.kill(9);
        $ = await $promise;
        return ExecutionResult.new(
            :$output,
            :timed-out,
        );
    }
}

The usage of Proc::Async has remained the same, but instead of printing this when an error occurs, the routine now returns ExecutionResult objects.

This simplifies the MAIN sub quite a bit:

multi sub MAIN(*@cmd, :$timeout) {
    my $result = run-with-timeout(@cmd, :$timeout);
    unless $result.is-success {
        say "Program @cmd[] ",
            $result.timed-out ?? "ran into a timeout"
                              !! "exited with code $result.exitcode()";

        print "Output:\n", $result.output if $result.output;
    }
    exit $result.exitcode // 2;
}

A new syntactic feature here is the ternary operator, CONDITION ?? TRUE-BRANCH !! FALSE-BRANCH, which you might know from other programming languages such as C or Perl 5 as CONDITION ? TRUE-BRANCH : FALSE-BRANCH.

Finally, the logical defined-or operator LEFT // RIGHT returns the LEFT side if it's defined, and if not, runs the RIGHT side and returns its value. It works like the || and or infix operators, except that those check for the boolean value of the left, not whether they are defined.

In Perl 6, we distinguish between defined and true values. By default, all instances are true and defined, and all type objects are false and undefined. Several built-in types override what they consider to be true. Numbers that equal 0 evaluate to False in a boolean context, as do empty strings and empty containers such as arrays, hashes and sets. On the other hand, only the built-in type Failure overrides definedness. You can override the truth value of a custom type by implementing a method Bool (which should return True or False), and the definedness with a method defined.

Now we could start testing the sub run-with-timeout by writing custom external commands with defined characteristics (output, run time, exit code), but that's rather fiddly to do so in a reliable, cross-platform way. So instead I want to replace Proc::Async with a mock implementation, and give the sub a way to inject that:

sub run-with-timeout(@cmd, :$timeout, :$executer = Proc::Async) {
    my $proc = $executer.defined ?? $executer !! $executer.new(|@cmd);
    # rest as before

Looking through sub run-with-timeout, we can make a quick list of methods that the stub Proc::Async implementation needs: stdout, stderr, start and kill. Both stdout and stderr need to return a Supply. The simplest thing that could possibly work is to return a Supply that will emit just a single value:

my class Mock::Proc::Async {
    has $.out = '';
    has $.err = '';
    method stdout {
        Supply.from-list($.out);
    }
    method stderr {
        Supply.from-list($.err);
    }

Supply.from-list returns a Supply that will emit all the arguments passed to it; in this case just a single string.

The simplest possible implementation of kill just does nothing:

    method kill($?) {}

$? in a signature is an optional argument ($foo?) without a name.

Only one method remains that needs to be stubbed: start. It's supposed to return a Promise that, after a defined number of seconds, returns a Proc object or a mock thereof. Since the code only calls the exitcode method on it, writing a stub for it is easy:

has $.exitcode = 0;
has $.execution-time = 1;
method start {
    Promise.in($.execution-time).then({
        (class {
            has $.exitcode;
        }).new(:$.exitcode);
    });
}

Since we don't need the class for the mock Proc anywhere else, we don't even need to give it a name. class { ... } creates an anonymous class, and the .new call on it creates a new object from it.

As mentioned before, a Proc with a non-zero exit code throws an exception when evaluated in void context, or sink context as we call it in Perl 6. We can emulate this behavior by extending the anonymous class a bit:

class {
    has $.exitcode;
    method sink() {
        die "mock Proc used in sink context";
    }
}

With all this preparation in place, we can finally write some tests:

multi sub MAIN('test') {
    use Test;

    my class Mock::Proc::Async {
        has $.exitcode = 0;
        has $.execution-time = 0;
        has $.out = '';
        has $.err = '';
        method kill($?) {}
        method stdout {
            Supply.from-list($.out);
        }
        method stderr {
            Supply.from-list($.err);
        }
        method start {
            Promise.in($.execution-time).then({
                (class {
                    has $.exitcode;
                    method sink() {
                        die "mock Proc used in sink context";
                    }
                }).new(:$.exitcode);
            });
        }
    }

    # no timeout, success
    my $result = run-with-timeout([],
        timeout => 2,
        executer => Mock::Proc::Async.new(
            out => 'mocked output',
        ),
    );
    isa-ok $result, ExecutionResult;
    is $result.exitcode, 0, 'exit code';
    is $result.output, 'mocked output', 'output';
    ok $result.is-success, 'success';

    # timeout
    $result = run-with-timeout([],
        timeout => 0.1,
        executer => Mock::Proc::Async.new(
            execution-time => 1,
            out => 'mocked output',
        ),
    );
    isa-ok $result, ExecutionResult;
    is $result.output, 'mocked output', 'output';
    ok $result.timed-out, 'timeout reported';
    nok $result.is-success, 'success';
}

This runs through two scenarios, one where a timeout is configured but not used (because the mocked external program exits first), and one where the timeout takes effect.

Improving Reliability and Timing

Relying on timing in tests is always unattractive. If the times are too short (or too slow together), you risk sporadic test failures on slow or heavily loaded machines. If you use more conservative temporal spacing of tests, the tests can become very slow.

There's a module (not distributed with Rakudo) to alleviate this pain: Test::Scheduler provides a thread scheduler with virtualized time, allowing you to write the tests like this:

use Test::Scheduler;
my $*SCHEDULER = Test::Scheduler.new;
my $result = start run-with-timeout([],
    timeout => 5,
    executer => Mock::Proc::Async.new(
        execution-time => 2,
        out => 'mocked output',
    ),
);
$*SCHEDULER.advance-by(5);
$result = $result.result;
isa-ok $result, ExecutionResult;
# more tests here

This installs the custom scheduler, and $*SCHEDULER.advance-by(5) instructs it to advance the virtual time by 5 seconds, without having to wait five actual seconds. At the time of writing (December 2016), Test::Scheduler is rather new module, and has a bug that prevents the second test case from working this way.

Installing a Module

If you want to try out Test::Scheduler, you need to install it first. If you run Rakudo Star, it has already provided you with the panda module installer. You can use that to download and install the module for you:

$ panda install Test::Scheduler

If you don't have panda available, you can instead bootstrap zef, an alternative module installer:

$ git clone https://github.com/ugexe/zef.git
$ cd zef
$ perl6 -Ilib bin/zef install .

and then use zef to install the module:

$ zef install Test::Scheduler

Summary

In this installment, we've seen attributes with accessors, the ternary operator and anonymous classes. Testing of threaded code has been discussed, and how a third-party module can help. Finally we had a very small glimpse at the two module installers, panda and zef.

Subscribe to the Perl 6 book mailing list

* indicates required

gfldex: Leaving out considered dangerous

Published by gfldex on 2017-01-07T19:50:17

A knowledge seeker asked us why a loop spec allows $i>10 but not $i<10. The reason is that the postcircumfix:«< >» changes the grammar in such a way that it expects a list quote after the $i. As a result you get the following.

loop (my $i=0;$i<10;$i++) {};
# OUTPUT«===SORRY!=== Error while compiling ␤Whitespace required before < operator␤at :1␤------> loop (my $i=0;$i<10;$i++) {};⏏␤    expecting any of:␤        postfix␤»

I tried to illustrate the problem by making the $i>10 case fail as well by defining a new operator.

sub postcircumfix:«> <»($a){}; loop (my $i=0;$i>10;$i++) {};
# OUTPUT«===SORRY!=== Error while compiling ␤Unable to parse expression in postcircumfix:sym«> <»; couldn't find final $stopper ␤at :1␤------> rcumfix:«> <»($a){}; loop (my $i=0;$i>10⏏;$i++) {};␤    expecting any of:␤        s…»

I concluded with the wisdom that that Perl 6 is a dynamic dynamic language. While filing a related bug report I made the new years resolution to put a white space around each and every operator. You may want to do the same.


gfldex: Perl 6 is Smalltalk

Published by gfldex on 2017-01-04T22:35:59

Masak kindly pointed the general public to a blog post that talks about how awesome Smalltalk is.
The example presented there reads:

a < b
  ifTrue: [^'a is less than b']
  ifFalse: [^'a is greater than or equal to b']

The basic idea is that ifTrue and ifFalse are methods on the class Bool. Perl 6 don’t got that and I thought it would be tricky to add because augment enum doesn’t work. After some tinkering I found that augment doesn’t really care what you hand it as long as it is a class. As it happens Rakudo doesn’t check if the class is really a class, it simply looks for a type object with the provided name. The following just works.

use MONKEY-TYPING;
augment class Bool {
    method ifTrue(&c){ self ?? c(self) !! Nil; self }
    method ifFalse(&c){ self ?? Nil !! c(self); self }
}

(1 < 2)
    .ifTrue({say ‚It's True!‘})
    .ifFalse({ say ‚It's False!‘});

If we call only one of the new methods on Bool, we could even use the colon form.

(True)
    .ifTrue: { say "It's $^a!" };

As you likely spotted I went a little further as Smalltalk by having the added methods call the blocks with the Bool in question. Since Block got a single optional positional parameter the compiler wont complain if we just hand over a block. If a pointy block or a Routine is provided it would need a Signature with a single positional or a slurpy.

Please note that augment on an enum that we call a class is not in the spec yet. A bug report was filed and judgement is pending. If that fails there is always the option to sneak the methods into the type object behind Bool at runtime via the MOP.

And so I found that Perl 6 is quite big but still nice to talk about.

UPDATE: There where complains that IfTrue contained an if statement. That’s was silly and fixed.


Weekly changes in and around Perl 6: 2017.01 Glancing At A Prime Time

Published by liztormato on 2017-01-02T22:26:54

Yes, it’s going to be a prime year this year.

say "Prime Year" if 2017.is-prime;

In other ways, Perl 6 is also ready for the future!

say "News" if True but False;

Leaving it to the reader to decide which way that will go 😏

Perl 6 at a Glance

Andrew Shitov has beaten everybody with the first (modern) Perl 6 book on the market. So that will make at least four generic Perl 6 books this year. Can’t wait to actually have them all in my hands!

Samantha McVey

It is an honour to welcome our latest Perl 6 Core Developer. Looking at her track record so far, one knows that she will do great things in the coming years. Meanwhile, I will try to Keep Calm and Continue Programming.

Core Developments

Blog Posts

Ecosystem Additions

Quite a nice bunch again this week.

Winding Down

One could say we’re off to a great start of the New Year! It also appears that everybody in the Perl 6 community still has the same number of fingers and eyes as before all of the New Year’s Eve fireworks. Please check in again next week for your weekly dose of Perl 6 News!


Steve Mynott: Rakudo Star: Past Present and Future

Published by Steve Mynott on 2017-01-02T14:07:31

At YAPC::EU 2010 in Pisa I received a business card with "Rakudo Star" and the
date July 29, 2010 which was the date of the first release -- a week earlier
with a countdown to 1200 UTC. I still have mine, although it has a tea stain
on it and I refreshed my memory over the holidays by listening again to Patrick
Michaud speaking about the launch of Rakudo Star (R*):

https://www.youtube.com/watch?v=MVb6m345J-Q

R* was originally intended as first of a number of distribution releases (as
opposed to a compiler release) -- useable for early adopters but not initially production
Quality. Other names had been considered at the time like Rakudo Beta (rejected as
sounding like "don't use this"!) and amusingly Rakudo Adventure Edition.
Finally it became Rakudo Whatever and Rakudo Star (since * means "whatever"!).

Well over 6 years later and we never did come up with a better name although there
was at least one IRC conversation about it and perhaps "Rakudo Star" is too
well established as a brand at this point anyway. R* is the Rakudo compiler, the main docs, a module installer, some modules and some further docs.

However, one radical change is happening soon and that is a move from panda to
zef as the module installer. Panda has served us well for many years but zef is
both more featureful and more actively maintained. Zef can also install Perl
6 modules off CPAN although the CPAN-side support is in its early days. There
is a zef branch (pull requests welcome!) and a tarball at:

http://pl6anet.org/drop/rakudo-star-2016.12.zef-beta2.tar.gz

Panda has been patched to warn that it will be removed and to advise the use of
zef. Of course anyone who really wants to use panda can reinstall it using zef
anyway.

The modules inside R* haven't changed much in a while. I am considering adding
DateTime::Format (shown by ecosystem stats to be widely used) and
HTTP::UserAgent (probably the best pure perl6 web client library right now).
Maybe some modules should also be removed (although this tends to be more
controversial!). I am also wondering about OpenSSL support (if the library is
available).

p6doc needs some more love as a command line utility since most of the focus
has been on the website docs and in fact some of these changes have impacted
adversely on command line use, eg. under Windows cmd.exe "perl 6" is no longer
correctly displayed by p6doc. I wonder if the website generation code should be
decoupled from the pure docs and p6doc command line (since R* has to ship any
new modules used by the website). p6doc also needs a better and faster search
(using sqlite?). R* also ships some tutorial docs including a PDF generated from perl6intro.com.
We only ship the English one and localisation to other languages could be
useful.

Currently R* is released roughly every three months (unless significant
breakage leads to a bug fix release). Problems tend to happen with the
less widely used systems (Windows and the various BSDs) and also with the
module installers and some modules. R* is useful in spotting these issues
missed by roast. Rakudo itself is still in rapid development. At some point a less frequently
updated distribution (Star LTS or MTS?) will be needed for Linux distribution
packagers and those using R* in production). There are also some question
marks over support for different language versions (6.c and 6.d).

Above all what R* (and Rakudo Perl 6 in general) needs is more people spending
more time working on it! JDFI! Hopefully this blog post might
encourage more people to get involved with github pull requests.

https://github.com/rakudo/star

Feedback, too, in the comments below is actively encouraged.


Perlgeek.de: Perl 6 By Example: Silent Cron, a Cron Wrapper

Published by Moritz Lenz on 2016-12-31T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


On Linux and UNIX-Like systems, a program called cron periodically executes user-defined commands in the background. It is used for system maintenance tasks such as refreshing or removing caches, rotating and deleting old log files and so on.

If such a command produces any output, cron typically sends an email containing the output so that an operator can look at it and judge if some action is required.

But not all command line programs are written for usage with cron. For example they might produce output even on successful execution, and indicate failure through a non-zero exit code. Or they might hang, or otherwise misbehave.

To deal with such commands, we'll develop a small program called silent-cron, which wraps such commands and suppresses output when the exit code is zero. It also allows you to specify a timeout that kills the wrapped program if it takes too long:

$ silent-cron -- command-that-might-fail args
$ silent-cron --timeout=5 -- command-that-might-hang

Running Commands Asynchronously

When you want to run external commands, Perl 6 gives you basically two choices: run, a simple, synchronous interface, and Proc::Async, an asynchronous and slightly more complex option. Even though we will omit the timeout in the first iteration, we need to be aware that implementing the timeout is easier in the asynchronous interface, so that's what we'll use:

#!/usr/bin/env perl6

sub MAIN(*@cmd) {
    my $proc = Proc::Async.new(@cmd);
    my $collector = Channel.new;
    for $proc.stdout, $proc.stderr -> $supply {
        $supply.tap: { $collector.send($_) }
    }
    my $result = $proc.start.result;
    $collector.close;
    my $output = $collector.list.join;

    my $exitcode = $result.exitcode;
    if $exitcode != 0 {
        say "Program @cmd[] exited with code $exitcode";
        print "Output:\n", $output if $output;
    }
    exit $exitcode;
}

There's a big chunk of new features and concepts in here, so let's go through the code bit by bit.

sub MAIN(*@cmd) {
    my $proc = Proc::Async.new(@cmd);

This collects all the command line arguments in the array variable @cmd, where the first element is the command to be executed, and any further elements are arguments passed to this command. The second line creates a new Proc::Async instance, but doesn't yet run the command.

We need to capture all output from the command; thus we capture the output of the STDOUT and STDERR streams (file handles 1 and 2 on Linux), and combine it into a single string. In the asynchronous API, STDOUT and STDERR are modeled as objects of type Supply, and hence are streams of events. Since supplies can emit events in parallel, we need a thread-safe data structure for collecting the result, and Perl 6 conveniently provides a Channel for that:

my $collector = Channel.new;

To actually get the output from the program, we need to tap into the STDOUT and STDERR streams:

for $proc.stdout, $proc.stderr -> $supply {
    $supply.tap: { $collector.send($_) }
}

Each supply executes the block { $collector.send($_) } for each string it receives. The string can be a character, a line or something larger if the stream is buffered. All we do with it is put the string into the channel $collector via the send method.

Now that the streams are tapped, we can start the program and wait for it to finish:

my $result = $proc.start.result;

Proc::Async.start executes the external process and returns a Promise. A promise wraps a piece of code that potentially runs on another thread, has a status (Planned, Kept or Broken), and once it's finished, a result. Accessing the result automatically waits for the wrapped code to finish. Here the code is the one that runs the external program and the result is an object of type Proc (which happens to be the same as the run() function from the synchronous interface).

After this line, we can be sure that the external command has terminated, and thus no more output will come from $proc.stdout and $proc.stderr. Hence we can safely close the channel and access all its elements through Channel.list:

$collector.close;
my $output = $collector.list.join;

Finally it's time to check if the external command was successful -- by checking its exit code -- and to exit the wrapper with the command's exit code:

my $exitcode = $result.exitcode;
if $exitcode != 0 {
    say "Program @cmd[] exited with code $exitcode";
    print "Output:\n", $output if $output;
}
exit $exitcode;

Implementing Timeouts

The idiomatic way to implement timeouts in Perl 6 is to use the Promise.anyof combinator together with a timer:

sub MAIN(*@cmd, :$timeout) {
    my $proc = Proc::Async.new(|@cmd);
    my $collector = Channel.new;
    for $proc.stdout, $proc.stderr -> $supply {
        $supply.tap: { $collector.send($_) }
    }
    my $promise = $proc.start;
    my $waitfor = $promise;
    $waitfor = Promise.anyof(Promise.in($timeout), $promise)
        if $timeout;
    await $waitfor;

The initialization of $proc hasn't changed. But instead of accessing $proc.start.result, we store the promise returned from $proc.start. If the user specified a timeout, we run this piece of code:

$waitfor = Promise.anyof(Promise.in($timeout), $promise)

Promise.in($seconds) returns a promise that will be fulfilled in $seconds seconds. It's basically the same as start { sleep $seconds }, but the scheduler can be a bit smarter about not allocating a whole thread just for sleeping.

Promise.anyof($p1, $p2) returns a promise that is fulfilled as soon as one of the arguments (which should also be promises) is fulfilled. So we wait either until the external program finished, or until the sleep promise is fulfilled.

With await $waitfor; the program waits for the promise to be fulfilled (or broken). When that is the case, we can't simply access $promise.result as before, because $promise (which is the promise for the external program) might not be fulfilled in the case of a timeout. So we have to check the status of the promise first and only then can we safely access $promise.result:

if !$timeout || $promise.status ~~ Kept {
    my $exitcode = $promise.result.exitcode;
    if $exitcode != 0 {
        say "Program @cmd[] exited with code $exitcode";
        print "Output:\n", $output if $output;
    }
    exit $exitcode;
}
else {
    ...
}

In the else { ... } branch, we need to handle the timeout case. This might be as simple as printing a statement that a timeout has occurred, and when silent-cron exits immediately afterwards, that might be acceptable. But we might want to do more in the future, so we should kill the external program. And if the program doesn't terminate after the friendly kill signal, it should receive a kill(9), which on UNIX systems forcefully terminates the program:

else {
    $proc.kill;
    say "Program @cmd[] did not finish after $timeout seconds";
    sleep 1 if $promise.status ~~ Planned;
    $proc.kill(9);
    await $promise;
    exit 2;
}

await $promise returns the result of the promise, so here a Proc object. Proc has a safety feature built in that if the command returned with a non-zero exit code, evaluating the object in void context throws an exception.

Since we explicitly handle the non-zero exit code in the code, we can suppress the generation of this exception by assigning the return value from await to a dummy variable:

my $dummy = await $promise

Since we don't need the value, we can also assign it to an anonymous variable instead:

$ = await $promise

More on Promises

If you have worked with concurrent or parallel programs in other languages, you might have come across threads, locks, mutexes, and other low-level constructs. These exist in Perl 6 too, but their direct usage is discouraged.

The problem with such low-level primitives is that they don't compose well. You can have two libraries that use threads and work fine on their own, but lead to deadlocks when combined within the same program. Or different components might launch threads on their own, which can lead to too many threads and high memory consumption when several such components come together in the same process.

Perl 6 provides higher-level primitives. Instead of spawning a thread, you use start to run code asynchronously and the scheduler decides which thread to run this on. If more start calls happen that ask for threads to schedule things on, some will run serially.

Here is a very simple example of running a computation in the background:

sub count-primes(Int $upto) {
    (1..$upto).grep(&is-prime).elems;
}

my $p = start count-primes 10_000;
say $p.status;
await $p;
say $p.result;

It gives this output:

Planned
1229

You can see that the main line of execution continued after the start call, and $p immediately had a value -- the promise, with status Planned.

As we've seen before, there are combinators for promises, anyof and allof. You can also chain actions to a promise using the then method:

sub count-primes(Int $upto) {
    (1..$upto).grep(&is-prime).elems;
}

my $p1 = start count-primes 10_000;
my $p2 = $p1.then({ say .result });
await $p2;

If an exception is thrown inside asynchronously executing code, the status of the promise becomes Broken, and calling its .result method re-throws the exception.

As a demonstration of the scheduler distributing tasks, let's consider a small Monte Carlo simulation to calculate an approximation for π. We generate a pair of random numbers between zero and one, and interpret them as dots in a square. A quarter circle with radius one covers the area of π/4, so the ratio of randomly placed dots within the quarter circle to the total number of dots approaches π/4, if we use enough dots.

sub pi-approx($iterations) {
    my $inside = 0;
    for 1..$iterations {
        my $x = 1.rand;
        my $y = 1.rand;
        $inside++ if $x * $x + $y * $y <= 1;
    }
    return ($inside / $iterations) * 4;
}
my @approximations = (1..1000).map({ start pi-approx(80) });
await @approximations;

say @approximations.map({.result}).sum / @approximations;

The program starts one thousand computations asynchronously, but if you look at a system monitoring tool while it runs, you'll observe only 16 threads running. This magic number comes from the default thread scheduler, and we can override it by providing our own instance of a scheduler above the previous code:

my $*SCHEDULER = ThreadPoolScheduler.new(:max_threads(3));

For CPU bound tasks like this Monte Carlo Simulation, it is a good idea to limit the number of threads roughly to the number of (possibly virtual) CPU cores; if many threads are stuck waiting for IO, a higher number of threads can yield better performance.

Possible Extensions

If you want to play with silent-cron, you could add a retry mechanism. If a command fails because of an external dependency (like an API or an NFS share), it might take time for that external dependency to recover. Hence you should add a quadratic or exponential backoff, that is, the wait time between retries should increase quadratically (1, 2, 4, 9, 16, ...) or exponentially (1, 2, 4, 8, 16, 32, ...).

Summary

We've seen an asynchronous API for running external programs and how to use Promises to implement timeouts. We've also discussed how promises are distributed to threads by a scheduler, allowing you to start an arbitrary number of promises without overloading your computer.

Subscribe to the Perl 6 book mailing list

* indicates required

Weekly changes in and around Perl 6: 2016.52 Twittering Towards The End…

Published by liztormato on 2016-12-26T22:47:52

…of the year, of course! No (new) Apocalypse planned or anything like that. No, it feels more like there’s a new marketing push gaining strength. For news about the development of Learning Perl 6, you can now follow a dedicated Twitter page: @LearningPerl6, as brian d foy announced recently. Meanwhile, the Perl 6 Facebook Group is now at 340 members. And is still looking for new members!

There hasn’t been much news on Twitter about Perl 6 the past years. The reasons for that are manyfold. But Moritz Lenz has taken it upon him to start a Perl 6 News Feed: @perl6org. If you have anything you want to say on Twitter, contact him about it. Even better, if you would like to help him by tweeting stuff directly on that account, contact him as well. Thank you in advance!

Endventing

The final bunch of the Perl 6 Advent Calendar posts:

Please check the Perl 6 Advent Calendar again on the 1st of December, 2017!

Other Blog Posts

Core Developments

Not bad for a week in the Holiday Season!

Ecosystem Additions

No ecosystem additions this week. Too bad. But hardly surprising this time of year.

Winding down 2016

This is the last post of the Perl 6 Weekly this year. Only a few issues were missed at the beginning of the year, when everybody was recovering from the frenzy of getting the first official release of Perl 6 into the world. Pretty sure we won’t be missing any issue in 2017! So, until then: be careful with the fireworks, keep all of your fingers and eyes, and see you again next year!


gfldex: Awesome and Custom

Published by gfldex on 2016-12-25T13:43:45

While toying with roles to find a better example for our class tutorial I believe to have stumbled onto a nice idiom. And so I wrote:

role Unitish[$unit = fail('Please provide a SI unit quantifier as a Parameter to the role Unitish')]

What leads to the following error message when the role argument is missing.

Could not instantiate role 'Unitish':
Please provide a SI unit quantifier as a Parameter to the role Unitish
  in any  at gen/moar/Metamodel.nqp line 2441
  in any protect at gen/moar/stage2/NQPCORE.setting line 802
  in any specialize at gen/moar/Metamodel.nqp line 2428
  in any specialize at gen/moar/Metamodel.nqp line 2644
  in any compose at gen/moar/Metamodel.nqp line 3010
  in any generate_mixin at gen/moar/Metamodel.nqp line 1319
  in any  at gen/moar/Metamodel.nqp line 1235
  in any mixin at gen/moar/Metamodel.nqp line 1270
  in sub postfix:<s> at si.p6 line 12
  in block <unit> at si.p6 line 14

So instead of letting the compiler say: “Dude, the module author wants you to provide a Str!”, I can actually tell the user what the string should look like. The way I’m using it results in a runtime error. The does-operator can be executed at compile time, providing awesome error messages the way you want.


Perlgeek.de: Perl 6 By Example: Testing the Say Function

Published by Moritz Lenz on 2016-12-24T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Testing say()

In the previous installment I changed some code so that it wouldn't produce output, and instead did the output in the MAIN sub, which conveniently went untested.

Changing code to make it easier to test is a legitimate practice. But if you do have to test code that produces output by calling say, there's a small trick you can use. say works on a file handle, and you can swap out the default file handle, which is connected to standard output. Instead, you can put a dummy file handle in its place that captures the lower-level commands issued to it, and record this for testing.

There's a ready-made module for that, IO::String, but for the sake of learning we'll look at how it works:

use v6;

# function to be tested
sub doublespeak($x) {
    say $x ~ $x;
}

use Test;
plan 1;

my class OutputCapture {
    has @!lines;
    method print(\s) {
        @!lines.push(s);
    }
    method captured() {
        @!lines.join;
    }
}

my $output = do {
    my $*OUT = OutputCapture.new;
    doublespeak(42);
    $*OUT.captured;
};

is $output, "4242\n", 'doublespeak works';

The first part of the code is the function we want to test, sub doublespeak. It concatenates its argument with itself using the ~ string concatenation operator. The result is passed to say.

Under the hood, say does a bit of formatting, and then looks up the variable $*OUT. The * after the sigil marks it as a dynamic variable. The lookup for the dynamic variable goes through the call stack, and in each stack frame looks for a declaration of the variable, taking the first it finds. say then calls the method print on that object.

Normally, $*OUT contains an object of type IO::Handle, but the say function doesn't really care about that, as long as it can call a print method on that object. That's called duck typing: we don't really care about the type of the object, as long as it can quack like a duck. Or in this case, print like a duck.

Then comes the loading of the test module, followed by the declaration of how many tests to run:

use Test;
plan 1;

You can leave out the second line, and instead call done-testing after your tests. But if there's a chance that the test code itself might be buggy, and not run tests it's supposed to, it's good to have an up-front declaration of the number of expected tests, so that the Test module or the test harness can catch such errors.

The next part of the example is the declaration of type which we can use to emulate the IO::Handle:

my class OutputCapture {
    has @!lines;
    method print(\s) {
        @!lines.append(s);
    }
    method captured() {
        @!lines.join;
    }
}

class introduces a class, and the my prefix makes the name lexically scoped, just like in a my $var declaration.

has @!lines declares an attribute, that is, a variable that exists separately for each instance of class OutputCapture. The ! marks it as an attribute. We could leave it out, but having it right there means you always know where the name comes from when reading a larger class.

The attribute @!lines starts with an @, not a $ as other variables we have seen so far. The @ is the sigil for an array variable.

You might be seeing a trend now: the first letter of a variable or attribute name denotes its rough type (scalar, array, & for routines, and later we'll learn about % for hashes), and if the second letter is not a letter, it specifies its scope. We call this second letter a twigil. So far we've seen * for dynamic variables, and ! for attributes. Stay tuned for more.

Then penultimate block of our example is this:

my $output = do {
    my $*OUT = OutputCapture.new;
    doublespeak(42);
    $*OUT.captured;
};

do { ... } just executes the code inside the curly braces and returns the value of the last statement. Like all code blocks in Perl 6, it also introduces a new lexical scope.

The new scope comes in handy in the next line, where my $*OUT declares a new dynamic variable $*OUT, which is however only valid in the scope of the block. It is initialized with OutputCapture.new, a new instance of the class declared earlier. new isn't magic, it's simply inherited from OutputCapture's superclass. We didn't declare one, but by default, classes get type Any as a superclass, which provides (among other things) the method new as a constructor.

The call to doublespeak calls say, which in turn calls $*OUT.print. And since $*OUT is an instance of OutputCapture in this dynamic scope, the string passed to say lands in OutputCapture's attribute @!lines, where $*OUT.captured can access it again.

The final line,

is $output, "4242\n", 'doublespeak works';

calls the is function from the Test module.

In good old testing tradition, this produces output in the TAP format:

1..1
ok 1 - doublespeak works

Summary

We've seen that say() uses a dynamically scoped variable, $*OUT, as its output file handle. For testing purposes, we can substitute that with an object of our making. Which made us stumble upon the first glimpses of how classes are written in Perl 6.

Subscribe to the Perl 6 book mailing list

* indicates required

Perl 6 Advent Calendar: Day 24 – Make It Snow

Published by ab5tract on 2016-12-24T13:14:02

Hello again, fellow sixers! Today I’d like to take the opportunity to highlight a little module of mine that has grown up in some significant ways this year. It’s called Terminal::Print and I’m suspecting you might already have a hint of what it can do just from the name. I’ve learned a lot from writing this module and I hope to share a few of my takeaways.

Concurrency is hard

Earlier in the year I decided to finally try to tackle multi-threading in Terminal::Print and… succeeded more or less, but rather miserably. I wrapped the access to the underlying grid (a two-dimensional array of Cell objects) in a react block and had change-cell and print-cell emit their respective actions on a Supply. The react block then handled these actions. Rather slowly, unfortunately.

Yet, there was hope. After jnthn++ fixed a constructor bug in OO::Monitors I was able to remove all the crufty hand-crafted handling code and instead ensure that method calls to the Terminal::Print::Grid object would only run in a single thread at any given time. (This is the class which holds the two-dimensional array mentioned before and was likewise the site of my react block experiment).

Here below are the necessary changes:

- unit class Terminal::Print::Grid;
+ use OO::Monitors;
+ unit monitor Terminal::Print::Grid;

This shift not only improved the readability and robustness of the code, it was significantly faster! Win! To me this is really an amazing dynamic of Perl 6. jnthn’s brilliant, twisted mind can write a meta-programming module that makes it dead simple for me to add concurrency guarantees at a specific layer of my library. My library in turn makes it dead simple to print from multiple threads at once on the screen! It’s whipuptitude enhancers all the the way down!

That said, our example today will not be writing from multiple threads. For some example code that utilizes async, I point you to examples/async.p6 and examples/matrix-ish.p6.

Widget Hero

Terminal::Print is really my first open source library in the sense that it is the first time that I have started my own module from scratch with the specific goal of filling a gap in a given language’s ecosystem. It is also the first time that I am not the sole contributor! I would be remiss not to mention japhb++ in this post, who has contributed a great deal in a relatively short amount of time.

In addition to all the performance related work and the introduction of a span-oriented printing mechanism, japhb’s work on widgets especially deserves its own post! For now let’s just say that it has been a real pleasure to see the codebase grow and extend even as I have been too distracted to do much good. My takeaway here is a new personal milestone in my participation in libre/open software (my first core contributor!) that reinforces all the positive dynamics it can have on a code base.

Oh, and I’ll just leave this here as a teaser of what the widget work has in store for us:

rpg-ui-p6

You can check it out in real-time and read the code at examples/rpg-ui.p6.

Snow on the Golf Course

Now you are probably wondering, where is the darn, snow! Well, here we go! The full code with syntax highlighting is available in examples/snowfall.p6. I will be going through it step by step below.

use Terminal::Print;

class Coord {
    has Int $.x is rw where * <= T.columns = 0;
    has Int $.y is rw where * <= T.rows = 0 ;
}

Here we import Terminal::Print. The library takes the position that when you import it somewhere, you are planning to print to the screen. To this end we export an instantiated Terminal::Print object into the importer’s lexical scope as T. This allows me to immediately start clarifying the x and y boundaries of our coordinate system based on run-time values derived from the current terminal window.

class Snowflake {
    has $.flake = ('❆','❅','❄').roll;
    has $.pos = Coord.new;
}

sub create-flake {
    state @cols = ^T.columns .pick(*); # shuffled
    if +@cols > 0 {
        my $rand-x = @cols.pop;
        my $start-pos = Coord.new: x => $rand-x;
        return Snowflake.new: pos => $start-pos;
    } else {
        @cols = ^T.columns .pick(*);
        return create-flake;
    }
}

Here we create an extremely simple Snowflake class. What is nice here is that we can leverage the default value of the $.flake attribute to always be random at construction time.

Then in create-flake we are composing a way to make sure we have hit every x coordinate as a starting point for the snowfall. Whenever create-flake gets called, we pop a random x coordinate out of the @cols state variable. The state variable enables this cool approach because we can manually fill @cols with a new randomized set of our available x coordinates once it is depleted.

draw( -> $promise {

start {
    my @flakes = create-flake() xx T.columns;
    my @falling;
    
    Promise.at(now + 33).then: { $promise.keep };
    loop {
        # how fast is the snowfall?
        sleep 0.1; 
    
        if (+@flakes) {
            # how heavy is the snowfall?
            my $limit = @flakes > 2 ?? 2            
                                    !! +@flakes;
            # can include 0, but then *cannot* exclude $limit!
            @falling.push: |(@flakes.pop xx (0..$limit).roll);  
        } else {
            @flakes = create-flake() xx T.columns;
        }
    
        for @falling.kv -> $idx, $flake {
            with $flake.pos.y -> $y {
                if $y > 0 {
                    T.print-cell: $flake.pos.x, ($flake.pos.y - 1), ' ';
                }

                if $y < T.rows {
                    T.print-cell: $flake.pos.x, $flake.pos.y, $flake.flake;            
                }

                try {
                    $flake.pos.y += 1;
                    CATCH {
                        # the flake has fallen all the way
                        # remove it and carry on!
                        @falling.splice($idx,1,());
                        .resume;
                    }
                }
            }
        }
    }
}

});

Let’s unpack that a bit, shall we?

So the first thing to explain is draw. This is a handy helper routine that is also imported into the current lexical scope. It takes as its single argument a block which accepts a Promise. The block should include a start block so that keeping the argument promise works as expected. The implementation of draw is shockingly simple.

So draw is really just short-hand for making sure the screen is set up and torn down properly. It leverages promises as (I’m told) a “conv-var” which according to the Promises spec might be an abuse of promises. I’m not very futzed about it, to be honest, since it suits my needs quite well.

This approach also makes it quite easy to create a “time limit” for the snowfall by scheduling a promise to be kept at now + 33 — thirty three seconds from when the loop starts. then we keep the promise and draw shuts down the screen for us. This makes “escape” logic for your screensavers quite easy to implement (note that SIGINT also restores your screen properly. The most basic exit strategy works as expected, too :) ).

The rest is pretty straightforward, though I’d point to the try block as a slightly clever (but not dangerously so) combination of where constraints on Coord‘s attributes and Perl 6’s resumable exceptions.

Make it snow!

And so, sixers, I bid you farewell for today with a little unconditional love from ab5tract’s curious little corner of the universe. Cheers!

snowfall-p6


Perl 6 Advent Calendar: Day 24 – One Year On

Published by liztormato on 2016-12-24T10:51:57

This time of year invites one to look back on things that have been, things that are and things that will be.

Have Been

I was reminded of things that have been when I got my new notebook a few weeks ago. Looking for a good first sticker to put on it, I came across an old ActiveState sticker:

If you don’t know Perl
you don’t know Dick

A sticker from 2000! It struck me that that sticker was as old as Perl 6. Only very few people now remember that a guy called Dick Hardt was actually the CEO of ActiveState at the time. So even though the pun may be lost on most due to the mists of time, the premise still rings true to me: that Perl is more about a state of mind, then about versions. There will always be another version of Perl. Those who don’t know Perl are doomed to re-implement it, poorly. Which, to me, is why so many ideas were borrowed from Perl. And still are!

Are

Where are we now? Is it the moment we know, We know, we know? I don’t think we are at twenty thousand people using Perl 6 just yet. But we’re keeping our fingers crossed. Just in case.

We are now 12 compiler releases after the initial Christmas release of Perl 6. In this year, many, many areas of Rakudo Perl 6 and MoarVM have dramatically improved in speed and stability. Our canary-in-the-coalmine test has dropped from around 14 seconds a year ago to around 5.5 seconds today. A complete spectest run is now about 3 minutes, whereas it was about 4.5 minutes at the beginning of the year, while about 4000 tests were added (from about 50K to 54K). And we now have 757 modules in the Perl 6 ecosystem (aka temporary CPAN for Perl 6 modules), with a few more added every week.

The #perl6 IRC channel has been too busy for me to follow consistently. But if you have a question related to Perl 6 and you want a quick answer, the #perl6 channel is the place to be. You don’t even have to install an IRC client: you can also use a browser to chat, or just follow “live” what is being said.

There are also quite a few useful bots on that channel: they e.g. take care of running a piece of Perl 6 code for you. Or find out at which commit the behaviour of a specific piece of code changed. These are very helpful for the developers of Perl 6, who usually also hang out on the #perl6-dev IRC channel. Which could be you! The past year, at least one contributor was added to the CREDITS every month!

Will Be

The coming year will see at least three Perl 6 books being published. First one will be Think Perl 6 – How To Think Like A Computer Scientist by Laurent Rosenfeld. It is an introduction to programming using Perl 6. But even for those of you with programming experience, it will be a good book to start learning Perl 6. And I can know. Because I’ve already read it :-)

Second one will be Learning Perl 6 by veteran Perl developer and writer brian d foy. It will have the advantage of being written by a seasoned writer going through the newbie experience that most people will have when coming from Perl 5.

The third one will be Perl 6 By Example by Moritz Lenz, which will, as the title already gives away, introduce Perl 6 topics by example.

There’ll be at least two (larger) Perl Conferences apart from many smaller Perl workshops: the The Perl Conference NA on 18-23 June, and the The Perl Conference in Amsterdam on 9-11 August. Where you will meet all sorts of nice people!

And for the rest? Expect a faster, leaner, Perl 6 and MoarVM compiler release on the 3rd Saturday every month. And an update of weekly events in the Perl 6 Weekly on every Monday evening/Tuesday morning (depending on where you live).


Perl 6 Advent Calendar: Day 23 – Everything is either wrong or less than awesome

Published by AlexDaniel on 2016-12-23T00:07:12

Have you ever spent your precious time on submitting a bug report for some project, only to get a response that you’re an idiot and you should f⊄∞÷ off?

Right! Well, perhaps consider spending your time on Perl 6 to see that not every free/open-source project is like this.

In the Perl 6 community, there is a very interesting attitude towards bug reports. Is it something that was defined explicitly early on? Or did it just grow organically? This remains to be a Christmas mystery. But the thing is, if it wasn’t for that, I wouldn’t be willing to submit all the bugs that I submitted over the last year (more than 100). You made me like this.

Every time someone submits a bug report, Perl 6 hackers always try to see if there is something that can done better. Yes, sometimes the bug report is invalid. But even if it is, is there any way to improve the situation? Perhaps a warning could be thrown? Well, if so, then we treat the behavior as LTA (Less Than Awesome), and therefore the bug report is actually valid! We just have to tweak it a little bit, meaning that the ticket will now be repurposed to improve or add the error message, not change the behavior of Perl 6.

The concept of LTA behavior is probably one of the key things that keeps us from rejecting features that may seem to do too little good for the amount of effort required to implement them, but in the end become game changers. Another closely related concept that comes to mind is “Torment the implementors on behalf of the users”.

OK, but what if this behavior is well-defined and is actually valid? In this case, it is still probably our fault. Why did the user get into this situation? Maybe the documentation is not good enough? Very often that is the issue, and we acknowledge that. So in a case of a problem with the documentation, we will usually ask you to submit a bug report for the documentation, but very often we will do it ourselves.

Alright, but what if the documentation for this particular case is in place? Well, maybe the thing is not easily searchable? That could be the reason why the user didn’t find it in the first place. Or maybe we lack some links? Maybe the places that should link to this important bit of information are not doing so? In other words, perhaps there are still ways to improve the docs!

But if not, then yes, we will have to write some tests for this particular case (if there are no tests yet) and reject the ticket. This happens sometimes.

The last bit, even if obvious to some, is still worth mentioning. We do not mark tickets resolved without tests. One reason is that we want roast (which is a Perl 6 spec) to be as full as possible. The other reason is that we don’t want regressions to happen (thanks captain obvious!). As the first version of Perl 6 was released one year ago, we are no longer making any changes that would affect the behavior of your code. However, occasional regressions do happen, but we have found an easy way to deal with those!

If you are not on #perl6 channel very often, you might not know that we have a couple of interesting bots. One of them is bisectable. In short, Bisectable performs a more user-friendly version of git bisect, but instead of building Rakudo on each commit, it has done it before you even asked it to! That is, it has over 5500 rakudo builds, one for every commit done in the last year and a half. This turns the time to run git bisect from minutes to about 10 seconds (Yes, 10 seconds is less than awesome! We are working on speeding it up!). And there are other bots that help us inspect the progress. The most recent one is Statisfiable, here is one of the graphs it can produce.

So if you pop up on #perl6 with a problem that seems to be a regression, we will be able to find the cause in seconds. Fixing the issue will usually take a bit more than that though, but when the problem is significant, it will usually happen in a day or two. Sorry for breaking your code in attempts to make it faster, we will do better next time!

But as you are reading this, perhaps you may be interested in seeing some bug reports? I thought that I’d go through the list of bugs of the last year to show how horribly broken things were, just to motivate the reader to go hunting for bugs. The bad news (oops, good news I mean), it seems that the number of “horrible” bugs is decreasing a bit too fast. Thanks to many Rakudo hackers, things are getting more stable at a very rapid pace.

Anyway, there are still some interesting things I was able to dig up:

That being said, my favorite bug of all times is RT #127473. Three symbols in the source code causing it to go into an infinite loop printing stuff about QAST nodes. That’s a rather unique issue, don’t you think?

I hope this post gave you a little insight on how we approach bugs, especially if you are not hanging around on #perl6 very often. Is our approach less than awesome? Do you have some ideas for other bots that could help us work with bugs? Leave it in the comments, we would like to know!


Perl 6 Advent Calendar: Day 22 – Generative Testing

Published by SmokeMachine on 2016-12-22T00:00:47

OK! So say you finished writing your code and it’s looking good. Let’s say it’s this incredible sum function:

module Sum {
   sub sum($a, $bis export {
      $a + $b
   }
}

Great, isn’t it?! Let’s use it:

use Sum;
say sum 2, 3; # 5

That worked! We summed the number 2 with the number 3 as you saw. If you carefully read the function you’ll see the variables $a and $b haven’t a type set. If you don’t type a variable it’s, by default, of type Any. 2 and 3 are Ints… Ints are Any. So that’s OK! But do you know what’s Any too? Str (just a example)!

Let’s try using strings?

use Sum;
say sum "bla", "ble";

We got a big error:

Cannot convert string to number: base-10 number must begin with valid digits or '.' in 'bla' (indicated by ⏏)
  in sub sum at sum.p6 line 1
  in block  at sum.p6 line 7

Actually thrown at:
  in sub sum at sum.p6 line 1
  in block  at sum.p6 line 7

Looks like it does not accept Strs… It seems like Any may not be the best type to use in this case.

Worrying about every possible input type for all our functions can prove to demand way too much work, as well as still being prone to human error. Thankfully there’s a module to help us with that! Test::Fuzz is a perl6 module that implements the “technique” of generative testing/fuzz testing.

Generative testing or Fuzz Testing is a technique of generating random/extreme data and using this data to call the function being tested.

Test::Fuzz gets the signature of your functions and decides what generators it should use to test it. After that it runs your functions giving it (100, by default) different arguments and testing if it will break.

To test our function, all that’s required is:

module Sum {
   use Test::Fuzz;
   sub sum($a, $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}

And run:

perl6 Sum.pm6 --fuzz

This case will still show a lot of errors:

Use of uninitialized value of type Thread in numeric context
  in sub sum at Sum.pm6 line 4
Use of uninitialized value of type int in numeric context
  in sub sum at Sum.pm6 line 4
    ok 1 - sum(Thread, int)
Use of uninitialized value of type X::IO::Symlink in numeric context
  in sub sum at Sum.pm6 line 4
    ok 2 - sum(X::IO::Symlink, -3222031972)
Use of uninitialized value of type X::Attribute::Package in numeric context
  in sub sum at Sum.pm6 line 4
    ok 3 - sum(X::Attribute::Package, -9999999999)
Use of uninitialized value of type Routine in numeric context
  in sub sum at Sum.pm6 line 4
    not ok 4 - sum(áéíóú, (Routine))
...

What does that mean?

That means we should use one of the big features of perl6: Gradual typing. $a and $b should have types.

So, let’s modify the function and test again:

module Sum {
   use Test::Fuzz;
   sub sum(Int $a, Int $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}
    ok 1 - sum(-2991774675, 0)
    ok 2 - sum(5471569889, 7905158424)
    ok 3 - sum(8930867907, 5132583935)
    ok 4 - sum(-6390728076, -1)
    ok 5 - sum(-3558165707, 4067089440)
    ok 6 - sum(-8930867907, -5471569889)
    ok 7 - sum(3090653502, -2099633631)
    ok 8 - sum(-2255887318, 1517560219)
    ok 9 - sum(-6085119010, -3942121686)
    ok 10 - sum(-7059342689, 8930867907)
    ok 11 - sum(-2244597851, -6390728076)
    ok 12 - sum(-5948408450, 2244597851)
    ok 13 - sum(0, -5062049498)
    ok 14 - sum(-7229942697, 3090653502)
    not ok 15 - sum((Int), 1)

    # Failed test 'sum((Int), 1)'
    # at site#sources/FB587F3186E6B6BDDB9F5C5F8E73C55195B73C86 (Test::Fuzz) line 62
    # Invocant requires an instance of type Int, but a type object was passed.  Did you forget a .new?
...

A lot of OKs!  \o/

But there’re still some errors… We can’t sum undefined values…

We didn’t say the attributes should be defined (with :D). So Test::Fuzz generated every undefined sub-type of Int that it could find. It uses every generator of a sub-type of Int to generate values. It also works if you use a subset or even if you use a where in your signature. It’ll use a super-type generator and grep the valid values.

So, let’s change it again!

module Sum {
   use Test::Fuzz;
   sub sum(Int:D $a, Int:D $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}
    ok 1 - sum(6023702597, -8270141809)
    ok 2 - sum(-8270141809, -3762529280)
    ok 3 - sum(242796759, -7408209799)
    ok 4 - sum(-5813412117, -5280261945)
    ok 5 - sum(2623325683, 2015644992)
    ok 6 - sum(-696696815, -7039670011)
    ok 7 - sum(1, -4327620877)
    ok 8 - sum(-7712774875, 349132637)
    ok 9 - sum(3956553645, -7039670011)
    ok 10 - sum(-8554836757, 7039670011)
    ok 11 - sum(1170220615, -3)
    ok 12 - sum(-242796759, 2015644992)
    ok 13 - sum(-9558159978, -8442233570)
    ok 14 - sum(-3937367230, 349132637)
    ok 15 - sum(5813412117, 1170220615)
    ok 16 - sum(-7408209799, 6565554452)
    ok 17 - sum(2474679799, -3099404826)
    ok 18 - sum(-5813412117, 9524548586)
    ok 19 - sum(-6770230387, -7408209799)
    ok 20 - sum(-7712774875, -2015644992)
    ok 21 - sum(8442233570, -1)
    ok 22 - sum(-6565554452, 9999999999)
    ok 23 - sum(242796759, 5719635608)
    ok 24 - sum(-7712774875, 7039670011)
    ok 25 - sum(7408209799, -8235752818)
    ok 26 - sum(5719635608, -8518891049)
    ok 27 - sum(8518891049, -242796759)
    ok 28 - sum(-2474679799, 2299757592)
    ok 29 - sum(5356064609, 349132637)
    ok 30 - sum(-3491438968, 3438417115)
    ok 31 - sum(-2299757592, 7580671928)
    ok 32 - sum(-8104597621, -8158438801)
    ok 33 - sum(-2015644992, -3)
    ok 34 - sum(-6023702597, 8104597621)
    ok 35 - sum(2474679799, -2623325683)
    ok 36 - sum(8270141809, 7039670011)
    ok 37 - sum(-1534092807, -8518891049)
    ok 38 - sum(3551099668, 0)
    ok 39 - sum(7039670011, 4327620877)
    ok 40 - sum(9524548586, -8235752818)
    ok 41 - sum(6151880628, 3762529280)
    ok 42 - sum(-8518891049, 349132637)
    ok 43 - sum(7580671928, 9999999999)
    ok 44 - sum(-8235752818, -7645883481)
    ok 45 - sum(6460424228, 9999999999)
    ok 46 - sum(7039670011, -7788162753)
    ok 47 - sum(-9999999999, 5356064609)
    ok 48 - sum(8510706378, -2474679799)
    ok 49 - sum(242796759, -5813412117)
    ok 50 - sum(-3438417115, 9558159978)
    ok 51 - sum(8554836757, -7788162753)
    ok 52 - sum(-9999999999, 3956553645)
    ok 53 - sum(-6460424228, -8442233570)
    ok 54 - sum(7039670011, -7712774875)
    ok 55 - sum(-3956553645, 1577669672)
    ok 56 - sum(0, 9524548586)
    ok 57 - sum(242796759, -6151880628)
    ok 58 - sum(7580671928, 3937367230)
    ok 59 - sum(-8554836757, 7712774875)
    ok 60 - sum(9524548586, 2474679799)
    ok 61 - sum(-7712774875, 2450227203)
    ok 62 - sum(3, 1257247905)
    ok 63 - sum(8270141809, -2015644992)
    ok 64 - sum(242796759, -3937367230)
    ok 65 - sum(6770230387, -6023702597)
    ok 66 - sum(2623325683, -3937367230)
    ok 67 - sum(-5719635608, -7645883481)
    ok 68 - sum(1, 6770230387)
    ok 69 - sum(3937367230, 7712774875)
    ok 70 - sum(6565554452, -5813412117)
    ok 71 - sum(7039670011, -8104597621)
    ok 72 - sum(7645883481, 9558159978)
    ok 73 - sum(-6023702597, 6770230387)
    ok 74 - sum(-3956553645, -7788162753)
    ok 75 - sum(-7712774875, 8518891049)
    ok 76 - sum(-6770230387, 6565554452)
    ok 77 - sum(-8554836757, 5356064609)
    ok 78 - sum(6460424228, 8518891049)
    ok 79 - sum(-3438417115, -9999999999)
    ok 80 - sum(-1577669672, -1257247905)
    ok 81 - sum(-5813412117, -3099404826)
    ok 82 - sum(8158438801, -3551099668)
    ok 83 - sum(-8554836757, 1534092807)
    ok 84 - sum(6565554452, -5719635608)
    ok 85 - sum(-5813412117, -2623325683)
    ok 86 - sum(-8158438801, -3937367230)
    ok 87 - sum(5813412117, -46698532)
    ok 88 - sum(9524548586, -2474679799)
    ok 89 - sum(3762529280, -2474679799)
    ok 90 - sum(7788162753, 9558159978)
    ok 91 - sum(6770230387, -46698532)
    ok 92 - sum(1577669672, 6460424228)
    ok 93 - sum(4327620877, 3762529280)
    ok 94 - sum(-6023702597, -2299757592)
    ok 95 - sum(1257247905, -8518891049)
    ok 96 - sum(-8235752818, -6151880628)
    ok 97 - sum(1577669672, 7408209799)
    ok 98 - sum(349132637, 6770230387)
    ok 99 - sum(-7788162753, 46698532)
    ok 100 - sum(-7408209799, 0)
    1..100
ok 1 - sum

No errors!!!

Currently Test::Fuzz only implement generators for Int and Str, but as I said, it will be used for its super and sub classes. If you want to have generators for your custom class, you just need to implement a “static” method called generate-samples that returns sample instances of your class, infinite number of instances if possible.

Test::Fuzz is under development and isn’t in perl6 ecosystem yet. And we’re needing some help!

EDITED: New now you can only call run-tests()


Death by Perl6: Adding on to Channels and Supplies in Perl6

Published by Tony O'Dell on 2016-12-21T16:11:13

Channels and supplies are perl6's way of implementing the Oberserver pattern. There's some significant differences behind the scenes of the two but both can be used to implement a jQuery.on("event" like experience for the user. Not a jQuery fan? Don't you worry your pretty little head because this is perl6 and it's much more fun than whatever you thought.

Why?

Uhh, why do we want this?

This adds some sugar to the basic reactive constructs and it makes the passing of messages a lot more friendly, readable, and manageable.

What in Heck Does that Look Like?

Let's have an example and then we'll dissect it.

A Basic Example

use Event::Emitter;  
my Event::Emitter $e .= new;

$e.on(/^^ .+ $$/, -> $data {
  # you can operate on $data here
  '  regex matches'.say;
});

$e.on({ True; }, -> $data {
  '  block matches'.say;
});

$e.on('event', -> $data {
  '  string matches'.say;
});

'event'.say;  
$e.emit("event", { });

'empty event'.say;  
$e.emit("", { });

'abc'.say;  
$e.emit("abc", { });

Output * this is the output for an emitter using Supply, more on this later

event  
  regex matches
  block matches
  string matches
empty event  
  block matches
abc  
  regex matches
  block matches

Okay, that looks like a lot. It is, and it's much nicer to use than a large given/when combination. It also reduces indenting, so you have that going for you, which is nice.

Let's start with the simple .on blocks we have.

  $e.on(/^^ .+ $$/, -> $data { ...

This is telling the emitter handler that whenever an event is received, run that regular expression against it and if it matches, execute the block (passed in as the second argument). As a note, and illustrated in the example above, the handler can match against a Callable, Str, or Regex. The Callable must return True or False to let the handler know whether or not to execute the block.

If that seems pretty basic, it is. But little things like this add up over time and help keep things manageable. Prepare yourself for more convenience.

The Sugar

Do you want ants? This is how you get ants.

So, now we're looking for more value in something like this. Here it is: you can inherit from the Event::Emitter::Role::Template (or roll your own) and then your classes will automatically inherit these on events.

Example
use Event::Emitter::Role::Template;

class ZefClass does Event::Emitter::Role::Template {  
  submethod TWEAK {
    $!event-emitter.on("fresh", -> $data {
      'Aint that the freshness'.say;
    });
  }
}

Then, further along in your application, whenever an object wants ZefClass to react to the 'fresh' event, all it needs to do is:

$zef-class-instance.emit('fresh');

Pretty damn cool.

Development time is reduced significantly for a few reasons right off the bat:

  1. Implementing Supplier (or Channel) methods, setup, and event handling becomes unnecessary
  2. Event naming or matching is handled so it's easy to debug
  3. Handling or adding new event handling functions during runtime (imagine a plugin that may want to add more events to handle - like an IRC client that implements a handler for channel parting messages)
  4. Messages can be multiplexed through one Channel or Supply rather easily
  5. Creates more readable code

That last reason is a big one. Imagine going back into one of your modules 2 years from now and debugging an issue where a Supplier is given an event and some data and digging through that 600 lines of given/when.

Worse, imagine debugging someone else's.

A Quick Note on Channel vs Supply

The Channel and Supply thing can take some getting used to for newcomers. The quick and dirty is that a Channel will distribute the event to only one listener (chosen by the scheduler) and order isn't guaranteed while a Supply will distribute to all listeners and the order of the messages are distributed in the order received. Because the Event::Emitter Channel based handler executes the methods registered with it directly, when it receives a message all of your methods are called with the data.

So, you've seen the example above as a Supply based event handler, check it out as a Channel based and note the difference in .say and the instantiation of the event handler.

use Event::Emitter;  
my Event::Emitter $e .= new(:threaded); # !important - this signifies a Channel based E:E

$e.on(/^^ .+ $$/, -> $data {
  # you can operate on $data here
  "  regex matches: $data".say;
});

$e.on({ True; }, -> $data {
  "  block matches: $data".say;
});

$e.on('event', -> $data {
  "  string matches: $data".say;
});

'event'.say;  
$e.emit("event", "event");

'empty event'.say;  
$e.emit("", "empty event");

'abc'.say;  
$e.emit("abc", "abc");

Output

event  
empty event  
abc  
  regex matches: event
  block matches: event
  string matches: event
  block matches: empty event
  regex matches: abc
  block matches: abc

Perl 6 Advent Calendar: Day 21 – Show me the data!

Published by nadimkhemir on 2016-12-21T00:01:18

Over the years, I have enjoyed using the different data dumpers that Perl5 offers. From the basic Data::Dumper to modules dumping in hexadecimal, JSON, with colors, handling closures, with a GUI, as graphs via dot and many other that fellow module developers have posted on CPAN (https://metacpan.org/search?q=data+dump&search_type=modules).

I always find things easier to understand when I can see data and relationships. The funkiest display belonging to ddd (https://www.gnu.org/software/ddd/) that I happen to fire up now and then just for the fun (in the example showing C data but it works as well with the Perl debugger).

ddd

Many dumpers are geared towards data transformation and data transmission/storage. A few modules specialize in generating output for the end user to read; I have worked on system that generated hundreds of thousands lines of output and it is close to impossible to read dumps generated by, say, Data::Dumper.

When I started using Perl6, I immediately felt the need to dump data structures (mainly because my noob code wasn’t doing what I expected it to do); This led me to port my Perl5 module (https://metacpan.org/pod/Data::TreeDumper  https://github.com/nkh/P6-Data-Dump-Tree) to Perl6. I am now also thinking about porting my HexDump module. I recommend warmly learning Perl6 by porting your modules (if you have any on CPAN), it’s fun, educative, useful for the Perl6 community, and your modules implement a need in a domain that you master leaving you time to concentrate on the Perl6.

My Perl5 module was ripe for a re-write and I wanted to see if and how it would be better if written in Perl6, I was not disappointed.

Perl6 is a big language, it takes time to get the pieces right, for a beginner it may seem daunting, even if one has years of experience, the secret is to take it easy, not give up and listen. Porting a module is the perfect exercise, you can take it easy because you have already done it before, you’re not going to give up because you know you can do it, and you have time to listen to people that have more experience (they also need your work), the Perl6 community has been examplary, helpful, patient, supportive and always present; if you haven visited #perl6 irc channel yet, now is a good time.

.perl

Every object in Perl6 has a ‘perl’ method, it can be used to dump the object and objects under it. The official documentation (https://docs.perl6.org/language/5to6-nutshell#Data%3A%3ADumper) provides a good example.

.gist

Every object also inherits a ‘gist’ method from Mu, the official documentation (https://docs.perl6.org/routine/gist#(Mu)_routine_gist) states: “Returns a string representation of the invocant, optimized for fast recognition by humans.”

dd, the micro dumper

It took me a while to discover this one, I saw that in a post on IRC. You know how it feel when you discover something simple after typing .perl and .gist a few hundred times, bahhh!

https://docs.perl6.org/routine/dd

The three dumpers above are built-in. They are also the fastest way to dump data but as much as their output is welcome, I know that it is possible to present data in a more legible way.

Enter Data::Dump

You can find the module on https://modules.perl6.org/ where all the Perl6 modules are. Perl6 modules link to repositories, Data::Dump source is on https://github.com/tony-o/perl6-data-dump.

Data::dump introduces color, depth limitation, and type specific dumps. The code is a compact hundred lines that is quite easy to understand. This module was quite helpful for a few cases that I had. It also dumps all the methods associated with objects. Unfortunately, it did fail on a few types of objects. Give it a try.

Data::Dump::Tree

Emboldened by the Perl6 community, the fact that I really needed a Dumper for visualization, and the experience from my Perl5 module (mainly the things that I wanted to be done differently) I started working on the module. I had some difficulties at the beginning, I knew nothing about the details of Perl6 and even if there is a resemblance with Perl5, it’s another beast. But I love it, it’s advanced, clean, and well designed, I am grateful for all the efforts that where invested in Perl6.

P6 vs P5 implementation

It’s less than half the size and does as much, which makes it clearer (as much as my newbie code can be considered clean). The old code was one monolithic module with a few long functions, the new code has a better organisation and some functionality was split out to extra modules. It may sound like bit-rot (and it probably is a little) but writing the new code in Perl6 made the changes possible, multi dispatch, traits and other built-in mechanism greatly facilitate the re-factoring.

What does it do that the other modules don’t?

I’ll only talk about a few points here and refer you to the documentation for all the details (https://raw.githubusercontent.com/nkh/P6-Data-Dump-Tree/master/lib/Data/Dump/Tree.pod); also have a look at the examples in the distribution.

The main goal for Data::Dump::Tree is readability, that is achieved with filter, type specific dumpers, colors, and dumper specialization via traits. In the examples directory, you can find JSON_parsed.pl which parses 20 lines of JSON by JSON::Tiny(https://github.com/moritz/json),. I’ll use it as an example below. The parsed data is dumped with .perl,  .gist , Data::Dump, and Data::Dump::Tree

.perl output (500 lines, unusable for any average human, Gods can manage)screenshot_20161219_185724

.gist (400 lines, quite readable, no color and long lines limit the readability a bit). Also note that it looks better here than on my terminal who has problems handling unicode properly.screenshot_20161219_190004

Data::Dump (4200 lines!, removing the methods would probably make it usable)screenshot_20161219_190439

The methods dump does not help.screenshot_20161219_190601

Data::Dump::Tree (100 lines, and you are the judge for readability as I am biased). Of course, Data::Dump::Tree is designed for this specific usage, first it understands Match objects, second it can display only part of the string that are matched, which greatly reduces the noise.
screenshot_20161219_190932

Tweeking output

The options are explained in the documentation but here is a little list
– Defining type specific dumper
screenshot_20161219_185409

– filtering to remove data or add a representation for a data set;  below the data structure is dumped as it is and then filtered (a filter that shows what it is doing).

As filtering happens on the “header” and “footer” is should be easy to make a HTML/DHTML plugin; Althoug bcat (https://rtomayko.github.io/bcat/), when using ASCII glyphs, works fine.

screenshot_20161219_191525
– set the display colors
– change the glyphs
– display address information or not
– use subscripts for indexes
– use ASCII, ANSI, or unicode for the glyphs

Diffs

I tried to implement a diff display with the Perl5 module but failed miserably as it needed architectural changes, The Perl6 version was much easier, in fact, it’s an add-on, a trait, that synchronizes two data dumps. This could be used in tests to show differences between expected and gotten data.

screenshot_20161219_184701
Of course we can eliminate the extra glyphs and the data that is equivalent (I also changed the glyph types to ASCII)screenshot_20161219_185035

From here

Above anything else, I hope many authors will start writing Perl6 modules. And I also hope to see other data dumping modules. As for Data::Dump::Tree, as it gathers more users, I hope to get requests for change, patches, and error reports.


Weekly changes in and around Perl 6: 2016.51 Flowing Towards Christmas

Published by liztormato on 2016-12-19T21:42:56

This weekend saw the Rakudo 2016.12 compiler release. This will be the last release that has a bug in the module loading logic. This bug makes a module available globally, when it shouldn’t. As Zoffix explains in this announcement. So if you’re a module developer, please make sure that your module keeps working the coming month before the next release, which will be on the 21 January 2017. On behalf of the Perl 6 core developers, thank you!

Adventing Along Some More

Some real cool advent posts this week (again)!

And now only 5 more to go 😦

Learning Perl 6 Kickstarted Succeeded!

We have liftoff! With 563 backers and $40,404 in pledges before the deadline, the project has been brought to life. And here are the last Quick Tips that brian d foy promised us:

Other Blog Posts

Not so many, but still a few:

Core Developments

Ecosystem Additions

Another nice batch, some of them from the Perl 6 Advent Calendar.

Winding Down

The dark days before Christmas are almost over. See you again, next week, JAC (Just After Christmas)!


Perlgeek.de: Perl 6 By Example: Testing the Timestamp Converter

Published by Moritz Lenz on 2016-12-17T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


In the previous installment, we've seen some code go through several iterations of refactoring. Refactoring without automated tests tends to make me uneasy, so I actually had a small shell script that called the script under development with several different argument combinations and compared it to an expected result.

Let's now look at a way to write test code in Perl 6 itself.

As a reminder, this is what the code looked like when we left it:

#!/usr/bin/env perl6

#| Convert timestamp to ISO date
multi sub MAIN(Int \timestamp) {
    sub formatter($_) {
        sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                .year, .month,  .day,
                .hour, .minute, .second,
    }
    given DateTime.new(+timestamp, :&formatter) {
        when .Date.DateTime == $_ { say .Date }
        default { .say }
    }
}

#| Convert ISO date to timestamp
multi sub MAIN(Str $date where { try Date.new($_) }, Str $time?) {
    my $d = Date.new($date);
    if $time {
        my ( $hour, $minute, $second ) = $time.split(':');
        say DateTime.new(date => $d, :$hour, :$minute, :$second).posix;
    }
    else {
        say $d.DateTime.posix;
    }
}

In the Perl community it's common to move logic into modules to make it easier to test with external test scripts. In Perl 6, that's still common, but for small tools such as this, I prefer to stick with a single file containing code and tests, and to run the tests via a separate test command.

To make testing easier, let's first separate I/O from the application logic:

#!/usr/bin/env perl6

sub from-timestamp(Int \timestamp) {
    sub formatter($_) {
        sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                .year, .month,  .day,
                .hour, .minute, .second,
    }
    given DateTime.new(+timestamp, :&formatter) {
        when .Date.DateTime == $_ { return .Date }
        default { return $_ }
    }
}

sub from-date-string(Str $date, Str $time?) {
    my $d = Date.new($date);
    if $time {
        my ( $hour, $minute, $second ) = $time.split(':');
        return DateTime.new(date => $d, :$hour, :$minute, :$second);
    }
    else {
        return $d.DateTime;
    }
}

#| Convert timestamp to ISO date
multi sub MAIN(Int \timestamp) {
    say from-timestamp(+timestamp);
}

#| Convert ISO date to timestamp
multi sub MAIN(Str $date where { try Date.new($_) }, Str $time?) {
    say from-date-string($date, $time).posix;
}

With this small refactoring out of the way, let's add some tests:

#| Run internal tests
multi sub MAIN('test') {
    use Test;
    plan 4;
    is-deeply from-timestamp(1450915200), Date.new('2015-12-24'),
        'Timestamp to Date';;

    my $dt = from-timestamp(1450915201);
    is $dt, "2015-12-24 00:00:01",
        'Timestamp to DateTime with string formatting';

    is from-date-string('2015-12-24').posix, 1450915200,
        'from-date-string, one argument';
    is from-date-string('2015-12-24', '00:00:01').posix, 1450915201,
        'from-date-string, two arguments';
}

And you can run it:

./autotime test
1..4
ok 1 - Timestamp to Date
ok 2 - Timestamp to DateTime with string formatting
ok 3 - from-date-string, one argument
ok 4 - from-date-string, two arguments

The output format is that of the Test Anything Protocol (TAP), which is the de facto standard in the Perl community, but is now also used in other communities. For larger output strings it is a good idea to run the tests through a test harness. For our four lines of test output, this isn't yet necessary, but if you want to do that anyway, you can use the prove program that's shipped with Perl 5:

$ prove -e "" "./autotime test"
./autotime-tested.p6 test .. ok
All tests successful.
Files=1, Tests=4,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.23 cusr  0.02 csys =  0.28 CPU)
Result: PASS

In a terminal, this even colors the "All tests successful" output in green, to make it easier to spot. Test failures are marked up in red.

How does the testing work? The first line of code uses a new feature we haven't seen yet:

multi sub MAIN('test') {

What's that, a literal instead of a parameter in the subroutine signature? That's right. And it's a shortcut for

multi sub MAIN(Str $anon where {$anon eq 'test'}) {

except that it does not declare the variable $anon. So it's a multi candidate that you can only call by supplying the string 'test' as the sole argument.

The next line, use Test;, loads the test module that's shipped with Rakudo Perl 6. It also imports into the current lexical scope all the symbols that Test exports by default. This includes the functions plan, is and is-deeply that are used later on.

plan 4 declares that we want to run four tests. This is useful for detecting unplanned, early exits from the test code, or errors in looping logic in the test code that leads to running fewer tests than planned. If you can't be bothered to count your tests in advance, you can leave out the plan call, and instead call done-testing after your tests are done.

Both is-deeply and is expect the value to be tested as the first argument, the expected value as the second argument, and an optional test label string as the third argument. The difference is that is() compares the first two arguments as strings, whereas is-deeply uses a deep equality comparison logic using the eqv operator. Such tests only pass if the two arguments are of the same type, and recursively are (or contain) the same values.

More testing functions are available, like ok(), which succeeds for a true argument, and nok(), which expects a false argument. You can also nest tests with subtest:

#| Run internal tests
multi sub MAIN('test') {
    use Test;
    plan 2;
    subtest 'timestamp', {
        plan 2;
        is-deeply from-timestamp(1450915200), Date.new('2015-12-24'),
            'Date';;

        my $dt = from-timestamp(1450915201);
        is $dt, "2015-12-24 00:00:01",
            'DateTime with string formatting';
    };

    subtest 'from-date-string', {
        plan 2;
        is from-date-string('2015-12-24').posix, 1450915200,
            'one argument';
        is from-date-string('2015-12-24', '00:00:01').posix, 1450915201,
            'two arguments';
    };
}

Each call to subtest counts as a single test to the outer test run, so plan 4; has become plan 2;. The subtest call has a test label itself, and then inside a subtest, you have a plan again, and calls to test functions as below. This is very useful when writing custom test functions that execute a variable number of individual tests.

The output from the nested tests looks like this:

1..2
    1..2
    ok 1 - Date
    ok 2 - DateTime with string formatting
ok 1 - timestamp
    1..2
    ok 1 - one argument
    ok 2 - two arguments
ok 2 - from-date-string

The test harness now reports just the two top-level tests as the number of run (and passed) tests.

And yes, you can nest subtests within subtests, should you really feel the urge to do so.

Subscribe to the Perl 6 book mailing list

* indicates required

rakudo.org: Advance Notice: Lexical Module Loading Bug Fix

Published by Zoffix Znet on 2016-12-17T15:06:40

Please note that the NEXT (January 2017) release of the Rakudo Compiler will include a fix for a bug some may be relying on under the assumption of it being a feature. For details, please see the explanation below. If you have any further questions, please ask us on our irc.freenode.net/#perl6 IRC channel.

What

Perl 6 takes great care to avoid global state, i.e. whatever you do in your module, it should not affect other code. That’s why e.g. subroutine definitions are lexically (my) scoped by default. If you want others to see them, you need to explicitly make them our scoped or export them.

Classes are exported by default on the assumption that loading a module will not be of much use when you cannot access the classes it contains. This works as advertised with a small but important caveat. Those classes are not only visible in the computation unit that loads the module, but globally. This means that as soon as some code loads a module, those classes are immediately visible everywhere.

For example, given a module Foo:

unit class Foo;
use Bar;

And your own program:

use Foo;
my $foo = Foo.new; # works as expected
my $bar = Bar.new; # huh!? Where is Bar coming from?

Why

This doesn’t sound so bad (it at least saves you some typing), except for that it makes another feature of Perl 6 impossible to have: the ability to load multiple versions of a module at the same time in different parts of your program:

{
    use Baz:ver(v1);
    my $old-baz = Baz.new;
}
{
    use Baz:ver(v2);
    my $shiny-new-baz = Baz.new;
}

This will explode as on loading Baz:ver(v2), rakudo will complain about “Baz” already being defined.

What To Do

To fix this, we no longer register loaded classes globally but only in the scope which loaded them in the first place. Coming back to our first example, we would need to explicitly load Bar in the main program:

use Foo;
use Bar;
my $foo = Foo.new; # still works of course
my $bar = Bar.new; # now it's clear where Bar is coming from

So if you suddenly get an “Undeclared name: Bar” error message after upgrading to a newer Perl 6 compiler, you will most probably just need to add a: “use Bar;” to your code.

Pawel bbkr Pabian: Let the fake times roll...

Published by Pawel bbkr Pabian on 2016-12-14T17:59:11

In my $dayjob at GetResponse I have to deal constantly with time dependent features. For example this email marketing platform allows you to use something called 'Time Travel', which is sending messages to your contacts at desired hour in their time zones. So people around the world can get email at 8:00, when they start their work and chance for those messages message being read are highest. No matter where they live.

But even such simple feature has more pitfalls that you can imagine. For example user has three contacts living in Europe/Warsaw, America/Phoenix and Australia/Sydney time zones.

The obvious validation is to exclude nonexistent days, for example user cannot select 2017-02-29 because 2017 is not a leap year. But what if he wants to send message at 2017-03-26 02:30:00? For America/Phoenix this is piece of cake - just 7 hours difference from UTC (or unix time). For Australia/Sydney things are bit more complicated because they use daylight saving time and this is their summer so additional time shift must be calculated. And for Europe/Warsaw this will fail miserably because they are just changing to summer time from 01:59:00 to 03:00:00 and 02:30 simply does not exist therefore some fallback algorithm should be used.

So for one date and time there are 3 different algorithms that have to be tested!
Unfortunately most of the time dependent code does not expose any interface to pass current time to emulate all edge cases, methods usually call time( ) or DateTime.now( ) internally. So let's test such blackbox - it takes desired date, time and time zone and it returns how many seconds are left before message should be sent.

package Timers;

use DateTime;

sub seconds_till_send {

my $when = DateTime->new( @_ )->epoch( );
my $now = time( );

return ( $when > $now ) ? $when - $now : 0;
}

Output of this method changes in time. To test it in consistent manner we must override system time( ) call:

#!/usr/bin/env perl

use strict;
use warnings;

BEGIN {
*CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
}

use Timers;
use Test::More;

# 2017-03-22 00:00:00 UTC
$::time_mock = 1490140800;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' =>30,
'time_zone' => 'America/Phoenix'
), 379800, 'America/Phoenix time zone';

Works like a charm! We have consistent test that pretends our program is ran at 2017-03-22 00:00:00 UTC and that means there are 4 days, 9 hours and 30 minutes till 2017-03-26 02:30:00 in America Phoenix.

We can also test DST case in Australia.

# 2017-03-25 16:00:00 UTC
$::time_mock = 1490457600;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' =>30,
'time_zone' => 'Australia/Sydney'
), 0, 'America/Phoenix time zone';

Because during DST Sydney has +11 hours from UTC instead of 10 that means when we run our program at 2017-03-25 16:00:00 UTC requested hour already passed there and message should be sent instantly. Great!

But what about nonexistent hour in Europe/Warsaw? We need to fix this method to return some useful values in DWIM-ness spirit instead of crashing. And I haven't told you whole, scarry truth yet, because we have to solve two issues at once here. First is nonexistent hour - in this case we want to calculate seconds to nearest possible hour after requested one - so 03:00 Europe/Warsaw should be used if 02:30 Europe/Warsaw does not exist. Second is ambiguous hour that happens when clocks are moved backwards and for example 2017-10-29 02:30 Europe/Warsaw occurs twice during this day - in this case first hour occurrence should be taken - so if 02:30 Europe/Warsaw is both at 00:30 UTC and 01:30 UTC seconds are calculated to the former one. Yuck...

For simplicity let's assume user cannot schedule message more than one year ahead, so only one time change related to DST will take place. With that assumption fix may look like this:

sub seconds_till_send {
    my %params = @_;
    my $when;

# expect ambiguous hour during summer to winter time change
if (DateTime->now( 'time_zone' => $params{'time_zone'} )->is_dst) {

# attempt to create ambiguous hour is safe
# and will always point to latest hour
$when = DateTime->new( %params );

# was the same hour one hour ago?
my $tmp = $when->clone;
$tmp->subtract( 'hours' => 1 );

# if so, correct to earliest hour
if ($when->hms eq $tmp->hms) {
$when = $when->epoch - 3600;
}
else {
$when = $when->epoch;
}
}

# expect nonexistent hour during winter to summer time change
else {

do {

# attempt to create nonexistent hour will die
$when = eval { DateTime->new( %params )->epoch( ) };

# try next minute maybe...
if ( ++$params{'minute'} > 59 ) {
$params{'minute'} = 0;
$params{'hour'}++;
}

} until defined $when;

}

my $now = time( );

return ( $when > $now ) ? $when - $now : 0;
}

If your eyes are bleeding here is TL;DR. First we have to determine which case we may encounter by checking if there is currently DST in requested time zone or not. For nonexistent hour we try to brute force it into next possible time by adding one minute periods and adjusting hours when minutes overflow. There is no need to adjust days because DST never happens on date change. For ambiguous hour we check if by subtracting one hour we get the same hour (yep). If so we have to correct unix timestamp to get earliest one.

But what about our tests? Can we still write it in deterministic and reproducible way? Luckily it occurs that DateTime->now( ) uses time( ) internally so no additional hacks are needed.

# 2017-03-26 00:00:00 UTC
$::time_mock = 1490486400;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' => 30,
'time_zone' => 'Europe/Warsaw'
), 3600, 'Europe/Warsaw time zone nonexistent hour';

Which is expected result, 02:30 is not available in Europe/Warsaw so 03:00 is taken that is already in DST season and 2 hours ahead of UTC.

Now let's solve leap seconds issue where because of Moon slowing down Earth and causing it to run on irregular orbit you may encounter 23:59:60 hour every few years. OK, OK, I'm just kidding :) However in good tests you should also take leap seconds into account if needed!

I hope you learned from this post how to fake time in tests to cover weird edge cases.
Before you leave I have 3 more things to share:

  1. Dave Rolsky, maintainer of DateTime module does tremendous job. This module is a life saver. Thanks!
  2. Overwrite CORE::time before loading any module that calls time( ). If you do it this way
    use DateTime;
    
    

    BEGIN {
    *CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
    }

    $::time_mock = 123;
    say DateTime->now( );

    it won't have any effect due to sub folding.


  3. Remember to include empty time signature

    BEGIN {
        # no empty signature
        *CORE::GLOBAL::time = sub { $::time_mock // CORE::time }; 
    }
    
    

    $::time_mock = 123;

    # parsed as time( + 10 )
    # $y = 123, not what you expected !
    my $y = time + 10;

    because you cannot guarantee someone always used parenthesis when calling time( ).



6guts: Complex cocktail causes cunning crash

Published by jnthnwrthngtn on 2016-12-09T00:02:33

I did a number of things for Perl 6 yesterday. It was not, however, hard to decide which of them to write up for the blog. So, let’s dig in.

Horrible hiding heisenbugs

It all started when I was looking into this RT, reporting a segfault. It was filed a while ago, and I could not reproduce it. So, add a test and case closed? Well, not so fast. As I discussed last week, garbage collection happens when a thread fills up its nursery. Plenty of bugs only show up when the timing is just right (or is that just wrong?) What can influence when GC runs? How many allocations we’ve done. And what can influence that? Pretty much any change to the program being run, the compiler, or the environment. The two that most often have people tearing their hair out are:

While GC-related issues are not the only cause of SEGVs, in such a simple piece of code using common features it’s far and away the most likely cause. So, to give myself more confidence that the bug truly was gone, I adjusted the nursery size to be just 32KB instead of 4MB, which causes GC to run much more often. This, of course, is a huge slowdown, but it’s good for squeezing out bugs.

And…no bug! So, in goes the test. Simples!

In even better news, it was lunch time. Well, actually, that was only sort of good news. A few days ago I cooked a rather nice chicken and berry pulao. It came out pretty good, but cooking 6 portions worth of it when I’m home alone for the week wasn’t so smart. I don’t want to see chicken pulao for a couple of months now. Anyway, while I was devouring some of my pulao mountain, I set off a spectest run on the 32KB nursery stress build of MoarVM, just to see if it showed up anything.

Putrid pointers

A couple of failures did, in fact, show up, one of them in constant.t. This rang a bell. I was sure somebody had a in the last couple of weeks reported a crash in that test file, which had then vanished. I checked in with the person who I vaguely recalled mentioning it and…sure enough, it was that very test file. In their normal test runs, the bug had long since vanished. I figured, having now got a reproduction of it, I should probably hunt it down right away. Otherwise, we’d probably end up playing “where’s Wally” with it for another month or ten.

So, how did the failure look?

$ ./perl6-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
Segmentation fault (core dumped)

It actually segfaulted while compiling the test file. Sad! So, where?

$ ./perl6-gdb-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
[boring output omitted]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

That looks…ungood. That final line is meant to be a code location, which means something decided to tell the CPU to go execute code living at the NULL address. At this point, things could go two ways: the JIT spat out something terrible, or a function pointer somewhere was NULL. But which?

(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff78cacbc in MVM_coerce_smart_stringify (tc=0x6037c0, obj=0x605c10, res_reg=0x56624d8)
    at src/core/coerce.c:214
#2  0x00007ffff789dff4 in MVM_interp_run (tc=tc@entry=0x6037c0, initial_invoke=0x60ea80, 
    invoke_data=0x56624d8) at src/core/interp.c:827
#3  0x00007ffff7978b21 in MVM_vm_run_file (instance=0x603010, 
    filename=0x7fffffffe09f "/home/jnthn/dev/rakudo/perl6.moarvm") at src/moar.c:309
#4  0x000000000040108b in main (argc=9, argv=0x7fffffffdbc8) at src/main.c:192

Phew, it looks like the second case, given there’s no JIT entry stub on the stack. So, we followed a NULL function pointer. Really?

(gdb) frame 1
#1  0x00007ffff78cacbc in MVM_coerce_smart_stringify (tc=0x6037c0, obj=0x605c10, res_reg=0x56624d8)
at src/core/coerce.c:214
214     ss = REPR(obj)->get_storage_spec(tc, STABLE(obj));

Yes, really. Presumably, that get_storage_spec is bogus. (I did a p to confirm it.) So, how is obj looking?

(gdb) p *obj
$1 = {header = {sc_forward_u = {forwarder = 0x48000000000001, sc = {sc_idx = 1, idx = 4718592}, 
  st = 0x48000000000001}, owner = 6349760, flags = 0, size = 0}, st = 0x6d06c0}

Criminally corrupt; let me count the ways. For one, 6349760 looks like a very high thread ID for a program that’s only running a single thread (they are handed out sequentially). For two, 0 is not a valid object size. And for three, idx is just a nuts value too (even Rakudo’s CORE.setting isn’t made up of 4 million objects). So, where does this object live? Well, let’s try out last week’s handy object locator to figure out:

(gdb) p MVM_gc_debug_find_region(tc, obj)
In tospace of thread 1

Well. Hmpfh. That’s actually an OK place for an object to be. Of course, the GC spaces swap often enough at this nursery size that a pointer could fail to be updated, point into fromspace after one GC run, not be used until a later GC run, and then come to point into some random bit of tospace again. How to test this hypothesis? Well, instead of 32768 bytes of nursery, what if I make it…well, 40000 maybe?

Here we go again:

$ ./perl6-gdb-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
[trust me, this omitted stuff is boring]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78b00db in MVM_interp_run (tc=tc@entry=0x6037c0, initial_invoke=0x0, invoke_data=0x563a450)
    at src/core/interp.c:2855
2855                    if (obj && IS_CONCRETE(obj) && STABLE(obj)->container_spec)

Aha! A crash…somewhere else. But where is obj this time?

(gdb) p MVM_gc_debug_find_region(tc, obj)
In fromspace of thread 1

Hypothesis confirmed.

Dump diving

So…what now? Well, just turn on that wonder MVM_GC_DEBUG flag and the bug will make itself clear, of course. Alas, no. It didn’t trip a single one of the sanity checks added by enabling thee flag. So, what next?

The where in gdb tells us where in the C code we are. But what high level language code was MoarVM actually running at the time? Let’s dump the VM frame stack and find out:

(gdb) p MVM_dump_backtrace(tc)
   at <unknown>:1  (./blib/Perl6/Grammar.moarvm:initializer:sym<=>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:initializer)
 from src/Perl6/Grammar.nqp:3140  (./blib/Perl6/Grammar.moarvm:type_declarator:sym<constant>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:type_declarator)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:term:sym<type_declarator>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from src/Perl6/Grammar.nqp:3825  (./blib/Perl6/Grammar.moarvm:termish)
 from gen/moar/stage2/NQPHLL.nqp:886  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/NQPHLL.moarvm:EXPR)
from src/Perl6/Grammar.nqp:3871  (./blib/Perl6/Grammar.moarvm:EXPR)
...

I’ve snipped out a good chunk of a fairly long stack trace. But look! We were parsing and compiling a constant at the time of the crash. That’s somewhat interesting, and explains why constant.t was a likely test file to show this bug up. But MoarVM has little idea about parsing or Perl 6’s idea of constants. Rather, something on that codepath of the compiler must run into a bug of sorts.

Looking at the location in interp.c the op being interpreted at the time was decont, which takes a value out of a Scalar container, if it happens to be in one. Combined with knowing what code we were in, I can invoke moar --dump blib/Perl6/Grammar.moarvm, and then locate the disassembly of initializer:sym<=>.

There were a few uses of the decont op in that function. All of them seemed to be on things looked up lexically or dynamically. So, I instrumented those ops with a fromspace check. Re-compiled, and…

(gdb) break MVM_panic
Breakpoint 1 at 0x7ffff78a19a0: file src/core/exceptions.c, line 779.
(gdb) r
Starting program: /home/jnthn/dev/MoarVM/install/bin/moar --execname=./perl6-gdb-m --libpath=/home/jnthn/dev/MoarVM/install/share/nqp/lib --libpath=/home/jnthn/dev/MoarVM/install/share/nqp/lib --libpath=. /home/jnthn/dev/rakudo/perl6.moarvm --nqp-lib=blib -Ilib t/spec/S04-declarations/constant.rakudo.moar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, MVM_panic (exitCode=1, 
    messageFormat=0x7ffff799bc58 "Collectable %p in fromspace accessed") at src/core/exceptions.c:779
779 void MVM_panic(MVMint32 exitCode, const char *messageFormat, ...) {
(gdb) where
#0  MVM_panic (exitCode=1, messageFormat=0x7ffff799bc58 "Collectable %p in fromspace accessed")
    at src/core/exceptions.c:779
#1  0x00007ffff78ba657 in MVM_interp_run (tc=0x1, tc@entry=0x6037c0, initial_invoke=0x0, 
    invoke_data=0x604b80) at src/core/interp.c:374
#2  0x00007ffff7979071 in MVM_vm_run_file (instance=0x603010, 
    filename=0x7fffffffe09f "/home/jnthn/dev/rakudo/perl6.moarvm") at src/moar.c:309
#3  0x000000000040108b in main (argc=9, argv=0x7fffffffdbc8) at src/main.c:192

And what’s in interp.c around that line? The getdynlex op. That’s the one that is used to lookup things like $*FOO in Perl 6. So, a dynamic lexical lookup seemed to be handing back an outdated object. How could that happen?

Interesting idea is insufficient

My next idea was to see if I could catch the moment that something bad was put into the lexical. I’d already instrumented the obvious places with no luck. But…what if I could intercept every single VM register access and see if an object from fromspace was read? Hmmm… It turned out that I could make that happen with a sufficiently cunning patch. I made it opt-in rather than the default for MVM_GC_DEBUG because it’s quite a slow-down. I’m sure that this will come in really useful for finding some GC bug some day. But for this bug? It was no direct help.

It was of some indirect help, however. It suggested strongly that at the time the lexical was set (actually, it turned out to be $*LEFTSIGIL), everything was valid. Somewhere between then and the lookup of it using the getdynlex op, things went bad.

Cache corruption

So what does getdynlex actually do? It checks if the current frame declares a lexical of the specified name. If so, it returns the value. If not, it looks in the caller for the value. If that fails, it goes on to the caller’s caller, until it runs out of stack to search and then gives up.

If that’s what it really did, then this bug would never have happened. But no, people actually want Perl 6 to run fast and stuff, so we can’t just implement the simplest possible thing and go chill. Instead, there’s a caching mechanism. And, as well all know, the two hardest problems in computer science are cache invalidation and cache invalidation.

The caching is relatively simple: each frame has slots for sticking a name, register pointer, and type in it.

MVMString   *dynlex_cache_name;
MVMRegister *dynlex_cache_reg;
MVMuint16    dynlex_cache_type;

When getdynlex finds something the slow way, it then looks down the stack again and finds a frame with an empty dynlex_cache_name. It then sticks the name of dynamic lexical into the name slot, a pointer to the MoarVM lexical into the reg slot, and what type of lexical it was (native int, native num, object, etc.) into the type slot. The most interesting of these is the reg slot. The MVMRegister type is actually a union of different types that we may store in a register or lexical. We re-use the union for registers that live while the frame is on the callstack and lexicals that may need to live longer thanks to closures. So, each frame as two arrays of these:

MVMRegister *env;   /* The lexical environment */
MVMRegister *work;  /* Working space that dies with the frame */

And so the dynlex_cache_reg ends up pointing to env somewhere in the frame that we found the lexical in.

So, the big question: was the caching to blame? I shoved in a way to disable it and…the bug vanished.

Note by this point we’re up to two pieces that contribute to the bug: the GC and the dynamic lexical cache. The thing is, the dynamic lexical cache is used very heavily. My gut feeling told me there must be at least one more factor at play here.

Suspicious specialization

So, what could the other factor be? I re-enabled the cache, verified the crash came back, and then stuck MVM_SPESH_DISABLE=1 into the environment. And…no bug. So, it appeared that dynamic optimization was somehow involved too. That’s the magic that looks at what types actually show up at runtime, and compiles specialized versions of the code that fast-paths a bunch of operations based on that (this specialization being where the name “spesh” comes from). Unfortunately, MVM_SPESH_DISABLE is a rather blunt instrument. It disables a huge range of things, leaving a massive surface area to consider for the bug. Thankfully, there are some alternative environment variables that just turn off parts of spesh.

First, I tried MVM_JIT_DISABLE=1, which results in spesh interpreting the specialized version of the code rather than turning it into machine code to remove the interpreter overhead. The bug remained.

Next, I tried MVM_SPESH_OSR_DISABLE, which disables On Stack Replacement. This is a somewhat fiddly optimization that detects hot loops as they are being interpreted, pauses execution, produces an optimized version of the code, and then recalculates the program counter so it points to the appropriate point in the optimize code and continues execution. Basically, the interpreter gets the code it’s executing replaced under it – perhaps with machine code, which the interpreter is instructed to jump into immediately. Since this also fiddles with frames “in flight”, it seemed like a good candidate. But…nope. Bug remained.

Finally, I tried MVM_SPESH_INLINE_DISABLE, which disables inlining. That’s where we spot a call to a relatively small subroutine or method, and just replace the call with the code of the sub or method itself, saving the cost of setting up callframes. And…bug vanished!

So, inlining was apparently a factor too. The trouble is, that also didn’t seem to add up to an obvious bug. Consider:

sub foo($a) {
    bar($a);
}
sub bar($b) {
    my $c = $b + 6;
    $c * 6
}

Imagine that bar was to be inlined into foo. Normally they’d have lexical slots in ->env as follows:

A:  | $_ | $! | $/ | $a |
B:  | $_ | $! | $/ | $b | $c |

The environment for the frame inline(A, B) would look like:

inline(A, B):  | $_ | $! | $/ | $a | $_ | $! | $/ | $b | $c |
                \---- from A ----/  \------- from B -------/

Now, it’s easy to imagine various bugs that could arise in the initial lookup of a dynamic lexical in such a frame. Indeed, the dynamic lexical lookup code actually has two bunches of code that deal with such frames, one in the case the specialized code is being interpreted and one in the case where it has been JIT compiled. But by the time we are hitting the cache, there’s nothing smart going on at all: it’s just a cheap pointer deference.

Dastardly deoptimization

So, it seems we need a fourth ingredient to explain the bug. By now, I had a pretty good idea what it was. MoarVM doesn’t just to optimizations based on properties it can prove will always hold. It can also do speculative optimization based on properties that it expects will probably hold up. For example, suppose we have:

sub foo($a, $b) {
    $b.some-huge-complex-call($a);
    return $a.Str;
}

Imagine we’re generating a specialization of this routine for the case $a is an object of type Product. The Str method is tiny, so we go ahead and inline it. However, some-huge-complex-call takes all kinds of code paths. We can’t be sure, from our analysis, that at some point it won’t mix in to the object in $a. What if it mixes in a role that has an alternate Str method? Our inlining would break stuff! We’d end up calling the Product.Str method, not the one from the mixin.

One reaction is to say “well, we’ll just not ever optimize stuff unless we can be REALLY sure”, which is either hugely limiting or relies on much more costly analyses. The other path, which MoarVM does, is to say “eh, let’s just assume mixins won’t happen, and if they do, we’ll fix things then!” The process of fixing things up is called deoptimization. We walk the call stack, rewriting return addresses to point to the original interpreted code instead of the optimized version of the code.

But what, you might wonder, do we do if there’s a frame on the stack that is actually the result of an inlining operation? What if we’re in the code that resulted from inline(A,B), in the bit that corresponds to the code of B? Well, we have to perform – you guessed it – uninlining! The composite call frame has to be dissected, and the call stack rewritten to look like it would have if we’d been running the original interpreted code. To do this, we’d create a call frame for B, complete with space for its lexicals, and copy the lexicals from inline(A,B) that belong to B into that new buffer.

The code that does this is one of the very few parts of MoarVM that frightens me.

For good reason, it turns out. This deoptimization, together with uninlining, was the final ingredient needed for the bug. Here’s what happened:

  1. The method EXPR in Perl6::Grammar was inlined into one of its callers. This EXPR method declares a $*LEFTSIGIL variable.
  2. While parsing the constant, the $*LEFTSIGIL is assigned to the sigil of the constant being declared, if it has one (so, in constant $a = 42 it would be set to $).
  3. Something does a lookup of $*LEFTSIGIL. It is located and cached. The cache entry points into a region of the ->env of the frame that inlined, and thus incorporated, the lexical environment of EXPR.
  4. At some point, a mixin happens, causing a deoptimization of the call stack. The frame that inlined EXPR gets pulled apart. A new EXPR frame comes to exist, with the lexicals that used to live in the composite frame copied into them. Execution continues.
  5. A GC happens. The object containing the $ substring moves. The new EXPR frame’s lexical environment is updated.
  6. Another lookup of $*LEFTSIGIL happens. It hits the cache. The cache, however, still points to the place the lexical used to live in the composite frame. This memory has not been freed, because the first part of it is still being used. However, the GC no longer cares about its contents because that content is unreachable. Therefore, it contains an outdated pointer, thus leading to accessing memory that’s being used for something else entirely by that point, leading to the eventual segmentation fault.

The most natural fix was to invalidate the cache during deoptimization.

Lessons learned

The bug I wrote up last week was thanks to a comparatively simple oversight made within the scope of a few lines of a single C function. While this one could be fixed with a small amount of code added in a single file, the segfault arose from the interaction of four distinct features existing in MoarVM:

Even when a segfault was not produced, thanks to “lucky” GC timing, the bug would lead to reading of stale data. It just so turned out that the data wasn’t ever stale enough in reality to break things on this particular code path.

All of garbage collection, inlining, and deoptimization are fairly complicated. By contrast, the dynamic lexical lookup cache is fairly easy. Interestingly, it was the addition of this easy feature that introduced the bug – not because the code that was added was wrong, but rather because it did something that some other far flung piece of code – the deoptimizer – had quietly relied on not happening.

So, what might be learned for the future?

The most immediate practical learning is that taking interior pointers into mutable data structures is risky. In this case, that data structure was a composite lexical environment, that later got taken apart. Conceptually, the environment was resized and the interior pointer was off the end of the new size. This suggests either providing a safe way to acquire such a reference, or an alternative design for the dynamic lexical cache to avoid needing to do so.

Looking at the bigger picture, this is all about managing complexity. Folks who work with me tend to observe I worry a good bit about loose coupling, to the degree that I’m much more hesitant than the typical developer when it comes to code re-use. Acutely aware that re-use means use, and use means dependency, and dependency means coupling, I tend to want things to prove they really are the same thing rather than just looking like they might be the same thing. MoarVM reflects this in various ways: to the degree I can, I try to organize it as a collection of parts that either keep themselves very much to themselves, or that collaborate over a small number of very stable data structures. One of the reasons Git works architecturally is because while all the components of it are messing with the same data structure, it’s a very stable and well-understood data structure.

In this bug, MVMFrame is the data structure in question. A whole load of components know about it and work with it because – so the theory went – it’s one of the key stable data structures of the VM. Contrast it with the design of things like the Unicode normalizer or the fixed size allocator, which nothing ever pokes into directly. These are likely to want to evolve over time to choose smarter data structures, or to get extra bits of state to cope with Unicode’s latest and greatest emoji boundary specification. Therefore, all work with them is hidden nicely behind an API.

In reality, MVMFrame has grown to contain quite a fair few things as MoarVM has evolved. At the same time, its treated as a known quantity by lots of parts of the codebase. This is only sustainable if every addition to MVMFrame is followed by considering how every other part of the VM that interacts with it will be affected by the change, and making compensating changes to those components. In this case, the addition of the dynamic lexical cache into the frame data structure was not accompanied by sufficient analysis of which other parts of the VM may need compensating changes.

The bug I wrote up last week isn’t really the kind that causes an architect a headache. It was a localized coding slip-up that could happen to anyone on a bad day. It’s a pity we didn’t catch it in code review, but code reviewers are also human. This bug, by contrast, arose as a result of the complexity of the VM – or, more to the point, insufficient management of that complexity. And no, I’m not beating myself up over this. But, as MoarVM architect, this is exactly the type of bug that catches my eye, and causes me to question assumptions. In the immediate, it tells me what kinds of patches I should be reviewing really carefully. In the longer run, the nature of the MVMFrame data structure and its level of isolation from the rest of the codebase deserves some questioning.


6guts: Taking a couple of steps backwards to fix a GC bug

Published by jnthnwrthngtn on 2016-11-30T23:16:42

When I popped up with a post here on Perl 6 OO a few days ago, somebody noted in the comments that they missed my write-ups of my bug hunting and fixing work in Rakudo and MoarVM. The good news is that the absence of posts doesn’t mean an absence of progress; I’ve fixed dozens of things over the last months. It was rather something between writers block and simply not having the energy, after a day of fixing things, to write about it too. Anyway, it seems I’ve got at least some of my desire to write back, so here goes. (Oh, and I’ll try and find a moment in the coming days to reply to the other comments people wrote on my OO post too.)

Understanding a cryptic error

There are a number of ways MoarVM can come tumbling down when memory gets corrupted. Some cases show up as segmentation faults. In other cases, the VM comes across something that simply does make any kind of sense and can infer that memory has become corrupted. Two panics commonly associated with this are “zeroed target thread ID in work pass” and “invalid thread ID XXX in GC work pass”, where XXX tends to be a sizable integer. At the start of a garbage collection – where we free up memory associated with dead objects – we do something like this:

  1. Go through all the threads that have been started, and signal those that are not blocked (e.g. waiting for I/O, a lock acquisition, or for native code to finish) to come and participate in the garbage collection run.
  2. Assign each non-blocked thread itself to work on.
  3. Assign each blocked thread’s work to a non-blocked thread.

So, every thread – blocked or not – ends up assigned to a running thread to take care of its collection work. It’s the participation of multiple threads that makes the MoarVM GC parallel (which is a different thing to having a concurrent GC; MoarVM’s GC can barely claim to be that).

The next important thing to know is that the every object, at creation, is marked with the ID of the thread that allocated it. This means that, as we perform GC, we know whether the object under consideration “belongs” to the current thread we’re doing GC work for, or some other one. In the case that the ID in the object header doesn’t match up with the thread ID we’re doing GC work for, then we stick it into a list of work to pass off to the thread that is responsible. To avoid synchronization overhead, we pass then off in batches (so there’s only synchronization overhead per batch). This is far from the only way to do parallel GC (other schemes include racing to write forwarding pointers), but it keeps the communication between participating threads down and leaves little surface area for data races in the GC.

The funny thing is that if none of that really made any sense to you, it doesn’t actually matter at all, because I only told you about it all so you’d have a clue what the “work pass” in the error message means – and even that doesn’t matter much for understanding the bug I’ll eventually get around to discussing. Anyway, TL;DR version (except you did just read it all, hah!) is that if the owner ID in an object header is either zero or an out-of-range thread ID, then we can be pretty sure there’s memory corruption afoot. The pointer under consideration is either to zeroed memory, or to somewhere in memory that does not correspond to an object header.

So, let’s debug the panic!

Getting the panic is, perhaps, marginally better than a segmentation fault. I mean, sure, I’m a bit less embarrassed when Moar panics than SEGVs, and perhaps it’s mildly less terrifying for users too. But at the end of the day, it’s not much better from a debugging perspective. At the point we spot the memory corruption, we have…a pointer. That points somewhere wrong. And, this being the GC, it just came off the worklist, which is full of a ton of pointers.

If only we could know where the pointer came from, I hear you think. Well, it turns out we can: we just need to detect the problem some steps back, where the pointer is added to the worklist. In src/gc/debug.h there’s this:

#define MVM_GC_DEBUG 0

Flip that to a 1, recompile, and magic happens. Here’s a rather cut down snippet from in worklist.h:

#if MVM_GC_DEBUG
#define MVM_gc_worklist_add(tc, worklist, item) \
    do { \
        MVMCollectable **item_to_add = (MVMCollectable **)(item); \
        if (*item_to_add) { \
            if ((*item_to_add)->owner == 0) \
                MVM_panic(1, "Zeroed owner in item added to GC worklist"); \
                /* Various other checks here.... */ 
        } \
        if (worklist->items == worklist->alloc) \
            MVM_gc_worklist_add_slow(tc, worklist, item_to_add); \
        else \
            worklist->list[worklist->items++] = item_to_add; \
    } while (0)
#else
#define MVM_gc_worklist_add(tc, worklist, item) \
    do { \
        MVMCollectable **item_to_add = (MVMCollectable **)(item); \
        if (worklist->items == worklist->alloc) \
            MVM_gc_worklist_add_slow(tc, worklist, item_to_add); \
        else \
            worklist->list[worklist->items++] = item_to_add; \
    } while (0)
#endif

So, in the debug version of the macro, we do some extra checks – including the one to detect a zeroed owner. This means that when MoarVM panics, the GC code that is placing the bad pointer into the list is on the stack. Then it’s a case of using GDB (or your favorite debugger), sticking a breakpoint on MVM_panic (spelled break MVM_panic in GDB), running the code that explodes, and then typing where. In this case, I was pointed at the last line of this bit of code from roots.c:

void MVM_gc_root_add_frame_roots_to_worklist(MVMThreadContext *tc, MVMGCWorklist *worklist,
                                             MVMFrame *cur_frame) {
    /* Add caller to worklist if it's heap-allocated. */
    if (cur_frame->caller && !MVM_FRAME_IS_ON_CALLSTACK(tc, cur_frame->caller))
        MVM_gc_worklist_add(tc, worklist, &cur_frame->caller);

    /* Add outer, code_ref and static info to work list. */
    MVM_gc_worklist_add(tc, worklist, &cur_frame->outer);

So, this tells me that the bad pointer is to an outer. The outer pointer of a call frame points to the enclosing lexical scope, which is how closures work. This provides a bit of inspiration for bug hunting; for example, it would now make sense to consider codepaths that assign outer to see if they could ever fail to keep a pointer up to date. The trouble is, for such an incredibly common language feature to be broken in that way, we’d be seeing it everywhere. It didn’t fit the pattern. In fact, both my private $dayjob application that was afflicted with this, together with the whateverable set of IRC bots, had in common that they did a bunch of concurrency work and both spawned quite a lot of subprocesses using Proc::Async.

But where does the pointer point to?

Sometimes I look at a pointer and it’s obviously totally bogus (a small integer usually suggests this). But this one looked feasible; it was relatively similar to the addresses of other valid pointers. But where exactly does it point to?

There are only a few places that a GC-managed object can live. They are:

So, it would be very interesting to know if the pointer was into one of those. Now, I could just go examining it in the debugger, but with a dozen running threads, that’s tedious as heck. Laziness is of course one of the virtues of a programmer, so I wrote a function to do the search for me. Another re-compile, reproducing the bug in GDB again, and then calling that routine from the debugger told me that the pointer was into the tospace of another thread.

Unfortunately, thinking is now required

Things get just a tad mind-bending here. Normally, when a program is running, if we see a pointer into fromspace we know we’re in big trouble. It means that the pointer points to where an object used to be, but was then moved into either tospace or the old generation. But when we’re in the middle of a GC run, the two spaces are flipped. The old tospace is now fromspace, the old fromspace becomes the new tospace, and we start evacuating living objects in to it. The space left at the end will then be zeroed later.

I should mention at this point that the crash only showed up a fraction of the time in my application. The vast majority of the time, it ran just fine. The odd time, however, it would panic – usually over a zeroed thread owner, but sometimes over a junk value being in the thread owner too. This all comes down to timing: different thread are working on GC, in different runs of the program they make progress at different paces, or get head starts, or whatever, and so whether the zeroing of the unused part of tospace happened or not yet will vary.

But wait…why didn’t it catch the problem even sooner?

When the MVM_GC_DEBUG flag is turned on, it introduces quite a few different sanity checks. One of them is in MVM_ASSIGN_REF, which happens whenever we assign a reference to one object into another. (The reason we don’t simply use the C assignment operator for that is because the inter-generational write barrier is needed.) Here’s how it looks:

#if MVM_GC_DEBUG
#define MVM_ASSIGN_REF(tc, update_root, update_addr, referenced) \
    { \
        void *_r = referenced; \
        if (_r && ((MVMCollectable *)_r)->owner == 0) \
            MVM_panic(1, "Invalid assignment (maybe of heap frame to stack frame?)"); \
        MVM_ASSERT_NOT_FROMSPACE(tc, _r); \
        MVM_gc_write_barrier(tc, update_root, (MVMCollectable *)_r); \
        update_addr = _r; \
    }
#else
#define MVM_ASSIGN_REF(tc, update_root, update_addr, referenced) \
    { \
        void *_r = referenced; \
        MVM_gc_write_barrier(tc, update_root, (MVMCollectable *)_r); \
        update_addr = _r; \
    }
#endif

Once again, the debug version does some extra checks. Those reading carefully will have spotted MVM_ASSERT_NOT_FROMSPACE in there. So, if we used this macro to assign to the ->outer that had the outdated pointer, why did it not trip this check?

It turns out, because it only cared about checking if it was in fromspace of the current thread, not all threads. (This is in turn because the GC debug bits only really get any love when I’m hunting a GC bug, and once I find it then they go back in the drawer until next time around.) So, I enriched that check and…the bug hunt came to a swift end.

Right back to the naughty deed

The next time I caught it under the debugger was not at the point that the bad ->outer assignment took place. It was even earlier than that – lo and behold, inside of some of the guts that power Proc::Async. Once I got there, the problem was clear and fixed in a minute. The problem was that the callback pointer was not rooted while an allocation took place. The function MVM_repr_alloc_init can trigger GC, which can move the object pointed to by callback. Without an MVMROOT to tell the GC where the callback pointer is so it can be updated, it’s left pointing to where the callback used to be.

So, bug fixed, but you may still be wondering how exactly this bug could have led to a bad ->outer pointer in a callframe some way down the line. Well, callback is a code object, and code objects point to an outer scope (it’s actually code objects that we clone to make closures). Since we held on to an outdated code object pointer, it in turn would point to an outdated pointer to the outer frame it closed over. When we invoked callback, the outer from the code object would be copied to be the outer of the call frame. Bingo.

Less is Moar

The hard part about GCs is not just building the collector itself. It’s that collectors bring invariants that are to be upheld, and a momentary lapse in concentration by somebody writing or reviewing a patch can let a bug like this slip through. At least 95% of the time when I handwavily say, “it was a GC bug”, what I really mean was “it was a bug that arose because some code didn’t play by the rules the GC requires”. A comparatively tiny fraction of the time, there’s actually something wrong in the code living under src/gc/.

People sometimes ask me about my plans for the future of MoarVM. I often tell them that I plan for there to be less of it. In this case, the code with the bug is something that I hope we’ll eventually write in, say, NQP, where we don’t have to worry about low-level details like getting write barriers correct. It’s just binding code to libuv, a C library, and we should be able to do that using the MoarVM native calling support (which is likely mature enough by now). Alas, that also has its own set of costs, and I suspect we’d need to improve native calling performance to not come out at a measurable loss, and that means teaching the JIT to emit native calls, but we only JIT on x64 so far. “You’re in a maze of twisty VM design trade-offs, and their funny smells are all alike.”


rakudo.org: Announce: Rakudo Star Release 2016.11

Published by Steve Mynott on 2016-11-27T18:51:38

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to
announce the November 2016 release of “Rakudo Star”, a useful and usable
production distribution of Perl 6. The tarball for the November 2016 release
is available from http://rakudo.org/downloads/star/.

This is the fifth post-Christmas (production) release of Rakudo Star and
implements Perl v6.c. It comes with support for the MoarVM backend (all
module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with
the JVM backend from the Rakudo compiler. Please use the MoarVM backend
only.

In the Perl 6 world, we make a distinction between the language (“Perl
6”) and specific implementations of the language such as “Rakudo Perl”.
This Star release includes release 2016.11 of the Rakudo Perl 6 compiler,
version 2016.11 of MoarVM, plus various modules, documentation, and
other resources collected from the Perl 6 community.

The changes in this release are outlined below:

New in 2016.11:

Fixes:
+ Various improvements to warning/error-reporting
+ Fixed assigning values to shaped arrays through iterators [839c762]
+ Fixed Str.Int not failing on numerics with combining characters [d540fc8]
+ [JVM] Fixed <a b c>.antipairs breakage [dd7b055]
+ defined routine now correctly authothreads with Junctions [189cb23]
+ Fixed poor randomness when .pick()ing on ranges with >32-bit numbers [34e515d]
+ Fixed infix:<x> silencing Failures [2dd0ddb]
+ Fixed edge case in is-approx that triggers DivByZero exception [f7770ed]
+ (Windows) Fixed returning of an error even when succeeding in mkdir [208a4c2]
+ (Windows) Fixed precomp unable to rename a newly compiled file [44a4c75]
+ (Test.pm) Fixed indent of multi-line diag() and test failure messages [43dbc96]
+ Fixed a callframe crash due to boxing of NULL filenames [200364a]
+ ∞ ≅ ∞ now gives True [4f3681b]
+ Fixed oversharing with grammars used by multiple threads [7a456ff]
+ Fixed incorrect calculations performed by acotan(num) [8e9fd0a]
+ Fixed incorrect calculations performed by asinh(num)/acosh(num) [a7e801f]
+ Fixed acosh return values for large negative numbers [5fe8cf7]
+ asinh(-∞) now returns -∞ instead of NaN [74d0e36]
+ atanh(1) now returns ∞ instead of throwing [906719c][66726e8]
+ Fixed missing close in IO::Path.slurp(:bin) [697a0ae]
+ :U QuantHashes now auto-vivify to their correct type and not Hash [79bb867]
+ Mix/MixHash.Bag/BagHash coersion now ignores negative weights [87bba04]
+ arity-0 infix:<Z> now returns a Seq instead of a List [3fdae43]
+ Fix augment of a nested package [87880ca]
+ Smartmatch with Regex variable now returns a Match instead of Bool [5ac593e]
+ Empty ()[0] now returns Nil instead of False [f50e39b]
+ Failed IO::Socket::Async connection no longer produces unexpected crash [f50e39b]
+ Quitting Supplies with no QUIT phasers no longer unexpectedly crash [f50e39b]
+ Fixed NativeCall issues on big endian machines [627a77e]
+ Fixed broken handling of $/ in some uses of `.match` [ba152bd]
+ Fixed Lock.protect not releasing the lock on control exceptions [48c2af6]
+ MoarVM now builds on any version of macOS [b4dfed2]
+ Fixed concurrency crashes due to garbage collection [6dc5074]
+ Fixed race condition in EmptyIterator [ed2631c]
+ Fixed hang with multi-threaded long-running NativeCall calls [f99d958]
+ Made my @a[10] = ^Inf work [aedb8e7]
+ Fixed [;]:delete [3b9c4c9]
+ Fixed incorrect handling of negative weights in ∪ operator [e10f767]
+ duckmap now preserves types of Iterables [43cb55f]
+ Fixed premature overflow to Inf with large Num literals [729d7e3]
+ Fixed race condition in NativeCall callsite used by multiple threads [49fd825]
+ Fixed instabilities in programs launching many threads at startup [0134132]
+ Fixed crash when reporting X::Composition::NotComposable or
X::Inheritance::Unsupported exceptions [a822bcf]
+ Fixed clock_gettime issue on macOS [ee8ae92]
+ Fixed SEGV in multi-threaded programs with strings-as-strands [395f369]
+ `regex` TOP Grammar rule will now backtrack if needed [4ccb2f3]
+ Fixed .rotate/.reverse on 1-dimmed arrays assigning to self [2d56751]
+ Fixed exponentiation involving zeros for Complex numbers [7f32243]
+ Fixed Label.gist [29db214][53d7b72]
+ Fixed certain edge cases of incorrect stringification of Rationals
with .Str, .perl, and .base [b5aa3c5]

Additions:
+ Added TWEAK submethod for object construction [fdc90a2][9409d68]
+ Added DateTime.hh-mm-ss [bf51eca]
+ Added parse-base routine [7e21a24]
+ is-approx with no explicit tolerances now uses more complex algorithm to
choose a tolerance to use (same as old is_approx) [82432a4]
+ on failure, is-approx now displays actual values received [b4fe680]
+ Added Iterator.skip-one to skip a single value [71a01e9]
+ Added Iterator.skip-at-least to skip several values [8d357af]
+ Re-imagined Str.match [b7201a8]:
+ the family of :nth is now lazy will return whatever can find
+ non-monotonically increasing :nth iterator values will now die
+ Str.match/subst/subst-mutate now have :as adverb [1b95636][c9a24d9][aaec517]
+ &infix: now works with Setty objects [d92e1ad]
+ :ov and :ex adverbs are now illegal in Str.subst [b90c741]
+ Made nextwith/samewith/nextsame/callwith to be real subroutines [70a367d]
+ Added CX::Emit and CX::Done control exceptions [07eeea8]
+ Setty.Mix/.MixHash coercers now use Int weights instead of Bool [7ba7eb4]
+ Implemented :kv,:p,:k,:v on 1-element multidim [;] [764cfcd]
+ .chrs can now also accepts codepoint numbers as Str [4ae3f23]
+ Added support for compilation of Rakudo on Solaris [a43b0c1]
+ IterationEnd.perl/gist/Str now returns text “IterationEnd” [59bb1b1]
+ Added X::Syntax::Number::InvalidCharacter exception [2faa55b]
+ .reverse/rotate on 1-dimmed arrays are now nodal [cd765e6]
+ .line and .file on core Code now references original source files [b068e3a]
+ .rethrow now works on unthrown exceptions [58a4826]
+ All Reals now accept `Whatever` as the second argument to .base() [c1d2599]
+ sub MAIN usage message shows possible Enum values if param is
named and is an Enum [a3be654]

Efficiency:
+ Made slip(@a) about 1.2x faster [37d0e46]
+ Made initialization of 2+dimmed array 10x to 16x faster [dfb58d4]
+ Str.match is now about 5% faster [4fc17df]
+ Str.match with :nth feature is now about 2x faster [41e2572]
+ my @a = Str.match(…) is now about 5% faster [e472420]
+ Str.comb(Regex) is now about 7x faster [1794328]
+ Simple Str.subst/subst-mutate is now about 30% faster [364e67b]
+ Match.Str|prematch|postmatch is now about 2x faster [e65d931]
+ Match.Bool is now about 1% faster (not much, but used a lot) [1fce095]
+ Made ~~ /foo/ faster: 2% for successful/6% for failed matches [05b65d0]
+ Made <foo bar baz> ~~ /foo/ about 2x faster [e4dc8b6]
+ Made %h ~~ /foo/ about 2x faster [33eeb32]
+ Frequent accesses of multiple adverbs (e.g. %h<a>:p:exists)
is now 2x faster [f22f894]
+ Made infix: faster: Str: 14x, type: 10x, Range: 3.5x,
Int|Seq|Hash: 1.5x, Array: 1.2x [bc7fcc6]
+ IO::Spec::Unix.canonpath is now 7x to 50x faster [f3f00fb]
+ Baggy.roll/pick is now about 10% faster [fc47bbf]
+ Made copying shaped arrays 10x to 20x faster [a1d8e93][0cf7b36][d27ecfa]
+ Made setting up a shaped array 2x as fast [f06e4c3]
+ Made creation of typed shaped arrays 15% faster [f5bf6c1]
+ Made 2d/3d array accesses about 7x as fast [d3a0907]
+ Made AT-POS on 1,2,3dim arrays about 20x faster [feb7bcb]
+ Made creating a shaped array about 50% faster [1293188][576f3a1]
+ Made .AT-POS on 3+ dimmed arrays 20% faster [1bb5aad]
+ Made over-indexed .AT-POS on 1,2,3 dimmed arrays about 10% faster [1bb5aad]
+ Made multi-dimmed ASSIGN-POS about 30% faster [5b2bdeb]
+ Made .ASSIGN-POS for 1,2,3dimmed arrays about 40x faster [050cf72]
+ Made n-dimmed .EXISTS-POS about 1.5x faster [006f008]
+ Made .EXISTS-POS for 1,2,3dimmed arrays about 10x faster [b1c41b7]
+ Made n-dimmed DELETE-POS about 1.3x faster [6ccecb1]
+ Made .DELETE-POS for 1,2,3dimmed arrays about 20x faster [55b9e90]
+ Made n-dimmed BIND-POS about 1.3x faster [2827edb]
+ Made .BIND-POS for 1,2,3dimmed arrays about 20x faster [9f94525]
+ Made @a[10].STORE at least as fast as @a.STORE [8064ff1]
+ Made .kv on shaped Arrays about 4x faster [e42b68e]
+ Made .pairs/.antipairs on shaped Arrays about 2.8x faster [0f2566a][f608a33]
+ Made List.kv about 30% faster [7a2baf4]
+ for loops on 1-dimmed arrays are now 3x faster [ed36e60]
+ .kv on 1-dimmed arrays is now 7x faster [608de26]
+ .pairs/.antipairs on 1-dimmed arrays is now 3x faster [b7d9537][1c425f9]
+ postcircumfix[;] on 2/3 dimmed arrays is now 9x faster [0b97362]
+ Assignment to [;] for 2/3 dimmed arrays is now 10x faster [ce85ba3]
+ [;]:exists for 2/3 dimmed arrays is now 7x faster [e3e3fef]
+ [;]:delete for 2/3 dimmed arrays is now 10x faster [3b9c4c9]
+ [;] := foo for 2/3 dimmed arrays is now 10x faster [eaf4132]
+ .iterator and .values on shaped arrays are now about 4x faster [736ab11]
+ Fixed optimization of shaped arrays that gives 10% gain on simple `for`
loops and likely will give larger gains on bigger programs [b7e632e]
+ Made starting MappyIterator faster, affecting all Hash/Map/Baggy iterator
methods. 2-elem Hash iteration is 1.6x faster [97fb6c2]
+ Made starting a grepper faster: .grep on with no adverbs on 2-element list
is now 20% faster [077c8f0]
+ Made Date/DateTime creation 20% faster [0e7f480]
+ Hashes now use 4 (32-bit) or 8 (64-bit) bytes less memory per stored item [395f369]
+ Rational.Str is now about 40% faster [b5aa3c5]
+ Rational.base is now about 30% faster [b5aa3c5]
+ Various streamlining of code that’s hard to measure the final impact of

Notable changes in modules shipped with Rakudo Star:

+ DBIish: Prepare tests for lexical module loading and do not rely on your dependencies to load modules
+ Pod-To-HTML: Bump version and use a separate element as anchor for “to top” links
+ Terminal-ANSIColor: Update Pod documentation and support 256-color and 24-bit RGB modes
+ doc: Large number of changes – too many to detail here
+ json_fast: Speed gains
+ panda: Explicitly use Panda::Project as we use it in class Panda
+ perl6-lwp-simple: Change to http.perl6.org since the main site is going https and add NO_NETWORK_TESTING

There are some key features of Perl 6 that Rakudo Star does not yet
handle appropriately, although they will appear in upcoming releases.
Some of the not-quite-there features include:

* advanced macros
* non-blocking I/O (in progress)
* some bits of Synopsis 9 and 11

There is an online resource at http://perl6.org/compilers/features
that lists the known implemented and missing features of Rakudo’s
backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the
programmer that a given feature isn’t implemented, but there are many
that we’ve missed. Bug reports about missing and broken features are
welcomed at rakudobug@perl.org.

See http://perl6.org/ for links to much more information about
Perl 6, including documentation, example code, tutorials, presentations,
reference materials, design documents, and other supporting resources.
Some Perl 6 tutorials are available under the “docs” directory in
the release tarball.

The development team thanks all of the contributors and sponsors for
making Rakudo Star possible. If you would like to contribute, see
http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org
mailing list, or join us on IRC #perl6 on freenode.

6guts: Perl 6 is biased towards mutators being really simple. That’s a good thing.

Published by jnthnwrthngtn on 2016-11-25T01:09:33

I’ve been meaning to write this post for a couple of years, but somehow never quite got around to it. Today, the topic of mutator methods came up again on the #perl6 IRC channel, and – at long last – conincided with me having the spare time to write this post. Finally!

At the heart of the matter is a seemingly simple question: why does Perl 6 not have something like the C# property syntax for writing complex setters? First of all, here are some answers that are either wrong or sub-optimal:

Back to OO basics

The reason the question doesn’t have a one-sentence answer is because it hinges on the nature of object orientation itself. Operationally, objects consist of:

If your eyes glazed over on the second bullet point, then I’m glad you’re reading. If I wasn’t trying to make a point, I’d have simply written “a mechanism for calling a method on an object”. So what is my point? Here’s a quote from Alan Kay, who coined the term “object oriented”:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”…”

For years, I designed OO systems primarily thinking about what objects I’d have. In class-based languages, this really meant what classes I’d have. How did I figure that out? Well, by thinking about what fields go in which objects. Last of all, I’d write the methods.

Funnily enough, this looks very much like procedural design. How do I build a C program? By modeling the state into various structs, and then writing functions work with with those structs. Seen this way, OO looks a lot like procedural. Furthermore, since OO is often taught as “the next step up” after procedural styles of programming, this way of thinking about objects is extremely widespread.

It’s little surprise, then, that a lot of OO code in the wild might as well have been procedural code in the first place. Many so-called OO codebases are full of DTOs (“Data Transfer Objects”), which are just bundles of state. These are passed to classes with names like DogManager. And a manager is? Something that meddles with stuff – in this case, probably the Dog DTO.

Messaging thinking

This is a far cry from how OO was originally conceived: autonomous objects, with their own inner state, reacting to messages received from the outside world, and sending messages to other objects. This thinking can be found today. Of note, it’s alive and well in the actor model. These days, when people ask me how to get better at OO, one of my suggestions is that they take a look at actors.

Since I grasped that the messages are the important thing in OO, however, the way I design objects has changed dramatically. The first question I ask is: what are the behaviors? This in turn tells me what messages will be sent. I then consider the invariants – that is, rules that the behaviors must adhere to. Finally, by grouping invariants by the state they care about, I can identify the objects that will be involved, and thus classes. In this approach, the methods come first, and the state comes last, usually discovered as I TDD my way through implementing the methods.

Accessors should carry a health warning

An accessor method is a means to access, or mutate, the state held within a particular attribute of an object. This is something I believe we should do far more hesitantly than is common. Objects are intended to hide state behind a set of interesting operations. The moment the underlying state model is revealed to the outside world, our ability to refactor is diminished. The world outside of our object couples to that view of it, and it becomes far too tempting to put operations that belong inside of the object on the outside. Note that a get-accessor is a unidirectional coupling, while a mutate-accessor implies a bidirectional (and so tighter) coupling.

But it’s not just refactoring that suffers. Mutable state is one of the things that makes programs difficult to understand and reason about. Functional programming suggests abstinence. OO suggests you just stick to a pint or two, so your side-effects will be at least somewhat less obnoxious. It does this by having objects present a nice message-y view to the outside world, and keeping mutation of state locked up inside of objects. Ideas such as value objects and immutable objects take things a step further. These have objects build new objects that incorporate changes, as opposed to mutating objects in place. Perl 6 encourages these in various ways (notice how clone lets you tweak data in the resulting object, for example).

Furthermore, Perl 6 supports concurrent and parallel programming. Value objects and immutable objects are a great fit for that. But what about objects that have to mutate their state? This is where state leakage will really, really, end up hurting. Using OO::Monitors or OO::Actors, turning an existing class into a monitor (method calls are synchronous but enforce mutual exclusion) or an actor (method calls are asynchronous and performed one at a time on a given object) is – in theory – easy. It’s only that easy, however, if the object does not leak its state, and if all complex operations on the object are expressed as a single method. Contrast:

unless $seat.passenger {
    $seat.passenger = $passenger;
}

With:

$seat.assign-to($passenger);

Where the method does:

method assign-to($passenger) {
    die "Seat already taken!" if $!passenger;
    $!passenger = $passenger;
}

Making the class of which $seat is an instance into a monitor won’t do a jot of good in the accessor/mutator case; there’s still a gaping data race. With the second approach, we’d be safe.

So if mutate accessors are so bad, why does Perl 6 have them at all?

To me, the best use of is rw on attribute accessors is for procedural programming. They make it easy to create mutable record types. I’d also like to be absolutely clear that there’s no shame in procedural programming. Good OO design is hard. There’s a reason Perl 6 has sub and method, rather than calling everything a method and then coining the term static method, because subroutine sounds procedural and “that was the past”. It’s OK to write procedural code. I’d choose to deal with well organized procedural code over sort-of-but-not-really-OO code any day. OO badly used tends to put the moving parts further from each other, rather than encapsulating them.

Put another way, class is there to serve more than one purpose. As in many languages, it doubles up as the thing used for doing real OO programming, and a way to define a record type.

So what to do instead of a fancy mutator?

Write methods for semantically interesting operations that just happen to set an attribute among their various other side-effects. Give the methods appropriate and informative names so the consumer of the class knows what they will do. And please do not try to hide complex operations, potentially with side-effects like I/O, behind something that looks like an assignment. This:

$analyzer.file = 'foo.csv';

Will lead most readers of the code to think they’re simply setting a property. The = is the assignment operator. In Perl 6, we make + always mean numeric addition, and pick ~ to always mean string concatenation. It’s a language design principle that operators should have predictable semantics, because in a dynamic language you don’t statically know the types of the operands. This kind of predictability is valuable. In a sense, languages that make it easy to provide custom mutator behavior are essentially making it easy to overload the assignment operator with additional behaviors. (And no, I’m not saying that’s always wrong, simply that it’s inconsistent with how we view operators in Perl 6.)

By the way, this is also the reason Perl 6 allows definition of custom operators. It’s not because we thought building a mutable parser would be fun (I mean, it was, but in a pretty masochistic way). It’s to discourage operators from being overloaded with unrelated and surprising meanings.

And when to use Proxy?

When you really do just want more control over something that behaves like an assignment. A language binding for a C library that has a bunch of get/set functions to work with various members of a struct would be a good example.

In summary…

Language design is difficult, and involves making all manner of choices where there is no universally right or wrong answer, but just trade-offs. The aim is to make choices that form a consistent whole – which is far, far, easier said than done becuase there’s usually a dozen different ways to be consistent too. The choice to dehuffmanize (that is, make longer) the writing of complex mutators is because it:


Steve Mynott: Rakudo Star 2016.11 Release Candidate

Published by Steve Mynott on 2016-11-20T14:01:22

There is a Release Candidate for Rakudo Star 2016.11 (currently RC2) available at

http://pl6anet.org/drop/

This includes binary installers for Windows and Mac.

Usually Star is released about every three months but last month's release didn't include a Windows installer so there is another release.

I'm hoping to release the final version next weekend and would be grateful if people could try this out on as many systems as possible.

Any feedback email steve *dot* mynott *at* gmail *dot* com

Full draft announce at

https://github.com/rakudo/star/blob/master/docs/announce/2016.11.md

brrt to the future: A guide through register allocation: Introduction

Published by Bart Wiegmans on 2016-11-06T10:29:00

This is the first post in what I intend to be a series on the register allocator for the MoarVM JIT compiler. It may be a bit less polished than usual, because I also intend to write more of these posts than I have in the past few months.

The main reason to write a register allocator is that it is needed by the compiler. The original 'lego' MoarVM JIT didn't need one, because it used what is called a 'memory-to-memory' model, meaning that every operation is expected to move operands from and to memory. In this it follows closely the behavior of virtually every other interpreter existing and especially that of MoarVM. However, many of these memory operations are logically redundant (for example, when storing and immediately loading an intermediate value, or loading the same value twice). Such redundancies are inherent to a memory-to-memory code model. In theory some of that can be optimized away, but in practice that involves building an unreasonably complicated state machine.

The new 'expression' JIT compiler was designed with the explicit (well, explicit to me, at least) goals of enabling optimization and specialization of machine code. That meant that a register-to-register code model was preferable, as it makes all memory operations explicit, which in turn enables optimization to remove some of them. (Most redundant 'load' operations can already be eliminated, and I'm plotting a way to remove most redundant 'store' operations, too). However, that also means the compiler must ensure that values can fit into the limited register set of the CPU, and that they aren't accidentally overwritten (for example as a result of a subroutine call). The job of the register allocator is to translate virtual registers to physical registers in a given code segment. This may involve modifying the original code by inserting load, store and copy operations.

Register allocation is known as a hard problem in computer science, and I think there are two reasons for that. The first reason is that finding the optimal allocation for a code segment is (probably) NP-complete. (NP-complete basically means that you have to consider all possible solutions in order to find the one you are after. A common feature of NP-complete problems is that the effect of a local choice on the global solution cannot be fully predicted). However, for what I think are excellent reasons, I can sidestep most of that complexity using the 'linear scan' register allocation algorithm. The details of that algorithm are subject of a later post.

The other reason that register allocation is hard is that the output code must meet the demanding specifications of the target CPU. For instance, some instructions take input only from specific registers, and some implicitly overwrite other registers. Calling conventions can also present a significant source of complexity as values must be placed in the right registers (or on the right stack locations) where the called function may expect them. So the register allocator must somehow encode these specific demands and ensure they are not violated.

Now that I've introduced register allocation, why it is needed, and what the challenges are, the next posts can begin to describe the solutions that I'm implementing.

gfldex: You have to take what you can get

Published by gfldex on 2016-10-25T12:53:50

And sometimes you have to get it by any means necessary. If it’s a file, you could use Jonathan Stowes URI::FetchFile. Said module checks if any of four modules are available and takes the first that sticks to turn a URI into a file on disk. There is one interesting line in his code that triggered an ENODOC.

$type = try require ::($class-name);

Here require returns a type object of a class declared by a module with the same name then that module.

Checking roast for that neat trick and playing with the whole dynamic module magic made me realise, that we don’t really cover this in the docs. When I try do handle an ENODOC I like to start with an example that compiles. This time, we need two files.

# M.pm6
unit module M;
class C is export { method m { 'method C::m' } };
class D is export { method m { 'method D::m' } };
# dynamic-modules.p6
use v6;
use lib '.';

subset C where ::('M::C');

my C $context = try { 
    CATCH { default { .note } };
    require ::('M');
    ::('M::C')
};

dd $context.HOW.^methods.elems;
dd $context.HOW.shortname($context);

Any symbol that is loaded via require will not be available at runtime. Consequently, we can’t have static type checks. Using a subset and dynamic lookup, we can get us a type object to check against. The where-clause will smart match against the type object. Since dynamic lookups are slow it may be sensible to cache the type object like so:

subset C where $ //= ::('M::C');

Now we got a type constraint to guard against require not returning a type that matches the name we expect. Please note we check against a name, not a type or interface. If you have the chance to design the modules that are loaded dynamically, you may want to define a role (that may even be empty) that must be implemented by the classes you load dynamically, to make sure you can actually call the methods you expect. Not just methods with the same name.

Now we can actually load the module by its name and resolve one of the classes dynamically and return it from the try block. Since M.pm6 defined a Module (as in Perl6::Metamodel::ModuleHOW) as its top level package, we can’t just simply take the return value of require because Module is not the most introspective thing we have in Perl 6. Please note that the symbols loaded by require are available via dynamic lookup outside the try-block. What happens if you go wild and load modules that have symbols with the same fully qualified name, I do not know. There may be dragons.

The case of loading any of a set of modules that may or may not be installed is quite a general one and to my limited knowledge we don’t got a module for that in the ecosystem yet. I therefore would like to challenge my three reads to write a module that sports the following interface.

sub load-any-module(*%module-name-to-adapter);
load-any-module({'Module::Name' => &Callable-adapter});

Whereby Callable-adapter provides a common interface to translate one sub or method call of the module to whatever user code requires. With such a module Jonathan may be able to boil URI::FetchFile down to 50 lines of code.

UPDATE:

Benchmarking a little revealed that `$` is not equivalent to `state $` while it should. To get the speedup right now use the following.

subset C where state $ = ::('M::C');

UPDATE 2:

The behaviour is by design and has been documented.


rakudo.org: Announce: Rakudo Star Release 2016.10

Published by Steve Mynott on 2016-10-23T16:01:59

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce
the October 2016 release of “Rakudo Star”, a useful and usable production
distribution of Perl 6. The tarball for the October 2016 release is available
from http://rakudo.org/downloads/star/.

This is the fourth post-Christmas (production) release of Rakudo Star and
implements Perl v6.c. It comes with support for the MoarVM backend (all module
tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with the
JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and
specific implementations of the language such as “Rakudo Perl”. This Star
release includes release 2016.10 of the Rakudo Perl 6 compiler, version 2016.10
of MoarVM, plus various modules, documentation, and other resources collected
from the Perl 6 community.

Some of the new compiler features since the last Rakudo Star release include:

+ Test.pm now provides bail-out()
+ Implemented $?MODULE and ::?MODULE
+ Implemented CompUnit::Repository::Installation::installed
+ .min/.max can now be called on Hashes
+ .succ now increments additional 29 digit ranges, such as Thai digits
+ Coercions now work in return types
+ Added RAKUDO_EXCEPTIONS_HANDLER env var to control exceptions output
+ CompUnit::Repository::Installation now cleans up short-name folders when empty
+ All Unicode quotes can now be used as quoters inside qqww/qww
+ Added basic Unicode 9 support (NFG changes for latest TR#29 still to do))
+ IO::Handle.new now uses ‘utf8’ encoding by default
+ It’s now possible to refer to sigiless parameters inside where clauses
+ Made Complex.new() return 0+0i
+ Added shortname() method to CurriedRoleHOW

Compiler maintenance since the last Rakudo Star release includes:

+ zef and panda now correctly install on OpenBSD and FreeBSD
+ Numerous improvements to content of error messages and addition of new previously-undetected warnings
+ Race conditions and deadlocks fixed
+ Large number of Hash, Bag, Baggie BagHash etc. speedups
+ Numerous improvements in CUR, offering up to 10x faster module loading
+ Many permutations() and combinations() speedups
+ Distribution::Path now handles native libraries correctly
+ Junctions now work in .classify
+ Fixed a memory leak involving EVAL
+ Fixed tab-completion issues with non-identifiers in REPL
+ Made huge improvement to CUR::Filesystem.id performance
+ Many fixes and additions improving JVM backend support
+ Removed argument-taking Dateish candidates for is-leap-year, days-in-month, and day-of-week (were never part of the spec)
+ Cool.IO no longer accepts any arguments
+ Overflow check has been removed from infix:<**>(int, int)

Notable changes in modules shipped with Rakudo Star:

+ prove6 has been added and will be used by panda unless PROVE_COMMAND env var set
+ perl6-pod-to-bigpage: added as dependency for doc tests
+ DBIish: Add simple Str support for Postgres type 790 (money)
+ Linenoise: Remove dependency on Native::Resources
+ Pod-To-HTML: Add support for lang=”” attribute, ignore .precomp and other changes
+ doc: Many documentation additions and improvements (too many to list!)
+ json-fast: Don’t explode when using to-json on a Num
+ p6-native-resources: Mark this module as deprecated in the ecosystem
+ panda: support for prove6 and uninstallation of modules and much else
+ perl6-lwp-simple: Make SSL testing optional and more control over character encoding.
+ svg-plot: Background color option and bubble plot type added.
+ perl6intro.pdf has also been updated.

There are some key features of Perl 6 that Rakudo Star does not yet handle
appropriately, although they will appear in upcoming releases. Some of the
not-quite-there features include:

+ advanced macros
+ non-blocking I/O (in progress)
+ some bits of Synopsis 9 and 11
+ There is an online resource at http://perl6.org/compilers/features that lists
the known implemented and missing features of Rakudo’s backends and other Perl
6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer
that a given feature isn’t implemented, but there are many that we’ve missed.
Bug reports about missing and broken features are welcomed at
rakudobug@perl.org.

See http://perl6.org/ for links to much more information about Perl 6,
including documentation, example code, tutorials, presentations, reference
materials, design documents, and other supporting resources. Some Perl 6
tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making
Rakudo Star possible. If you would like to contribute, see
http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list,
or join us on IRC #perl6 on freenode.

Steve Mynott: Rakudo Star 2016.10 Release Candidate

Published by Steve on 2016-10-16T13:10:00

There is a Release Candidate for Rakudo Star 2016.10 (currently RC0) available at

http://pl6anet.org/drop/

This should be quite a bit faster than previous releases and work better on OpenBSD/FreeBSD than the previous release.

It also features "prove6" which is now used by Panda -- removing a run-time dependency on Perl 5. Although it still needs Perl 5 to build.

I'm hoping to release the final version next weekend (Oct 21st) and would be grateful if people could try this out on as many systems as possible (eg. exotic systems like Solaris-like ones and Windows!) 

Full draft announce at

https://github.com/rakudo/star/blob/master/docs/announce/2016.10.md

Note compiling under Windows is possible using the gcc which comes with Strawberry Perl and gmake running under cmd.exe.  Further instructions will be added (thanks to Christopher for feedback).

Any feedback email steve *underscore* mynott *at* gmail *dot* com

Zoffix Znet: "Perl 6: What Programming In The Future Is Like?" (Lightning Talk Slides and Video)

Published on 2016-09-29T00:00:00

Brief overview of Perl 6's multi-core power

Strangely Consistent: The curious case of the disappearing test

Published by Carl Mäsak

I've recently learned a few valuable things about testing. I outline this in my Bondcon talk — Bondcon is a fictional anti-conference running alongside YAPC::Europe 2016 in a non-corporeal location but unfortunately frozen in time due to a procrastination-related mishap, awaiting the only speaker's tuits — but I thought I might blog about it, too.

Those of us who use and rely on TDD know to test the software itself: the model, the behaviors, etc. But as a side effect of attaching TravisCI to the 007, another aspect of testing came to light: testing your repository itself. Testing code-as-artifact, not code-as-effect.

Thanks to TravisCI, we currently test a lot of linter-like things that we care about, such as four spaces for indentation, no trailing whitespace, and that META.info parses as correct JSON. That in itself is not news — it's just using the test suite as a linter.

But there are also various bits of consistency that we test, and this has turned out to be very useful. I definitely want to do more of this in the future in my projects. We often talk about coupling as something bad: if you change component A and some unrelated component B breaks, then they are coupled and things are bad.

But some types of coupling are necessary. For example, part of the point of the META.info is to declare what modules the project provides. Do you know how easy it is to forget to update META.info when you add a new module? (Hint: very.) Well, we have a test which double-checks.

We also have a consistency test that makes sure a method which does a certain resource-allocating thing also remembers to do the corresponding resource-deallocating thing. (Yes, there are still a few of those, even in memory-managed languages.) This was after a bug happened where allocations/deallocations were mismatched. The test immediately discovered another location in the code that needed fixing.

All of the consistency tests are basically programmatic ways for the test suite to send you a message from a future, smarter you that remembered to do some B action immediately after doing some A action. No wonder I love them. You could call it "managed coupling", perhaps: yes, there's non-ideal coupling, but the consistency test makes it manageable.

But the best consistency check is the reason I write this post. Here's the background. 007 has a bunch of builtins: subs, operators, but also all the types. These types need to be installed into the setting by the initialization code, so that when someone actually looks up Sub from a 007 program, it actually refers to the built-in Sub type.

Every now and again, we'd notice that we'd gotten a few new types in the language, but they hadn't been added as built-ins. So we added them manually and sighed a little.

Eventually this consistency-test craze caught up, and we got a test for this. The test is text-based, which is very ironic considering the project itself; but hold that thought.

Following up on a comment by vendethiel, I realized we could do better than text-based comparison. On the Perl 6 types side, we could simply walk the appropriate module namespaces to find all the types.

There's a general rule at play here. The consistency tests are very valuable, and testing code-as-artifact is much better than nothing. But if there's a corresponding way to do it by running the program instead of just reading it, then that way invariably wins.

Anyway, the test started doing Stash traversal, and after a few more tweaks looked really nice.

And then the world paused a bit, like a good comedian, for maximal effect.

Yes, the test now contained an excellent implementation of finding all the types in Perl 6 land. This is exactly what the builtin initialization code needed to never be inconsistent in the first place. The tree walker moved into the builtins code itself. The test file vanished in the night, its job done forever.

And that is the story of the consistency check that got so good at its job that it disappeared. Because one thing that's better than managed coupling is... no coupling.

Pawel bbkr Pabian: Oh column, where art thou?

Published by Pawel bbkr Pabian on 2016-09-26T09:53:06

When ramiroencinas added FileSystem::Capacity::VolumesInfo to Perl 6 ecosystem I've spotted that it has no macOS support. And while trying to contribute to this module I've discovered how less known Perl 6 features can save the day. What FileSystem::Capacity::VolumesInfo module does is parsing output from df command, which looks like this:

$ df -k -P
Filesystem                                  1024-blocks       Used Available Capacity  Mounted on
/dev/disk3                                   1219749248  341555644 877937604    29%    /
devfs                                               343        343         0   100%    /dev
/dev/disk1s4                                  133638140  101950628  31687512    77%    /Volumes/Untitled
map -hosts                                            0          0         0   100%    /net
map auto_home                                         0          0         0   100%    /home
map -fstab                                            0          0         0   100%    /Network/Servers
//Pawel%20Pabian@biala-skrzynka.local./Data  1951417392 1837064992 114352400    95%    /Volumes/Data
/dev/disk5s2                                 1951081480 1836761848 114319632    95%    /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123           1219749248  341555644 877937604    29%    /Volumes/osxfuse

(if you see wrapped or trimmed output check raw one here)

And while this looks nice for humans, it is tough task for parser.

So let's use Perl 6 features to deal with this mess.

Capture command line output.

my ($header, @volumes) = run('df', '-k', '-P', :out).out.lines;

Method run executes shell command and returns Proc object. Method out creates Pipe object to receive shell command output. Method lines splits this output into lines, first goes to $header variable, remaining to @volumes array.

Parse header.

my $parsed_header = $header ~~ /^
    ('Filesystem')
    \s+
    ('1024-blocks')
    \s+
    ('Used')
    \s+
    ('Available')
    \s+
    ('Capacity')
    \s+
    ('Mounted on')
$/;

We do it because match object keeps each capture, and each capture knows from and to position at which it matched, for example:

say $parsed_header[1].Str;
say $parsed_header[1].from;
say $parsed_header[1].to;

Will return:

1024-blocks
44
55

That will help us a lot with dynamic columns width!

Extract row values.

First we have to look at border between Filesystem and 1024-blocks column. Because Filesystem is aligned to left and 1024-blocks is aligned to right so data from both columns can occupy space between those headers, for example:

Filesystem                      1024-blocks
/dev/someverybigdisk        111111111111111
me@host:/some directory 123    222222222222
         |                      |
         |<----- possible ----->|
         |<--- border space --->|

We cannot simply split it by space. But we know where 1024-blocks columns ends, so the number that ends at the same position is our volume size. To extract it, we can use another useful Perl 6 feature - regexp position anchor.

for @volumes -> $volume {
    $volume ~~ / (\d+) <.at($parsed_header[1].to)> /;
    say 'Volume size is ' ~ $/[0] ~ 'KB';
}

That finds sequence of digits that are aligned to the position of the end of the header. Every other column can be extracted using this trick if we know the data alignment.

$volume ~~ /
    # first column is not used by module, skip it
    \s+

    # 1024-blocks, right aligned
    (\d+) <.at($parsed_header[1].to)>

    \s+

    # Used, right aligned
    (\d+) <.at($parsed_header[2].to)>

    \s+

    # Available, right aligned
    (\d+) <.at($parsed_header[3].to)>

    \s+

    # Capacity, middle aligned, never longer than header
    <.at($parsed_header[4].from)>
        \s* (\d+ '%') \s*
    <.at($parsed_header[4].to)> 

    \s+

    # Mounted on, left aligned
    <.at($parsed_header[5].from)>(.*)
$/;

Profit!

By using header name positions and position anchors in regular expression we got bomb-proof df parser for macOS that works with regular disks, pendrives, NFS / AFS / FUSE shares, weird directory names and different escaping schemas.

Zoffix Znet: Perl 6 Core Hacking: Can Has Moar Cover?

Published on 2016-09-22T00:00:00

Discussion about roast coverage of Rakudo compiler

Strangely Consistent: Where in the sky

Published by Carl Mäsak

Our 1.5yo has a favorite planet. By a long margin, it's Mars.

I've told him he's currently on Earth, and shown him where, and he's OK with that. Mars is still the favorite. Earth's moon is OK too.

Such an interest does not occur — pun not intended — in a vacuum. We have a book at home (much like this one, but a different one, and in Swedish), and we open it sometimes to admire Mars (and then everything else).

But what really left its mark is the Space Room in our local Tekniska Museet — a complete dark room with projectors filling a wall with pictures of space. Using an Xbox controller, you decide where to fly in the solar system. If anything, this is what made Mars real for our son. In that room, we've orbited Mars. We've stood on its surface, looking at the Mars moons in the Mars sky. We've admired a Mars sunrise, standing quiet together in the red sands.

Inevitably, I got the first hard science question I couldn't answer from him a couple of weeks ago.

I really should have seen it coming. I was dropping him off at kindergarten. Before we went inside, I crouched next to him to point up at the moon above the city. He looked at it and said "Moon" in Swedish. Then he turned to me, eyes intent, and said "Mars?". It was a question.

He had put together that the planets we were admiring in the book and in the Space Room were actually up there somewhere. And now he just wanted me to point him towards Mars.

I had absolutely no idea. I told him it's not on the sky right now, but for all I knew, it might have been.

Now I really want to know how to find this out. Sure, there are calculations involved. I want to learn enough about them to be able to write a small, understandable computer program for me. It should be able to answer a question such as "Where in the sky is Mars?". Being able to ask it for all the planets in our solar system seems like an easy generalization without much extra cost.

Looking at tutorials like this one with illustrations and this one with detailed calculations, I'm heartened to learn that it's not only possible to do this, but more or less the steps I hoped:

There are many complicating factors that together make a simple calculation merely approximate, and the underlying reasons are frankly fascinating, but it seems that if I just want to be able to point roughly in the right direction and have a hope of finding a planet there, a simple method will do.

I haven't written any code yet, so consider this a kind of statement of intent. I know there must be oodles of "night sky" apps and desktop programs that already present this information... but my goal is to make the calculation myself, with a program, and to get it right. Lovingly handcrafted planetary positions.

It would also be nice to be able to ask "Where in the sky is the moon?" (that one's easy to double-check) or "Where in the sky is the International Space station?". If anything, that ought to be a much simpler calculation, since these orbit Earth.

Once I can reliably calculate all the positions, being able to know at what time things rise and set would also be very useful.

I went outside to throw the garbage last night, and it turned out it was a cloudless late evening. I saw some of the brighter stars, even from the light-polluted vantage point of our yard. I may have been gazing on Mars without knowing it. It's a nice feeling to find out how to learn something. Even nicer when it's for your child, so you can show him his favorite planet in the sky.

Zoffix Znet: Perl6.Fail, Release Robots, and Other Goodies

Published on 2016-09-17T00:00:00

Description of work done to automate Perl 6 releases

Zoffix Znet: Perl 6 Core Hacking: Wrong Address; Return To Sender

Published on 2016-09-14T00:00:00

Following along with debugging dispatch bugs

Zoffix Znet: Perl 6 Core Hacking: Grammatical Babble

Published on 2016-09-12T00:00:00

Following along with fixing a grammar bug

Death by Perl6: Slightly less basic Perl6 module management

Published by Nick Logan on 2016-09-03T15:33:40

This is meant to answer some of the reoccurring questions I've seen in #perl6 about module management. Solutions use the Perl6 package manager called Zef


Q) Install a module to a custom/non-default location?
A) Use the install command with the --install-to parameter

zef --install-to=inst#/my/custom/location install My::Module

Q) I installed a module to a custom location, but rakudo doesn't see it?
A) Set the PERL6LIB env variable or -I flag accordingly

PERL6LIB="inst#/my/custom/location" perl6 -e "use Text::Table::Simple; say 'ok'"
ok

perl6 -Iinst#/my/custom/location -e "use Text::Table::Simple; say 'ok'"
ok

Q) Uninstall a module installed to a custom/non-default location?
A) Use the uninstall command with the --from parameter

zef --from=inst#/my/custom/location uninstall My::Module

Q) Install a module from a uri instead of a name?
A) JFDI - Many extraction formats are supported by default, and others can easily be added

zef install https://github.com/ugexe/zef.git
zef install https://github.com/ugexe/zef/archive/master.zip
zef install https://github.com/ugexe/zef/archive/master.tar.gz

Q) Install dependencies when the top level module fails to install?
A) Use the --serial flag in addition to the install command

# Without --serial any dependencies of My::Module would only be installed if
# My::Module passes its build/test phase (even if the dependencies pass their tests)

zef --serial install My::Module 

Q) Use TAP6 instead of prove?
A1) Temporarily disable the services that are configured to be used first via a CLI flag (--/$service-short-name>)

zef --/prove --/default-tester install My::Module
zef --/prove --/default-tester test My::Module

A2) Open resources/config.json and reorder the entries under Test so Zef::Service::TAP is first (or the only)

"Test" : [
    {
        "short-name" : "tap-harness",
        "module" : "Zef::Service::TAP",
    }
]

Q) Find out what this SHA1 string from an error message is?
A) Use the locate command with the --sha1 flag

zef --sha1 locate A9948E7371E0EB9AFDF1EEEB07B52A1B75537C31

Q) How do I search/install/etc specific versions or verion ranges?
A) The same way you'd use it in Perl6 code

zef search "CSV::Parser:ver<2.*>"
===> Found 0 results

zef search "CSV::Parser:ver<0.*>"
===> Found 2 results
...

Steve Mynott: You Wouldn't BELIEVE what I saw at YAPC::EU!!!

Published by Steve Mynott on 2016-08-28T18:57:57

We turned up in Cluj via Wizz Air to probably one of the best pre YAPC parties ever located on three levels on the rooftop of Evozon‎’s plush city centre offices. We were well supplied with excellent wine, snacks and the local Ursus beer and had many interesting conversations with old friends.

On the first day Tux spoke about his Text::CSV modules for both Perl 5 and 6 on the first day and I did a short talk later in the day on benchmarking Perl 6. Only Nicholas understood my trainspotter joke slide with the APT and Deltic! Sadly my talk clashed with Lee J talking about Git which I wanted to see so I await the youtube version! Jeff G then spoke about Perl 6 and parsing languages such as JavaScript. Sadly I missed Leon T’s Perl 6 talk which I also plan on watching on youtube. Tina M gave an excellent talk on writing command line tools. She also started the lightning talks with an evangelical talk about how tmux was better than screen. Geoffrey A spoke about configuring sudo to run restricted commands in one directory which seemed a useful technique to me. Dave C continued his conference tradition of dusting off his Perl Vogue cover and showing it again. The age of the image was emphasised by the amazingly young looking mst on it. And Stefan S ended with a call for Perl unification.

The main social event was in the courtyard of the main museum off the central square with free food and beer all evening and an impressive light show on the slightly crumbling facade. There were some strange chairs which resembled cardboard origami but proved more comfortable than they looked when I was finally able to sit in one. The quality of the music improved as the evening progressed (or maybe the beer helped) I was amazed to see Perl Mongers actually dancing apparently inspired by the younger Cluj.pm members.

Day Two started with Sawyer’s State of the Velociraptor‎ which he had, sensibly, subcontracted to various leading lights of the Perl Monger community. Sue S (former London.pm leader) was up first with a short and sweet description of London.pm. Todd R talked about Houston.pm. Aaron Crane spoke about the new improved friendlier p5p. Tina about Berlin.pm and the German Perl community site she had written back in the day. This new format worked very well and it was obvious Perl Mongers groups could learn much from each other. Max M followed with a talk about using Perl and ElasticSearch to index websites and documents and Job about accessibility.

1505 had, from the perspective of London.pm, one of the most unfortunate scheduling clashes at YAPC::EU ever, with three titans of London.pm (all former leaders) battling for audience share. I should perhaps tread carefully here lest bias become apparent but the heavyweight Sue Spence was, perhaps treacherously, talking about Go in the big room and Dave Cross and Tom talking about Perl errors and HTML forms respectively in the other rooms. This momentous event should be reproducible by playing all three talks together in separate windows once they are available.

Domm did a great talk on Postgres which made me keen to use this technology again. André W described how he got Perl 6 running on his Sailfish module phone while Larry did a good impression of a microphone stand. I missed most of Lance Wick’s talk but the bit I caught at the end made me eager to watch the whole thing.

Guinevere Nell gave a fascinating lightning talk about agent based economic modelling. Lauren Rosenfield spoke of porting (with permission) a “Python for CS” book to perl 6. Lukas Mai described his journey from Perl to Rust. Lee J talked about photography before Sue encouraged people to break the London.pm website. Outside the talk rooms on their stall Liz and Wendy had some highly cool stuffed toy Camelia butterflies produced by the Beverly Hills Teddy Bear Company and some strange “Camel Balls” bubblegum. At the end of the day Sue cat herded many Mongers to eat at the Enigma Steampunk Bar in central Cluj with the cunning ploy of free beer money (recycled from the previous year’s Sherry money).

The third day started with Larry’s Keynote in which photographs of an incredible American house “Fallingwater” and Chinese characters (including “arse rice”) featured heavily. Sweth C gave a fast and very useful introduction to swift. Nicholas C then confused a room of people for an hour with a mixture of real Perl 5 and 6 and an alternative timeline compete with T shirts. The positive conclusion was that even if the past had been different the present isn’t likely to have been much better for the Perl language family than it is now! Tom spoke about Code Review and Sawyer about new features in Perl 5.24. Later I heard Ilya talk about running Perl on his Raspberry PI Model B and increasing the speed of his application very significantly to compensate for its low speed! And we finished with lightning talks where we heard about the bug tracker OTRS (which was new to me), Job spoke about assistive tech and Nine asked us to ask our bosses for money for Perl development amongst several other talks. We clapped a lot in thanks, since this was clearly a particularly well organised YAPC::EU (due to Amalia and her team!) and left to eat pizza and fly away the next day. Some stayed to visit a salt mine (which looked most impressive from the pictures!) and some stayed longer due to Lufthansa cancelling their flights back!


6guts: Concurrency bug squishing: part 1

Published by jnthnwrthngtn on 2016-08-22T22:24:26

Most of my recent Perl 6 development time has been spent hunting down and fixing various concurrency bugs. We’ve got some nice language features in this area, and they’ve been pretty well received so far. However, compared with many areas of Perl 6, they have been implemented relatively recently. Therefore, they have had less time to mature – which, of course, means more bugs and other rough edges.

Concurrency bugs tend to be rather tedious to hunt down. This is in no small part because reproducing them reliably can be challenging. Even once a fairly reliable reproduction is available, working out what’s going on can be tricky. Like all debugging, being methodical and patient is the key. I tend to keep notes of things I’ve tried and observed, and output produced by instrumenting programs with logging (fancy words for “adding prints and exception throws when sanity checks fail”). I will use interactive debuggers when needed, but even then the data from them tends to end up in my editor on any extended bug hunt. Debugging is in some ways the experimental science of programming. Even with a good approach, being sufficiently patient – or perhaps just downright stubborn – matters plenty too.

In the next 2-3 posts here, I’ll discuss a few of the bugs I recently hunted down. It will not be anything close to an exhaustive list of them, which would surely be as boring for you to read as it would be for me to write. This is just the “greatest hits”, if you like.

A tale of silly suspicions and dodgy data

This hunt started out looking in to various hangs that had been reported. Deadlocks are a broad category of bug (there’s another completely unrelated one I’ll cover in this little series, even). However, a number of ones that had shown up looked very similar. All of them showed a number of threads trying to enter GC, stuck in the consensus loop (which, yes, really is just a loop that we go around asking, “did every thread agree we’re going to do GC yet?”)

I’ll admit I went into this one with a bit of a bias. The GC sync-up code in question is a tad clever-looking. That doesn’t mean it’s wrong, but it makes it harder to be comfortable it’s correct. I also worried a bit that the cost of the consensus loop might well be enormous. So I let myself become a bit distracted for some minutes doing a profiling run to find out. I mean, who doesn’t want to be the person who fixed concurrency bug in the GC and made it faster at the same time?!

I’ve found a lot of useful speedups over the years using callgrind. I’ve sung its praises on this blog at least once before. By counting CPU cycles, it can give very precise and repeatable measurements on things that – measured using execution time – I’d consider within noise. With such accuracy, it must be the only profiler I need in my toolbox, right?

Well, no, of course not. Any time somebody says that X tool is The Best and doesn’t explain the context when it’s The Best, be very suspicious. Running Callgrind on a multi-threaded benchmark gave me all the ammunition I could ever want for replacing the GC sync-up code. It seemed that a whopping 30% of CPU cycles were spent in the GC consensus loop! This was going to be huuuuge…

Or was it? I mean, 35% seems just a little too huge. Profiling is vulnerable to the observer effect: the very act of measuring a program’s performance inevitably changes the program’s performance. And, while on the dubious physics analogies (I shouldn’t; I was a pretty awful physicist once I reached the relativity/QM stuff), there’s a bit of an uncertainty principle thing going on. The most invasive profilers can tell you very precisely where in your program time is spent, but you’re miles off knowing how fast you’re normally going in those parts of the program. They do this by instrumenting the program (like the current Perl 6 on MoarVM profiler does) or running it on a synthetic CPU, as Callgrind does. By contrast, a sampling profiler lets your program run pretty much as normal, and just takes regular samples of the call stack. The data is much less precise about where the program is spending its time, but it is a much better reflection of how fast the program is normally going.

So what would, say, the perf sampling profiler make of the benchmark? Turns out, well less than 1% of time was spent in the GC consensus loop in question. (Interestingly, around 5% was spent in a second GC termination consensus loop, which wasn’t the one under consideration here. That will be worth looking into in the future.) The Visual Studio sampling profiler on Windows – which uses a similar methodology – also gave a similar figure, which was somewhat reassuring also.

Also working against Callgrind is the way the valgrind suite of tools deal with multi-threaded applications – in short by serializing all operations onto a single thread. For a consensus loop, which expects to be dealing with syncing up a number of running threads, this could make a radical difference.

Finally, I decided to check in on what The GC Handbook had to say on the matter. It turns out that it suggests pretty much the kind of consensus loop we already have in MoarVM, albeit rather simpler because it’s juggling a bit less (a simplification I’m all for us having in the future too). So, we’re not doing anything so unusual, and with suitable measurements it’s performing as expected.

So, that was an interesting but eventually fairly pointless detour. What makes this all the more embarassing, however, is what came next. Running an example I knew to hang under GDB, I waited for it to do so, hit Ctrl + C, and started looking at all of the threads. Here’s a summary of their states:

And yes, for the sake of this being a nice example for the blog I perhaps should not have picked one with 17 threads. Anyway, we’ll cope. First up, the easy to explain stuff. All the threads that are “blocking on concurrent queue read (cond var wait)” are fairly uninteresting. They are Perl 6 thread pool threads waiting for their next task (that is, wanting to pull an item from the scheduler’s queue, and waiting for it to be non-empty).

Thread 01 is the thread that has triggered GC. It is trying to get consensus from other threads to begin GC. A number of other threads have already been interrupted and are also in the consensus loop (those marked “in AO_load_read at MVM_gc_enter_from_interrupt”). This is where I had initially suspected the problem would be. That leaves 4 other threads.

You might wonder how we cope with threads that are simply not in a position to participate in the consensus process, because they’re stuck in OS land, blocked waiting on I/O, a lock, a condition variable, a semaphore, a thread join, and so forth. The answer is that before they hand over control, they mark themselves as blocked. Another thread will steal their work if a GC happens while the thread is blocked. When the thread becomes unblocked, it marks itself as such. However, if a GC is already happening at that point, it’s not safe for the thread to proceed. Thus, it yields until GC is done. This accounts for the 3 threads described as “in mark thread unblocked; yielded”.

Which left one thread, which was trying to acquire a lock in order to peek a queue. The code looked like this:

    if (kind != MVM_reg_obj)
        MVM_exception_throw_adhoc(tc, "Can only shift objects from a ConcBlockingQueue");

    uv_mutex_lock(&cbq->locks->head_lock);

    while (MVM_load(&cbq->elems) == 0) {
        MVMROOT(tc, root, {

Spot anything missing?

Here’s the corrected version of the code:

    if (kind != MVM_reg_obj)
        MVM_exception_throw_adhoc(tc, "Can only shift objects from a ConcBlockingQueue");

    MVM_gc_mark_thread_blocked(tc);
    uv_mutex_lock(&cbq->locks->head_lock);
    MVM_gc_mark_thread_unblocked(tc);

    while (MVM_load(&cbq->elems) == 0) {
        MVMROOT(tc, root, {

Yup, it was failing to mark itself as blocked while contending for a lock, meaning the GC could not steal its work. So, the GC’s consensus algorithm wasn’t to blame after all. D’oh.

To be continued…

I actually planned to cover a second issue in this post. But, it’s already gone midnight, and perhaps that’s enough fun for one post anyway. :-) Tune in next time, for more GC trouble and another cute deadlock!


6guts: Assorted fixes

Published by jnthnwrthngtn on 2016-07-23T17:36:04

I’ve had a post in the works for a while about my work to make return faster (as well as routines that don’t return), as well as some notable multi-dispatch performance improvements. While I get over my writer’s block on that, here’s a shorter post on a number of small fixes I did on Thursday this week.

I’m actually a little bit “between things” at the moment. After some recent performance work, my next focus will be on concurrency stability fixes and improvements, especially to hyper and race. However, a little down on sleep thanks to the darned warm summer weather, I figured I’d spend a day picking a bunch of slightly less demanding bugs off from the RT queue. Some days, it’s about knowing what you shouldn’t work on…

A nasty string bug

MoarVM is somewhat lazy about a number of string operations. If you ask it to concatenate two simple strings, it will produce a string consisting of a strand list, with two strands pointing to the two strings. Similarly, a substring operation will produce a string with one strand and an offset into the original, and a repetition (using the x operator) will just produce a string with one strand pointing to the original string and having a repetition count. Note that it doesn’t currently go so far as allowing trees of strand strings, but it’s enough to prevent a bunch of copying – or at least delay it until a bunch of it can be done together and more cheaply.

The reason not to implement such cleverness is because it’s of course a whole lot more complex than simple immutable strings. And both RT #123602 and RT #127782 were about a sequence of actions that could trigger a bug. The precise sequence of actions were a repeat, followed by a concatenation, followed by a substring with certain offsets. It was caused by an off-by-one involving the repetition optimization, which was a little tedious to find but easy to fix.

Constant folding Seqs is naughty

RT #127749 stumbled across a case where an operation in a loop would work fine if its input was variable (like ^$n X ^$n), but fail if it were constant (such as ^5 X ^5). The X operator returns a Seq, which is an iterator that produces values once, throwing them away. Thus iterating it a second time won’t go well. The constant folding optimization is used so that things like 2 + 2 will be compiled into 4 (silly in this case, but more valuable if you’re doing things with constants). However, given the 1-shot nature of a Seq, it’s not suitable for constant folding. So, now it’s disallowed.

We are anonymous

RT #127540 complained that an anon sub whose name happened to match that of an existing named sub in the same scope would trigger a bogus redeclaration error. Wait, you ask. Anonymous sub…whose name?! Well, it turns out that what anon really means is that we don’t install it anywhere. It can have a name that it knows itself by, however, which is useful should it show up in a backtrace, for example. The bogus error was easily fixed up.

Charset, :ignoremark, :global, boom

Yes, it’s the obligatory “dive into the regex compiler” that all bug fixing days seem to come with. RT #128270 mentioned that that "a" ~~ m:g:ignoremark/<[á]>/ would whine about chr being fed a negative codepoint. Digging into the MoarVM bytecode this compiled into was pretty easy, as chr only showed up one time, so the culprit had to be close to that. It turned out to be a failure to cope with end of string, and as regex bugs go wasn’t so hard to fix.

Hang, crash, wallop

This is one of those no impact on real code, but sorta embarrassing bugs. A (;) would cause an infinite loop of errors, and (;;) and [0;] would emit similar errors also. The hang was caused by a loop that did next but failed to consider that the iteration variable needed updating in the optimizer. The second was because of constructing bad AST with integers hanging around in it rather than AST nodes, which confused all kinds of things. And that was RT #127473.

Improving an underwhelming error

RT #128581 pointed out that my Array[Numerix] $x spat out an error that fell rather short of the standards we aim for in Perl 6. Of course, the error should complain that Numerix isn’t known and suggest that maybe you wanted Numeric. Instead, it spat out this:

===SORRY!=== Error while compiling ./x.pl6
An exception occurred while parameterizing Array
at ./x.pl6:1
Exception details:
  ===SORRY!=== Error while compiling
  Cannot invoke this object (REPR: Null; VMNull)
  at :

Which is ugly. The line number was at least correct, but still… Anyway, a small tweak later, it produced the much better:

$ ./perl6-m -e 'my Array[Numerix] $x;'
===SORRY!=== Error while compiling -e
Undeclared name:
    Numerix used at line 1. Did you mean 'Numeric'?

Problems mixing unit sub MAIN with where constraints

RT #127785 observed that using a unit sub MAIN – which takes the entire program body as the contents of the MAIN subroutine – seemed to run into trouble if the signature contained a where clause:

% perl6 -e 'unit sub MAIN ($x where { $^x > 1 } );  say "big"'  4
===SORRY!===
Expression needs parens to avoid gobbling block
at -e:1
------> unit sub MAIN ($x where { $^x > 1 }⏏ );  say "big"
Missing block (apparently claimed by expression)
at -e:1
------> unit sub MAIN ($x where { $^x > 1 } );⏏  say "big"

The error here is clearly bogus. Finding a way to get rid of it wasn’t too hard, and it’s what I ended up committing. I’ll admit that I’m not sure why the check involved was put there in the first place, however. After some playing around with other situations that it might have aided, I failed to find any. There were also no spectests that depended on it. So, off it went.

$?MODULE

The author of RT #128552 noticed that the docs talked about $?MODULE (“what module am I currently in”), to go with $?PACKAGE and $?CLASS. However, trying it out let to an undeclared variable error. It seems to have been simply overlooked. It was easy to add, so that’s what I did. I also found some old, provisional tests and brought them up to date in order to cover it.

Subtypes, definedness types, and parameters

The submitter of RT #127394 was creative enough to try -> SomeSubtype:D $x { }. That is, take a subset type and stick a :D on it, which adds the additional constraint that the value must be defined. This didn’t go too well, resulting in some rather strange errors. It turns out that, while picking the type apart so we can code-gen the parameter binding, we failed to consider such interesting cases. Thankfully, a small refactor made the fix easy once I’d figured out what was happening.

1 day, 10 RTs

Not bad going. Nothing earth-shatteringly exciting, but all things that somebody had run into – and so others would surely run into again in the future. And, while I’ll be getting back to the bigger, hairier things soon, spending a day making Perl 6 a little nicer in 10 different ways was pretty fun.


rakudo.org: Announce: Rakudo Star Release 2016.07

Published by Steve Mynott on 2016-07-22T10:41:46

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the July 2016 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the July 2016 release is available from http://rakudo.org/downloads/star/.

This is the third post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”. This Star release includes release 2016.07 of the Rakudo Perl 6 compiler, version 2016.07 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

Some of the new compiler features since the last Rakudo Star release include:

Compiler maintenance since the last Rakudo Star release includes:

Notable changes in modules shipped with Rakudo Star:

perl6intro.pdf has also been updated.

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

See http://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

Pawel bbkr Pabian: Comprehensive guide and tools to split monolithic database into shards using Perl

Published by Pawel bbkr Pabian on 2016-06-21T04:47:32

You can find the most recent version of this tutorial here.

Intro

When you suddenly get this brilliant idea, the revolutionary game-changer, all you want to do is to immediately hack some proof of concept to start small project flame from a spark of creativity. So I'll leave you alone for now, with your third mug of coffee and birds chirping first morning songs outside of the window...

...Few years later we meet again. Your proof of concept has grown into a mature, recognizable product. Congratulations! But why the sad face? Your clients are complaining that your product is slow and unresponsive? They want more features? They generate more data? And you cannot do anything about it, despite the fact that you bought the most shiny, expensive database server that is available?

When you were hacking your project on day 0, you were not thinking about the long term scalability. All you wanted to do was to create working prototype as fast as possible. So single database design was easiest, fastest and most obvious to use. You didn't think back then, that a single machine cannot be scaled up infinitely. And now it is already too late.

LOOKS LIKE YOU'RE STUCK ON THE SHORES OF [monolithic design database] HELL. THE ONLY WAY OUT IS THROUGH...

(DOOM quote)

Sharding to the rescue!

Sharding is the process of distributing your clients data across multiple databases (called shards). By doing so you will be able to:

But if you already have single (monolithic) database this process is like converting your motorcycle into a car... while riding.

Outline

This is step-by-step guide of a a very tricky process. And the worst thing you can do is to panic because your product is collapsing under its own weight and you have a lots of pressure from clients. The whole process may take weeks, even months. Will use significant amount of human and time resources. And will pause new features development. Be prepared for that. And do not rush to the next stage until you are absolutely sure the current one is completed.

So what is the plan?

Understand your data

In monolithic database design data classification is irrelevant but it is the most crucial part of sharding design. Your tables can be divided into three groups: client, context and neutral.

Let's assume your product is car rental software and do a quick exercise:

  +----------+      +------------+      +-----------+
  | clients  |      | cities     |      | countries |
  +----------+      +------------+      +-----------+
+-| id       |   +--| id         |   +--| id        |
| | city_id  |>--+  | country_id |>--+  | name      |
| | login    |   |  | name       |      +-----------+
| | password |   |  +------------+
| +----------+   |
|                +------+
+--------------------+  |
                     |  |     +-------+
  +---------------+  |  |     | cars  |
  | rentals       |  |  |     +-------+
  +---------------+  |  |  +--| id    |
+-| id            |  |  |  |  | vin   |
| | client_id     |>-+  |  |  | brand |
| | start_city_id |>----+  |  | model |
| | start_date    |     |  |  +-------+
| | end_city_id   |>----+  |
| | end_date      |        |
| | car_id        |>-------+   +--------------------+
| | cost          |            | anti_fraud_systems |
| +---------------+            +--------------------+
|                              | id                 |--+
|      +-----------+           | name               |  |
|      | tracks    |           +--------------------+  |
|      +-----------+                                   |
+-----<| rental_id |                                   |
       | latitude  |     +--------------------------+  |
       | longitude |     | blacklisted_credit_cards |  |
       | timestamp |     +--------------------------+  |
       +-----------+     | anti_fraud_system_id     |>-+
                         | number                   |
                         +--------------------------+

Client tables

They contain data owned by your clients. To find them, you must start in some root table - clients in our example. Then follow every parent-to-child relation (only in this direction) as deep as you can. In this case we descend into rentals and then from rentals further to tracks. So our client tables are: clients, rentals and tracks.

Single client owns subset of rows from those tables, and those rows will always be moved together in a single transaction between shards.

Context tables

They put your clients data in context. To find them, follow every child-to-parent relation (only in this direction) from every client table as shallow as you can. Skip if table is already classified. In this case we ascend from clients to cities and from cities further to countries. Then from rentals we can ascend to clients (already classified), cities (already classified) and cars. And from tracks we can ascend into rentals (already classified). So our context tables are: cities, countries and cars.

Context tables should be synchronized across all shards.

Neutral tables

Everything else. They must not be reachable from any client or context table through any relation. However, there may be relations between them. So our neutral tables are: anti_fraud_systems and blacklisted_credit_cards.

Neutral tables should be moved outside of shards.

Checkpoint

Take any tool that can visualize your database in the form of a diagram. Print it and pin it on the wall. Then take markers in 3 different colors - each for every table type - and start marking tables in your schema.

If you have some tables not connected due to technical reasons (for example MySQL partitioned tables or TokuDB tables do not support foreign keys), draw this relation and assume it is there.

If you are not certain about specific table, leave it unmarked for now.

Done? Good :)

Q&A

Q: Is it a good idea to cut all relations between client and context tables, so that only two types - client and neutral - remain?

A: You will save a bit of work because no synchronization of context data across all shards will be required. But at the same time any analytics will be nearly impossible. For example, even simple task to find which car was rented the most times will require software script to do the join. Also there won't be any protection against software bugs, for example it will be possible to rent a car that does not even exist.

There are two cases when converting a context table to neutral table is justified:

In every other case it is a very bad idea to make neutral data out of context data.

Q: Is it a good idea to shard only big tables and leave all small tables together on a monolithic database?

A: In our example you have one puffy table - tracks. It keeps GPS trail of every car rental and will grow really fast. So if you only shard this data, you will save a lot of work because there will be only small application changes required. But in real world you will have 100 puffy tables and that means 100 places in application logic when you have to juggle database handles to locate all client data. That also means you won't be able to split your clients between many data centers. Also you won't be able to reduce downtime costs to 1/nth of the amount of shards if some data corruption in monolithic database occurs and recovery is required. And analytics argument mentioned above also applies here.

It is a bad idea to do such sub-sharding. May seem easy and fast - but the sooner you do proper sharding that includes all of your clients data, the better.

Fix your monolithic database

There are a few design patterns that are perfectly fine or acceptable in monolithic database design but are no-go in sharding.

Lack of foreign key

Aside from obvious risk of referencing nonexistent records, this issue can leave junk when you will migrate clients between shards later for load balancing. The fix is simple - add foreign key if there should be one.

The only exception is when it cannot be added due to technical limitations, such as usage of TokuDB or partitioned MySQL tables that simply do not support foreign keys. Skip those, I'll tell you how to deal with them during data migration later.

Direct connection between clients

Because clients may be located on different shards, their rows may not point at each other. Typical case where it happens is affiliation.

+-----------------------+
| clients               |
+-----------------------+
| id                    |------+
| login                 |      |
| password              |      |
| referred_by_client_id |>-----+
+-----------------------+

To fix this issue you must remove foreign key and rely on software instead to match those records.

Nested connection between clients

Because clients may be located on different shards their rows may not reference another client (also indirectly). Typical case where it happens is post-and-comment discussion.

  +----------+        +------------+
  | clients  |        | blog_posts |
  +----------+        +------------+
+-| id       |---+    | id         |---+
| | login    |   +---<| client_id  |   |
| | password |        | text       |   |
| +----------+        +------------+   |
|                                      |
|    +--------------+                  |
|    | comments     |                  |
|    +--------------+                  |
|    | blog_post_id |>-----------------+
+---<| client_id    |
     | text         |
     +--------------+

First client posted something and a second client commented it. This comment references two clients at the same time - second one directly and first one indirectly through blog_posts table. That means it will be impossible to satisfy both foreign keys in comments table if those clients are not in single database.

To fix this you must choose which relation from table that refers to multiple clients is more important, remove the other foreign keys and rely on software instead to match those records.

So in our example you may decide that relation between comments and blog_posts remains, relation between comments and clients is removed and you will use application logic to find which client wrote which comment.

Accidental connection between clients

This is the same issue as nested connection but caused by application errors instead of intentional design.

                    +----------+
                    | clients  |
                    +----------+
+-------------------| id       |--------------------+
|                   | login    |                    |
|                   | password |                    |
|                   +----------+                    |
|                                                   |
|  +-----------------+        +------------------+  |
|  | blog_categories |        | blog_posts       |  |
|  +-----------------+        +------------------+  |
|  | id              |----+   | id               |  |
+-<| client_id       |    |   | client_id        |>-+
   | name            |    +--<| blog_category_id |
   +-----------------+        | text             |
                              +------------------+

For example first client defined his own blog categories for his own blog posts. But lets say there was mess with www sessions or some caching mechanism and blog post of second client was accidentally assigned to category defined by first client.

Those issues are very hard to find, because schema itself is perfectly fine and only data is damaged.

Not reachable clients data

Client tables must be reached exclusively by descending from root table through parent-to-child relations.

              +----------+
              | clients  |
              +----------+
+-------------| id       |
|             | login    |
|             | password |
|             +----------+
|
|  +-----------+        +------------+
|  | photos    |        | albums     |
|  +-----------+        +------------+
|  | id        |    +---| id         |
+-<| client_id |    |   | name       |
   | album_id  |>---+   | created_at |
   | file      |        +------------+
   +-----------+

So we have photo management software this time and when a client synchronizes photos from a camera, new album is created automatically for better import visualization. This is an obvious issue even in monolithic database - when all photos are removed from album, then it becomes zombie row. It won't be deleted automatically by cascade and cannot be matched with client anymore. In sharding, this also causes misclassification of client table as context table.

To fix this issue foreign key should be added from albums to clients. This may also fix classification for some tables below albums, if any.

Polymorphic data

Table cannot be classified as two types at the same time.

              +----------+
              | clients  |
              +----------+
+-------------| id       |-------------+
|             | login    |             |
|             | password |             |
|             +----------+         (nullable)
|                                      |
|  +-----------+        +-----------+  |
|  | blogs     |        | skins     |  |
|  +-----------+        +-----------+  |
|  | id        |    +---| id        |  |
+-<| client_id |    |   | client_id |>-+
   | skin_id   |>---+   | color     |
   | title     |        +-----------+
   +-----------+

In this product client can choose predefined skin for his blog. But can also define his own skin color and use it as well.

Here single interface of skins table is used to access data of both client and context type. A lot of "let's allow client to customize that" features end up implemented this way. While being a smart hack - with no table schema duplication and only simple WHERE client_id IS NULL OR client_id = 123 added to query to present both public and private templates for client - this may cause a lot of trouble in sharding.

The fix is to go with dual foreign key design and separate tables. Create constraint (or trigger) that will protect against assigning blog to public and private skin at the same time. And write more complicated query to get blog skin color.

              +----------+
              | clients  |
              +----------+
+-------------| id       |
|             | login    |
|             | password |
|             +----------+
|
|   +---------------+        +--------------+
|   | private_skins |        | public_skins |
|   +---------------+        +--------------+
|   | id            |--+  +--| id           |
+--<| client_id     |  |  |  | color        |
|   | color         |  |  |  +--------------+
|   +---------------+  |  |    
|                      |  |
|                   (nullable)
|                      |  |
|                      |  +------+
|                      +-----+   |
|                            |   |
|       +-----------------+  |   |
|       | blogs           |  |   |
|       +-----------------+  |   |
|       | id              |  |   | 
+------<| client_id       |  |   |
        | private_skin_id |>-+   |
        | public_skin_id  |>-----+
        | title           |
        +-----------------+

However - this fix is optional. I'll show you how to deal with maintaining mixed data types in chapter about mutually exclusive IDs. It will be up to you to decide if you want less refactoring but more complicated synchronization.

Beware! Such fix may also accidentally cause another issue described below.

Opaque uniqueness (a.k.a. horse riddle)

Every client table without unique constraint must be reachable by not nullable path of parent-to-child relations or at most single nullable path of parent-to-child relations. This is very tricky issue which may cause data loss or duplication during client migration to database shard.

              +----------+
              | clients  |
              +----------+
+-------------| id       |-------------+
|             | login    |             |
|             | password |             |
|             +----------+             |
|                                      |
|  +-----------+        +-----------+  |
|  | time      |        | distance  |  |
|  +-----------+        +-----------+  |
|  | id        |--+  +--| id        |  |
+-<| client_id |  |  |  | client_id |>-+
   | amount    |  |  |  | amount    |
   +-----------+  |  |  +-----------+
                  |  |
               (nullable)
                  |  |
                  |  |
         +--------+  +---------+
         |                     |
         |   +-------------+   |
         |   | parts       |   |
         |   +-------------+   |
         +--<| time_id     |   |
             | distance_id |>--+
             | name        |
             +-------------+

This time our product is application that helps you with car maintenance schedule. Our clients car has 4 tires that must be replaced after 10 years or 100000km and 4 spark plugs that must be replaced after 100000km. So 4 indistinguishable rows for tires are added to parts table (they reference both time and distance) and 4 indistinguishable rows are added for spark plugs (they reference only distance).

Now to migrate client to shard we have to find which rows from parts table does he own. By following relations through time table we will get 4 tires. But because this path is nullable at some point we are not sure if we found all records. And indeed, by following relations through distance table we found 4 tires and 4 spark plugs. Since this path is also nullable at some point we are not sure if we found all records. So we must combine result from time and distance paths, which gives us... 8 tires and 4 spark plugs? Well, that looks wrong. Maybe let's group it by time and distance pair, which gives us... 1 tire and 1 spark plug? So depending how you combine indistinguishable rows from many nullable paths to get final row set, you may suffer either data duplication or data loss.

You may say: Hey, that's easy - just select all rows through time path, then all rows from distance path that do not have time_id, then union both results. Unfortunately paths may be nullable somewhere earlier and several nullable paths may lead to single table, which will produce bizarre logic to get indistinguishable rows set properly.

To solve this issue make sure there is at least one not nullable path that leads to every client table (does not matter how many tables it goes through). Extra foreign key should be added between clients and part in our example.

TL;DR

Q: How many legs does the horse have?

A1: Eight. Two front, two rear, two left and two right.

A2: Four. Those attached to it.

Foreign key to not unique rows

MySQL specific issue.

CREATE TABLE `foo` (
  `id` int(10) unsigned DEFAULT NULL,
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `bar` (
  `foo_id` int(10) unsigned NOT NULL,
  KEY `foo_id` (`foo_id`),
  CONSTRAINT `bar_ibfk_1` FOREIGN KEY (`foo_id`) REFERENCES `foo` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

mysql> INSERT INTO `foo` (`id`) VALUES (1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO `foo` (`id`) VALUES (1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `bar` (`foo_id`) VALUES (1);
Query OK, 1 row affected (0.01 sec)

Which row from foo table is referenced by row in bar table?

You don't know because behavior of foreign key constraint is defined as "it there any parent I can refer to?" instead of "do I have exactly one parent?". There are no direct row-to-row references as in other databases. And it's not a bug, it's a feature.

Of course this causes a lot of weird bugs when trying to locate all rows that belong to given client, because results can be duplicated on JOINs.

To fix this issue just make sure every referenced column (or set of columns) is unique. They must not be nullable and must all be used as primary or unique key.

Self loops

Rows in the same client table cannot be in direct relation. Typical case is expressing all kinds of tree or graph structures.

+----------+
| clients  |
+----------+
| id       |----------------------+
| login    |                      |
| password |                      |
+----------+                      |
                                  |
          +--------------------+  |
          | albums             |  |
          +--------------------+  |
      +---| id                 |  |
      |   | client_id          |>-+
      +--<| parent_album_id    |
          | name               |
          +--------------------+

This causes issues when client data is inserted into target shard.

For example in our photo management software client has album with id = 2 as subcategory of album with id = 1. Then he flips this configuration, so that the album with id = 2 is on top. In such scenario if database returned client rows in default, primary key order then it won't be possible to insert album with id = 1 because it requires presence of album with id = 2.

Yes, you can disable foreign key constraints to be able to insert self-referenced data in any order. But by doing so you may mask many errors - for example broken references to context data.

For good sharding experience all relations between rows of the same table should be stored in separate table.

+----------+
| clients  |
+----------+
| id       |----------------------+
| login    |                      |
| password |                      |
+----------+                      |
                                  |
          +--------------------+  |
          | albums             |  |
          +--------------------+  |
  +-+=====| id                 |  |
  | |     | client_id          |>-+
  | |     | name               |
  | |     +--------------------+
  | |
  | |    +------------------+
  | |    | album_hierarchy  |
  | |    +------------------+
  | +---<| parent_album_id  |
  +-----<| child_album_id   |
         +------------------+

Triggers

Triggers cannot modify rows. Or roar too loud :)

            +----------+
            | clients  |
            +----------+
 +----------| id       |------------------+
 |          | login    |                  |
 |          | password |                  |
 |          +----------+                  |
 |                                        |
 |  +------------+        +------------+  |
 |  | blog_posts |        | activities |  |
 |  +------------+        +------------+  |
 |  | id         |        | id         |  |
 +-<| client_id  |        | client_id  |>-+
    | content    |        | counter    |
    +------------+        +------------+
          :                      :
          :                      :
 (on insert post create or increase activity)

This is common usage of a trigger to automatically aggregate some statistics. Very useful and safe - doesn't matter which part of application adds new blog post, activities counter will always go up.

However, when sharding this causes a lot of trouble when inserting client data. Let's say he has 4 blog posts and 4 activities. If posts are inserted first they bump activity counter through trigger and we have collision in activties table due to unexpected row. When activities are inserted first they are unexpectedly increased by posts inserts later, ending with invalid 8 activities total.

In sharding triggers can only be used if they do not modify data. For example it is OK to do sophisticated constraints using them. Triggers that modify data must be removed and their logic ported to application.

Checkpoint

Check if there are any issues described above in your printed schema and fix them.

And this is probably the most annoying part of sharding process as you will have to dig through a lot of code. Sometimes old, untested undocumented and unmaintained.

When you are done your printed schema on the wall should not contain any unclassified tables.

Ready for next step?

Prepare schema

It is time to dump monolithic database complete schema (tables, triggers, views and functions/procedures) to the shard_schema.sql file and prepare for sharding environment initialization.

Separate neutral tables

Move all tables that are marked as neutral from shard_schema.sql file to separate neutral_schema.sql file. Do not forget to also move triggers, views or procedures associated with them.

Bigints

Every primary key on shard should be of unsigned bigint type. You do not have to modify your existing schema installed on monolithic database. Just edit shard_schema.sql file and massively replace all numeric primary and foreign keys to unsigned big integers. I'll explain later why this is needed.

Create schema for dispatcher

Dispatcher tells on which shard specific client is located. Absolute minimum is to have table where you will keep client id and shard number. Save it to dispatch_schema.sql file.

More complex dispatchers will be described later.

Dump common data

From monolithic database dump data for neutral tables to neutral_data.sql file and for context tables to context_data.sql file. Watch out for tables order to avoid breaking foreign keys constraints.

Checkpoint

You should have shard_schema.sql, neutral_schema.sql, dispatch_schema.sql, neutral_data.sql and context_data.sql files.

At this point you should also freeze all schema and common data changes in your application until sharding is completed.

Set up environment

Finally you can put all those new, shiny machines to good use.

Typical sharding environment contains of:

Each database should of course be replicated.

Database for neutral data

Nothing fancy, just regular database. Install neutral_schema.sql and feed neutral_data.sql to it.

Make separate user for application with read-only grants to read neutral data and separate user with read-write grants for managing data.

Database for dispatch

Every time client logs in to your product you will have to find which shard he is located on. Make sure all data fits into RAM, have a huge connection pool available. And install dispatch_schema.sql to it.

This is a weak point of all sharding designs. Should be off-loaded by various caches as much as possible.

Databases for shards

They should all have the same power (CPU/RAM/IO) - this will speed things up because you can just randomly select shard for your new or migrated client without bothering with different hardware capabilities.

Configuration of shard databases is pretty straightforward. For every shard just install shard_schema.sql, feed context_data.sql file and follow two more steps.

Databases for shards - users

Remember that context tables should be identical on all shards. Therefore it is a good idea to have separate user with read-write grants for managing context data. Application user should have read-only access to context tables to prevent accidental context data change.

This ideal design may be too difficult to maintain - every new table will require setting up separate grants. If you decide to go with single user make sure you will add some mechanism that monitors context data consistency across all shards.

Databases for shards - mutually exclusive primary keys

Primary keys in client tables must be globally unique across whole product.

First of all - data split is a long process. Just pushing data between databases may take days or even weeks! And because of that it should be performed without any global downtime. So during monolithic to shard migration phase new rows will still be created in monolithic database and already migrated users will create rows on shards. Those rows must never collide.

Second of all - sharding does not end there. Later on you will have to load balance shards, move client between different physical locations, backup and restore them if needed. So rows must never collide at any time of your product life.

How to achieve that? Use offset and increment while generating your primary keys.

MySQL has ready to use mechanism:

Now your first shard for any table will generate 1, 101, 201, 301, ... , 901, 1001, 1101 auto increments and second shard will generate 2, 102, 202, 302, ... , 902, 1002, 1102 auto increments. And that's all! Your new rows will never collide, doesn't matter which shard they were generated on and without any communication between shards needed.

TODO: add recipes for another database types

Now you should understand why I've told you to convert all numerical primary and foreign keys to unsigned big integers. The sequences will grow really fast, in our example 100x faster than on monolithic database.

Remember to set the same increment and corresponding offsets on replicas. Forgetting to do so will be lethal to whole sharding design.

Checkpoint

Your database servers should be set up. Check routings from application, check user grants. And again - remember to have correct configurations (in puppet for example) for shards and their replicas offsets. Do some failures simulations.

And move to the next step :)

Q&A

Q: Can I do sharding without dispatch database? When client wants to log in I can just ask all shards for his data and use the one shard that will respond.

A: No. This may work when you start with three shards, but when you have 64 shards in 8 different data centers such fishing queries become impossible. Not to mention you will be very vulnerable to brute-force attacks - every password guess attempt will be multiplied by your application causing instant overload of the whole infrastructure.

Q: Can I use any no-SQL technology for dispatch and neutral databases?

A: Sure. You can use it instead of traditional SQL or as a supplementary cache.

Adapt code

Product

There will be additional step in your product. When user logs in then dispatch database must be asked for shard number first. Then you connect to this shard and... it works! Your code will also have to use separate database connection to access neutral data. And it will have to roll shard when new client registers and note this selection in dispatch database.

That is the beauty of whole clients sharding - majority of your code is not aware of it.

Synchronization of common data

If you modify neutral data this change should be propagated to every neutral database (you may have more of those in different physical locations).

Same thing applies to context data on shard, but all auto increment columns must be forced. This is because every shard will generate different sequence. When you execute INSERT INTO skins (color) VALUES ('#AA55BC') then shard 1 will assign different IDs for them than shard 2. And all client data that reference this language will be impossible to migrate between shards.

Dispatch

Dispatch serves two purposes. It allows you to find client on shard by some unique attribute (login, email, and so on) and it also helps to guarantee such uniqueness. So for example when new client is created then dispatch database must be asked if chosen login is available. Take an extra care of dispatch database. Off-load as much as you can by caching and schedule regular consistency checks between it and the shards.

Things get complicated if you have shards in many data centers. Unfortunately I cannot give you universal algorithm of how to keep them in sync, this is too much application specific.

Analytic tools

Because your clients data will be scattered across multiple shard databases you will have to fix a lot of global queries used in analytical and statistical tools. What was trivial previously - for example SELECT city.name, COUNT(*) AS amount FROM clients JOIN cities ON clients.city_id = cities.id GROUP BY city.name ORDER BY amount DESC LIMIT 8 - will now require gathering all data needed from shards and performing intermediate materialization for grouping, ordering and limiting.

There are tools that helps you with that. I've tried several solutions, but none was versatile, configurable and stable enough that I could recommend it.

Make a wish-ard

We got to the point when you have to switch your application to sharding flow. To avoid having two versions of code - old one for still running monolithic design and a new one for sharding, we will just connect monolithic database as another "fake" shard.

First you have to deal with auto increments. Set up in the configuration the same increment on your monolithic database as on shards and set up any free offset. Then check what is the current auto increment value for every client or context table and set the same value for this table located on every shard. Now your primary keys won't collide between "fake" and real shards during the migration. But beware: this can easily overflow tiny or small ints in your monolithic database, for example just adding three rows can overflow tiny int unsigned capacity of 255.

After that synchronize data on dispatch database - every client you have should point to this "fake" shard. Deploy your code changes to use dispatch logic.

Checkpoint

You should be running code with complete sharding logic but on reduced environment - with only one "fake" shard made out of your monolithic database. You may also already enable creating new client accounts on your real shards.

Tons of small issues to fix will pop up at this point. Forgotten pieces of code, broken analytics tools, broken support panels, need of neutral or dispatch databases tune up.

And when you squash all the bugs it is time for grande finale: clients migration.

Migrate clients to shards

Downtime semaphore

You do not need any global downtime to perform clients data migration. Disabling your whole product for a few weeks would be unacceptable and would cause huge financial loses. But you need some mechanism to disable access for individual clients while they are migrated. Single flag in dispatch databases should do, but your code should be aware of it and present nice information screen for client when he tries to log in. And of course do not modify clients data.

Timing

If you have some history of your client habits - use it. For example if client is usually logging in at 10:00 and logging out at 11:00 schedule his migration to another hour. You may also figure out which timezones clients are in and schedule migration for the night for each of them. The migration process should be as transparent to client as possible. One day he should just log in and bam - fast and responsive product all of a sudden.

Exodus tool

Exodus was the tool used internally at GetResponse.com to split monolithic database into shards. And is still used to load balance sharding environment. It allows to extract subset of rows from relational database and build series of queries that can insert those rows to another relational database with the same schema.

Fetch Exodus.pm and create exodus.pl file in the same directory with the following content:

#!/usr/bin/env perl

use strict;
use warnings;

my $database = DBI->connect(
    'DBI:mysql:database=test;host=localhost',
    'root', undef,
    {'RaiseError' => 1}
);

my $exodus = Exodus->new(
    'database' => $database,
    'root'     => 'clients',
);

$exodus->extract( 'id' => 1 );

Of course provide proper credentials to connect to your monolithic database.

Now when you call the script it will extract all data for client represented by record of id = 1 in clients root table. You can directly pipe it to another database to copy the client there.

./exodus.pl | mysql --host=my-shard-1 --user=....

Update dispatch shard for this client, check that product works for him and remove his rows from monolithic database.

Repeat for every client.

MySQL issues

Do not use user with SUPER grant to perform migration. Not because it is unsafe, but because they do not have locales loaded by default. You may end up with messed character encodings if you do so.

MySQL is quite dumb when it comes to cascading DELETE operations. If you have such schema

            +----------+
            | clients  |
            +----------+
 +----------| id       |----------------+
 |          | login    |                |
 |          | password |                |
 |          +----------+                |
 |                                      |
 |  +-----------+        +-----------+  |
 |  | foo       |        | bar       |  |
 |  +-----------+        +-----------+  |
 |  | id        |----+   | id        |  |
 +-<| client_id |    |   | client_id |>-+
    +------------+   +--<| foo_id    |
                         +----------+

and all relations are ON DELETE CASCADE then sometimes it cannot resolve proper order and may try to delete data from foo table before data from bar table, causing constraint error. In such cases you must help it a bit and manually delete clients data from bar table before you will be able to delete row from clients table.

Virtual foreign keys

Exodus allows you to manually add missing relations between tables, that couldn't be created in the regular way for some reasons (for example table engine does not support foreign keys).

my $exodus = Exodus->new(
    'database' => $dbh,
    'root' => 'clients',
    'relations' => [
        {
            'nullable'      => 0,
            'parent_table'  => 'foo',
            'parent_column' => 'id'
            'child_table'   => 'bar',
            'child_column'  => 'foo_id',
        },
        { ... another one ...}
    ]
);

Note that if foreign key column can be NULL such relation should be marked as nullable.

Also remember to delete rows from such table when your client migration is complete. Due to lack of foreign key it won't auto cascade.

Cleanup

When all of your clients are migrated simply remove your "fake" shard from infrastructure.

Exodus roadmap

I have a plans of refactoring this code to Perl 6 (it was prototyped in Perl 6, although back in the days when GetResponse introduced sharding Perl 6 was not fast or stable enough to deal with terabytes of data). It should get proper abstracts, more engines support and of course good test suite.

YAPC NA 2016 "Pull request challenge" seems like a good day to begin :)

Contact

If you have any questions about database sharding or want to contribute to this guide or Exodus tool contact me in person or on IRC as "bbkr".

Final screen

YOU'VE PROVEN TOO TOUGH FOR [monolithic database design] HELL TO CONTAIN

(DOOM quote)

Strangely Consistent: Train tracks

Published by Carl Mäsak

I don't know anything, but I do know that everything is interesting if you go into it deeply enough. — Richard Feynman

Someone gives you a box with these pieces of train track:

It's possible you quickly spot the one obvious way to put these pieces into one single train track:

But... if you're like me, you might stop and wonder:

Is that the only possible track? Or is there some equally possible track one could build?

I don't just mean the mirror image of the track, which is obviously a possibility but which I consider equivalent:

I will also not be interested in complete tracks that fail to use all of the pieces:

I would also reject all tracks that are physically impossible because of two pieces needing to occupy the same points in space. The only pieces which allow intersections even in 2D are the two bridge pieces, which allow something to pass under them.

I haven't done the search yet. I'm writing these words without first having written the search algorithm. My squishy, unreliable wetware is pretty certain that the obvious solution is the only one. I would love to be surprised by a solution I didn't think of!

Here goes.

Yep, I was surprised. By no less than nine other solutions, in fact. Here they are.

I really should have foreseen most of those solutions. Here's why. Already above the fold I had identified what we could call the "standard loop":

This one is missing four curve pieces:

But we can place them in a zig-zag configuration...

...which work a little bit like straight pieces in that they don't alter the angle. And they only cause a little sideways displacement. Better yet, they can be inserted into the standard loop so they cancel each other out. This accounts for seven solutions:

If we combine two zig-zag pieces in the following way:

...we get a piece which can exactly cancel out two orthogonal sets of straight pieces:

This, in retrospect, is the key to the final two solutions, which can now be extended from the small round track:

If I had required that the track pass under the bridge, then we are indeed back to only the one solution:

(But I didn't require that, so I accept that there are 10 solutions, according to my original search parameters.)

But then reality ensued, and took over my blog post. Ack!

For fun, I just randomly built a wooden train track, to see if it was on the list of ten solutions. It wasn't.

When I put this through my train track renderer, it comes up with a track that doesn't quite meet up with itself:

But it works with the wooden pieces. Clearly, there is some factor here that we're not really accounting for.

That factor turns out to be wiggle, the small amounts that two pieces can shift and rotate around the little connector that joins them:

Since there are sixteen pieces, there are sixteen connection points where wiggle can occur. All that wiggle adds up, which explains how the misrendered tack above can be made to cleanly meet up with itself.

Think of the gap in the misrendered track as a 2D vector (x, y). What was special about the ten solutions above is that they had a displacement of (0, 0). But there are clearly other small but non-zero displacements that wiggle can compensate for, leading to more working solutions.

How many working solutions? Well, armed with the new understanding of displacements-as-vectors, we can widen the search to tolerate small displacements, and then plot the results in the plane:

That's 380 solutions. Now, I hasten to add that not all of those are possible. At least one solution that I tried actually building turned out to be impossible because the track tried to run into itself in an unfixable way.

With yet another 8-shaped solution, I had given up trying to make it work because it seemed there was too much displacement. Then my wife stopped by, adjusted some things, and voilà! — it worked. (She had the idea to run the track through one of the small side-tunnels under the bridge; something that hadn't occurred to me at all. There's no way to run a wooden train through that small tunnel, but that also wasn't something I restricted up front. Clearly the track itself is possible.)

Anyway, I really enjoyed how this problem turned out to have a completely solvable constraint-programming core of 10 solutions, but then also a complete fauna — as yet largely unclassified — of approximate, more-or-less buildable solutions around it. All my searching and drawing can be found in this Github repository — others who wish to experiment may want to start from the ideas in there, rather than from scratch.

Essentially, all models are wrong, but some are useful. — George E. P. Box

brrt to the future: In praise of incremental change

Published by Bart Wiegmans on 2016-06-04T17:01:00

Hi everybody, welcome back. I'd like to share with you the things that have been happening in the MoarVM JIT since I've last posted, which was in fact March. To be brief, the JIT has been adapted to deal with moving frames, and I've started to rewrite the register allocator towards a much better design, if I do say so myself.

First, moving frames. jnthn has already written quite a bit about them. The purpose of it is to make the regular case of subroutine invocation cheaper by making special cases - like closures and continuations - a bit more expensive. I have nothing to add to that except that this was a bit of a problem for the JIT compiler. The JIT compiler liked to keep the current frame in a non-volatile register. These are CPU registers which are supposed not to be overwritten when you call a function. This is useful because it speeds up access of frequently used values. However, if the current frame object is be moved, the frame pointer in this register becomes stale. Thus, it had to go, and now we load the frame pointer from the thread context object (i.e. in memory), which never moves.

Unfortunately, that was not sufficient. Because MoarVM is an interpreter, control flow (like returning from a routine) is implemented updating the pointer to the next instruction (and the next frame). JIT compiled code never deals with this instruction pointer. Hence, whenever this instruction pointer could have been updated - we call this invokish or throwish code - the JIT may need to return control to the interpreter, which can then figure out what to do next. Originally, this had been implemented by comparing the frame pointer of the JIT frame - stored in the non-volatile register - with the frame pointer as understood by the interpreter - i.e., in the thread context object. This check no longer worked, because a): we didn't have a permanent pointer to the current frame anymore, and b): the current frame pointer might change for two different reasons, namely control flow and object movement.

I figured out a solution to this issue when I realized that what we really needed is a way to identify (cheaply) in the JIT whether or not we have changed control flow, i.e. whether we have entered another routine or returned out of the current one. This might be achieved by comparing immutable locations, but lacking those, another method is to simply assign increasing numbers to constructed frames. Such a sequence number then identifies the current position in the control flow, and whenever it is changed the JIT knows to return control to the interpreter. This caused some issues at first when I hadn't correctly updated the code in all places where the interpreter changed the current instruction, but afterwards it worked correctly. Special thanks go to lizmat who allowed me to debug this on Mac OS X, where it was broken.

Afterwards, I've focused on improving the register allocator. The primary function of a register allocator is to ensure that the values used in a calculations are placed in (correct) registers prior to that calculation. This involves, among other things, assigning the correct registers (some operations only work on specific registers), spilling registers to memory in order to make place, loading spilled values from memory if necessary, and ensuring that values in volatile registers are spilled prior to a function call. This was rather difficult because in the old design because, as it was inlined into the compilation step, it  couldn't really look behind or ahead, which is a problem if you want to place something correctly for future use. Furthermore, it allowed only for a one-on-one correspondence between a value that was computed and its current location in a register. That is a problem whenever -a value is copied to a different register, or stored in multiple memory locations.

So I've been, very slowly and methodically, in very small steps, moving code and structures through the JIT compiler in order to arrive at a register allocator that can handle these things. The first thing I did was remove the register allocation step out of compilation, into its own step (commit and another commit). Then I extracted the value descriptor structure - which describes in which location a value is stored - out of the expression tree (commit). I stored the tile list in a vector, in order to allow reverse and random access (commit). Because the register allocator works in a single pass and only requires temporary structures, I've 'internalized' it to its own file (commit one and commit two). Finally, I replaced the per-expression value structure with value descriptor structures (commit).

This places me in a position to replace register allocator structures (such as the active stack with an expiry heap), implement loads and stores, record register requirements per tile, implement pre-coloring, and correct allocation over basic blocks. All these things were impossible, or at least very difficult, with the old design.

What I think is interesting here is that in each of these commits, the logic of the program didn't substantially change, and the JIT continued to just as well as it had before. Nevertheless, all of this is progress - I replaced a rather broken design assumption (online register allocation with a value state machine) with another (offline register allocation with value descriptors) - that allows me to implement the necessary mechanics in a straightforward manner. And that, I think, demonstrates the power of incremental changes.

rakudo.org: Announce: Windows (64 bit) Installer for Rakudo Star 2016.04

Published by Steve Mynott on 2016-05-03T21:05:34

A Windows MSI installer is now available for x86_64 (64bit) platforms (probably Windows 7 or better) and has the JIT (Just in Time compiler) enabled.  This version was built with mingw (from Strawberry Perl) rather than the usual MSVC which may have performance implications. In order to use the module installer “panda” you will also need to install Strawberry Perl and Git for Windows. Also see http://www.perl6.org/downloads/ for Errata notices concerning panda.

Unfortunately the usual second installer targeting x86 (32bit) platforms isn’t currently available.  It’s hoped this will be available in the near future and until then the previous 2016.01 is recommended for x86 (32bit). No 32 bit versions feature the JIT. MSIs are available from http://rakudo.org/downloads/star/.