pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2017-09-22T09:12:14


p6steve: Clone Wars

Published by p6steve on 2017-09-20T18:52:04

Apologies to those who have OO steeped in their blood. I am a wary traveller in OO space; maybe I am a technician, not an architect, at heart. So for me, no sweeping frameworks unless and until they are needed. And, frankly, one can go a long way on procedural code with subroutines to gather repetitive code sequences.

(And don’t get me started on functional programming…)

Some time ago, I was tasked to write a LIMS in old perl. Physical ‘aliquots’ with injections of various liquids would be combined and recombined by bio-machines. This led to a dawning realization that these aliquot objects could be modelled in an object style with parent / child relationships. After a few weeks, I proudly delivered my lowly attempt at ‘Mu’ for this (and only this) problem. Kudos to the P6 team – after a couple of weeks in here, it’s just sensational the level of OO power that the real Mu delivers:


Now, hard at work on the perl6 version of Physics::Unit, I am wondering how to put the OO theory into productive practice. One of my aims was to find a medium-sized (not core tools) problem that (i) I know something about and (ii) would be a good-sized problem to wrangle.

So I am enjoying the chance to design some classes and some interfaces that will make this all hang together. But – as an explorer, it has become clear that I only have three options. The problem runs like this:

Initially I had some success with object types ::T – but these only let you read the type and duplicate it if needed for a new left-hand-side container. Then I tried the built-in (shallow) clone method. But…

Ultimately, thanks to rosettacode.org, I worked out that $x.perl.EVAL with some ~~ s/// substitutions on the way would do the trick!
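For the curious, here is a minimal sketch of the round-trip technique (the Aliquot class is hypothetical, and real attributes may need those s/// fix-ups where the .perl output doesn’t round-trip cleanly):

use MONKEY-SEE-NO-EVAL;   # string EVAL must be explicitly enabled

class Aliquot {           # hypothetical stand-in for a LIMS object
    has $.name;
    has @.children;
}

my $a = Aliquot.new(name => 'stock', children => [1, 2]);
my $b = $a.perl.EVAL;     # serialize to source code, then re-evaluate

$b.children.push: 3;      # mutate the copy...
say $a.children.elems;    # 2 -- the original is untouched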

Phew. Please comment below if you have a better way to share – or would like to point out the risks of this technique…


Weekly changes in and around Perl 6: 2017.38 Color Me Booked

Published by liztormato on 2017-09-18T21:40:33

There’s been a lot of book activity in the Perl 6 world lately. Andrew Shitov announced his new book Perl 6 Deep Dive (preliminary table of contents). And Moritz Lenz also continued working on his “Parsing with Perl 6 Regexes and Grammars” book. To top it off this week, Zoffix Znet announced the Rakudo Book Project – a plan to write some Rakudo books (/r/perl and /r/perl6 comments).
Butterflies Galore!
Please check out his plans and support him in any way you can!

AlexDaniel++ for his second release

Aleks-Daniel Jakimenko-Aleksejev has done his second Rakudo compiler release! The announcement for Rakudo Perl 6 2017.09 shows quite a number of fixes and improvements again this month. Please note there are currently no plans for creating a Rakudo Star release for this compiler release.

London Perl Workshop – 25 November

Saturday 25 November will see another London Perl Workshop 2017. And yours truly would love to see a lot of Rakudo Perl 6 presentations there: so please submit your presentation. Hope to see you there!

New ThreadPoolScheduler implementation

Jonathan Worthington started work on a new thread pool scheduler (which got merged after the 2017.09 release because of possible ecosystem fallout). This implementation has separate general and timer queues with separate workers, and also introduces affinity queues, which are intended for cases where events will be fed into a Supply, and thus there’s no point having lots of threads competing over them only to immediately stumble over each other. The separate timer queue helps when timer events are being delayed, for example if a process is producing a load of output.

This implementation also adds a supervisor, which is where the smarts on how many threads to have in the pool will be put. For now, it is already smart enough to start far fewer threads than the previous scheduler when they obviously aren’t needed. This helps with memory consumption. And it can add more threads on demand when needed to break deadlocks. The default maximum number of threads has been raised to 64, now that the scheduler does not start up the maximum number of threads even when they don’t have any work to do.

For debugging, the RAKUDO_SCHEDULER_DEBUG and RAKUDO_SCHEDULER_DEBUG_STATUS environment variables can be set.
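For example (shell usage; the scheduler just needs to see the variables in its environment, and the script name here is a placeholder):

RAKUDO_SCHEDULER_DEBUG=1 perl6 my-script.p6
RAKUDO_SCHEDULER_DEBUG_STATUS=1 perl6 my-script.p6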

This work has been kindly sponsored by the Vienna Perl Mongers.

Other Core Developments

These features made it to the 2017.09 compiler release.

Other blog posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Ecosystem Additions

Quite a nice catch this week!

Winding down

Between the rain and the wind, quite a lot happened in the Rakudo Perl 6 world yet again. Sometimes we forget how many ways we found how not to make Perl 6. With that in mind, see you next week for more Rakudo Perl 6 goodies!


6guts: MoarVM Specializer Improvements Part 2: Planning

Published by jnthnwrthngtn on 2017-09-17T22:39:51

It’s been a good while since I posted part 1 of this little series. In the meantime, I paid a visit to the Swiss Perl Workshop, situated in a lovely hotel in the beautiful village of Villars. Of course, I was working on my slides until the last minute – but at least I could do it with this great view!


I gave two talks at the workshop, one of them about the MoarVM specializer (slides, video). Spoiler alert: the talk covers much of what I will write about in this blog series. :-) Now that I’ve recovered from the talk writing and the travel, it’s time to get back to writing here. Before I dig in to it, I’d like to say thanks to The Perl Foundation and their sponsors, who have made this work possible.

Where we left off

In the last part I discussed how we collected statistics about running code, in order to understand what to optimize and how to optimize it. The running example, which I’ll continue with, was this:

sub shorten($text, $limit) is export {
    $text.chars > $limit
        ?? $text.substr(0, $limit) ~ '...'
        !! $text
}

And an example of the kind of information collected is:

Latest statistics for 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)

Total hits: 367

Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
Positional flags: obj, obj

    Callsite hits: 367

    Maximum stack depth: 32

    Type tuple 0
        Type 0: Str (Conc)
        Type 1: Str (Conc)
        Hits: 367
        Maximum stack depth: 32

Which, to recap, tells us that the ~ infix operator was called 367 times so far, and was always passed two Str instances, neither of which was held in a Scalar container.

The job of the planner

The planner is the component that takes this statistical data and decides what code to optimize and what cases to optimize it for. It turns out to be, at least at the time of writing, one of the smaller parts of the whole specialization subsystem; subtract memory management code and debug bits and it’s not much over 100 lines.

Its inputs are a list of static frames whose statistics were updated by the latest data. There’s no point considering anything whose statistics did not change, since there will be nothing new since the last time the planner saw them and so the conclusion would be the same. In Perl 6 terms, a static frame may represent a routine (Sub or Method), Block, or Code thunk.

Initial filtering

The first thing the planner looks at is whether the static frame got enough hits to be worth even considering investing time on. So if it sees something like this:

Latest statistics for 'command_eval' (cuid: 8, file: src/Perl6/Compiler.nqp:31)

Total hits: 1

No interned callsite
    Callsite hits: 1

    Maximum stack depth: 5

It can simply ignore it based on the total hits alone. Something that gets a single hit isn’t worth spending optimization effort on.

There are cases where the total hits are very low – perhaps just a single hit – but optimization is still interesting, however. Here is the loop that I used to make shorten hot so we could look at its statistics and plan:

for 'x' x 50 xx 100000 {
    shorten($_, 42)
}

The statistics for the mainline of the program, containing the loop, start out like this:

Latest statistics for '<unit>' (cuid: 4, file: xxx.p6:1)

    Total hits: 1
    OSR hits: 273

    Callsite 0x7f1cd6241020 (0 args, 0 pos)

        Callsite hits: 1

        OSR hits: 273

        Maximum stack depth: 10

The total hits may well be 1, but obviously the loop inside of this code is hot and would be worth optimizing. MoarVM knows how to perform On Stack Replacement, where code running in a frame already on the callstack can be replaced with an optimized version. To figure out whether this is worth doing, it records the hits at “OSR points” – that is, each time we cross the location in the loop where we might replace the code with an optimized version. Here, the loop has racked up 273 OSR hits, which – despite only one hit in terms of entering the code at the top – makes it an interesting candidate for optimization.

Hot callsites

In MoarVM, a callsite (of type MVMCallsite) represents the “shape” of a set of arguments being passed. This includes the number of arguments, how many of them are positional, the names of any named arguments, and a flag for each argument describing whether it is an object, native int, native num, or native str.

When bytecode is produced, callsites are interned – that is to say, if we have two calls that are both passing two positional object arguments, then they will share the callsite data. This makes them unique within a compilation unit. When bytecode is loaded, then any callsites without flattening are further interned by the VM, meaning they are globally shared across all VM instances. This means that they can then be used as a kind of “key” for things like multi-dispatch caching. The specializer’s statistics are also aggregated by callsite.

The planner, therefore, takes a look at what callsites showed up. Often, code will be called with a consistent callsite, but things with optional parameters may not be. For example, if we were to run:

sub foo(:$bar) { }
for ^100000 { rand < 0.9 ?? foo(bar => 42) !! foo() }

Then the statistics of foo might end up looking like this:

Latest statistics for 'foo' (cuid: 1, file: -e:1)

Total hits: 367

Callsite 0x506d920 (2 args, 0 pos)
  - bar

    Callsite hits: 331

    Maximum stack depth: 13

    Type tuple 0
        Type 0: Int (Conc)
        Hits: 331
        Maximum stack depth: 13

Callsite 0x7f3fd7461620 (0 args, 0 pos)

    Callsite hits: 36

    Maximum stack depth: 13

    Type tuple 0
        Hits: 36
        Maximum stack depth: 13

Unsurprisingly, the case where we pass bar is significantly more common than the case where we don’t, so the planner would in this case favor spending time making a specialization for the case where bar was passed.

As an aside, specialization by optional argument tends to be a good investment. A common pattern is:

sub foo($bar, Bool :$some-option) {
    if $some-option {
        extra-stuff();
    }
}

In the case that $some-option is not passed, the specializer knows that $some-option can only ever be False, and so can strip out the conditional and the call to extra-stuff() entirely.

That stripping out of code might in turn bring the optimized version below the inlining limit, whereas before it might have been too big, which could bring about an additional win.

Hot type tuples

Once one or more hot callsites have been identified, the types that were passed will be considered. In the best case, that turns out to be very boring. For example, here:

Latest statistics for 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)

Total hits: 273

Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
Positional flags: obj

    Callsite hits: 273

    Maximum stack depth: 14

    Type tuple 0
        Type 0: Scalar (Conc) of Str (Conc)
        Hits: 273
        Maximum stack depth: 14

The code is consistently called with a Scalar container holding a Str instance. Therefore, we can see this code is monomorphic in usage. It is quite clearly going to be worth spending time producing a specialization for this case.

In other cases, the code might be somewhat polymorphic:

Latest statistics for 'sink' (cuid: 4795, file: SETTING::src/core/List.pm:694)

Total hits: 856

Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
Positional flags: obj

    Callsite hits: 856

    Maximum stack depth: 31

    Type tuple 0
        Type 0: Slip (Conc)
        Hits: 848
        Maximum stack depth: 31

    Type tuple 1
        Type 0: List (Conc)
        Hits: 7
        Maximum stack depth: 26

Here we can see the sink method is being called on both Slip and List. But it turns out that the Slip case is far more common. Thus, the planner would conclude that the Slip case is worth a specialization, and the List case is – at least for now – not worth the bother.

Sometimes, a situation like this comes up where the type distribution is more balanced. Any situation where a given type tuple accounts for 25% or more of the hits is considered to be polymorphic. In this case, we produce the various specializations that are needed.

Yet another situation is where there are loads of different type tuples, and none of them are particularly common. One place this shows up is in multiple dispatch candidate sorting, which is seeing all kinds of types. This case is megamorphic. It’s not worth investing the time on specializing the code for any particular type, because the types are all over the place. It is still worth specialization by callsite shape, however, and there may be other useful optimizations that can be performed. It can also be compiled into machine code to get rid of the interpreter overhead.

Sorting

By this point, the planner will have got a list of specializations to produce. Specializations come in two forms: observed type specializations, which are based on matching a type tuple, and certain specializations, which are keyed only on an interned callsite (or the absence of one). Certain specializations are produced for megamorphic code. The list of specializations are then sorted. But why?

One of the most valuable optimizations the specializer performs is inlining. When one specialized static frame calls another, and the callee is small, then it can often be incorporated into the caller directly. This avoids the need to create and tear down an invocation record. Being able to do this relies on a matching specialization being available – that is, given a set of types that will be used in the call, there should be a specialization for those types.

The goal of the sorting is to try and ensure we produce specializations of callees ahead of specializations of their callers. You might have wondered why every type tuple and callsite was marked up with the maximum call depth. Its role is for use in sorting: specializing things with the deepest maximum call stack depth first means we will typically end up producing specialized code for potential inlinees ahead of the potential inliners. (It’s not perfect, but it’s cheap to compute and works out pretty well. Perhaps more precise would be to form a graph and do a topological sort.)
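As a toy illustration of that ordering (hypothetical data, not MoarVM’s actual structures), sorting by maximum stack depth in descending order puts likely inlinees ahead of their callers:

my @planned =
    { name => 'shorten',   max-depth => 13 },
    { name => 'chars',     max-depth => 14 },
    { name => 'infix:<~>', max-depth => 32 };

# Deepest first: infix:<~> and chars get specialized before their caller shorten
say @planned.sort({ -.<max-depth> }).map(*<name>);  # (infix:<~> chars shorten)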

Dumping

The plan is then dumped into the specialization log, if that is enabled. Here are some examples from our shorten program:

==========

Observed type specialization of 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)

The specialization is for the callsite:
Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
Positional flags: obj, obj

It was planned for the type tuple:
    Type 0: Str (Conc)
    Type 1: Str (Conc)
Which received 367 hits (100% of the 367 callsite hits).

The maximum stack depth is 32.

==========

Observed type specialization of 'substr' (cuid: 4316, file: SETTING::src/core/Str.pm:3061)

The specialization is for the callsite:
Callsite 0x7f1cd6231a60 (3 args, 3 pos)
Positional flags: obj, obj, obj

It was planned for the type tuple:
    Type 0: Str (Conc)
    Type 1: Scalar (Conc) of Int (Conc)
    Type 2: Scalar (Conc) of Int (Conc)
Which received 274 hits (100% of the 274 callsite hits).

The maximum stack depth is 15.

==========

Observed type specialization of 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)

The specialization is for the callsite:
Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
Positional flags: obj

It was planned for the type tuple:
    Type 0: Scalar (Conc) of Str (Conc)
Which received 273 hits (100% of the 273 callsite hits).

The maximum stack depth is 14.

==========

Observed type specialization of 'substr' (cuid: 2562, file: SETTING::src/core/Cool.pm:74)

The specialization is for the callsite:
Callsite 0x7f1cd6231a60 (3 args, 3 pos)
Positional flags: obj, obj, obj

It was planned for the type tuple:
    Type 0: Scalar (Conc) of Str (Conc)
    Type 1: Int (Conc)
    Type 2: Scalar (Conc) of Int (Conc)
Which received 273 hits (100% of the 273 callsite hits).

The maximum stack depth is 14.

==========

Observed type specialization of 'infix:«>»' (cuid: 3165, file: SETTING::src/core/Int.pm:365)

The specialization is for the callsite:
Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
Positional flags: obj, obj

It was planned for the type tuple:
    Type 0: Int (Conc)
    Type 1: Scalar (Conc) of Int (Conc)
Which received 273 hits (100% of the 273 callsite hits).

The maximum stack depth is 14.

==========

Observed type specialization of 'shorten' (cuid: 1, file: xxx.p6:1)

The specialization is for the callsite:
Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
Positional flags: obj, obj

It was planned for the type tuple:
    Type 0: Str (Conc)
    Type 1: Int (Conc)
Which received 273 hits (100% of the 273 callsite hits).

The maximum stack depth is 13.

==========

Compared to before

So how does this compare to the situation before my recent work in improving specialization? Previously, there was no planner at all, nor the concept of certain specializations. Once a particular static frame hit a (bytecode size based) threshold, the type tuple of the current call was taken and a specialization produced for it. There was no attempt to discern monomorphic, polymorphic, and megamorphic code. The first four distinct type tuples that were seen got specializations. The rest got nothing: no optimization, and no compilation into machine code.

The sorting issue didn’t come up, however, because specializations were always produced at the point of a return from a frame. This meant that we naturally would produce them in deepest-first order. Or at least, that’s the theory. In practice, if we have a callee in a conditional that’s true 90% of the time, before there was a 10% chance we’d miss the opportunity. That’s a lot less likely to happen with the new design.

Another problem the central planning avoids is a bunch of threads set off executing the same code all trying to produce and install specializations. This was handled by letting threads race to install a specialization. However, it could lead to a lot of throwaway work. Now the planner can produce them once; when it analyzes the logs of other threads, it can see that the specialization was already produced, and not plan anything at all.

Future plans

One new opportunity that I would like to exploit in the future is for derived type specializations. Consider assigning into an array or hash. Of course, the types of the assigned values will, in a large application, be hugely variable. Today we’d consider ASSIGN-POS megamorphic and so not try and do any of the type-based optimizations. But if we looked more closely, we’d likely see the type tuples are things like (Array, Int, Str), (Array, Int, Piranha), (Array, Int, Catfish), and so forth. Effectively, it’s monomorphic in its first two arguments, and megamorphic in its third argument. Therefore, a specialization assuming the first two arguments are of type Array and Int, but without specialization of the third argument, would be a win.

Next time…

We now have a specialization plan. Let’s go forth and optimize stuff!


Zoffix Znet: The Rakudo Book Project

Published on 2017-09-17T00:00:00

A plan to write some Rakudo books

gfldex: The Siege Can Continue

Published by gfldex on 2017-09-16T20:14:31

A wise internet spaceship pirate once wrote: “Whining gets you stuff. That’s why humans are at the top of the food chain.” My whining got me a fix that put an end to segfaults in long-running scripts that react to HTTP requests.

So I continued to add missing bits to my golfed httpd and found another good use for term:<>.

constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";

Without term:<> the compiler would think I wanted to subtract 404 from an HTTP header.
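A tiny demonstration of the parsing issue (hypothetical names):

constant term:<ANSWER-42> = 'forty-two';
say ANSWER-42;   # forty-two

# An identifier's hyphen may not be followed by a digit, so without
# term:<> the name "ANSWER-42" would parse as the subtraction ANSWER - 42.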

If you’ve got some time to spare, please write a whiny blog post about a crash that is bugging you – works like a charm.


gfldex: Goto the Last Fifo

Published by gfldex on 2017-09-13T08:45:21

I always admired the elegance of the sysfs. You take a text and write it to a file to talk to a function running in kernel space. As soon as you know how to work with files, you can change the system’s behaviour and reuse its access control mechanisms without any special tools. It’s very easy to script and to dump (parts of) the system’s state.

Linux and friends come with fifos that serve the same purpose. Create a fifo, set access rights and start reading from that pseudo-file. Very easy to do in Perl 5.

my $fifo;
open($fifo, '+<', 'radio-fifo-in') or die $!;  # read-write open, so the reader never sees EOF
while (<$fifo>) {
    do_things_with($_);                        # placeholder for the actual work
}

Rakudo doesn’t really know about fifos yet; as a result, it doesn’t block on a read of a fifo that has no data anymore. After a bit of fiddling I found a way around that problem.

use v6.c;

# `mkfifo radio-fifo-in`
# `echo "foo" > radio-fifo-in`
# `echo "foo^D" > radio-fifo-in`

my $fifo-in = open(„radio-fifo-in“, :r);

LABEL: loop {
    react {
        whenever supply { .emit for $fifo-in.lines } {
            say .Str;
            last LABEL if /‚^D‘/;
        }
    }
}

I learned that whenever reacts to last and will teach the docs about it later today. Luckily Perl 6 got labels so we can tell last where to goto.

UPDATE: scovit found a short expression that gets very close to the behaviour of Perl 5.

my $fifo = open("radio-fifo-in", :r);
while defined $_ = $fifo.get { .say }

Weekly changes in and around Perl 6: 2017.37 Collating Sorted

Published by liztormato on 2017-09-11T21:17:48

Samantha McVey completed the Unicode Collation Algorithm support in Rakudo Perl 6 on the MoarVM backend. She describes the functionality in the commit that did the merge back into the main branch of MoarVM. She also gave a less complete review on the #perl6 channel. Also check out the temporary documentation! A code example may tell more than a thousand words:

say <9 ① 1 ② 8 one hi>.sort;    # (1 8 9 hi one ① ②)
use experimental :collation;     # activate collation
say <9 ① 1 ② 8 one hi>.collate; # (1 ① ② 8 9 hi one)

TPF Grant Proposals (Sep 2017)

Will Coleda tells us we have only a few days left for this round of TPF Grant Proposals. So if you would like to do some development work on Rakudo Perl 6, write a proposal, do your thing and profit! Well, be sure to write a good proposal that will get accepted, of course 🙂

Handling of unclosed files has changed

The ecosystem started showing issues (RT #132030) after non-TTY handles had buffering switched on by default last week. Solution #1 (as suggested by Jonathan Worthington) was subsequently implemented by Elizabeth Mattijsen. What does this mean?

Please note that this behaviour only applies to the MoarVM backend. The old semantics of only automatically closing files when they are garbage collected, still applies to the JVM backend. But since there is no output buffering on the JVM backend, you only run the risk of running out of open handles.
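In practice, this means it now pays to close handles explicitly on the MoarVM backend, since output to non-TTY handles is buffered and a flush happens at the latest on close. A minimal sketch (file name is just an example):

my $fh = open 'out.txt', :w;   # hypothetical file
$fh.say: 'important line';     # may linger in the buffer for a while
$fh.close;                     # flushes the buffer and frees the handle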

Blog Posts

Other Core Developments

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Ecosystem Additions

Winding down

An exciting week once again! Next weekend will see the 2017.09 release of Rakudo Perl 6. After the release, a number of optimizer and JIT improvements will find their way into the main branch again. So check back next week for more exciting Rakudo Perl 6 news!


gfldex: We Need to Siege Moar

Published by gfldex on 2017-09-10T20:32:25

Alas, my attempt to avoid bugs by moving with little load was not as fruitful as I hoped.

Sieging my tiny httpd lasted for about an hour and then the gates gave way. With dmesg the culprit was swiftly revealed.

[4361268.087988] moar[12491]: segfault at 3c8b ip 00007f86b426868b sp 00007f86aaff86b0 error 4 in libmoar.so[7f86b407b000+55d000]

I had the suspicion that close to 70000 tests will only get us the failure modes that are caught within minutes. Looks like there is quite some squashing left to do for the moar team.


gfldex: Golfing httpd

Published by gfldex on 2017-09-10T19:12:17

I’m building an internet radio player and wanted both a curses and a web interface to switch stations. Looking at the various modules in the ecosystem provided lots of options that do lots of things. I had a bug hunt in those fields a while ago and didn’t like it. The amount of httpd I need is fairly small and I thought to myself: „Somebody should golf that!“ And so I did.

The objective is to have as little http as I can get away with. I want to display the names of available stations and receive a station-id to switch stations. And maybe a stop, play and record button. That can be done with a few lines of text. So text/plain it is.

I can stuff channel-ids and button names into URLs, which means implementing only GET will do. Caching or other fancy stuff won’t be needed, which makes the HTTP header static. The most complex thing to do would be taking the URL apart.

That’s what I came up with:

my sub term:<now>() { DateTime.now.Instant but role :: { method Str { self.DateTime.hh-mm-ss } } };
my &BOLD = $*OUT.t ?? sub (*@s) { "\e[1m{@s.join('')}\e[0m" } !! sub (|c) { c };

constant HTTP-HEADER = "HTTP/1.1 200 OK", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";
constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";

react {
    whenever IO::Socket::Async.listen('0.0.0.0', 8080) -> $conn {
        note „{now} incomming connection from {$conn.peer-host}:{$conn.peer-port}“;
        my @msg = HTTP-HEADER;
        whenever $conn.Supply.lines {
            if /^GET  (<[\w„/“]>+) [„HTTP“ \d „/“ \d]? / {
                note „{now} GET $0“;
                given $0.Str {
                    @msg.push: „running since {BEGIN now} UTC“ when „/status“;

                    @msg.push: „Hello World!“ when „/“;

                    done when „/exit“;
                    default {
                        @msg = HTTP-HEADER-404;
                        @msg.push: „Resource {.Str} not found.“;
                    }
                }
            }

            if /^$/ { 
                for @msg {
                    once note .Str;
                    $conn.print(.Str ~ "\n")
                }
                $conn.close;
            }
        }
        CLOSE {
            note „{now} connection closed“;
        }
        CATCH { default { warn BOLD .^name, ': ', .Str; warn BOLD "handled in $?LINE"; } }
    }
}

The whole thing waits for a connection, then takes the URL apart and fills an Array with lines of text. That Array is then sent back to the client. Plus a few lines of logging and error handling.

The less code there is the less ground a bug hunter has to cover. Luckily Perl 6 is very friendly to golfers.

UPDATE: Added proper handling of the http header terminator, which leads to less error handling. Also added some more log output.


Weekly changes in and around Perl 6: 2017.36 Blogs-a-reading

Published by liztormato on 2017-09-04T20:58:50

In his compelling blog post titled “On Troll Hugging, Hole Digging, and Improving Open Source Communities (How to be Better)”, Zoffix Znet describes how to make Open Source communities even better, building on Audrey Tang’s “hug a troll” concept.

…here’s a well-shaped hole. Everyone’s more connected. It’s easy to get in and start digging. And even those who dug the deepest can still go and help out those who are about to start.

Example of a well shaped hole

The hole digging metaphor isn’t just about the shape of the hole. It’s also about people’s position within it.

He also introduces the seven Hugs for a better community:

  1. Gift a Shovel
  2. Feed The Hand That Bites You
  3. We All Leave Footprints
  4. Speak Up
  5. Simply a Hug
  6. Love Others
  7. Go For The Third Option

Zoffix shows in his examples that we can all learn to become better. And that he finds himself to be a lot happier because of this learning process:

…I’ve been applying the ideas I discussed in this article for about a week. I think they have something real behind them, as I feel a lot happier now than a week ago and I see some positive changes around me that I think I could attribute to these ideas.

A highly recommended read!

P6::Journey

Thanks to the pingback functionality, yours truly was pointed to a new blog about Rakudo Perl 6: P6::Journey by p6steve. His blog posts so far:

All yours truly can add is: Welcome to the Rakudo Perl 6 Blogoverse!

Unicode Grant update #4

Samantha McVey continued working on the Unicode support in Rakudo Perl 6 and reports on her progress in a blog post. It describes how using the Knuth-Morris-Pratt String Search Algorithm improved string searching, how looking for a single grapheme can be 9x faster in some situations, that the database is updated to Unicode 10, and that several bugs were fixed and documentation improved. Among many other things. A good read if you want to keep up-to-date in that area of Rakudo Perl 6!

Swiss Perl Workshop in review

Steve Mynott created an excellent diary-like blog-post about the Swiss Perl Workshop, with lots of pictures. Looking forward to many more of these in the future!

In related news, the videos of the Swiss Perl Workshop have been made available by Lee Johnson. The Rakudo Perl 6 related presentations for which videos were made, are:

Other blog posts

Other videos

Some more recent than others, but these somehow slipped through the cracks the past weeks.

Other core developments

This week yours truly was reminded that a lot can happen in two weeks. Let me try to recap this in a way that is not overly long without letting important things fall through the cracks:

Meanwhile on FaceBook

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Ecosystem Additions

Winding Down

Hope I didn’t miss too much this week. Please let me know if I did. Hope to see you all again next week for more Rakudo Perl 6 news!


p6steve: perl6 Atomic Fission

Published by p6steve on 2017-09-03T18:46:48

I have been listening to the reaction on the web to the incorporation of an emoji as a unicode symbol in perl6 rakudo. Here’s a flavour…

(https://p6weekly.wordpress.com/2017/08/21/2017-34-going-atomic/)

The rationale for the use of unicode symbols is as follows:

BTW: ASCII versions are known as Texas versions, since they are always bigger.

Certainly this has caused some consternation – ranging from “how can I type ⚛️ on my keyboard?” (hit CTRL-CMD-SPACE if you are on macOS) to “this will never be accepted for the coding standards of my company”.

On reflection, while it is understandable that programmers have a well-established comfort zone of ASCII text and using English for keywords, I think that perl6 is leading the way on an irresistible path. Of the 6.5bn people on the planet, only a small fraction prefer to work in English – or even in Latin alphabets. Now, the pioneering work to embed unicode in a programming language will open the doors to all kinds of invention. What about:

And this, in combination with perl6 Grammars, opens some interesting conceptual doors.

~p6steve


gfldex: Hunting Documented Bugs

Published by gfldex on 2017-09-02T06:37:57

AlexDaniel came up with, and Zoffix made some propaganda for, a recurring bug hunt. This week’s prey is doc issues. What follows are detailed instructions on how to get dressed up.

zef install META6::bin
meta6 --fork-module=p6doc
cd doc
zef --deps-only install .
# you may have to add $HOME/.perl6/bin to $PATH
# edit away
git push
meta6 --pull-request --title='add t/meta.t'
# goto edit away

If you never used META6::bin you may have to do some setup.


samcv: Grant Status Update 4

Published by Samantha McVey on 2017-08-31T07:00:00

This is my fourth grant update. A productive and noteworthy month. I gave two presentations at YAPC-EU in the Netherlands earlier this month, on High End Unicode and MoarVM Internals (links to my slides). It was my first Perl conference and I had a great time meeting people in the Perl 6 and Perl community!

Despite the conference, I made some big changes this month: big speedups for indexing operations, as well as an update to the Unicode 10 database. Before I could regenerate the database I had to fix ucd2c.pl, the script that generates the database, to be more deterministic and have unique property values per property (without this regenerating it with the Unicode 10 data would result in partially broken Unicode properties). I also implemented the Knuth-Morris-Pratt string search algorithm for our index function.

I have added documentation to MoarVM which is an overview of how our strings are represented as well as details about normalization, synthetics and other topics. On Hacker News someone noted that this was not documented anywhere, so I made sure to add documentation for this. If you are interested in some of the internals of MoarVM, I’m hoping the document should be pretty accessible even if you are not a MoarVM developer.


Index

Knuth-Morris-Pratt String Search Algorithm

Previously we did not have any natively implemented efficient string search algorithms; we only had memmem, which was optimized, but would only work if both strings were flat and had the same bitwidth per grapheme.

Now all index operations with a needle of 4096 graphemes or under are optimized and no longer use brute-force search (this does not include case-insensitive or ignoremark indexing operations).

My KMP implementation in MVM_string_index is used when:
  • the haystack and needle are of non-matching string types
  • the needle has more than one codepoint
  • the needle is no more than 4096 graphemes long

The speedup can be small or large depending on the pattern of the needle; repeating letters can give a multiple-times speed increase depending on the haystack.
We still use memmem when both strings are flat and of the same data type (that path uses Knuth-Morris-Pratt or Boyer-Moore depending on the platform). Most of the strings we work with, though – especially once they have been modified – are strands.
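To illustrate the algorithm itself, here is a sketch of Knuth-Morris-Pratt over graphemes in pure Perl 6 (the real implementation is C code inside MoarVM; this is just the idea):

sub kmp-index(Str $haystack, Str $needle --> Int) {
    my @n = $needle.comb;        # graphemes of the needle
    my @h = $haystack.comb;      # graphemes of the haystack
    return 0 unless @n;          # an empty needle matches at position 0

    # Failure table: for each needle prefix, the length of the longest
    # proper prefix that is also a suffix of it.
    my @fail = 0 xx @n.elems;
    my $k = 0;
    for 1 ..^ @n.elems -> $i {
        $k = @fail[$k - 1] while $k > 0 && @n[$i] ne @n[$k];
        $k++ if @n[$i] eq @n[$k];
        @fail[$i] = $k;
    }

    # Scan the haystack, never stepping backwards through it.
    my $m = 0;                   # needle graphemes matched so far
    for ^@h.elems -> $i {
        $m = @fail[$m - 1] while $m > 0 && @h[$i] ne @n[$m];
        $m++ if @h[$i] eq @n[$m];
        return $i - $m + 1 if $m == @n.elems;
    }
    -1                           # not found
}

say kmp-index('abcabcabd', 'abcabd');   # 3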

Grapheme Caching

Since the Knuth-Morris-Pratt algorithm will often request the same grapheme again, but will never request an earlier point in the haystack, I was able to optimize the KMP string search function I added to cache graphemes, letting us use a grapheme iterator instead of MVM_string_get_grapheme_at_nocheck, which for strands has to find its place in the haystack from the beginning each time. What this grapheme caching does is cache the last returned grapheme: if the same grapheme is requested again, it is returned without requesting a new grapheme from the grapheme iterator. If the next grapheme, or one some number of graphemes after the current position, is requested, it will either grab the next grapheme or move the grapheme iterator forward (skipping a certain number of graphemes) and then get a grapheme from the iterator.

Here are some timings I got before and after the grapheme caching, for an English-language book text file, searching for English words misspelled by one letter (so it would search the entire file but be representative of searching for actual words).

Description                               caching      master
index with needle (strand haystack)       0.832730     2.67385181
index with word needle (flat haystack)    0.8357464    1.01405131

As you can see from the table, we actually got savings even when the haystack was flat. This surprised me, since getting a grapheme from a flat haystack goes through a function with one switch and then returns the integer of the blob array at the specified position. I am guessing this is likely caused by the caching function generating more efficient machine code, since the savings can’t be explained by the grapheme caching alone – the speedup was seen even when I manipulated the needle so that there were no cache “hits” and it always requested different graphemes.

ℹ️ The grapheme caching has not yet been merged, but is ready to go after fixing some merge conflicts.

Inlining MVM_string_get_grapheme_at_nocheck

After I finished writing the previous section, I was able to discover the reason for the speed difference with flat haystacks. By getting MVM_string_get_grapheme_at_nocheck to inline, I was able to speed up index operations for a flat haystack by 2x. This is on top of the speedups we got from the KMP algorithm! It should affect any code which uses this function, making it 2x as fast for flat strings, with likely a slight speedup for strands as well. This has huge implications, as this function is used extensively throughout our string ops. It may also change what I do with the grapheme caching code: it is likely I will change it to use grapheme caching for strands, and MVM_string_get_grapheme_at_nocheck for flat haystacks.

Single Grapheme Needle Index

I sped up string index by 9x when the haystack is a strand and the needle is 1 grapheme long. For this we use a grapheme iterator when looking for a single grapheme inside of a strand, and use a simpler faster loop since we are only looking for a single grapheme.

Unicode

Property Values

Property values are now unique for each property in the Unicode database. Since property value names are not unique across properties, we must store them uniquely per property. Previously, only the property whose value was last in the C data array would work. Now that property values are unique per property code, that should no longer cause breakage.

Unicode Database Updated to Unicode 10

The Unicode database has been updated for Unicode 10. Now that the previous breaking point — the database not always generating properly each time — was solved, I was able to make changes to the Unicode database, including updating it to version 10. The main significant changes in this release were the addition of new scripts, symbols and Emoji. You can see the full list of changes here.

Unicode Names

Hangul (Korean) characters now have names in our name database. These are generated algorithmically on database creation by decomposing the characters and concatenating the resulting 2 or 3 codepoints’ Jamo names. This is needed since the Unicode data file does not include names for these codepoints and leaves it to the implementer to create them. A roast test has been added to check for support of these Korean characters.
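The derivation follows the standard Unicode algorithm: split the syllable into leading/vowel/trailing Jamo indices and concatenate their short names. A sketch in Perl 6 (the tables are the Jamo short names from the Unicode standard; the real code lives in the database generation script):

my @L = 'G','GG','N','D','DD','R','M','B','BB','S','SS','','J','JJ','C','K','T','P','H';
my @V = 'A','AE','YA','YAE','EO','E','YEO','YE','O','WA','WAE','OE','YO','U',
        'WEO','WE','WI','YU','EU','YI','I';
my @T = '','G','GG','GS','N','NJ','NH','D','L','LG','LM','LB','LS','LT','LP',
        'LH','M','B','BS','S','SS','NG','J','C','K','T','P','H';

sub hangul-name(Int $cp) {
    # assumes $cp is a precomposed Hangul syllable (U+AC00..U+D7A3)
    my $s = $cp - 0xAC00;               # syllables start at U+AC00
    my $l = $s div (21 * 28);           # leading consonant index
    my $v = ($s mod (21 * 28)) div 28;  # vowel index
    my $t = $s mod 28;                  # trailing consonant index (0 = none)
    'HANGUL SYLLABLE ' ~ @L[$l] ~ @V[$v] ~ @T[$t]
}

say hangul-name('한'.ord);   # HANGUL SYLLABLE HAN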

Fixed a bug in ignoremark/ordbaseat

Ignoremark relies on the MoarVM ordbaseat op under the hood. For graphemes made up of a single character, we decompose the character and get the resulting base character. The code assumed that once we had gotten a synthetic’s base character, we already had the final base character. This wasn’t true, for example, with "\c[LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW]": the “j with caron” would not be decomposed further, because it was the base character of a synthetic.

This also fixes a bug in the indexim and eqatim ops, which caused them to fail when encountering a synthetic.

We now have ord_getbasechar handle both synthetics and non-synthetics, while MVM_string_ord_basechar_at just handles the needed string-based checks, grabbing the grapheme from the string and passing its value on to ord_getbasechar.

Synthetic Grapheme Rework

I reworked MVMNFGSynthetic, the C struct which represents our synthetic graphemes. This is done to (eventually) get Prepend support working for things like ignoremark regex and the ordbaseat op which gets the base codepoint of a grapheme.

Unlike all other marks which come after a base character, Prepend characters come before the base character. All of our current code assumed the first codepoint of the synthetic is the base character.

For now, we also assume the first codepoint of the synthetic is the base character, but we now have a base_index which will be able to hold the index of the base character in the codepoint array.

While this doesn’t add Prepend support to everything, this is one step toward getting that working, and decoupling base codepoints from being the first codepoint of a synthetic.

ℹ️ This is not yet merged, but is ready to go.

Collation

There’s not as much to say about this, since it is almost ready. As I said last month, it is fully functional, and I have since done some work cleaning it up. Left to be done is integrating the data generation into the other Unicode database generation script. Although the file and code it uses to generate are different, ideally we will have only one script to run to generate and update all of our files on new Unicode releases.

Previously I created a script which downloads all of the Unicode files needed in the generation, so an update should hopefully only require a run of the download script to fetch the UCD, the UCA (Unicode Collation Algorithm), and Emoji data.

One thing I had not talked about previously is some of the ways I have sped up the UCA implementation to be quite fast and efficient, evaluating the collation arrays efficiently even for very large strings. If you are interested, the details are in my slides.

NQP Documentation

In addition to the MoarVM string documentation I mentioned in the introduction, I also ensured all of the ops I have added over the course of my project are documented in NQP.

I ensured all the NQP index and eqat variations (indexic, indexim, indexicim, eqatic, eqaticim) are documented, and added the couple that had not yet been added to the ops list.

I also added a Perl 6 program which gets a list of ops not mentioned in the oplist, which will hopefully be useful to other developers.

The Goal is in Sight

The grant is finally winding down. I have all significant things implemented, although not everything has been merged yet. I have also implemented additional things that were not part of the grant (such as the Knuth-Morris-Pratt string search algorithm).

Left to be Done

To finally complete my work and fulfill my objectives, I will make any additional documentation or tests that need to be made. If other Perl 6 devs or users in the community want to make any requests for Unicode or string related documentation, you can send me an email or send me a message on freenode IRC (nick samcv).

Other than this, I only need to clean up and merge the collation arrays branch, merge the synthetic grapheme rework, and update the grapheme caching for the KMP branch to use caching for strand haystacks and the now-inlined MVM_string_get_grapheme_at_nocheck for flat haystacks.

Zoffix Znet: On Troll Hugging, Hole Digging, and Improving Open Source Communities

Published on 2017-08-31T00:00:00

How to be better

stmuk: Swiss Perl Workshop 2017

Published by stmuk on 2017-08-30T17:48:17


After a perilous drive up a steep, narrow, winding road from Lake Geneva we arrived at an attractive Alpine village (Villars-sur-Ollon) to meet with fellow Perl Mongers in a small restaurant. There followed much talk and a little clandestine drinking of exotic spirits, including Swiss whisky. The following morning, walking to the conference venue, there was an amazing view of mountain ranges. On arrival I failed to operate the Nespresso machine, which I later found was due to it simply being off. Clearly software engineers should never try to use hardware. At least after an evening of drinking.

Wendy’s stall was piled high with swag including new Bailador (Perl 6 Dancer-like framework) stickers, a Shadowcat booklet about Perl 6 and the new O’Reilly “Think Perl 6”. Unfortunately she had sold out of Moritz’s book “Perl 6 Fundamentals” (although there was a sample display copy present). Thankfully later that morning I discovered I had a £3 credit on Google Play Books so I bought the ebook on my phone.

The conference started early with Damian Conway’s Three Little Words. These were “has”, “class” and “method” from Perl 6, which he liked so much that he had added them to Perl 5 with his “Dios” (“Declarative Inside-Out Syntax”) module. PPI wasn’t fast enough so he had to replace it with a 50,000-character regex, PPR. Practical everyday modules mentioned included Regexp::Optimizer and Test::Expr. If the video doesn’t appear shortly on YouTube, a version of his talk dating from a few weeks earlier is available at https://www.youtube.com/watch?v=ob6YHpcXmTg

Jonathan Worthington returned with his Perl 6 talk on “How does deoptimization help us go faster?” giving us insight into why Perl 6 was slow at the Virtual Machine level (specifically MoarVM). Even apparently simple and fast operations like indexing an array were slow due to powerful abstractions, late binding and many levels of Multiple Dispatch. In short the flexibility and power of such an extensible language also led to slowness due to the complexity of code paths. The AST optimizer helped with this at compile time but itself took time and it could be better to do this at a later compile time (like Just In Time).  Even with a simple program reading lines from a file it was very hard to determine statically what types were used (even with type annotations) and whether it was worth optimizing (since the file could be very short).

The solution to these dynamic problems was also dynamic, but seeing what was happening required cheap logging of execution, which was passed to another thread. This logging is made visible by setting the environment variable MVM_SPESH_LOG to a filename. Better tooling for this log would be a good project for someone.

For execution planning we look for hot (frequently called) code, long blocks of bytecode (slow to run) and consider how many types are used (avoiding “megamorphic” cases with many types which needs many versions of code).  Also analysis of the code flow between different code blocks and SSA.  Mixins made the optimization particularly problematic.

MoarVM’s Spesh did statistical analysis of the code in order to rewrite it in faster, simpler ways. Guards (cheap checks for things like types) were placed to catch cases where it got it wrong, and if these were triggered (infrequently) it would deoptimize as well, hence the counterintuitive title, since “deoptimization enables speculation”. The slides are at http://jnthn.net/papers/2017-spw-deopt.pdf with the video at https://www.youtube.com/watch?v=3umNn1KnlCY The older and more dull-witted of us (including myself) might find the latter part of the video more comprehensible at 0.75 YouTube speed.

After a superb multi-course lunch (the food was probably the best I’d had at any Perl event) we returned promptly to hear Damian talk of “Everyday Perl 6”. He pointed out that it wasn’t necessary to code golf obfuscated extremes of Perl 6 and that the average Perl 5 programmer would see many things simpler in Perl 6.  Also a rewrite from 5 to 6 might see something like 25% fewer lines of code since 6 was more expressive in syntax (as well as more consistent) although performance problems remained (and solutions in progress as the previous talk had reminded us).

Next Liz talked of a “gross” (in the numerical sense of 12 x 12 rather than the American teen sense) of Perl 6 Weeklies as she took us down memory lane to 2014 (just about when MoarVM was launched and when unicode support was poor!)  with some selected highlights and memories of Perl 6 developers of the past (and hopefully future again!). Her talk was recorded at https://www.youtube.com/watch?v=418QCTXmvDU


Cal then spoke of Perl 6 maths, which he thought was good with its Rats and FatRats but not quite good enough, and of his ideas for fixing it. On the following day he showed us he had started some TDD work on TrimRats. He also told us that Newton’s Method wasn’t very good but generated a pretty fractal. See https://www.youtube.com/watch?v=3na_Cx-anvw

Lee spoke about how to detect Perl 5 memory leaks with various CPAN modules and his examples are at https://github.com/leejo/Perl_memory_talk

The day finished with Lightning Talks and a barbecue at givengain — a main sponsor.

On the second day I noticed the robotic St Bernards dog in a tourist shop window had come to life.


Damian kicked off the talks with my favourite of his talks,  “Standing on the Shoulders of Giants”, starting with the Countess of Lovelace and her Bernoulli number program. This generated a strange sequence with many zeros. The Perl 6 version since it used rational numbers not floating point got the zeros right whereas the Perl 5 version initially suffered from floating point rounding errors (which are fixable).

Among other things he showed us how to define a new infix operator in Perl 6. He also showed us a Perl 6 sort program that looked exactly like LISP, even down to the Lots of Irritating Superfluous Parentheses. I think this was quicksort (he certainly showed us a picture of Sir Tony Hoare at some point). Also a very functional (Haskell-like) equivalent with heavy use of P6 Multiple Dispatch. Also included was a demonstration of P6 “before” as a sort of typeless/multi-type comparison infix. Damian then returned to his old favourite of Quantum Computing.

My mind and notes got a bit jumbled at this point but I particularly liked the slide that explained how factorisation could work by observing the product of possible inputs, since this led to a collapse that revealed the factors. To do this on RSA etc., of course, needs real hardware support, which probably only the NSA and friends have (?). Damian’s code examples are at http://www.bit.do/Perl6SOG with an earlier version of his talk at https://www.youtube.com/watch?v=Nq2HkAYbG5o Around this point there was a road race of classic cars going on outside, up the main road into the village, and the car noises in the background were strangely more relaxing than annoying.


After Quantum Chaos Paul Johnson brought us all back down to ground with an excellent practical talk on modernising legacy Perl 5 applications based on his war stories. Hell, of course, is “Other People’s Code”, often dating from Perl’s early days and lacking documentation and sound engineering.

Often the original developers had long since departed or, in the worst cases, were still there. Adding tests and logging (with stack traces) were particularly useful. As was moving to git (although its steep learning curve meant mentoring was needed) and handling CPAN module versioning with pinto. Many talks had spoken of the Perl 6 future whereas this spoke of the Perl 5 past and present and the work many of us suffer to pay the bills. It’s at https://www.youtube.com/watch?v=4G5EaUNOhR0


Jonathan then spoke of reactive distributed software. A distributed system is an async one where “Is it working?” means “some of it is working but we don’t know which bits”. Good OO design is “tell, don’t ask” – you tell a remote service to do something for you, rather than parsing the response and doing it yourself, thus breaking encapsulation. This is particularly important in building well designed distributed systems, since otherwise the systems are less responsive and reliable. Reactive (async) works better for distributed software than interactive (blocking or sync).

We saw a table that, for reactive (async) code, used a Perl 6 Promise for one value and a Supply for many values, with a plain value and a Perl 6 Seq as the interactive equivalents. A Supply could be used for pub/sub and the Observer Pattern. A Supply could either be live (like broadcast TV) or, as most Perl 6 supplies are, on-demand (like Netflix). Then samples of networking (socket) based code were discussed, including a web client, web server and SSH::LibSSH (async client bindings, often very useful in practical applications like port forwarding):

https://github.com/jnthn/p6-ssh-libssh
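A minimal sketch of that live versus on-demand Supply distinction:

my $broadcaster = Supplier.new;
my $live = $broadcaster.Supply;             # live: emissions happen whether tapped or not
$broadcaster.emit('missed');                # nobody is tapping yet, so this value is lost
$live.tap(&say);
$broadcaster.emit('seen');                  # printed

my $on-demand = supply { .emit for 1..3 };  # on-demand: each tap re-runs the block
$on-demand.tap(&say);                       # 1, 2, 3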

Much of the socket code had a pattern of “react { whenever {” blocks, with “whenever” as a sort of async loop. He then moved on from sockets to services (using a Supply pipeline) and amazed us by announcing the release of “cro”, a microservices library that even supports HTTP/2 and WebSockets, at http://mi.cro.services/. This is installable using Perl 6 with “zef install --/test cro”.

Slides at http://jnthn.net/papers/2017-spw-sockets-services.pdf and video at https://www.youtube.com/watch?v=6CsBDnTUJ3A

Next Lee showed Burp Scanner, which is payware but probably the best web vulnerability scanner. I wondered if anyone had dared run it on ACT or the hotel’s captive portal.

Wendy did some cheerleading in her “Changing Image of Perl”.  An earlier version is at https://www.youtube.com/watch?v=Jl6iJIH7HdA

Sue’s talk was “Spiders, Gophers, Butterflies” although the latter were mostly noticeably absent. She promises me that a successor version of the talk will use them more extensively. Certainly any Perl 6 web spidering code is likely to fit better on one slide than the Go equivalent.

During the lightning talks Timo showed us a very pretty Perl 6 program using his SDL2::Raw to draw an animated square spiral with hypnotic colour-cycling patterns. Also there was a talk by the author about https://bifax.org/bif/, a distributed bug tracking system (which worked offline, like git).

Later in the final evening many of us ate and chatted in another restaurant where we witnessed a dog fight being narrowly averted and learnt that Wendy didn’t like Perl 5’s bless for both technical and philosophical reasons.


Weekly changes in and around Perl 6: 2017.35 Serving Cro

Published by liztormato on 2017-08-28T23:37:01

Jonathan Worthington dropped something of a bomb during his presentation (slides) on the second day of the Swiss Perl Workshop, in which he presented

Cro, a Perl 6 framework for building reactive systems

With HTTP/1.1, HTTP/2.0, HTTPS, WebSockets and ZeroMQ support out of the box! Still in beta, but API-wise it should be fairly stable.

Which almost let me forget that Jonathan actually did another presentation, named “How does deoptimization help us go faster?” and other questions you were sensible enough not to ask (slides), which gives you a good impression of the problems you get when you try to develop a fast dynamic language with precompilation features. Videos of either presentation are not available yet at this time, but should be soon!

ASCII Forever!

That in short seems to be the general tone of the 233 comments (so far) about last week’s Perl 6 Weekly, spread over three subreddits: r/perl, r/perl6 and r/programming. Fortunately, there were some positive comments as well. Ah well, the caravan of butterflies moves on.

Monthly Bug Squash Day

Aleks-Daniel Jakimenko-Aleksejev initiated a new monthly (online) event:
Monthly Bug Squash Day. The first time will be on Saturday 2 September, and the subject will be the perl6/doc repository, which currently has 279 unsolved issues. You can also print a poster to invite your local (hacker) friends (made by Zoffix Znet).

Relocation Message

Zoffix Znet‘s very useful overview of open tickets on Rakudo Perl 6 has been moved to fail.rakudo.party (from rakudo.fail) (Zoffix’ Tweet).

Pictures, Pictures!

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Ecosystem Additions

Winding Down

Didn’t have time to really see what other core developments we had the past week. Since many of the core developers were on the road or visiting conferences or otherwise in a vacation mood (with a few notable exceptions, though), it feels safe to keep whatever the developments were until a 2-week overview next week. See you then for more Rakudo Perl 6 news!


Weekly changes in and around Perl 6: 2017.34 Going ⚛

Published by liztormato on 2017-08-21T21:05:28

This week Rakudo Perl 6 went Atomic! Well, in the sense of “forming a single irreducible unit or component in a larger system”.

Locking is one of the evils in multi-threaded programming: Rakudo Perl 6 now has several lock-free primitives for updating native integer variables from several threads simultaneously. They’re all described on a brand new documentation page. In short, the new operators are:

So when would an “ordinary” module developer need to use these, even when they’re not writing threaded programs? Well, your module might be used in a threaded program. And any situation where a variable is incremented to produce something unique would need this, or run the risk of two or more threads running away with the same “unique” (not!) value. Observe:

my int $a;
await do for ^10 { start { $a++ for ^1000 } }
say $a    # something less than 10000, like 9628, so 372
          # increments lost because of simultaneous updates

versus:

my atomicint $a;
await do for ^10 { start { $a⚛++ for ^1000 } }
say $a    # always 10000, because no updates are overwritten

Apart from these, a working version of cas (Atomic Compare and Swap) was also implemented. All thanks to the work of Jonathan Worthington, which was made possible by the kind sponsorship of Nick Logan.
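To illustrate cas with a minimal sketch (it takes the variable, the expected value and the replacement, and returns the value the variable held before the attempted swap):

my atomicint $flag = 0;

# Claim the flag: change 0 to 1, but only if no other thread already did
if cas($flag, 0, 1) == 0 {
    say "we claimed it";
}
else {
    say "another thread got there first";
}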

AlexDaniel++ for his first release

Aleks-Daniel Jakimenko-Aleksejev has done his first Rakudo compiler release! This signals the end of an era in which Zoffix Znet did 14 consecutive Rakudo compiler releases. For which I can only give a big Thank You!

If you look at the announcement for Rakudo Perl 6 2017.08, you will see quite a number of fixes and improvements this month. Let’s hope AlexDaniel will be able to do many more of these with an even larger number of new features and improvements!

Other core developments

Swiss Perl Workshop

The schedule of the Swiss Perl Workshop has been published. It contains the following Perl 6 related presentations (in chronological order):

For what it’s worth: you can still register!

TPCiA Followup

Unfortunately, the official videos of TPCiA have not arrived yet. But we do have some pictures that have been shared:

Wendy also gave an interview. And informed me that 80 copies of Perl 6 books (Perl 6 Fundamentals, Think Perl 6 and Perl 6 At A Glance) and 52 copies of 13 different Perl 5 related books were sold during TPCiA.

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Ecosystem Additions

Winding down

Yours truly spent most of the past week recovering from TPCiA. And now needs to focus on slides for the Swiss Perl Workshop. Wish me strength. See you next week for more Perl 6 news!


p6steve: perl6 Module How To

Published by p6steve on 2017-08-18T16:29:28

Some investigation has turned up great resources on how to write and then list a perl6 module….
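For the impatient, the shape of what they describe is roughly this (a hypothetical minimal distribution; the module name and contents are mine):

# lib/My/Module.pm6
unit module My::Module;

sub hello($name) is export {
    say "Hello, $name!";
}

plus a META6.json in the distribution root, so zef and the module list know what the distribution provides:

{
    "perl"        : "6.c",
    "name"        : "My::Module",
    "version"     : "0.0.1",
    "description" : "A minimal example module",
    "provides"    : { "My::Module" : "lib/My/Module.pm6" },
    "depends"     : [],
    "authors"     : [ "p6steve" ]
}

After that, zef install . in the checkout should install it locally.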


p6steve: Physics::Unit in perl6

Published by p6steve on 2017-08-18T16:23:07

First and foremost, homage to the original authors of Physics::Unit and related perl5 CPAN modules. I would be honoured to hear from you and to collaborate in any way.

What’s the big picture? TOP down, I have in mind:

So, that said, given my poor state of knowledge of most of these things, my thinking is to start building BOTTOM up and see the shape of things that emerges, while learning some perl6 on the way.

So, first I am going to need some MEASUREMENTS which are VALUES expressed in UNITS with associated ERROR.
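As a placeholder for where I am heading, something like this hypothetical first cut (the class and attribute names are mine, not the eventual module's):

class Measurement {
    has Real $.value is required;
    has Str  $.units is required;
    has Real $.error = 0;            # absolute error; defaults to exact

    method Str  { "$!value ± $!error $!units" }
    method gist { self.Str }
}

my $g = Measurement.new( :value(9.81), :units<m/s^2>, :error(0.02) );
say $g;    # 9.81 ± 0.02 m/s^2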

I took a gander at the CPAN module Physics::Udunits2, which is a perl5 interface to udunits2, and felt that the richness of its units and adherence to NIST guidelines were not of sufficient benefit to overcome my sense of incoherence.

So, to cut a long story short, I decided to take inspiration from Physics::Unit.

Next, I needed some guidance on How to Build a perl6 module…


p6steve: perl6, really?

Published by p6steve on 2017-08-18T15:19:54

I have been waiting for perl6 for over 15 years since it was first conceived. Recently, I have had an urge to get back to hands-on coding and, having seen the latest Rakudo* release of perl6, felt that it is now sufficiently mature for my nefarious purposes.

No doubt I am not the only person to have been frustrated by the slow progress of perl6, and certainly many have dropped by the wayside, perhaps answering the siren call of Python (2 or 3), Ruby, Swift or Go. And now that it is finally here, the community is obviously worried that no one will adopt the fruits of their work.

Here ‘zoffix’ makes a desperate plea to change the name from ‘perl6’ to ‘rakudo’ to reboot the brand…. https://www.reddit.com/r/perl6/comments/6lstq3/the_hot_new_language_named_rakudo/

My rebuttal to this concept is reproduced here.

I’m a slow thinking guy who has had two careers: perl5 dev and marketing manager. I have been wrestling with Zoffix’ proposed change and it is certainly a well-constructed argument made with feeling. Here’s my line:

So, I detect some natural frustration within and without the community. Keep the faith. We have a new audience, they value truth and beauty, and a story of battling the odds. They need this TECHNOLOGY. It is [perl6 | rakudo] – who cares. It may have to start in academia, it may have to start in the p5 stalwarts. It will ignite. Finish the journey. Do not deny your heritage.

So, yes, I think that perl6 is awesome (whatever it’s called). And I believe that it will be an interesting personal journey to come to grips with the ultimate programming power tool and to deliver something interesting on the way. As a user and sometime low level contributor perhaps.

~p6steve


Perlgeek.de: My Ten Years of Perl 6

Published by Moritz Lenz on 2017-08-08T22:00:01

Time for some old man's reminiscence. Or so it feels when I realize that I've spent more than 10 years involved with the Perl 6 community.

How I Joined the Perl 6 Community

It was February 2007.

I was bored. I had lots of free time (crazy to imagine that now...), and I spent some of that answering (Perl 5) questions on perlmonks. There was a category of questions where I routinely had no good answers, and those were related to threads. So I decided to play with threads, and got frustrated pretty quickly.

And then I remembered that a friend in school had told me (about four years earlier) that there was this Perl 6 project that wanted to do concurrency really well, and even automatically parallelize some stuff. And that was some time ago; maybe they had gotten somewhere by now?

So I searched the Internet, and found out about Pugs, a Perl 6 compiler written in Haskell. And I wanted to learn more, but some of the links to the presentations were dead. I joined the #perl6 IRC channel to report the broken link.

And within three minutes I got a "thank you" for the report, the broken links were gone, and I had an invitation for a commit bit to the underlying SVN repo.

I stayed.

The Early Days

Those were the wild young days of Perl 6 and Pugs. Audrey Tang was pushing Pugs (and Haskell) very hard, and often implemented a feature within 20 minutes after somebody mentioned it. Things were unstable, broken often, and usually fixed quickly. No idea was too crazy to be considered or even implemented.

We had bots that evaluated Perl 6 and Haskell code, and gave the result directly on IRC. There were lots of cool (and sometimes somewhat frightening) automations, for example for inviting others to the SVN repo, to the shared hosting system (called feather), for searching SVN logs and so on. Since git was still obscure and very unusable, people tried to use SVK, an attempt to implement a decentralized version control system on top of the SVN protocol.

Despite some half-hearted attempts, I didn't really make inroads into compiler development. Having worked with neither Haskell nor compilers before made for a pretty steep learning curve. Instead I focused on some early modules, documentation, tests, and asking and answering questions. When the IRC logger went offline for a while, I wrote my own, which is still in use today.

I felt at home in that IRC channel and the community. When the community asked for mentors for the Google Summer of Code project, I stepped up. The project was a revamp of the Perl 6 test suite, and to prepare for the mentoring task, I decided to dive deeper. That made me the maintainer of the test suite.

Pet Projects

I can't recount a full history of Perl 6 projects during that time range, but I want to reflect on some projects that I considered my pet projects, at least for some time.

It is not quite clear from this (very selected) timeline, but my Perl 6 related activity dropped around 2009 or 2010. This is when I started to work full time, moved in with my girlfriend (now wife), and started to plan a family.

Relationships

The technologies and ideas in Perl 6 are fascinating, but that's not what kept me. I came for the technology, but stayed for the community.

There were and are many great people in the Perl 6 community, some of whom I am happy to call my friends. Whenever I get the chance to attend a Perl conference, workshop or hackathon, I find a group of Perl 6 hackers to hang out and discuss with, and generally have a good time.

Four events stand out in my memory. In 2010 I was invited to the Open Source Days in Copenhagen. I missed most of the conference, but spent a day or two with (if memory serves right) Carl Mäsak, Patrick Michaud, Jonathan Worthington and Arne Skjærholt. We spent some fun time trying to wrap our minds around macros, the intricacies of human and computer language, and Japanese food. (Ok, the last one was easy). Later the same year, I attended my first YAPC::EU in Pisa, and met most of the same crowd again -- this time joined by Larry Wall, and over three or four days. I still fondly remember the Perl 6 hallway track from that conference. And in 2012 I flew to Oslo for a Perl 6 hackathon, with a close-knit, fabulous group of Perl 6 hackers. Finally, the Perl Reunification Summit in the beautiful town of Perl in Germany brought together Perl 5 and Perl 6 hackers in a very relaxed atmosphere.

For three of these four events, different private sponsors from the Perl and Perl 6 community covered travel and/or hotel costs, with their only motivation being meeting folks they liked, and seeing the community and technology flourish.

The Now

The Perl 6 community has evolved a lot over the last ten years, but it is still a very friendly and welcoming place. There are lots of "new" folks (where "new" is everybody who joined after me, of course :D), and a surprising number of the old guard still hang around, some more involved, some less, all of them still very friendly and supportive.

The Future

I anticipate that my family and other projects will continue to occupy much of my time, and it is unlikely that I'll be writing another Perl 6 book (after the one about regexes) any time soon. But the Perl 6 community has become a second home for me, and I don't want to miss it.

In the future, I see myself supporting the Perl 6 community through infrastructure (community servers, IRC logs, running IRC bots etc.), answering questions, writing a blog article here and there, but mostly empowering the "new" guard to do whatever they deem best.

samcv: Grant Status Update 3

Published on 2017-08-06T07:00:00

This month I accomplished a lot. We now support full Unicode 9.0/Emoji 4.0 text segmentation (putting codepoints into graphemes), and there are some really awesome concatenation improvements (both are in master). There is also a fully functional Unicode Collation Algorithm (not yet merged into master).

Table of Contents

Unicode Collation Algorithm

In my collation-arrays branch I now have implemented a fully working Unicode Collation Algorithm. Only 82/190,377 of the Unicode collation tests are failing (99.956% passing).

What I did:

  • Collation keys that need to be generated on the fly are working

  • Characters that need to be decomposed for collation (mainly Hangul characters) are working

  • The script that generates the linked list (used for codepoints with more than one collation array in them or for sequences of more than one codepoint with collation keys) was rewritten and I have not discovered any issues with the data

I was also able to properly keep the ability to adjust each of the collation levels (reversing or disabling primary, secondary or tertiary levels).

Since primary > secondary > tertiary for all collation elements we are able to have a default array:

{-1, 0, 1} /* aka Less, Same, More */

This is the default array that is used in case we are comparing between different levels (primary vs secondary for instance).

If we are comparing between the same level, we use a 3x3 array which holds the return values based on whether the collation values are either Less, More or Same. This is the array that is customized to allow you to reverse or disable different collation levels.

 {-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}
/*  primary,  secondary,   tertiary */

Below, pos_a and pos_b track moving between keys on the collation stack, while level_a and level_b track comparing the levels against each other. For more information about how this works, see my previous grant status update.

if (level_a == level_b)
    effective_level_eval = level_eval_settings.a[level_a];
else
    effective_level_eval = level_eval_default;

rtrn =
stack_a.keys[pos_a].a[level_a] < stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.Less :
stack_a.keys[pos_a].a[level_a] > stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.More :
                                                                   effective_level_eval.s2.Same ;

Need to do:

  • Either use MVM_COLLATION_QC property to determine whether to look up a character in the linked list or instead store the index of the value of the main node in the Unicode database instead

    • Currently it checks all of the main nodes before then reverting to the values stored in the Unicode Database or based on generated values

  • Investigate country/language selective sorting

Other than needing to improve the speed of finding the main nodes, the implementation is feature complete.

The currently failing tests mostly relate to codepoints whose collation values differ between NFD and NFC forms. This happens when combining accent/extending characters follow such a codepoint and would be reordered in NFD form. As the number of failing tests is quite low and this causes only a very slight variation in sorting order, I find it acceptable in its current form. I may eventually find a workaround, but given that 99.956% of the tests pass and the incorrectness is only a slight variation in sorting, it is not a priority.

Text Segmentation/Normalization

Emoji Fixes

Improvements to segmentation of Emoji w/ GCB=Other

Not all Emoji Modifiers have Grapheme_Cluster_Break = E_Base or E_Base_GAZ. In these cases we need to check the Emoji_Modifier_Base property. This fixed 25 more emoji than before.

Commit: 4ff2f1f9

Don’t break after ZWJ for Emoji=True + GCB=Other

With this change we are now counting 100% of the Emoji v4.0 emoji as a single grapheme. Since ASCII numbers and the pound symbol are also counted as Emoji, we disregard any codepoints in the ASCII range. I updated the MoarVM Readme to show we have full Unicode 9.0/Emoji 4.0 text segmentation support!
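A quick way to see what that means from Perl 6 (the counts assume a Rakudo built against the updated MoarVM):

# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY
my $family = "\c[MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL, ZERO WIDTH JOINER, BOY]";
say $family.chars;   # 1 (a single grapheme)
say $family.codes;   # 7 (seven codepoints)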

Other Improvements

Avoid property lookups for Hangul and just use the Grapheme_Cluster_Break property.

Lookup Canonical_Combining_Class as the raw integer property value instead of having to get the string and then converting to an integer. Even though the numbers we get back are not identical to the actual CCC, when doing the Unicode algorithms we only care about relative CCC. This results in lower CPU usage during normalization.

Fix cmp/leg to support deterministic comparing of synthetics

Fixes the cmp/leg operator to have deterministic results when the string contains a synthetic. Previously it compared based on the value the synthetic was stored as, which is based on which synthetic was allocated first. This makes it so cmp/leg always have the same result for the same string, no matter the order the synthetics are allocated. It also makes it so the codepoints the synthetic is made up of are compared instead of the value of the synthetic (whose value has no relation to the codepoints in the synthetic).

Previously we compared naïvely by grapheme, and ended up comparing synthetic codepoints with non-synthetics. This would cause synthetics to be sorted incorrectly, in addition to making comparisons non-deterministic: if the synthetics were allocated in a different order, you would get a different result from MVM_string_compare.

Now, we compare down the string the same way we did before, but if the first non-matching grapheme is a synthetic, we iterate over the codepoints in the synthetic, so it returns the result based on codepoints. If the two non-matching graphemes contain the same initial codepoints but differ in length, the grapheme with fewer codepoints is determined to be less than the other.
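A small illustration of the new behavior (each string below is a single synthetic grapheme, since neither combination has a precomposed form):

my $a = "q\c[COMBINING ACUTE ACCENT]";   # q + U+0301
my $b = "q\c[COMBINING MACRON]";         # q + U+0304
say $a leg $b;   # Less: decided by the codepoints inside the synthetics
                 # (0x0301 < 0x0304), not by their allocation order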

Concatenation Improvements

When we concatenate strings normally, we create a new string which contains references to the string or strands of string a and the string or strands of string b.

Previously if there were any of the following conditions for the last grapheme of string a (last_a) or the first grapheme of string b (first_b), we would renormalize string a and string b in their entirety:

  • last_a or first_b were synthetics (contained multiple codepoints in one grapheme) other than the CRLF synthetic

  • Didn’t pass NFG Quickcheck

  • Had a non-zero Canonical_Combining_Class

The first improvement I made was to use the same code that we use during normalization (should_break) to find out whether two adjacent codepoints should be broken up or whether they are part of the same grapheme (user-visible character unit). This had a very big effect, since previously we were renormalizing even when renormalization would not have caused any change in the final output. This change reduced how often we had to do full renormalization by about 80% in my tests of various different languages' scripts.

The second change I made was more technically challenging. While the first change reduces how often we would do full renormalization, the second change’s goal was to only renormalize a small section of string a and string b instead of string a and b in their entirety.

With the new code, we either copy the strands or set string a as the first strand of the new string (as we did previously). Then the should_break function we use for checking if we break between two codepoints or not is used. If we break between last_a and first_b we continue as I explained at the very start of this section. If should_break returns that we need to combine the two codepoints, then we create a renormalized_section strand for those two graphemes. Since last_a is now inside the renormalized_section, we adjust the reference to string a (similar to how .substr creates a substring) or if string a was made up of multiple strands, we adjust the last strand of string a.

We then insert the renormalized section. String b or its strands are then set after the renormalized section. The same way we adjusted a, we adjust b or its first strand to be a substring.

Since it’s possible for the adjusted strings or strands to be fully consumed (i.e. the substring is 0 in length), we "drop" the copied strand or string (for a or b) if the adjusted substring has no more graphemes in it. This currently can only happen if string a or the last strand of string a is a single grapheme or the first strand of string b or string b is a single grapheme.
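The user-visible contract all of this preserves is that concatenation behaves exactly as if the result had been built and renormalized from scratch:

my $s = "q" ~ "\c[COMBINING TILDE]";
say $s.chars;   # 1: the mark combines with the base across the seam
say $s.codes;   # 2: no precomposed q-with-tilde exists, so NFC keeps both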

Other minor fixes

Fix overflow in uniprop lookups

Makes excessively large uniprop queries not overflow into negative numbers, which would cause it to throw. Closes MoarVM issue #566.

Commit: a3e98692.

Made string_index 16% faster for incompatible string types

For incompatible string types we now use string_equal_at_ignore_case_INTERNAL_loop, which results in a 16% speed boost. Only a minor change to the INTERNAL_loop function was needed, and it works both with and without the ignorecase/ignoremark functionality.

Commit: 161ec639.

Add nqp::codes op to MoarVM, making .codes 3.5x faster for short strings

Added an nqp::codes op which gets the number of codepoints in a string. We already had nqp::chars to get the number of graphemes, but to get the number of codepoints we previously called .NFC.codes, which is 3.5x slower for short strings and 2x slower for longer ones.
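From high-level code the new op is reachable through the nqp bridge; a minimal sketch, assuming a Rakudo built on a MoarVM that has the op:

use nqp;
my $s = "q" ~ "\c[COMBINING TILDE]";
say nqp::chars($s);   # 1 grapheme
say nqp::codes($s);   # 2 codepoints, without the old .NFC.codes detour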

Tests

Added tests to make sure that concat is stable when concatenating things which change under normalization. This was done to make sure there were no issues with the concatenation improvements.

Next Month

In the next month I need to get the Collation stuff merged and work on fixing some very longstanding bugs in our Unicode Database implementation. I may not be fully changing out our Unicode Database implementation as I had planned, but I will still at least fix the majority of the issues.

This includes ucd2c.pl not being deterministic, changing every time it is generated. I plan to make it generate the same code on every run. It was originally written assuming properties and property values are unique. I need to make it so that when the script is regenerated, all of the properties and values are functional. Currently it is random which property values are functional and which are not, depending on which ends up last in the hash initializer array (since the property values which occur last overwrite the identical names which appear first).

6guts: MoarVM Specializer Improvements Part 1: Gathering Data

Published by jnthnwrthngtn on 2017-08-06T00:34:01

Over the last weeks I’ve had the chance to work full time on Perl 6, and have dedicated this time to improving the MoarVM specializer. Since “specializer” is such a lot to type, and shortening it to “spec” would result in it being confused with specification (or perhaps bacon), we refer to it as “spesh”. But what is it?

Specialization

Specialization is the process of taking code with lots of late-binding in it, and producing one or more versions of the code with as much of that stripped out as we can. We take late binding in a very broad sense, really using it to mean any place where we defer a decision until the code is executed (that is, runtime) because we don’t know the context it will be executed in. So, if we write a module with this in it:

sub shorten($text, $limit) is export {
    $text.chars > $limit
        ?? $text.substr(0, $limit) ~ '...'
        !! $text
}

Then this code isn’t committed to the types of $text and $limit. Thus, we must resolve the method calls and operator dispatches at runtime, based on the types that actually show up.

Even if we put Str and Int type constraints on the sub, that still isn’t really enough: these could be subclassed (for example, by mixing in to them) and thus override the methods that we call. (Granted in this particular case that’d be a tad odd, but in the general case, it’s not.)

Even with nice things like method caches and multi-dispatch caches, which do provide a huge speedup, we still must do the cache lookups, call the methods, and so forth. The job of the specializer is to notice what types we actually call shorten with, and the types methods like chars return, and thus strip out the need to even do cache lookups. It can then go further, for example inlining the body of the small chars method (and maybe substr too) into the caller so that there will be no invocation overhead for them at all!

Before we go any further, a quick thank you

This work is funded by my current TPF grant, which was approved some weeks back (so I’ve already been working away at it), though only recently publicly announced due to an internal TPF procedure that had to be completed first. So, thanks go to TPF for administering the funding, and to those who have donated to TPF and the Perl 6 Core Development Fund for providing the funding.

A little series

I could just write up a list of the improvements I’ve done, but I figure that many readers here don’t know a great deal about spesh. So instead, I will write a series of posts working through the way spesh works today, after my changes, together with some notes on how it used to work and why that changed. Hopefully that will be a more interesting and useful read.

What to optimize?

When I’ve talked people through how spesh works in the past, I’ve tended to start by discussing how the bytecode is turned into a control flow graph in static single assignment (SSA) form, and then we go on and do the various bits of analysis and transformation. We’ll get to that, but in fact there has always been a step before it.

Constructing a CFG and putting it in SSA form takes time; transforming it by doing a load of optimizations takes more. How much depends on the size and structure of the code. So we need to decide what code to spend time on. This was done in two ways.

The first was just incrementing a counter every time the sub, block, method, or regex was called. If the count passed a threshold (which was chosen based upon the size of the bytecode) then an optimized version of it would be produced. Easy enough.

Trouble is that this doesn’t catch an important case: that where we have a program that spends most of its lifetime in a loop. In that case, we only enter the block holding the loop once, which is never going to trigger optimization. Thus we also count the number of iterations of loops. If that counter hits a high enough value, then we produce the optimized code and replace the running code with it, potentially moving from the interpreter into machine code. This replacement of code with an optimized version is known as “on stack replacement”, because it’s replacing code already running on the call stack.
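For example, the mainline of this program is entered exactly once, so only the loop-iteration counter can ever trigger its optimization, via OSR:

# The block is entered once; all the heat is in the loop iterations
my $total = 0;
$total += $_ for ^5_000_000;
say $total;   # 12499997500000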

Gathering data

Just throwing bytecode representing a highly dynamic program into an optimizer won’t achieve much, however. The cost is in the late binding, and to remove that we need an idea of what kinds of things are showing up when the code is run. This data came from two sources.

The first source was the incoming parameter types. When code was determined to be hot, the shape of the callsite (how many arguments, which named arguments) and the types of the arguments were taken and turned into a “key” for the specialization. These could then be assumed inside of the optimizer.

This was not enough on its own to produce good optimizations, however, because a lot of data comes from attribute accesses, lexical accesses, and the return values of things the code calls. Therefore, before doing any significant work, spesh would go through the code and insert logging instructions. These would record the values that were observed. After 8 runs, the specialization process would continue, using this data.

Bad decisions and missing information

This worked relatively well, but had notable shortcomings. The main one concerns code that is highly polymorphic in nature – that is, it is called with many different types. In this case, whichever type it was called with on, say, its 100th call, would have a specialization produced for it. If the next call used a different type, another specialization would be produced for that. And so forth, up until we had four specializations, which was the number picked as the limit.

This meant that things like defined or sink, which are called very often but on a lot of different types, would often go unoptimized and un-JITted. Worse, the way inlining works is to inline any existing specialization. Inlining all the defined and sink calls would be ideal. But in reality, for any non-trivial program, that would not happen in most cases. Clearly, if only the specializer were able to take a step back, it could see this situation and do something smarter. But the data just wasn’t there to do it.

The 8 samples taken when logging could also get lucky or unlucky about what values they saw. If a value returned from a method is a certain type over 99% of the time, and less than 1% of the time it is something else, then it makes sense to stick a guard in, optimize for the 99% case, and let the guard fail and trigger deoptimization for the less than 1% of the cases. (A deoptimization is like the opposite of OSR: we swap the optimized version of the code for the slower one that can handle all of the cases, and let it take care of the rare cases).

With only 8 values, however, 1 of them being different suggests we might have to deoptimize over 10% of the time. Deoptimization takes a bit of time to do, but leaves us interpreting the code with a bunch of late binding. The stakes are too high if all the data can tell us is that there’s a 1 in 8 chance.

Another problem in the data was a lack of knowing about the stability of the types on the callee side of a callsite. We relied on being able to infer them based on things we did know, or had inserted checks to guard against. However, what if we were just one more guard clause away from being able to inline something? Previously we could never determine that, so had to miss out on the opportunity. With better data, it would be possible to do better.

Interruptions and stampedes

Spesh needed to interrupt the execution of the program twice for each frame it wished to optimize: once to parse the bytecode into the SSA CFG, insert the logging and produce a logged version of the code, and again after the logging to produce an optimized version of the code. This introduced pauses into the execution. So, every time spesh makes a program run faster, it has to do so sufficiently to overcome the time it stole to do its optimization work. And if it speeds something up that doesn’t get run much in the future, it can make the program slower overall.

This is pretty poor use of the parallel hardware available in pretty much all computers nowadays. Better would be to spend time doing the optimization work on a thread separate from the execution of the user’s program. This not only gets rid of the pauses; it also means that if the work is completed by the interpreter before the optimized version is ready then we didn’t slow the program down by stopping it to do optimization work that was never going to pay off anyway.

Having a thread taking care of specialization work can also resolve another problem. Imagine that a bunch of threads are set off executing the same code on a range of different data items (so, data parallelism). In this case, more than one thread could see that the code is hot, notice there’s not yet any specialization, and get to work on producing it. There was detection of any duplicate ones at installation time, but that still meant there was wasted work producing specializations. If 3 different threads wasted their time this way, then the amount of time before the specializer had paid for itself was also increased.

A new approach to data collection

Clearly, there was plenty of room for improvement. It was fairly clear that having a better data model representing the execution of the program, which the specializer would use to make smarter decisions, was a key part of this. At the same time, it was important that threads executing user code didn’t spend much of their time building this. Ideally, they would just throw data into a sequential buffer, and toss it over to another thread – a specializer thread – when it was full.

So, that’s the direction I headed in. Each thread would log interesting events into a thread-local buffer. The append-only and thread-local (for writes) nature of it should make it fairly cache friendly. It was desirable that the entries were fairly small; the main place this had an impact was parameter type logging, where we wish to log both the container type and the type held inside the container. This was taken care of by just writing two entries, rather than all the rest being padded out (fixed size entries weren’t all that important, I guess, but it did make things easier).

I converged on this set of events in the log. It’s 24 bytes per entry, which isn’t too bad. One early mistake I made was logging invokes. I was keen that we start being able to inline calls to closures in the future, when it’s always the same code we call but with a different environment. So I logged the invoked code object, figuring the specializer thread could then pick the interesting parts out. This turned out to quite notably extend the lifetime of the closed over data, however. As the things we cared about were what code was invoked and whether the calling frame was also the outer frame, it was quite cheap to just extract those at the point of writing the log anyway.

The end result only logs references to static frames, type objects, and code objects where we’re told the result of the lookup can be cached because it will always be the same. It never logs values. This fixes an issue in the previous logging mechanism: during its 8 runs it would keep values alive for longer, and – much worse – if the 8 runs were never completed, it could keep them alive indefinitely. It was a bounded leak, but certainly one I’m glad to no longer have.

A hugely important part of the log is the correlation ID. This is a per-thread incrementing counter. It is bumped each time a frame is entered with logging turned on. It is then included in all events occurring within the execution of that frame. This allows the specializer thread to work out what events go with what code.

The spesh worker thread

The spesh worker thread sits in a loop, reading a log from a blocking queue (meaning it waits in an efficient way, and – once the program has been fully specialized – just never gets woken up again). Once a thread running code fills up a log buffer, it sticks it into the blocking queue, the spesh worker receives it, and it gets to work. I’ll go through the various steps it takes later in this series; for today I’ll focus only on the first one (updating the statistics model, below) and what it does before going to wait for another data buffer.

To prevent runaway memory use if the threads running code are producing logs way faster than the specializer can make use of them, threads are given a quota of log buffers they are allowed to send. On sending, they decrement the quota. If it’s still greater than zero, then they allocate another log buffer and continue logging. Otherwise, logging is disabled for that thread. Once the specialization worker thread is done processing a buffer and acting upon its content, it increments the quotas. If it incremented it from zero, then it also installs a spesh log buffer for the thread, so it will continue logging.

Nailing the main loop

This scheme almost worked out fine, but there was an awkward problem. If we weren’t logging at the point that the outermost frame of the program started running (perhaps because we were specializing bits of the compiler still), then the outermost body of the program wouldn’t get a correlation ID, and so wouldn’t log any events. This would be especially unfortunate, since the first thing people measuring performance tend to do is write a hot loop in the mainline of a program! (More usefully, any perl6 -ne ... invocation has the same kind of program structure.)

Thankfully, there was an easy way to deal with this: when we enter a new compilation unit for the first time, if there is no spesh log, then grant a temporary quota boost. Alternatively, if the log is almost full, then we send it off and then make a new one (with a quota boost if needed). This boost is temporary.

Every heuristic written with good intentions can go rogue, and this one is no exception: imagine a program doing a load of EVALs. So, there is a fixed limit on how many times this quota boosting can happen, to ensure it handles the cases it is aimed at but doesn’t do harm.

Building a model

Over on the specialization thread, the events are fed into a simulation that recreates something much like the call stack of the logged program, so it can understand the relationships between callers and callees. This produces a set of statistics, hung off each invoked piece of code (known as a “static frame” in MoarVM); doing it this way has the advantage that the statistics will be garbage collected should the code also be garbage collected (this can matter in EVAL-heavy programs).

One of the things I like most about this new approach is that, with the MVM_SPESH_LOG environment variable set to the name of a file to log to, the assembled statistical data will be dumped. This makes it possible to see the data being used to make decisions. Enough with me describing stuff, though: let’s see the stats! Here’s an example program:

sub shorten($text, $limit) is export {
    $text.chars > $limit
        ?? $text.substr(0, $limit) ~ '...'
        !! $text
}
for ^10000 {
    shorten 'foo' x 100, (20..500).pick;
}

Here’s the statistics output after a while for the shorten sub:

Latest statistics for 'shorten' (cuid: 1, file: xxx.p6:1)

Total hits: 156

Callsite 0x7fdb8a1249e0 (2 args, 2 pos)
Positional flags: obj, obj

    Callsite hits: 156

    Maximum stack depth: 13

    Type tuple 0
        Type 0: Str (Conc)
        Type 1: Int (Conc)
        Hits: 156
        Maximum stack depth: 13
        Logged at offset:
            226:
                156 x type Int (Conc)
                156 x static frame 'chars' (4197) (caller is outer: 0)
                156 x type tuple:
                    Type 0: Scalar (Conc) of Str (Conc)
            260:
                156 x type Bool (Conc)
                1 x static frame 'infix:«>»' (2924) (caller is outer: 0)
                155 x static frame 'infix:«>»' (3143) (caller is outer: 0)
                156 x type tuple:
                    Type 0: Int (Conc)
                    Type 1: Scalar (Conc) of Int (Conc)
            340:
                91 x type Str (Conc)
                91 x static frame 'substr' (2541) (caller is outer: 0)
                91 x type tuple:
                    Type 0: Scalar (Conc) of Str (Conc)
                    Type 1: Int (Conc)
                    Type 2: Scalar (Conc) of Int (Conc)
            382:
                91 x type Str (Conc)
                91 x static frame 'infix:<~>' (4223) (caller is outer: 0)
                91 x type tuple:
                    Type 0: Str (Conc)
                    Type 1: Str (Conc)

Static values:
    - Sub+{<anon|77645888>} (0x2a70d48) @ 194
    - Sub+{<anon|77645888>}+{Precedence} (0x23a90c0) @ 288

From this we can see the callsite shape the sub is invoked with, the types of its arguments, and the types and static frames observed at each logged bytecode offset.

At compile time it was also worked out that some lexical lookups will always resolve to the very same object; the results of those have been logged also (the static values section), so the optimization process can elide the lookups and do further optimizations by knowing exactly what will be called. (The lookups in question are for the > and ~ operations, which are lexical, but in this case not overridden in a nested scope and so will come from the CORE setting.)

Here’s another example, this time for chars:

Latest statistics for 'chars' (cuid: 4197, file: SETTING::src/core/Str.pm:2728)

Total hits: 157

Callsite 0x7fdb8a124a00 (1 args, 1 pos)
Positional flags: obj

    Callsite hits: 157

    Maximum stack depth: 33

    Type tuple 0
        Type 0: Str (Conc)
        Hits: 1
        Maximum stack depth: 33

    Type tuple 1
        Type 0: Scalar (Conc) of Str (Conc)
        Hits: 156
        Maximum stack depth: 14

Here we can see that it was called once with a Str value, but a bunch of times with a Scalar holding a Str. Therefore, it’s worth producing optimized code for the latter case, but not worth bothering about the former.

Clearing up

But what about all of the statistics we record on one-off cold paths, such as at startup? We’ll never do optimizations based on them, and so they’d just sit around using memory. Or what about after we’ve optimized a frame in all the needed ways, and it’s no longer logging new data? We don’t really need the statistics any more.

To alleviate this, each time a log buffer is received a statistics version number is incremented. The statistics are marked with the current version when updated. Any statistics that are not updated in a while will be presumed to be no longer interesting, and so thrown out.

Debuggability

When I first mentioned moving specialization work off to its own thread on #moarvm, brrt (who leads development of the JIT) was immediately concerned about the unpredictability this would introduce. Threads are scheduled a bit differently between runs, which would make it a nightmare to reproduce and hunt down specializer and JIT compiler bugs, including using the bisection approach to work out exactly which specialization made things go wrong. It was an excellent point.

Therefore, I introduced the MVM_SPESH_BLOCKING environment variable. When set, a thread executing code will send off its log, and then block until the spesh worker thread has finished processing it and installing specializations. This means that, for a single threaded program, the behavior will again be fully deterministic.

You won’t believe what happens next!

Alas, that’s all I’m covering this time. Next time, I’ll talk about how the statistics are used.


6guts: Shrinking MoarVM call frames

Published by jnthnwrthngtn on 2017-07-30T19:16:35

Last week, I did some work to greatly decrease the size of call frames, also known as invocation records, in MoarVM. In theory, a call frame is created whenever a sub, method, regex, or block is entered. In reality, scopes may be flattened away at compile time, which decreases the number of call frames needed. Further to that, dynamic optimization at runtime leads to inlining, which has the same result, except it can do it with late-bound calls, even with callees in different compilation units. Even with these optimizations (and in part because of their limitations), most programs will still need to create and destroy many call frames over their lifetime, making their setup and tear-down a hot path, and their size a factor in program memory performance.

The work I’ll describe in this post has been funded by OETIKER+PARTNER AG, who responded to my recent funding call. In fact, they’re funding around 10 hours of work per month over the space of a year, so this will be just the first of a number of posts describing work that they are making possible. So, thanks!

Background

The MVMFrame data structure has been there since the very earliest days of MoarVM. My memory is hazy, but I suspect it was among the first dozen data structures that I sketched out when starting to design and implement the VM. Initially, call frames were reference counted and allocated out of a special pool. Now they live either in a per-thread call stack region, allocated by incrementing a pointer and deallocated by decrementing the pointer, much like a traditional call stack, or on the heap if they “escape” (as a result of a closure, exception throw, or continuation). At present, that heap promotion is always lazy (so call frames are always born on the call stack).

Therefore, the size of an MVMFrame impacts both the cost of setting frames up and tearing them down, and the program’s memory use and CPU cache behavior.

Over the years, MVMFrame has grown. Various things were added in support of additional features. However, not all of those things are used by all frames. Additionally, some of them, while used widely, were both quite rarely read and very cheap to compute when they were needed, meaning it was better to just re-calculate them on demand.

Effective handlers and bytecode

Two pointers were taken up for storing the effective handlers and bytecode. The bytecode holds the instructions to execute, and the handlers are the regions of that bytecode covered by exception handlers. At first, these fields did not exist in MVMFrame. The bytecode and handlers weren’t properties of a given call, but of the code being called (known as the “static frame” in MoarVM). So why these fields?

They were introduced with the dynamic optimizer. This produces one or more versions of the frame’s bytecode specialized by callsite and type. The specialized version of the code is selected based upon what the caller passes. Since the specialized bytecode contains different (typically many fewer) instructions, the code offsets covered by exception handlers move too, so we need to use an updated table of those when locating an exception handler.

It was certainly convenient to hang pointers to these two off the frame. But we could always locate them just by following the spesh_cand pointer in the frame, if it was set, or the static_info pointer if not. And this isn’t something that we needed to do so often that the extra dereference was going to add up. The only common path we needed to do it on was on return. But we’d be losing the instruction to set it in the invocation, so it about balances out anyway – and that’s before considering the memory savings.

With those gone, MVMFrame shrank by 2 pointers, or 16 bytes in a 64-bit environment.

Throw address

When an exception was thrown, MoarVM stored the address in the program that it was thrown at into…a pointer in the currently executing frame. It then referenced the frame from the exception object. And that was pretty much all the throw address field was used for. This was a very easy 8 bytes (64-bit pointer) to win: just store the address in the exception object, where it belongs anyway. D’oh.

Rarely used things

Some things are used by just a small handful of frames:

These added up to 8 pointers and one 16-bit integer. By moving them into an “extras” data structure hung off the frame, and allocated on demand, space equivalent to 7 pointers could be saved off the frame. With 64-bit pointers, that’s 56 bytes of savings for most frames. The CORE.setting compilation ends up with only 6% of frames needing this extra storage space.

Alignment

With a 16-bit integer moved off into the extras I realized that, with a little care, we could re-order things, force an enum to only use a single byte, and save another 8 bytes off MVMFrame, simply by not needing some empty wasted padding space in the struct (C compilers insert this to make sure memory reads are aligned to correct boundaries).

Saving a memset

That’s 88 bytes of savings, which is around a cache line and a half on a typical CPU. It also means that nearly all of the things left in MVMFrame were being initialized on every invocation as part of the callframe setup. Meaning? That the memset of MVMFrame could go away, at the cost of just inserting a couple of instructions to manually zero or NULL things out (some of them on rarely taken paths).

But wait, there’s more…

While I was working on this, and looking at profiles, I noticed that a large number of allocations came from creating a buffer to keep track of which named arguments had been used and which had not (for the sake of error reporting and slurpy argument handling). We allocated a byte array, but only really need a bit per named argument. So, I turned the field into a union of a 64-bit bit field for when there are at most 64 named arguments (which covers probably just about every real world case), falling back to the old byte array approach otherwise.

All in all…

These changes provide some memory use reductions, but more importantly are good for CPU cache locality. They also knock 2% off the number of CPU instructions run during CORE.setting compilation; many other programs should see a similar improvement (how much depending on how much of a hot path invocation is, and how many closures they take).


rakudo.org: Announce: Rakudo Star Release 2017.07

Published by Steve Mynott on 2017-07-24T18:36:58

A useful and usable production distribution of Perl 6

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the July 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the July 2017 release is available from https://rakudo.perl6.org/downloads/star/.

Binaries for macOS and Windows (64 bit) are also available.

This is the eighth post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

IMPORTANT: “panda” is to be removed very shortly since it is deprecated. Please use “zef” instead.

Currently, Star is on a quarterly release cycle and 2017.10 (October) will follow later this year.

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

This Star release includes release 2017.07 of the Rakudo Perl 6 compiler, version 2017.07 MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

Note this Star release contains NQP version 2017.07-9-gc0abee7 rather than the release NQP 2017.07 in order to fix the --ll-exception command line flag.

The Rakudo compiler changes since the last Rakudo Star release of 2017.01 are now listed in “2017.05.md”, “2017.06.md” and “2017.07.md” under the “rakudo/docs/announce” directory of the source distribution.

Notable changes in modules shipped with Rakudo Star:

+ DBIish: Doc and CI updates
+ doc: Too many to list. p6doc fixed.
+ grammar-debugger: Works again now.
+ p6-io-string: New dep for doc.
+ p6-native-resources: Removed since deprecated and not used by linenoise.
+ panda: Officially deprecate panda in favour of zef.
+ perl6-Test-When: New dep for perl6-pod-to-bigpage.
+ perl6-lwp-simple: Fix breakage due to rakudo encoding refactor.
+ tap-harness6: Replaces deprecated tap-harness6-prove6.
+ zef: Too many to list.

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

+ advanced macros
+ non-blocking I/O (in progress)
+ some bits of Synopsis 9 and 11

There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

Perlgeek.de: Perl 6 Fundamentals Now Available for Purchase

Published by Moritz Lenz on 2017-07-21T22:00:01

After about nine months of work, my book Perl 6 Fundamentals is now available for purchase on apress.com and springer.com.

The ebook can be purchased right now, and comes in the epub and PDF formats (with watermarks, but DRM free). The print form can be pre-ordered from Amazon, and will become ready for shipping in about a week or two.

I will make a copy of the ebook available for free for everybody who purchased an earlier version, "Perl 6 by Example", from LeanPub.

The book is aimed at people familiar with the basics of programming; prior Perl 5 or Perl 6 knowledge is not required. It features a practical example in most chapters (no mammal hierarchies or class Rectangle inheriting from class Shape), ranging from simple input/output and text formatting to plotting with Python's matplotlib library. Other examples include date and time conversion, a Unicode search tool and a directory size visualization.

I use these examples to explain a subset of Perl 6, with many pointers to more documentation where relevant. Perl 6 topics include the basic lexicographic structure, testing, input and output, multi dispatch, object orientation, regexes and grammars, usage of modules, functional programming and interaction with Python libraries through Inline::Python.

Let me finish with Larry Wall's description of this book, quoted from his foreword:

It's not just a reference, since you can always find such materials online. Nor is it just a cookbook. I like to think of it as an extended invitation, from a well-liked and well-informed member of our circle, to people like you who might want to join in on the fun. Because joy is what's fundamental to Perl. The essence of Perl is an invitation to love, and to be loved by, the Perl community. It's an invitation to be a participant of the gift economy, on both the receiving and the giving end.

Perlgeek.de: The Loss of Name and Orientation

Published by Moritz Lenz on 2017-07-10T22:00:01

The Perl 6 naming debate has started again. And I guess with good reason. Teaching people that Perl 6 is a Perl, but not the Perl requires too much effort. Two years ago, I didn't believe. Now you're reading a tired man's words.

I'm glad that this time, we're not discussing giving up the "Perl" brand, which still has very positive connotations in my mind, and in many other minds as well.

And yet, I can't bring myself to like "Rakudo Perl 6" as a name. There are two very shallow reasons for that: one, going from two syllables, "Perl six", to five of them seems a step in the wrong direction. And two, I remember the days when the name was pretty young, and people would misspell it all the time. That seems to have abated, though I don't know why.

But there's also a deeper reason, probably a sentimental old man's reason. I remember the days when Pugs was actively developed, and formed the center of a vibrant community. When kp6 and SMOP and all those weird projects were around. And then, just when it looked like only a single compiler was left, Stefan O'Rear conjured up niecza, almost single-handedly, and out of thin air. Within months, it was a viable Perl 6 compiler that people on #perl6 readily recommended.

All of this was born out of the vision that Perl 6 was a language with no single, preferred compiler. Changing the language name to include the compiler name means abandoning this vision. How can we claim to welcome alternative implementations when the commitment to one compiler is right in the language name?

However I can't weigh this loss of vision against a potential gain in popularity. I can't decide if it's my long-term commitment to the name "Perl 6" that makes me resent the new name, or valid objections. The lack of vision mirrors my own state of mind pretty well.

I don't know where this leaves us. I guess I must apologize for wasting your time by publishing this incoherent mess.

Zoffix Znet: The Hot New Language Named Rakudo

Published on 2017-07-07T00:00:00

A rose by any other name...

samcv: Grant Status Update 2

Published on 2017-07-04T07:00:00

This is my second grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

I got to working on collation this month. I'm going to explain a bit of how the Unicode Collation Algorithm works.

Unicode Collation Algorithm

The collation data for UCA is made up of arrays like so:

[primary.secondary.tertiary]

Each one is an integer. The primary differs between different letters, 'a' vs 'z'; the secondary covers differences such as diacritics, 'a' vs 'á'; and the tertiary is case, 'a' vs 'A'. While it works differently for non-Latin characters, that is the gist of what they represent. In most cases you have one codepoint mapped to one collation array, though in many cases this is not true.

Single codepoints can map to multiple collation array elements. Sequences of codepoints can also map to one or more than one collation array elements.

Some sequences also can exist inside others.

So the string xyz may have one set of collation elements but xy has another, where x, y and z are codepoints in a three-codepoint sequence with its own set of multiple collation keys.

So, how do these collation elements translate into sorting the codepoints?

[.0706.0020.0002], [.06D9.0020.0002]

You take the two primary values, then append a 0 as a separator. Then push the secondaries, append another 0 as a separator, and then push on the tertiaries:

0706, 06D9, 0, 0020, 0020, 0, 0002, 0002

Now this would pose a problem, since we would need to traverse the entire string before making any decisions. Instead, what I have decided to do is to keep the [primary.secondary.tertiary] arrays and push them onto a stack rather than flattening them into a linear progression, iterate through the primaries, and then grab more collation elements as they are needed to resolve ties.

Also, when collation data for the next codepoint is added to the stack, if it is the starter of a sequence we will also pull in the next codepoint, walking through a linked list stored in C arrays as needed. If the next codepoint ends up not being part of a sequence, we just push the codepoint we "peeked" at onto the stack as well, so we don't have to go back over codepoints.

Now this improved Unicode Collation Algorithm is not complete, but I am continuing to work on integrating the new C data structure I've created into MoarVM, and it currently works partially, but not as well as the current implementation.

Improvements to the Current UCA Implementation

In the meantime I have made improvements to the current implementation of the Unicode Collation Algorithm. Previously it was possible to enable or disable the primary, secondary or tertiary levels. This allowed you to do things such as ignore diacritics or ignore casing when sorting. What you are now able to do is reverse the sorting of the different levels. This allows you, for example, to sort uppercase letters before lowercase ones (default UCA sorts lowercase before uppercase, since lowercase < uppercase). It can also let you put diacritic-bearing characters before the ordinary letters. Any of the three levels can be either enabled, disabled, or reversed. For anybody already using it, supplying True or False to set $*COLLATION still works the same as before, but you are now able to supply 1, 0, or -1 to enable, disable or reverse the collation for specific levels.
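
For example (a sketch using the experimental collation interface as I understand it; the exact pragma and method spellings may change):

use experimental :collation;
$*COLLATION.set(:tertiary(-1));   # reverse the case level
say <a A b B>.collate;            # uppercase now sorts before lowercase: (A a B b)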

Fixes

Grapheme Cluster Break

As I said last week, I made improvements to the script that tests our breakup of graphemes. Now we have full support for the Prepend property that was added in Unicode 9.0, as well as passing all the tests for regional indicators. The only tests we now don't pass in GraphemeClusterBreakTest.t are a few emoji tests, and I believe we fail only 3 or so of those! The Prepend fixes required saving more state while parsing, as Prepend differs from all other Unicode grapheme break logic in that it comes before, not after, a base character.

Ignorecase+Ignoremark Regex

The longstanding bug I mentioned in my previous status report has now been fixed. The bug was in regex when both ignorecase and ignoremark adverbs were used.

say "All hell is breaking loose" ~~ m:i:m/"All is fine, I am sure of it"/
# OUTPUT«「All hell is breaking loose」␤»
# (output before the fix; this should not have matched)

This bug occurred when the search ran to the end of the haystack and all of the haystack's remaining graphemes matched the needle.

If the needle extended past the end of the haystack, the search would erroneously report a match, since it only checked that the needle matched up to the end of the haystack. This would cause 'fgh' to be found in 'abcdefg'. The bug only occurred at the very end of the haystack.

The internal string_equal_at_ignore_case_INTERNAL_loop now returns -1 if there was no match, and 0 or more if there was a match at that index.

This return value carries new information: 0 means a plain match, while a positive integer indicates by how much the haystack expanded when casefolding it.

As explained by my previous post, information about when characters expand when foldcased must be retained.

This information is planned to be exposed in some way at a future date. For example, if we are searching for 'st' inside the string 'stabc', nqp::indexic (index ignorecase) will indicate that it is located at index 0, but in Rakudo the match will come out as 'sta' when it should instead have been 'st'.

For now this additional information is only internal and the return values of the nqp::indexic_s and nqp::equatic_s ops have not changed.

NQP Codepath Problems…

Previously there were way too many different codepaths handling the different variations of adverbs: none, ignorecase, ignoremark, and ignorecase+ignoremark. Problematically, each combination had its own codepath. To really solve this bug and improve the code quality, I decided to clean it up and correct this.

In my past work I had already added a nqp::indexic op, so it was time to add another! I added a nqp::indexicim op and a nqp::eqaticim op, and was able to reuse most of the code. This didn't increase our code burden much on the MoarVM side, and it greatly reduces the possibility of bugs getting in through the varying combinations of ignorecase/ignoremark ops.

This was a very longstanding Unicode bug (I don't think both adverbs together ever worked), so it's great that it is now fixed :).

Coming Up

I will continue to fix the issues in the new Unicode Collation Algorithm implementation as described earlier in this post. I also plan on taking stock of all the current Grapheme Cluster Break issues, which now exist only for certain emoji (the vast majority of emoji work properly).

I will also be preparing my talks for the Amsterdam Perl conference as well!

Sidenote

I released a new module, Font::QueryInfo which allows you to query font information using FreeType. It can even return the codepoints a font supports as a list of ranges!

6guts: Optimizing reading lines from a file

Published by jnthnwrthngtn on 2017-07-02T15:57:43

Reading lines from a file and processing them one at a time is a hugely common scripting task. However, to date our performance at this task has been somewhat underwhelming. Happily, a grateful Perl 6 fan stepped up in response to my recent call for funding, offering 25 hours of funding to work on whatever I felt was most pressing, but with a suggestion that perhaps I could look at some aspect of I/O performance. Having recently been working on refactoring I/O anyway, this was very timely. So, I crafted a benchmark and dug in.

The benchmark and a baseline

Perl 5 is considered to have very good I/O performance, so I figured I’d use that as a rough measure of how close Perl 6 was to performing well at this task. A completely equivalent benchmark isn’t quite possible, but I tried to pick something representative of what the average programmer would write. The task for the benchmark was to take a file with one million lines, each having 60 characters, loop over them, and add up the number of characters on each line. That number would then be printed out at the end (it’s important that benchmarks calculating results return or consume the result in some way, as a sufficiently smart optimizer may otherwise manage to eliminate the work we think we’re measuring). The rules were that:

The Perl 5 benchmark for this came out as follows:

perl -e 'open my $fh, "<:encoding(UTF-8)", "longfile";
         my $chars = 0;
         while ($_ = <$fh>) { chomp; $chars = $chars + length($_) };
         close $fh;
         print "$chars\n"'

With the Perl 6 one looking like this:

perl6 -e 'my $fh = open "longfile";
          my $chars = 0;
          for $fh.lines { $chars = $chars + .chars };
          $fh.close;
          say $chars'

I’ll note right off that in Perl 6 there are likely ways, today, to do a bit better. For example, the $chars variable could be given a native int type, and it’s entirely possible that a while loop might come out faster than the for loop. Neither of those is representative of what a typical programmer, looking at the documentation and diving in to implement things, would do, however. I suspect that Perl 5 experts could similarly point out some trick I’ve missed, but I’m trying to benchmark typical use.
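
For the curious, such a tweaked variant might look like this (an untested sketch; whether it actually wins would need measuring):

perl6 -e 'my $fh = open "longfile";
          my int $chars = 0;
          while (my $line = $fh.get).defined { $chars = $chars + $line.chars };
          $fh.close;
          say $chars'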

One slight unfairness is that the Perl 6 solution will actually count the number of grapheme clusters, since strings are at grapheme level. This entails some extra processing work, even in the case that there are no multi-codepoint clusters in the input file (as there were not in this case). But again, the average user making comparisons won’t much care for such technicalities.

All measurements were made on modern hardware with an Intel Xeon 6-core CPU and a fast SSD, and on Linux.

At the point I started work, the Perl 6 solution clocked in at 2.87s, to just 1.13s for Perl 5. This made Perl 6 a factor of 2.5 times slower.

First hints from the profile

The whole I/O stack recently got a good overhaul, and this was the first time I’d looked at a profile since that work was completed. Looking at the output from --profile immediately showed up some rather disappointing numbers. Of all callframes, 57.13% were JIT-compiled. Worse, basically nothing was being inlined.

At this point, it’s worth recalling that Perl 6 is implemented in Perl 6, and that there’s quite a bit going on between the code in the benchmark and ending up in either things implemented in C or a system call. The call to lines returns an Iterator object. Reading a line means calling the pull-one method on that Iterator. That in turn calls the consume-line-chars method on a $!decoder object, and that method is what actually calls down to the VM-backed decoder to read a line (so there’s a level of indirection here to support user provided decoders). The return value of that method then has defined called on it to check we actually got a line back. If yes, then it can be returned. If not, then read-internal should be called in order to fetch data from the file handle (given buffering, this happens relatively rarely). Running the loop body is a further invocation, passing the read line as a parameter. Getting the chars count is also a method call (which, again, actually calls down to the VM guts to access the string’s grapheme count).

That’s quite a lot of method calling. While the VM provides I/O, decoding, and finding input up to a separator, the coordination of that whole process is implemented in Perl 6, and involves a bunch of method calls. Seen that way, it’s perhaps not surprising that Perl 6 would come in slower.

There are, however, things that we can do to make it fast anyway. One of them is JIT compilation, where instead of having to interpret the bytecode that Perl 6 is compiled in to, we further translate it into machine code that runs on the CPU. That cuts out the interpreter overhead. Only doing that for 57% of the methods or blocks we’re in is a missed opportunity.

The other really important optimization is inlining. This is where small methods or subroutines are taken and copied into their callers by the optimizer. This isn’t something we can do by static analysis; the whole point of method calls is polymorphism. It is something a VM doing dynamic analysis and type specialization can do, however. And the savings can be rather significant, since it cuts out the work of creating and tearing down call frames, as well as opening the door to further optimization.

The horrors in the logs

There are a couple of useful logs that can be written by MoarVM in order to get an idea of how it is optimizing, or failing to optimize, code. The JIT log’s main point of interest for the purpose of optimization is that it can indicate why code is not being JIT-compiled – most commonly because it contains something the JIT doesn’t know about. The first thing in this case was the call into the VM-backed decoder to extract a line, which was happily easily handled. Oddly, however, we still didn’t seem to be running the JIT-compiled version of the code. Further investigation uncovered an unfortunate mistake. When a specialized version of a method calls a specialized version of another method, we don’t need to repeat the type checks guarding the second method. This was done correctly. However, the code path that was taken in this case failed to check if there was a JIT-compiled version of the target rather than just a specialized bytecode version, and always ran the latter. I fixed that, and went from 57.13% of frames JIT-compiled to 99.86%. Far better.

My next point of investigation is why the tiny method to grab a line from the decoder was not being inlined. When I took a look at the post-optimization code for it, it turned out to be no surprise at all: while the logic of the method was very few instructions, it was bulked out by type checking of the incoming arguments and return values. The consume-line-chars method looks like this:

method consume-line-chars(Bool:D :$chomp = False, Bool:D :$eof = False --> Str) {
    my str $line = nqp::decodertakeline(self, $chomp, $eof);
    nqp::isnull_s($line) ?? Str !! $line
}

Specializations are always tied to a callsite object, from which we can know whether we’re being passed a parameter or not. Therefore, we should be able to optimize out those checks and, in the case the parameter is being passed, throw out the code setting the return value. Further, the *%_ that all methods get automatically should have been optimized out, but was not being.

The latter problem was fixed largely by moving code, although tests showed a regression that needed a little more care to handle – namely, that a sufficiently complex default value might do something that causes a deoptimization, and we need to make sure we can fall back into the interpreter and have things work correctly in that case.

While these changes weren’t enough to get consume-line-chars inlined, they did allow an inlining elsewhere, taking the inline ratio up to 28.49% of call frames.

This initial round of changes took the Perl 6 benchmark from 2.87s to 2.77s, so about 3.5% off. Not much, but something.

Continuing to improve code quality

The code we were producing even pre-optimization was disappointing in a few ways. Firstly, even though a simple method like consume-line-chars, or chars, would never possibly do a return, we were still spitting out a return exception handler. A little investigation revealed that we were only doing analysis and elimination of this for subs but not methods. Adding that analysis for methods too took the time down to 2.58s. Knocking 7% off with such a small change was nice.

Another code generation problem lay in consume-line-chars. Access to a native lexical can be compiled in two ways: either just by reading the value (fine if it’s only used as an r-value) or by taking a reference to it (which is correct if it will be used as an l-value). Taking a reference is decidedly costly compared to just reading the value. However, it’s always going to have the correct behavior, so it’s the default. We optimize it away whenever we can (in fact, all the most common l-value usages of it never need a reference either).

Looking at consume-line-chars again:

method consume-line-chars(Bool:D :$chomp = False, Bool:D :$eof = False --> Str) {
    my str $line = nqp::decodertakeline(self, $chomp, $eof);
    nqp::isnull_s($line) ?? Str !! $line
}

We can see that the read of $line here is an r-value, since consume-line-chars is not marked is rw. Unfortunately, it was compiled as an l-value because the conditional compilation lost that context information. So, I addressed that and taught Rakudo to pass along the return value’s r-value context.

A native reference means an allocation, and this change cut the number of GC runs enormously, from 182 of them down to 41. That sounds like it should make a sensational difference. In fact, it got things down to 2.45s, a drop of just 5%. Takeaway lesson: allocating less stuff is good, but MoarVM’s GC is also pretty good at throwing away short-lived things.

Meanwhile, back in the specializer…

With the worst issues of the code being fed into MoarVM addressed, it was back to seeing why the specializer wasn’t doing a better job of stripping out type checks. First of all, it turned out that optional named arguments were not properly marking the default code dead when the argument was actually passed.

Unfortunately, that wasn’t enough to get the type tests stripped out for the named parameters to consume-line-chars. In fact, this turned out to be an issue for all optional parameters. When doing type analysis, and there are two branches, the type information has to be merged at join points in the control flow graph. So it might see something like this in the case that the argument was not passed:

    Bool (default path) \   / Unknown (from passed path)
                         \ /
                   Result: Unknown

Or maybe this in the case that it was passed:

    Bool (default path) \   / Scalar holding Bool (from passed path)
                         \ /
                   Result: Unknown

In both cases, the types disagree, so they merge to unknown. This is silly, as we’ve already thrown out one of the two branches, so in fact there’s really no merge to do at all! To fix this up, I marked variables (in single static assignment form) that died as a result of a basic block being removed. To make the dead basic blocks from argument analysis actually be removed, we needed to do the dead code removal earlier as well as doing it at the very end of the optimization process. With that marking in place, it was then possible to ignore now-dead code’s contribution to a merge, which meant a whole load of type checks could now be eliminated. Well, in fact, only in the case where the optional was passed; a further patch to mark the writers of individual instructions dead for the purpose of merges was needed to handle the case where it was not.

That left the return type being checked on the way out, which also seemed a bit of a waste as we could clearly see it was a Str. After a tweak to Rakudo to better convey type information in one of its VM extension ops, that check was optimized out too.

And for all of this effort, the time went from…2.45s to 2.41s, just 2% off. While it’s cheaper to not type check things, it’s only so costly in the first place.

A further win was that, with the code for consume-line-chars now being so tiny, it should have been an inlining candidate. Alas, it was not, because the optional arguments were still having tracking information recorded just in case we needed to deoptimize. This seemed odd. It turned out that my earlier fix for this was too simplistic: it would leave the tracking in if the method would ever deoptimize, not just if it would do so while handling arguments. I tightened that up and the time dropped to 2.37s, another 2% win. Again, very much worth it, but it shows that invocation – while not super cheap – is also only so costly.

With consume-line-chars inlining now conquered, another area of the code we were producing caught my eye: boolification was, in some cases, managing to box an int into an Int only to then immediately unbox it and turn it into a Bool. Clearly this was quite a waste! It turned out that an earlier optimization to avoid taking native references had unexpected consequences. But even nicer was that my earlier work to pass down r-value context meant I could delete some analysis and just use that mechanism instead. That was worth 4%, bringing us to 2.28s.

Taking stock

None of these optimizations so far were specific to I/O or touched the I/O code itself. Instead, they are general optimization and code quality improvements that will benefit most Perl 6 programs. Together, they had taken the lines benchmark from 2.87s to 2.28s. Each may have been just some percent, but together they had knocked 20% off.

By this point, the code quality – especially after optimization – was far more pleasing. It was time to look for some other sources of improvement.

Beware associativity

Perhaps one of the easiest wins came from spotting that the pull-one method of the lines iterator seemed to be doing two calls to the defined method. See if you can spot them:

method pull-one() {
    $!decoder.consume-line-chars(:$!chomp) // $!handle.get // IterationEnd
}

The // operator calls .defined to test for definedness. But why two calls in the common case? Because of associativity! The // operator is left-associative, so a // b // c parses as (a // b) // c: the result of the inner // is itself tested for definedness again by the outer one. Two added parentheses:

method pull-one() {
    $!decoder.consume-line-chars(:$!chomp) // ($!handle.get // IterationEnd)
}

Were worth a whopping 8%. At 2.09s, the 2 second mark was in sight.

Good idea, but…

My next idea for an improvement was a long-planned change to the way that simple for loops are compiled. With for being defined in terms of map, this is also how it had been implemented. However, for simple cases, we can just compile:

for some-iterable { blah }

Not into:

some-iterable.map({ blah }).sink-all;

But instead into something more like:

my \i = some-iterable.iterator;
while (my \v = i.pull-one) !=== IterationEnd {
    blah
}

Why is this an advantage? Because – at least in theory – the pull-one and loop body should now become possible to inline. This is not the case if we call map, since that is used with dozens of different closures and iterator types. Unfortunately, however, due to limitations in MoarVM’s specializer, it was not actually possible to achieve this inlining even after the change. In short, because we don’t handle inlining of closure-y things, and because the way the on-stack replacement works leaves the optimizer devoid of the type information it needs to do better with pull-one. Both of these are now being investigated, but were too big to take on as part of this work.

Even without those larger wins being possible (when they are, we’ll achieve a tasty near-100% inlining rate in this benchmark), it brought the time down to the 2.00s mark. Here’s the patch.

Optimizing line separation and decoding

Profiling at the C level (using callgrind) showed up some notable hot spots in the string handling code inside of MoarVM, which seemed to offer the chance to get further wins. At this point, I also started taking measurements of CPU instructions using callgrind too, which makes it easier to see the effects of changes that may come out as noise on a simple time measurement (even with taking a number of them and averaging).

Finding the separator happens in a couple of steps. First, individual encodings are set up to decode to the point that they see the final character of any of the line separators (noting these are configurable, and multi-char separators are allowed). Then, a second check is done to check if the multi-char separator was found. This is complicated by needing to handle the case where a separator was not found, and another read needs to be done from a file handle.

It turns out that this second pass was re-scanning the entire buffer of chars, rather than just looking close to the end of it. After checking there should not be a case where just jumping to look at the end would ever be a problem, I did the optimization and got a reduction from 18,245,144,315 instructions to 16,226,602,756, or 11%.

A further minor hot-spot was re-resolving the CRLF grapheme each time it was needed. It turned out caching that value saved around 200 million instructions. Caching the maximum separator length saved another 78 million instructions. The wallclock time now stood at 1.79s.

The identification of separators when decoding chars seemed the next place to find some savings. CPUs don’t much like having to do loops and dereferences on hot paths. To do better, I made a compact array of the final separator graphemes that could be quickly scanned through, and also introduced a maximum separator codepoint filter, which – given that the common case is control characters – works out really quite well. These were worth 420 million and 845 million instructions respectively.
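
The gist of that filter, sketched in Perl 6 terms (the real code is C inside MoarVM; names and sample data here are illustrative):

# Default line separators are "\n" and "\r\n"; both end in "\n" (0x0A),
# so the compact array of final separator codepoints is tiny.
my @final-seps = 0x0A;
my $max-sep    = @final-seps.max;      # maximum separator codepoint filter
my @hits;
for "foo\nbar\r\nbaz".ords.kv -> $i, $cp {
    next if $cp > $max-sep;            # cheap rejection for nearly every codepoint
    @hits.push($i) if $cp == @final-seps.any;
}
say @hits;   # [3 8] – candidate positions; a second check handles the multi-char "\r\n"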

Next, I turned to the UTF-8 decoding and NFG process. A modest 56 million instruction win came from tweaking this logic given we can never be looking for a separator and have a target number of characters to decode. But a vast win came from adding a normalization fast path for the common case where we don’t have any normalization work to do. In the case we do encounter such work, we simply fall into the slow path. One nice property of the way I implemented this is that, when reading line by line, one line may cause a drop into the slow path, but the next line will start back in the fast path. This change was worth a whopping 3,200 million decrease in the instruction count. Wallclock time now stood at 1.37s.

Better memory re-use

Another look at the profile now showed malloc/free as significant costs. Could anything be done to reduce the number of those we did?

Yes, it turned out. Firstly, keeping around a decoding result data structure, instead of freeing and allocating it on every single line, saved a handy 450 million instructions. It turned out that we were also copying the decoded chars into a new buffer when taking a line, but in the common case that buffer would contain precisely the chars that make up the line. Therefore, this buffer could simply be stolen to use as the memory for the string. Another 400 million instructions’ worth dropped away with one less call to malloc/free per line.

Micro-optimizations

A few further micro-optimizations in the new UTF-8 decoding fast path were possible. By lifting some tests out of the main loop, reading a value into a local because the compiler couldn’t figure out it was invariant, and moving some position updates so they only happen on loop exit, a further 470 million instructions were removed. If you’re thinking that sounds like a lot, remember this is a loop that runs for every single codepoint we decode. A million-line file with 60 chars per line plus a separator is 61 million iterations. These changes between them only save 7 cycles per codepoint; that just turns out to be a lot when multiplied by the number of codepoints!

The final result

With these improvements, the Perl 6 version of the benchmark now ran in 1.25s, which is just 44% of the time it used to take. The Perl 5 version still wins, but by a factor of 1.1 times, not 2.5 times. While some of the changes performed during this work were specific to the benchmark in question, many were much more general. For example, the separator finding improvements will help this benchmark in all encodings, and the code generation and specializer improvements will have far more cross-cutting effects.

Actually, not so final…

There’s still a decent amount of room for improvement yet. Once MoarVM’s specializer can perform the two inlinings it is not currently able to, we can expect a further improvement. That work is coming up soon. And beyond that, there will be more ways to shave off some instructions here and there. Another less pleasing result is that when Perl 5 is not asked to do UTF-8 decoding, it makes a huge saving; ask Perl 6 for ASCII or Latin-1, however, and it’s just a small saving. This would be a good target for some future optimization work. In the meantime, these are a nice bunch of speedups to have.


Zoffix Znet: Perl 6: Seqs, Drugs, And Rock'n'Roll (Part 2)

Published on 2017-06-27T00:00:00

How to make your very own Iterator object

Perlgeek.de: Living on the (b)leading edge

Published by Moritz Lenz on 2017-06-24T22:00:01

Perl 6 is innovative in many ways, and sometimes we don't fully appreciate all the implications, for good or for bad.

There's one I stumbled upon recently: The use of fancy Unicode symbols for built-in stuff. In this case: the `.gist` output of Match objects. For example

my token word { \w+ }
say 'abc=def' ~~ /<word> '=' <word>/;
produces this output:
「abc=def」
 word => 「abc」
 word => 「def」

And that's where the problems start. In my current quest to write a book on Perl 6 regexes, I noticed that the PDF that LeanPub generates from my Markdown sources doesn't correctly display those pesky 「」 characters, which are

$ uni -c 「」
「 - U+0FF62 - HALFWIDTH LEFT CORNER BRACKET
」 - U+0FF63 - HALFWIDTH RIGHT CORNER BRACKET

When I copied the text from the PDF and pasted it into my editor, the characters showed up correctly, which indicates that they are likely missing from the monospace font.

The toolchain allows control over the font used for displaying code, so I tried all the monospace fonts that were available. I tried them in alphabetical order. Among the earlier fonts I tried was Deja Vu Sans Mono, which I use in my terminal, and which hasn't let me down yet. No dice. I arrived at Noto, a font designed to cover all Unicode codepoints. And it didn't work either. So it turns out these two characters are part of some Noto Sans variants, but not of the monospace font.

My terminal, and even some font viewers, use some kind of fallback where they use glyphs from other fonts to render missing characters. The book generation toolchain does not.

The Google Group for Leanpub was somewhat helpful: if I could recommend an open source monospace font that fit my needs, they'd likely include it in their toolchain.

So I searched and searched, learning more about fonts than I wanted to know. My circle of geek friends came up with several suggestions, one of them being Iosevka, which actually contains those characters. So now I wait for others to step up, either for LeanPub to include that font, or for the Noto maintainers to create a monospace variant of those characters (and then LeanPub updating their version of the font).

And all of that because Perl 6 was being innovative, and used two otherwise little-used characters as delimiters, in an attempt to avoid collisions between delimiters and content.

(In the meantime I've replaced the two offending characters with ones that look similar. It means the example output is technically incorrect, but at least it's readable.)

Zoffix Znet: Perl 6: Seqs, Drugs, And Rock'n'Roll

Published on 2017-06-20T00:00:00

Seq type and its caching mechanism

6guts: Sorting out synchronous I/O

Published by jnthnwrthngtn on 2017-06-07T23:00:26

A few weeks back, I put out a call for funding. So far, two generous individuals have stepped up to help, enabling me to spend much more time on Perl 6 than would otherwise have been possible.

First in line was Nick Logan (ugexe), who is funding 60 hours to get a longstanding I/O bug resolved. The problem, in short, has been that a synchronous socket (that is, an IO::Socket::INET instance) accepted or connected on one thread could not be read from or written to by another thread. The same has been true of synchronous file handles and processes.

The typical answer for the socket case has been to use IO::Socket::Async instead. While that’s the best answer from a scalability perspective, many people new to Perl 6 will first reach for the more familiar synchronous socket API and, having heard about Perl 6’s concurrency support, will pass those off to a thread, probably using a start block. Having that fail to work is a bad early impression.
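
A minimal sketch of the kind of code that used to fail (host and port are illustrative):

my $server = IO::Socket::INET.new(:listen, :localhost<127.0.0.1>, :localport(3333));
my $conn   = $server.accept;       # accepted on the main thread
await start {
    say $conn.get;                 # before the fix, reading on another thread could fail
    $conn.close;
}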

For processes, the situation has been similar; a Proc was bound to the thread it was created on, and the solution was to use Proc::Async. For situations where dealing with more than one of the input or output streams is desired, I’d argue that it’s actually easier to deal with it correctly using Proc::Async. However, Proc has also ended up with a few features that Proc::Async has not so far offered.

Finally, there are “file handles” – that is, instances of IO::Handle. Here, we typically would get away with passing handles of ordinary files around between threads, and due to an implementation detail could get away with it. The situation of an IO::Handle backed by a TTY or pipe was much less pleasing, however, which was especially unfortunate because it afflicted all of $*IN, $*OUT, and $*ERR. The upshot was that you could only read from $*IN from the main thread. While writing to $*OUT and $*ERR worked, it was skating on thin ice (and occasionally falling through it).

How did we get into this situation?

To understand the issue, we need to take a look at the history of MoarVM I/O. In the first year of its life, MoarVM was designed and built on a pretty extreme budget – that is to say, there wasn’t one. Building platform abstractions was certainly not a good use of limited time, so a library was brought in to handle this. Initially, MoarVM used the Apache Portable Runtime, which served us well.

As the concurrent and parallel language features of Perl 6 came into focus, together with asynchronous I/O, it became clear that libuv – the library that provides I/O and platform abstractions for Node.js – was a good option for supporting this. Since the APR and libuv had substantially overlapping feature sets, we decided to move over to using libuv. In the months that followed, a bunch of the asynchronous features found in Perl 6 today quickly fell in to place: Proc::Async, IO::Socket::Async, IO::Notification, asynchronous timers, and signal handlers. All seemed to be going well.

Alas, we’d made a problem for ourselves. libuv is centered around event loops. Working with an event loop is conceptually simple: we provide callbacks to be run when certain things happen (like data arriving over a socket, or a timer elapsing), and enter the event loop. The loop waits for some kind of event to happen, and then invokes the correct callback. Those callbacks may lead to other callbacks being set up (for example, the callback reacting to an incoming socket connection can set up a callback to be called whenever data arrives over that socket). This is a fine low-level programming model – and a good bit nicer than dealing with things like poll sets. However, a libuv event loop runs on a single thread. And handles are tied to the libuv event loop that they were created on.

For IO::Socket::Async and Proc::Async, this is not a problem. MoarVM internally runs a single event loop for all asynchronous I/O, timers, signals, and so forth. Whenever something happens, it pushes a callback into the queue of a scheduler (most typically that provided by ThreadPoolScheduler), where the worker threads lie in wait to handle the work. Since this event loop serves as a pure dispatcher, not running any user code or even such things as Unicode decoding, it’s not really limiting to have it only on a single thread.

When the synchronous I/O was ported from the APR, none of this was in place, however. Therefore, each thread got its own libuv event loop for handling the synchronous I/O. At the time, of course, there wasn’t really much in the way of threading support available in Rakudo on MoarVM, so the fact that an event loop was tied to a thread was not problematic. It only came to be an issue as the concurrency support in Perl 6 became mature enough that people started using it…and then running into the limitation.

Seizing an opportunity

Having to deal with this was a chance to spend some quality time improving the lower level bits of I/O in the MoarVM/NQP/Rakudo stack, which have had very little love over the last couple of years. Recently, Zoffix did a bunch of great work on the higher level parts of I/O. Most helpfully for my endeavor, he also greatly improved the test coverage of I/O, meaning I could refactor the lower level bits with increased confidence.

A while back, I took the step of fully decoupling MoarVM’s asynchronous I/O from character encoding/decoding. For some months now, MoarVM has only provided support for asynchronous byte-level I/O. It also introduced a streaming decoder, which can turn bytes into characters for a bunch of common encodings (ASCII, UTF-8, and friends). This means that while the decoding hot paths are provided by the VM, the coordination is moved up to the Perl 6 level.

With synchronous I/O, the two were still coupled, with the runtime directly offering both character level and byte level I/O. While this is in some ways convenient, it is also limiting. It blocks us from supporting user-provided encodings – at least, in any way that would let them just be plugged in and used with a normal IO::Handle. That aside, there are also situations where one might like to access the services provided by a streaming decoder when not dealing with an I/O handle. (A streaming decoder is one you can feed bytes to incrementally and pull characters out of, trusting that it will do the right thing with regard to multi-byte and multi-codepoint sequences.)
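
To make the idea concrete, here is a toy illustration of that contract in pure Perl 6 (not the real VM-backed decoder, and with no error handling):

class ToyUTF8Decoder {
    has buf8 $!pending = buf8.new;

    method add-bytes(Blob $bytes) {
        $!pending.append($bytes.list);
    }

    # Return all complete characters decoded so far, holding back any
    # trailing bytes that are only part of a multi-byte sequence.
    method consume-available-chars(--> Str) {
        my int $end = $!pending.elems;
        my int $cut = $end;
        my int $i   = $end - 1;
        $i-- while $i >= 0 && ($!pending[$i] +& 0xC0) == 0x80;   # skip continuation bytes
        if $i >= 0 && $!pending[$i] >= 0xC0 {
            my $need = $!pending[$i] >= 0xF0 ?? 4 !! $!pending[$i] >= 0xE0 ?? 3 !! 2;
            $cut = $i if $end - $i < $need;                      # incomplete: hold it back
        }
        my $ready = $!pending.subbuf(0, $cut).decode('utf-8');
        $!pending = buf8.new($!pending.subbuf($cut).list);
        $ready;
    }
}

my $dec   = ToyUTF8Decoder.new;
my $bytes = 'aä'.encode('utf-8');       # 'ä' is two bytes: 0xC3 0xA4
$dec.add-bytes($bytes.subbuf(0, 2));    # 'a' plus half of 'ä'
say $dec.consume-available-chars;       # a   (the half sequence is held back)
$dec.add-bytes($bytes.subbuf(2));
say $dec.consume-available-chars;       # ä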

Whatever changes were going to have to happen to fix the thread limitations of synchronous I/O, it was quite clear that only having to deal with binary I/O there would make it easier. Therefore, a plan was hatched:

  1. Re-work the synchronous I/O handles to use only binary I/O, and coordinate with the VM-backed decoder to handle char I/O.
  2. Rip out the synchronous char I/O.
  3. Re-implement the remaining synchronous binary I/O, so as not to be vulnerable to the threading limitations.

Making it possible to support user-defined encodings would be a future step, but the work done in these refactors would make it a small step – indeed, one that will only require work at a Perl 6 level, not any further down the stack. While it may well end up being me that does this anyway, it’s at least now in reach for a bunch more members of the Perl 6 development team.

Sockets first

I decided to start out with sockets. From a technical perspective, this made sense because they were the most isolated; break IO::Handle and suddenly such things as say are busted, and both it and Proc are used in the pre-compilation management code too. Conveniently, sockets were also Nick’s primary interest, so it was nice to get the most important goal of the work delivered first.

The streaming decode API, while fully implemented on MoarVM, had only been partially implemented on the JVM. Therefore, to complete the work on sockets without busting them on the JVM, I had to implement the missing pieces of the VM-backed streaming decode API. This meant dealing with NIO (“New IO”), the less about which is said the better. I’m pretty sure the buffer API wasn’t designed to trip me up at every turn, but it seems to reliably manage to do so. Since file handles on the JVM would soon also come to depend on this code, it was nice to be able to get it at least straight enough to be passing all of the sockets tests, plus another set of lower-level tests that live in the NQP repository.

Refactoring the socket code itself gave a good opportunity for cleanups. The IO::Socket::INET class does the IO::Socket role, the idea being that at some point other implementations that provide things like domain sockets will also do that role, and just add the domain socket specific parts. In reviewing what methods were where, it became clear that some things that really belonged in the IO::Socket role were not, so I moved them there as part of the work. I also managed to eliminate all of the JVM-specific workarounds in the socket code along the way, which was also a happy outcome.

With that refactored, I could rip out the character I/O support from sockets in MoarVM. This left me with a relatively small body of code doing binary socket I/O using libuv, implementing synchronous socket I/O atop of its asynchronous event loop. When it comes to sockets, there are two APIs to worry about: the Berkeley/BSD/POSIX one, and Winsock. Happily, in my case, there was a lot of overlap and just a handful of places that I had to deal with divergence. The extra lines spent coping with the difference were more than paid back by not faking synchronous I/O in terms of an asynchronous event loop.

File handles next

Buoyed by this success, it was time to dig into file handles. The internals of IO::Handle were poked into from outside of the class: a little in IO::Path and more so in Proc, which was actually setting up IO::Pipe, a subclass of IO::Handle. Thankfully, with a little refactoring, the encapsulation breakage could be resolved. Then it was a case of refactoring to use the streaming decode API rather than the character I/O. This went relatively smoothly, though it also uncovered a previously hidden bug in the JVM implementation of the streaming decoder, which I got fixed up.

So, now to rip the character file I/O out of MoarVM and refactor the libuv away in synchronous file handles too? Alas, not so fast. NQP was also using these operations. Worse, it didn’t even have an IO handle object! Everything was done in terms of nqp::ops instead. So, I introduced an NQP IO handle class, gave it many of the methods that its Perl 6 big sister has, and refactored stuff to use it.

With that blocker out of the way, I could move on to sorting things out down in MoarVM. Once again, synchronous file I/O looks similar enough everywhere to not need all that much in the way of an abstraction layer. On Windows, it turned out to be important to put all of the handles into binary mode, however, since we do our own \n <-> \r\n mapping. (Also, yes, it is very much true that it’s only very similar on Windows if you use their POSIX API emulation. It’s possible there may be a performance win from not using that, but I can’t imagine it’ll be all that much.)

Not entirely standard

So, all done? Well, not quite. For the standard handles, we only used the synchronous file code path when the handle was actually a regular file. This is not the common case; it’s often a pipe or a TTY. These used a completely different code path in libuv, using its streams API instead of the files API.

Thankfully, there isn’t much reason to retain this vast implementation difference. With a little work on EOF handling, and re-instating the faking of tell, it was possible to use the same code I had written to handle regular files. This worked out very easily on Linux. On Windows, however, a read operation would blow up if reading from the console.

Amazingly, the error string was “out of space”, which was a real head-scratcher given it was coming from a read operation! It turned out that the error string was a tad misleading; the error code is #defined as ENOMEM, so a better error string would have been “out of memory”. That still made little sense. So I went digging into the native console APIs on Windows, and discovered that reads from the console are allocated out of a 64KB buffer, which is also used for various other things. 64KB should be enough for anybody, I guess. Capping read requests to a console on Windows to 16KB was enough to alleviate this.

Job done!

At least, for IO::Socket::INET and IO::Handle, which now are not tied to a thread on HEAD of MoarVM/NQP/Rakudo. The work on processes is ongoing, although due to be completed in the coming week. So, I’ll talk about that in my next post here.


Perlgeek.de: Perl 6 Books Landscape in June 2017

Published by Moritz Lenz on 2017-06-07T22:00:01

There are lots of news around Perl 6 books to share these days. If you follow the community very closely, you might be aware of most of it. If not, read on :-).

Think Perl 6 is now available for purchase, and also for download as a free ebook. Heck, it's even Open Source, with the LaTeX sources on GitHub!

Perl 6 at a Glance, previously only available in print form, is now available as an ebook. Save paper and shipping costs!

My own book, Perl 6 Fundamentals, is now in the "production" phase: copyediting, indexing, layout. And just before the manuscript submission deadline, Larry Wall has contributed a foreword. How awesome is that?

I've revamped perl6book.com to provide a short overview of the current and future Perl 6 books. As a small gimmick, it contains a flow chart explaining which book to choose. And I even got input from two other Perl 6 book authors (Laurent Rosenfeld of "Think Perl 6", and Andrew Shitov of "Perl 6 at a Glance" and "Migrating to Perl 6").

From a pull request to perl6book.com, it looks like Andrew Shitov is working on two more Perl 6 books. Keep 'em coming!

Last but not least, Gabor Szabo has started a crowdfunding campaign for a Perl 6 book on web app development. There are still a few days left, so you can help it succeed!

And as always, if you want to keep informed about Perl 6 books, you can sign up at perl6book.com for my Perl 6 books mailing list (low volume, typically less than one email per month).

samcv: Grant Status Update 1

Published on 2017-06-02T07:00:00

This is my first grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

I was not able to work quite as many hours as I would have liked this month, but I still made quite a lot of progress.

Improvement for Tests

Merged In

In Roast there is a new version of GraphemeBreakTest.t.

The script tests the contents of each grapheme individually from the GraphemeClusterBreak.txt file from the Unicode 9.0 test suite.

Previously we only checked the total ‘.chars’ count for the string as a whole. Obviously we want something more precise than that, since the test data specifies the location of each of the breaks between codepoints. The new code checks that codepoints are put in the correct graphemes in the proper order. In addition, we also check the string length.

This new test uses a grammar to parse the file and generally is much more robust than the previous script.

Running the parse class generates an array of arrays. The index of the outer array indicates the grapheme, while the inner arrays indicate which codepoints should be in that grapheme.

[[10084, 776], [9757]]

The array above would indicate that the 1st grapheme is made up of codepoints 10084 and 776, while the 2nd grapheme is made up of codepoint 9757. This allows us to easily test the contents of each grapheme.

The array shown above corresponds to the following line from the Unicode data file:

÷ 2764 × 0308 ÷ 261D ÷ (where ÷ means a break is allowed and × means no break)
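
A sketch of how such a structure can be checked against a string (hypothetical test code, not the actual roast script):

use Test;
my @expected = [10084, 776], [9757];
my $string   = @expected.map({ .map(*.chr).join }).join;
for $string.comb.kv -> $i, $grapheme {
    is-deeply $grapheme.ords.List, @expected[$i].List,
        "grapheme {$i + 1} contains the expected codepoints";
}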

Work in Progress

I have some currently unmerged tests which will have to wait to be merged, although sections of them are complete and are being incorporated into the larger Unicode Database Retrofit, which reuses this code.

I have written grammars and modules to process and provide data on the PropertyValueAliases and PropertyAliases. They will be used for testing that all of the canonical property names and all the property values themselves properly resolve to separate property codes, as well as that they are usable in regex.

Work on the Unicode Database Retrofit

As part of my grant work I am working on making Unicode property values distinct per property, and also on allowing all canonical Unicode property values to work. For a background on this see my previous post about Unicode Property Names. The WIP generated code can be seen in this gist here and was generated from UCD-gen.p6. The code resolves property name and property value command line arguments and matches them with property codes and property value codes. It is also case insensitive and ignores underscores as Unicode spec says is permissible. In addition it is also deduplicated, meaning we only store one hash per unique set of property values.

For example: Script and Script_Extensions both have the same values, so we don't store these more than once; likewise for the Boolean property values. The C program resolves the property string to a unique property code, and from there is able to look up the property value code. Note: aside from the property values which specify the lack of a property, these codes are internal and have no relation to the Unicode spec, for example Grapheme_Cluster_Break=Other is designated as property value 0.
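
A sketch of that loose matching in Perl 6 (the real resolver is generated C code; this just shows the normalization rule):

sub loose-name(Str $name --> Str) {
    $name.lc.subst('_', '', :g)       # case-insensitive, underscores ignored
}
say loose-name('Grapheme_Cluster_Break') eq loose-name('graphemeclusterbreak');  # True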

Docs

I've also started adding some documentation to my Unicode-Grant wiki with information about what is enclosed in each Unicode data file; there are a few other pages as well. This wiki is planned to be expanded to have many more sections than it does currently. https://github.com/samcv/Unicode-Grant/wiki/All-Unicode-Files

Future Work

Next I must integrate the property name/value alias resolving code with UCD-gen.p6. UCD-gen.p6 already has a mostly functional Unicode database with a fair number of properties. When these two are integrated, the next step will be to start integrating it with the MoarVM codebase, making any changes to MoarVM or the database retrofit codebase as needed along the way.

I will also be exploring ways of compressing the mapping of codepoints to unique combinations of Unicode property data in the bitfield. Due to the vast number of codepoints within Unicode, currently the mapping of codepoints to rows in the bitfield takes up many times more space than the actual property value data itself.

For compressing the Unicode names, it is planned to use base 40 encoding with some additional tricks to save additional space for repeated words. I plan on making a blog post where I go into the details of the compression scheme.
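
The appeal of base 40 is that Unicode character names draw on a small alphabet (capital letters, digits, space, hyphen), and three base-40 digits pack into two bytes, since 40 ** 3 = 64000 < 65536. A rough sketch of the per-character step (the alphabet ordering and the extra tricks for repeated words are assumptions for illustration):

my @alphabet = flat 'A' .. 'Z', '0' .. '9', ' ', '-';   # 38 of the 40 slots used
my %to-digit = @alphabet.antipairs;
sub name-to-base40(Str $name) {
    $name.comb.map({ %to-digit{$_} }).List
}
say name-to-base40('LATIN SMALL LETTER A');
# Groups of three digits a, b, c then pack as (a * 40 + b) * 40 + c, a uint16.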

I am considering also rolling the ignorecase/ignoremark bug into my grant. Even though it was not originally planned to be part of the grant, I think it is important enough to warrant inclusion. Currently, regexes using both ignorecase and ignoremark together are completely broken.

Note

The work described above has been committed to the two repositories listed below (in addition to the test work described, which was merged into Roast).

https://github.com/samcv/UCD

https://github.com/samcv/Unicode-Grant

brrt to the future: Function call return values

Published by Bart Wiegmans on 2017-05-25T19:02:00

Hi there, it's been about a month since last I wrote on the progress of the even-moar-jit branch, so it is probably time for another update.

Already two months ago I wrote about adding support for function calls in the expression JIT compiler. This was a major milestone as calling C functions is essential for almost everything that is not pure numerical computing. Now we can also use the return values of function calls (picture that!) The main issue with this was something I've come to call the 'garbage restore' problem, by which I mean that the register allocator would attempt to 'restore' an earlier, possibly undefined, version of a value over a value that would result from a function call.

This has everything to do with the spill strategy used by the compiler. When a value has to be stored to memory (spilled) in order to avoid being overwritten and lost, there are a number of things that can be done. The default, safest strategy is to store a value to memory after every instruction that computes it and to load it from memory before every instruction that uses it. I'll call this a full spill. It is safe because it effectively makes the memory location the only 'true' storage location, with the register being merely temporary caches. It can also be somewhat inefficient, especially if the code path that forces the spill is conditional and rarely taken. In MoarVM, this happens (for instance) around memory barriers, which are only necessary when creating cross-generation object references.

That's why around function calls the JIT uses another strategy, which I will call a point spill. What I mean by that is that the (live) values which could be overwritten by the function call are spilled to memory just before the function call, and loaded back into their original registers directly after. This is mostly safe, since under normal control flow, the code beyond the function call point will be able to continue as if nothing had changed. (A variant which is not at all safe is to store the values to memory at the point, and load them from memory in all subsequent code, because it isn't guaranteed that the original spill-point-code is reached in practice, meaning that you overwrite a good value with garbage. The original register allocator for the new JIT suffered from this problem).

It is only safe, though, if the value that is to be spilled-and-restored is both valid (defined in a code path that always precedes the spill) and required (the value is actually used in code paths that follow the restore). This is not the case, for instance, when a value is the result of a conditional function call, as in the following piece of code:

1:  my $x = $y + $z;
2:  if ($y < 0) {
3:      $x = compute-it($x, $y, $z);
4:  }
5:  say "\$x = $x";

In this code, the value in $x is defined first by the addition operation and then, optionally, by the function call to compute-it. The last use of $x is in the string interpolation on line 5. Thus, according to the compiler, $x holds a 'live' value at the site of the function call on line 3, and so to avoid it from being overwritten, it must be spilled to memory and restored. But in fact, loading $x from memory after compute-it would directly overwrite the new value with the old one.

The problem here appears to be that when the JIT decides to 'save' the value of $x around the function call, it does not take into account that - in this code path - the last use of the old value of $x is in fact when it is placed on the parameter list to the compute-it call. From the perspective of the conditional branch, it is only the new value of $x which is used on line 5. Between the use on the parameter list and the assignment from the return value, the value of $x is not 'live' at all. This is called a 'live range hole'. It is then the goal to find these holes and to make sure a value is not treated as live when it is in fact not.

I used an algorithm from a paper by Wimmer and Franz (2010) to find the holes. However, this algorithm relies on having the control flow structure of the program available, which usually requires a separate analysis step. In my case that was fortunately not necessary since this control flow structure is in fact generated by an earlier step in the JIT compilation process, and all that was necessary is to record it. The algorithm itself is really simple and relies on the following ideas:

I think it goes beyond the scope of this blog post to explain how it works in full, but it is really not very complicated and works very well. At any rate, it was sufficient to prevent the JIT from overwriting good values with bad ones, and allowed me to finally enable functions that return values, which is otherwise really simple.

When that was done, I obviously tried to use it and immediately ran into some bugs. To fix that, I've improved the jit-bisect.pl script, which wasn't very robust before. The jit-bisect.pl script uses two environment variables, MVM_JIT_EXPR_LAST_FRAME and MVM_JIT_EXPR_LAST_BB, to automatically find the code sequence where the expression compiler fails and compiles wrong code. (These variables tell the JIT compiler to stop running the expression compiler after a certain number of frames and basic blocks. If we know that the program fails with N blocks compiled, we can use binary search between 0 and N to find out which frame is broken). The jit-dump.pl script then provides disassembled bytecode dumps that can be compared and with that, it is usually relatively easy to find out where the JIT compiler bug is.

With that in hand I've spent my time mostly fixing existing bugs in the JIT compiler. I am now at a stage in which I feel like most of the core functionality is in place, and what is left is about creating extension points and fixing bugs. More on that, however, in my next post. See you then!

Death by Perl6: Perl Toolchain Summit 2017 - CPAN and Perl6

Published by Nick Logan on 2017-05-25T05:41:53

At the 2017 Perl Toolchain Summit (PTS) a lot of stuff got done. This is a brief demonstration style summary of the resulting CPAN-related feature enhancements to zef.

First I should mention that Perl6 distributions can now be uploaded to CPAN (without needing to add a special Perl6/ folder), and will have their source-url automatically set or replaced with the appropriate CPAN url. Additionally, App::Mi6 now has mi6 dist and mi6 upload to make the process even simpler.
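
For App::Mi6 users the release flow is then roughly (commands as named above; output omitted):

$ mi6 dist      # build the distribution tarball
$ mi6 upload    # upload it to CPAN via a PAUSE account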

Now lets get started by making sure we are using a version with features developed at PTS:

$ zef install "zef:ver(v0.1.15+)"
All candidates are currently installed  
No reason to proceed. Use --force to continue anyway  

Perl6 distributions uploaded to CPAN are now indexed. Currently the index is generated by https://github.com/ugexe/Perl6-App--ecogen and stored at https://github.com/ugexe/Perl6-ecosystems alongside a mirror of the existing p6c ecosystem. It is also enabled by default now:

$ zef list --max=10
===> Found via Zef::Repository::Ecosystems<cpan>
Inline:ver('1.2.1')  
Inline:ver('1.2')  
Inline:ver('1')  
IO::Glob:ver('0.1'):auth('github:zostay')  
Text::CSV:ver('0.007'):auth('github:Tux')  
Text::CSV:ver('0.008'):auth('github:Tux')  
Data::Selector:ver('1.01')  
Data::Selector:ver('1.02')  
NativeCall:ver('1')  
CompUnit::Repository::Mask:ver('0.0.1')  
Inline::Perl5:ver('0.26'):auth('github:niner')

$ zef info CompUnit::Repository::Mask
- Info for: CompUnit::Repository::Mask
- Identity: CompUnit::Repository::Mask:ver('0.0.1')
- Recommended By: Zef::Repository::Ecosystems<cpan>
Description:     hide installed modules for testing.  
License:     Artistic-2.0  
Source-url:     http://www.cpan.org/authors/id/N/NI/NINE/Perl6/CompUnit-Repository-Mask-0.0.1.tar.gz  
Provides: 1 modules  
Depends: 0 items

A distribution can exist in multiple "ecosystems":

$ zef search Inline::Perl5
===> Found 3 results
-----------------------------------------------------------------------------------------------------------------------
ID|From                             |Package                                       |Description  
-----------------------------------------------------------------------------------------------------------------------
1 |Zef::Repository::LocalCache      |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
2 |Zef::Repository::Ecosystems<cpan>|Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
3 |Zef::Repository::Ecosystems<p6c> |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
-----------------------------------------------------------------------------------------------------------------------

Dependencies can be resolved by any and all available ecosystems, so distributions can be put on CPAN and still have dependencies that aren't on CPAN resolved:

$ zef -v install Inline::Perl5
===> Searching for: Inline::Perl5
===> Found: Inline::Perl5:ver('0.26'):auth('github:niner') [via Zef::Repository::Ecosystems<cpan>]
===> Searching for missing dependencies: LibraryMake, File::Temp
===> Found dependencies: File::Temp [via Zef::Repository::Ecosystems<p6c>]
===> Found dependencies: LibraryMake:ver('1.0.0'):auth('github:retupmoca') [via Zef::Repository::LocalCache]
===> Searching for missing dependencies: Shell::Command, File::Directory::Tree
===> Found dependencies: Shell::Command, File::Directory::Tree:auth('labster') [via Zef::Repository::Ecosystems<p6c>]
===> Searching for missing dependencies: File::Which, File::Find
===> Found dependencies: File::Find:ver('0.1'), File::Which [via Zef::Repository::Ecosystems<p6c>]

...<more output>...

In addition to CPAN we have access to CPAN Testers. Garu worked with me to create a Perl 6 CPAN Testers report module: Zef::CPANReporter.

$ zef install Zef::CPANReporter
===> Searching for: Zef::CPANReporter
===> Searching for missing dependencies: Net::HTTP
===> Testing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
===> Testing [OK] for Net::HTTP:ver('0.0.1'):auth('github:ugexe')
===> Testing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
===> Testing [OK] for Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
===> Installing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
===> Installing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')

# ...and in use:

$ zef -v install Grammar::Debugger
===> Searching for: Grammar::Debugger
===> Found: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') [via Zef::Repository::Ecosystems<p6c>]
===> Searching for missing dependencies: Terminal::ANSIColor
===> Found dependencies: Terminal::ANSIColor:ver('0.3') [via Zef::Repository::Ecosystems<p6c>]
===> Fetching [OK]: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') to /Users/ugexe/.zef/tmp/grammar-debugger.git
===> Fetching [OK]: Terminal::ANSIColor:ver('0.3') to /Users/ugexe/.zef/tmp/Terminal-ANSIColor.git
===> Testing: Terminal::ANSIColor:ver('0.3')
t/00-load.t .. ok  
All tests successful.  
Files=1, Tests=1,  0 wallclock secs  
Result: PASS  
===> Testing [OK] for Terminal::ANSIColor:ver('0.3')
Report for Terminal::ANSIColor:ver('0.3') will be available at http://www.cpantesters.org/cpan/report/a9ed11ac-4108-11e7-9b92-c8514edb94d5  
===> Testing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
t/debugger.t .. ok  
t/ltm.t ....... ok  
t/tracer.t .... ok  
All tests successful.  
Files=3, Tests=3,  1 wallclock secs  
Result: PASS  
===> Testing [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
Report for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') will be available at http://www.cpantesters.org/cpan/report/ab9c911c-4108-11e7-a777-ca182bdd3934  
===> Installing: Terminal::ANSIColor:ver('0.3')
===> Install [OK] for Terminal::ANSIColor:ver('0.3')
===> Installing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
===> Install [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')

This work (and my attendance) was made possible by a bunch of great perl companies and people:

Booking.com, ActiveState, cPanel, FastMail, MaxMind, Perl Careers, MongoDB, SureVoIP, Campus Explorer, Bytemark, CAPSiDE, Charlie Gonzalez, Elastic, OpusVL, Perl Services, Procura, XS4ALL, Oetiker+Partner.

samcv: Unicode Property Names

Published on 2017-05-21T07:00:00

Currently when you do: 'word' ~~ /<:Latin>/, MoarVM looks in a hash which contains all of the property values and looks up what property name it is associated with. So in this case it looks up Latin, and then finds it is related to the Script property.
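Conceptually this lookup is just a reverse map from property value to property name. A tiny Perl 6 sketch of the idea (toy data for illustration, not MoarVM's actual tables):

my %property-for-value =
    'Latin' => 'Script',
    'Greek' => 'Script',
    'L'     => 'General_Category';

sub property-for ( Str $value ) {
    %property-for-value{$value} // fail "unknown property value '$value'";
}

say property-for('Latin');   # OUTPUT: Script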

There is a longstanding issue in MoarVM. The Unicode database of MoarVM was created with the incorrect assumption that Unicode property values were distinct. As part of my work on the Unicode Grant this is one of the issues I am tackling. So to be better informed I generated a list of all of the overlaps. I won't paste it here because it is very long, but if you want to see a full list see my post here.

There are 68 property values which belong to multiple property names and an additional 126 that are shared between Script and Block properties.

In addition we must also make sure that we check overlap between property names and the values themselves.

Here are all of the property names that conflict with property values:

«« IDC Conflict with property name [blk]  is a boolean property
«« VS Conflict with property name [blk]  is a boolean property
«« White_Space Conflict with property name [bc]  is a boolean property
«« Alphabetic Conflict with property name [lb]  is a boolean property
«« Hyphen Conflict with property name [lb]  is a boolean property
«« Ideographic Conflict with property name [lb]  is a boolean property
«« Lower Conflict with property name [SB]  is a boolean property
«« STerm Conflict with property name [SB]  is a boolean property
«« Upper Conflict with property name [SB]  is a boolean property

Luckily these are all Bool properties and so we don't need to worry about anything complicated there.

A fun fact: currently the only reason ' ' ~~ /<:space>/ matches is that space resolves as Line_Break=space, when in fact it should resolve as White_Space=True. Luckily the space character and a few others have Line_Break=space, but it means that "\n" ~~ /<:space>/ does not work properly. I will note, though, that using <:White_Space> does work properly, as it resolves to the property name.
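The difference is easy to demonstrate (a small sketch of the behavior described above, as it was at the time of writing):

say so ' '  ~~ /<:space>/;         # True:  ' ' happens to have Line_Break=space
say so "\n" ~~ /<:space>/;         # False: "\n" is whitespace, but its
                                   #        Line_Break value is not 'space'
say so "\n" ~~ /<:White_Space>/;   # True:  resolves via the property name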

I would make Bool properties 0th in priority.

Then, similar to other regex engines, we will allow you to designate General_Category and Script unqualified:

<:Latin> <:L>                              # unqualified
<:Script<Latin>> <:General_Category<L>>    # qualified

I propose a hierarchy as follows:

I have not yet decided whether we want to guarantee the following items, but they should be part of the internal hierarchy.

We should also resolve Numeric_Type, so that people can use <:Numeric> in their regexes (I'm sure code already exists that uses this, so we need to make sure it resolves as well).

In actuality this resolves as Numeric_Type != None. So this is covered under rule 0.

I am open to adding whichever properties people think most important to the ordered priority list as well. Due to how things are set up in MoarVM/NQP, I will need to come up with some hierarchy to resolve all the properties. In addition to this we will have a Guaranteed list, where the specs will specify that using them unqualified is guaranteed to work.
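As a sketch of what such a resolution order could look like in code (the priority list and the table names here are made up for illustration, not the final design):

my @priority = 'Boolean', 'General_Category', 'Script', 'Block';

# %tables maps property name => set of known values for that property.
sub resolve-unqualified ( Str $term, %tables ) {
    for @priority -> $property {
        return $property => $term if %tables{$property}{$term}:exists;
    }
    fail "cannot resolve '$term' without qualification";
}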

The property value names with overlap remaining after the proposed list above:

NU => ["Word_Break", "Line_Break", "Sentence_Break"],
NA => ["Age", "Hangul_Syllable_Type", "Indic_Positional_Category"],
E => ["Joining_Group", "Jamo_Short_Name"],
SP => ["Line_Break", "Sentence_Break"],
CL => ["Line_Break", "Sentence_Break"],
D => ["Jamo_Short_Name", "Joining_Type"],
Narrow => ["East_Asian_Width", "Decomposition_Type"],
NL => ["Word_Break", "Line_Break"],
Wide => ["East_Asian_Width", "Decomposition_Type"],
Hebrew_Letter => ["Word_Break", "Line_Break"],
U => ["Jamo_Short_Name", "Joining_Type"],
LE => ["Word_Break", "Sentence_Break"],
Close => ["Bidi_Paired_Bracket_Type", "Sentence_Break"],
BB => ["Jamo_Short_Name", "Line_Break"],
HL => ["Word_Break", "Line_Break"],
Maybe => ["NFKC_Quick_Check", "NFC_Quick_Check"],
FO => ["Word_Break", "Sentence_Break"],
H => ["East_Asian_Width", "Jamo_Short_Name"],
Ambiguous => ["East_Asian_Width", "Line_Break"],

Any ideas about adding further entries to the hierarchy (even for values that have no overlap presently [Unicode 9.0], since overlap could be introduced later) will be appreciated. Either comment on this Github issue or send me an email (address at the bottom of the page).

rakudo.org: Announce: Rakudo Star Release 2017.04

Published by Steve Mynott on 2017-05-01T15:35:01

A useful and usable production distribution of Perl 6

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the April 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the April 2017 release is available from https://rakudo.perl6.org/downloads/star/.

Binaries for macOS and Windows (64 bit) are also available.

This is the seventh post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

This release includes “zef” as the module installer. “panda” is shortly to be replaced by “zef” and will be removed in the near future.

We hope to produce quarterly Rakudo Star releases during 2017, with 2017.07 (July) and 2017.10 (October) to follow.

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

This Star release includes [release 2017.04.3] of the Rakudo Perl 6 compiler, version 2017.04-53-g66c6dda of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

The Rakudo compiler changes since the last Rakudo Star release of 2017.01 are now listed in “2017.02.md” and “2017.04.md” under the “rakudo/docs/announce” directory of the source distribution.

In particular this release featured many important improvements to the IO subsystem thanks to Zoffix and the support of the Perl Foundation.

Please see
Part 1: http://rakudo.org/2017/04/02/upgrade
Part 2: http://rakudo.org/2017/04/03/part-2
Part 3: http://rakudo.org/2017/04/17/final-notes

Note there were point releases of 2017.04 so also see “2017.04.1.md”, “2017.04.2.md” and “2017.04.3.md”.

Notable changes in modules shipped with Rakudo Star:

+ DBIish: New version with pg-consume-input
+ doc: Too many to list. Large number of “IO Grant” doc changes.
+ json_fast: Too many to list. Big performance improvements.
+ perl6-lwp-simple: Fix for lexical require and incorrect regex for absolute URL matcher
+ test-mock: Enable concurrent use of mock objects
+ uri: Encoding fixes
+ zef: Too many to list. IO fixage.

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

+ advanced macros
+ non-blocking I/O (in progress)
+ some bits of Synopsis 9 and 11
There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

brrt to the future: Letting templates do what you mean

Published by Bart Wiegmans on 2017-04-30T22:12:00

Hi everybody, today I'd like to promote a minor, but important improvement in the 'expression template compiler' for the new JIT backend. This is a tool designed to make it easy to develop expression templates, which are themselves a way to make it easy to generate the 'expression tree' intermediate representation used by the new JIT backend. This is important because MoarVM instructions operate on a perl-like level of abstraction - single instructions can perform operations such as 'convert object to string', 'find first matching character in string' or 'access the last element of an array'. Such operations require rather more instructions to represent as machine code.

This level of abstraction is rather convenient for the rakudo compiler, which doesn't have to consider low-level details when it processes your perl6 code. But it is not very convenient for the JIT compiler which does. The 'expression' intermediate representation is designed to be much closer to what hardware can support directly. Basic operations include loading from and storing to memory, memory address computation, integer arithmetic, (conditional) branching, and function calls. At some point in the future, floating point operations will also be added. But because of this difference in abstraction level, a single MoarVM instruction will often map to many expression tree nodes. So what is needed is an efficient way to convert between the two representations, and that is what expression templates are supposed to do.

Expression templates are very much like the expression tree structure itself, in that both are represented as arrays of integers. Some of the elements represent instructions, some are constants, and some are references (indexes into the same array), forming a directed acyclic graph (not a tree). The only difference is that the template is associated with a set of instructions that indicate how it should be linked into the tree. (Instruction operands, i.e. the data that each instruction operates on, are prepared and linked by the template application process as well).

Surprisingly, arrays of integers aren't a very user-friendly way to write instruction templates, and so the template compiler was born. It takes as input a text file with expression templates defined as symbolic expressions, best known from the LISP world, and outputs a header file that contains the templates, ready for use by the JIT compiler. Note that the word 'template' has become a bit overloaded, referring to the textual input of the template compiler as well as to the binary input to the JIT compiler. That's okay, I guess, since they're really two representations of the same thing. The following table shows how template text, binary, and expression tree relate to each other:

Text:

(template: unless_i
    (when
        (zr $0)
        (branch (label $1))
    ))

Binary:

template: {
    MVM_JIT_ZR,
    0,
    MVM_JIT_LABEL,
    1,
    MVM_JIT_BRANCH,
    2,
    MVM_JIT_WHEN,
    0,
    4,
},
info: ".f.f.l.ll",
len: 9,
root: 6

Tree: (a diagram of the resulting expression tree; the instruction operands appear in it as the LOAD and CONST nodes mentioned below)

I hope it isn't too hard to see how one maps to the other. The unless_i instruction executes a branch to the label given as its second argument if its integer argument is zero. All symbols (like when, label and zr) have been replaced by uppercase prefixed constants (MVM_JIT_WHEN), and all nesting has been replaced by references (indexes) into the template array. The 'info' string specifies how the template is to be linked into the tree: instruction operands are indicated by an 'f', and internal links by an 'l'. In the tree representation the operands have been linked into the tree by the JIT; they form the LOAD and CONST nodes and everything below them.
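To make the linking scheme concrete, here is a toy Perl 6 sketch of applying a template (illustrative only; the real implementation is C inside MoarVM, and the names here are made up):

# Append a template to the tree, resolving 'f' (operand) and 'l' (link)
# entries according to its info string; other entries are copied verbatim.
# For the example above: apply-template(@unless_i, ".f.f.l.ll", 6, ...)
sub apply-template ( @template, Str $info, Int $root, @operands, @tree ) {
    my $offset = @tree.elems;
    for @template.kv -> $i, $v {
        given $info.substr($i, 1) {
            when 'f' { @tree.push: @operands[$v] }   # instruction operand
            when 'l' { @tree.push: $v + $offset }    # internal reference, offset
            default  { @tree.push: $v }              # opcode or constant
        }
    }
    return $root + $offset;   # index of this template's root within the tree
}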

Anyway, my improvement concerns a more complex form of template, such as the following example, an instruction to load an object value from the instance field of an object:

(template: sp_p6oget_o
(let: (($val (load (add (^p6obody $1) $2) ptr_sz)))
(if (nz $val) $val (^vmnull))))

This template contains a let: expression, which declares the $val variable. This value can be used in the subsequent expression by its name. Without such declarations the result of a computation could only have one reference, its immediate syntactic parent. (Or in other words, without let:, every template can only construct a tree). That is very inconvenient in case a result should be checked for null-ness, as in this case. (vmnull is a macro for the global 'null object' in MoarVM. The null object represents NULL wherever an object is needed, but isn't actually NULL, as that would mean it couldn't be dereferenced; it saves the interpreter from checking if a pointer to an object is NULL everywhere it is accessed).

The let: construct has another purpose: it ensures the ordering of operations. Although most operations can be ordered in whatever way suits the compiler, some do not, most notably function calls. (Function calls may have numerous unpredictable side effects, after all). All statements declared in the 'let declaration body' are compiled to run before any statements in the 'expression body'. This enables the programmer to ensure that a value is not needlessly computed twice, and more importantly, it ensures that a value that is used in multiple branches of a conditional statement is defined in both of them. For instance:


(let (($foo (...)))
    (if (...)
        (load $foo)
        $foo))

This pseudo-snippet of template code would dereference $foo if some condition is met (e.g. $foo is not NULL) and returns $foo directly otherwise. Without let to order the computation of $foo prior to the blocks of if, the first (conditional) child of if would be the first reference to $foo. That would mean that the code to compute $foo is only compiled in the first conditional block, which would not be executed whenever the if condition was not true, meaning that $foo would be undefined in the alternative conditional block. This would mean chaos. So in fact let does order expressions. All is good.

Except... I haven't told you how this ordering works, which is where my change comes in. Prior to commit 7fb1b10 the let expression would insert a hint to the JIT compiler to add the declared expressions as tree roots. The 'tree roots' are where the compiler starts converting the expression tree (graph) to a linear sequence of byte code. Hence the declaring expressions are compiled prior to the dependent expressions. But this has, of course, one big disadvantage, which is that the set of roots is global for the tree. Every declaration, no matter how deep into the tree, was to be compiled prior to the head of the tree. As a result, the following template code would not at all do what you want:


(let (($foo (...)))
    (if (nz $foo)
        (let (($bar (load $foo))) # dereference $foo !
            (... $bar))
        ...))


The declaration of $bar would cause $foo to be dereferenced prior to checking whether it is non-null, causing a runtime failure. Chaos is back. Well, that's what I've changed. Fortunately, we have another ordering mechanism at our disposal, namely DO lists. These are nodes with a variable number of children that are also promised to be compiled in order. After the patch linked above, the compiler now transforms let expressions into the equivalent DO expressions. Because DO expressions can be nested safely, $bar is not computed prior to the null-check of $foo, as the programmer intended. I had originally intended to implement analysis to automatically order the expressions with regard to the conditionals, but I found that this was more complicated to implement and more surprising to the programmer. I think that in this case, relying on the programmer is the right thing.

One thing that I found interesting is that this reduces the number of mechanisms in the compiler. The 'root-hint' was no longer useful, and subsequently removed. At the same time, all but the last child of a DO list must be void expressions, i.e. yield no value, because DO can only return the value of its last child. Since all expressions in a let declaration must yield some value - otherwise they would be useless - they required a new operation type: discard. Thus with a new node type (extension of data range) we can remove a class of behavior.

After I had implemented this, I've started working on adding basic block analysis. That is a subject for a later post, though. Until next time!

Strangely Consistent: The root of all eval

Published by Carl Mäsak

Ah, the eval function. Loved, hated. Mostly the latter.

$ perl -E'my $program = q[say "OH HAI"]; eval $program'
OH HAI

I was a bit stunned when the eval function was renamed to EVAL in Perl 6 (back in 2013, after spec discussion here). I've never felt really comfortable with the rationale for doing so. I seem to be more or less alone in this opinion, though, which is fine.

The rationale was "the function does something really weird, so we should flag it with upper case". Like we do with BEGIN and the other phasers, for example. With BEGIN and others, the upper-casing is motivated, I agree. A phaser takes you "outside of the normal control flow". The eval function doesn't.

Other things that we upper-case are things like .WHAT, which look like attributes but are really specially code-generated at compile-time into something completely different. So even there the upper-casing is motivated because something outside of the normal is happening.

eval in the end is just another function. Yes, it's a function with potentially quite wide-ranging side effects, that's true. But a lot of fairly standard functions have wide-ranging side effects. (To name a few: shell, die, exit.) You don't see anyone clamoring to upper-case those.

I guess it could be argued that eval is very special because it hooks into the compiler and runtime in ways that normal functions don't, and maybe can't. (This is also how TimToady explained it in the commit message of the renaming commit.) But that's an argument from implementation details, which doesn't feel satisfactory. It applies with equal force to the lower-cased functions just mentioned.

To add insult to injury, the renamed EVAL is also made deliberately harder to use:

$ perl6 -e'my $program = q[say "OH HAI"]; EVAL $program'
===SORRY!=== Error while compiling -e
EVAL is a very dangerous function!!! (use the MONKEY-SEE-NO-EVAL pragma to override this error,
but only if you're VERY sure your data contains no injection attacks)
at -e:1
------> program = q[say "OH HAI"]; EVAL $program⏏<EOL>

$ perl6 -e'use MONKEY-SEE-NO-EVAL; my $program = q[say "OH HAI"]; EVAL $program'
OH HAI

Firstly, injection attacks are a real issue, and no laughing matter. We should educate each other and newcomers about them.

Secondly, that error message ("EVAL is a very dangerous function!!!") is completely over-the-top in a way that damages rather than helps. I believe when we explain the dangers of code injection to people, we need to do it calmly and matter-of-factly. Not with three exclamation marks. The error message makes sense to someone who already knows about injection attacks; it provides no hints or clues for people who are unaware of the risks.

(The Perl 6 community is not unique in eval-hysteria. Yesterday I stumbled across a StackOverflow thread about how to turn a string with a type name into the corresponding constructor in JavaScript. Some unlucky soul suggested eval, and everybody else immediately piled on to point out how irresponsible that was. Solely as a knee-jerk reaction "because eval is bad".)

Thirdly, MONKEY-SEE-NO-EVAL. Please, can we just... not. 😓 Random reference to monkeys and the weird attempt at levity while switching on a nuclear-chainsaw function aside, I find it odd that a pragma that enables EVAL is called something with NO-EVAL. That's not Least Surprise.

Anyway, the other day I realized how I can get around both the problem of the all-caps name and the problem of the necessary pragma:

$ perl6 -e'my &eval = &EVAL; my $program = q[say "OH HAI"]; eval $program'
OH HAI

I was so happy to realize this that I thought I'd blog about it. Apparently the very dangerous function (!!!) is fine again if we just give it back its old name. 😜

rakudo.org: PART 3: Information on Changes Due to IO Grant Work

Published by Zoffix Znet on 2017-04-17T20:22:46

The IO grant work is wrapping up. This note lists some of the last-minute changes to the plans delineated in earlier communications ([1], [2], [3]). Most of the listed items do not require any changes to users’ code.

Help and More Info

If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel.

brrt to the future: Function Call Milestone

Published by Bart Wiegmans on 2017-03-28T16:14:00

Hi everybody. It's high time for another update, and this time I have good news. The 'expression' JIT compiler can now compile native ('C') function calls (although it's not able to use the results). This is a major milestone because function calls are hard! (At least from the perspective of a compiler, and especially from the perspective of the register allocator). Also because native function calls are really very important in MoarVM. Most of its 'primitive' operations (like hash table access, string equality, big integer arithmetic) are implemented by invoking native functions, and so to compile almost any program the JIT has to compile many function calls.

What makes function calls 'hard' is that they must implement the 'calling convention' of the relevant 'application binary interface' (ABI). In short, the ABI specifies the locations of function call parameters. A small number of parameters (on Windows, the first 4; on POSIX platforms, the first 6) are placed in registers, and if there are more parameters they are usually placed on the stack. Aside from the calling convention, the ABI also specifies the expected alignment of the stack pointer (per 16 bytes), the registers a function may overwrite (clobber, in ABI-speak), and which registers must have their original values after the function returns. The last type of registers are called 'callee-saved'. Note that at least a few registers must be callee-saved, especially those related to call stack management, because if the callee function would overwrite those it would be impossible to return control back to the caller. By the way, manipulating exactly those registers is how the setjmp and longjmp 'functions' work.
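As a rough illustration of the POSIX (System V) convention described above, here is a Perl 6 sketch that assigns integer arguments to their locations (illustrative only; the JIT of course does this in C, and the names here are made up):

my @int-arg-regs = '%rdi', '%rsi', '%rdx', '%rcx', '%r8', '%r9';

sub place-args ( @args ) {
    my %placement;
    for @args.kv -> $i, $arg {
        %placement{$arg} = $i < @int-arg-regs.elems
            ?? @int-arg-regs[$i]                                # register argument
            !! 'stack+' ~ ( ($i - @int-arg-regs.elems) * 8 );   # stack argument
    }
    return %placement;
}

say place-args( <a b c d e f g h> );   # a..f in registers, g and h on the stack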

So the compiler is tasked with generating code that ensures the correct values are placed in the correct registers. That sounds easy enough, but what if these registers are taken by other values, and what if those other values are required for another parameter? Indeed, what if the value in the %rdx register needs to be in the %rsi register, and the value of the %rsi register is required in the %rdx register? How to determine the correct ordering for shuffling the operands?

One simple way to deal with this would be to eject all values from registers onto the stack, and then to load the values back into registers as they are needed. However, that would be very inefficient, especially since most function calls have no more than 6 (or 4) parameters and most of these parameters are computed for the function call only. So I thought that solution wouldn't do.

Another way to solve this would be if the register allocator could ensure that values are placed in their correct registers directly, especially for register parameters, i.e. by 'precoloring'. (The name comes from register allocation algorithms that work by 'graph coloring', something I will try to explain in a later post). However, that isn't an option due to my choice of 'linear scan' as the register allocation algorithm. This is a 'greedy' algorithm, meaning that it decides the allocation for a live range as soon as it encounters it, and that it cannot revert that decision once it's been made. (If it could, it would be more like a dynamic programming algorithm). So to ensure that the allocation is valid I'd have to make sure that the information about register requirements is propagated backwards from the instructions to all values that might conflict with it... and at that point we're no longer talking about linear scan, and I would be better off designing a new algorithm. Not a very attractive option either!

Instead, I thought about it and it occurred to me that this problem seems a lot like unravelling a dependency graph, with a number of restrictions. That is to say, it can be solved by a topological sort. I map the registers to a graph structure as follows: every register (or memory location) is a node, and every required transfer is a directed edge from the source register to the destination register. Because each destination receives exactly one value, every node has at most one inbound edge, though it may have several outbound edges.

I linked to the topological sort page for an explanation of the problem, but I think my implementation is really quite different from that presented there. They use a node visitation map and a stack; I use an edge queue and an outbound count. A register transfer (edge) can be enqueued if it is clear that the destination register is not currently used. Transfers from registers to stack locations (as function call parameters) or to local memory (to save the value from being overwritten by the called function) are also enqueued directly. As soon as the outbound count of a node reaches zero, it is considered to be 'free' and the inbound edge (if any) is enqueued.


Unlike a 'proper' dependency graph, cycles can and do occur, as in the example where '%rdx' and '%rsi' would need to swap places. Fortunately, because of the single-inbound edge rule, such cycles are 'simple' - all outbound edges not belonging to the cycle can be resolved prior to the cycle-breaking, and all remaining edges are part of the cycle. Thus, the cycle can always be broken by freeing just a single node (i.e. by copy to a temporary register).
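Here is a toy Perl 6 sketch of that ordering scheme (illustrative only; MoarVM's implementation is in C). %transfer maps each destination register to its source:

sub order-transfers ( %transfer ) {
    my %outbound;                       # pending reads of each register
    %outbound{$_}++ for %transfer.values;
    my @order;
    my @free = %transfer.keys.grep({ !%outbound{$_} });   # destinations nobody reads
    while @free {
        my $dst = @free.shift;
        my $src = %transfer{$dst};
        @order.push("$src -> $dst");
        # once the last read of $src is done, $src may be overwritten in turn
        @free.push($src) if --%outbound{$src} == 0 and %transfer{$src}:exists;
    }
    my @cycle = %transfer.keys.grep({ %outbound{$_} });
    @order.push("break cycle via a temporary: @cycle[]") if @cycle;
    return @order;
}

say order-transfers( { '%rsi' => '%rdx', '%rdx' => '%rsi' } );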

The only thing left to consider are the values that are used after the function call returns (survive the function call) and that are stored in registers that the called function can overwrite (which is all of them, since the register allocator never selects callee-saved registers). So to make sure they are available afterwards, we must spill them. But there are a few spill strategies to choose from (terminology made up by me): a full spill stores a value to memory and treats it as memory-resident from then on, while spill-and-restore stores it just before the call and loads it back immediately afterwards.

The current register allocator does a full spill when it's run out of registers, and it would make some sense to apply the same logic for function-call related spills. I've decided to use spill-and-restore, however, because a full spill complicates the sorting order (a value that used to be in a register is suddenly only in memory) and it can be wasteful, especially if the call only happens in an alternative branch. This is common for instance when assigning values to object fields, as that may sometimes require a write barrier (to ensure the GC tracks all references from 'old' to 'new' objects). So I'm guessing that it's going to be better to pay the cost of spilling and restoring only in those alternative branches, and that's why I chose to use spill-and-restore.

That was it for today. Although I think being able to call functions is a major milestone, this is not the very last thing to do. We currently cannot allocate any of the registers used for floating-point calculations, which is a relatively minor limitation since those aren't used very frequently. But I also need to do some more work to actually use function return values and apply generic register requirements of tiles. But I do think the day is coming near where we can start thinking about merging the new JIT with the MoarVM master branch, making it available to everybody. Until next time!

rakudo.org: PART 2: Upgrade Information for Changes Due to IO Grant Work

Published by Zoffix Znet on 2017-04-03T00:15:07

We’re making more changes!

Do the core developers ever sleep? Nope! We keep making Perl 6 better 24/7!

Why?

Not more than 24 hours ago, you may have read Upgrade Information for Changes Due to IO Grant Work. All of that is still happening.

However, it turned out that I, (Zoffix), had an incomplete understanding of how changes in 6.d language will play along with 6.c stuff. My original assumption was we could remove or change existing methods, but that assumption was incorrect. Pretty much the only sane way to incompatibly change a method in an object in 6.d is to add a new method with a different name.

Since I'd rather us not have, e.g., .child and .child-but-secure, for the next decade, we have a bit of an in-flight course correction:

ORIGINAL PLAN was to minimize incompatibilities with existing 6.c language code; leave everything potentially-breaking for 6.d

NEW PLAN is to right away add everything that does NOT break 6.c-errata specification, into 6.c language; leave everything else for 6.d. Note that current 6.c-errata specification for IO is sparse (the reason IO grant is running in the first place), so there’s lots of wiggle room to make most of the changes in 6.c.

When?

I (Zoffix) still hope to cram all the changes into 2017.04 release. Whether that’s overly optimistic, given the time constraints… we’ll find out on April 17th. If anything doesn’t make it into 2017.04, all of it definitely will be in 2017.05.

What?

Along with the original list in the first Upgrade Information Notice, the following changes may affect your code. I’m excluding any non-conflicting changes.

Potential changes:

Changes for 6.d language:

Help and More Info

If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel.

rakudo.org: Upgrade Information for Changes Due to IO Grant Work

Published by Zoffix Znet on 2017-04-02T08:31:49

As previously notified, there are changes being made to IO routines. This notice is to provide details on changes that may affect currently-existing code.

When?

Barring unforeseen delays, the work affecting the 6.c language is planned to be included in the 2017.04 Rakudo compiler release (planned for April 17, 2017), on which the next Rakudo Star release will be based.

Some or all of the work affecting 6.d language may also be included in that release and will be available if the user uses use v6.d.PREVIEW pragma. Any 6.d work that doesn’t make it into 2017.04 release, will be included in 2017.05 release.

If you use development commits of the compiler (e.g. rakudobrew), you will receive this work as-it-happens.

Why?

If you only used documented features, the likelihood of you needing to change any of your code is low. The 6.c language changes due to IO Grant work affect either routines that are rarely used or undocumented routines that might have been used by users assuming they were part of the language.

What?

This notice describes only changes affecting existing code, and only for the 6.c language. It does NOT include any non-conflicting changes or changes slated for the 6.d language. If you’re interested in the full list of changes, you can find it in the IO Grant Action Plan.

The changes that may affect existing code are:

Help and More Info

If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel.

Pawel bbkr Pabian: Your own template engine in 4 flavors. With Benchmarks!

Published by Pawel bbkr Pabian on 2017-02-25T23:43:36

This time on the blog I'll show you how to write your own template engine - with syntax and behavior tailored to your needs. And we'll do it in four different ways, to analyze the pros and cons of each approach as well as code speed and complexity. Our sample task for today is to compose a password reminder text for a user, which can then be sent by email.

use v6;

my $template = q{
Hi [VARIABLE person]!

You can change your password by visiting [VARIABLE link] .

Best regards.
};

my %fields = (
'person' => 'John',
'link' => 'http://example.com'
);

So we've decided what our template syntax should look like, and for starters we'll do trivial variables (although that's not a very precise name, because variables in templates are almost always immutable).
We also have data to populate the template fields. Let's get started!

1. Substitutions

sub substitutions ( $template is copy, %fields ) {
    for %fields.kv -> $key, $value {
        $template ~~ s:g/'[VARIABLE ' $key ']'/$value/;
    }
    return $template;
}

say substitutions($template, %fields);

Yay, works:

Hi John!

You can change your password by visiting http://example.com .

Best regards.

Now it is time to benchmark it to get some baseline for different approaches:

use Bench;

my $template_short = $template;
my %fields_short = %fields;

my $template_long = join(
    ' lorem ipsum ', map( { '[VARIABLE ' ~ $_ ~ ']' }, 'a' .. 'z' )
) x 100;
my %fields_long = ( 'a' .. 'z' ) Z=> ( 'lorem ipsum' xx * );

my $b = Bench.new;
$b.timethese(
    1000,
    {
        'substitutions_short' => sub {
            substitutions( $template_short, %fields_short )
        },
        'substitutions_long' => sub {
            substitutions( $template_long, %fields_long )
        },
    }
);

The benchmarks in this post test two cases for each approach. Our template from the example is the "short" case. And there is a "long" case, with a 62KB template containing 2599 text fragments and 2600 variables filled from 26 fields. So here are the results:

Timing 1000 iterations of substitutions_long, substitutions_short...
substitutions_long: 221.1147 wallclock secs @ 4.5225/s (n=1000)
substitutions_short: 0.1962 wallclock secs @ 5097.3042/s (n=1000)

Whoa! That is a serious penalty for long templates. And the reason is that this code has three serious flaws - the original template is destroyed during variable evaluation and therefore must be copied each time we want to reuse it, the template text is parsed multiple times, and the output is rewritten after each variable is populated. But we can do better...

2. Substitution

sub substitution ( $template is copy, %fields ) {
    $template ~~ s:g/'[VARIABLE ' (\w+) ']'/{ %fields{$0} }/;
    return $template;
}

This time we have a single substitution. The variable name is captured and we can use it to get the field value on the fly. Benchmarks:

Timing 1000 iterations of substitution_long, substitution_short...
substitution_long: 71.6882 wallclock secs @ 13.9493/s (n=1000)
substitution_short: 0.1359 wallclock secs @ 7356.3411/s (n=1000)

Mediocre boost. We have less penalty on long templates because the text is not parsed multiple times. However, the remaining flaws from the previous approach still apply, and the regexp engine must still do plenty of memory reallocations for each piece of template text replaced.

Also, it won't allow our template engine to gain new features in the future - like conditions or loops - because it is very hard to parse nested tags with a single regexp. Time for a completely different approach...

3. Grammars and direct Actions

If you are not familiar with Perl 6 grammars and Abstract Syntax Tree concept you should study official documentation first.

grammar Grammar {
    regex TOP { ^ [ <chunk=text> | <chunk=variable> ]* $ }
    regex text { <-[ \[ \] ]>+ }
    regex variable { '[VARIABLE ' $<name>=(\w+) ']' }
}

class Actions {

    has %.fields is required;

    method TOP ( $/ ) {
        make [~]( map { .made }, $/{'chunk'} );
    }
    method text ( $/ ) {
        make ~$/;
    }
    method variable ( $/ ) {
        make %.fields{$/{'name'}};
    }

}

sub grammar_actions_direct ( $template, %fields ) {
    my $actions = Actions.new( fields => %fields );
    return Grammar.parse( $template, :$actions ).made;
}

The most important thing is defining our template syntax as a grammar. A grammar is just a set of named regular expressions that can call each other. At the "TOP" (where parsing starts) we see that our template is composed of chunks. Each chunk can be text or a variable. The regexp for text matches everything until it hits the start of a variable (the '[' character; let's assume it is forbidden in text to keep things simple). The regexp for a variable should look familiar from the previous approaches, however now we capture the variable name as a named capture instead of a positional one.

The Actions class has methods that are called whenever a regexp with the corresponding name is matched. When called, a method gets the match object ($/) from this regexp and can "make" something from it. This "made" something will be seen by the upper-level method when it is called. For example, our "TOP" regexp calls the "text" regexp, which matches the "Hi " part of the template and calls the "text" method. This "text" method just "make"s the matched string for later use. Then the "TOP" regexp calls the "variable" regexp, which matches the "[VARIABLE name]" part of the template. Then the "variable" method is called, and it checks the match object for the variable name and "makes" the value of this variable from the %fields hash for later use. This continues until the end of the template string.

Then the "TOP" regexp is matched and the "TOP" method is called. It can access the array of text and variable "chunks" in the match object and see what was "made" for those chunks earlier. So all it has to do is "make" those values concatenated together. And finally we get this "made" template from the "parse" method. So let's look at benchmarks:

Timing 1000 iterations of grammar_actions_direct_long, grammar_actions_direct_short...
grammar_actions_direct_long: 149.5412 wallclock secs @ 6.6871/s (n=1000)
grammar_actions_direct_short: 0.2405 wallclock secs @ 4158.1981/s (n=1000)

We got rid of two more flaws from the previous approaches. The original template is not destroyed when fields are filled, which means less memory copying. There is also no reallocation of memory during substitution of each field, because now every action method just "make"s strings to be joined later. And we can easily extend our template syntax by adding loops, conditions and more features, just by throwing some regexps into the grammar and defining corresponding behavior in the actions. Unfortunately we see some performance regression, and this happens because every time the template is processed it is parsed, match objects are created, a parse tree is built, and all those "make"/"made" values must be tracked when it is collapsed to the final output. But that was not our final word...

4. Grammars and closure Actions

Finally we reached the "boss level", where we have to exterminate the last and greatest flaw - re-parsing.
The idea is to use grammars and actions like in the previous approach, but this time, instead of producing output directly, we want to generate executable and reusable code that works like this under the hood:

sub ( %fields ) {
    return join '',
        sub ( %fields ) { return "Hi "}.( %fields ),
        sub ( %fields ) { return %fields{'person'} }.( %fields ),
        ...
}

That's right, we will be converting our template body to a cascade of subroutines.
Each time this cascade is called it will get and propagate %fields to the deeper subroutines.
And each subroutine is responsible for handling the piece of template matched by a single regexp in the grammar. We can reuse the grammar from the previous approach and modify only the actions:

class Actions {
    
    method TOP ( $/ ) {
        my @chunks = $/{'chunk'};
        make sub ( %fields ) {
           return [~]( map { .made.( %fields ) }, @chunks );
        };
    }
    method text ( $/ ) {
        my $text = ~$/;
        make sub ( %fields ) {
            return $text;
        };
    }
    method variable ( $/ ) {
        my $name = $/{'name'};
        make sub ( %fields  ) {
            return %fields{$name}
        };
    }
    
}

sub grammar_actions_closures ( $template, %fields ) {
    state %cache{Str};
    my $closure = %cache{$template} //= Grammar.parse(
        $template, actions => Actions.new
    ).made;
    return $closure( %fields );
}
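Only the first call pays the parsing cost; later calls reuse the cached closure. For example (the second data set here is made up for illustration):

say grammar_actions_closures( $template, %fields );
say grammar_actions_closures( $template, {
    'person' => 'Alice',
    'link'   => 'http://example.org',
} );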

Now every action method, instead of making final output, makes a subroutine that will get %fields and produce the final output later. To generate this cascade of subroutines the template must be parsed only once. Once we have it, we can call it with different sets of %fields to populate the variables in our template. Note how the object hash %cache is used to determine whether we already have a subroutine tree for a given $template. Enough talking, let's crunch some numbers:

Timing 1000 iterations of grammar_actions_closures_long, grammar_actions_closures_short...
grammar_actions_closures_long: 22.0476 wallclock secs @ 45.3563/s (n=1000)
grammar_actions_closures_short: 0.0439 wallclock secs @ 22778.8885/s (n=1000)

Nice result! We have an extensible template engine that is 4 times faster for short templates and 10 times faster for long templates than our initial approach. And yes, there is a bonus level...

4.1. Grammars and closure Actions in parallel

The last approach opened up a new optimization possibility. If we have subroutines that will generate our template, why not run them in parallel? So let's modify our action "TOP" method to process text and variable chunks simultaneously:

method TOP ( $/ ) {
    my @chunks = $/{'chunk'};
    make sub ( %fields ) {
       return [~]( @chunks.hyper.map( {.made.( %fields ) } ).list );
    };
}

Such an optimization will shine if your template engine must do some lengthy operation to generate a chunk of the final output, for example execute a heavy database query or call some API. It is perfectly fine to ask for data on the fly to populate the template, because in a feature-rich template engine you may not be able to predict and generate the complete set of data needed beforehand, like we did with our %fields. Use this optimization wisely - for fast subroutines you will see a performance drop, because the cost of sending chunks to threads and retrieving them will be higher than just executing them serially on a single core.

Which approach should I use to implement my own template engine?

That depends on how much you can reuse templates. For example, if you send one password reminder per day - go for simple substitution, and reach for a grammar with direct actions if you need more complex features. But if you are using templates, for example, in PSGI processes to display hundreds of pages per second for different users, then the grammar and closure actions approach wins hands down.

You can download all approaches with benchmarks in single file here.

To be continued?

If you liked this brief introduction to template engines and want to see more complex features like conditions or loops implemented, leave a comment under this article on blogs.perl.org or send me a private message on the irc.freenode.net #perl6 channel (nick: bbkr).

brrt to the future: Register Allocator Update

Published by Bart Wiegmans on 2017-02-09T16:19:00

Hi everybody, I thought some of you might be interested in an update regarding the JIT register allocator, which is, after all, the last missing piece for the new 'expression' JIT backend. Well, the last complicated piece, at least. Because register allocation is such a broad topic, I don't expect to cover all topics relevant to design decisions here, and reserve a future post for that purpose.

I think I may have mentioned earlier that I've chosen to implement linear scan register allocation, an algorithm first described in 1999. Linear scan is relatively popular for JIT compilers because it achieves reasonably good allocation results while being considerably simpler and faster than the alternatives, most notably graph coloring (unfortunately no open access link available). Because optimal register allocation is NP-complete, all realistic algorithms are heuristic, and linear scan applies a simple heuristic to good effect. I'm afraid fully explaining the nature of that heuristic and the tradeoffs involved is beyond the scope of this post, so you'll have to remind me to do it at a later point.

Commit ab077741 made the new allocator the default after I had ironed out sufficient bugs to be feature-equivalent to the old allocator (which still exists, although I plan to remove it soon).
Commit 0e66a23d introduced support for 'PHI' node merging, which is really important and exciting to me, so I'll have to explain what it means. The expression JIT represents code in a form in which all values are immutable, called static single assignment form, or SSA form for short. This helps simplify compilation because there is a clear correspondence between operations and the values they compute. In general in compilers, the easier it is to assert something about code, the more interesting things you can do with it, and the better code you can compile. However, in real code, variables are often assigned more than one value. A PHI node is basically an 'escape hatch' to let you express things like:

int x, y;
if (some_condition()) {
    x = 5;
} else {
    x = 10;
}
y = x - 3;

In this case, despite our best intentions, x can have two different values. In SSA form, this is resolved as follows:

int x1, x2, x3, y;
if (some_condition()) {
    x1 = 5;
} else {
    x2 = 10;
}
x3 = PHI(x1,x2);
y = x3 - 3;

The meaning of the PHI node is that it 'joins together' the values of x1 and x2 (somewhat like a junction in perl6), and represents the value of whichever 'version' of x was ultimately defined. Resolving PHI nodes means ensuring that, as far as the register allocator is concerned, x1, x2, and x3 should preferably be allocated to the same register (or memory location), and if that's not possible, it should copy x1 and x2 to x3 for correctness. To find the set of values that are 'connected' via PHI nodes, I apply a union-find data structure, which is a very useful data structure in general. Much to my amazement, that code worked the first time I tried it.
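Union-find is simple enough to sketch in a few lines of Perl 6 (illustrative only; the JIT's version is C and works on value indexes):

my %parent;

sub find ( $x ) {
    %parent{$x} //= $x;
    %parent{$x} = find( %parent{$x} ) if %parent{$x} ne $x;   # path compression
    return %parent{$x};
}

sub union ( $a, $b ) {
    %parent{ find($a) } = find($b);
}

union('x1', 'x3');
union('x2', 'x3');
say find('x1') eq find('x2');   # True: x1, x2, x3 share one allocation class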

Then I had to fix a very interesting bug in commit 36f1fe94 which involves ordering between 'synthetic' and 'natural' tiles. (Tiles are the output of the tiling process about which I've written at some length, they represent individual instructions). Within the register allocator, I've chosen to identify tiles / instructions by their index in the code list, and to store tiles in a contiguous array. There are many advantages to this strategy but they are also beyond the scope of this post. One particular advantage though is that the indexes into this array make their relative order immediately apparent. This is relevant to linear scan because it relies on relative order to determine when to allocate a register and when a value is no longer necessary.

However, because of using this index, it's not so easy to squeeze new tiles into that array - which is exactly what a register allocator does when it decides to 'spill' a value to memory and load it when needed. (Because inserts are delayed and merged into the array in a single step, the cost of insertion is constant.) Without proper ordering, a value loaded from memory could overwrite another value that is still in use. The fix for that is, I think, surprisingly simple and elegant. In order to 'make space' for the synthetic tiles, before comparison all indexes are multiplied by a factor of 2, and synthetic tiles are further offset by -1 or +1, depending on whether they should be handled before or after the 'natural' tile they are inserted for. E.g. synthetic tiles that load a value should be processed before the tile that uses the value they load.
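The key computation is tiny; in Perl 6 terms (a sketch of the comparison trick, not actual MoarVM code):

sub sort-key ( Int $index, Bool :$synthetic = False, Bool :$before = False ) {
    my $key = $index * 2;                    # natural tiles sit at even keys
    $key += $before ?? -1 !! +1 if $synthetic;
    return $key;
}

say sort-key(3);                        # 6: the natural tile itself
say sort-key(3, :synthetic, :before);   # 5: e.g. a load inserted before tile 3
say sort-key(3, :synthetic);            # 7: e.g. a spill inserted after tile 3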

Another issue soon appeared, this time having to do with x86 being, altogether, quaint and antiquated and annoying, and specifically with the use of one operand register as source and result value. To put it simply, where you and I and the expression JIT structure might say:

a = b + c

x86 says:

a = a + b

Resolving the difference is tricky, especially for linear scan, since linear scan processes the values in the program rather than the instructions that generate them. It is therefore not suited to deal with instruction-level constraints such as these. If a, b, and c in my example above are not the same (not aliases), then this can be achieved by a copy:

a = b
a = a + c

If a and b are aliases, the first copy isn't necessary. However, if a and c are aliases, then a copy may or may not be necessary, depending on whether the operation (in this case '+') is commutative, i.e. it holds for '+' but not for '-'. Commit 349b360 attempts to fix that for 'direct' binary operations, but a fix for indirect operations is still work in progress. Unfortunately, it meant I had to reserve a register for temporary use to resolve this, meaning there are fewer available for the register allocator to use. Fortunately, that did simplify handling of a few irregular instructions, e.g. signed cast of 32 bit integers to 64 bit integers.
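The case analysis can be sketched as follows (Perl 6 pseudo-codegen with made-up names, emitting strings instead of machine code):

sub two-operand ( Str $a, Str $op, Str $b, Str $c, Bool :$commutative = False ) {
    return "$a $op= $c"                     if $a eq $b;                    # a = a op c
    return "$a $op= $b"                     if $a eq $c and $commutative;   # reorder operands
    return "tmp = $c; $a = $b; $a $op= tmp" if $a eq $c;                    # save c first
    return "$a = $b; $a $op= $c";                                           # plain copy
}

say two-operand('a', '+', 'b', 'c');   # a = b; a += c
say two-operand('a', '-', 'b', 'a');   # tmp = a; a = b; a -= tmp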

So that brings us to today and my future plans. The next thing to implement will be support for function calls by the register allocator, which involves shuffling values to the right registers and correct positions on the stack, and also in spilling all values that are still required after the function call since the function may overwrite them. This requires a bit of refactoring of the logic that spills variables, since currently it is only used when there are not enough registers available. I also need to change the linear scan main loop, because it processes values in order of first definition, and as such, instructions that don't create any values are skipped, even if they need special handling like function calls. I'm thinking of solving that with a special 'interesting tiles' queue that is processed alongside the main values working queue.

That was it for today. I hope to write soon with more progress.

Strangely Consistent: Deep Git

Published by Carl Mäsak

I am not good at chess.

I mean... "I know how the pieces move". (That's the usual phrase, isn't it?) I've even tried to understand chess better at various points in my youth, trying to improve my swing. I could probably beat some of you other self-identified "I know how the pieces move" folks out there. With a bit of luck. As long as you don't, like, cheat by having a strategy or something.

I guess what I'm getting at here is that I am not, secretly, an international chess master. OK, now that's off my chest. Phew!

Imagining what it's like to be really good at chess is very interesting, though. I can say with some confidence that a chess master never stops and asks herself "wait — how does the knight piece move, again?" Not even I do that! Obviously, the knight piece is the one that moves √5 distances on the board. 哈哈

I can even get a sense of what terms a master-level player uses internally, by reading what master players wrote. They focus on tactics and strategy. Attacks and defenses. Material and piece values. Sacrifices and piece exchange. Space and control. Exploiting weaknesses. Initiative. Openings and endgames.

Such high-level concerns leave the basic mechanics of piece movements far behind. Sure, those movements are in there somewhere. They are not irrelevant, of course. They're just taken for granted and no longer interesting in themselves. Meanwhile, the list of master-player concerns above could almost equally well apply to a professional Go player. (s:g/piece/stone/ for Go.)

Master-level players have stopped looking at individual trees, and are now focusing on the forest.

The company that employs me (Edument) has a new slogan. We've put it on the backs of sweaters which we then wear to events and conferences:

We teach what you can't google.

I really like this new slogan. Particularly, it feels like something we as a teaching company have already trended towards for a while. Some things are easy to find just by googling them, or finding a good cheat sheet. But that's not why you attend a course. We should position ourselves so as to teach answers to the deep, tricky things that only emerge after using something for a while.

You're starting to see how this post comes together now, aren't you? 😄

2017 will be my ninth year with Git. I know it quite well by now, having learned it in depth and breadth along the way. I can safely say that I'm better at Git than I am at chess at this point.

Um. I'm most certainly not an international Git grandmaster — but largely that's because such a title does not exist. (If someone reads this post and goes on to start an international Git tournament, I will be very happy. I might sign up.)

No, my point is that the basic commands have taken on the role for me that I think basic piece movements have taken on for chess grandmasters. They don't really matter much; they're a means to an end, and it's the end that I'm focusing on when I type them.

(Yes, I still type them. There are some pretty decent GUIs out there, but none of them give me the control of the command line. Sorry-not-sorry.)

Under this analogy, what are the things I value with Git, if not the commands? What are the higher-level abstractions that I tend to think in terms of nowadays?

(Yes, these are the ACID guarantees for database transactions, but made to work for Git instead.)

A colleague of mine talks a lot about "definition of done". It seems to be a Scrum thing. It's his favorite term more than mine, but I still like it for its attempt at "mechanizing" quality, which I believe can succeed in a large number of situations.

Another colleague of mine likes the Boy Scout Rule of "Always leave the campground cleaner than you found it". If you think of this in terms of code, it means something like refactoring a code base as you go, cleaning it up bit by bit and asymptotically approaching code perfection. But if you think of it in terms of process, it dovetails quite nicely with the "definition of done" above.

Instead of explaining how in the abstract, let's go through a concrete-enough example:

  1. Some regression is discovered. (Usually by some developer dogfooding the system.)
  2. If it's not immediately clear, we bisect and find the offending commit.
  3. ASAP, we revert that commit.
  4. We analyze the problematic part of the reverted commit until we understand it thoroughly. Typically, the root cause will be something that was not in our definition of done, but should've been.
  5. We write up a new commit/branch with the original (good) functionality restored, but without the discovered problem.
  6. (Possibly much later.) We attempt to add discovery of the problem to our growing set of static checks. The way we remember to do that is through a TODO list in a wiki. This list keeps growing and shrinking in fits and starts.

Note in particular the interplay between process, quality and, yes, Git. Someone could've told me at the end of step 6 that I had totalled 29 or so Git basic commands along the way, and I would've believed them. But that's not what matters to us as a team. If we could do with magic pixie dust what we do with Git — keep historic snapshots of the code while ensuring quality and isolation — we might be satisfied magic pixie dust users instead.
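
To make steps 2 and 3 concrete, here's roughly what they'd look like at the command line (the tag name, test script, and resulting commit hash are all hypothetical):

$ git bisect start
$ git bisect bad HEAD
$ git bisect good v1.4          # last known good release (hypothetical)
$ git bisect run ./run-tests    # any script that exits non-zero on failure
$ git bisect reset              # bisect has named the first bad commit
$ git revert deadbeef           # ASAP, back it out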

Somewhere along the way, I also got a much more laid-back approach to conflicts. (And I stopped saying "merge conflicts", because there are also conflicts during rebase, revert, cherry-pick, and stash — and they are basically the same deal.) A conflict happens when a patch P needs to be applied in an environment which differs too much from the one in which P was created.

Aside: in response to this post, jast++ wrote this on #perl6: "one minor nitpick: git knows two different meanings for 'merge'. one is commit-level merge, one is file-level three-way merge. the latter is used in rebase, cherry-pick etc., too, so technically those conflicts can still be called merge conflicts. :)" — TIL.

But we actually don't care so much about conflicts. Git cares about conflicts, because it can't just apply the patch automatically. What we care about is that the intent of the patch has survived. No software can check that for us. Since the (conflict ↔ no conflict) axis is independent from the (intent broken ↔ intent preserved) axis, we get four cases in total. Two of those are straightforward, because the (lack of) conflict corresponds to the (lack of) broken intent.

The remaining two cases happen rarely but are still worth thinking about:

  1. A conflict, even though the intent of both patches is preserved. Git flags the overlapping edits, we resolve them by hand, and everything still means what it meant before. Annoying, but harmless.
  2. No conflict, but the intent is broken. Both patches apply cleanly, yet the combined result is wrong. mst's example is a merge that goes through without a hitch but leaves the tests failing. My example: branch A moves the code base to a new naming convention while another branch keeps adding identifiers in the old style; Git sees no textual overlap, so it happily merges the two.

If we care about quality, one lesson emerges from mst's example: always run the tests after you merge and after you've resolved conflicts. And another lesson from my example: try to introduce automatic checks for structures and relations in the code base that you care about. In this case, branch A could've put in a test or a linting step that failed as soon as it saw something using the old naming convention.
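
As a sketch of what such a check could look like, here's a tiny Perl 6 lint script, assuming (purely for illustration) that the old convention was snake_case sub names and that the modules live under lib/:

# fail the build if any module still declares a snake_case sub
my $failures = 0;
for dir('lib', test => / '.pm6' $ /) -> $file {
    for $file.lines.kv -> $i, $line {
        if $line ~~ / 'sub' \s+ \w+ '_' \w+ / {
            note "{$file}:{$i + 1}: old naming convention";
            $failures++;
        }
    }
}
exit($failures ?? 1 !! 0);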

A lot of the focus on quality also has to do with doggedly going to the bottom of things. It's in the nature of failures and exceptional circumstances to clump together and happen at the same time. So you need to handle them one at a time, carefully unraveling one effect after another, slowly disassembling the hex like a child's rod puzzle. Git sure helps with structuring and linearizing the mess that happens in everyday development, exploration, and debugging.

As I write this, I realize that even when I try to describe how Git has faded into the background as something important-but-uninteresting for me, I can barely keep the other concepts out of focus. Quality being chief among them. In my opinion, the focus on improving not just the code but the process, on leaving the campground cleaner than we found it: those are the things that make it meaningful for me to work as a developer even decades later. The feeling that code is a kind of poetry that punches you back — but as it does so, we learn something valuable for next time.

I still hear people say "We don't have time to write tests!" Well, in our teams, we don't have time not to write tests! Ditto with code review, linting, and writing descriptive commit messages.

No-one but Piet Hein deserves the last word of this post:

The road to wisdom? — Well, it's plain
and simple to express:

Err
and err
and err again
but less
and less
and less.

Death by Perl6: Hello Web! with Purée Perl 6

Published by Tony O'Dell on 2017-01-09T18:19:56

Let's build a website.

Websites are easy to build. There are dozens of frameworks out there to use: perl has Mojo and Catalyst as its major frameworks, and other languages also have quite a few decent options. Some of them come with boilerplate templates and you just go from there. Others don't, and you spend your first few hours learning how to actually set up the framework and reading about how to share your DB connection between all of your controllers and blah, blah, blah. Let's look at one of P6's web frameworks.

Enter Hiker

Hiker doesn't introduce a lot of (if any) new ideas. It uses paradigms you're probably used to, and it aims to make the initial setup of your website very straightforward and easy, so that you can get straight to work sharing your content with the world.

The Framework

Hiker is intended to make things fast and easy from the development side. Here's how it works. If you're not into the bleep blop and just want to get started, skip to the Boilerplate heading.

Application Initialization

  1. Hiker reads from the subdirectories we'll look at later. The controllers and models are classes.
  2. Looking at each controller, Hiker initializes a new object for that class, and then checks for its .path attribute
    1. If Hiker can't find the path attribute then it doesn't bind anything and produces a warning
  3. After setting up the controller routes, it instantiates a new object for the model as specified by the controller (.model)
    1. If the controller doesn't specify a model then nothing is instantiated or bound
    2. If a model is required by the controller but it cannot be found then Hiker refuses to bind
  4. Finally, HTTP::Server::Router is alerted to all of the paths that Hiker was able to find and verify

The Request

  1. If the path is found, then the associated class' .model.bind is called.
    1. The response (second parameter of .model.bind($req, $res)) has a hash to store information: $res.data
  2. The controller's .handler($req, $res) is then executed
    1. The $res.data hash is available in this context
  3. If the handler returns a Promise then Hiker waits for that to be kept (and expects the result to be True or False)
    1. If the response is already rendered and the Promise's status is True then the router is alerted that no more routes should be explored
    2. If the response isn't rendered and the Promise's result is True, then .render is called automagically for you
    3. If the response isn't rendered and the Promise's result is False, then the next matching route is called

Boilerplate

Ensure you have Hiker installed:

$ zef install Hiker
$ rakudobrew rehash  # this may be necessary to get the bin to work

Create a new directory for your project's boilerplate and cd into it. From here we'll initialize some boilerplate and look at the contents of the files.

somedir$ hiker init  
==> Creating directory controllers
==> Creating directory models
==> Creating directory templates
==> Creating route MyApp::Route1: controllers/Route1.pm6
==> Creating route MyApp::Model1: models/Model1.pm6
==> Creating template templates/Route1.mustache
==> Creating app.pl6

Neato burrito. From the output you can see that Hiker created some directories - controllers, models, templates - for us so we can start out organized. In those directories you will find a few files; let's take a look.

The Model

use Hiker::Model; 

class MyApp::Model1 does Hiker::Model {  
  method bind($req, $res) {
    $res.data<who> = 'web!';
  }
}  

Pretty straightforward. MyApp::Model1 is instantiated during Hiker initialization and .bind is called whenever the controller's corresponding path is requested. As you can see here, this Model just adds the key/value pair who => 'web!' to the $res.data hash. This data will be available in the Controller as well as in the template files (if the controller decides to use them).

The Controller

use Hiker::Route; 

class MyApp::Route1 does Hiker::Route {  
  has $.path     = '/';
  has $.template = 'Route1.mustache';
  has $.model    = 'MyApp::Model1';

  method handler($req, $res) {
    $res.headers<Content-Type> = 'text/plain';
  }
}  

As you can see above, the Hiker::Route packs a lot of information into a small space, and it's a class that does a Hiker role called Hiker::Route. This lets our framework know that it should inspect that class for the path, template, and model so it can handle those operations for us - path and template are the only required attributes.

As discussed above, our Route can return a Promise if there is some asynchronous operation to be performed. In this case all we're going to do is set the headers to indicate the Content-Type, and then, automagically, the template file is rendered. Note: if you return a falsey value from the handler method, then the router will not auto-render and it will attempt to find the next route. This is so that you can cascade paths in the event that you want to chain them together, do some type of decision making in real time to determine whether that's the right class for the request, or perform some other unsaid dark magic. In the controller above we return a truthy value and it auto-renders.
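
For instance, a handler that needs to do something asynchronous could return a Promise by wrapping its work in start; a minimal sketch, where some-slow-lookup is a hypothetical stand-in for the async work:

method handler($req, $res) {
  start {
    $res.data<who> = some-slow-lookup();        # hypothetical async work
    $res.headers<Content-Type> = 'text/plain';
    True;                                       # kept with True: auto-render
  }
}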

By specifying the Model in the Route, you're able to re-use the same Model class across multiple routes.
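
A second route reusing it might look like this (Route2 and its template are made up for illustration):

use Hiker::Route;

class MyApp::Route2 does Hiker::Route {
  has $.path     = '/plain';
  has $.template = 'Route2.mustache';
  has $.model    = 'MyApp::Model1';   # same model as Route1, reused

  method handler($req, $res) {
    $res.headers<Content-Type> = 'text/plain';
  }
}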

The Path

Quick notes about .path. You can pass a static path ('/staticpath'), a path with a placeholder ('/api/:placeholder'), or, if your path is a little more complicated, a regex (/ .* /). Check out the documentation for HTTP::Server::Router (repo).
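
Concretely, any of these would be a valid .path on a route:

has $.path = '/staticpath';         # literal path
has $.path = '/api/:placeholder';   # placeholder, filled in from the URI
has $.path = / .* /;                # full regex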

The Template

The template is specified by the controller's .template attribute and Hiker checks for that file in the ./templates folder. The default template engine is Template::Mustache (repo). See that module's documentation for more info.
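
Given that the boilerplate model sets who to 'web!', the generated templates/Route1.mustache presumably contains little more than:

Hello {{who}}

Template::Mustache replaces {{who}} with the value from $res.data, producing the greeting you'll see in a moment.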

Running the App

Really pretty straightforward from the boilerplate:

somedir$ perl6 app.pl6  

Now you can visit http://127.0.0.1:8080/ in your favorite Internet Explorer and find a nice 'Hello web!' message waiting to greet you. If you visit any other URI you'll receive the default 'no route found' message from HTTP::Server::Router.

The Rest

The module is relatively young. With feedback from the community, practical applications, and some extra feature expansion, Hiker could be pretty great, and it's a good start toward taking the tedium out of building a website in P6. I'm open to feedback and I'd love to hear/see where you think Hiker can be improved, what it's missing to be productive, and possibly anything else [constructive or otherwise] you'd like to see in a practical, rapid-development P6 web server.