Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at) / 2016-05-24T19:19:11

Weekly changes in and around Perl 6: 2016.21 A Quick One From Houston

Published by liztormato on 2016-05-24T01:06:20

Not a lot of time this time, but fortunately the past week appears to be low on events. So this should work out.

2016.05 Release

Rob Hoelz did the 2016.05 Rakudo Compiler Release. Please note there is no associated Rakudo Star release just yet.

A New Bot Bisecting

AlexDaniel created a new bot on the #perl6 channel: bisectable. It basically will do a bisect for you to find when the output of a code fragment changes. For more info, see his introduction and explanation. A real cool addition indeed!

Core Developments

Blog Posts

Ecosystem Additions

Winding Down

The continuous temperature changes between 30+℃ outside and 20-℃ inside are taking their toll. Hope I’ll be feeling better next week!

Weekly changes in and around Perl 6: 2016.20 Packaging Progress

Published by liztormato on 2016-05-17T18:07:56

Everything is larger in Texas, I guess. Like catching up with friends and needing more time to get rid of jet lag. So apologies for the delay, but here is your weekly dose of news from the Perl 6 world, straight from the OSCON in Austin, Texas. Where Jeff Goff gave an excellent 3.5 hour “Introduction to Perl 6” Tutorial yesterday, and TimToady will give the Perl 6 – Believe It Or Not! keynote in Ballroom A tomorrow.

The Perl Conference Returns

It seems fitting to write about The Perl Conference while at OSCON, which started life as “The Perl Conference” before becoming a more general Open Source Event. This year, the Perl Conference (aka YAPC::NA) has gotten its name back. It will be held on 20, 21 and 22 June in Orlando, Florida, with tutorials given on 19, 23 and 24 June. Most of the conference schedule is already available, with these Perl 6 related highlights:

And of course, I expect there will be quite a few lightning talks with a Perl 6 reference. :-)

YAPC::Europe 2016

The main European Perl Conference (still called YAPC::Europe at this point in time), will be held on 24, 25 and 26 August in Cluj-Napoca, the Silicon Valley of Romania. No schedule available yet, but you can still submit your Perl 6 talk!

Core Developments

Changing IRC Landscape

On-line communication within the Perl 6 World has grown so much that one IRC channel (#perl6 on freenode) has not been enough for a long time. So in the past, the following special purpose alternate channels were added:

Module Installer Alternatives

Historically, panda has been the installer of Perl 6 modules. Since then, a lot has changed. Much of panda’s functionality has been absorbed into the core with the great work that Stefan Seifert has been doing. Tadeusz Sośnierz has been working on redpanda, a cpanm-like client for installing Perl 6 modules. Meanwhile, zef has made inroads as a luxury Perl 6 package manager. Showing again: there is more than one way to do it!

Making Noise

The Perl 6 Noisegang is a group for the promotion and support of audio and music application development in Perl 6. The aim of this group is to provide a focus for people writing sound oriented applications in Perl 6 and to help people find or use the tools and libraries that are already available. If there is stuff out there, they want to bring it to people’s attention. There’s also an IRC channel available on #perl6-noise-gang, and a backlog.

Blog Posts

Zoffix Znet on a roll!

Ecosystem Additions

Winding Down

What I thought would be a small blog post turned out to be quite large in the end. Next week’s Perl 6 Weekly might be delayed again due to travelling. Or not. :-) See you next time!

Weekly changes in and around Perl 6: 2016.19 Summer Strikes

Published by liztormato on 2016-05-09T20:32:03

It seems like many people have taken advantage of the nice weather (well, most of the core developers, at least).

Rakudo Star 2016.04 Installers

Steve Mynott made sure that there is a Windows (64 bit) Installer for Rakudo Star 2016.04. Somehow the fact that he also made a Mac OS X Installer for Rakudo Star 2016.04 was missed by yours truly. Generally, you should look at the How To Get Rakudo page to see which installers are available.


Next week will see OSCON again, this time in Austin, Texas. There will be a Perl 6 related tutorial and a Perl 6 keynote there!

Core developments

Blog Posts

Ecosystem Additions

Winding Down

Having seen too much sun in the past days, yours truly didn’t have the energy to go backlogging for gems. Next week’s Perl 6 Weekly may also be delayed because of travelling to OSCON. Otherwise, everything is ok. :-)

Independent Computing: Stock Tracker

Published by Michael on 2016-05-05T05:06:32

Stock Tracker

Today I completed a simple program to download stock pricing information. This Perl6 program takes a list of stock symbols and uses the Finance WebService API to retrieve the current day's closing stock price for each. This data is then saved into a text file in my clDB database format.

Stock Tracker

Perl6 modules to the rescue

Using the HTTP::Tinyish Perl6 Module, I craft an HTTP GET request to the WebService. In my request I state that I want the Name, stock symbol, and the closing price for the current day to be returned for each stock listed.
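
A hedged sketch of such a request, assuming HTTP::Tinyish offers an HTTP::Tiny-style interface where get returns a hash with success, status, and content keys; the URL and its query parameters below are invented, since the post doesn't show the real ones:

```perl6
use HTTP::Tinyish;

# Invented endpoint and query string -- the real web service URL and
# parameters aren't shown in the post.
my $url = '';

my %res =;
if %res<success> {
    # One CSV-style line per stock: name, symbol, closing price
    .say for %res<content>.lines;
}
else {
    note "Request failed: %res<status>";
}
```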

I use the crontab service on my server to run the program every day at 1330 and grab the closing price for the stocks.
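
That schedule would look something like this in a crontab entry (the script name and path are made up for illustration):

```
# Hypothetical entry: run the stock tracker daily at 13:30
30 13 * * * perl6 /home/michael/stock-tracker.p6
```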

For now I have the output sent to a text file using the clDB data format. At this point I can use another program to analyze the data or even make an SVG chart graphic. This will be a separate program I will work on later.

Stock Tracker

Inside the Code

In this program I use the .trans and .split methods to clean up the data returned from the WebService.
On line 25, @data = $line.trans(' " ' => '', :delete).split(","); takes the current text in the $line variable, transliterates the quote (") characters to nothing, effectively removing them, and then uses the .split method to split the result every time it finds a comma (,) character. The results end up as elements in the @data array, which I then use when I craft the $dataString that gets appended (line 27) to the output file.
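
In isolation, that cleanup step looks like this (the sample line is invented to resemble the service's CSV-style output):

```perl6
# Invented sample resembling one line of the web service's reply
my $line = '"Apple Inc.","AAPL","95.89"';

# Transliterate the quote characters away, then split on commas
my @data = $line.trans('"' => '', :delete).split(",");

say @data[1];   # AAPL
```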

Transliterate is a very useful method provided by the Str class; more info and examples can be found here at the Perl6 documentation page.

Sample output

The results of the entire process can be seen on the page listed below.
View the results of the code here.

Announce: Windows (64 bit) Installer for Rakudo Star 2016.04

Published by Steve Mynott on 2016-05-03T21:05:34

A Windows MSI installer is now available for x86_64 (64bit) platforms (probably Windows 7 or better) and has the JIT (Just in Time compiler) enabled.  This version was built with mingw (from Strawberry Perl) rather than the usual MSVC which may have performance implications. In order to use the module installer “panda” you will also need to install Strawberry Perl and Git for Windows. Also see for Errata notices concerning panda.

Unfortunately the usual second installer targeting x86 (32bit) platforms isn’t currently available.  It’s hoped this will be available in the near future and until then the previous 2016.01 is recommended for x86 (32bit). No 32 bit versions feature the JIT. MSIs are available from

Weekly changes in and around Perl 6: 2016.18 Long Awaited Landings

Published by liztormato on 2016-05-02T20:57:47

This week saw the landing of Stefan Seifert‘s two-month work to make precompilation of modules better maintainable and less resource-hungry. We can now actually use the precomp files generated on installation, so e.g. users no longer have to wait on first use for precompilation of system installed modules. Same is true for developers who no longer have to wait for precompilation of installed modules just because they run their code with -Ilib. This also means that we spread around precomp files much less. So lib/.precomp will usually only contain precomp files for the modules we find in lib/. Finally, this new repo format fixes an issue with packaging modules for Linux distributions.

This week also saw the landing of Leon Timmermans‘ eight-month work on a Perl 6 Test Harness. This allows us to run the spectest/stresstest using a Perl 6 implementation of the TAP protocol. This should allow us to more easily and more rigorously test Perl 6, simply by setting the environment variable HARNESS_TYPE to 6. Unfortunately for those of us on OS X, this does not work yet: this seems related to a bug with awaiting a Promise. Anyway, this is dogfooding at its best! And a step closer to not needing Perl 5 to make and test Perl 6.

QA Hackathon in Rugby

This year’s QA Hackathon took place in Rugby, England. There were quite some accomplishments. I’ve taken the liberty of extracting the Perl 6 related ones for your convenience:

Blog Posts

Other Core Developments

Not too much else happened: some minor bugs were fixed.

Ecosystem Additions

More than one distribution per day added in the past week. M. Hainsworth created a very nice Ecosystem Citation Index. The additions for this week are:

Winding Down

Some loose ends from last week basically made me so tired, that I will need to postpone looking for gems in the backlog by yet another week.

6guts: Refactoring and torture

Published by jnthnwrthngtn on 2016-04-30T15:14:59

This post covers the previous two weeks of my Perl 6 grant work. Last time I wrote here, I plotted changes to call frames in MoarVM. Or, as the wonderful Perl Weekly put it, provided way too much information about call frames. :-)

That there is so much to say about call frames reflects their rather central role. They play a big part in supporting numerous language features (function calls, recursion, closures, continuations, dynamic variables, and pseudo-packages like OUTER and CALLER). The garbage collector scans them to find live objects. Both the interpreter and JIT compiler have to relate to them in various ways. The dynamic optimizer performs transforms over them when doing OSR (On Stack Replacement) and uninlining (the deoptimization that enables us to speculatively perform inlining optimizations).

All of which makes a major refactor of call frames a rather scary prospect. While they have picked up additional bits of state as MoarVM has evolved, they have been reference counted since, well, the day I first implemented call frames, which means “before MoarVM could even run any code”. Being reference counted, rather than handled by the garbage collector, gave them a property that is easy to rely on, and rather less easy to discover reliance on: that they never move over their lifetime.

I like to move it move it

Wait, what, move? Why would they even move?

Here’s a little Perl 6 program to illustrate. It declares a class, makes an instance of it, prints its memory address, does a load of further, throwaway, memory allocations, and then again prints the address of the object instance we made.

class A { }
my $obj =;
say $obj.WHERE;
for ^10000 {; }
say $obj.WHERE;

When I ran this locally just now, I got:


If you get the same number twice, just make the 10000 something bigger. What’s interesting to note here is that an object’s location in memory can change over time. This is a consequence of MoarVM’s garbage collector, which is both generational and manages its young generation using semi-space copying. (This is nothing special to MoarVM; many modern VMs do it.)

Being able to move objects relies on being able to find and update all of the references to them. And, since MoarVM is written in C, that includes those references on the C stack. Consider this bit of code, which is the (general, unoptimized) path for boxing strings:

MVMObject * MVM_repr_box_str(MVMThreadContext *tc, MVMObject *type, MVMString *val) {
    MVMObject *res;
    MVMROOT(tc, val, {
        res = MVM_repr_alloc_init(tc, type);
        MVM_repr_set_str(tc, res, val);
    });
    return res;
}

It receives val, which is a string to box. Note that strings are garbage-collectable objects in MoarVM, and so may move. It then allocates a box of the specified type (for example, Perl 6’s Str), and puts the string inside of it. Since MVM_repr_alloc_init allocates an object, it may trigger garbage collection. And that in turn may move the object pointed to by val – meaning that the val pointer needs updating. The MVMROOT macro is used in order to add the memory address of val on the C stack to the set of roots that the GC considers and updates, thus ensuring that even if the allocation of the box triggers garbage collection, this code won’t end up with an old val pointer.

Coping with moving frames

Last time, I discussed how reference counting could be eliminated in favor of a “linear” call stack for frames that don’t escape (that is, become heap referenced), and promoting those that do escape to being garbage collected. As an intermediate step there, I’ve been working to make all frames GC-managed. This means that frames can move, and that they are part of the generational scheme. Therefore, every piece of code that both holds a reference to a frame and takes a code path that can allocate would need updating with MVMROOT. Further, all assignments of frames into other objects, and other objects into frames, would need write barriers (aside from the working area, which is handled specially).

In part, this just needs a lot of care. Going through the places frames show up, updating things as needed, and so forth. But even then, would that really be enough to be confident things weren’t broken? After all, my refactor was changing the rules for one of the most important data structures in the VM.

Of course, building NQP and Rakudo and passing the spectest suite is one good way to exercise MoarVM after the changes. Doing this showed up some issues, which I fixed. But even that doesn’t offer a huge amount of confidence. A simple script, or a short test, might trigger no garbage collections at all, or just the odd one. And the collections are highly likely to be triggered on the most common code paths in the VM.

GC torture testing

When faced with something scary, a surprisingly good approach is to tackle it by doing it really often. For example, are software releases scary? If yes, then do time-based releases every month, and with time they’ll become automatic and boring enough not to be scary. Is deploying changes to a production system scary? If yes, then adopt continuous delivery, deploying lots of times in a day and with easy rollback if things don’t go well.

Garbage collection is pretty scary. I mean, we take this world of objects the program has created, move them around, throw a bunch of them out, and then have the program continue running as if nothing happened. So…let’s try doing it really often!

This is exactly what GC torture testing involves.

--- a/src/gc/collect.h
+++ b/src/gc/collect.h
@@ -1,7 +1,7 @@
 /* How big is the nursery area? Note that since it's semi-space copying, we
  * actually have double this amount allocated. Also it is per thread. (In
  * the future, we'll make this adaptive rather than a constant.) */
-#define MVM_NURSERY_SIZE 4194304
+#define MVM_NURSERY_SIZE 13000

Rather than doing a collection every 4MB worth of allocations, let’s do one every 13KB worth of allocations! That’s around 320 times more often. Combined with a few debugging checks enabled, to catch references to objects that are out of date, bugs resulting from missing MVMROOTs and write barriers can be driven out of their hiding places into the light of day.

It’s a rather effective technique. It’s also a very time-consuming one. The NQP and Rakudo builds easily take an hour between them, and running spectest this way takes over 12 hours. It’s cheap compared to shipping a MoarVM with new and nasty bugs that waste a bunch of people’s time, of course!

It’s been a while since we did such a torture test. I’ve decided we should do them more often. It found issues. So far, from the spectest run torture test results, I’ve fixed 9 bugs (I didn’t go back and count those discovered while building NQP and Rakudo). What’s interesting is that of the 9, only 3 of them were clearly attributable to my refactors, one was potentially related to them, and 5 were bugs that must have been around a good while. One of the bugs that did relate to the frames changes caused deadlocks in multi-threaded code quite reliably under torture testing, but would have likely caused them very rarely under normal use (and so been extremely frustrating to reproduce and track down if it made it into the wild). 2 of the fixed non-frame bugs exclusively affected multi-threaded programs and would have doomed them. One was in the CUnion representation, and probably was the cause of some previously unresolved occasional failures of the NativeCall union tests.

What next?

By this point, I’m reasonably confident that regressions due to the first step of the frame changes have been shaken out. The GC torture testing has, however, shed light on some other issues that will want addressing in the near future.

I intend to put those aside for a little while, and complete the frame changes, introducing the linear stack. Compared with the first step, this feels like a lower risk change, in that mistakes should be a lot easier and cheaper to detect. I’d like to try and land this in the next week or so, in order that it can get some real-world testing before it makes it into a MoarVM and Rakudo release.

Once that’s out the way, I’ll be returning to other issues turned up in GC torture testing. I’d also like to look into a way to be able to run it automatically and regularly (once a week, perhaps). It’s a good bit too intensive to be able to farm it out to Travis. The sensible solution is probably to do it in the cloud, on some elastic compute thing where it just uses a machine once a week for a day or so. The silly but fun way is to build a Raspberry Pi cluster on my desk, and hack up something to distribute the tests across them. :-)

Strangely Consistent: Trinity

Published by Carl Mäsak

Lately, driven by a real itch I wanted to scratch, I finally wrote my first Perl 6 web app since the November wiki engine. (That was back in 2008. Very early days. I distinctly recall Rakudo didn't have a does-file-exist feature yet.)

Because I'm me, naturally the web app is a game. A board game. The details don't matter for this article — if you're curious, go check out the README. If you're not, I forgive you. It's just a game with a board and stones of various colors.

Here's the order in which I wrote the web app. The first thing I made work was a model, literally the game itself.

image of the model in the middle

This is the core, the heart of the application. A Games::Nex instance models the progress of a game, and the rules of Nex are encoded in this type's "algebra". Naturally, this was developed test-first, because it would be reckless not to. This is the essence of masakism: "testing gets you far" and "keep it small and simple". Eliminate all other factors, and the one which remains must be the model at the core.
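
The test-first approach can be sketched with the Test module; the Board class below is a toy stand-in, since the post doesn't show the actual Games::Nex API:

```perl6
use Test;

# Toy stand-in for the real model; method names here are illustrative only.
class Board {
    has %!stones;
    method place($row, $col, $color) { %!stones{"$row;$col"} = $color }
    method stone-at($row, $col)      { %!stones{"$row;$col"}          }
}

my $board =;
$, 4, 'Red');
is  $board.stone-at(3, 4), 'Red', 'a placed stone can be read back';
nok $board.stone-at(0, 0),        'an empty cell holds no stone';
done-testing;
```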

The core domain is nice, but it doesn't have any face. By design, it's just mind stuff. I wanted it to be a web app, so the next reasonable step was to add a Bailador app that could show the board and allow moves to be tried on it.

image of the app+model in the middle

Of course, when you run it — it's a web app — the result turns up in your browser, as you'd expect.

image of browser <--- app+model

And when you make a move by interacting with the web page, the browser goes "Hey, app! Yo! Do some stuff!"

image of browser ---> app+model

None of this is news to you, I'm sure. My point so far though is that we're now up to two environments. Two separate runloops. It has to be that way, because — in every single case except the crazy one where I am you — the server and the client will be two different computers. The GET and POST requests and their responses are simply client and server shouting to each other across the network.

Now my app had a face but no long-term memory. Every move was against an empty game board; you made it, and all traces of it disappeared into the great data limbo.

Time to add (you guessed it) a database backend:

image of browser <---> app+model <---> db

(Postgres is great, by the way. Highly recommended. Makes me feel happy just thinking about it. Mmm, Postgres.)

The database also sits in its own, separate runloop. Maybe — probably — on its own machine, too. So we have three environments.

It was at this point I felt that something was... well not wrong exactly, but odd...

<masak> I was struck by not just how wide apart the database world is from
        the server backend world, but also how wide apart the [browser]
        world is from the server backend world
<masak> you're writing three separate things, and making them interoperate
<masak> I almost feel like writing a blog post about that

Coke insightfully quipped:

<[Coke]> this reads like you discovered a 3-tier web app.

And yes, that is the name for it. Freakin' three-tier. Love those bloody tiers. Like a hamburger, but instead of meat, you got a tier, and not in bread as usual, but between two more tiers!

I have a subtle point I want to make with this post, and we're now getting to it. I'm well aware of how good it is to separate these three tiers. I teach a software architecture course several times yearly, and we've made a point to be clear about that: the model in the middle should be separated, nay, insulated, from both UI and DB. Why? Because UI layers come and go. First it's a desktop client, some years later it's a responsive Web 2.0 snazzle-pot of janky modernity, and surely in a few more decades we'll all be on VR and telepathy. Throughout all this, the model can stay the same, unfazed. Ditto the database layer; you go from RDBMS to NoSQL back to PffyeahSQL to an in-memory database to post-it-notes on the side of your screen. The model abides.

And yet... I want to make sure I'm building a single monolithic system, not three systems which just happen to talk to each other. That's my thesis today: even in a properly factored three-tier system, it's one system. If we don't reflect that in the architecture, we have problems.

If you have three tiers I feel bad for you son / I got 99 problems but message passing ain't one

So, just to be abundantly clear: the three tiers are good — besides being a near-necessity. The web is practically dripping with laud and praise of three-tier:

By segregating an application into tiers, developers acquire the option of modifying or adding a specific layer, instead of reworking the entire application. — Wikipedia

During an application's life cycle, the three-tier approach provides benefits such as reusability, flexibility, manageability, maintainability, and scalability. — Windows Dev Center

Three-tier architecture allows any one of the three tiers to be upgraded or replaced independently. — Technopedia

It's against this unisonal chorus of acclaim that I'm feeling that some kind of balancing force is missing and never talked about. Why am I writing three things again? I'm down in the database, I'm writing code in my model and my app, and I'm away writing front-end JavaScript, and they are basically unrelated. Or rather, they are only related because I tend to remember that they should be.

I think it's especially funny that I didn't really choose the three-tier model along the way. It was forced upon me: the browser is a separate thing from the server, and the database engine is a separate thing from the app. I didn't make it so! Because I didn't make that design decision, it's also not something to be proud of afterwards. "Ah, look how nice I made it." When everyone keeps repeating how good it is with a software architecture that's not a choice but a fact of the territory, it sounds a bit like this:

Potholes in the road allow us to slow down and drive more carefully, making our roads safer. — no-one, ever

Maybe we can close our eyes and imagine a parallel universe where some enlightened despot has given us the ultimate language Trinity, which encompasses all three tiers, and unites all concerns into one single conceptual runloop, one single syntax, and one single monolithic application. Sure, we still do DB and frontend stuff, but it's all the same code base, and incidentally it's so devoid of incidental complexity, that pausing to think about it will invariably make us slightly misty-eyed with gratitude.

Would that work? I honestly don't know. It might. We don't live in that universe, but I sure wouldn't mind visiting, just to see what it's like.

One very good argument that I will make at this point, before someone else makes it for me, is that the three tiers actually handle three very different concerns. Basically:

These differences go deep, to the point where each tier has a quite specialized/different view of the world. To wit:

But it doesn't end there. The three tiers also have very typical, well-defined relations to each other.

And I haven't even started talking about synchronous/asynchronous calls, timeouts, retries, optimistic UIs, user experience, contention, conflicts, events pushed from the server, message duplication, and error messages.

I guess my point is that however Trinity looks, it has a lot on its plate. I mean, it's wonderful that a Trinity system is one single application, and things that ought to be close to each other in code can be... but there's still a lot of concerns. Sometimes the abstraction leaks. You can practically see that server-code block glare in paranoid suspicion at that client-code block. And, while Trinity of course has something much more unified than SQL syntax, sometimes you'd feel a little bump in the floor when transitioning from model code to DB code.

But I could imagine liking Trinity a lot. Instead of writing one page Perl 6 and two pages JavaScript, I'd just implement the code once in Trinity. Instead of describing the model once for the server and once for the database, I'd just describe it once. Parallel universe, color me jealous.

But back to our world with three inevitable tiers. I've been thinking about what would make me feel better about the whole ui-app-db impedance mismatch. And, I haven't tried it out yet, but I've decided I want to borrow a leaf from 007 development. In 007, basically when we've found things that might go wrong during development, we've turned those things into tests. These are not testing the behavior of the system, but the source code itself. Call them consistency tests. We really like them; they make us feel ridiculous for making silly consistency mistakes, but also grateful that the test suite caught them before we pushed. (And if we accidentally push, then TravisCI notifies us that the branch is failing.)

What needs testing? As so often, the interfaces. In this case, the surface areas between frontend/app, and between app/database.

image of browser <-|-> app+model <-|-> db

Here's what I'm imagining. The frontend and app are only allowed to talk to each other in certain pre-approved ways. The tests make sure that each source file doesn't color outside of the line. (Need to make that code analysis robust enough.) Ditto with the app and database.

I figure I could use something pre-existing like Swagger for the frontend/app interaction. For the app/db interaction, actually using the database schema itself as the canonical source of truth seems like a decent idea.

Maybe it makes sense to have both static tests (which check the source code), and runtime tests (which mock the appropriate components and check that the results match in practice). Yes, that sounds about robust enough. (I'm a little unsure how to do this with the client code. Will I need to run it headless, using PhantomJS? Will that be enough? I might need to refactor just to expose the right kind of information for the tests.)

That would give me confidence that I'm not breaking anything when I'm refactoring my game. And it might make me feel a little bit better about the Trinity language being in a parallel universe and not in this one.

I might write a follow-up post after I've tried this out.

Announce: Mac OS X Installer for Rakudo Star 2016.04

Published by Steve Mynott on 2016-04-27T14:13:45

A Mac OS X installer is now available. This installer has the “.dmg” file extension and is available from

Weekly changes in and around Perl 6: 2016.17 Making our Introductions

Published by liztormato on 2016-04-26T11:28:03

Ramiro Encinas has translated Naoum Hankache‘s excellent Introduction to Perl 6 to Español. And just before the closing of this Perl 6 Weekly, it turns out Itsuki Toyota has been working on a version in Japanese. This now means the introduction is available in 5 languages: English, French, German, Spanish and Japanese. I guess I really should get working on the Dutch version now :-).

Rakudo 2016.04 Releases

The past week saw the 2016.04 Rakudo compiler release, by Will Coleda yet again! Shortly after, Steve Mynott released the 2016.04 Rakudo Star release.

Blog Posts

Ecosystem Additions

Winding Down

Again, not a lot of time to do the Perl 6 Weekly, this time mostly because of attending the Perl QA Hackathon in Rugby, England. Will try to catch up on things I missed in the past weeks in the next issue! See you then!

Announce: Rakudo Star Release 2016.04

Published by Steve Mynott on 2016-04-25T23:36:56

On behalf of the Rakudo and Perl 6 development teams, I’m honored to announce the April 2016 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the April 2016 release is available from

This is the second post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”. This Star release includes release 2016.04 of the Rakudo Perl 6 compiler, version 2016.04 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

Some of the new compiler features since the last Rakudo Star release include:

Notable changes in modules shipped with Rakudo Star:

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

There is an online resource at that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at

See for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see, ask on the mailing list, or join us on IRC #perl6 on freenode.

6guts: Framing the problem

Published by jnthnwrthngtn on 2016-04-21T17:11:49

In this post I’ll be talking a lot about call frames, also known as invocation records. Just to be clear about what they are, consider a sub:

sub mean(@values) {
    @values.sum / @values
}

Whenever we call mean, we create a call frame. This holds the storage for the incoming @values parameter. It also holds some temporary storage we use in executing the sub, holding, for example, the sum method object we get back when looking up the method, and the result of calling @values.sum, which we then pass to infix:</>. Call frames also record outer and caller references (so we can resolve lexical and dynamic variables), the place to store the return value and go to on return, and other bits. It’s important to note that call frames are not 1:1 with subs/methods/blocks. Perhaps the best way to understand why is to consider a recursive sub:

sub fac($n) {
    $n <= 1
        ?? 1
        !! $n * fac($n - 1)
}

There’s one fac sub but we need a call frame for each invocation of (that is, call to) fac, since the $n parameter will vary in each call. (Threads are another example where you’re “in” a sub multiple times at the same time.)

All complex software systems evolve from simple systems. MoarVM is no exception. Back when MoarVM started out, I knew I wanted to have invocation be cheap, and call frames be fairly lightweight. I also didn’t want them to be GC-allocated. I figured that code sat in a loop, only using native types and only calling things involving native types, should not create garbage that needed collecting. All good goals.

Fast forward a few years, and where are we? Let’s start out with the easy one to assess: frames aren’t GC-allocated. So that’s good, right? Well, sure, in that I got the natives property that I was after. However, since things like closures and continuations exist, not to mention that you can get a first-class reference to a call frame and traverse the outer/caller chain, the lifetime of frames is interesting. They most certainly don’t always just go away at the point of return. Therefore, they need to have their memory managed in some way. I went with reference counts, figuring that since we’d only need to twiddle them fairly occasionally, it’d be fairly OK. Trouble is, thanks to MoarVM supporting concurrent threads of execution, those counts need to be incremented and decremented using atomic operations. Those are CPU native, but they’re still a bit costly (more so on some CPUs than others).

There’s another, more hidden, cost, however – one I didn’t really see coming. MoarVM has a generational garbage collector, as discussed in my previous post. But frames are not garbage collectable objects. They’re managed by reference counts. So what happens when a reference counted frame is referenced by a second generation object? Well, there’s no risk of the frames going away too early; the reference count won’t be decremented until the gen2 object itself is collected. The problem is about the objects the frame references. Frames, not being garbage collectable, don’t have write barriers applied on binds into them. This means that they can come at any time to point to nursery objects. We solved this by keeping all objects referencing frames in the inter-generational root set. This is perfectly safe. Unfortunately, it also greatly increases the cost of garbage collection for programs that build up large numbers of closures in memory and keep them around. Of course, since write barriers are cheap but not free, we get a performance win on all programs by not having to apply them to writes to working registers or lexical.

So, how about invocation cost? Is invocation cheap? Well, first of all let’s turn off inlining:


And measure 10 million invocations passing/receiving one argument using Perl 5, NQP, and Rakudo. Perl 5 does them in 2.85s. NQP comes out a little ahead, at 2.45s. Rakudo strolls through them in an altogether too leisurely 6.14s. (Turn inlining back on, and Rakudo manages it in 3.39s.) So, if NQP is already ahead, is MoarVM really so bad? Well, it could certainly be better. On an idealized 3GHz CPU, each invocation is costing around 735 CPU cycles (2.45s at 3 billion cycles per second, divided across 10 million calls). That’s pricey. The other issue here is that just matching Perl 5 on invocation speed isn’t really enough, because tons of things that aren’t invocations in Perl 5 actually are in Perl 6 (like, every array and hash index). In a “Perl 6 is implemented in Perl 6” world, we need to squeeze a good bit more out of invocation performance.

And finally, what about size? An MVMFrame comes at a cost of 296 bytes. It points to a chunk of working space together with a lexical environment (both arrays). Every single closure we take also pays that fixed 296 byte cost (and, of course, the cost of the lexical environment storage, since that’s what we actually take closures for). Again, not staggeringly huge, but it adds up very quickly.

These are all areas that need improvement. In fact, they make up two of the entries in the performance section of the proposal for the grant I’m doing this work under. So, I decided it was time to start thinking about how I’ll address them.

Some measurements

I was curious how many frames end up referenced by garbage collectable objects against how many never end up in this situation. So, I quickly patched MoarVM to keep track of if a frame ever came to be referenced by a GC-able object:

diff --git a/src/core/frame.c b/src/core/frame.c
index ca1a4d2..f392aca 100644
--- a/src/core/frame.c
+++ b/src/core/frame.c
@@ -114,7 +114,10 @@ MVMFrame * MVM_frame_dec_ref(MVMThreadContext *tc, MVMFrame *frame) {
      * to zero, so we look for 1 here. */
     while (MVM_decr(&frame->ref_count) == 1) {
         MVMFrame *outer_to_decr = frame->outer;
+if (frame->refd_by_object)
+    tc->instance->refd_frames++;
+else
+    tc->instance->non_refd_frames++;
         /* If there's a caller pointer, decrement that. */
         if (frame->caller)
             frame->caller = MVM_frame_dec_ref(tc, frame->caller);
diff --git a/src/core/instance.h b/src/core/instance.h
index b14f11d..4f61000 100644
--- a/src/core/instance.h
+++ b/src/core/instance.h
@@ -365,6 +365,9 @@ struct MVMInstance {

     /* Cached backend config hash. */
     MVMObject *cached_backend_config;
+MVMuint64 refd_frames;
+MVMuint64 non_refd_frames;

 /* Returns a true value if we have created user threads (and so are running a

diff --git a/src/main.c b/src/main.c
index 5458912..1df4fe3 100644
--- a/src/main.c
+++ b/src/main.c
@@ -189,7 +189,9 @@ int main(int argc, char *argv[])

     if (dump) MVM_vm_dump_file(instance, input_file);
     else MVM_vm_run_file(instance, input_file);
+printf("Ref'd frames: %d\nNon-ref'd frames: %d\n",
+    instance->refd_frames,
+    instance->non_refd_frames);
     if (full_cleanup) {
         return EXIT_SUCCESS;

And measured a few things (the names of the latter ones are benchmark names from perl6-bench):

Measured                    Ref'd       Non-ref'd       % Ref'd
========                    =====       =========       =======
NQP startup                 0           5259            0.0%
NQP regex tests             28065       1682655         1.6%
Compile Perl 6 actions      115092      6100770         1.7%
Compile Perl 6 grammar      130716      5451120         2.3%
Compile CORE.setting        2065214     55771097        3.8%
Perl 6 startup              35          12822           0.3%
Compiling Test.pm6          39639       860474          4.4%
Compiling NativeCall.pm6    145426      1887682         7.2%
while_array_set             993701      6024920         14.1%
while_hash_set              1804        2024016         0.1%
for_assign                  1654        1020831         0.2%
for_concat_2                1743        2023589         0.1%
split_string_regex          8992750     19089026        32.0%
create_and_iterate_hash_kv  14990870    40027814        27.2%
parse_json                  10660068    42364909        20.1%
rc-forest-fire              3740096     16202368        18.8%
rc-mandelbrot               89989       5523439         1.6%
rc-man-or-boy-test          791961      7091381         10.0%

What can we infer from this? First of all, most NQP programs have at most just a few percent of their frames referenced by GC-able objects. With the Perl 6 benchmarks, it’s all over the map, with split_string_regex being the “worst” case. NQP’s optimizer is much better at lexical-to-local lowering, and at flattening away scopes that we don’t really need. In Rakudo, we’re pretty weak at that. Clearly, some more work on this area could benefit Rakudo (and yes, it’s also on the list of things to do under my grant).

Secondly, since – even in the worst cases – the majority of frames never get themselves tied up with any “interesting” situations that causes them to become GC-referenced, a strategy that handles them differently – and hopefully far more efficiently – would give us a win.

What GC-able things reference frames?

It was fairly easy to grep through the MoarVM source and make a list. I did so to help me think through the cases:

It’s also interesting to note that a frame only ever “escapes” such that it can be touched by another thread if it becomes referenced by a GC-able object.

What makes frames take up space?

Next, I decided to go through the MVMFrame data structure and see where the space is going, and what options might exist for saving that space. What follows is an analysis of all the fields in an MVMFrame.

/* The thread that is executing, or executed, this frame. */
MVMThreadContext *tc;

Interestingly, this one gets cleared after a certain point in the frame’s life, except if it’s captured in a continuation. Exception handling uses it to know if the frame is still on the call stack, which is interesting in various cases. GC marking uses it to know if it should mark ->work (see below).

Interestingly, nothing seems to care overly much at the moment that it points to a particular thread context; they all just use it as a flag. So, it’s certainly a candidate for removal. It’s also interesting to note that in every case where a frame is not referenced by an object, it is alive solely by being in a thread’s “call stack” – that is, the call chain from following the ->caller pointer from the currently executing frame of a thread. So, the flag will only matter for frames that are GC-referenced.

/* The environment for this frame, which lives beyond its execution.
* Has space for, for instance, lexicals. */
MVMRegister *env;

Relevant for frames in whatever state.

/* The temporary work space for this frame. After a call is over, this
* can be freed up. Must be NULLed out when this happens. */
MVMRegister *work;

Relevant for frames that are still executing, or that are captured by a continuation. Cross-cuts whether they are GC-referenced.

/* The args buffer. Actually a pointer into an area inside of *work, to
* decrease number of allocations. */
MVMRegister *args;

Possibly could go away through a level of indirection, but it’s performance sensitive. Used together with…

/* Callsite that indicates how the current args buffer is being used, if
* it is. */
MVMCallsite *cur_args_callsite;

…this one.

/* The outer frame, thus forming the static chain. */
MVMFrame *outer;

Pretty much everything has an outer.

/* The caller frame, thus forming the dynamic chain. */
MVMFrame *caller;

Pretty much everything has a caller too.

/* The static frame information. Holds all we statically know about
* this kind of frame, including information needed to GC-trace it. */
MVMStaticFrame *static_info;

As you might guess, this is pretty important and useful. However, it’s also possible to obtain it – at the cost of a level of indirection – through the ->code_ref below. Would need to measure carefully, since it’d increase the cost of things like lexical lookups from outer frames (and, once we get better at optimizing, that will be “most of them”).

/* The code ref object for this frame. */
MVMObject *code_ref;

The particular closure we were invoked as. Not something we can obviously lose, and needed for the lifetime of the frame in general.

/* Parameters received by this frame. */
MVMArgProcContext params;

Argument processing context. Every frame uses it to process its arguments. It’s only useful while ->work is active, however, and so could be allocated as a part of that instead, which would reduce the cost of closures.

/* Reference count for the frame. */
AO_t ref_count;

Can go away provided we stop reference counting frames.

/* Is the frame referenced by a garbage-collectable object? */
MVMint32 refd_by_object;

Could also go away provided we stop reference counting frames and have some scheme for optimizing the common, non-referenced case.

/* Address of the next op to execute if we return to this frame. */
MVMuint8 *return_address;

/* The register we should store the return value in, if any. */
MVMRegister *return_value;

/* The type of return value that is expected. */
MVMReturnType return_type;

/* The 'entry label' is a sort of indirect return address
* for the JIT */
void * jit_entry_label;

These four are only used when the frame is currently on the call stack, or may be re-instated onto the call stack by a continuation being invoked. Could also live with ->work, thus making closures cheaper.

/* If we want to invoke a special handler upon a return to this
* frame, this function pointer is set. */
MVMSpecialReturn special_return;

/* If we want to invoke a special handler upon unwinding past a
* frame, this function pointer is set. */
MVMSpecialReturn special_unwind;

/* Data slot for the special return handler function. */
void *special_return_data;

/* Flag for if special_return_data need to be GC marked. */
MVMSpecialReturnDataMark mark_special_return_data;

Used relatively occasionally (and the more common uses are candidates for spesh, the dynamic optimizer, to optimize out anyway). A candidate for hanging off an “extra stuff” pointer in a frame. Also, only used when a frame is on the call stack, with the usual continuation caveat.

/* Linked list of any continuation tags we have. */
MVMContinuationTag *continuation_tags;

Used if this frame has been tagged as a possible continuation “base” frame. Only relevant if that actually happens (which is quite rare in the scheme of things), and can only happen when a frame is on the call stack. A candidate for similar treatment to the special return stuff.

/* Linked MVMContext object, so we can track the
* serialization context and such. */
/* note: used atomically */
MVMObject *context_object;

This is used when a context goes first-class. Thus, it implies the frame is referenced by at least one GC-able object (in fact, this points to said object). That’s fairly rare. It can happen independently of whether the frame is currently executing (so, unrelated to ->work lifetime).

/* Effective bytecode for the frame (either the original bytecode or a
* specialization of it). */
MVMuint8 *effective_bytecode;

/* Effective set of frame handlers (to go with the effective bytecode). */
MVMFrameHandler *effective_handlers;

/* Effective set of spesh slots, if any. */
MVMCollectable **effective_spesh_slots;

/* The spesh candidate information, if we're in one. */
MVMSpeshCandidate *spesh_cand;

These are all related to running optimized/specialized code. Only interesting for frames currently on the call stack or captured in a continuation (so, ->work lifetime once again).

/* Effective set of spesh logging slots, if any. */
MVMCollectable **spesh_log_slots;

/* If we're in a logging spesh run, the index to log at in this
* invocation. -1 if we're not in a logging spesh run, junk if no
* spesh_cand is set in this frame at all. */
MVMint8 spesh_log_idx;

/* On Stack Replacement iteration counter; incremented in loops, and will
* trigger if the limit is hit. */
MVMuint8 osr_counter;

These three play a part in dynamic optimization too, though more in the stage where we’re gathering information. Again, they have ->work lifetime. The top one may well go away in future optimizer changes, so it’s not worth worrying over too much now.

/* GC run sequence number that we last saw this frame during. */
AO_t gc_seq_number;

This one is certainly a candidate for going away, post-refactoring. It serves as the equivalent of a “mark bit” when doing GC.

/* Address of the last op executed that threw an exception; used just
* for error reporting. */
MVMuint8 *throw_address;

May be something we can move inside of exception objects, and have them pay for it, not every frame. Worth looking in to.

/* Cache for dynlex lookup; if the name is non-null, the cache is valid
* and the register can be accessed directly to find the contextual. */
MVMString   *dynlex_cache_name;
MVMRegister *dynlex_cache_reg;
MVMuint16    dynlex_cache_type;

These also have ->work lifetime. Give a huge speed-up on dynlex access, so (aside from re-designing that) they can stay.

/* The allocated work/env sizes. */
MVMuint16 allocd_work;
MVMuint16 allocd_env;

These exist primarily because we allocate work and env using the fixed size allocator, and so we need the sizes to free the memory.

/* Flags that the caller chain should be kept in place after return or
* unwind; used to make sure we can get a backtrace after an exception. */
MVMuint8 keep_caller;

/* Flags that the frame has been captured in a continuation, and as
* such we should keep everything in place for multiple invocations. */
MVMuint8 in_continuation;

/* Assorted frame flags. */
MVMuint8 flags;

It appears the top two could be nicely folded into flags. Also, the flags may only be relevant for currently executing frames, or those captured in a continuation, so this lot is a candidate to move to something with ->work lifetime.


Here are some things that stand out to me, and that point the way to an alternate design.

  1. An MVMFrame presently carries a bunch of things in it that aren’t relevant unless the frame is either currently on a thread’s call stack or captured in a continuation.
  2. This is an orthogonal axis to whether the frame is referenced by something that is garbage-collectable.
  3. It’s further orthogonal to one of a number of relatively rare things that can happen and need storage in the frame.
  4. Frames that are never referenced by a garbage collectable object will only ever have a reference count of 1, because they will only be alive by virtue of being either the currently executing frame of a thread, or in its caller chain.
  5. Frames only become referenced by something garbage collectable in cases where we’d end up with some other garbage-collectable allocation anyway. For example, in the closure case, we allocate the code-ref that points to the referenced outer frame.
  6. Let’s assume we were to allocate all frames using the GC, and consider the analysis that would let us know when we are able to avoid those allocations. The analysis needed would be escape analysis.

A new approach: the big picture

Taking these into account, I arrived at a way forward that should, I hope, address most of the issues at hand.

Every thread will have a chunk of memory that we’ll refer to as its “call stack”. Every new frame created during normal program execution will be allocated by making space for it, including its ->work and ->env, on this stack. This will need:

Should this frame ever become referenced by a garbage collectable object, then we will GC-allocate a frame on the garbage-collected heap – as a totally normal garbage-collectable object. The frame state will be copied into this. The work space and environment will also be allocated from the fixed-size allocator, and the data migrated there.

Since this frame is now garbage-collectable, we have to check its ->caller to see if it’s on the thread-local stack, or already been promoted to the heap. If the former, we repeat the above process for it too. This is in order to uphold the key invariant in this design: the thread-local stack may point to things in the garbage-collectable heap, but never vice-versa.

This means the reference counting and its manipulation goes away entirely, and that frames that are heap-promoted become subject to the usual generational rules. Frames that would never be heap-referenced never end up on the heap, don’t add to GC pressure, and can be cleaned up immediately and cheaply.

There are some details to care about, of course. Since generational collection involves write barriers, then binds into frames on the garbage-collectable heap will also be subject to write barriers. Is that OK? There are two cases to consider.

  1. Binding of lexicals. Since most lexicals in Perl 6 point to a Scalar, Array, or Hash in my declarations, or point directly to a read-only object if parameters, this is relatively rare (of course, write barriers apply to the Scalar itself). In NQP, loads of lexicals are lowered to locals already, and we’ll do some more of that in Rakudo too, making it rarer still. Long story short, we can afford write barriers on lexical binds.
  2. Binding of stuff in ->work, which basically means every write into the register set of the interpreter. This, we cannot afford to barrier. However, there are only two cases where a frame is promoted to the heap and has ->work. One case is when it’s still executing, and so in the call chain of a thread. In this case, we can take care to always walk the objects in ->work by simply following the call chain. The second case is when a continuation is taken. But here, there are no binds to registers until the continuation is invoked again – at which point things are back in a thread’s call chain.

Refactoring towards it

The thing that makes this a somewhat scary piece of work is that, in making call frames potentially collectable objects, we break an assumption that has been there since week 1 of MoarVM’s development: that call frames never move. To maximize the chances of discovering problems with this refactor, I decided that step 1 would be to always allocate every single call frame on the heap. Only when that is working would I move on to optimizing away most of those heap allocations by adding the thread-local call stack.

MoarVM currently has 3 kinds of collectable:

So, I added a fourth: call frames. As a result, MVMFrame gains an MVMCollectable at the start of the data structure – which will be present whether it’s stack or heap allocated. This will start out zeroed when a frame is born on the call stack. This does two nice things: it gives us a way to know if a frame is GC-able or not, and also means the write barrier – without modification – will do the right thing on both stack and heap frames.

There were two more easy things to do. First was to add a function to allocate a heap frame. Second was to factor out frame destruction from reference decrement, since the latter was going away.

Beyond that, there was nothing for it besides diving in, breaking the world, and then trying to put it back together again. I got a good start towards it – but the conclusion of this first step will have to wait for next week’s installment! See you then.

6guts: Heap heap hooray!

Published by jnthnwrthngtn on 2016-04-15T11:55:19

Last week, I finally hunted down and fixed the EVAL memory leak, with the help of the heap snapshot analyzer I wrote about recently. I also hunted down a hang in parallel runs of the Perl 6 specification test suite that showed up recently – but only on Windows – and fixed that too.

Before we begin: a generational GC primer

A few posts ago I talked a bit about how MoarVM’s garbage collector works – and hand-waved on just about all of the interesting details. Some folks on IRC expressed a bit of curiosity about those details, so I figured I’d explain some of them as they come up in my performance/reliability work. In this post I’ll talk about generational collection, because it plays a part in the EVAL leak story.

Recall that the essence of a tracing garbage collector is that we start with a set of roots: global symbols, the current call frame that each thread is executing, thread-local symbols, and so forth. We “mark” each object we find in these roots as “alive”, and stick them onto a todo list. We work our way through this todo list, asking each object what it references and putting those onto the todo list also. With a little care to never revisit objects we already considered, we terminate having marked all reachable objects alive. The memory associated with unmarked objects can then be freed.

You might worry that if we have millions of objects in memory, this could be a rather time-consuming process to do again and again. You’d be right. Worse, memory access performance depends heavily on the CPU caches getting high hit rates. When we need to walk huge numbers of objects, we end up getting loads of CPU cache misses, and have to spend time fetching objects from main memory. (To give you an idea of the numbers: a level 1 cache hit means memory access in several CPU cycles, while having to go to main memory can easily cost a couple of hundred cycles or worse).

So, how might we do better? The key insight behind generational GC, often known as the generational hypothesis, is that in most programs objects are either very short-lived (surviving zero or one garbage collections) or long-lived (perhaps staying around for the lifetime of the entire program). Therefore, it’s reasonable to assume that once an object survives a couple of collections, it will stay around for a good few more.

Generational collection works by dividing the heap – that is, the space where objects are allocated – into generations (typically two or three). In MoarVM we have two generations, which we tend to refer to in the code as “nursery” and “gen2”. The nursery, as the name suggests, is where objects begin their life. We allocate them cheaply there using a “bump-the-pointer” scheme. The nursery in MoarVM is a fixed-size chunk of memory, and after a while we fill it up. This is what triggers garbage collection.

In a 2-generation collector, there are two types of collection, which we know as nursery collection and full collection. In a full collection, we do exactly what I described earlier: visit all objects, marking them alive, and freeing up those that aren’t. A nursery collection is similar, except as soon as we see an object is not in the nursery, we don’t put it on the todo list. Instead, we simply ignore it. This greatly cuts down on the number of objects we need to consider, making nursery collections hugely cheaper.

There are two things we must do to make this really work. The first is ensure that we only free memory associated with objects living in the nursery, not the old generation, since we didn’t do the analysis needed to free anything there. That’s fine; most objects “die young” anyway. The second is more subtle. There may be objects in the nursery that are only alive because something in the old generation references them. However, since we’re not considering any old generation objects, we won’t discover this liveness and so wrongly free things. This is resolved by maintaining a set of objects that are in the old generation but pointing to objects in the nursery. Whenever we assign a reference from one object to another, we check if this would establish an old generation to nursery reference, and stick the old generation object into the set, ensuring we will visit it and mark the nursery object. This check is known as a “write barrier”.

So, back to the EVAL story…

The EVAL leak

I reviewed, fixed up, and merged various patches from Timo to improve the heap snapshot data dumps by annotating them with more data. Then, I looked at a few paths to leaked objects (recall that I was using EVAL 'my class ABC { }' to demonstrate the leak). The paths looked something like this:

> path 38199
    --[ Thread Roots ]-->
Thread Roots
    --[ Lexotic cache entry ]-->
Lexotic (Object)
    --[ Unknown ]-->
BOOTStaticFrame (Object)
    --[ Unknown ]-->
BOOTCompUnit (Object)
    --[ Index 81 ]-->
BOOTCode (Object)
    --[ Unknown ]-->
BOOTStaticFrame (Object)
    --[ Unknown ]-->
ABC (STable)

This shows the objects along the path, but those “unknowns” were hiding what I really wanted to know. So, I did some further patches, and got out a rather more useful result:

> path 6466
    --[ Thread Roots ]-->
Thread Roots
    --[ Lexotic cache entry ]-->
Lexotic (Object)
    --[ Static Frame ]-->
BOOTStaticFrame (Object)
    --[ Compilation Unit ]-->
BOOTCompUnit (Object)
    --[ Code refs array entry ]-->
BOOTCode (Object)
    --[ Unknown ]-->
BOOTStaticFrame (Object)
    --[ Spesh guard match ]-->
ABC (STable)

So, here we see that it’s a type specializer guard that is keeping the object alive. “Wait…a what?!” MoarVM does a bunch of dynamic optimization, watching out for types that occur at runtime and generating specialized versions of the code by type. And, sometimes, we have an unfortunate situation where code is discovered “hot”, but the type it was invoked with is fairly transient. In this case, the specialization matching table will end up referring to that type, keeping it alive.

However, since for any given bit of code we only generate a handful of these specializations, eventually we’d saturate them and stop leaking memory. I looked at another path:

> path 10594
    --[ Thread Roots ]-->
Thread Roots
    --[ Lexotic cache entry ]-->
Lexotic (Object)
    --[ Static Frame ]-->
BOOTStaticFrame (Object)
    --[ Compilation Unit ]-->
BOOTCompUnit (Object)
    --[ Code refs array entry ]-->
BOOTCode (Object)
    --[ Unknown ]-->
BOOTStaticFrame (Object)
    --[ Spesh log slots ]-->
QAST::CompUnit (Object)
    --[ Unknown ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

This is a case where the optimizer is tracing what objects show up. In fact, most of the paths looked this way. However, that should saturate at some point, yet I know that it goes on leaking. Finally, I found another path to a leak:

> path 59877
    --[ Permanent Roots ]-->
Permanent roots
    --[ Boxed integer cache entry ]-->
Int (Object)
    --[ <SC> ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

However, 9 out of the 10 leaked objects were leaked because of the dynamic optimizer keeping things alive that it had seen while tracing. But that, while awkward, should eventually saturate – as should the integer cache issue. But the memory use grew forever, suggesting that things go on and on leaking. So, I tried a snapshot after disabling dynamic optimization. And:

> find stables type="ABC"
Object Id  Description
=========  ===========
368871     ABC

Just the one! 9 out of 10 objects were not on the heap. And yes, it was the integer box cache problem that kept the 1 alive:

> path 368871
    --[ Permanent Roots ]-->
Permanent roots
    --[ Boxed integer cache entry ]-->
Int (Object)
    --[ <SC> ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

So in theory, with dynamic optimization disabled, this suggested that we did not leak any more, and all the blame was on the optimizer. To check that out, I tried a long-running EVAL loop and…it still leaked heavily. My theory that dynamic optimization couldn’t account for all of the leaking, just the first bit of it, seemed to hold up.

To investigate it further, I did a loop of 100 EVALs, as opposed to the 10 I had used so far. This took a snapshot every GC run, plus one more that I forced at the end. So, how did the final snapshot look?

This file contains 9 heap snapshots. To select one to look
at, type something like `snapshot 1`.
Type `help` for available commands, or `exit` to exit.

> snapshot 8
Loading that snapshot. Carry on...
> summary
Wait a moment, while I finish loading the snapshot...

    Total heap size:              33,417,225 bytes

    Total objects:                369,541
    Total type objects:           1,960
    Total STables (type tables):  1,961
    Total frames:                 2,077
    Total references:             1,302,511

> find stables type="ABC"
Object Id  Description
=========  ===========
368872     ABC

Only the one. So, according to this, we’re not leaking. But that’s when the loop is over. What about a mid-loop GC? I switched to a snapshot in the middle, and:

> find stables type="ABC"
Object Id  Description
=========  ===========
130353     ABC
376552     ABC

> path 130353
    --[ Permanent Roots ]-->
Permanent roots
    --[ Boxed integer cache entry ]-->
Int (Object)
    --[ <SC> ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

> path 376552
    --[ Thread Roots ]-->
Thread Roots
    --[ Lexotic cache entry ]-->
Lexotic (Object)
    --[ Result ]-->
NQPMatch (Object)
    --[ Unknown ]-->
QAST::CompUnit (Object)
    --[ Unknown ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

OK, that’s reasonable too: it’s alive because it’s referred to by the compiler, which is run as part of EVALing the code. So what are we leaking? I tried this:

> top objects by size
Name                                   Total Bytes
=====================================  ===============
NQPArray                               7,114,800 bytes
BOOTStaticFrame                        4,806,668 bytes
BOOTInt                                4,642,720 bytes
VMString                               2,859,188 bytes
BOOTHash                               2,253,016 bytes
SCRef                                  1,891,768 bytes
NFAType                                1,886,272 bytes
BOOTCode                               1,448,208 bytes
BOOTNum                                832,096 bytes
Parameter                              783,360 bytes
BOOTStr                                567,936 bytes
BOOTCompUnit                           513,149 bytes
Perl6::Metamodel::ContainerDescriptor  341,496 bytes
QAST::Op                               266,400 bytes
Signature                              208,440 bytes

Then I compared it with an earlier snapshot:

> snapshot 2
Loading that snapshot. Carry on...
> top objects by size
Name                                   Total Bytes
=====================================  ===============
NQPArray                               7,110,920 bytes
BOOTStaticFrame                        4,806,152 bytes
BOOTInt                                4,642,624 bytes
VMString                               2,858,472 bytes
BOOTHash                               2,241,320 bytes
SCRef                                  1,891,696 bytes
NFAType                                1,886,272 bytes
BOOTCode                               1,447,776 bytes
BOOTNum                                832,096 bytes
Parameter                              783,360 bytes
BOOTStr                                567,136 bytes
BOOTCompUnit                           513,149 bytes
Perl6::Metamodel::ContainerDescriptor  341,496 bytes
QAST::Op                               266,112 bytes
Signature                              208,296 bytes

Again, nothing very interesting to note. This strongly suggested that either some information was missing from the heap snapshot, or something else in the VM state – something that did get cleaned up at exit – was also being leaked.

So I pondered a bit, and compared the GC marking code with the heap snapshot code. And…had an “aha!” moment. Remember I talked about the inter-generational root set that we keep thanks to generational collection? This was not being accounted for in heap snapshots. I fixed it, and the size of the resulting heap snapshot files was a dead giveaway that it made a huge difference:

04/06/2016  14:46       184,272,062 rak-heap-6
04/06/2016  15:21       262,846,653 rak-heap-7

And, back in the analyzer:

> snapshot 2
Loading that snapshot. Carry on...
> find stables type="ABC"
Wait a moment, while I finish loading the snapshot...

Object Id  Description
=========  ===========
21266      ABC
21312      ABC
22359      ABC
23317      ABC
24275      ABC
25233      ABC
26191      ABC
27149      ABC
28107      ABC
29065      ABC
30023      ABC
30981      ABC
361108     ABC
363007     ABC
364903     ABC

So, the objects were there, after all. I took a look at some of them:

> path 24275
    --[ Inter-generational Roots ]-->
Inter-generational Roots
    --[ Index 24269 ]-->
ABC (STable)

> path 30981
    --[ Inter-generational Roots ]-->
Inter-generational Roots
    --[ Index 30975 ]-->
ABC (STable)

> path 363007
    --[ Inter-generational Roots ]-->
Inter-generational Roots
    --[ Index 21244 ]-->
BOOTCompUnit (Object)
    --[ Serialization context dependency ]-->
SCRef (Object)
    --[ STable root set ]-->
ABC (STable)

This would explain a prolonged lifetime, but not an unending leak. I found myself missing a count command so I could easily see how things varied between the snapshots. I implemented it, then observed this:

> snapshot 2
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 5
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 7
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 8
Loading that snapshot. Carry on...
> count stables type="ABC"

The number of objects in the inter-generational set just kept on growing! So, either this workload was just not triggering a gen-2 collection, or there was a bug. How to find out? By doing a normal --profile on the same code, and looking at the output. The summary page stated:

The profiled code did 9 garbage collections. There were 0 full collections involving the entire heap.

OK, so we really never did a full collection. That explains this particular snapshot, but not the leak over time, which surely would end up triggering a full collection at some point. To test that theory, I tweaked a MoarVM header to make full collections happen more often, to see if it helped:

diff --git a/src/gc/collect.h b/src/gc/collect.h
index b31a112..af6c456 100644
--- a/src/gc/collect.h
+++ b/src/gc/collect.h
@@ -6,7 +6,7 @@
 /* How many bytes should have been promoted into gen2 before we decide to
  * do a full GC run? The numbers below are used as a base amount plus an
  * extra amount per extra thread we have running. */
-#define MVM_GC_GEN2_THRESHOLD_BASE      (30 * 1024 * 1024)
+#define MVM_GC_GEN2_THRESHOLD_BASE      (1 * 1024 * 1024)
 #define MVM_GC_GEN2_THRESHOLD_THREAD    (2 * 1024 * 1024)

This change made it do a full collection for every 1MB promoted, not every 30MB. Hopefully that would be enough to trigger some full runs. And it did:

The profiled code did 9 garbage collections. There were 4 full collections involving the entire heap.

Much better. So, over in the heap profiler:

Considering the snapshot...looks reasonable!

This file contains 9 heap snapshots. To select one to look
at, type something like `snapshot 5`.
Type `help` for available commands, or `exit` to exit.

> snapshot 2
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 5
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 7
Loading that snapshot. Carry on...
> count stables type="ABC"
> snapshot 8
Loading that snapshot. Carry on...
> count stables type="ABC"

Well, ouch. That implies that the inter-generational root set keeps on growing and growing, for some reason. The profiler, which reports this number, agreed with this assessment, showing the count of gen2 roots climbing with every GC run.


So, how could this possibly be happening?

I like to rule out the really silly things first. Like, whether the thing that cleans up the inter-generational roots list after a full collection is even being called. How? With a printf, of course! :P

diff --git a/src/gc/roots.c b/src/gc/roots.c
index b771106..361b472 100644
--- a/src/gc/roots.c
+++ b/src/gc/roots.c
@@ -301,7 +301,7 @@ void MVM_gc_root_gen2_cleanup(MVMThreadContext *tc) {
     MVMuint32        num_roots    = tc->num_gen2roots;
     MVMuint32        i = 0;
     MVMuint32        cur_survivor;
+printf("cleaning up gen2 roots\n");
     /* Find the first collected object. */
     while (i < num_roots && gen2roots[i]->flags & MVM_CF_GEN2_LIVE)

And…no output. I couldn’t quite believe it. So, I went to the place where this function is called, and noticed there was some logging I could switch on that describes all the ins and outs of organizing a GC run. That produced plenty of output, and showed it was indeed not reaching the place where it would call the gen2 roots cleanup either.

After some hunting, I discovered that an addition a while ago that tracked the amount of data promoted to the old generation, and used it to decide whether to do a full collection, had resulted in a nasty accident. It did the calculation to check if a full collection was needed in two different places, and the answer could change between them. This led to us not going through all the steps that a full collection would need.
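
In miniature, the accident looks like this (a hypothetical Python sketch, not the actual MoarVM code): the same threshold test, evaluated twice around work that changes the measured value, can answer differently each time, so the two halves of the collection disagree about whether it was a full one.

```python
# Hypothetical sketch of the decide-in-two-places bug, not MoarVM code.
THRESHOLD = 30  # MB promoted before a full collection is due

class GC:
    def __init__(self):
        self.promoted = 29  # just under the threshold

    def buggy_run(self):
        start_full = self.promoted >= THRESHOLD    # decision point 1
        self.promoted += 5                         # promotion happens mid-run
        finish_full = self.promoted >= THRESHOLD   # decision point 2 disagrees
        return start_full, finish_full

    def fixed_run(self):
        full = self.promoted >= THRESHOLD          # decide once...
        self.promoted += 5
        return full, full                          # ...and reuse the answer

print(GC().buggy_run())  # (False, True): full-collection steps get skipped
print(GC().fixed_run())  # (False, False): both halves agree
```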

I patched it and…the leak was gone. Even with dynamic optimization re-enabled and full collections back to being run after every promoted 30MB, the ever-increasing memory use of EVAL in a loop was no more. It climbed for a short while, then flattened out.

This fix will likely help numerous longer-running programs that have medium-lifetime objects. It also shaved around 45MB off CORE.setting compilation memory use. Unfortunately, it also caused a 5% slowdown in the Rakudo build-time, presumably because we were now actually doing all the work we should be on full collections!

A bit of tuning

With the leak out of the way – or at least, the big one – I wanted my 5% back. So I took a look at our logic for how we decide whether or not to do a full collection. The strategy so far had been to expect a fixed amount of memory to have been promoted to the old generation (with some additions per thread). However, this was lacking in a couple of ways.

For one, it only accounted for the direct size of objects, not any extra unmanaged memory they held. So, a compilation unit would not factor in the size of the compiled bytecode it held on to, and a dynamic array would not factor in the size of its storage. This was now an easy enough fix thanks to additions made while implementing heap snapshots.

For two, a fixed limit doesn’t behave too well with programs that really do build up a large heap over time. There, we can afford to promote a good bit more beforehand, and percentage wise it won’t make a lot of difference since they’re growing anyway. Building Rakudo’s CORE.setting is like this, as it builds up the program tree. So, I switched to a percentage-based scheme with a minimum threshold, which could afford to be a bit lower than the 30MB from before. These changes not only got Rakudo’s CORE.setting build time back down again (without having to give up much of the memory savings from before), but also had the EVAL loop example having a lower memory ceiling.
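
The resulting trigger can be sketched as a small decision function. The 20% rate and 20MB floor below are invented for illustration; the actual MoarVM constants differ.

```python
# Sketch of a percentage-plus-floor full-collection trigger (numbers assumed).
MB = 1024 * 1024
MIN_THRESHOLD = 20 * MB  # floor for small, steady heaps (assumed value)
PERCENT = 0.20           # fraction of current heap size (assumed value)

def need_full_collection(promoted_bytes, heap_bytes):
    threshold = max(MIN_THRESHOLD, PERCENT * heap_bytes)
    return promoted_bytes >= threshold

# Small, steady heap: the floor dominates, so 25MB promoted triggers a full run.
print(need_full_collection(25 * MB, 50 * MB))   # True
# Large, growing heap: 25MB promoted is only 5% of 500MB, so not yet.
print(need_full_collection(25 * MB, 500 * MB))  # False
```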

So, not only did I hunt down and fix the memory leak, I ended up tuning things to achieve a lower memory ceiling for applications whose memory footprint is fairly fixed over their lifetime, and fewer full GC runs for those growing over time.

Hunting down a parallel spectest hang

On Windows, a parallel spectest of high enough degree could cause hangs. I use TEST_JOBS=12, a number I picked after some measuring in the past. In the last couple of weeks, I started seeing hangs – and they went away if I cut down TEST_JOBS to just running 2 or 3 in parallel.

Eventually, helped by make localtest (where you can specify a file listing the tests to run), I managed to get it down to just 2 tests that, when run together, would reliably hang. It turned out one of them spawned another Rakudo process as part of the test, so there were 3 processes involved. Attaching the debugger to each in turn, I saw one was hung on getting a file lock, one was hung waiting for process termination (and was waiting on the process that was blocked on a file lock), and the other was blocked trying to write to STDOUT.

I don’t know for sure, but so far as I can tell, the way the test harness does parallel testing on Windows involves running batches of tests and reading a bit of TAP from them one at a time. The process blocked writing to STDOUT was also the one holding the file lock, but the test harness was, I presume, waiting to read output from the process blocked waiting for the process that was in turn waiting for the file lock. So, a nice circular wait involving 4 processes, one of them being the test harness! Typical darn Friday bug hunting… :-)

This also explained nicely why the issue didn’t crop up away from Windows: parallel spectest works differently (read: better) on other platforms. :-) While we will at some point switch to using a Perl 6 test harness for running the spectests that hopefully behaves consistently everywhere, I figured that Rakudo was probably doing something wrong with regard to the file lock.

File locks are used in only one place in Rakudo: managing pre-compilations. A little instrumentation of the lock/unlock code later, I saw a mis-match. A dump of the stack trace at each place we did a lock/unlock eventually led me to the problem, which I was able to fix. This bug likely didn’t just affect spectest on Windows; I suspect I could construct various hangs on other platforms too from it. So, a good fix to have in there.

As a side-note, the reason this bug was both likely to happen and hard to identify was because the lock and unlock were not placed on the same code-path. Keeping them on the same code-path is a good idea for any kind of locking. Locks are horrible to work with anyway; putting lock/unlock far apart in the code is just asking for problems (and this is far from the first problem I’ve hunted down in code with such a structure). So, I’ve asked nine++, as part of his great work to keep improving our installation and precomp handling, to look into addressing this, so we’re structurally in a better place to not have such bugs.

And, for all of you out there using Rakudo’s Lock class: first, don’t, and second, if you must use it, always prefer the $lock.protect({ ... }) form over manual lock/unlock method calls.
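
The reason the protect form is safer is structural: the unlock sits on the same code path as the lock, so an exception cannot leave the lock held. A Python analogy (illustrative only; Rakudo's actual Lock class is not involved):

```python
# Python analogy of a protect-style API: lock and unlock on one code path.
import threading

lock = threading.Lock()

def protect(l, block):
    l.acquire()
    try:
        return block()   # runs with the lock held
    finally:
        l.release()      # released even if block() raises

print(protect(lock, lambda: 21 * 2))  # 42, and the lock is free again
```

In idiomatic Python one would just write `with lock: ...`, which bakes the same pairing guarantee into the language.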

Two icky problems less…

So, not a bad last week’s Perl 6 work – though it took me all of this week to get around to writing it up. Maybe I’ll be a little faster writing up this week’s happenings. :-)

6guts: Small, but welcome, fixes

Published by jnthnwrthngtn on 2016-04-05T20:14:20

Last week wasn’t one of my most productive on Perl 6, thanks to a mix of Easter holiday, a little more work than expected on another project, and feeling a tad under the weather for a day or so. In the time I did find, though, I managed to pick off some worthwhile bits and pieces that needed doing.

Heap snapshot issues triage

It’s generally very clean and pleasant that our meta-objects are, so far as the VM is concerned, just objects. This means the answer to “can types be GC’d” is easy: sure they can, so long as nothing is using them any more. The downside is that the VM has very little insight into this huge pile of objects, which is why we have various “protocols” where the MOP can provide some hints. A lot of performance wins come from this. When I was working on heap snapshots, I found it would also be rather useful if there was a way to let the VM know about a “debug name” for a type. A name that isn’t used for any kind of resolution, but that can be included in things like heap snapshots. It’ll no doubt also be useful for debugging. So, I added an op for that, nqp::setdebugtypename, and used it in a branch of NQP, meaning I got more useful heap snapshots. This week, I also added this op to the JVM backend, meaning that I could merge the branch that uses it. This means that you can now do a normal build of Rakudo/NQP/Moar and get useful typenames in the MoarVM heap snapshots.

Last time, I mentioned that the heap snapshot viewer took an age to start up because we were super slow at reading files with really long lines. This week, I fixed this performance bug with two patches. Now, the heap snapshot analyzer picks apart a 25MB snapshot file into its various big pieces (just looking at how each line starts) and reaches its prompt in under a second on my box. That’s a rather large improvement from before, when it took well over a minute.

I also looked at a C-level profile of where MoarVM spends time doing the detailed parsing of a heap snapshot, which happens in the background while the user types their first command, and then blocks execution of that command if it’s not completed. The processing uses a few threads. The profile showed up a couple of areas in need of improvement: we have a lot of contention on the fixed size allocator, and the GC’s world-stopping logic seems to have room for improvement too. So, those will be on my todo list for the future.

Fixing a couple of crashes

SIGSEGV is one of the least inspiring ways to crash and burn, so I’m fairly keen to hunt down cases where that happens and fix them. This week I fixed two. The first was a bug in the UTF-8 Clean-8-bit (utf8-c8) encoding, which turned out to be one of those “duh, what was I thinking” off-by-ones. The second was a little more fun to track down, but turned out to be memory corruption thanks to missing GC rooting.

And…that’s it!

But, this week I’m set to have a good bit more time, so hope to have some more interesting things to talk about in the next report. :-)

hoelzro: Binding to C++ With NativeCall

Published on 2016-03-29T03:03:21

A few months back, I was working on Xapian bindings for Perl 6. I got enough of the binding done for me to use it for what I wanted (using Xapian's stemmers and stoppers), but not enough for me to feel comfortable publishing it. However, what I am comfortable publishing is what I learned about binding a C++ library to Perl 6 using NativeCall!

Read More

6guts: Happy heapster!

Published by jnthnwrthngtn on 2016-03-27T21:47:56

In last week’s report, I was working away at a heap snapshot mechanism for MoarVM. If you don’t know what that is, I suggest taking a look at last week’s post for an explanation, and then this one will probably make a whole lot more sense. :-)

This week, I did the rest of the work needed to get heap snapshots being dumped. I’m not going to talk any more about that, since I discussed the ideas behind it last time, and I encountered nothing unexpected along the way.

Instead, I want to discuss my initial work on building a tool to analyze the heap snapshots and get useful data out of them. The snapshots themselves are mostly just megabytes and megabytes worth of integers separated by commas and semicolons. Big data it ain’t, but it’s big enough that processing it needs at least a bit of thought if we’re going to do it fast enough and without using huge amounts of memory.

I decided early on that I wanted to write this analysis tool in Perl 6. Overall, I’m working on performance, memory use, and reliability. The results from analyzing heap snapshots will help cut down on memory use directly, and identify leaks. Building the analyzer in Perl 6 means I’ll be able to identify and address the performance issues that I run into along the way, helping performance. Further, I could see various opportunities to do parallel processing, giving concurrency things a bit of a workout and maybe shaking some issues out there.

The design

I decided to build a command line application, which would execute queries on the heap snapshot and display the results. Thinking a bit about that, I realized it would break nicely into two pieces: a model that held the snapshot data and provided various methods to interrogate it, and a shell that would parse the queries and nicely display the results. All the stuff that needed performance engineering would lie within the model. In the shell, I didn’t have to worry about that.

I also realized that there was a nice perceived speed trick I could pull. While waiting for the user to type a query, there’s no reason not to get to work on parsing the snapshot and building up a data model of it. In the best case, we’re done by the time they’ve typed their query and hit enter. Even if not, we’ve shaved some time off their wait.

Looking at heap snapshots

A heap snapshot file is really a heap snapshot collection file, in that it can hold multiple snapshots. It starts with some common data that is shared among all the snapshots:

strings: ["Permanent Roots","VM Instance Roots", ...]
types: 18,19;21,22;25,26;27,28;30,31;33,33;30,34;...
static_frames: 10,11,1,12;63,64,13,65;10,67,1,65;...

The first, strings, is just a JSON array of all the strings we refer to anywhere else in the snapshot. The types data set can be seen as representing a table with two columns, both of them being indexes into the strings array. The first column is the name of the representation, and the second is the type name (for example, a Perl 6 Rat would have the P6opaque representation and the Rat type). This is so we can understand what the objects in the snapshot are. The static_frames data set does a similar thing for frames on the heap, so we can see the name, line number, and file of the sub, method, or block that was closed over.

These are followed by the snapshots, which look like this:

snapshot 0
collectables: 9,0,0,0,0,4;5,0,0,0,125030,165;...
references: 2,0,1;2,1,2;2,2,3;2,3,4;2,4,5;2,5,6;...

These describe a graph, rooted at the very first collectable. The collectables are the nodes, and the references the edges between them. The integers for a collectable describe what kind of node it is (object, frame, STable, etc.), its size, how many edges originate from it and where in the references table they are located (we take care to emit those in a contiguous way, meaning we can save on a from field in the references table). The references have a to field – which is actually the third integer – along with some data to describe the reference when we know something about it (for example, array indexes and attribute names in objects can be used to label the edges).
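
The contiguous-edge trick can be made concrete with a small sketch (field layouts simplified and the sample numbers invented): each collectable records only where its run of references starts and how long it is, so no per-reference from field is needed.

```python
# Sketch of the contiguous-edge encoding; layout simplified for illustration.
nodes = [
    # (kind, size, first_edge, num_edges)
    (0, 64, 0, 2),   # node 0 owns references[0..1]
    (1, 32, 2, 1),   # node 1 owns references[2]
    (1, 16, 3, 0),   # node 2 has no outgoing edges
]
references = [
    # (desc_kind, desc_index, to) - the third integer is the target node
    (0, 0, 1),
    (0, 0, 2),
    (2, 5, 2),
]

def edges_of(node_id):
    """Resolve a node's outgoing edges to target node ids."""
    _, _, first, count = nodes[node_id]
    return [references[i][2] for i in range(first, first + count)]

print(edges_of(0))  # [1, 2]
print(edges_of(2))  # []
```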

To give you an idea of the number of nodes and edges we might be considering, the snapshot I took to start understanding the EVAL memory leak has 501,684 nodes and 1,638,375 edges – and that is on a fairly well golfed down program. We can expect heap snapshots of real, interesting applications to be larger than this.

We could in theory represent every node and edge with an object. It’d “only” be 3 million objects and, if you strip away all the Perl 6 machinery we’re still pretty bad at optimizing, MoarVM itself can chug through making those allocations and sticking them into a pre-sized array in about 0.6s on my box. Unfortunately, though, the memory locality we’ll get when looking through those objects will be pretty awful. The data we want will be spread out over the heap, with lots of object headers we don’t care about polluting the CPU caches. Even with well-optimized object creation, we’d still have a bad design.

Native arrays to the rescue

Thankfully, Perl 6 has compact native arrays. These are perfect for storing large numbers of integers in a space-efficient way, and tightly packed together to be cache-friendly. So, I decided that my objects would instead be about the data sets that we might query: types, static frames, and snapshots. Then, the code would be mostly focused around ploughing through integer arrays.

The types data set makes a fairly easy example. It has two int arrays of indexes into the strings array, meaning the data set will be stored in two contiguous blobs of memory. Also note the careful use of binding in the BUILD submethod. Perl 6’s array assignment semantics are copying in nature, which is great for avoiding action at a distance (a good default) but not quite what we want when passing around large volumes of data. The type-name and repr-name methods just resolve indexes to strings. More interesting are the all-with-type and all-with-repr methods, which simply resolve the search string – if it exists – to an index, and then go through the native int array of type name string indexes to find what we’re after.
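
A minimal sketch of that lookup (the real analyzer is Perl 6 over native int arrays; Python lists stand in here, and the sample data is invented): resolve the query string to a string-table index once, then scan the packed type-name column.

```python
# Sketch of all-with-type over packed index columns (sample data invented).
strings = ["P6opaque", "Rat", "VMArray", "NQPArray"]
type_name_idx = [1, 3, 1]   # per-type indexes into strings
repr_name_idx = [0, 2, 0]

def all_with_type(name):
    try:
        wanted = strings.index(name)   # resolve the search string once
    except ValueError:
        return []                      # not interned anywhere: no matches
    return [i for i, s in enumerate(type_name_idx) if s == wanted]

print(all_with_type("Rat"))      # [0, 2]
print(all_with_type("Missing"))  # []
```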

Parallel, and background, processing

The BUILD submethod for the overall model starts out by picking the file apart, which is some simple line-based parsing. Given the heap snapshot format has very few lines (though they are very long), there’s nothing too performance sensitive in this code.
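
That carve-up can be sketched by looking only at how each (possibly huge) line starts, deferring the expensive parsing of the payloads until later (Python stands in for the Perl 6 original; the section names follow the format shown earlier):

```python
# Sketch: split a snapshot file into named sections by line prefix only.
def split_sections(text):
    sections = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest:                        # keep "name: payload" lines
            sections[key.strip()] = rest.strip()
    return sections

raw = 'strings: ["a","b"]\ntypes: 1,2;3,4\nstatic_frames: 5,6,7,8'
print(sorted(split_sections(raw)))  # ['static_frames', 'strings', 'types']
```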

Then, at the end, it sets off 3 bits of parallel work to decode the JSON strings, and to parse the type and static frame tables. It’s worth noting that the Types dataset does also need to be constructed with the JSON strings; a simple await is used to declare that dependency. With regard to the snapshots, there’s no point processing all of them. Often, the user will only care about the final one, or a handful of them. At the same time, some background processing is still desirable. Thus, the model has arrays of unparsed snapshots and parsed snapshot promises. When the user indicates interest in a snapshot, we will start to prepare it. At the point it’s actually needed, we then await the Promise for that snapshot.
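
The lazy background preparation can be sketched with futures (the real analyzer uses Perl 6 Promises; this Python version is illustrative and the parse step is a stand-in):

```python
# Sketch: parse snapshots lazily in the background, await on demand.
from concurrent.futures import ThreadPoolExecutor

def parse_snapshot(raw):
    return {"size": len(raw)}   # stand-in for the expensive parse

class Model:
    def __init__(self, raw_snapshots):
        self.raw = raw_snapshots
        self.parsed = [None] * len(raw_snapshots)  # one future per snapshot
        self.pool = ThreadPoolExecutor()

    def prepare(self, i):
        """Start parsing snapshot i in the background (on `snapshot N`)."""
        if self.parsed[i] is None:
            self.parsed[i] = self.pool.submit(parse_snapshot, self.raw[i])

    def get(self, i):
        """Await the parsed snapshot at the point it is really needed."""
        self.prepare(i)
        return self.parsed[i].result()

m = Model(["aaa", "bbbbb"])
m.prepare(1)         # user selected snapshot 1; work while they type
print(m.get(1))      # {'size': 5}
```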

Over in the shell, we special-case having 1 snapshot in the collection by just selecting it right away, getting on with the processing of it immediately while the user types their query. We also take a little care to let the user know when they performed an operation, but we’re not done with parsing the snapshot yet.

Early results

The first things I implemented were getting a summary of the heap snapshot, along with queries to see the top frames and objects, both by space and count. Here’s how a session with it on a Rakudo heap snapshot starts out:

$ perl6-m -Ilib bin\moar-ha rak-heap
Considering the snapshot...looks reasonable!

This file contains 1 heap snapshot. I've selected it for you.
Type `help` for available commands, or `exit` to exit.

I ask for a summary:

> summary
Wait a moment, while I finish loading the snapshot...

    Total heap size:              28,727,776 bytes

    Total objects:                372,884
    Total type objects:           2,248
    Total STables (type tables):  2,249
    Total frames:                 2,523

And then can ask about the kinds of objects taking up the most space:

> top objects
Name                                   Total Bytes
=====================================  ===============
NQPArray                               7,184,816 bytes
BOOTInt                                4,639,680 bytes
BOOTStaticFrame                        4,195,168 bytes
VMString                               2,804,344 bytes
BOOTCode                               1,486,872 bytes
<anon>                                 1,331,744 bytes
Parameter                              847,552 bytes
BOOTNum                                827,904 bytes
BOOTStr                                546,368 bytes
Perl6::Metamodel::ContainerDescriptor  367,848 bytes
BOOTArray                              344,056 bytes
QAST::Op                               262,656 bytes
<anon>                                 255,032 bytes
Method                                 238,792 bytes
Signature                              228,096 bytes

And by number of objects:

> top objects by count
Name                                   Count
=====================================  =======
BOOTInt                                144,990
NQPArray                               74,270
VMString                               34,440
BOOTNum                                25,872
BOOTCode                               20,651
BOOTStr                                17,074
BOOTStaticFrame                        16,916
Parameter                              6,232
Perl6::Metamodel::ContainerDescriptor  5,109
BOOTHash                               4,775
Signature                              3,168
<anon>                                 2,848
QAST::Op                               2,736
BOOTIntArray                           1,725
Method                                 1,571

Figuring out where those huge numbers of boxed integers come from, along with the arrays we might guess they’re being stored in, will no doubt be the subject of a future week’s post here. In fact, I can already see there’s going to be a lot of really valuable data to mine.

But…that EVAL leak

This kind of analysis doesn’t help find leaks, however. That needs something else. I figured that if I did something like this:

C:\consulting\rakudo>perl6-m --profile=heap -e "for ^20 { EVAL 'my class ABC { }' }"
Recording heap snapshot
Recording completed
Writing heap snapshot to heap-snapshot-1458770262.66355

I could then search for the type tables for the class ABC (which we could expect to be GC’d). That would confirm that they are staying around. So, I implemented a find query, which gave me output like this:

> find stables type="ABC"
Object Id  Description
=========  ===========
42840      ABC
42845      ABC
45288      ABC
64994      ABC
71335      ABC
76824      ABC
78599      ABC
82535      ABC
86105      ABC
146765     ABC
404166     ABC

Those integers on the left are unique IDs for each object on the heap. So, given one of those, I then needed to calculate a path through the heap from the root to this object, to understand why it could not be collected and freed. This would be most useful if I could be told the shortest path. Thankfully, there exists a well-known algorithm that runs in O(Nodes + Edges) time that can annotate all graph nodes with data that lets you then determine the shortest path to the root in O(maximum path length). (That gives an upper bound of O(Edges) in general, however unlikely that is in the real world. To see why, consider a heap snapshot that consists entirely of a linked list.)

The algorithm, if you didn’t guess yet, is the breadth-first search. The nice thing is that we only have to compute it once, and can then answer shortest path queries really quickly. Here’s the patch that added it. And here’s what it found:

> path 42840
    --[ Thread Roots ]-->
Thread Roots
    --[ Compiling serialization contexts ]-->
BOOTArray (Object)
    --[ Unknown ]-->
SCRef (Object)
    --[ Unknown ]-->
ABC (STable)

Bingo! We’re somehow, in an EVAL, leaving a reference to the serialization context in the “stuff we’re still compiling” array. (Serialization contexts being the things that hold objects we create at compile-time, but want to refer to at runtime, and so must serialize when pre-compiling.) This discovery quickly led to an NQP patch. Which…helped, but we still leak, for a different reason. The good news, however, is that the heap snapshot analyzer could very quickly show that the fix had been successful, and identify the next thing to investigate. You can only imagine how much more frustrating it would be to not have tooling that can do this! Sadly, I didn’t have time this week to dig into the next problem.
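
The breadth-first annotation is compact enough to sketch (Python for illustration; the actual patch is Perl 6). One O(nodes + edges) pass records each node's predecessor; any later path query just walks back to the root.

```python
# Sketch: BFS predecessor annotation, then O(path length) path queries.
from collections import deque

def bfs_predecessors(edges, root=0):
    pred = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for to in edges.get(node, []):
            if to not in pred:      # first visit = shortest distance
                pred[to] = node
                queue.append(to)
    return pred

def path_to_root(pred, node):
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return path                      # from the node back to the root

edges = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
pred = bfs_predecessors(edges)
print(path_to_root(pred, 4))  # [4, 3, 1, 0]
```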

In one final bit of heap snapshot analyzer news, Timo forked it and added more features. I love it when I build something and others dig in to make it better. \o/

So, how was Perl 6’s performance?

I discovered 3 things that basically killed performance. The first was Str.Int being horridly slow, and since reading in a heap snapshot is basically all about converting strings to integers, that was a real killer. For example:

$ timecmd perl6-m -e "my $a; for ^1000000 { $a = '100'.Int }"
command took 0:1:09.83 (69.83s total)

Awful! But, a 15-minute patch later, I had it down to:

$ timecmd perl6-m -e "my $a; for ^1000000 { $a = '100'.Int }"
command took 0:0:1.87 (1.87s total)

A factor of 37 faster. Not super-fast, so I’ll return to it at some point, but tolerable for now. I guess this’ll help out plenty of other folk’s scripts, so it was very nice to fix it up a bit.

The second discovery was that a thread blocked on synchronous I/O could end up blocking garbage collection from taking place by any thread, which completely thwarted my plans to do the snapshot parsing in the background. This wasn’t a design problem in MoarVM, thankfully. It’s long had a way for a thread to mark itself as blocked, so another can steal its GC work. I just had to call it on a couple of extra code paths.

There’s a third issue, which is that something in MoarVM’s line reading gets costly when you have really long (like, multi-megabyte) lines. Probably not a common thing to be doing, but still, I’ll make sure to fix it. Didn’t have time for it this week, though. Next time!

brrt to the future: FOSDEM and the future

Published by Bart Wiegmans on 2016-03-13T11:20:00

Hi all, I realise I haven't written one of these posts for a long time. Since November, in fact. So you could be forgiven for believing I had stopped working on the MoarVM JIT. Fortunately, that is not entirely true. I have, in fact, been very busy with a project that has nothing to do with perl6, namely SciGRID, for which I've developed GridKit. GridKit is a toolkit for extracting a power network model from OpenStreetMap, which happens to contain a large number of individual power lines and stations. Such a network can then be used to compute the flow of electric power throughout Europe, which can then be used to optimize the integration of renewable energy sources, among other things. So that was and is an exciting project and I have a lot of things to write on that yet. It is not, however, the topic of my post today.

Let's talk about the expression JIT, where it stands, how it got there, and where it needs to go. Last time I wrote, I had just finished reducing the expression JIT surface area to the point where it could just compile correct code. And that helped in making a major change, which I called tile linearisation. Tile linearisation is just one example of a major idea that I missed last summer, so it may be worthwhile to expand a bit on it.

As I've explained at some length before, the expression JIT initially creates trees of low-level operations out of high-level operations, which are then matched (tiled) to machine-level operations. The low-level operations can each be expressed by a machine-level operation, but some machine-level instructions match multiple low-level operations. The efficient and optimal matching of low-level to machine-level operations is the tiling step of the compiler, and it is where most of my efforts have been.

Initially, I had 'tagged' these tiles to the tree that had been created, relying on tree traversal to get the tiles to emit assembly code. This turned out to be a poor idea because it introduces implicit order based on the tree traversal order. This is first of all finicky - it forces the order of numbering tiles to be the same in the register allocator and the tile selection algorithm and again for the code emitter. In practice that means that the last two of these were implemented in a single online step. But more importantly and more troubling, it makes it more complex to determine exactly the extent of live ranges and of basic blocks.

The notion of basic blocks is also one that I missed. Expression trees are typically compiled for single basic blocks at a time. The definition of a basic block is a sequence of instructions that is executed without interruption. This allows for some nice simplifications, because it means that a value placed in a register at one instruction will still be there in the next. (In contrast, if it were possible to 'jump in between' the instructions, this is not so easy to ensure). However, these basic blocks are defined at the level of MoarVM instructions. Like most high-level language interpreters, MoarVM instructions are polymorphic and can check and dispatch based on operands. In other words, a single MoarVM instruction can form multiple basic blocks. For correct register allocation, it is vital that the register allocator knows about these basic blocks. But this is obscured, to say the least, by the expression 'tree' structure, which really forms a Directed Acyclic Graph, owing to the use of values by multiple consumers.

The point of tile linearisation is to provide an authoritative, explicit order for tiles - and the code sequences that they represent - so that they can be clearly and obviously placed in basic blocks. This then allows the register allocator to be extended to deal with cross-basic block compilation. (In the distant future, we might even implement some form of instruction scheduling). As a side effect, it also means that the register allocation step should be moved out of the code emitter. I've asked around and got some nice papers about that, and it seems like the implementation of one of these algorithms - I'm still biased towards linear scan - is within the range of reasonable, as soon as I have the details figured out. Part of the plan is to extract value descriptors from the tree (much like the tile state) and treat them as immutable, introducing copies as necessary (for instance for live range splitting). The current register allocator can survive as the register selector, because it has some interesting properties in that aspect.

Aside from that I've implemented a few other improvements, like:
From that last bit, I've learned that the way the JIT is currently dealing with annotations is subtly broken, because the following thing can and does happen:
I'm not yet sure how to deal with this. Jonathan implemented a fix last year that introduced a dynamic control label at the start of each basic block. Ultimately, that reinforces this 'stacking' behavior, although it already happened. Ideally, we would not need to store the current location for each basic block just for the few operations that need it. It might instead be possible to refer to the current region in some other way, which is what happens to some extent in exception handling already.

Anyway, that's all for today, and I hope next time I will be able to bring you some good news. See you!

Pawel bbkr Pabian: Running mixed Perl 5 and Perl 6 tests.

Published by Pawel bbkr Pabian on 2016-03-09T18:33:32

Those two tricks are especially useful when refactoring a big codebase from Perl 5 to Perl 6. Such a process may take weeks or even months, and you will encounter two cases:

1. Some features are still in Perl 5, some are fully refactored to Perl 6. So you want to run separate Perl 5 and Perl 6 test files in a single prove command. Prove is not very smart. It does not peek into test files to choose the correct interpreter (Perl 5 is assumed) and it does not recognize the ".t6" extension some people use. But there is a solution. First create your test files.


#!/usr/bin/env perl

use v5;
use Test::Simple 'tests' => 1;

ok 1, 'Hello Perl 5';


#!/usr/bin/env perl6

use v6;
use Test;

plan 1;

ok 1, 'Hello Perl 6';

Using a shebang is crucial here because it will cause the system to use the proper interpreter. Now make the tests executable. And explicitly pass an empty interpreter to prove, so it won't enforce anything.

$ prove -e ''
t/perl5.t .. ok
t/perl6.t .. ok
All tests successful.
Files=2, Tests=2,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.19 cusr  0.03 csys =  0.24 CPU)
Result: PASS

2. Some feature components are entangled. For example you have a communication protocol whose client is already refactored to Perl 6 but whose server is still in Perl 5. To test them you need to interpolate a Perl 6 test within a Perl 5 test and combine the test output. One of the Perl 5 core modules - TAP::Parser - has just what we need. First create your test files (we will use the Perl 6 version from the example above).


#!/usr/bin/env perl

use v5;
use TAP::Parser;
use Test::Simple 'tests' => 3;

ok 1, 'Hello Perl 5';

my $parser = TAP::Parser->new( { 'exec' => [ 'perl6', 't/perl6.t' ] } );

while ( my $result = $parser->next ) {
    next unless $result->is_test;
    ok $result->is_ok, $result->description;
}

ok 1, 'Bye Perl 5';

TAP::Parser allows you to run any test code using any interpreter from your script, and to access those test results through a nice OO interface. The line "ok $result->is_ok" is what makes the foreign test result your own.

$ perl t/interpolated.t
ok 1 - Hello Perl 5
ok 2 - - Hello Perl 6
ok 3 - Bye Perl 5

This is the very basic way to interpolate tests. As you may notice, the description output from Perl 6 is a bit messy; also comments, subtests and bailouts are not handled yet. However, with the excellent TAP::Parser documentation you should be able to implement more complex scenarios in no time.

Stay calm and keep testing!

hoelzro: Finding the most common n-grams in Russian using Perl 6 and HabraHabr

Published on 2016-03-05T07:10:41

I've been teaching myself Russian for some time; in fact, I would probably be a lot better at it if I spent time actually learning Russian instead of thinking of ways of hacking my language learning process…which is exactly what we'll be doing here.

Read More

hoelzro: The State of Multi-Line Input in Rakudo

Published on 2016-02-15T13:07:30

Last week, I created an experimental branch for multi-line input in the Rakudo REPL. I merged this branch on Friday, but I wanted to talk about where we stand, and where I see us going in the future.

Read More

Announce: Mac OS X Installer for release 2016.01

Published by Tobias Leich on 2016-02-12T12:27:09

Thanks to Steve Mynott a Mac OS X installer is now available.
This installer has the “.dmg” file extension and is available from

Announce: Windows MSI Installers for release 2016.01

Published by Tobias Leich on 2016-02-04T21:48:54

The Windows MSI installers are now available, coming again in two versions. One installer targets x86 (32bit) platforms, and the other installer targets x86_64 (64bit) platforms (probably Windows 7 or better). Only the version for x86_64 comes with JIT enabled.
The two MSIs are available from

Death by Perl6: Perl6 Distribution thoughts and proposals (s22)

Published by Nick Logan on 2016-01-31T17:47:00

Currently Distribution1 is just a glorified key/value store. After years of digesting s222 I'm comfortable pushing for its adoption. A common complaint is that it's over-engineered or too academic. However I would guess many such complaints boil down to not putting in all the considerations of the original drafters. I say this because over the years I've gone from being confused by and disliking it to promoting its implementation. Nearly every problem I've run into I end up finding a solution for in s22. Sometimes these solutions are vague, but their intentions can still be interpreted. So I will lay out my understanding of the Distribution aspect of s22.

This is the class used when installing distributions via CompUnitRepo::Local::Installation. Basically it provides the API to access the META6.json representation of a distribution. It should at least contain the following methods:

method meta
my $meta = $dist.meta;
# Return a Hash with the representation of the meta-data,
# constructed the same way as in the META6.json specification.
# Please note that an actual META6.json file does not need to
# exist, just a representation in that format.
method content
my $content = $dist.content( <provides JSON::Fast lib/JSON/Fast.pm6> );
my $content = $dist.content( <resource images fido.png> );
# Return the octet-stream as specified by the given keys,
# navigating through the META6.json hash.

(note this is an interface, not a class as currently implemented)

role Distribution {
    method meta {...}
    method content(*@_ --> IO) {...}
}


method meta is simply for giving hash-like access to the META6. In most cases it would probably just be an alias to a json-from-file generated hash, but it doesn't have to be (nor does the META6 have to exist as a file). I don't think there is much confusion here.

method content is the interesting bit.

currently: Distribution assumes everything is a file. CU::R::I concatenates relative paths into strings, so there is no way for anything other than a file to transfer data.

proposed: Distribution doesn't care what representation the Distribution has, as it just uses .content, so content may feed it data from a socket, a file, or even a hash


$dist.content(<provides Module::XXX lib/Module/XXX.pm6>)

(*note: key names found via method .meta)

This would return the raw data of whatever that set of keys point to (in this case lib/Module/XXX.pm6) so that CU::R::I can save that data anywhere it wants on the filesystem. So when CU::R::I gets the Distribution object the Distribution does not even have to exist yet; CU::R::I is just going to get the raw data and save it to (for instance) rakudo/install/perl6/site/XXX.pm6
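To make the key-path idea concrete, here is a rough sketch of the same interface translated into Python for illustration. The class shape and the `fetch` callback are my own invention, not part of s22; `fetch` stands in for the storage-specific part (file, socket, tarball member, ...):

```python
# Illustrative sketch of the .meta/.content interface (not rakudo code).
class Distribution:
    def __init__(self, meta, fetch):
        self._meta = meta    # parsed META6.json-style structure
        self._fetch = fetch  # maps a leaf name to raw bytes

    def meta(self):
        return self._meta

    def content(self, keys):
        node = self._meta
        for key in keys[:-1]:
            node = node[key]  # navigate through the META6 hash
        # The final key names the thing to fetch; where the bytes
        # actually come from is up to the concrete Distribution.
        return self._fetch(keys[-1])

meta = {"provides": {"JSON::Fast": "lib/JSON/Fast.pm6"}}
dist = Distribution(meta, fetch=lambda urn: b"unit module JSON::Fast;")
print(dist.content(["provides", "JSON::Fast", "lib/JSON/Fast.pm6"]))
```

The point of the `fetch` indirection is exactly the proposal above: a Distribution::Tar or Distribution::GitHub only needs to supply a different `fetch`, and CU::R::I never has to care.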


"resource" : {
    "images" : [
        "fido.png", "zowie.png"
    ],
    "libraries" : {
        "inline_helper" : "build-time"
    }
}
The current implementation of resource is rather lacking. Right now it just takes an array of paths: "resources" : ["libraries/libfoo", "xxx.img"]
The s22 design would allow for:

  1. each directory is its own hash key (so not "dir/dir2/xxx.txt" but rather "dir" : { "dir2" : "xxx.txt" })
  2. each file is not directly part of a string that contains its directory (no directory separators involved)
  3. Arbitrary data can be attached to leaf nodes; if a leaf node is a hash
    then it's meant to be understood by a package manager and can be ignored by rakudo/compunit (as these might mean different things to specific package managers).

Let us look at the libraries example above; the arbitrary data here is build-time. This may tell a package manager something about libraries, so for this example we will say it tells the build phase that we will generate a file called inline_helper that does not exist yet (so take this into account when creating a manifest/uninstall). It may also be that the package manager simply added it itself so that later it can look up that info (think dependency chain).

But it's more useful than that. A package manager could then allow a command like <pkger> build-time . to run the build hook manually (similar to how npm scripts work).
Or allow explicitly requesting that $*VM.platform-library-name be applied (or explicitly supplying a type of "-or" => ["", "lib.dll"] to say one of these will exist). Remember, CU::R::I doesn't need to understand these outer hash nodes so any code meant to interpret these would be in whatever Distribution object (or package manager).
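As a sketch of how a consumer might walk such a nested resource hash, here is a hypothetical helper (illustrative Python, not rakudo or zef code; the function name and structure are my own):

```python
# Hypothetical walker over an s22-style nested "resource" hash:
# plain strings are files, lists group files under a directory, and a
# hash leaf carries package-manager-only data (like "build-time") that
# the CompUnit side may simply skip -- only the key is the file name.
def resource_files(node, prefix=()):
    files = []
    if isinstance(node, str):
        files.append("/".join(prefix + (node,)))
    elif isinstance(node, list):
        for item in node:
            files.extend(resource_files(item, prefix))
    elif isinstance(node, dict):
        for name, child in node.items():
            if isinstance(child, str):
                # hash leaf: the value ("build-time") is metadata for a
                # package manager; the file itself is named by the key.
                files.append("/".join(prefix + (name,)))
            else:
                files.extend(resource_files(child, prefix + (name,)))
    return files

resource = {"images": ["fido.png", "zowie.png"],
            "libraries": {"inline_helper": "build-time"}}
print(resource_files(resource))
# -> ['images/fido.png', 'images/zowie.png', 'libraries/inline_helper']
```

Note how no directory separators appear anywhere in the META itself; they only exist in the flattened view a consumer chooses to build.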

t/ hooks/ bin/:

resources/ does not belong in this group. Why? Because t/ hooks/ bin/ are special folders whereas resources can contain special files (anything with a hash leaf is considered a special file). I will say I cannot explain why these special folders are not required entries in the META. I can only guess it's because:

  1. These folders probably won't end up with any build time generated files meant to be installed (like resources)
  2. The files do not get tied to a specific name (like provides style Module::Foo => 'some/path')

So in the prototype distributions I've coded, I generally include a .files method to allow access to the files in these special folders that are not included in the META6. This is OK as none of these files have special properties (like resources) nor have to be associated with a name (like provides).


Currently CompUnit::Repository::Installation has a signature3 of:

method install(Distribution $dist, %sources, %scripts?, %resources?, :$force)

We'll ignore the flaw of two optional positional hashes, but I do have to point out the pointlessness of passing in the three hashes at all. The $dist should already know all of this, and if not it should be able to generate it on the fly whenever CompUnit::Repository::Installation makes a request. So when it's time to install the sources it would call something like

for $dist.meta<provides>.kv -> $name, $path {
    $save-to.spurt: $dist.content(["provides", $name, $path]);
}
instead of relying on a hash that gets passed in with the exact same data. In other words we want: method install(Distribution $dist, :$force)

I ask: is it safe to dismiss the s22 Distribution for something else, when the something else is probably not considering many of the things that s22 does (but that the proposer may not have thought of yet)? I'm willing to assume quite a few hours were put into s22, and I'd hope anyone wanting to do something different puts in an equal amount of time to understand the problems it's meant to solve before we say "let's just rewrite it because I understand this model I've built in my head over $some-fraction of the hours that were put into s22".

(This follows the classic story of programmer Foo insisting to programmer Bar that a piece of code be removed, and Bar refusing because Foo cannot say what its original purpose was. Foo wants to change code he does not understand into code he does understand, but Bar knows you should not change code you do not yet fully understand.)

Again, I've been guilty of this as well. But I don't want to see the hack-y workarounds I've used for zef's4 Distribution objects over the last 2 years (see the bottom of this post) somehow become the default implementation just because they are easier to understand initially... they were developed as hacks until s22 was implemented, after all.

A simpler interface could be used internally, since the CompUnit::Repository already knows the actual location of the distribution it will load. It will also know what type of Distribution to use, because the CompUnit installed it itself. This means the CompUnit can save it as any type of Distribution it wants; it may get a Distribution::Tar but that does not mean CompUnit::Repository can't save it as a Distribution::Local (which means you no longer need Distribution::Tar to reinstall it when you upgrade rakudo). However, making a simpler interface the default would trade away some of the flexibility s22 provides just to shave a few lines off installing a Distribution.

To wrap this up: I think we have been avoiding some or many of these things because at the time we did not understand why they were designed that way to begin with. I would ask that, instead of requiring one to explain why we should use the s22 version of Distribution, anyone who would like to see otherwise explain what is actually wrong with it and what they think the original intention was. I do not think anyone should push for alternatives unless they can explain what they feel the original design was meant to solve and what is wrong with it. If these cannot be answered, then it could be inferred that you may not have considered or be aware of certain problems that will be encountered. I've certainly been guilty of this in the past, but years of working on zef have only shown me that the answers to most problems/features were in fact already solved in s22 in some way (and that I simply had not understood everything fully before).

Examples / Code

The first 2 examples each show multiple Distribution objects fulfilling the s22 interface (one using roles, one using classes):
Demo/prototype of Distribution::Local and Distribution::Tar

Demo/prototype of Distribution::Local and Distribution::GitHub (with a modified CompUnit::Repository::Installation to use .content and .meta methods)

A type of Distribution::Simple could be created using the previously mentioned s22 implementation. But making this the actual interface leaves it too simple/limited to be the actual API; let the core have an API that allows slightly lower-level access to the install process, and leave the ::Simple style 'method per META6 root key' wrappers for distributions to implement themselves (such implementations are simple enough not to require a core Distribution::Simple), or make a simple API available through a different CompUnit::Repository (CompUnit::Repository::Installation::Simple).
zef's Hack-y Distribution::Simple style implementation

Note: ioify is just a way to "absolutify" the relative paths contained in a META. In reality the data doesn't have to come from a file, but it may be used to absolutify a relative url or anything else that could transform a urn to an actual location.

  1. Distribution

  2. s22

  3. CompUnit::Repository::Installation.install

  4. zef

hoelzro: Anonymous State Variables And How They Work

Published on 2016-01-27T14:57:00

When debugging code, I will often add a counter variable to a loop so I can keep track of what's going on, or so that I can process a fraction of my data set while I'm iterating on a piece of code:

Read More

hoelzro: Getting End-of-Document POD and Declarative POD to Play Nice in Perl 6

Published on 2016-01-20T05:26:25

When I wrote more Perl 5, than I do today, I followed Damian Conway's advice about documentation and embraced the so-called end of document style:

Read More

brrt to the future: Rumors of JITs' demise are greatly exaggerated.

Published by Bart Wiegmans on 2015-10-12T19:38:00

Earlier this week my attention was brought to an article claiming that the dusk was setting for JIT compilation. Naturally, I disagree. I usually try to steer clear of internet arguments, but this time I think I may have something to contribute. Nota bene, this is not a perl- or perl6 related argument, so if that is strictly your interest this is probably not an interesting post for you.

The main premise of the argument is that people are shifting away from JIT compilation because the technique has failed to live up to its promises. Those promises include, in various forms, high level languages running 'as fast as C', or having more optimization possibilities than ahead-of-time (AOT) compilers do. Now my perspective may be a bit unusual in that I don't actually expect momentous gains from JIT compilation per se. As I've described in the talk I gave at this year's YAPC::EU, by itself JIT compilation removes only the decoding and dispatch steps of interpretation, and - depending on the VM architecture - these may be a larger or smaller proportion of your running time. However, my thesis is that interpretation is not why high-level languages are slow, or rather, that interpretation is only one of the many sources of indirection that make high-level languages slow.

First of all, what of the evidence that JITs are actually in demise? The author provides three recent trends as evidence, none of which I hold to be decisive. First, both Windows 10 and the newest versions of Android translate .NET and Dalvik applications respectively to native code at installation time, which is properly considered ahead of time compilation. Second, high-performance javascript applications are currently often created using tools like emscripten, which compiles to asm.js, and this is in many ways more similar to object code than it is to a high-level language, implying that the difficult bit of compilation is already behind us. (I agree mostly with that assessment, but not with its conclusion). Finally, on iOS devices JIT compilation is unsupported (except for the JIT compiler in the webkit browser engine), allegedly because it is insecure.

As to the first piece, the author suggests that the main reason is that JIT compilers are unpredictable in their output, at least relative to optimizing ahead-of-time compilers. I think that is nonsense; JIT compilation patterns tend to be quite reliably the same on different runs of the same program, a property I rely on heavily during e.g. debugging. The output code is also pretty much invariant, with an exception being the actual values of embedded pointers. So in my experience, what you see (as a developer) is also what you get (as a user), provided you're using the same VM. I humbly suggest that the author believes JITs to be unreliable because his work is being compiled by many different VMs using many different strategies. But I see that no differently than any other form of platform diversity. Maybe the author also refers to the fact that often optimization effectiveness and the resultant performance of JIT compiled applications is sensitive to minor and innocuous changes in the application source code. But this is true of any high-level language that relies primarily on optimizing compilers, for C as much as for python or javascript. The main difference between C and python is that any line of C implies far fewer levels of indirection and abstraction than a similar line of python.

I think I have a much simpler explanation as to why both Google and Microsoft decided to implement ahead-of-time compilation for their client platforms. The word 'client' is key here; because I think we're mostly talking about laptops, smartphones and tablets. As it turns out, hardware designers and consumers alike have decided to spend the last few years worth of chip manufacturing improvements on smaller, prettier form factors (and hopefully longer battery life) rather than computing power. Furthermore, what Qualcomm, Samsung etc. have given us, Photoshop has taken away. The result is that current generation portable devices are more portable and more powerful (and cheaper) than ever but are still memory-constrained.

JIT compilation inevitably comes with a significant memory cost from the compiled code itself (which is generally considerably larger than the interpreted code was), even when neglecting the memory usage of the compiler. Using various clever strategies one can improve on that a bit, and well-considered VM design is very important as always. But altogether it probably doesn't make a lot of sense to spend precious memory for JIT-compiled routines in a mobile setting. This is even more true when the JIT compiler in question, like Dalvik's, isn't really very good and the AOT compiler has a good chance of matching its output.

Now to the case of asm.js. As I said, I agree mostly that a significant amount of work has already been done by an ahead-of-time compiler before the browser ever sees the code. But it would be a mistake to think that the role of the JIT (or rather the whole system) can therefore be neglected. First of all, JIT-compiled code, even asm.js code, is greatly constrained in comparison to native code, which brings some obvious security benefits. Second of all, it is ultimately the JIT compiler that allows this code to run cross-platform at high performance. I think it is mistaken to suggest that this role is trivial, and so I see asm.js as a success of rather than evidence against JIT compilation as a technique.

Next, the iOS restriction on JIT compilation. I think the idea that this would be for security reasons is only plausible if you accept the idea that application security is significantly threatened by dynamic generation of machine code. While I'm sure that the presence of a JIT compiler makes static analysis very difficult - not to say impossible - I don't believe that this is the primary attack vector of our times. The assertion that memory must be both writable and executable for a JIT compiler to work is only superficially true, since there is no requirement that the memory be both at the same time, and so this doesn't imply much of a threat (so-called W^X memory is becoming a standard feature of operating systems). Vtable pointers stored in the heap, and return addresses on a downward-growing stack, now those are attack vectors of note.

But more importantly, that is not how mobile users are being attacked. It is much more interesting, not to mention significantly easier, for attackers to acquire whole contact books, private location information, credentials and private conversations via phishing and other techniques than it is to corrupt a JIT compiler and possibly, hopefully, and generally unreliably gain remote execution. Most of these attack vectors are wide-open indeed and should be prevented by actual security techniques like access control rather than by outlawing entire branches of computing technology. Indeed, an observer not sympathetic to Apple could probably relate this no-JIT compilation rule with the Californian company's general attitude to competing platforms, but I will not go further down that path here.

Finally, I think the claim that JIT compilation can't live up to its promise can readily be disproven by a simple Google search. The reason is simple; the JIT compiler, which runs at runtime, has much more information at its disposal than even the best of ahead-of-time compilers. So-called profile-guided optimization helps to offset the difference, but it is not a common technique, and it still yields only a small subset of the information available to a JIT compiler. The fact that many systems do not match this level of performance (and MoarVM's JIT compiler certainly doesn't) is of course relevant but not, in my opinion, decisive.

In conclusion, I would agree with the author that there are many cases in which JIT compilation is not suitable and AOT compilation is. However, I think the much stronger claim that the dusk is setting on JIT compilation is unwarranted, and that JIT compilers will remain a very important component of computing systems.

brrt to the future: Most Significant Bits

Published by Bart Wiegmans on 2015-09-20T14:12:00

This week I think I fixed irregular behavior in DynASM's x64 instruction encoding of register selection. I think it'll be a fun story to share, so I thought it'd be time to blog.

The astonishingly irregular thing about x64 instruction encoding is that it is mostly very regular. Ignoring for the moment instruction prefixes and constants, an x86 instruction consists of two bytes: one for the instruction proper, and one for its two operands. Take for example the addition of two registers:

| add | eax, ecx |
| 0x01 | 0xc8 |

We take the instruction byte as given. It is the second byte that concerns me, because it determines which operands to use and how. Like a good CISC architecture, the x86 supports a number of addressing modes, meaning that a register can be used as a value but also as (part of) a pointer. One of the reasons C does pointer arithmetic so freely is that this reflects the nature of the CPUs which were current when C was designed. The (relevant) x86 addressing modes are shown in the following table. (There are more, but you shouldn't use them):

| Addressing Mode | Byte flag | Meaning |
| Direct | 0xc0 | Both operands are used as direct values |
| Indirect | 0x00 | One of the operands is used as a memory reference; whether that is the source or destination operand depends on the instruction |
| Indirect with offset | 0x80 or 0x40 | One of the operands is used as a memory reference, offset by a constant which is encoded directly after the instruction |
| Indexed | 0x04 | One operand consists of two registers, base and index, the latter multiplied by a scale, to provide a memory reference. Optionally also has an offset, in which case the code byte is 0x44 or 0x84, depending on whether the offset fits in a single byte or not |
| Instruction-relative | 0x05 | One of the operands is a reference to the current location in the code offset by a constant (the other refers to a register as usual) |

Readers who are more careful than can reasonably be expected will have noticed that in the first three addressing modes, the lowest nibble is zero, whereas it is nonzero for the last two addressing modes. This is in fact the source of irregularities in instruction encoding. To appreciate this it helps to unpack the operand byte in octal rather than hexadecimal. Octal is much closer to how x86 thinks about the world. As demonstrated in this table, the two lowest groups of 3 bits each encode the actual registers that should be used.

| Byte | Mode | Op2 | Op1 | Meaning |
| 0xc0 | 3 | 0 | 0 | Direct |
| 0x80 | 2 | 0 | 0 | Indirect with offset |
| 0x04 | 0 | 0 | 4 | Indexed |

The upshot of this is that in case the operand mode isn't direct, and the first operand register is either 4 or 5, the meaning of the operand byte is completely different. x86 suddenly expects another operand byte (a so-called SIB byte) to specify which register shall be base and which shall be index.

Normally this isn't much of a problem, since the registers referred to by numbers 4 and 5 are rsp and rbp respectively, meaning the stack top and stack bottom registers. Fun fact: the x86 stack grows downward, so rbp > rsp in basically all cases. Also fun fact: because of this, writing upward from an rsp-relative reference can overwrite the return pointer held somewhere below rbp, which is the basis of most buffer overflow attacks. You thought NULL was a billion dollar mistake? Consider how the engineers who decided the stack should grow downward must feel.

Anyway, considering that rbp and rsp take up such a pivotal role in your program, it's actually unlikely you'll encode them by mistake. So as long as you don't do that, it's safe to ignore this complexity and just 'add in' the correct bits into the operand byte. Thus, for instance, to refer to the 7th and first register respectively in direct mode, we generate:

0300 + (07 << 3) + (01) == 0371 == 0xf9
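That arithmetic can be written as a tiny helper; this is an illustrative sketch (with a made-up function name), not DynASM's actual code:

```python
# Hypothetical helper: pack a ModRM byte from its three octal fields --
# a 2-bit addressing mode and two 3-bit register numbers.
def modrm(mode, reg, rm):
    assert 0 <= mode <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7
    return (mode << 6) | (reg << 3) | rm

print(hex(modrm(0o3, 0o7, 0o1)))  # -> 0xf9, as in the calculation above
print(hex(modrm(0o3, 0o1, 0o0)))  # -> 0xc8, the 'add eax, ecx' operand byte
```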

However, in the land of x64, things are not so happy. I have blogged earlier about how the x64 architecture gives you 8 extra registers on top of the 8 legacy x86 registers, and how the only difference between those registers is that x64 specifies a prefix byte called REX. REX byte encoding is not so difficult; the trick is to apply it reliably. But because only the lower three bits of the 4-bit register number are placed in the operand byte, registers r12 and r13 look exactly like rsp and rbp to the CPU. Well, that's where the fun really starts, because it's all too easy to encode these 'accidentally'. They are after all perfectly regular registers.

For those not keeping score, we have two special cases to handle. First, whenever the first operand is either rsp or r12 and we're not using direct mode, an extra SIB byte needs to be encoded to specify that we are really talking about accessing rsp/r12 directly. This is done by encoding rsp as both the base and index, which the x86 understands because using rsp as an index is usually illegal. (The magic byte is thus 0044 or 0x24). Second, whenever the first operand is rbp or r13 and we're using indirect access without an offset, we need to encode indirect access with an offset instead, just with the offset at zero. This of course requires another byte. Somewhat byzantine, but manageable.
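The two special cases can be sketched in a few lines of Python. This is a hypothetical helper written for illustration, not DynASM's actual code; register numbers follow the x64 convention (rsp = 4, rbp = 5, r12 = 12, r13 = 13, the latter two colliding with rsp/rbp in their low three bits).

```python
DIRECT         = 0b11   # mod = 3: register-direct
INDIRECT       = 0b00   # mod = 0: [reg], no displacement
INDIRECT_DISP8 = 0b01   # mod = 1: [reg + 8-bit displacement]

def encode_operand(mod, reg, rm):
    """Return the ModR/M byte plus any SIB / displacement byte forced
    by the rsp/r12 and rbp/r13 special cases."""
    low = rm & 0b111                  # only 3 bits fit in the ModR/M byte
    if mod != DIRECT and low == 0b100:
        # rsp/r12 as base: a SIB byte is required; base=rsp, index=rsp
        # means "no index" (the magic byte 0044 == 0x24).
        return bytes([(mod << 6) | ((reg & 7) << 3) | 0b100, 0x24])
    if mod == INDIRECT and low == 0b101:
        # rbp/r13 without offset: encode indirect-with-offset, offset zero.
        return bytes([(INDIRECT_DISP8 << 6) | ((reg & 7) << 3) | 0b101, 0x00])
    return bytes([(mod << 6) | ((reg & 7) << 3) | low])

# 0300 + (07 << 3) + (01) == 0371 == 0xf9, as in the example above:
assert encode_operand(DIRECT, 7, 1) == b'\xf9'
# Indirect access through rsp grows the magic SIB byte:
assert encode_operand(INDIRECT, 0, 4) == b'\x04\x24'
```

Note how r12 and r13 take the same path as rsp and rbp, since only their low three bits reach the ModR/M byte.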

We are, unfortunately, not completely OK yet. It is my central hypothesis of this post that DynASM was not designed to handle register selection at runtime. Evidence for this hypothesis is that DynASM does 'weird' things like mix data and instructions and linking prior to encoding. Usually one encodes first and links afterwards, especially when during encoding you may need to make decisions that influence the final positions of certain segments. DynASM does it the other way around, which means that during linking we should be able to calculate exactly how much space we need for each instruction. Which is a pain, because DynASM mixes the data stream (which we need for inspection) with the instruction stream (which tells the runtime what to do with its input). It's possible to hack around this - basically by copying data into the instructions - but it's not elegant. Starting with this commit, I'm reasonably confident that this stuff works, a test case is provided here.

That almost concludes this week's madness. The only thing left is to question the method. Why should x86 use this convoluted scheme? I could do detailed historical research, but I prefer to speculate that it is caused by economy of memory. After all, in the regular case you need only 2 bytes, which is - conveniently - equal to 16 bits, the original register size of the 8086. And since that chip was designed in the 1970s, it makes sense that instructions should be as space-efficient as possible. In contrast, ARM uses 32-bit instructions with 3 operands. So space economy seems a plausible cause to me.

See you next time!

brrt to the future: Wrapping Up

Published by Bart Wiegmans on 2015-09-14T08:31:00

Hi everybody, it's been a while since I've blogged, and I think you deserve an update. Last week, of course, was YAPC::EU, which was awesome. Granada is a very nice place, and the weather was excellent. Tapas lunch was very nice, as was the gala dinner (also with tapas). There were many interesting people and presentations (many more than I could actually see). It was also very interesting to present (slides) about the JIT, which I think went well. One of the comments I heard was that it was quite a high-level talk, so if anyone should ask, I can and will give a talk describing the grueling details in the future. Oh, and I just found that Larry Wall's keynote has been uploaded to YouTube. Go ahead and watch; this page isn't going anywhere.

So what news from JIT compiler land? Well, since last I've blogged I had to design and implement the register allocator and tie everything together into the final running compiler. That has been achieved, but there are still many bugs and limitations that prevent it from actually being effective. I will talk about these at length. But first, let me explain the register allocator.

The basic problem of register allocation is that a compiler can assign more values than the CPU has registers. Thus, only some values can reside in registers and others must be stored to memory (spilled). Computing the best set of values to keep in registers is in general an impossible problem. To solve it I use the 'linear scan' allocation heuristic. This heuristic is as simple as can be: determine for each value allocated its last use, and when it's time to spill a value, spill the one which will be live furthest in the future. These slides also do a good job of explaining it.

Aside from being simple to implement, it's also effective. Values that will expire soon are likely to be used soon as well, so their register will free up soon enough. On the other hand, values that are bound to live a long time will likely be spilled before they expire anyway, so it costs less to spill them now. Another benefit is that this can be evaluated online, i.e. as the JIT tree is being processed, so that it doesn't require a separate step. (One abstraction it did require, though, was order-numbering the tiles, which are currently attached to the JIT tree. This makes the tiles in essence a linear list, and it is my plan to convert them to a linear array in memory as well. That would reduce the number of tree traversals by one as well).
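The heuristic can be sketched as a toy model in Python. This is an illustration of the spill rule described above, not MoarVM's allocator; intervals here are hypothetical (name, first_use, last_use) triples, pre-sorted by first use.

```python
def linear_scan(intervals, num_regs):
    """Return the set of value names that get spilled.
    When out of registers, evict the value live furthest into the future."""
    active = []                       # (last_use, name), kept sorted ascending
    spilled = set()
    for name, start, end in intervals:
        active = [(e, n) for (e, n) in active if e > start]  # expire dead values
        active.append((end, name))
        active.sort()
        if len(active) > num_regs:    # out of registers:
            _, victim = active.pop()  # spill the furthest-living value
            spilled.add(victim)
    return spilled

# 'a' stays live longest, so it is the one pushed to memory:
assert linear_scan([("a", 0, 10), ("b", 1, 3), ("c", 2, 8)], 2) == {"a"}
```

The online character of the algorithm is visible here: each interval is handled once, in order, which is why the order-numbering of the tiles matters.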

I will not maintain the illusion here that a register allocator is a trivial component, even with a conceptually simple algorithm as just outlined. One of the tricky bits is ensuring that values defined in a conditional branch do not accidentally 'escape' out of their branches, since they will be unavailable if the branch was not taken. In the same vein after a call all 'volatile' registers are invalidated, so that also requires some special treatment.

After the register allocator was finished, all that remained was ironing out the bugs. And they were (and are) many. The most annoying of these were the DynASM machine code generation bugs. The DynASM runtime is a tight state machine that uses a mix of data and instructions to generate machine code. The first bug was relatively simple - a missing REX byte marking caused by the preprocessor looking at only one of the two operands. The second bug was positively evil. See, x86 uses so-called ModR/M bytes to specify the register and memory access mode used for an instruction. It's not important that you know what it stands for, but what is important is that each byte - 8 bits - is divided into a mode of 2 bits and 2 register numbers of 3 bits each. (There are 8 legacy registers in x86, so that fits.) Except when the register number is 4 (rsp, or the stack pointer register). Then the whole meaning of the byte changes to a SIB byte, which is quite different entirely - it refers to two registers at once. The upshot is that this SIB byte now must be appended to the ModR/M byte and filled with correct data; and that this SIB byte is then interpreted as if it were a ModR/M byte anyway. I've patched DynASM to do this, but it is really quite sensitive and brittle, and I quite expect to have to fix this in the future in another way.

That brings me to today. Unfortunately for my JIT aspirations, my educational obligations have caught up with me again. In other words: my study has started and leaves me with much less time to work on the JIT. So for clarity, I have completed in the last few months the following:
Timo Paulssen has developed a C-source to expression tree format preprocessor, which should in the best case help quickly convert most of the old JIT compiler segments to the new, when this is more stable.

Here is what I intended (and indeed promised) I would do, and haven't finished yet:
And there are things which I haven't outlined explicitly but which I still want to do:
And there are many more. But as I said, I cannot continue with them as I have in the past few months. I will continue to work to stabilise the JIT and I hope to succeed in that before Christmas, only at a slower pace than before. I realise it is somewhat disappointing not to be able to fully 'land' the new JIT yet, but I'm confident we'll get there in the end. See you next time!

brrt to the future: Moar JIT news

Published by Bart Wiegmans on 2015-11-29T15:22:00

Hello there, I thought it high time to write to you again and update you on the world of JITs. Since last I wrote, PyPy 4.0 was released. Also in python-land, Pyston 0.4 was released, and finally Guile 2.1.1 was released and Andy Wingo wrote a nice piece about that, as is his custom. I present these links not only to give these projects the attention they deserve, but also because I think they are relevant to our own project.

In chronological order, the release of PyPy 4.0 marked the first 'production' release of that project's autovectorizer, which was developed over the course of this year's Google Summer of Code. I'd like to take this opportunity to publicly congratulate the PyPy team on this achievement. So-called 'vector' or SIMD operations perform a computation on multiple values in a single step and are an essential component of high-performance numerical computations. Autovectorizing refers to the compiler capability to automatically use such operations without explicit work by the programmer. This is not of great importance for the average web application, but it is very significant for scientific and deep learning applications.

More recently, the Pyston project released version 0.4. Pyston is another attempt at an efficient implementation of Python, funded by Dropbox. Pyston is, or I should rather say started out as, based on LLVM. Most of my readers know of LLVM; for those who don't, it is a project which has somewhat revolutionised compiler development in the last few years. Its strengths are its high-quality cross-platform code generation and a permissive license. LLVM is also the basis for such languages as Rust and Julia. Notable weaknesses are size, speed, and complexity. To make a long story short, many people have high expectations of LLVM for code generation, and not without reason.

There are a few things that called my attention in the release post linked above. The first thing is that the Pyston project introduced a 'baseline' JIT compiler that skips the LLVM compilation step, so that JIT-compiled code is available faster. They claim that this provides hardly a slowdown compared to the LLVM backend. The second thing is that they have stopped working on implementing LLVM-based optimisation. The third thing is that to support more esoteric Python features, Pyston now resorts to calling the Python C API directly, becoming sort of a hybrid interpreter. I would not be entirely surprised if the end point for Pyston would be life as a CPython extension module, although Conway's law will probably prohibit that.

Pyston is not the first, nor the only, current JIT implementation based on LLVM. It might be important to say here that there are many projects which do obtain awesome results from using LLVM, Julia being a prime example. (Julia is also an excellent counterexample to the static typing enthusiasts' recent self-declared victory that 'static types have won', being very dynamic indeed.) But Julia was designed to use the LLVM JIT, which probably means that tradeoffs have been made to assure performance; it is also new, so it doesn't have to run as much weird legacy code; and the team is probably highly competent. I don't know why some mileages vary so much (JavaScriptCore also uses LLVM successfully, albeit as its fourth and last tier). But it seems clear that, far from being a golden gun, using LLVM for dynamic language implementations is a subtle and complex affair.

Anybody willing to try building an LLVM-backed JIT compiler for MoarVM, NQP, or perl6 in general, will of course receive my full (moral) support, for whatever that may be worth.

The posts by Andy Wingo, about the future of the Guile Scheme interpreter, are also well worth reading. The second post is especially relevant as it discusses the future of the Guile interpreter and ways to construct a high-performance implementation of a dynamic language; it generalizes well to other dynamic languages. To summarise, there are roughly two ways of implementing a high-performance high-level programming language, dynamic or not, and the approach of tracing JIT compilers is the correct one, but incredibly complex and expensive and - up until now - mostly suitable for big corporations with megabucks to spend. Of course, we aim to challenge this; but for perl6 in the immediate future correctness far outranks performance in priority (as it should).

That all said, I also have some news on the front of the MoarVM JIT. I've recently fixed a very longstanding and complex bug that presented itself during the compilation of returns with named arguments by rakudo. This ultimately fell out as a missing inlined frame in the JIT compiler, which ultimately was caused by MoarVM trying to look up a variable using the JIT compiler's 'current location', while the actual frame was running under the interpreter, and - this is the largest mystery - it was not deoptimized. I still do not know why that actually happened, but a very simple check fixed the bug.

I also achieved the goal of running the NQP and rakudo test suites under the new JIT compiler, albeit in a limited way; to achieve this I had to remove the templates of many complex operations, i.e. ops that call a C function or that have internal branches. The reason is that computing the flow of values beyond calls and branches is more complex, and trying to do it inline with a bunch of other things - as the new JIT has tried to do so far - is prone to bugs. This is true especially during tree traversal, since it may not be obvious that computations relying on values may live in a different context than the computations that generate those values.

In order to compile these more complex trees correctly, and understandably, I aim to disentangle the final phases of compilation, that is, the stages of instruction selection, register allocation, and bytecode generation. Next to that I want to make the tiler internals and interface much simpler and more user-friendly, and solve the 'implied costs problem'. The benefit of having the NQP test suite working is that I can demonstrate the effects of changes much more directly, and more importantly, demonstrate whether individual changes work or not. I hope to report some progress on these issues soon, hopefully before Christmas.

If you want to check out the progress of this work, check out the even-moar-jit branch of MoarVM. I try, but not always successfully, to keep it up-to-date with the rapid pace of the MoarVM master branch. The new JIT only runs if you set the environment variable MVM_JIT_EXPR_ENABLE to a non-empty value. If you run into problems, please don't hesitate to report on github or on the #moarvm or #perl6 channels on Freenode. See you next time!

Strangely Consistent: Macros: what the FAQ are they?

Published by Carl Mäsak

Thank you sergot++ for eliciting these answers out of me, and prompting me to publish them.

Q: Is it common to be totally befuddled by all these macro-related concepts?

Yes! In fact, that seems to be the general state of mind not just for people who hear about these things now and then, but also for those of us who have chosen to implement a macro system! 😝

Seriously though, these are not easy ideas. When you don't deal with them every day, they naturally slip away from your attention. The brain prefers it that way.

Q: I keep seeing all these terms: quasis, unquotes, macros... I think I know what some of them are, but I'm not sure. Could you explain?

Yes. Let's take them in order.

Q: OK. What the heck is a quasi?

Quasis, or quasiquotes, are a way to express a piece of code as objects. An average program doesn't usually "program itself", inserting bits and pieces of program code using other program code. But that's exactly what quasis are for.

Quasis are not strictly necessary. You could create all those objects by hand.

quasi { say "OH HAI" }        # easy way

Q::Block.new(Q::StatementList.new(Q::Statement::Expr.new(Q::Postfix::Call.new(
    Q::Identifier.new("say"),
    Q::ArgumentList.new(Q::Literal::Str.new("OH HAI"))
))));                         # hard way

It's easier to just write that quasi than to construct all those objects. Because when you want to structurally describe a bit of code, it turns out the easiest way to do that is usually to... write the code.

(By the way, take the Q::Block API with a huge grain of salt. It only exists for 007 so far, not for Perl 6. So the above is educated guesswork.)

Q: Why would we want to create those Q objects? And in which situations?

Excellent question! Those Q objects are the "API for the structure of the language". So using it, we can query the program structure, change it during compile time, and even make our own Q objects and use them to extend the language for new things.

They are a mechanism for taking over the compiler's job when the language isn't flexible enough for you. Macros and slangs are a part of this.

Q: Do you have an example of this?

Imagine a format macro which takes a format string and some arguments:

say format "{}, {}!", "Hello", "World";

Since this is a macro, we can check things at compile time. For example, that we have the same number of directives as arguments after the format string:

say format "{}!", "Hello", "World";                        # compile-time error!

In the case of sufficient type information at compile time, we can even check that the types are right:

say format "{}, robot {:d}!", "Hello", "four-nineteen";    # compile-time error!
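The arity and type checks just described could be sketched like this, in plain Python standing in for the macro machinery (the name check_format and the exact directive syntax handling are assumptions for illustration, not the real macro):

```python
import re

def check_format(fmt, *args):
    """Raise at 'compile time' if directives and arguments disagree."""
    directives = re.findall(r'\{(:d)?\}', fmt)   # '' for {}, ':d' for {:d}
    if len(directives) != len(args):
        raise TypeError(f"{len(directives)} directives, {len(args)} arguments")
    for directive, arg in zip(directives, args):
        if directive == ':d' and not isinstance(arg, int):
            raise TypeError(f"{{:d}} expects an integer, got {arg!r}")

check_format("{}, {}!", "Hello", "World")   # fine: two directives, two args
```

The point of the macro is that these errors surface while compiling, before the program ever runs.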

Q: Ooh, that's pretty cool!

I'm not hearing a question.

Q: Ooh, that's pretty... cool?

Yes! It is!

Q: Why is it called "quasiquote"? Why not "code quote" or something?

Historical reasons. In Lisp, the term is quasiquote. Perl 6's quasiquotes are not identical, but probably the nearest thing you'll get with Perl 6 still being Perl.

Traditionally, quasiquotes have unquotes in them, so let's talk about them.

Q: Right. What's an unquote?

In Java, you have to write something like "Hello, " + name + "!" when interpolating variables into a string. Java developers don't have it easy.

In Perl, you can do "Hello, $name!". This kind of thing is called "string interpolation".

Unquotes are like the $name interpolation, except that the string is a quasi instead, and $name is a Qtree that you want to insert somewhere into the quasi.

quasi {
    say "Hello," ~ {{{$name}}};
}

Just like the "Hello, $name" can be different every time (for example, if we loop over different $name from an array), unquotes make quasis potentially different every time, and therefore more flexible and more useful.

To tie it to a concrete example: every time we call the format macro at different points in the code, we can pass it different format strings and arguments. (Of course.) These could end up as unquotes in a quasi, and thus help to build different program fragments in the end.

In other words, a quasi is like a code template, and unquotes are like parametric holes in that template where you can pass in the code you want.
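The template-with-holes idea can be illustrated outside Perl 6 entirely. These are hypothetical Python classes, not 007's real Q types: a tiny template tree with Hole nodes, and a fill() that plugs a fragment into every hole.

```python
class Str:
    def __init__(self, value):
        self.value = value

class Call:
    def __init__(self, name, *args):
        self.name, self.args = name, args

class Hole:
    """Stands in for an unquote: {{{ ... }}}."""

def fill(node, fragment):
    """Copy the template, replacing every Hole with the given fragment."""
    if isinstance(node, Hole):
        return fragment
    if isinstance(node, Call):
        return Call(node.name, *(fill(a, fragment) for a in node.args))
    return node

# quasi { say {{{$name}}} } -- each use can plug in a different Qtree:
template = Call("say", Hole())
assert fill(template, Str("Alice")).args[0].value == "Alice"
assert fill(template, Str("Bob")).args[0].value == "Bob"
```

Each call to fill() produces a fresh tree, which is why the same quasi can yield a different program fragment every time.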

Q: Got it! So... macros?

Macros are very similar to subroutines. But where a sub call happens at run time, a macro call happens at compile time, when the parser sees it and knows what to send as arguments. At compile time, it's still early enough for us to be able to contribute/modify Q objects in the actual program.

So a macro in its simplest form is just a sub-like thing that says "here, insert this Qtree fragment that I just built".

Q: So quasis are used inside of a macro?

Yes. Well, they're no more tightly tied to each other than given and when are in Perl, but they're a good fit together. Since what you want to do in a macro is return Q objects representing some code, you'd naturally reach for a quasi to do that. (Or build the Q objects yourself. Or some combination of the two.)

Q: Nice! I get it!

Also not a question.

Q: I... get it?

Yeah! You do!

Q: Ok, final question: is there something that you've omitted from the above explanation that's important?

Oh gosh, yes. Unfortunately macros are still gnarly.

The most important bit that I didn't mention is hygiene. In the best case, this will just work out naturally, and Do What You Mean. But the deeper you go with macros, the more important it becomes to actually know what's going on.

Take the original quasiquote example from the top:

quasi { say "OH HAI" }

The identifier say refers to the usual say subroutine from the setting. Well, unless you were actually doing something like this:

macro moo() {
    sub say($arg) { callwith($arg.lc) }

    return quasi { say "OH HAI" }
}

moo();    # 'oh hai' in lower-case

What we mean by hygiene is that say (or any identifier) always refers to the say in the environment where the quasi was written. Even when the code gets inserted somewhere else in the program through macro mechanisms.

And, conversely, if you did this:

macro moo() {
    return quasi { say "OH HAI" }
}

sub say($arg) { callwith("ARGLEBARGLE FLOOT GROMP") }

moo();    # 'OH HAI'

Then say would still refer to the setting's say.

Basically, hygiene is a way to provide the macro author with basic guarantees that wherever the macro code gets inserted, it will behave like it would in the environment of the macro.
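A loose analogy from Python (with hypothetical names): a closure resolves identifiers in the environment where it was written, not in the environment of whoever ends up calling it, which is essentially the guarantee hygiene gives macro-inserted code.

```python
captured = []

def make_greeter():
    say = captured.append      # the say visible where the "quasi" is written
    def greet():
        say("OH HAI")          # always refers to the definition-site say
    return greet

greet = make_greeter()

def caller():
    say = lambda s: captured.append(s.lower())  # a different, local say
    greet()                    # inserted code ignores the caller's say

caller()
assert captured == ["OH HAI"]  # not "oh hai"
```

The analogy is loose because macros splice code rather than passing functions around, but the name-resolution guarantee is the same in spirit.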

The same is not true if we manually return Q objects from the macro:

macro moo() {
    return Q::Postfix::Call.new(
        Q::Identifier.new("say"),
        Q::ArgumentList.new(Q::Literal::Str.new("OH HAI"))
    );
}

In this case, say will be a "detached" identifier, and the corresponding two examples above would output "OH HAI" with all-caps and "ARGLEBARGLE FLOOT GROMP".

The simple explanation to this is that code inside a quasi can have a surrounding environment (namely that which surrounds the quasi)... but a bunch of synthetically created Q objects can't.

We're planning to use this to our advantage, providing the safe/sane quasiquoting mechanisms for most things, and the synthetic Q object creation mechanism for when you want to mess around with unhygiene.

Q: Excellent! So when will all this land in Perl 6? I'm so eager to...

Ok, that was all the questions we had time for today! Thank you very much, and see you next time!

Steve Mynott: FOSDEM 2016 Perl Dev Room Lineup

Published by Steve on 2016-01-09T21:32:00

FOSDEM is a free two day conference in Brussels, Belgium on Jan 30th and 31st 2016.

The FOSDEM 2016 schedule for the Perl Dev Room on the second day (the Sunday)  has now been announced at

From a Perl 6 perspective it includes Ovid's "Perl 6 for those who hate Perl", Daisuke Maki on "Crust -- Perl6 Port of Plack", Jeffrey Goff on Perl 6 Grammars, Bart Wiegmans talking about AMD64 assembly language programming and MoarVM, Stevan Little's "Perl is not dead,... it got better" and lastly Elizabeth Mattijsen finishing with "Perl 6 -- The end of the beginning".

Independent Computing: 2016 Predictions

Published by Michael on 2016-01-04T18:19:00


The new year brings new hopes and dreams for all. One interesting trend that has been going around for a few years now is new year technology predictions. This post will be my attempt, though late, to discuss what I think may transpire this year.

Already there have been some advances in this space. The new Raspberry Pi Zero and C.H.I.P computers are fairly powerful for their small size. Below is a photo of my collection of micro PCs.

Strangely Consistent: Strategic rebasing

Published by Carl Mäsak

Just a quick mention of a Git pattern I discovered recently, and then started using a whole lot:

  1. Realize that a commit somewhere in the commit history contained a mistake (call it commit 00fbad).

  2. Unless it's been fixed already, fix it immediately and push the fix.

  3. Then, git checkout -b test-against-mistake 00fbad, creating a branch rooted in the bad commit.

  4. Write a test against the bad thing. See it fail. Commit it.

  5. git rebase master.

  6. Re-run the test. Confirm that it now passes.

  7. Check out master, merge test-against-mistake, push, delete the branch.

There are several things I like about this pattern.

First, we're using the full power of Git's beautiful (distributed) graph theory model. Basically, we're running the branch in two different environments: one where the thing is broken and one where the thing is fixed. Git doesn't much care where the two base commits are; it just takes your work and reconstitutes it in the new place. Typically, rebasing is done to "catch up" with other people's recent work. Here, we're doing strategic rebasing, intentionally starting from an old state and then upgrading, just to confirm the difference.

Second, there's a more light-weight pattern that does this:

  1. Fix the problem.

  2. Stash the fix.

  3. Write the test. See it fail.

  4. git stash pop

  5. Confirm test now passes.

  6. Commit test and fix.

This is sometimes fully adequate and even simpler (no branches). But what I like about the full pattern is (a) it prioritizes the fix (which makes sense if I get interrupted in the middle of the job), and (b) it still works fine even if the problem was fixed long ago in Git history.

Git and TDD keep growing closer together in my development. This is yet another step along that path.

Independent Computing: Perl 6.c

Published by Michael on 2015-12-27T20:06:05


So the day has finally come. Perl 6 has been released this Christmas. It has been in development for the last 15 years. I have used Perl 6 for most of my software projects and it is nice to finally have a stable release.

I remember, at the start of this journey for me, that some of the code I had written became outdated when a fix to Perl 6 was made. This forced me to re-write what I had so it could compile again.

The announcement for the Christmas release can be read here

The really nice thing about Perl 6 is how easy it is to get the job done quickly.
If you would like a quick introduction to Perl 6, take a look here; the author has done a good job of organizing the information.

Below is my latest work, a quick program I made to update a web server's IP address if it is outdated:

use v6;  
use WebService::GoogleDyDNS;

multi sub MAIN( :$domain, :$login, :$password ) {

  my $updater = WebService::GoogleDyDNS.new( domain => $domain, login => $login , password => $password );
  if $updater.outdated { say $updater.updateIP(); } else { say "No change. No action taken."; }
}


Strangely Consistent: Double-oh seven

Published by Carl Mäsak

It's now one year since I started working on 007. Dear reader, in case I haven't managed to corner you and bore you with the specifics already, here's the elevator pitch:

007 is a deliberately small language that's fun and easy to develop where I/we can iterate quickly on all the silly mistakes needed to get good macros in Perl 6.

At this stage, I'd say we're way into "iterate quickly", and we're starting to see the "good macros" part emerge. (Unless I'm overly optimistic, and we just went from "easy" into a "silly mistakes" phase.)

Except for my incessant blabbering, we have been doing this work mostly under the radar. The overarching vision is still to give 007 the three types of macros I imagine we'll get for Perl 6. Although we have the first type (just as in Rakudo), we're not quite there yet with the other two types.

Instead of giving a dry overview of the internals of the 007 parser and runtime — maybe some other time — I thought I would share some things that I've discovered in the past year.

The AST is the program

The first week of developing 007, the language didn't even have a syntax or a parser. Consider the following simple 007 script:

say("Greetings, Mister Bond!");

In the initial tests where we just wanted to run such a program without parsing it, this program would be written as just its AST (conveniently expressed using Lisp-y syntax):

(compunit (block
  (stmtlist (stexpr (postfix:<()>
    (ident "say")
    (arglist (str "Greetings, Mister Bond!")))))))

(In the tests we helpfully auto-wrap a compunit and a block on the outermost level, since these are always the same. So in a test you can start writing at the stmtlist.)

But 007 isn't cons lists internally — that's just a convenient way to write the AST. The thing that gets built up is a Qtree, consisting of specific Q nodes for each type of program element. When I ask 007 to output what it built, it gives me this:

Q::CompUnit Q::Block {
    parameterlist: Q::ParameterList [],
    statementlist: Q::StatementList [Q::Statement::Expr Q::Postfix::Call {
        expr: Q::Identifier "say",
        argumentlist: Q::ArgumentList [Q::Literal::Str "Greetings, Mister Bond!"]
    }]
}

Yes, it's exactly the same structure, but with objects/arrays instead of lists. This is where 007 begins, and ends: the program is a bunch of objects, a hierarchy that you can access from the program itself.

I've learned a lot (and I'm still learning) in designing the Qtree API. The initial inspiration comes from IntelliJ's PSI, a similar hierarchy for describing Java programs. (And to do refactors and analysis, etc.)

The first contact people have with object-oriented design tends to be awkward and full of learning experiences. People inevitably design according to physical reality and what they see, which is usually a bad fit for the digital medium. Only by experience does one learn to play to the strengths of the object system, the data model, and the language. I find the same to be true of the Qtree design: initially I designed it according to what I could see in the program syntax. Only gradually have my eyes been opened to the fact that Qtrees are their own thing (and in 007, the primary thing!) and need to be designed differently from textual reality and what I can see.

Some examples:

Qtrees are the primary thing. This is the deep insight (and the reason for the section title). We usually think of the source text as the primary representation of the program, and I guess I've been mostly thinking about it that way too. But the source text is tied to the textual medium in subtle ways, and not a good primary format for your program. Much of what Qtrees do is separate out the essential parts of your program, and store them as objects.

(I used to be fascinated by, and follow at a distance, a project called Eidola, aiming to produce a largely representation-independent programming medium. The vision is daring and interesting, and maybe not fully realistic. Qtrees are the closest I feel I've gotten to the notion of becoming independent of the source code.)

Another thing I've learned:

Custom operators are harder to implement than macros

I thought putting macros in place would be hard and terrible. But it was surprisingly easy.

Part of why, of course, is that 007 is perfectly poised to have macros. Everything is already an AST, and so macros are basically just some clever copying-and-pasting of AST fragments. Another reason is that it's a small system, and we control all of it and can make it do what we want.

But custom operators, oh my. They're not rocket science, but there's just so many moving parts. I'm pretty sure they're the most tested feature in the language. Test after test with is looser, is tighter, is equal. Ways to do it right, ways to do it wrong. Phew! I like the end result, but... there are a few months of drastically slowed 007 activity in the early parts of 2015, where I was basically waiting for tuits to finish up the custom-operators branch. (Those tuits finally arrived during YAPC::Europe, and since then development has picked up speed again.)

I'm just saying I was surprised by custom operators being hairier than macros! But I stand by my statement: they are. At least in 007.

(Oh, and by the way: you can combine the two features, and get custom operator macros! That was somewhere in the middle in terms of difficulty-to-implement.)


Toy language development is fun

Developing a small language just to explore certain aspects of language design space is unexpectedly addictive, and something I hope to do lots of in the next few years. I'm learning a lot. (If you've ever found yourself thinking that it's unfortunate that curly braces ({}) are used both for blocks and objects, just wait until you're implementing a parser that has to keep those two straight.)

There's lots of little tradeoffs a language designer makes all day. Even for a toy language that's never going to see any actual production use, taking those decisions seriously leads you to interesting new places. The smallest decision can have wide-ranging consequences.

If you're curious, here's what to look at next.

Looking forward to where we will take 007 in 2016. We're gearing up to an (internal) v1.0.0 release, and after that we have various language/macro ideas that we want to try out. We'll see how soon we manage to make 007 bootstrap its runtime and parser.

I want to take this opportunity to thank Filip Sergot and vendethiel for commits, pull requests, and discussions. Together we form both the developer team and user community of 007. Haha!

Independent Computing: Advent Day 3

Published by Michael on 2015-12-03T02:39:19

Advent Day 3

Another day, another post on the Perl6 Advent Calendar. Today we get a new Perl6 syntax-highlighter for the Atom text editor. While Atom already has Perl6 support, I find there are times where it becomes confused and things do not look as they should.

When I program on my Macbook Pro, I use Atom as my text editor. Atom has always provided great support for a large number of programming languages and file format types. So as per the post, I will install the new Perl6 syntax-highlighter.

First go to the Preferences in the Atom menu:
Advent Day 3

Next click on the Install icon and search for Perl6:
Advent Day 3

Once you have done that, locate the language-perl6fe package and click on the install icon.

After the package is installed, you can make use of the new syntax-highlighter by first clicking on the language selection (bottom right), then typing Perl6 (top), then selecting language-perl6fe:
Advent Day 3

It is nice to see Perl6 gaining more popularity; this new syntax-highlighter is further evidence of this. I will definitely be using this new syntax-highlighter when coding on the Macbook Pro.

Independent Computing: Perl6 Advent Calendar 2015

Published by Michael on 2015-12-02T04:57:46

Perl6 Advent Calendar 2015

This is the Christmas I have been waiting for for 15 years! Ever since I got word that there was to be a major redesign of the Perl programming language, I have been interested in what Perl would become.

For me Perl was a powerful enough programming language for the type of problems I was trying to solve. I have worked with others, but they were either too complicated or really too simple. Perl was right there in the sweet spot.

I started using Perl6 around 2007, and have seen it progress immensely, becoming more stable, quicker, and more useful year after year. The running joke with Perl6 was that it was going to be released on Christmas. But for years that Christmas never came.

However, according to Larry Wall himself in a presentation he gave back in October, Perl6 will be released THIS Christmas! Like I said before, I have been using the pre-release versions of Perl6 since 2007 and have watched it grow more useful. I would say for the last 3 years Perl6 has mostly been ready for everyday use.

Here is the announcement Larry Wall made regarding Perl6: (kinda long)

Every year in December, a sort of Advent Calendar related to the Perl6 programming language is put together. I have learned much from the examples posted there and I'm really grateful for the authors' efforts. I'm sure they have helped many people in learning this new Perl6 thing. This year I expect the Perl6 Advent Calendar will be outstanding! And from what I have read so far I'm really excited.

In the remainder of this post I'll talk briefly about what was discussed, or what I found interesting, on the 2015 Advent Calendar. There are already 2 entries in this Perl6 Advent season. The first post mostly goes over some housekeeping and relates the recent events in the world of Perl6.

The second post reveals something new, Proxies (new at least for me). The theory behind Perl6 Proxies is to have a variable, and have a bit of code executed every time that variable is retrieved or stored. As this was a new concept to me I had to look it up. So to the Perl6 documentation on Proxies I went.

So according to the spec:
A Proxy is an object that allows you to execute custom code whenever a value is retrieved from a container (FETCH) or when it is set (STORE).

Now some code to show a container:

sub double() is rw {
    my $storage = 0;
        FETCH => method ()     { $storage },
        STORE => method ($new) { $storage = 2 * $new },
    );
}

And some code to make use of the above container:

 my $doubled := double();
 $doubled = 4;
 say $doubled;       # 8

So first we create a new container variable called $doubled with the line my $doubled := double();

Next we set a value with the line: $doubled = 4; When we do this, the code in the STORE block of the double() container is run. In this case it takes the value of 4 and multiplies it by 2, then saves it in the variable $storage

Finally we retrieve the current value of the container with the line: say $doubled; This will run the code in the FETCH block, in this case we just return the current value stored in $storage

I see this as a really useful way to ensure some process is completed every time a proxied container is either set or retrieved. It has me thinking how I could make use of this nice little gem in future projects....
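To make the FETCH/STORE hooks a bit more concrete, here is a small variation I tried (the traced name is mine, not from the advent post): a container that logs every access to stderr.

```perl6
# A container that notes every read and write made through it.
sub traced($name) is rw {
    my $storage;
        FETCH => method ()     { note "FETCH $name"; $storage },
        STORE => method ($new) { note "STORE $name"; $storage = $new },
    );
}

my $x := traced('x');
$x = 42;     # logs "STORE x"
say $x;      # logs "FETCH x", then prints 42
```

The same shape works for validation, lazy initialization, or anything else you want to run on each access.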

Death by Perl6: Yet Another Perl6 HTTP Client

Published by Nick Logan on 2015-11-08T23:08:12

I've had a few bits of Perl6 code in production for some basic tasks for a while, but with rakudo growing increasingly stable I decided to give it a little more responsibility. Its next task? Basic web scraping.

Our data source is open to scraping, but they expect certain considerations to be made. Additionally there are 1.2 million pages that need to be scraped weekly within a certain window. This ruled out the (then) current list of pure Perl6 HTTP clients, including the one I had built into Zef 1, due to lack of keep-alive support. Using curl or perl5 would certainly be the superior choice here, but this is as much an exercise as it is a task to complete. Thus Net::HTTP 2 was started. This blog post will focus on 2 of the more interesting aspects of Net::HTTP: the connection caching responsible for keep-alive, and the socket wrapper used to parse responses.

In Perl6, IO::Handle 3 provides the .get and .lines methods, but these rely on decoding the data first. Instead we will just split the Positional Blob on a specific ord sequence:

method get(Bool :$bin where True, Bool :$chomp = True) {
    my @sep      = $CRLF.contents;
    my $sep-size = +@sep;
    my $buf =;
    loop {
        $buf ~= $.recv(1, :bin);
        last if $buf.tail($sep-size) ~~ @sep;
    }
    $ = ?$chomp ?? $buf.subbuf(0, $buf.elems - $sep-size) !! $buf;
}

The code is fairly simple: $CRLF contains the string "\r\n", and .contents extracts the ords that will match it. Then we recv 1 byte at a time and check the end of the buffer to see if it matches our end-of-line ords. Reading 1 byte at a time may not be the most efficient method, but we only use .get(:bin) for accessing headers so the performance hit is insignificant. The .lines(:bin) method is implemented similarly to how it already is in IO::Handle:

method lines(Bool :$bin where True) {
    gather while (my $data = $.get(:bin)).DEFINITE {
        take $data;
    }
}
Of course these are meant to be used in a very specific order: call .get(:bin) once to get your status line, followed by .lines(:bin).map({$_ or last }) to get your headers. What is that map bit for, you ask? It keeps .lines(:bin) from iterating into the message body itself. We need to read the message body with a different method, one that understands content length and chunked encoding:

method supply(:$buffer = Inf, Bool :$chunked = False) {
    my $bytes-read = 0;
    my @sep        = $CRLF.contents;
    my $sep-size   = @sep.elems;
    my $want-size  = ($chunked ?? :16(self.get(:bin).unpack('A*')) !! $buffer) || 0;
    $ = Supply.on-demand(-> $supply {
        loop {
            my $buffered-size = 0;
            if $want-size {
                loop {
                    my $bytes-needed = ($want-size - $buffered-size) || last;
                    if (my $data = $.recv($bytes-needed, :bin)).defined {
                        last unless ?$data;
                        $bytes-read    += $data.bytes;
                        $buffered-size += $data.bytes;
                        $supply.emit($data);
                    }
                    last if $buffered-size == $bytes-needed | 0;
                }
            }

            if ?$chunked {
                my @validate = $.recv($sep-size, :bin).contents;
                die "Chunked encoding error: expected separator ords '{@sep.perl}' not found (got: {@validate.perl})" unless @validate ~~ @sep;
                $bytes-read += $sep-size;
                $want-size = :16(self.get(:bin).unpack('A*'));
            }

            last if $want-size == 0 || $bytes-read >= $buffer || $buffered-size == 0;
        }
        $supply.done();
    });
}

This code may appear more intimidating than it actually is, but it essentially just double buffers the data (the inner loop almost never needs to iterate a second time). It knows when to stop reading based on the content length sent via header, by decoding the size line of a chunked section of message body, or by reading everything until the connection is closed. We emit our data out via a supply (for threading reasons outside the scope of this post), so we can even close the connection mid-body read if needed. Then it all gets stuffed into an IO::Socket::INET or IO::Socket::SSL object via the role IO::Socket::HTTP 4 that contains these helpers. Here is what your basic HTTP client might look like using these components:

# Personally I like how clean this looks, but I'm a big fan of higher order functions
my $socket        =<>, :port(80)) but IO::Socket::HTTP;
my $status-line   = $socket.get(:bin).unpack('A*');
my @header-lines  = $socket.lines(:bin).map({"$_" or last})>>.unpack('A*');
my $body          = $;
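A note on the :16(...) calls sprinkled through the supply method above: chunked transfer encoding sends each chunk's size as a line of hexadecimal text, and Perl 6's radix form is what converts that string into a number:

```perl6
# A chunk-size line like "1a\r\n" means the next chunk is 26 bytes long.
say :16('1a');    # 26
say :16('0');     # 0, which signals the final chunk
```

That is also why the loop stops when $want-size reaches 0: a zero-length chunk terminates the chunked body.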

With the ability to write clean HTTP interfaces, let us now look at connection caching and our keep-alive goal. We know that we can't just send an HTTP request for one host to any old socket that is open, so a simple solution is to just use a hash and index it based on host and scheme: my %connections{$host}{$scheme}. If a socket exists and is not being used, then try to reuse it. Otherwise create the socket and save it to the hash (but only if the server allows Connection: keep-alive):

method get-socket(Request $req) {
        my $connection;

        # Section 1
        my $scheme    = $req.url.scheme;
        my $host      = $req.header<Host>;
        my $usable   := %!connections{$*THREAD}{$host}{$scheme};

        # Section 2
        if $usable -> $conns {
            for $conns.grep(*.closing.not) -> $sock {
                # don't wait too long for a new socket before moving on
                next unless await Promise.anyof( $sock.promise, start { $ = sleep(3); False });
                next if $sock.promise.status ~~ Broken;
                last if $connection = $sock.init;
            }
        }

        # Section 3
        if $connection.not {
            $connection = $.dial($req) but IO::Socket::HTTP;

            $usable.append($connection) unless $req.header<Connection>.any ~~ /[:i close]/;
        }

        # Section 4
        $connection.closing = True if $req.header<Connection>.any ~~ /[:i close]/;

        $connection;
}

First we lock this block of code up because our Net::HTTP::Transport 5 needs to be thread safe and we don't want a race condition when retrieving or setting a socket into the cache (Section 1). $usable gets bound to %connections just because it's shorter to write later on. There is also the additional index on $*THREAD; this too is beyond the scope of this blog post, but just understand it needs to be there if you want to launch these in start blocks.

In Section 2 we iterate over our cache looking at the .closing attribute (an attribute in IO::Socket::HTTP that we set if a socket in the queue knows it will close the connection, aka be the last request sent on that socket). Because we don't want to wait for a long request to finish we also implement a timeout of 3 seconds before it tries the next socket. Next we check if a promise used in IO::Socket::HTTP is broken, which would mean the socket was closed, and move on if it is. Finally we call $connection = $sock.init, where our .init method from IO::Socket::HTTP resets the previously mentioned promise and essentially claims the socket for its own use.

We enter Section 3 if there are no reusable connections (either this is the first connection created for a specific host, or none allow keep-alive). .dial($req) simply returns an IO::Socket::INET or IO::Socket::SSL, and we apply our IO::Socket::HTTP role to this connection. Finally we add the connection to our cache for possible reuse, unless the server has told us it will close the connection.

Section 4 speaks for itself I hope :)

With the novel parts out of the way I still need to implement cookies, multipart form posting, and some other simple basics. But now we have a strong base for building customized client foundations similar to golang. No, it doesn't yet follow the keep-alive rules sent via the header, but these are trivial tasks.

I'll leave one last code snippet from IO::Socket::HTTP some may find useful:

method closed {
    try {
        # if the socket is closed it will give a different error for read(0)
        CATCH { when /'Out of range'/ { return False } }
        $;
    }
    return True;
}
This will let you call $socket.closed to find out if a socket is open or not... no need to access the $!PIO. You may wonder why we wouldn't use this .closed method in our get-socket method above. The answer is it is used, but it's abstracted behind the call to .init.

  1. Zef

  2. Net::HTTP

  3. IO::Handle

  4. IO::Socket::HTTP

  5. Net::HTTP::Transport

Strangely Consistent: Macros: Time travel is hard

Published by Carl Mäsak

Let's say you're a plucky young time traveler from the 80s and you go back to the 50s because Libyan terrorists are chasing you and then you accidentally mess things up with the past and your time machine DeLorean is broken and you need to fix that too...

Dealing with the different phases involved in macros is a bit like dealing with the two time streams: the new present, which is very early compared to what we're used to, and the future, which we would like to get back to intact. (The slow path wasn't much of an option for Marty, and it isn't for us either as macro authors.)

Let's try a new way of thinking of all this, instead of compile time and runtime. Those terms quickly get muddled with macros, since a macro runs (so it's kind of runtime), but it has BEGIN-time semantics (so it's definitely compile-time).

Let's call the two phases early and late instead. What the compiler does happens early. What the normal program does happens late. M'kay?

BEGIN blocks? Early. constant declarations? Early. CHECK blocks? Also early, though kind of at the 11th hour of early. Role bodies get executed early when they're getting concretized into a class.

Most everything else? Late. That is, in the 80s where Marty belongs. And where most people run most of their code.

Macro bodies run early too. Except for the (often-occurring) quasi block at the end, which runs late. This is how you should probably think about macros: they have a "preparation area" where you get to do some initialization early, and then they have a "code area" whose code runs, inline where you called the macro, late.
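To see both phases in one place, here is a tiny macro in the experimental Perl 6 macro syntax (treat it as a sketch; it only runs on Rakudo builds with macro support enabled):

```perl6
use experimental :macros;

macro twice($code) {
    # This part runs early, while the program is being compiled.
    note "compiling a twice() call";
    quasi {
        # This part is spliced inline and runs late, with the program.
        {{{$code}}};
        {{{$code}}};
    }
}

twice(say "hi");   # the note appears at compile time;
                   # "hi" is printed twice at runtime
```

The macro body is the 50s; the quasi block is the 80s.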

Got it? Like these terms? Eh, I guess they work. I don't expect we'll keep them around, but let's explore them.

So much for setup. I had a thought today, which led me down some wrong paths. I'd like to try and relate it. Partly because it's "fun" to analyze one's own failure modes. Partly because, and believe me on this, macros are gnarly. Far be it from me to discourage people from using macros — the past few months of thinking and investigations have convinced me that they're useful, that they have a clear place in Perl 6, and that we can make them well-designed. But wow are they weird sometimes.

Macros are so strange that once you wrap your head around closures, and accept that they're actually quite a natural consequence of first-class functions and lexical lookup, macros come along and make your brain go "wait, what?".

Part of that, I'm sure, is the weird (but consistent) scoping semantics you get from macros. But part of it is also the early/late distinction.

Ok, so back to what happened. I was thinking about implementing a 007 runtime in 007, and in particular how the runtime would invoke functions, especially built-in functions. I realized I would probably need a huge table of built-in function mappings at some point in the runtime. In Perl 6, it'd look something like this:

my %builtins =
    say => &say,
    abs => &abs,
    min => &min,
    max => &max,
    # ...and so on, for far too many lines...
;

As you see, it's all mappings from strings to the runtime's own built-ins. Yep, welcome to the world of metacircular runtimes.

So I wondered, "hm, shouldn't we be able to write a macro that takes a string value (that we get from a Q::Identifier) and dynamically looks it up in the current environment?". And indeed, I quickly came up with this:

macro lookup(name) {
    return Q::Identifier( melt(name) );
}
So easy! This example presupposes melt, which is a kind of evaluation built-in for Qtrees. We need to use melt because the name that we get in is a Qtree representing a name, not a string. But melt gives us a string.

Oh, and it works and everything!

lookup("say")("OH HAI");    # OH HAI

Unfortunately, it's also completely useless. Specifically, lookup doesn't fulfill its original purpose, which was to allow dynamic lookup of names in the scope.

Why? Because we're running the macro early, so variables like ident_name cannot be melted; they don't have a meaningful value yet — they will, but not until late — and only constants and literals like "say" have meaningful values. But in all such cases, we could just replace lookup("say") with... say. D'oh!

Ok, so that didn't work. My next bright idea was to make use of the fact that quasi blocks run late. So we can get our dynamism from there:

macro lookup_take_II(name_expr) {
    quasi {
        my name = {{{name_expr}}};
        # great, now I just need to... ummm....
    }
}

There's no correct way to conclude that thought.

Why? Because you're in a quasi, which means you're back in the 80s, and you want to do a 50s thing, but you can't because the time machine is gone. In other words, it's late, and suddenly you wish it was early again. Otherwise how are you going to build that dynamic identifier? If only there were a lookup macro or something! 😀

This has been the "masak stumbles around in the dark" show. Hope you enjoyed. I also hope this shows that it's easy to go down the wrong path with macros, before you've had something like four years of practice with them. Ahem.

Anyway, these ponderings eventually led to the melt builtin — coming soon to a 007 near you! — which, happily, will solve the problem of dynamic lookup (as detailed in the issue description). So all's well that ends well.

Macros are not a handgun that can accidentally shoot you in the foot if you use them wrong. They're more like a powered-off laser cannon which, if you look at it the wrong way, will shoot you in both feet, your eyes, and then your best friend's feet. That kind of power sure can come in handy sometimes! I'm convinced we as a community will learn not just to harness and contain that power, but also how to explain to ourselves how to best use the laser cannon so that it does not accidentally fire too early, or too late. Gotta stay in phase.

Perl 6 Maven: Number guessing game in Perl 6

Published by szabgab

Perl 6 Maven: push vs. append on arrays in Perl 6

Published by szabgab

Perl 6 Maven: Installing Rakudo Perl 6

Published by szabgab

Perl 6 Maven: Find Perl 6 modules without Travis CI

Published by szabgab

jdv: Perl6 and CPAN: MetaCPAN Status as of 2015-10-09

Published by jdv on 2015-10-09T14:04:00

MetaCPAN, like the rest of "CPAN", was built assuming the sole context of Perl5. Which is cool until we want to use it for Perl6 and avoid the troubles associated with different namespaces, dist mgmt, etc... To largely avoid and more easily handle these issues for MetaCPAN it's been suggested that we have separate instances. The existing Perl5 instance only needs to be changed to ignore Perl6 distributions. There has already been some breakage because it didn't ignore a Perl6 dist of mine which exists in the Perl5 world. :( And the new Perl6 instance will do just the opposite and only look at Perl6 distributions.

In contrast, and relatedly, on CPAN we've designated a special spot for Perl6 distributions in order to keep them separate from the Perl5 dists. This reserved place is a Perl6 subdir in an author's dir (/author/id/*/*/*/Perl6/). Any dists in or under that spot on the fs will be considered a Perl6 dist; valid or invalid. So this is where the Perl6 MetaCPAN will look and the Perl5 instance will not.

Current development is being done on these temporary branches:

And the main dev instance is running on The web end is at and the api is at

So far the idea has been to iterate on the aforementioned branches and instance until we have something that works sufficiently well. At that point we'll tidy up the branches and submit them for merging. Shortly after that time the hope is that we'll be able to stand up the official Perl6 instance.

The list of requirements for being adequately cooked is:

  1. track Perl6 CPAN dists and ignore Perl5 dists
  2. import a Perl6 distribution
  3. index a Perl6 distribution for search
  4. render pod6 documentation
  5. do Perl6 syntax highlighting

All of these have been hacked in and are at various degrees of completeness. Next up is testing and fixing bugs until nothing major is left. To that end I've recently loaded up the dev instance with all the distributions from The dist files were generated, very hackily, with I also just loaded them all under one user, mine, for simplicity. That load looks like it has problems of its own as well as revealing a bunch of issues. So in the coming days I hope to get that all sorted out.

jdv: Perl6 and CPAN

Published by jdv on 2015-10-08T20:31:00

In the Perl5 world, just in case anyone is unaware, CPAN is a major factor. It's basically the hub of the Perl5 world.

What I am referring to here as CPAN is not just the mirrored collection of 32K+ distributions. It's the ecosystem that's built up around that collection. This ecosystem has many parts, some more important than others depending on who you talk to, but the most important parts to me are:

These are the 5 aspects of "CPAN" that I'd like to see happen for Perl6. One way to get that would be to write the whole thing from scratch in Perl6. While it may sound cool in some sort of dogfoody and/or bootstrappy kind of way to some, it sounds like a lot of work to me, and we're a bit strapped for developer resources. Another way would be to add support for Perl6 to the existing CPAN bits. The hope there being, primarily, that it'd be a lot less work. The latter approach is what I've been working on lately. And if we want to refactor ourselves off the Perl5 bits in the future, we can take our time doing it later.

At this time we have:

So we can publish Perl6 distributions to CPAN and search that collection. Well, sort of, on that last bit. The metacpan prototype instance is not currently tracking CPAN. It's actually been loaded up with Perl6 distributions from the Perl6 module ecosystem for testing. But hopefully soon we'll have an official Perl6 metacpan instance, separate from the Perl5 instance, that will track CPAN's Perl6 content as it should.

What we need next is:

If anyone is interested in working on any of this stuff please stop by #perl6 on freenode. If nobody else is able to help you I'll (jdv79) do my best.

Death by Perl6: Parallel Testing and an Iconoclastic Pilgrimage

Published by Tony O'Dell on 2015-09-15T20:42:37

This is an article about Green

Green is a module I wrote with the intention of replacing the well loved prove command in favor of something that can parallel test all files and even perform parallel testing of functions inside of a test file.

Wait, what?

Don't worry, the parallel testing of functions doesn't mean all your code is going to run at once. It means that you can put multiple test groups into one file and have those groups all tested in parallel while the methods in those groups retain their order and are executed in series.

This is confusing.

Let's take a look at an example, then we'll talk about how the module was put together. Afterwards, more in depth examples.

use Green :harness;

set("Group 1", sub {  
  test("Group 1, Test 1", sub { ok 1==1; });
  test("Group 1, Test 2", sub { 
    sleep 2;
    ok 1==1; 
  });
});

set("Group 2", sub {  
  test("Group 2, Test 1", sub { 
    sleep 1;
    ok 1==1; sleep 1; 
  });
  test("Group 2, Test 2", sub { 
    sleep 1; 
    ok 1==1;
  });
});

   [P] Group 1
      [P] Group 1, Test 1
      [P] Group 1, Test 2

   [P] Group 2
      [P] Group 2, Test 1
      [P] Group 2, Test 2

   [P] 4 of 4 passing (2511.628ms)

Notice the runtime of 2.5 seconds. Normally, if you ran all four of those tests in series, you'd have ended up waiting around four seconds (Group 1 has a sleep 2 and Group 2 has sleep 1 twice).

*Please note that the startup time for my perl6 is anywhere between 500ms and 1s.

Pitfalls & Considerations

Why is it such a big deal to have a test give its result as soon as it's available?

This causes issues when tests fail. When a test fails and you need to print the stack trace for the test, you can't guarantee that another test being executed in parallel isn't busy writing to the output. This can cause jumbled or confusing stuff on the command line. Imagine that two tests complete at roughly the same time, the first fails and needs to print the failure and then the stack trace. The second completes with success and brags about its accomplishment to stdout between the time that the first test showed error and generated the stack trace information to be printed. You'd end up with some output that looks roughly like

[F] #1 Failed test 1
[P] Passed test 2

#1 - not ok
  in block  at <path>.pm6:103
  in block  at <path>.pm6:94

Yes, that's manageable for two tests. It's not when you have a test file with several tens or hundreds of tests in it. It's certainly not manageable when you have several test files running at once, all trying to write at once.

There are other ways to deal with it; I think this solution is the cleanest, and that's why Green is built in this way.

Other problems with parallelization and testing

There exists another big issue: what about when you want to do some asynchronous IO and don't want to have to handle setting up your own await-and-return mechanisms to get the result? Green handles this too. If your test returns a Promise then Green will automatically await the promise and use its result as the test result. This is useful when you need to do things like non-blocking database calls or non-blocking file reads, reading data from channels, etc. This also prevents Green from executing the next consecutive test until the Promise is Kept|Broken; this keeps tests executing in serial despite containing potentially a large amount of non-blocking code.
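A sketch of what that looks like in practice, assuming the set/test API from the examples above: the test's Callable simply returns the Promise produced by start, and Green awaits it before moving on.

```perl6
use Green :harness;

set('non-blocking work', sub {
  test('async result', sub {
    # Returning a Promise makes Green await it and use its result
    # as the test outcome before starting the next test in the set.
    start {
      sleep 1;
      ok 1 == 1;
    }
  });
});
```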

Why not let the user handle that? The user can handle that; they just shouldn't return a Promise from the test's Callable.


The main purpose of a testing suite is to make testing easy, to make getting a stack trace not a two-step process, to make the testing output simple to read, and to keep things simple.

Making test easy

To run Green against the current directory, you simply type green on the command line. Done.

Green automatically passes -Iblib -Ilib, looks for the following directories in order: t/, test/, tests/, stops on the first match, and continues to test every file in that directory that ends with .t

Doesn't get much easier than that.

Another feature of Green is the quick testing shorthands.

use Green :harness;

>> { <Callable 1>; };
>> { <Callable 2>; };

This will execute both Callables in series.

Too much crap? Try again.

use Green :harness;

ok 1==1;  
ok 2==1;  

This is also acceptable shorthand.

Have some tests that require some parallel processing and don't want to deal with writing your own promise/result handlers? Check this out

use Green :harness;

set('Set 1', sub {  
  test('Test 1', -> $done {
    start {
      sleep 20; #let them wait
      $done();  #'Test 2' is now executed
    }
  });
  test('Test 2', {
    sleep 1;
    ok 1==1;
  });
});
When the Callable passed to test expects an extra parameter, Green handles tossing in a sub that the test can execute when it's done processing. Easy doggone peasy.

Visibility on stack traces and easy to read output

My only really major complaint with prove is that it's a pain to get a stack trace. Sure, it gives me the first line; that's great. Rarely is my issue on that first line of a stack trace. Green took cues from testing suites in other languages whose output is much easier to read. Sample failure output from Green looks like the following:

tonyo@imac ~/projects/perl6-green/examples$ perl6 failure.pl6  
   [F] Prefixed Tests
      [F] #1 - Prefixed 1
      [F] #2 - Prefixed 2

      #1 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97
      #2 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97

   [F] 0 of 2 passing (501.065ms)

Again, easy.

The output of this is easy to read. sets are output all at once along with the failures. Nothing is jumbled, it's just clean.

Some more examples

Concise series
#!/usr/bin/env perl6

use lib '../lib';  
use Green :harness;

#all of these tests should complete in 2 seconds
>> {
  sleep 2;
  ok 1 == 1;
};

>> {
  sleep 2;
  ok 1 == 1;
};
>> {
  sleep 2;
  ok 1 == 1;
};

>> { 1 == 1; }
   [P] Prefixed Tests
      [P] Prefixed 1
      [P] Prefixed 2
      [P] Prefixed 3
      [P] Prefixed 4

   [P] 4 of 4 passing (6528.732ms)
#!/usr/bin/env perl6

use lib '../lib';  
use Green :harness;

ok 1 == 0, 'test';

>> sub {
  ok False;
}, 'not ok';
   [F] Prefixed Tests
      [F] #1 - Prefixed 1
      [F] #2 - Prefixed 2

      #1 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97
      #2 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97

   [F] 0 of 2 passing (535.971ms)
More Concise
#!/usr/bin/env perl6

use lib '../lib';  
use Green :harness;

ok 1 == 1;

ok 0 == 1;

>> {
  ok 0 == 1;
};
   [F] Prefixed Tests
      [P] Prefixed 1
      [F] #1 - Prefixed 2
      [F] #2 - Prefixed 3

      #1 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97
      #2 - not ok
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:106
         in block  at /Users/tonyo/projects/perl6-green/examples/../lib/Green.pm6:97

   [F] 1 of 3 passing (602.168ms)
Parallel Tests
#!/usr/bin/env perl6

use lib '../lib';  
use Green :harness;

#all of these tests should complete in 2 seconds
set("time me 1", sub {  
  test("delay 2", sub {
    sleep 2;
    ok 1==1;
  });
});
set("time me 2", sub {  
  test("delay 2", sub {
    sleep 2;
    ok 1==1;
  });
});
set("time me 3", sub {  
  test("delay 2", sub {
    sleep 2;
    ok 1==1;
  });
});
set("time me 4", sub {  
  test("delay 2", sub {
    sleep 2;
    ok 1==1;
  });
});
   [P] time me 1
      [P] delay 2

   [P] time me 2
      [P] delay 2

   [P] time me 3
      [P] delay 2

   [P] time me 4
      [P] delay 2

   [P] 4 of 4 passing (2629.375ms)
#!/usr/bin/env perl6

use lib '../lib';  
use Green :harness;

set('Async tests in series', sub {
  test('Sleep 1', -> $done {
    start {
      sleep 1;
      ok 1==1;
      $done();
    };
  });

  test('Sleep 2', -> $done {
    start {
      sleep 2;
      ok 2 == 2;
      $done();
    };
  });
});

set('This happens async with the first set', sub {
  test('Sleep 1', -> $done {
    start {
      sleep 1;
      ok 1==1;
      $done();
    };
  });
});
   [P] This happens async with the first set
      [P] Sleep 1

   [P] Async tests in series
      [P] Sleep 1
      [P] Sleep 2

   [P] 3 of 3 passing (3572.252ms)

Final thoughts

Comments or PRs are welcome. You can find the repository for Green here. You can leave your rude dude peanut gallery comments in #perl6 if you're so inclined.

Steve Mynott:

Published by Steve on 2015-09-11T16:43:00

A Little GLRer (revision 1)

The GLR (Great List Refactor) radically changed the way lists work in Rakudo (an implementation of Perl 6).

This blog post is a list of some one-liners to show differences between the old (pre-glr) rakudo and the new (glr) rakudo intended to aid understanding and porting of modules.

Note this was done for self-education and may contain errors or things which may change. 

Thanks to those on Freenode IRC/perl6 for help.

Further corrections and expansions are welcome, either on IRC or via a pull request.

pre GLR:
> say (1,2,3).WHAT
(Parcel)
> my @array = 1,(2,3),4
1 2 3 4
> @array.elems
4

post GLR:
> say (1,2,3).WHAT
(List)
> my @array = 1,(2,3),4
[1 (2 3) 4]
> @array.elems
3

to flatten

> my @list := 1, [2, 3], 4
(1 [2 3] 4)
> dd @list.flat.list
(1, 2, 3, 4)


> my @array = (1,(2,3),4).flat
[1 2 3 4]

or more complex structures (jnthn++)

say gather [[[[["a", "b"], "c"], "a"], "d"], "e"].deepmap(*.take)
> dd (1,2,3).lol
(1; 2; 3)
> dd (1,)
> dd [1,]
$ = [1]
> dd [[1,]]
$ = [[1]]
> dd (1,)
> dd [1,]
> dd [[1],]
> my @array = 1,2,3
1 2 3
> @array.shift
> dd @array
@array = [2, 3]<>
> my @list := 1,2,3
(1 2 3)
> @list.shift
Method 'shift' not found for invocant of class 'List'
> @list[0]
> dd @list
(1, 2, 3)
> my @array = 1,2,3
[1 2 3]
> @array[0]=0
> dd @array
@array = [0, 2, 3]
> say (Array).^mro
((Array) (List) (Cool) (Any) (Mu))
> my @a = 1, (2, 3).Slip, 4
[1 2 3 4]
> my $slip = slip(2,3)
(2 3)
> dd $slip
Slip $slip = $(2, 3)
> my @array = 1,$slip,4
[1 2 3 4]
> (1,$(2,3),4)
(1 (2 3) 4)
> (1,|(2,3),4)
(1 2 3 4)
> my $grep = (1..4).grep(*>2); dd $grep>>.Int;
(3, 4)
> dd $grep>>.Int;
This Seq has already been iterated, and its values consumed
in block  at :1

prevent consumption

> my $grep = (1..4).grep(*>2); my $cache=$grep.cache
(3 4)
> say $cache>>.Int
(3 4)
> say $cache>>.Int
(3 4)
> my @array = 1,(2,3),4
[1 (2 3) 4]
> dd @array.flat
(1, $(2, 3), 4).Seq
> dd @array.flat.list
(1, $(2, 3), 4)

Pawel bbkr Pabian: Asynchronous, parallel and... dead. My Perl 6 daily bread.

Published by Pawel bbkr Pabian on 2015-09-06T14:00:56

I love Perl 6 asynchronous features. They are so easy to use and can give instant boost by changing few lines of code that I got addicted to them. I became asynchronous junkie. And finally overdosed. Here is my story...

I was processing a document that was divided into chapters, sub-chapters, sub-sub-chapters and so on. Parsed to data structure it looked like this:

    my %document = (
        '1' => {
            '1.1' => 'Lorem ipsum',
            '1.2' => {
                '1.2.1' => 'Lorem ipsum',
                '1.2.2' => 'Lorem ipsum'
            }
        },
        '2' => {
            '2.1' => {
                '2.1.1' => 'Lorem ipsum'
            }
        }
    );

Every chapter required processing of its children before it could be processed itself. Processing each chapter was also quite time consuming, no matter which level it was at or how many children it had. So I started by writing a recursive function to do it:

    sub process (%chapters) {
        for %chapters.kv -> $number, $content {
            note "Chapter $number started";
            &?ROUTINE.($content) if $content ~~ Hash;
            sleep 1; # here the chapter itself is processed
            note "Chapter $number finished";
        }
    }

    process(%document);
So nothing fancy here. Maybe except the &?ROUTINE variable, which makes recursive code less error prone - there is no need to repeat the subroutine name explicitly. After running it I got the expected DFS (Depth First Search) flow:

    $ time perl6
    Chapter 1 started
    Chapter 1.1 started
    Chapter 1.1 finished
    Chapter 1.2 started
    Chapter 1.2.1 started
    Chapter 1.2.1 finished
    Chapter 1.2.2 started
    Chapter 1.2.2 finished
    Chapter 1.2 finished
    Chapter 1 finished
    Chapter 2 started
    Chapter 2.1 started
    Chapter 2.1.1 started
    Chapter 2.1.1 finished
    Chapter 2.1 finished
    Chapter 2 finished
    real    0m8.184s

It worked perfectly, but it was too slow. Because each chapter took 1 second to process in a serial manner, it ran for 8 seconds total. So without hesitation I reached for Perl 6 asynchronous goodies to process chapters in parallel.

    sub process (%chapters) {
        await do for %chapters.kv -> $number, $content {
            start {
                note "Chapter $number started";
                &?ROUTINE.outer.($content) if $content ~~ Hash;
                sleep 1; # here the chapter itself is processed
                note "Chapter $number finished";
            }
        }
    }
Now every chapter is processed asynchronously in parallel, and each first waits for its children to also be processed asynchronously in parallel. Note that after wrapping the processing in an await/start construct, &?ROUTINE must now point to the outer scope.

    $ time perl6
    Chapter 1 started
    Chapter 2 started
    Chapter 1.1 started
    Chapter 1.2 started
    Chapter 2.1 started
    Chapter 1.2.1 started
    Chapter 2.1.1 started
    Chapter 1.2.2 started
    Chapter 1.1 finished
    Chapter 1.2.1 finished
    Chapter 1.2.2 finished
    Chapter 2.1.1 finished
    Chapter 2.1 finished
    Chapter 1.2 finished
    Chapter 1 finished
    Chapter 2 finished
    real    0m3.171s

Perfect. Time dropped to the expected 3 seconds - it was not possible to go any faster because the document had 3 nesting levels and each required 1 second to process. Still smiling, I threw a bigger document at my beautiful script - 10 chapters, each with 10 sub-chapters, each with 10 sub-sub-chapters. It started processing, ran for a while... and DEADLOCKED.

Friedrich Nietzsche said that "when you gaze long into an abyss the abyss also gazes into you". The same rule applies to code. After a few minutes my code and I were staring at each other. And I couldn't find why it worked perfectly for small documents but deadlocked at random moments for big ones. Half an hour later I blinked and got defeated by my own code in the staring contest. So it was time for debugging.

I noticed that when it deadlocked there was always a constant number of 16 chapters still in progress. And that number looked familiar to me - the thread pool!

    $ perl6 -e 'say start { }'
    Promise.new(
        scheduler => ThreadPoolScheduler.new(
            initial_threads => 0,
            max_threads => 16,
            uncaught_handler => Callable
        ),
        status => PromiseStatus::Kept
    )

Every asynchronous task that is planned needs a free thread so it can be executed. And on my system only 16 concurrent threads are allowed, as shown above. To analyze what happened, let's use the document from the first example but also assume the thread pool is limited to 4:

    $ perl6          # 4 threads available by default
    Chapter 1 started       # 3 threads available
    Chapter 1.1 started     # 2 threads available
    Chapter 2 started       # 1 thread available
    Chapter 1.1 finished    # 2 threads available again
    Chapter 1.2 started     # 1 thread available
    Chapter 1.2.1 started   # 0 threads available
                            # deadlock!

At this moment chapter 1 subtree holds three threads and waits for one more for chapter 1.2.2 to complete everything and start ascending from recursion. And subtree of chapter 2 holds one thread and waits for one more for chapter 2.1 to descend into recursion. In result processing gets to a point where at least one more thread is required to proceed but all threads are taken and none can be returned to thread pool. Script deadlocks and stops here forever.
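(Side note: Rakudo lets you raise the thread pool ceiling via the RAKUDO_MAX_THREADS environment variable. The script name below is hypothetical, and this only postpones the deadlock for a large enough document; it does not fix the underlying logic.)

```shell
# raise the default 16-thread ceiling for a single run;
# a big enough document will still deadlock eventually
RAKUDO_MAX_THREADS=64 perl6 process-chapters.p6
```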

How to solve this problem and maintain parallel processing? There are many ways to do it :)
The key to the solution is to process asynchronously only those chapters that do not have unprocessed chapters on a lower level.

Luckily Perl 6 offers a perfect tool - promise junctions. It is possible to create a promise that waits for other promises to be kept, and until that happens nothing is sent to the thread pool for execution. The following code illustrates that:

    my $p = Promise.allof( Promise.in(2), Promise.in(3) );
    sleep 1;
    say "Promise after 1 second: " ~ $p.perl;
    sleep 3;
    say "Promise after 4 seconds: " ~ $p.perl;


    Promise after 1 second:
        Promise.new(..., status => PromiseStatus::Planned)
    Promise after 4 seconds:
        Promise.new(..., status => PromiseStatus::Kept)

Let's rewrite processing using this cool property:

    sub process (%chapters) {
        return Promise.allof(
            do for %chapters.kv -> $number, $content {
                my $current = {
                    note "Chapter $number started";
                    sleep 1; # here the chapter itself is processed
                    note "Chapter $number finished";
                };
                if $content ~~ Hash {
                    Promise.allof( &?ROUTINE.($content) )
                        .then( $current );
                }
                else {
                    Promise.start( $current );
                }
            }
        );
    }

    await process(%document);

It solves the problem of a chapter competing with its own sub-chapters for free threads while at the same time needing those sub-chapters finished before it can process itself. Now awaiting the sub-chapters does not require a free thread. Let's run it:

    $ perl6
    Chapter 1.1 started
    Chapter 1.2.1 started
    Chapter 1.2.2 started
    Chapter 2.1.1 started

    Chapter 1.1 finished
    Chapter 1.2.1 finished
    Chapter 1.2.2 finished
    Chapter 1.2 started
    Chapter 2.1.1 finished
    Chapter 2.1 started

    Chapter 1.2 finished
    Chapter 1 started
    Chapter 2.1 finished
    Chapter 2 started

    Chapter 1 finished
    Chapter 2 finished
    real    0m3.454s

I've added a separator for each second passed so it is easier to understand. When the script starts, chapters 1.1, 1.2.1, 1.2.2 and 2.1.1 do not have sub-chapters at all, so they can take threads from the thread pool immediately. When they complete after one second, the Promises that were awaiting them are kept, and chapters 1.2 and 2.1 can be processed safely on the thread pool. This keeps going until it gets back out of the recursion.

After trying the big document again, it was processed flawlessly in 72 seconds instead of the linear 1000.

I'm high on asynchronous processing again!

You can download the script here and try different data sizes and algorithms for yourself (params are taken from the command line).

Steve Mynott: YAPC::EU 2015

Published by Steve on 2015-09-05T15:42:00

We came down to Granada on Tuesday night and (after missing the pre-conference meeting with its free pizza) made our way to the Amsterdam Bar with its massive selection of bottled import beers and rather bizarre nut and soft sweets tapas.

Wednesday morning we made our way to the venue.  The conference topic was Art and Engineering and the venue a particularly arty looking university science building with a large Foucault pendulum inside and "Bombes de Vapor" (steam engines and the like) outside.  The Arabic art influenced T shirts were the most stylish since the Pisa ones and the seats in the main hall were the most comfortable YAPC seats ever.

I first saw Leon Timmermans give some good advice about how to contribute to Perl 5 core even if you didn't know the odd C89-plus-macros language in which it was written.  It was followed by Bart (brrt) Wiegmans speaking about the Just In Time (JIT) compiler for MoarVM -- Perl 6's main VM -- in a quite high level talk, so we were spared the scary details (which I later noticed included s-expressions).  Kang-min (gugod) Liu spoke about Booking's search engine, which he couldn't show us, and how he indexed his email (which he could).

The main conference dinner of tapas was that evening around the pool of a four star hotel with constant glass refills.  Thankfully no one fell in.  More sadly, we learnt Jeff Goff had been injured earlier and was in hospital.

The next day started with Sawyer X's State of the [Art] Velociraptor, which was upbeat and positive, stressing the benefits of community.  Upasana spoke about Moose meta objects, and Leonerd bravely fought AV issues to speak about how Perl 5 sort of resembles Scheme a little bit.

At the end of day Xavier Noria, currently a ruby programmer, spoke about how much he missed Perl since many things (like docs) were better.

Next day I got up at silly o'clock to hear Art School dropout Stevan Little compare his former subject with programming with some interesting details about painting techniques.  Kerstin Puschke talked about RabbitMQ including some live code examples using Perl 5.

Domm told us about his image uploading Perl 6 script, which uploaded pics to Twitter, including one of his audience.

Gabor talked us through a minimal port of Dancer to Perl 6 called "Bailador" (which is part of Rakudo Star), and which is actually used in production with Perl 6!

Herbert Breunung spoke about Functional Perl 6 using a particularly garish slide deck.  John Lightsey did a line by line audit of an old version of Module::Signature to point out some security issues.  Liz did Jonathan's Parallelism, Concurrency, and Asynchrony in Perl 6 since the original author sadly couldn't make it.  At least one thing had changed in the week since I last heard the talk!

Finally a long haired Larry compared Perl 5 and 6 with Tolkien's Hobbit and Lord of the Rings respectively, and sang a bit. Two out of the three big ticket items for Perl 6.0 were done and things were looking good for a Long Expected Christmas Party.  This was a truly great keynote and went down a storm.

Some of the final lightning talks were particularly good, with one even given in rapid Japanese.  To finish off Friday night, Sue organised a "Sherry Tasting" visit to a local tapas restaurant which also included much consumption of the local beer Alhambra 1925.  A large number of mongers turned up to effectively take over the whole place.  Some also stayed up all night playing card games.

Death by Perl6: 5 tips for writing better Perl6 modules

Published by Nick Logan on 2015-09-01T23:15:34

1. Provide a provides section

Without giving a provides section in your META file, a package manager will just have to recursively grep your directories and hope that anything it happens to find is actually part of the package. Yuck. Instead we should be explicit about what our package provides:

# META6.json
"provides" : {
    "Module"            : "lib/Module.pm6",
    "Module::Submodule" : "lib/Module/Submodule.pm6"
}
Make sure you use the correct extension (.pm or .pm6), and use relative paths.

2. Don't miss those test and build dependencies.

Be sure to thoroughly test your module, and not just in the same dev environment you always use. Look at any build or test files and make sure any external dependencies are listed under build-depends and test-depends in your META file. Uninstall/delete your installed modules and try to build/test/install from scratch if you must. Does it still work? Good. Now test it with --ll-exception. Did it suddenly fail? If so, that's because without --ll-exception a missing dependency in a test can still exit 0, resulting in tests passing.

# META6.json
"build-depends" : [ "LibraryMake" ]

If you are actually using functionality from Panda::Builder then you probably need to add panda to your depends. If you are not using any of its functionality, you can drop the dependency by using this template:

class Build {
    method build($where) { ... }

    # work around needless dependency
    method isa($what) {
        return True if $what.^name eq 'Panda::Builder';
        callsame;
    }
}
3. Write tests.

Please! Why would you submit a module to the ecosystem without any tests? Go open your README, copy the example or synopsis, and make it a test already!
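A synopsis can become a test file in just a few lines. A minimal sketch, where the module and function names are placeholders for your own:

```perl6
# t/basic.t
use Test;
use My::Module;   # placeholder for your module

# the same call your README shows as the synopsis
ok do-something(), 'synopsis example from the README works';

done-testing;
```

Run it with `prove -e perl6 t/` and you have a test suite.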

4. Resist the "boilerplate-only module ecosystem submission" urge

It can be frustrating for newcomers to have to wade through empty shells of boilerplate. The intentions behind such submissions are undoubtedly good, but consider waiting until your module does something (even if it fails its own tests). The namespace isn't going anywhere... and if someone else "takes" it, that doesn't mean you can't still use it yourself: use MyModule:auth<github:ugexe>.

5. Take note of errors found by Perl6 smoke testers

Maybe you don't have a Windows or OSX machine to test your module on, but intend for it to work on multiple platforms. The Perl 6 smoke reports, found at, can be helpful for finding errors hit by other people on different configurations.