pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2017-11-18T13:11:13


Weekly changes in and around Perl 6: 2017.46 Spesh Explained

Published by liztormato on 2017-11-13T22:10:59

Jonathan Worthington completed his four-part blog series about the MoarVM Specializer Improvements he made over the past three months, supported by a grant from The Perl Foundation. The four parts are:

Required reading for anybody interested in the current and future inner workings of the Moar Virtual Machine!

Rakudo Star 2017.10 Released

Steve Mynott did all the hard work again and released Rakudo Star 2017.10, now available for download. Precompiled binaries for MacOS and Windows (64 bit) are also available.

Perl 6 Advent Calendar

Zoffix Znet reminds us that it is still possible to claim a slot in the Perl 6 Advent Calendar for this year. We’d like to hear from you! Your Advent Calendar Needs You! Just let us know!

No More Grants This Year

The Perl Foundation has run out of allocable funds for grants for this year. Your donation or a donation by your employer will allow for more grant work to be supported next year. So please give kindly, especially if you need to finish off budgets before the end of the year!

Core developments

Other Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Winding Down

It was a bit of a quiet week: nothing on perl6-users or on perlmonks. Which is also nice every now and then, as it makes the work of yours truly for the Perl 6 Weekly easier. The coming weekend will see the Rakudo 2017.11 compiler release, already with more than 300 commits under its belt. It will also see yours truly giving a one-hour “Introduction to Rakudo Perl 6” presentation at T-Dose (in Dutch), as well as a Perl stand with a lot of Rakudo Perl 6 books and goodies. Whether or not you will be able to attend, it seems wise to check in again next week for more Perl 6 related news 🙂


6guts: MoarVM Specializer Improvements Part 4: Argument Guards

Published by jnthnwrthngtn on 2017-11-09T21:27:31

So far in this series, I have discussed how the MoarVM dynamic optimizer gathers statistics, uses them to plan what to optimize, and then produces specialized versions of hot parts of a program to speed up execution. In this final part, I’ll look at how we switch from unoptimized code into optimized code, which centers around argument guards.

But wait, what about code-gen?

Ah, yes, I knew somebody would ask that. At the end of part 3, we had a data structure representing optimized code, perhaps for a routine, method, or block. While going from bytecode to a CFG in SSA form was a fairly involved process, going back to bytecode is far simpler: we iterate the basic blocks in order, iterate each of the instructions within a basic block, and write out the bytecode for each instruction. Done!

There are, of course, a few complications to take care of. When we have a forward branch, we don’t yet know the offset within the bytecode of the destination, so a table is needed to fix those up later. Furthermore, a new table of exception handler offsets will be needed, since the locations of the covered instructions and handlers will have moved. Beyond those bits of bookkeeping, however, there’s really not much more to it than a loop spitting out bytecode from instruction nodes.

Unlike bytecode that is fed into the VM from the outside, we don’t spend time doing validation of the specialized bytecode, since we can trust that it is valid – we’re generating it in-process! Additionally, the specialized bytecode may make use of “spesh ops” – a set of extra opcodes that exist purely for spesh to generate. Some of them are non-logging forms of ops that would normally log statistics (no point logging after we’ve done the optimizations), but most are for doing operations that – without the proofs and guards done by spesh – would be at risk of violating memory safety. For example, there’s an op that simply takes an object offset and reads a pointer or integer from a certain number of bytes into it, which spesh can prove is safe to do, but in general would not be.

What I’ve described so far is the portable behavior that we can do on any platform. So it doesn’t matter whether you’re running MoarVM on x86, x64, ARM, or something else, you can take advantage of all the optimizations that spesh can do. On x64, however, we can go a step further, and compile the spesh graph not back into specialized MoarVM bytecode, but instead into machine code. This eliminates the interpreter overhead. In MoarVM, we tend to refer to this stage as “the JIT compiler”, because most people understand JIT compilation as resulting in machine code. In reality, what most other VMs call their JIT compiler spans the same space that both spesh and the MoarVM JIT cover between them. MoarVM’s design means that we can deliver performance wins on all platforms we can run on, and then an extra win on x64. For more on the machine code generation process, I can recommend watching this talk by brrt, who leads work on it.

Argument guards

By this point, we have some optimized code. It was generated for either a particular callsite (a certain specialization) or a combination of callsite and incoming argument types (an observed type specialization). Next, we need a mechanism that will, upon a call, look at the available specializations and see if any of them match up with the incoming arguments. Provided one is found that matches, we can then call it.

My original approach to this was to simply have a list of specializations, each tagged with a callsite and, for each object argument index, an expected type, whether we wanted a type object or a concrete object, and – for container types like Scalar – what type we expected to find on the inside of the container. This was simple to implement, but rather inefficient. Even if all of the type specializations were for the same callsite, the callsite would be compared against each of them. Alternatively, if there were 4 specializations, 3 of them on one callsite and one on a second callsite, we’d have to do 3 failed comparisons before reaching the final one that we were hunting for.

That might not sound overly bad, because comparing callsites is just comparing pointers, and so somewhat cheap (although it’s branching, and branches aren’t so friendly for CPUs). Where it gets worse is that parameter type checks worked the same way. Therefore, if there were 4 specializations of the same callsite, all of them against a Scalar argument with 4 different types of value inside of it, then the Scalar would have to be dereferenced up to 4 times. This isn’t ideal.

My work during the summer saw the introduction of a new, tree-structured, approach. Each node in the tree represents either an operation (load an argument to test, read a value from a Scalar container) with a single child node, or a test with two child nodes representing “yes” and “no”. The leaves of the tree either indicate which specialization to use, or “no result”.

The tree structure allows for loads, tests, and dereferences to be lifted out. Therefore, each argument needs to be loaded once, checked against a particular type once, and dereferenced once if it’s a container. So, if there were to be specializations for (Scalar:D of Int:D, Str:D) and (Scalar:D of Int:D, Num:D), then the first argument would be loaded one time and tested to see if it is a Scalar. If it is, then it will be dereferenced once, and the resulting value tested to see if it’s an Int. Both alternatives for the second argument are placed in the tree underneath this chain of tests, meaning that they do not need to be repeated.

Arg guard trees are dumped in the specializer log for debugging purposes. Here is how the output looks for the situation described above:

0: CALLSITE 0x7f5aa3f1acc0 | Y: 1, N: 0
1: LOAD ARG 0 | Y: 2
2: STABLE CONC Scalar | Y: 3, N: 0
3: DEREF_RW 0 | Y: 4, N: 0
4: DEREF_VALUE 0 | Y: 5, N: 0
5: STABLE CONC Int | Y: 6, N: 0
6: LOAD ARG 1 | Y: 7
7: STABLE CONC Int | Y: 8, N: 9
8: RESULT 0
9: STABLE CONC Str | Y: 10, N: 0
10: RESULT 1

As the output suggests, the argument guard tree is laid out in a single block of memory – an array of nodes. This gives good cache locality on the lookups, and – since argument guard trees are pretty small – means we can use a small integer type for the child node indices rather than requiring a pointer worth of space.
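
To make that layout a little more concrete, here is a minimal, self-contained C sketch of how such a flat guard tree might be represented and walked. All of the names and fields are illustrative assumptions for the sake of the example, not MoarVM’s actual types.

#include <stdint.h>

/* One node of a flattened argument guard tree. Child links are small
 * integer indices into the same array, giving good cache locality. */
typedef enum { OP_LOAD_ARG, OP_CHECK_TYPE, OP_DEREF_VALUE, OP_RESULT } GuardOp;

typedef struct {
    GuardOp  op;   /* what this node does                        */
    uint32_t arg;  /* argument index, type id, or result number  */
    uint16_t yes;  /* node to continue with if the check passes  */
    uint16_t no;   /* node to continue with if it fails (0=none) */
} GuardNode;

/* Walk the tree for one call; returns a specialization index or -1.
 * Node 0 acts as the "no result" sentinel, as in the dump above. */
int evaluate_guards(const GuardNode *nodes, const uint32_t *arg_types) {
    uint32_t current = 0;
    uint16_t i = 1;
    while (i != 0) {
        const GuardNode *n = &nodes[i];
        switch (n->op) {
            case OP_LOAD_ARG:    current = arg_types[n->arg]; i = n->yes; break;
            case OP_CHECK_TYPE:  i = current == n->arg ? n->yes : n->no;  break;
            case OP_DEREF_VALUE: /* real code would read the container's value slot */
                                 i = n->yes; break;
            case OP_RESULT:      return (int)n->arg;
        }
    }
    return -1;  /* no matching specialization */
}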

Immutability wins performance

Additional specializations are generated over time, but the argument guard tree is immutable. When a new specialization is generated, the existing argument guard tree is cloned, and the clone is modified to add the new result. That new tree is then installed in place of the previous one, and the previous one can be freed at the next safe point.

Why do this? Because it means that no locks need to be acquired to use a guard tree. In fact, no locks are needed to update the guard trees either: spesh runs on a single thread of its own, so those updates are naturally serialized.
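
As a small illustration of the clone-and-publish scheme – with clone_with_new_result and free_at_next_safepoint as hypothetical helpers, not MoarVM source – readers only ever load the current tree pointer, while the lone specializer thread swaps in a fresh copy:

#include <stdatomic.h>

typedef struct GuardTree GuardTree;              /* opaque in this sketch */

static _Atomic(GuardTree *) current_tree;        /* what callers evaluate */

extern GuardTree *clone_with_new_result(const GuardTree *tree);  /* hypothetical */
extern void free_at_next_safepoint(GuardTree *tree);             /* hypothetical */

/* Runs only on the specializer thread, so updates are naturally serialized. */
void publish_new_specialization(void) {
    GuardTree *old_tree = atomic_load(&current_tree);
    GuardTree *new_tree = clone_with_new_result(old_tree);
    atomic_store(&current_tree, new_tree);  /* readers see old or new, never a mix */
    free_at_next_safepoint(old_tree);       /* defer the free until no reader can hold it */
}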

Calls between specialized code

In the last part of the series, I mentioned that part of specializing a call is to see if we can map it directly to a specialization. This avoids having to evaluate the argument guard tree of the target of the call, which is a decidedly nice saving. As a result, most uses of the argument guard are on the boundary between unspecialized and specialized code.

But how does the optimizer see if there’s a specialization of the target code that matches the argument types being passed? It does it by evaluating the argument guard tree – but on facts, not real values.

On Stack Replacement

Switching into specialized code at the point of a call handles many cases, but misses an important one: that where the hot code is entered once, then sits in a loop for a long time. This does happen in various real world programs, but it’s especially common in benchmarks. It’s highly desirable to specialize the hot loop’s code, if possible inlining things into the loop body and compiling the result into machine code.

I discussed detection of hot loops in an earlier part of this series. This time around, let’s take a look at the code for the osrpoint op:

OP(osrpoint):
    if (MVM_spesh_log_is_logging(tc))
        MVM_spesh_log_osr(tc);
    MVM_spesh_osr_poll_for_result(tc);
    goto NEXT;

The first part is about writing a log entry each time around the loop, which is what bumps the loop up in the statistics and causes a specialization to be generated. The call to MVM_spesh_osr_poll_for_result is the part that checks if there is a specialization ready, and jumps into it if so.

One way we could do this is to evaluate the argument guard in every call to MVM_spesh_osr_poll_for_result to see if there’s an appropriate optimization. That would get very pricey, however. We’d like the interpreter to make decent progress through the work until the optimized version of the code is ready. So what to do?

Every frame gets an ID on entry. By tracking this together with the number of specializations available last time we checked, we can quickly short-circuit running the argument guard when we know it will give the very same result as the last time we evaluated it, because nothing changed.

MVMStaticFrameSpesh *spesh = tc->cur_frame->static_info->body.spesh;
MVMint32 num_cands = spesh->body.num_spesh_candidates;
MVMint32 seq_nr = tc->cur_frame->sequence_nr;
if (seq_nr != tc->osr_hunt_frame_nr || num_cands != tc->osr_hunt_num_spesh_candidates) {
    /* Check if there's a candidate available and install it if so. */
    ...

    /* Update state for avoiding checks in the common case. */
    tc->osr_hunt_frame_nr = seq_nr;
    tc->osr_hunt_num_spesh_candidates = num_cands;
}

If there is a candidate that matches, then we jump into it. But how? The specializer makes a table mapping the locations of osrpoint instructions in the unspecialized code to locations in the specialized code. If we produce machine code, a label is also generated to allow entry into the code at that point. So, for the most part, all OSR does is jump execution into the specialized code. Sometimes, things are approximately as easy as they sound.
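
A rough sketch of the kind of mapping involved, using illustrative names rather than MoarVM’s actual structures, might look like this:

#include <stdint.h>
#include <stddef.h>

/* For each osrpoint in the unspecialized bytecode, record where execution
 * should resume in the specialized bytecode (or machine code label). */
typedef struct {
    uint32_t unspecialized_offset;  /* bytecode position of the osrpoint   */
    uint32_t specialized_offset;    /* resume position in the spesh'd code */
} OSRMapping;

/* Look up the resume point for the osrpoint we are currently sitting on. */
int64_t find_osr_target(const OSRMapping *map, size_t n, uint32_t current_offset) {
    for (size_t i = 0; i < n; i++)
        if (map[i].unspecialized_offset == current_offset)
            return map[i].specialized_offset;
    return -1;  /* no mapping: cannot OSR at this point */
}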

There’s a bonus feature hidden in all of this. Remember deoptimization, where we fall back to the interpreter to handle rarely occurring cases? This means we’ll encounter the osrpoint instructions in the unoptimized code again, and so – once the interpreter is done with the unusual case – we can enter back into the specialized, and possibly JIT-compiled, code again. Effectively, spesh factors your slow paths out for you. And if you’re writing a module, it can do so differently based on different applications’ use cases of the module.

Future idea: argument guard compilation to machine code

At the moment, the argument guard tree is walked by a little interpreter. However, on platforms where we have the capability, we could compile it into machine code. This would perhaps allow branch predictors to do a bit of a better job, as well as eliminate the overhead the interpreter brings (which, given the ops are very cheap, is much more significant here than in the main MoarVM interpreter).

That’s all, folks

I hope this series has been interesting, and provided some insight into how the MoarVM specializer works. My primary reason for writing it was to put my recent work on the specializer, funded by The Perl Foundation, into context, and I hope this has been a good bit more interesting than just describing the changes in isolation.

Of course, there’s no shortage of further opportunities for optimization work, and I will be reporting on more of that here in the future. I continue to be looking for funding to help make that happen, beyond what I can do in the time I have aside from my consulting work.


rakudo.org: Announce: Rakudo Star Release 2017.10

Published by Steve Mynott on 2017-11-09T15:53:10

A useful and usable production distribution of Perl 6

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the October 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for this release is available from https://rakudo.perl6.org/downloads/star/.

Binaries for macOS and Windows (64 bit) are also available at the same location.

This is a post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms). Currently, Star is on a quarterly release cycle.

IMPORTANT: “panda” has been removed from this release since it is deprecated. Please use “zef” instead.

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

This Star release includes release 2017.10 of the Rakudo Perl 6 compiler, version 2017.10 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

The Rakudo compiler changes since the last Rakudo Star release of 2017.07 are now listed in “2017.08.md”, “2017.09.md”, and “2017.10.md” under the “rakudo/docs/announce” directory of the source distribution.

Notable changes in modules shipped with Rakudo Star:

+ DBIish: Newer version (doesn't work with 2017.01 anymore)
+ Test-META: New, plus its dependencies (JSON-Class, JSON-Marshal, JSON-Name, JSON-Unmarshal and META6)
+ doc: Too many to list. p6doc-index merged into p6doc and index built on first run.
+ p6-Template-Mustache: Fixed on Windows.
+ panda: Removed. Stub added to warn about this.
+ perl6-datetime-format: New (heavily used in ecosystem)
+ perl6-file-which: Fixed tests on Windows
+ tap-harness6: Many fixes.
+ zef: New version with many fixes

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

Weekly changes in and around Perl 6: 2017.45 Uplink Established

Published by liztormato on 2017-11-06T22:38:58

Elizabeth Mattijsen got carried away trying to devise an API for getrusage. The result is a new set of modules available with use Telemetry. And a helper module for easy activation: snapper. The simplest use case: perl6 -Msnapper yourscript.pl6. This will start up a separate thread that will take snapshots every 0.1 seconds. Once it is done, it will display a report much like this on STDERR:

Telemetry Report of Process #72908 (2017-11-06T21:26:26Z)
Number of Snapshots: 7
Supervisor thread ran the whole time
Initial Size:        67676 Kbytes
Total Time:           0.58 seconds
Total CPU Usage:      0.59 seconds

wallclock  util%  max-rss
   105476  13.28    17680
   105220  13.27       12
   105178  12.73     1024
   105149  12.85      152
   105141  12.42        4
    49525  12.98       12
--------- ------ --------
   575689  12.92    18884

Legend:
wallclock  Number of microseconds elapsed
    util%  Percentage of CPU utilization (0..100%)
  max-rss  Maximum resident set size (in Kbytes)

Even though this is hot off the press, Wenzel P. P. Peppmeyer has already taken this new development and created a distribution that runs an httpd daemon inside your program, so that you can look at the usage data of your (long running) program in your browser!

But that’s only the tip of the iceberg: there are also a number of ways to get at system information ad-hoc, such as:

$ perl6 -MTelemetry -e 'say T<wallclock cpu max-rss>'
(140878 264269 84756)

Or plan your snapshots and reporting more precisely:

while $working {
    snap;
    LAST snap;
    # do a lot of work
}
say report(:csv)  # output in CSV format

Other features of Telemetry include a Telemetry::Instrument role for creating your own “instruments” to plug into this framework, with two Instruments already supplied: Telemetry::Instrument::Usage (basically giving all the data of getrusage plus wallclock in microseconds) and Telemetry::Instrument::ThreadPool (giving all the data pertaining to actions of the $*SCHEDULER, such as being able to see when new worker threads are started). Exciting times!

LPW Rakudo Perl 6 Presentations

Well, a program as such hasn’t been decided yet, but it looks like the following Rakudo Perl 6 related presentations will be given at the London Perl Workshop on 25 November 2017 at the University of Westminster:

Hope to see you there!

Optimizing Code

Jonathan Worthington published another part of his MoarVM Specializer Improvements series: Optimizing Code. A long read, but if you fancy understanding MoarVM capabilities better, it is highly recommended reading indeed!

Building a Single Page Application with Cro

This seems to have slipped through the cracks in the past weeks. Jonathan Worthington describes how to set up a simple Single Page Application using Cro as the backend, and webpack, ES6, React and Redux as the frontend (no prior knowledge needed). It shows a SPA for a (food/beer) festival where people can leave their tips about what’s hot and what’s not, and see them in real time. It supports:

A must read if you are looking into building fully interactive web-applications, specifically using all of the asynchronous functionality that Rakudo Perl 6 offers today!

Other Core Developments

Other Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Winding down

Not a small Perl 6 Weekly again this week. It’s almost become a dayjob! But a fun one! So keep all the new stuff coming, so yours truly can report on it in the next issue of the Perl 6 Weekly. See you then!


6guts: MoarVM Specializer Improvements Part 3: Optimizing Code

Published by jnthnwrthngtn on 2017-11-05T22:51:20

Finally, after considering gathering data and planning what to optimize, it’s time to look at how MoarVM’s dynamic optimizer, “spesh”, transforms code into something that should run faster.

Specialization

The name “spesh” is short for “specializer”, and this nicely captures the overall approach taken. The input bytecode the optimizer considers is full of genericity, and an interpreter of that bytecode needs to handle all of the cases. For example, a method call depends on the type of the invocant, and in the general case that may vary from call to call. Worse, while most of the time the method resolution might just involve looking in a cache, in other cases we might have to call into find_method to do a more dynamic dispatch. That in turn means method lookup is potentially an invocation point, which makes it harder to reason about the state of things after the method lookup.

Of course, in the common case, it’s a boring method lookup that will be found in the cache, and in many cases there’s only one invocant type that shows up at a given callsite at runtime. Therefore, we can specialize the code for this case. This means we can cache the result of the method lookup, and since that is now a constant, we may be able to optimize the call itself using that knowledge.

This specialization of method resolution is just one of many kinds of specialization that we can perform. I’ll go through a bunch more of them in this post – but before that, let’s take a look at how the code to optimize is represented, so that these optimizations can be performed conveniently.

Instruction Nodes, Basic Blocks, and CFGs

The input to the optimization process is bytecode, which is just a bunch of bytes with instruction codes, register indexes, and so forth. It goes without saying that doing optimizations by manipulating that directly would be pretty horrible. So, the first thing we do is make a pass through the bytecode and turn every instruction in the bytecode into an instruction node: a C structure that points to information about the instruction, to the previous and next instructions, and contains information about the operands.
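
As a simplified sketch (not MoarVM’s actual struct definition), an instruction node is essentially an entry in a doubly linked list that carries the decoded opcode and operand information:

#include <stdint.h>

/* Simplified sketch of an instruction node: decoded opcode and operands,
 * linked to its neighbours so instructions can be inserted, removed, and
 * rewritten without touching raw bytecode. */
struct Operand;                       /* registers, literals, labels, ... */

typedef struct InsNode InsNode;
struct InsNode {
    uint16_t        opcode;           /* which operation this is          */
    struct Operand *operands;         /* operand array for this opcode    */
    uint8_t         num_operands;
    InsNode        *prev;             /* previous instruction node        */
    InsNode        *next;             /* next instruction node            */
};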

This is an improvement from the point of view of being able to transform the code, but it doesn’t help the optimizer to “understand” the code. The step that transforms the bytecode instructions into instruction nodes also performs a second task: it marks the start and end of each basic block. A basic block is a sequence of instructions that will run one after the other, without any flow control. Actually, this definition is something we’ve gradually tightened up over time, and was particularly refined during my work on spesh over the summer. By now, anything that might invoke terminates a basic block. This means that things like findmethod (which may sometimes invoke find_method on the meta-object) and decont (which normally takes a value out of a Scalar container, but may have to invoke a FETCH routine on a Proxy) terminate basic blocks too. In short, we end up with a lot of basic blocks, which makes life a bit harder for those of us who need to read the spesh debug output, but experience tells that the more honest we are about just how many things might cause an invocation, the more useful it is in terms of reasoning about the program and optimizing it.

The basic blocks themselves are only so useful. The next step is to arrange them into a Control Flow Graph (CFG). Each basic block is represented by a C structure, and has a list of its successors and predecessors. The successors of a basic block are those basic blocks that we might end up in after this one. So, if it’s a condition that may be taken or not, then there will be two successors. If it’s a jump table, there may be a lot more than two.

Once we have a control flow graph, it’s easy to pick out basic blocks that are not reachable from the root of the graph. These could never be reached in any way, and are therefore dead code.
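
Here is a small, self-contained C sketch of that idea: a basic block with successor and predecessor lists, plus a reachability pass that marks everything reachable from the entry block, so that unmarked blocks can be dropped as dead code. Again, the names are illustrative, not MoarVM’s actual types.

#include <stddef.h>

typedef struct BasicBlock BasicBlock;
struct BasicBlock {
    struct InsNode *first_ins;   /* instruction nodes in this block      */
    struct InsNode *last_ins;
    BasicBlock    **succ;        /* blocks control may flow to next      */
    size_t          num_succ;
    BasicBlock    **pred;        /* blocks control may have come from    */
    size_t          num_pred;
    int             reachable;   /* mark used by the pass below          */
};

/* Depth-first mark of everything reachable from the entry block... */
static void mark_reachable(BasicBlock *bb) {
    if (bb->reachable)
        return;
    bb->reachable = 1;
    for (size_t i = 0; i < bb->num_succ; i++)
        mark_reachable(bb->succ[i]);
}

/* ...after which every unmarked block is dead code and can be removed. */
void eliminate_dead_blocks(BasicBlock **blocks, size_t n, BasicBlock *entry) {
    for (size_t i = 0; i < n; i++)
        blocks[i]->reachable = 0;
    mark_reachable(entry);
    for (size_t i = 0; i < n; i++)
        if (!blocks[i]->reachable)
            blocks[i]->first_ins = blocks[i]->last_ins = NULL;  /* unlink its code */
}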

One thing that we need to take special care of is exception handlers. Since they work using dynamic scope, they may be reached only through a callee, but we are only assembling the CFG for a single routine or block. We need to make sure they are considered reachable, so they aren’t deleted as dead code. One safe and conservative way to do this is to add a dummy entry node and link them all from it. This is what spesh did for all exception handlers, until this summer. Now, it only does this for catch handlers. Control exception handlers are instead made reachable by linking them into the graph as a successor to all of the basic blocks that are in the region covered by the handler. Since these are used for redo, next, and last handling in loops, it means we get much more precise flow control information for code with loops in it. It also means that control exception handlers can now be eliminated as dead code too, if all the code they cover goes away. Making this work out meant doing some work to remove assumptions in various areas of spesh that exception handlers were never eliminated as dead code.

Dominance and SSA form

The control flow graph is a step forward, but it still falls short of letting us do the kind of analysis we’d really like. We’d also like to be able to propagate type information through the graph. So, if we know a register holds an Int, and then it is assigned into a second register, we’d like to mark that register as also containing an Int. The challenge here is that register allocation during code generation wants to minimize the number of registers that are needed – a good thing for memory use – and so a given register will have many different – and unrelated – usages over time.

To resolve this, the instructions are transformed into Static Single Assignment form. Each time a register is assigned to in the bytecode (this is where the “static” comes from), it is given a new version number. Thus, the first assignment to r2 would be named r2(1), the second r2(2), and so forth. This helps enormously, because now information can be stored about these renamed variables, and it’s easy to access the correct information upon each usage. Register re-use is something we no longer have to worry about in this model.

So far so straightforward, but of course there’s a catch: what happens when we have conditionals and loops?

  if_i r1(1) goto LABEL1
  set r2(1), r3(1)
  goto LABEL2
LABEL1:
  set r2(2), r4(1)
LABEL2:      
  # Which version of r2 reaches here???

This problem is resolved by the use of phi nodes, which “merge” the various versions at “join points”:

  if_i r1(1) goto LABEL1
  set r2(1), r3(1)
  goto LABEL2
LABEL1:
  set r2(2), r4(1)
LABEL2:      
  PHI r2(3), r2(1), r2(2)

The placement of these phi nodes is an interesting problem, and that’s where dominance comes in. Basic block B1 dominates basic block B2 if every possible path to B2 in the flow control graph must pass through B1. There’s also a notion of immediate dominance, which you can think of as the “closest” dominator of a node. The immediate dominators form a tree, not a graph, and this tree turns out to be a pretty good order to explore the basic blocks in during analysis and optimization. Last but not least, there’s also the concept of dominance frontiers – that is, where a node’s dominance ends – and this in turn tells us where the phi nodes should be placed.

Thankfully, there’s a good amount of literature on how to implement all of this efficiently and, somewhat interestingly, the dominance algorithm MoarVM picked is a great example of how a cubic algorithm with a really low constant factor can beat an n*log(n) algorithm on all relevant problem sizes (20,000 basic blocks or so – which is a lot).
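
For the curious, here is a compact sketch of the well-known iterative dominator computation from Cooper, Harvey and Kennedy’s “A Simple, Fast Dominance Algorithm” – the kind of simple, low-constant-factor approach alluded to above. Blocks are identified by reverse-postorder number (the entry is 0), and idom[b] ends up holding the immediate dominator of b; dominance frontiers, and so phi placement, can then be derived from that. This is an illustration, not MoarVM’s actual implementation.

#define UNDEFINED (-1)

/* Walk both blocks up the (partial) dominator tree until they meet. */
static int intersect(int b1, int b2, const int *idom) {
    while (b1 != b2) {
        while (b1 > b2) b1 = idom[b1];
        while (b2 > b1) b2 = idom[b2];
    }
    return b1;
}

/* preds[b] lists the predecessors of block b; idom[] is the output. */
void compute_idoms(int num_blocks, int **preds, const int *num_preds, int *idom) {
    for (int b = 0; b < num_blocks; b++)
        idom[b] = UNDEFINED;
    idom[0] = 0;                          /* the entry dominates itself */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 1; b < num_blocks; b++) {    /* reverse postorder  */
            int new_idom = UNDEFINED;
            for (int p = 0; p < num_preds[b]; p++) {
                int pred = preds[b][p];
                if (idom[pred] == UNDEFINED)
                    continue;             /* predecessor not processed yet */
                new_idom = new_idom == UNDEFINED
                    ? pred
                    : intersect(pred, new_idom, idom);
            }
            if (new_idom != UNDEFINED && idom[b] != new_idom) {
                idom[b] = new_idom;
                changed = 1;
            }
        }
    }
}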

Facts

Every SSA register has a set of facts associated with it. These include known type, known value, known to be concrete, known to be a type object, and so forth. These provide information that is used to know what optimizations we can do. For example, knowing the type of something can allow an istype operation to be turned into a constant. This means that the facts of the result of that istype now include a known value, and that might mean a conditional can be eliminated entirely.

Facts are, in fact, something of a relative concept in spesh. In the previous parts of this series, I discussed how statistics about the types that show up in the program are collected and aggregated. Of course, these are just statistics, not proofs. So can these statistics be used to introduce facts? It turns out they can, provided we are willing to add a guard. A guard is an instruction that cheaply checks that reality really does match up with expectations. If it does not, then deoptimization is triggered, meaning that we fall back to the slow-path code that can handle all of the cases. Beyond the guard, the facts can be assumed to be true. Guards are another area that I simplified and made more efficient during my work in the summer.
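
In pseudo-C, a type guard amounts to something like the following; the names are made up for illustration, and this is not MoarVM code:

#include <stdint.h>

typedef struct { uint64_t type_id; /* ... */ } Value;

extern void deoptimize(void);   /* hypothetical: fall back to the slow-path code */

/* Cheap check that reality matches the statistics; past this point the
 * optimizer is allowed to treat the expected type as a proven fact. */
Value guard_type(Value v, uint64_t expected_type_id) {
    if (v.type_id != expected_type_id)
        deoptimize();
    return v;
}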

Phi nodes and facts have an interesting relationship. At a phi node, we need to merge the facts from all of the joined values. We can only merge those things that all of the joined values agree on. For example, if we have a phi with two inputs, and they both have a fact indicating a known type Int, then the result of the phi can have that fact set on it. By contrast, if one input thinks it’s an Int and the other a Rat, then we don’t know what it is.

During my work over the summer, I also introduced the notion of a “dead writer”. This is a fact that is set when the writer of a SSA register is eliminated as dead code. This means that the merge can completely ignore what that input thinks, as it will never be run. A practical example where this shows up is with optional parameters:

sub process($data, :$normalize-first) {
    if $normalize-first {
        normalize($data);
    }
    do-stuff-with($data);
}

Specializations are produced for a particular callsite, meaning that we know if the normalize-first argument is passed or not. Inside of the generated code, there is a branch that sticks an Any into $normalize-first if no argument is received. Before my change, the phi node would not have any facts set on it, as the two sides disagreed. Now, it can see that one side of the branch is gone, and the merge can keep whichever facts survive. In the case that the named argument was not passed, the if can also be eliminated entirely, as a type object is falsey.

Specialization by arguments

So, time to look at some of the specializations! The first transformations that are applied relate to the instructions that bind the incoming arguments into the parameter variables that are to receive them. Specializations are keyed on callsite, and a callsite object indicates the number of arguments, whether any of them are natively typed, and – for named arguments – what names the arguments have. The argument processing code can be specialized for the callsite, by:

Attribute access specialization

Attribute access instructions pass the name of the attribute together with the type the attribute was declared in. This is used to resolve the location of the attribute in the object. There is a static optimization that adds a hint, which has to be verified to be in range at runtime and that does not apply when multiple inheritance is used.

When the spesh facts contain the type of the object that the attribute is being looked up in, then – after some sanity checks – this can be turned into something much cheaper: a read of the memory location at an offset from the pointer to the object. Both reads and writes of object attributes can receive this treatment.
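
In effect, a specialized attribute read or write ends up as little more than a pointer access at a known offset into the object body, along the lines of this simplified sketch (an illustrative layout, not MoarVM’s object model):

#include <stddef.h>

typedef struct { char body[64]; } Object;   /* stand-in for an object's storage */

static inline void *read_attribute(const Object *obj, size_t attr_offset) {
    return *(void *const *)(obj->body + attr_offset);
}

static inline void write_attribute(Object *obj, size_t attr_offset, void *value) {
    *(void **)(obj->body + attr_offset) = value;
}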

Decontainerization specialization

Many values in Perl 6 live inside of a Scalar container. The decont instruction is used to take a value out of a container (which may be a Scalar or a Proxy). Often, decont is emitted when there actually is no container (because it’s either impossible or too expensive to reason about that at compile time). When we have the facts to show it’s not being used on a container type, the decont can be rewritten into a set instruction. And when the facts show that it’s a Scalar, then it can optimized into a cheap memory offset read.

Method resolution specialization

When both the name of a method and the type it’s being resolved on are known from the facts, then spesh looks into the method cache, which is a hash. If it finds the method, it turns the method lookup into a constant, setting the known value fact along the way. This, in the common case, saves a hash lookup per method call, which is already something, but knowing the exact method that will be run opens the door to some very powerful optimizations on the invocation of the resolved method.

Invocation specialization

Specialization of invocations – that is, calling of subs, methods, blocks, and so forth – is one of the most valuable things spesh does. There are quite a few combinations of things that can happen here, and my work over the summer made this area of spesh a good deal more powerful (and, as a result, took some time off various benchmarks).

The most important thing for spesh to know is what, exactly, is the target of the invocation. In the best case, it will be a known value already. This is the case with subs resolved from the setting or from UNIT, as well as in the case that a findmethod was optimized into a constant. These used to be the only cases, until I also added recording statistics on the static frame that was the target of an invocation. These are analyzed and, if pretty stable, then a guard can be inserted. This greatly increases the range of invocations that can be optimized. For example, a library that can be configured with a callback may, if the program uses a consistent callback, and the callback is small, have the callback’s code inlined into the place that calls it in the library code.

The next step involves seeing if the target of the invocation is a multi sub or method. If so, then the argument types are used to do a lookup into the multi dispatch cache. This means that spesh can potentially resolve a multi-dispatch once at optimization time, and just save the result.

Argument types are worth dwelling on a little longer. Up until the summer, the facts were consulted to see if the types were known. In the case there was no known type from the facts, then the invocation would not be optimized. Now, using the far richer set of statistics, when a fact is missing but the statistics suggest there is a very consistent argument type, then a guard can be inserted. This again increases the range of invocations that can be optimized.

So, by this point a multi dispatch may have been resolved, and we have something to invoke. The argument types are now used to see if there is a specialization of the target of the invocation for those arguments. Typically, if it’s on a hot path, there will be. In that case, then the invocation can be rewritten to use an instruction that pre-selects the specialization. This saves a lot of time re-checking argument types, and is a powerful optimization that can reduce the cost of parameter type checks to zero.

Last, but very much not least, it may turn out that the target specialization being invoked is very small. In this case, a few more checks are performed to see if the call really does need its own invocation record (for example, it may do so if it has lexicals that will be closed over, or if it might be the root of a continuation, or it has an exit handler). Provided the conditions are met, the target may be inlined – that is, have its code copied – into the caller. This saves the cost of creating and tearing down an invocation record (also known as call frame), and also allows some further optimizations that consider the caller and callee together.

Prior to my work in the summer, it wasn’t possible to inline anything that looked up lexical variables in an outer scope (that is to say, anything that was a closure). Now that restriction has been lifted, and closures can be inlined too.

Finally, it’s worth noting that MoarVM also supports inlining specializations that themselves contain code that was inlined. This isn’t too bad, except that in order to support deoptimization it has to also be able to do multiple levels of uninlining too. That was a slight headache to implement.

Conditional specialization

This was mentioned in passing earlier, but worth calling out again: when the value that is being evaluated in a conditional jump is known, then the branch can be eliminated. This in turn makes dead code of one side of the branch or the other. This had been in place for a while, but in my work during the summer I made it handle some further cases, and also made it more immediately remove the dead code path. That led to many SSA registers being marked with the dead writer fact, and thus led to more optimization possibilities in code that followed the branch.

Control exception specialization

Control exceptions are used for things like next and last. In the general case, their target may be some stack frames away, and so they are handled by the exception system. This walks down the stack frames to find a handler. In some cases, however – and perhaps thanks to an inlining – the handler and the throw may be in the same frame. In this case, the throw can be replaced with a simple – and far cheaper – goto instruction. We had this optimization for a while, but it was only during the summer that I made it work when the throwing part was in code that had been inlined, making it far more useful for Perl 6 code.

In summary

Here we’ve seen how spesh takes pieces of a program, which may be full of generalities, and specializes them for the actual situations that are encountered when the program is run. Since it operates at runtime, it is able to do this across compilation units, meaning that a module’s code may be optimized in quite different ways according to the way a particular consumer uses it. We’ve also discussed the importance of guards and deoptimization in making good use of the gathered statistics.

You’ll have noticed I mentioned many changes that took place “in the summer”, and this work was done under a grant from The Perl Foundation. So, thanks go to them for that! Also, I’m still looking for sponsors.

In the next, and final, part of this series, we’ll take a look at argument guards, which are used to efficiently pick which specialization to use when crossing from unspecialized to specialized code.


gfldex: Racing Rakudo

Published by gfldex on 2017-11-05T17:39:33

In many racing sports telemetry plays a big role in getting faster.  Thanks to a torrent of commits by lizmat you can use telemetry now too!

perl6 -e 'use Telemetry; snapper(½); my @a = (‚aaaa‘..‚zzzz‘).pick(1000); say @a.sort.[*-1 / 2];'
zyzl
Telemetry Report of Process #30304 (2017-11-05T17:24:38Z)
No supervisor thread has been running
Number of Snapshots: 31
Initial Size:        93684 Kbytes
Total Time:          14.74 seconds
Total CPU Usage:     15.08 seconds

wallclock  util%  max-rss  gw      gtc  tw      ttc  aw      atc
   500951  53.81     8424
   500557  51.92     9240
   548677  52.15    12376
   506068  52.51      196
   500380  51.94     8976
   506552  51.74     9240
   500517  52.45     9240
   500482  52.33     9504
   506813  51.67     6864
   502634  51.63
   500520  51.78     6072
   500539  52.13     7128
   503437  52.29     7920
   500419  52.45     8976
   500544  51.89     8712
   500550  49.92     6864
   602948  49.71     8712
   500548  50.33
   500545  49.92      320
   500518  49.92
   500530  49.92
   500529  49.91
   500507  49.92
   506886  50.07
   500510  49.93     1848
   500488  49.93
   500511  49.93
   508389  49.94
   508510  51.27      264
    27636  58.33
--------- ------ -------- --- -------- --- -------- --- --------
 14738710  51.16   130876

Legend:
wallclock  Number of microseconds elapsed
    util%  Percentage of CPU utilization (0..100%)
  max-rss  Maximum resident set size (in Kbytes)
       gw  The number of general worker threads
      gtc  The number of tasks completed in general worker threads
       tw  The number of timer threads
      ttc  The number of tasks completed in timer threads
       aw  The number of affinity threads
      atc  The number of tasks completed in affinity threads

The snapper function takes an interval at which data is collected. On termination of the program the table above is shown.

The module comes with plenty of subs to collect the same data by hand and compile your own report, which may be sensible in long running processes. Or you can call the report sub by hand every now and then.

use Telemetry;

react {
    snapper;
    whenever Supply.interval(60) {
        say report;
    }
}

If the terminal won’t cut it, you can use HTTP to fetch the telemetry data.

Neither the documentation nor the module is finished, so stay tuned for more data.


Weekly changes in and around Perl 6: 2017.44 Nom Mastered

Published by liztormato on 2017-10-30T21:46:43

It has been a long time coming, but finally the default branch of Rakudo Perl 6 is no longer called nom (for New Object Model), but called master (which is what most git tools take as the default default). More than 6 years after it became the default branch. For a lot of people involved in the development of Rakudo Perl 6 it almost feels like a rite of passage. Zoffix Znet keeps us up to date on these developments.

Advent Calendar 2017

It’s almost that time of the year again, with just over a month to go: the Rakudo Perl 6 Advent Calendar. Zoffix Znet issued the Call for Authors. Have an idea for a blog post? Don’t wait too long with it: only 15 slots left at the moment of this writing!

Rakudo 2017.10 Released

After a long period of labor, compiler release Rakudo 2017.10 has seen the light, now that the last showstoppery issues have been resolved. That is the third release Aleks-Daniel Jakimenko-Aleksejev has done in a row! Not too long after that, Claudio Ramirez made sure the world knows about the Linux packages he has created from that compiler release.

New Rakudo Star in the works

Steve Mynott tells us that there is now also an RC0 (Release Candidate #0) release of Rakudo Star 2017.10 (draft announcement). An important change from previous versions of Rakudo Star is that panda is no longer shipped, because it has been deprecated in favour of zef.

Binary RC0 releases for MacOS (DMG) and 64-bit Windows (MSI) and a new experimental Linux cross-distro 64-bit build (using AppImage) are also available. The AppImage was built using scripts by Samantha McVey and should work on any recent (3 year old or newer) mainstream Linux distro by marking it as executable and running it (no installation needed).

Please download and test! Please give any feedback on the #perl6 IRC channel, or create an Issue. It’s hoped the final Rakudo Star 2017.10 release will take place around November 8th.

Squashaton Again!

This week sees the first Saturday of the month, so that means Squashaton time! This time the focus will be on the documentation again, aka the doc repository.

Books, Books, Books!

Andrew Shitov announced a new book “Using Perl 6” for immediate availability. On FaceBook he described it like this:

About a year ago, I decided to write a book about using Perl 6. Later, the plans changed and I published “Perl 6 at a Glance”, after which I wanted to write “Migrating to Perl 6” but instead wrote “Perl 6 Deep Dive” for Packt Publishing. Here and there, I was giving trainings on Perl 5, Python, and JavaScript, and was always suffering from finding a good list of easy tasks that a newcomer can use to train their skills in the language. Finally, I made up the list and solved it in Perl 6. This is the content of “Using Perl 6” — a book with solutions to 100 programming challenges, from simple to intermediate, with explanations of the Perl 6 constructions used. (…) The PDF version can be bought and downloaded already today.

Meanwhile, it became clear that Moritz Lenz has another book planned for late December: Parsing with Perl 6 Regexes and Grammars. I wonder how long it will take before this overview becomes too crowded! Who would have thought that a year ago?

Near Future Of Programming Languages

Stephen Diehl provided an overview last January. Guess which language is missing? Fortunately, some recent comments on Hacker News (this and that) picked up on that.

my %h is Set = ...;

Elizabeth Mattijsen made my %h is Set = <a b c> finally work (of course, also for the other QuantHash types: SetHash, Bag, BagHash, Mix and MixHash). This can be especially useful for the mutable QuantHash types, as you can now:

my %h is SetHash = (integers from some source);
%h .= grep: *.is-prime; # only keep the prime numbers in %h

And any other action that you would like to do “in place”.

Other Core Developments

Most of the tuits this week were spent on fixing the showstoppery issues that were blocking the release. Aleks-Daniel Jakimenko-Aleksejev provided an overview of ticket activity of the past two weeks that more or less reflects that. Other notable developments were:

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Meanwhile on PerlMonks

Winding Down

As one of the distractions this week, I was looking at Code Golf, which has a lot of entries for Rakudo Perl 6. A good sight to see! Now, if I could only figure out how one can see the actual code samples. Perhaps next week. Join us then for another Perl 6 Weekly!


rakudo.org: Main Development Branch Renamed from “nom” to “master”

Published by Zoffix Znet on 2017-10-27T00:35:03

If you track the latest Rakudo changes by means of a local checkout of the development repository, please take notice that we renamed our unusually-named main development branch from nom to the more traditional name master.

This branch will now track all the latest changes. Attempting to build HEAD of the nom branch will display a message at some point during the build stage, informing that the branch was renamed. The build will then wait for the user to press ENTER if they want to continue.

It’s expected this change will cause some fallout and this page will be updated with any useful instructions, as needed. For more information, join our #perl6-dev IRC chat channel on irc.freenode.net.


UPDATE 1: the blocking message has been disabled for now, to avoid too much impact on any of the bleeding edge users’ automations.


Rakudobrew

Rakudobrew has been updated to default to master branch now. Note that you need to upgrade rakudobrew to receive that change; run rakudobrew self-upgrade

Until updated, rakudobrew‘s standard instructions will build the no-longer actively updated nom branch.

To build the latest and greatest Rakudo, use rakudobrew build moar master instead of rakudobrew build nom. In general, we advise against using rakudobrew if the goal is simply to track the development version of the compiler. Instead, we recommend building directly from the repository. Regular end-users should continue using pre-built Rakudo Star distributions.

Zoffix Znet: Rakudo Perl 6 Advent Calendar 2017 Call for Authors

Published on 2017-10-24T00:00:00

Write a blog post about Rakudo Perl 6

Weekly changes in and around Perl 6: 2017.43 Hyper lands, racing…

Published by liztormato on 2017-10-23T21:37:23

The past week saw reliable support for .hyper and .race arrive, as described in Jonathan Worthington‘s blog post earlier this year. Now some operations can be made about 3x faster with just a few keystrokes. To give an example, these are some ways of obtaining the prime numbers below 100000:

$ time perl6 -e 'my @primes = ^100000 .grep: *.is-prime'
real 0m26.428s
user 0m26.440s

Where real is the wallclock time that passed, and user is the amount of CPU-time that was used. Now, by adding a simple .hyper:

$ time perl6 -e 'my @primes = ^100000 .hyper.grep: *.is-prime'
real 0m9.288s
user 0m31.354s

We go from 26 to 9 seconds wallclock time, which is almost 3x as fast. And if you’re not interested in the order of the final result, you can use .race:

$ time perl6 -e 'my @primes = ^100000 .race.grep: *.is-prime'
real 0m8.335s
user 0m32.106s

Which is well over 3x as fast.

A more real-world example is the test-t.pl test of Text::CSV. The --hyper version typically runs for 1.14 seconds, whereas the serial (normal) version runs for 2.57 seconds (timings all made on the quadcore machine that yours truly is using, so YMMV).

To execute code that only needs to be run once in a batch, you can use a once block:

$ perl6 -e '^100 .hyper.map: { once { say "started with $_" } }'
started with 0
started with 64

Unfortunately, it is not yet possible to use phasers inside hypered/raced blocks, mainly because we still need to figure out when e.g. a LAST phaser should fire: at the end of a batch? Or after the final call of the final batch?

Casual Contributors Welcome!

Ann Barcomb describes a number of useful strategies for managing what she calls “episodic contributors” in her blog post “How to manage casual contributors to open source projects”. I think we can all learn from this, so recommended reading!

More Perl 6 Videos

The uploads page of TPCiA videos shows some more Rakudo Perl 6 related videos this week:

Also, Rakudo Perl 6 – A Primer by Simon Proctor (from the last London Perl Mongers Technical Meeting). Enjoy!

2017.10 Release Delayed

Some showstoppery issues with Rakudo are delaying the 2017.10 release. As this release will most likely be used for the next Rakudo Star release, and thus will be used longer and more extensively, it was considered better to wait until these issues are dealt with, at least temporarily. So stay tuned!

Other Core Developments

As promised last week, a roundup of other core developments of the past 2 weeks:

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Meanwhile on perl6-users

Meanwhile on PerlMonks

Winding Down

Alas, no recent module information on CPAN just yet. We’ve been all too busy fixing bugs before the 2017.10 release. But you should check out the new MetaCPAN-like search functionality of modules.perl6.org! And while you all do that, I’ll be preparing for next week’s Perl 6 Weekly. See you then!


gfldex: There Is More Than One Way At The Same Time

Published by gfldex on 2017-10-22T21:21:16

The Perl 6 Rosettacode section for parallel calculations is terribly outdated and missing all the goodies that were added or fixed in the last few weeks. With this post I would like to propose an updated version for Rosettacode. If you believe that I missed something, feel free to comment below. Please keep in mind that Rosettacode is for showing off, not for being comprehensive.

use v6.d.PREVIEW;

 

Perl 6 provides parallel execution of code via threads. There are low level constructs that start a thread or safely pause execution.

my $t1 = Thread.start({ say [+] 1..10_000_000 });
my $t2 = Thread.start({ say [*] 1..10_000 });
$t1.finish;
$t2.finish;

my $l = Lock.new;
$l.lock;
$t1 = Thread.start: { $l.lock; say 'got the lock'; $l.unlock };
sleep 2; $l.unlock;

$t1.finish;

When processing lists, one can use a high-level Iterator created by the methods hyper and race. The latter may return values out of order. Those Iterators will distribute the elements of the list to worker threads that are in turn assigned to OS level threads by Rakudo’s ThreadPoolScheduler. The whole construct will block until the last element is processed.

my @values = 1..100;

sub postfix:<!> (Int $n) { [*] 1..$n }

say [+] @values.hyper.map( -> $i { print '.' if $i %% 100; $i!.chars });

For for-lovers there are the race for and hyper for keywords for distributing work over threads in the same way as their respective method forms.

race for 1..100 {
    say .Str; # There be out of order dragons!
}

my @a = do hyper for 1..100 {
   .Int! # Here be thread dragons!
}

say [+] @a;

Perl 6 sports constructs that follow the reactive programming model. One can spin up many worker threads and use thread-safe Channels or Supplys to move values from one thread to another. A react-block can combine those streams of values, process them and react to conditions like cleaning up after a worker thread is done producing values, or dealing with errors. The latter is done by bottling up Exception-objects into Failure-objects that keep track of where the error first occurred and where it was used instead of a proper value.

my \pipe = Supplier::Preserving.new;

start {
    for $*HOME {
        pipe.emit: .IO if .f & .ends-with('.txt');

        say „Looking in ⟨{.Str}⟩ for files that end in ".txt"“ if .IO.d;
        .IO.dir()».&?BLOCK when .IO.d;

        CATCH {
            default {
                note .^name, ': ', .Str;
                pipe.emit: Failure.new(.item);
            }
        }
    }
    pipe.done;
}

react {
    whenever pipe.Supply {
        say „Checking ⟨{.Str}⟩ for "Rosetta".“;
        say „I found Rosetta in ⟨{.Str}⟩“ if try .open.slurp.contains('Rosetta');
        LAST {
            say ‚Done looking for files.‘;
            done;
        }
        CATCH {
            default {
                note .^name, ': ', .Str;
            }
        }
    }
    whenever Promise.in(60*10) {
        say „I gave up to find Rosetta after 10 minutes.“;
        pipe.done;
        done;
    }
}

Many built-in objects will return a Supply or a Promise. The latter will return a single value or just convey an event like a timeout. In the example above we used a Promise in that fashion. Below we shell out to find and process its output line by line. This could be used in a react block if there are many different types of events to process. Here we just tap into a stream of values and process them one by one. Since we don’t have a react block to provide a blocking event loop, we wait for find to finish with await and process its exitcode. Anything inside the block given to .tap will run in its own thread.

my $find = Proc::Async.new('find', $*HOME, '-iname', '*.txt');
$find.stdout.lines.tap: {
    say „Looking for "Rosetta" in ⟨$_⟩“;
    say „Found "Rosetta" in ⟨$_⟩“ if try .open.slurp.contains('Rosetta');
};

await $find.start.then: {
    say „find finished with exitcode: “, .result.exitcode;
};

Having operators process values in parallel via threads or vector units is yet to be done. Both hyper operators and Junctions are candidates for autothreading. If you use them today, please keep in mind that side effects may provide foot guns in the future.


Weekly changes in and around Perl 6: 2017.42 Taking Ticketing Seriously

Published by liztormato on 2017-10-16T22:02:12

Aleks-Daniel Jakimenko-Aleksejev has been ticketing away! The result can be seen in an overview of status changes in RT tickets of the past week or so. He explains it like this:

The high number of [REGRESSION] tickets is caused by the concentrated effort to detect any unintentional changes in Rakudo Perl 6. I extracted all messages to the evalbot (e.g. “m:” on the #perl6 channel) from the last few years, and passed these code snippets to different versions of Rakudo Perl 6. These snippets were filtered by comparing the output and exit code, which resulted in a list of thousands of samples that produce different results on different versions. Most of these differences are due to bug fixes or other harmless reasons (e.g. current time being shown in the output), but there were some notable regressions. All regressions noticed during the manual inspection are now filed, and some of them are already resolved (with tests, of course). This is the second time we see dedicated effort to find otherwise hard-to-notice regressions, first time being in December 2016.

So it should be easier to keep the sanity of Rakudo Perl 6 in check!

One more TPCiA Rakudo Perl 6 Video

Samantha McVey‘s High End Unicode in Rakudo Perl 6 has arrived! Of course, you can still keep checking for more uploads in the coming weeks.

Blog Posts

Meanwhile on Twitter

  • Please welcome, CPAN6 by builtinperl.
  • Rakudo Perl 6 BuildPack by brian d foy.
  • 20% shaved off by Zoffix Znet.
  • Debugging an Amazon issue by Moritz Lenz.
  • Whole day Rakudo Perl 6 Tutorial at TPCiG by Jeff Goff.
  • !perl6 at DuckDuckGo by Zoffix Znet.
  • int32.new shouldn’t give you an Int by Zoffix Znet.
    Meanwhile on StackOverflow

    Meanwhile on perl6-users

    Meanwhile on PerlMonks

    Lately, there have been some Rakudo Perl 6 related Meditations on PerlMonks. Yours truly feels they should be mentioned in the Perl 6 Weekly as well, although some troll hugging may be needed:

    Ecosystem Additions

    This week will probably be the last time you will see the ecosystem additions mentioned in the Perl 6 Weekly. Thanks to Zoffix Znet and Aleks-Daniel Jakimenko-Aleksejev we will soon have a page that shows the recent uploads to CPAN (the new preferred way of putting your module out into the wild). So here goes for the last time:

    Winding Down

    Being too tired to investigate the latest Core Developments of the past week, I will keep them from you until next week. Which will also coincide with the 2017.10 release of Rakudo Perl 6! So please check in again next week!


    gfldex: It’s Classes All The Way Down

    Published by gfldex on 2017-10-08T17:23:05

    While building a cache for a web API that spits out JSON, I found myself walking over the same data twice to fix a lack of proper typing. The JSON knows only about strings, even though most of the fields are integers and timestamps. I'm fixing the types after parsing the JSON with JSON::Fast by coercively .map-ing over the result.

    @stations.=hyper.map: { # Here be heisendragons!
        .<lastchangetime> = .<lastchangetime>
            ?? DateTime.new(.<lastchangetime>.subst(' ', 'T') ~ 'Z', :formatter(&ISO8601))
            !! DateTime;
        .<clickcount> = .<clickcount>.Int;
        .<lastcheckok> = .<lastcheckok>.Int.Bool;
    
        (note "$_/$stations-count processed" if $_ %% 1000) with $++;
    
        .Hash
    };
    

    The hyper helps a lot to speed things up but will put a lot of stress on the CPU cache. There must be a better way to do that.

    Then lizmat showed where Rakudo shows its guts.

    m: grammar A { token a { }; rule a { } }
    OUTPUT: «===SORRY!=== Error while compiling <tmp>␤Package 'A' already has a regex 'a'
    (did you mean to declare a multi-method?)␤»
    

    Tokens are regexes, or maybe methods. But if tokens are methods then grammars must be classes. And that allows us to subclass a grammar.

    grammar WWW::Radiobrowser::JSON is JSON {
        token TOP {\s* <top-array> \s* }
        rule top-array      { '[' ~ ']' <station-list> }
        rule station-list   { <station> * % ',' }
        rule station        { '{' ~ '}' <attribute-list> }
        rule attribute-list { <attribute> * % ',' }
    
        token year { \d+ } token month { \d ** 2 } token day { \d ** 2 } token hour { \d ** 2 } token minute { \d ** 2 } token second { \d ** 2}
        token date { <year> '-' <month> '-' <day> ' ' <hour> ':' <minute> ':' <second> }
    
        token bool { <value:true> || <value:false> }
    
        token empty-string { '""' }
    
        token number { <value:number> }
    
        proto token attribute { * }
        token attribute:sym<clickcount> { '"clickcount"' \s* ':' \s* '"' <number> '"' }
        token attribute:sym<lastchangetime> { '"lastchangetime"' \s* ':' \s* '"' <date> '"' }
        token attribute:sym<lastcheckok> { '"lastcheckok"' \s* ':' \s* '"' <bool> '"' }
    }
    

    Here we overload some tokens and forward calls to tokens that have a different name in the parent grammar. The action class follows suit.

    class WWW::Radiobrowser::JSON::Actions is JSON::Actions {
        method TOP($/) {
            make $<top-array>.made;
        }
        method top-array($/) {
            make $<station-list>.made.item;
        }
        method station-list($/) {
            make $<station>.hyper.map(*.made).flat; # Here be heisendragons!
        }
        method station($/) {
            make $<attribute-list>.made.hash.item;
        }
        method attribute-list($/) {
            make $<attribute>».made.flat;
        }
        method date($_) { .make: DateTime.new(.<year>.Int, .<month>.Int, .<day>.Int, .<hour>.Int, .<minute>.Int, .<second>.Num) }
        method bool($_) { .make: .<value>.made ?? Bool::True !! Bool::False }
        method empty-string($_) { .make: Str }
    
        method attribute:sym<clickcount>($/) { make 'clickcount' => $/<number>.Int; }
        method attribute:sym<lastchangetime>($/) { make 'lastchangetime' => $/<date>.made; }
        method attribute:sym<lastcheckok>($/) { make 'lastcheckok' => $/<bool>.made; }
    }
    

    In case you wonder how to call a method with such a funky name, use the quoting version of postfix:<.>.

    class C { method m:sym<s>{} }
    C.new.'m:sym<s>'()
    

    I truncated the examples above. The full source can be found here. The .hyper version is still quite a bit faster but also heisenbuggy. In fact .hyper may not work at all when executed too soon after a program starts or when used in a recursive Routine. This is mostly due to the grammar being one of the oldest parts of Rakudo with the least amount of work done to make it fast. That is a solvable problem. I'm looking forward to Grammar All The Things.

    If you got grammars please don’t hide them. Somebody might need them to be classy.

     


    Zoffix Znet: CPAN6 Is Here

    Published on 2017-10-06T00:00:00

    CPAN support for Rakudo Perl 6 dists

    Zoffix Znet: 6lang: The Naming Discussion Update

    Published on 2017-09-28T00:00:00

    A rose by any other name 2.0...

    samcv: Grant Final Report

    Published on 2017-09-25T07:00:00

    This contains the work since the last report as well as my final report.


    Work Since the Last Report

    Merged in Unicode Collation Algorithm

    I merged the Unicode Collation Algorithm branch into MoarVM. Now that the sort is stable, the coll, unicmp and .collate operators in Rakudo are no longer experimental, so use experimental :collation is no longer needed to use them.
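
    A minimal sketch of these now non-experimental features (assuming a Rakudo recent enough to include this merge):

    say 'a' unicmp 'B';           # Less - UCA order sorts a before B
    say 'a' cmp 'B';              # More - plain codepoint order puts 'B' (U+0042) first
    say <foo bar Baz>.collate;    # (bar Baz foo) - UCA-aware sorting of a list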

    The $*COLLATION dynamic variable is still hidden under experimental, since it is possible design changes could still be made to it.

    Prepend

    In some of my other posts I talked about the difficulties of getting Prepend codepoints working properly. To do this I had changed how we store synthetics so as not to assume that the first codepoint of a synthetic is always the base character. This month I merged in the change in synthetic representation and implemented the features which were now possible with the new representation.

    The feature was to detect which character is the base character and store its index in the synthetic. Most combiners, such as diacritics, come after the base character and are Extend codepoints: a + ◌́. Prepend has the reverse functionality and comes before: ؀◌ + 9 (Arabic number sign + number).

    This required many assumptions our code rightfully made before Unicode 9.0 to be abandoned. When a synthetic is created, we now check to see if the first codepoint is a Prepend codepoint. If so, we keep checking until we find a codepoint that is not a Prepend. In most cases, the base character is the codepoint following the last Prepend mark.

    In degenerate cases there is no base character, which could be a grapheme composed of all Prepend’s or only Prepend’s and Extend’s. In these degenerate cases we set the first codepoint as the base character.

    Once I had that done, I was able to fix some of our ops which did not work correctly if there were Prepend characters. This included fixing our case changing op so it would now work on graphemes with Prepend marks. Since the case change ops apply the case change to the base codepoint, it is necessary for us to have the correct base character. Similarly, ordbaseat which gets the base codepoint also needed to be fixed. This allowed ignoremark to now work for graphemes with Prepend marks.
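
    A small illustration of what this enables, assuming a Rakudo/MoarVM with the Unicode 9.0 segmentation described here (ARABIC NUMBER SIGN is a Prepend codepoint):

    my $g = "\c[ARABIC NUMBER SIGN]" ~ "9";
    say $g.chars;   # 1 - the Prepend codepoint and the digit form a single grapheme
    say $g.codes;   # 2 - yet it is made of two codepoints
    say $g.ords;    # (1536 57) - U+0600 comes first, but '9' is the base character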

    Documentation

    I wrote documentation on our Unicode Collation Algorithm implementation, which explains to the reader what problems the UCA solves, with examples of different single-to-multiple or multiple-to-single mappings of codepoints. It goes into a fair amount of detail on how it was implemented.

    UTF8-C8

    Bugs with Encoding into UTF8-C8

    Since MoarVM normalizes all input text, our way of dealing with text that must not be normalized is important to people who want their strings to remain unchanged. Previously there was a bug where if something was a valid UTF8-storable value, such as a Surrogate or a value higher than 0x10FFFF, it would create a Str type with that value, even though it was not valid Unicode. It would then throw when this value was attempted to be used (since the Str type shouldn't hold values higher than 0x10FFFF or Surrogates). As this is the only way we have of dealing with text unaltered, this seemed like a serious issue that would prevent UTF8-C8 from being usable in a production environment attempting to encode arbitrary data into UTF8-C8. [f112fbcf]

    Bugs While Working with UTF8-C8 Strings

    Another issue I fixed was that under concatenation, text replacement or renormalization, the UTF8-C8 codepoints would be "flattened". They would lose their special properties and instead start acting like any other set of Unicode codepoints (although unusual since it contains a private use character and a hex code of the stored value). I changed our codepoint iterator so that optionally you can choose to pass back UTF8-C8 Synthetics unchanged. We use Synthetics to store both UTF8-C8 values as well as storing graphemes which contain multiple codepoints. When iterating by codepoint on an already existing arbitrary string, we want to retain the UTF8-C8 codepoints and make sure they are not changed during the renormalization process. This has been fixed, and UTF8-C8 strings are now drastically more reliable, and hopefully, much more production ready. [2f71945d]
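
    A hedged sketch of the kind of round-trip these fixes are meant to make reliable:

    my $bytes = Buf.new(0x41, 0xFF, 0x42);    # 0xFF can never occur in valid UTF-8
    my $str   = $bytes.decode('utf8-c8');     # lossless decode via synthetic codepoints
    say $str.chars;                           # 3 - the bad byte becomes one synthetic grapheme
    say $str.encode('utf8-c8').list;          # (65 255 66) - the original bytes survive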

    Grapheme Caching and Move To

    The function which moves a grapheme iterator forward a specified number of graphemes now works even if we aren't starting from the very start of the string. In this function we have a first loop which locates the correct strand, and we had a second loop which would find the correct grapheme inside that strand. I refactored the grapheme-locating code and was able to remove that second loop.

    In the grapheme caching implementation we can save a lot of work by not creating a new iterator for every grapheme access. Not only that, I also sped up the move_to function by about 30%. While the cached iterator reduces calls to this function in the places I added it to, there are still many callers which may seek for each grapheme requested, so this speeds those up.

    Other MoarVM Changes

    I set up automated Appveyor builds for MoarVM so we get automated builds on Windows (Travis CI only covers MacOS and Linux builds).

    I fixed a segfault that occurred when compiling nqp on Alpine Linux which uses musl as its libc. I ended up reducing the depth of recursion in the optimize_bb() function when compiling nqp from 99 to 29 (3x reduction). In the Rakudo build, we have a 5x reduction in the depth of recursion.

    Final Report

    As I've already said many things in my previous grant reports (1, 2, 3, 4), I will reiterate some of the big things I did, though this is not an exhaustive list. For full details of all the changes please see my other grant reports, as well as the partial list of bonus changes I made in MoarVM during the grant at the bottom of the page.

    The only thing not completed was implementing a whole new UCD backend. While I planned on doing this, I ended up choosing not to do so. I realized that it would not have been the best use of my time on the grant, as there were many more user noticeable changes I could do. Despite this, I did achieve the goals that the rewrite was intended to solve; namely making property values distinct for each property and making the UCD database generation more reproducible. While it is not exactly the same on different runs, the only thing that changes is the internal property codes which does not affect anything adversely. It is fully functional every time instead of database regeneration breaking our tests most of the time. Once the database was stable, I was then able to update our database to Unicode 10. Without my improvements regarding reproducibility and property values becoming distinct for each property, updating to Unicode 10 would have not been possible. In addition all Hangul (Korean) characters now have names in the Unicode database.

    A big thing I wanted to implement was the Unicode Collation Algorithm, which ended up being a total success. I was able to still retain the ability to choose which collation levels the user wanted to sort with as well as reverse the sort order of individual collation levels.

    Yet I did not only implement one algorithm; I also implemented the Knuth-Morris-Pratt string search algorithm, which can take advantage of repeated characters in the needle (it can be multiple times faster if you have repeating sections). The Knuth-Morris-Pratt implementation was adjusted to use either the new cached grapheme iterator or the simple lookup, depending on whether the haystack was flat or a strand. Indexing a strand-based string with a one-grapheme-long needle was sped up by 9x by adding a special case for it.

    Practically all string ops were sped up, often by multiple times, due to getting MVM_string_get_grapheme_at_nocheck inlined. In addition to this, I changed the way we access strings in many of our most used string ops, intelligently using either grapheme iterators, cached grapheme iterators or direct access depending on circumstances. With MVM_string_get_grapheme_at_nocheck inlined, accessing graphemes through this function was sped up by between 1.5x for strands and up to 2x for flat strings. Ops we use a lot, like the op that backs eq and nqp::eqat, were given special casing for Strand ←→ Strand, Flat ←→ Flat and Strand ←→ Flat (which also covers Flat ←→ Strand). This special casing sped up eq by 1.7x when one string is a strand and one is flat, and by 2.5x when both strings are flat. Applying similar optimizations to index made for a 2x speedup when the haystack is flat and a 1.5x speedup when the haystack is a strand (on top of the previous improvements due to the Knuth-Morris-Pratt algorithm).

    I fixed a longstanding bug in NQP which caused 'ignoremark+ignorecase' operations to be totally broken. I fixed this by adding more MoarVM ops and refactoring the code to have many fewer branches. In MoarVM we now use a centralized function to do each variation of with/without ignorecase and ignoremark, which is also fully compatible with foldcase operations as well as ignoremark.

    Doing the ignoremark/ignorecase indexing work sped them up by multiple times, but then in addition to that, it became 2x faster when the haystack was made up of 10 strands by implementing a cached grapheme iterator.

    I implemented full Unicode 9.0 support not just in our grapheme segmentation, but also in our other ops, refactoring how we store synthetic codepoints to allow us to have the 'base' codepoint be a codepoint other than the 1st in the synthetic to support Prepend codepoints.

    Our concatenation was improved so as to make full renormalization of both input strings no longer needed in almost all cases. The x repeat operator was fixed so it always creates normalized strings. Previously it could create unnormalized strings instead, causing issues when it was used.

    I believe I have more than accomplished what I set out to do in this grant. I made tons of user facing changes: to speed, Unicode normalization support, full Unicode 9.0 support. I added awesome collation features and fixed all the major issues with decoding and working with UTF8-C8 representations. I have listed an incomplete list of bonus deliverables below the deliverables which were part of this project.

    Deliverables

    • I documented MoarVM’s string representation, with lots of good information for future developers as well as interested users.

    • Hangul syllables now have Unicode names in our database, with a test added in roast.

    • I implemented the Unicode Collation Algorithm [866623d9]

      • Tests have been added in roast for the UCA and the unicmp op

      • I wrote documentation on our Unicode Collation Algorithm implementation

      • Regarding language specific sort. This would involve us using data from the Unicode CLDR. Once we have this data available from MoarVM, it simply requires a conditional to override DUCET and check a different set of data before checking the DUCET data. This information is in our documentation for collation.

    • Text normalization

      • Speed of normalization was improved

      • Full Unicode 9.0 support for text segmentation and normalization was added

    • While I did not fully rewrite the database, I did solve the needed issues:

      • Property values are now unique for each property

      • Running the generation script creates a functional database every time it is run, rather than only some of the time.

      • I added Unicode Collation data to our database, generated from a Perl 6 script, which happened to be the only property required to complete my deliverables

    Bonus Deliverables

    Here is a somewhat complete list of bonus deliverables:

    • Updated our database to Unicode 10. This was only possible once I had fixed the problems with the database generation, and made property values unique.

    • Implemented Knuth-Morris-Pratt string search

    • Set up Appveyor builds. Appveyor builds and tests MoarVM on Windows, similar to Travis CI.

    • Fixed ignoremark+ignorecase regex when used together as well as huge speed increases.

    UTF8-C8/UTF-8

    • Fix UTF8-C8 encoding so it can encode values > 0x10FFFF as well as surrogates

    • Fix UTF8-C8 strings so they do not get corrupted and flattened when string operations are performed on them.

    • MVM_string_utf8_decodestream: free the buffer on malformed UTF-8 [a22f98db]

    String Ops

    • Have MVM_string_codes iterate the string with codepoint iterator [ed84a632]

    • Make eqat 1.7x-2.5x faster [3e823659]

    • Speed up index 9x when Haystack is strand, needle is 1 long [0b364cb8]

    • Implement the Knuth-Morris-Pratt string search algorithm [6915d80e]

    • Add indexim_s op and improve/fix bugs in eqatim_s [127fa2ce]

    • Fix a bug in index/eqat(im) and in ord_getbasechar causing us to not decompose the base character when the grapheme was a synthetic [712cff33]

    • Fix MVM_string_compare to support deterministic comparing of synthetics [abc38137]

    • Added a codes op which gets the number of codepoints in a string rather than the number of graphemes. Rakudo is now multiple times faster doing the .codes op. Before, it would request an array of all the codepoints and then get the number of elements, which was much slower.

    Fix string ops with Prepend characters

    • Rework MVMNFGSynthetic to not store base separately [3bd371f1]

    • Fix case change when base cp isn’t the first cp in synthetic [49b90b99]

    • For degenerate Synth’s with Prepend and Extend set base cp to 1st cp [db3102c4]

    • Fix ignoremark with Prepend characters and ordbaseat op [f8a639e2]

    Memory/GC/Build Fixes

    • Fix segfault when compiling nqp with musl as libc [5528083d]

    • Avoid recursion in optimize_bb() when only 1 child node [6d844e00]

    • Fix various memory access/garbage collection issues in some string ops that were showing up when running in Valgrind or using Address Sanitizer

    Grapheme Iteration

    • Ensure we can move forward in a grapheme iterator even if we aren’t starting at the very beginning of a string.

    • Use grapheme iterator cached for ignorecase/ignoremark index ops [b2770e27]

    • Optimize MVM_string_gi_move_to. Optimize the loop which finds the correct location within a strand so that it isn't a loop and is just conditionals. [c2fc14dd]

    • Use MVMGraphemeIter_cached for strands in KMP index [ce76c994]

    • Allow MVM_string_get_grapheme_at_nocheck to be inlined

    • Refactor code into iterate_gi_into_string() to reduce code duplication [1e92fc96]

    Tests Added to Roast

    • Add tests for testing collation. Tests for the unicmp operator [5568a0d63]

    • Test that Hangul syllables return the correct Unicode name [6a4bc5450]

    • Add tests for case changes when we have Prepend codepoints [00554ccbd]

    • Add tests for x operator to ensure normalization retained [1e4fd2148] [59909ca9a]

    • Add a large number of string comparison tests [51c4ac08b]

      • Add tests to make sure synthetics compare properly with cmp [649d9dc50]

    • Improve ignoremark tests to cover many different cases [810e218c8]

      • Add ignoremark tests to cover JSON::Tiny regression + other issue [c185acc57]

    • Add generated tests (from UCD data) and manually created ones to ensure strings concatenation is stable, when the concatenated string would change the normalization. [2e1f7d92a][9f304b070] [64e385faf][0976d07b9][59909ca9a][2e1f7d92a] [88635564e] [a6bbc73cf] [a363e3ff1]

    • Add test for MoarVM Issue #566 .uniprop overflow [9227bc3d8]

    • Add tests to cover RT #128875 [0a80d0d2e]

    • Make new version of GraphemeBreakTest.t to better test grapheme segmentation [54c96043c]

    NQP Work

    Below is a listing of some of the commits I made to NQP. This included adding the ops I created over the course of the grant: eqatim, indexim, indexicim, eqaticim, and codes (gets the number of codepoints in a string rather than graphemes).

    The testing in NQP was inadequate for our string ops, so I added hundreds of tests for practically all of the string ops, so we could properly test the different variations of index* and eqat*.

    NQP Documentation

    • Add docs for a few variations of index/eqat [589a3dd5c]

    • Bring the unicmp_s op docs up to date [091625799]

    • Document hasuniprop moar op [650840d74]

    NQP Tests

    • Add more index* tests to test empty string paths [8742805cb]

    • run indexicim through all of the indexic tests [26adef6be]

    • Add tests for RT #128875, ignorecase+ignoremark false positives [b96a0afe7]

    • Add tests for nqp::eqatim op on MoarVM [6fac0036e]

    Other Work

    • Added script to find undocumented NQP ops [0ead16794]

    • Added nqp::codes to QASTOperationsMAST.nqp [59421ffe1]

    • Update QASTRegexCompilerMAST to use new indexicim and eqaticim ops [18e40936a]

    • Added eqatim and indexim ops. Fix a bug when using ignoremark [9b94aae27]

    • Added nqp::hasuniprop op to QASTOperationsMAST.nqp [d03c4023a]

    6guts: Rakudo gets a new thread pool

    Published by jnthnwrthngtn on 2017-09-23T14:00:12

    Vienna.pm have funded me to work 50 hours on Perl 6. After some discussion, we decided I would first work on improving the thread pool scheduler, and then move on to continuing the work around non-blocking await, a feature of the upcoming Perl 6.d. In this post I’ll discuss the work on the thread pool scheduler, which was merged shortly after the latest Rakudo release to maximize testing time, and thus will appear in the next release (2017.10).

    What was wrong with the ThreadPoolScheduler before?

    My (by now slightly hazy) memory is that I wrote the initial Perl 6 thread pool implementation in the space of an hour or two on a train, probably heading to some Perl event or other. It was one of those “simplest thing that could possibly work” implementations that turned out to work well enough that it survived with only isolated tweaks and fixes until January this year. At that point, I added the initial bits of support for non-blocking await. Even that change was entirely additive, leaving the existing scheduling mechanism completely intact.

    When I first implemented the thread pool, there were approximately no people writing concurrent Perl 6 programs. Happily, that’s changed, but with it came a few bug reports that could be traced back to the thread pool. Along with those, there were things that, while not being outright bugs, were far from ideal.

    Before I dug into a re-design, I first summarized all of the problems I was aware of with the thread pool, so as I considered new designs I could “crash test” them against the problems that afflicted the previous design. Here’s my list of problems with the previous design.

    1. It was wasteful, spawning far too many threads in quite a lot of cases. A single use of Proc::Async would cause some threads to be created. A second use after that would add some more. A third use would add more. Even if these uses were not concurrent, threads would still be added, up until the maximum pool size. It was a very conservative way to make sure a thread would be available for each piece of work, provided the pool didn’t reach exhaustion point. But really, the pool doesn’t need more than a thread or two for many programs. RT #131915 was a ticket resulting from this behavior: each thread added to the memory consumption of the program, making memory use a good bit higher than it should have been for programs that really didn’t need more than a thread or two.
    2. Very active async I/O could swamp the thread pool with events. Since there was a single queue, then timer events – which might be being used to kill off a process after a timeout – may not have fired in a very timely manner at all, since the timer event would be stuck behind all of the I/O events. RT #130370
    3. Sometimes, despite the conservative mechanism described in #1, not enough threads got started and this led to occasional deadlocks. RT #122709
    4. It didn’t try to scale the number of threads to match the available CPU cores in any way. Just because there is a lot of work in the queue does not mean that we need more threads; if we are making progress and the CPU load is pretty high then adding more threads is unlikely to lead to more progress.
    5. In high-core-count systems, the default limit of 16 was too low. 32-core CPUs at tolerable prices very much exist by now!
    6. For programs that manage to really block a lot of threads with non-CPU bound work, we could deadlock. Most of that will go away with non-blocking await, but there will still be ways to write code that really blocks a bunch of OS threads and can’t make progress without yet more threads being made available to run something. At some point, people just have to write their programs differently, but the default limit of 16 threads was not very generous.
    7. Despite wishing to raise the maximum default pool size a good bit, we couldn’t do it because issues #1 and #4 meant we’d likely end up hitting the maximum in most programs, and memory use would become totally unreasonable.
    8. We suffered from poor thread affinity for events that would inevitably queue up due to serial supplies. For example, if two packets arrived from a socket then they might be picked up by different threads in the pool. If the first packet was still being processed, then the second would contend for the lock used to enforce serial processing of messages by supplies, and block until it was available.

    3 kinds of queue

    Problems 2 (active I/O swamping the queue and delaying timers) and 8 (poor thread affinity) are resolved in the new thread pool by recognizing that not all work given to the pool is equal. Timers really want to be dealt with in a timely manner, so in programs that have timers it makes sense to have a queue, and one or more worker threads, exclusively for time-based events. Work items that will need processing sequentially anyway should be assigned to a single thread, implying that each such “affinity worker” gets a queue of its own. Finally, there is a general queue for everything else, and one or more threads eat from it.
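
    Illustrative only: a couple of user-level constructs whose events end up on different queues (the assignment happens inside the scheduler, and the affinity queues that serve things like serial supplies are not shown):

    my $work = start { (1..1_000_000).sum };   # ordinary pool work lands on the general queue
    react {
        # time-based events are delivered via the dedicated timer queue and worker(s)
        whenever Supply.interval(0.5) {
            say "tick $_";
            done if $_ >= 3;
        }
        whenever $work {
            say "sum computed on a pool thread: $_";
        }
    }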

    Adding a supervisor

    The other problems were addressed by the addition of a Sufficiently Smart Supervisor. The supervisor is a thread created by the thread pool upon its first use, and living from then until the end of the program. It spends most of its time sleeping, waking up around 100 times a second to check how things are going. It is the supervisor that makes most of the decisions about how many threads to add to the pool, factoring in, among other things, whether the queued work is actually making progress and how heavily the CPU cores are already being used.

    Having the pool soft-limited to the number of CPU cores, and reluctantly able to go beyond that, means that we can raise the maximum default pool size quite a lot; it now stands at 64 threads. In reality, many programs don't even reach the CPU core count, as there's never enough work to trigger the addition of more threads.

    Affinity workers

    Affinity worker threads aren’t added by the supervisor. Instead, when an affinity queue is requested, any existing affinity worker threads will have their queue lengths inspected. The one with the shortest queue will be picked, provided that the queue length is below a threshold. If it’s over the threshold (which increases per affinity worker thread added), then another affinity worker thread will be spawned and the work allocated to it.

    Outcomes

    For all the cases I’ve tried so far, the new scheduler seems to do either at least as well or better than the one it replaced – and sometimes much better. The deadlock due to insufficient threads bug is gone, thanks to the supervisor. Programs that do the occasional bit of work needing pool threads end up with just a thread or two, greatly reducing their memory consumption. A Cro web application that is processing just a handful of requests a second will now spawn just a few threads (and so use much less memory and get better locality), while one faced with some hundreds of requests per second will spawn more. And a program doing a lot of CPU-bound work now spawns as many threads as there are cores, which gives a small speedup compared to oversubscribing the CPU under the previous scheduler. Finally, timer events are delivered and handled in a timely way, even when there is a lot of output from a process.

    And next?

    Well, most immediately I should write some extra regression tests, so I can get the RT tickets mentioned earlier in the article closed up. Feel free to beat me to doing that, though! :-)

    That aside, I’d like to mention that while the new scheduler is a decided improvement, it’s also improvable. The heuristics it uses to decide when to add further threads can surely be tuned some more. The code doing that decision making is written in Perl 6, which I hope makes it accessible to those who would like to experiment and tweak.

    Once again, thanks to Vienna.pm for making this work possible. And, finally, a mention that I’m still looking for funding to help me keep on doing Perl 6 things for a sizable chunk of my work time; a handful of companies each sponsoring 10 hours a month would soon add up!


    p6steve: Clone Wars

    Published by p6steve on 2017-09-20T18:52:04

    Apologies to those who have OO steeped in their blood. I am a wary traveller in OO space; maybe I am a technician, not an architect, at heart. So for me, no sweeping frameworks unless and until they are needed. And, frankly, one can go a long way on procedural code with subroutines to gather repetitive code sequences.

    (And don’t get me started on functional programming…)

    Some time ago, I was tasked to write a LIMS in old perl. Physical ‘aliquots’ with injections of various liquids would be combined and recombined by bio-machines. This led to a dawning realization that these aliquot objects could be modelled in an object style with parent / child relationships. After a few weeks, I proudly delivered my lowly attempt at ‘Mu’ for this (and only this) problem. Kudos to the P6 team – after a couple of weeks in here, it’s just sensational the level of OO power that the real Mu delivers:

    [screenshot omitted]

    Now, hard at work on the perl6 version of Physics::Unit, I am wondering how to put the OO theory into productive practice. One of my aims was to find a medium-sized (not core tools) problem that (i) I know something about and (ii) would be a good-sized problem to wrangle.

    So I am enjoying the chance to design some classes and some interfaces that will make this all hang together. But – as an explorer, it has become clear that I only have three options. The problem runs like this:

    Initially I had some success with object types ::T – but these only let you read the type and  duplicate if needed for a new left hand side container. Then I tried the built in (shallow) clone method. But…

    Ultimately, thanks to rosettacode.org, I worked out that $x.perl.EVAL with some ~~ s/// substitutions on the way would do the trick!
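
    A minimal sketch of that trick, using a hypothetical stand-in class rather than the real Physics::Unit code:

    use MONKEY-SEE-NO-EVAL;                   # string EVAL must be enabled explicitly

    class Unit { has $.name; has %.dims }     # hypothetical stand-in class

    my $original = Unit.new( name => 'metre', dims => %( length => 1 ) );
    my $copy     = $original.perl.EVAL;       # serialise to Perl 6 source, then re-evaluate

    $copy.dims<length> = 2;                   # mutating the copy...
    say $original.dims<length>;               # ...leaves the original untouched: 1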

    Phew. Please comment below if you have a better way to share – or would like to point out the risks of this technique…


    6guts: MoarVM Specializer Improvements Part 2: Planning

    Published by jnthnwrthngtn on 2017-09-17T22:39:51

    It’s been a good while since I posted part 1 of this little series. In the meantime, I paid a visit to the Swiss Perl Workshop, situated in a lovely hotel in the beautiful village of Villars. Of course, I was working on my slides until the last minute – but at least I could do it with this great view!

    DSC08286-small

    I gave two talks at the workshop, one of them about the MoarVM specializer (slides, video). Spoiler alert: the talk covers much of what I will write about in this blog series. :-) Now that I've recovered from the talk writing and the travel, it's time to get back to writing here. Before I dig in to it, I'd like to say thanks to The Perl Foundation and their sponsors, who have made this work possible.

    Where we left off

    In the last part I discussed how we collected statistics about running code, in order to understand what to optimize and how to optimize it. The running example, which I’ll continue with, was this

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    

    And an example of the kind of information collected is:

    Latest statistics for 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)
    
    Total hits: 367
    
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
        Callsite hits: 367
    
        Maximum stack depth: 32
    
        Type tuple 0
            Type 0: Str (Conc)
            Type 1: Str (Conc)
            Hits: 367
            Maximum stack depth: 32
    

    Which, to recap, tells us that the ~ infix operator was called 367 times so far, and was always passed two Str instances, neither of which was held in a Scalar container.

    The job of the planner

    The planner is the component that takes this statistical data and decides what code to optimize and what cases to optimize it for. It turns out to be, at least at the time of writing, one of the smaller parts of the whole specialization subsystem; subtract memory management code and debug bits and it’s not much over 100 lines.

    Its inputs are a list of static frames whose statistics were updated by the latest data. There's no point considering anything whose statistics did not change, since there will be nothing new since the last time the planner saw them and so the conclusion would be the same. In Perl 6 terms, a static frame may represent a routine (Sub or Method), Block, or Code thunk.

    Initial filtering

    The first thing the planner looks at is whether the static frame got enough hits to be worth even considering investing time on. So if it sees something like this:

    Latest statistics for 'command_eval' (cuid: 8, file: src/Perl6/Compiler.nqp:31)
    
    Total hits: 1
    
    No interned callsite
        Callsite hits: 1
    
        Maximum stack depth: 5
    

    It can simply ignore it based on the total hits alone. Something that gets a single hit isn’t worth spending optimization effort on.

    There are cases where the total hits are very low – perhaps just a single hit – but optimization is still interesting, however. Here is the loop that I used to make shorten hot so we could look at its statistics and plan:

    for 'x' x 50 xx 100000 {
        shorten($_, 42)
    }
    

    The statistics for the mainline of the program, containing the loop, start out like this:

    Latest statistics for '<unit>' (cuid: 4, file: xxx.p6:1)
    
        Total hits: 1
        OSR hits: 273
    
        Callsite 0x7f1cd6241020 (0 args, 0 pos)
    
            Callsite hits: 1
    
            OSR hits: 273
    
            Maximum stack depth: 10
    

    The total hits may well be 1, but obviously the loop inside of this code is hot and would be worth optimizing. MoarVM knows how to perform On Stack Replacement, where code running in a frame already on the callstack can be replaced with an optimized version. To figure out whether this is worth doing, it records the hits at “OSR points” – that is, each time we cross the location in the loop where we might replace the code with an optimized version. Here, the loop has racked up 273 OSR hits, which – despite only one hit in terms of entering the code at the top – makes it an interesting candidate for optimization.

    Hot callsites

    In MoarVM, a callsite (of type MVMCallsite) represents the "shape" of a set of arguments being passed. This includes how many arguments there are, how many of them are positional, the names of any named arguments, and flags describing each positional argument (for example, whether it is an object or a native value).

    When bytecode is produced, callsites are interned – that is to say, if we have two calls that are both passing two positional object arguments, then they will share the callsite data. This makes them unique within a compilation unit. When bytecode is loaded, then any callsites without flattening are further interned by the VM, meaning they are globally shared across all VM instances. This means that they can then be used as a kind of “key” for things like multi-dispatch caching. The specializer’s statistics are also aggregated by callsite.

    The planner, therefore, takes a look at what callsites showed up. Often, code will be called with a consistent callsite, but things with optional parameters may not be. For example, if we were to run:

    sub foo(:$bar) { }
    for ^100000 { rand < 0.9 ?? foo(bar => 42) !! foo() }
    

    Then the statistics of foo might end up looking like this:

    Latest statistics for 'foo' (cuid: 1, file: -e:1)
    
    Total hits: 367
    
    Callsite 0x506d920 (2 args, 0 pos)
      - bar
    
        Callsite hits: 331
    
        Maximum stack depth: 13
    
        Type tuple 0
            Type 0: Int (Conc)
            Hits: 331
            Maximum stack depth: 13
    
    Callsite 0x7f3fd7461620 (0 args, 0 pos)
    
        Callsite hits: 36
    
        Maximum stack depth: 13
    
        Type tuple 0
            Hits: 36
            Maximum stack depth: 13
    

    Unsurprisingly, the case where we pass bar is significantly more common than the case where we don’t, so the planner would in this case favor spending time making a specialization for the case where bar was passed.

    As an aside, specialization by optional argument tends to be a good investment. A common pattern is:

    sub foo($bar, Bool :$some-option) {
        if $some-option {
            extra-stuff();
        }
    }
    

    In the case that $some-option is not passed, the specializer knows the parameter will always hold its default value, can resolve the if condition at specialization time, and can strip out the conditional code, including the call to extra-stuff().

    That stripping out of code might in turn bring the optimized version below the inlining limit, whereas before it might have been too big, which could bring about an additional win.

    Hot type tuples

    Once one or more hot callsites have been identified, the types that were passed will be considered. In the best case, that turns out to be very boring. For example, here:

    Latest statistics for 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)
    
    Total hits: 273
    
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 273
    
        Maximum stack depth: 14
    
        Type tuple 0
            Type 0: Scalar (Conc) of Str (Conc)
            Hits: 273
            Maximum stack depth: 14
    

    The code is consistently called with a Scalar container holding a Str instance. Therefore, we can see this code is monomorphic in usage. It is quite clearly going to be worth spending time producing a specialization for this case.

    In other cases, the code might be somewhat polymorphic:

    Latest statistics for 'sink' (cuid: 4795, file: SETTING::src/core/List.pm:694)
    
    Total hits: 856
    
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 856
    
        Maximum stack depth: 31
    
        Type tuple 0
            Type 0: Slip (Conc)
            Hits: 848
            Maximum stack depth: 31
    
        Type tuple 1
            Type 0: List (Conc)
            Hits: 7
            Maximum stack depth: 26
    

    Here we can see the sink method is being called on both Slip and List. But it turns out that the Slip case is far more common. Thus, the planner would conclude that the Slip case is worth a specialization, and the List case is – at least for now – not worth the bother.

    Sometimes, a situation like this comes up where the type distribution is more balanced. Any situation where a given type tuple accounts for 25% or more of the hits is considered to be polymorphic. In this case, we produce the various specializations that are needed.

    Yet another situation is where there are loads of different type tuples, and none of them are particularly common. One place this shows up is in multiple dispatch candidate sorting, which is seeing all kinds of types. This case is megamorphic. It’s not worth investing the time on specializing the code for any particular type, because the types are all over the place. It is still worth specialization by callsite shape, however, and there may be other useful optimizations that can be performed. It can also be compiled into machine code to get rid of the interpreter overhead.

    Sorting

    By this point, the planner will have got a list of specializations to produce. Specializations come in two forms: observed type specializations, which are based on matching a type tuple, and certain specializations, which are keyed only on an interned callsite (or the absence of one). Certain specializations are produced for megamorphic code. The list of specializations are then sorted. But why?

    One of the most valuable optimizations the specializer performs is inlining. When one specialized static frame calls another, and the callee is small, then it can often be incorporated into the caller directly. This avoids the need to create and tear down an invocation record. Being able to do this relies on a matching specialization being available – that is, given a set of types that will be used in the call, there should be a specialization for those types.

    The goal of the sorting is to try and ensure we produce specializations of callees ahead of specializations of their callers. You might have wondered why every type tuple and callsite was marked up with the maximum call depth. Its role is for use in sorting: specializing things with the deepest maximum call stack depth first means we will typically end up producing specialized code for potential inlinees ahead of the potential inliners. (It’s not perfect, but it’s cheap to compute and works out pretty well. Perhaps more precise would be to form a graph and do a topological sort.)

    Dumping

    The plan is then dumped into the specialization log, if that is enabled. Here are some examples from our shorten program:

    ==========
    
    Observed type specialization of 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Str (Conc)
    Which received 367 hits (100% of the 367 callsite hits).
    
    The maximum stack depth is 32.
    
    ==========
    
    Observed type specialization of 'substr' (cuid: 4316, file: SETTING::src/core/Str.pm:3061)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231a60 (3 args, 3 pos)
    Positional flags: obj, obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Scalar (Conc) of Int (Conc)
        Type 2: Scalar (Conc) of Int (Conc)
    Which received 274 hits (100% of the 274 callsite hits).
    
    The maximum stack depth is 15.
    
    ==========
    
    Observed type specialization of 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
    It was planned for the type tuple:
        Type 0: Scalar (Conc) of Str (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'substr' (cuid: 2562, file: SETTING::src/core/Cool.pm:74)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231a60 (3 args, 3 pos)
    Positional flags: obj, obj, obj
    
    It was planned for the type tuple:
        Type 0: Scalar (Conc) of Str (Conc)
        Type 1: Int (Conc)
        Type 2: Scalar (Conc) of Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'infix:«>»' (cuid: 3165, file: SETTING::src/core/Int.pm:365)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Int (Conc)
        Type 1: Scalar (Conc) of Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'shorten' (cuid: 1, file: xxx.p6:1)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 13.
    
    ==========
    

    Compared to before

    So how does this compare to the situation before my recent work in improving specialization? Previously, there was no planner at all, nor the concept of certain specializations. Once a particular static frame hit a (bytecode size based) threshold, the type tuple of the current call was taken and a specialization produced for it. There was no attempt to discern monomorphic, polymorphic, and megamorphic code. The first four distinct type tuples that were seen got specializations. The rest got nothing: no optimization, and no compilation into machine code.

    The sorting issue didn’t come up, however, because specializations were always produced at the point of a return from a frame. This meant that we naturally would produce them in deepest-first order. Or at least, that’s the theory. In practice, if we have a callee in a conditional that’s true 90% of the time, before there was a 10% chance we’d miss the opportunity. That’s a lot less likely to happen with the new design.

    Another problem the central planning avoids is a bunch of threads set off executing the same code all trying to produce and install specializations. This was handled by letting threads race to install a specialization. However, it could lead to a lot of throwaway work. Now the planner can produce them once; when it analyzes the logs of other threads, it can see that the specialization was already produced, and not plan anything at all.

    Future plans

    One new opportunity that I would like to exploit in the future is for derived type specializations. Consider assigning into an array or hash. Of course, the types of the assigned values will, in a large application, be hugely variable. Today we'd consider ASSIGN-POS megamorphic and so not try and do any of the type-based optimizations. But if we looked more closely, we'd likely see the type tuples are things like (Array, Int, Str), (Array, Int, Piranha), (Array, Int, Catfish), and so forth. Effectively, it's monomorphic in its first two arguments, and megamorphic in its third argument. Therefore, a specialization assuming the first two arguments are of type Array and Int, but without specialization of the third argument, would be a win.

    Next time…

    We now have a specialization plan. Let’s go forth and optimize stuff!


    Zoffix Znet: The Rakudo Book Project

    Published on 2017-09-17T00:00:00

    A plan to write some Rakudo books

    gfldex: The Siege Can Continue

    Published by gfldex on 2017-09-16T20:14:31

    A wise internet spaceship pirate once wrote: “Whining gets you stuff. That’s why humans are at the top of the food chain.”. My Whining got me a fix that put an end to segfaults on long running scripts that react to http requests.

    So I continued to add missing bits to my golfed httpd and found another good use for term:<>.

    constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";

    Without term:<> the compiler would think I wanted to subtract 404 from an HTTP header.

    If you got some time to spare please write a whiny blog post about a crash that is bugging you – works like a charm.


    gfldex: Goto the Last Fifo

    Published by gfldex on 2017-09-13T08:45:21

    I always admired the elegance of the sysfs. You take a text and write it to a file to talk to a function running in kernel space. As soon as you know how to work with files, you can change the systems behaviour and reuse access control mechanism without any special tools. It’s very easy to script and dump (parts) of the systems state.

    Linux and friends come with fifos that serve the same purpose. Create a fifo, set access rights and start reading from that pseudo-file. Very easy to do in Perl 5.

    # Perl 5 - the fifo name here is assumed, borrowed from the Perl 6 example below
    my $fifo;
    open($fifo, "+<", "radio-fifo-in") or die $!;
    while (<$fifo>) {
        do_things_with($_);    # placeholder for the actual processing
    }

    Rakudo doesn't really know about fifos yet; as a result it doesn't block on a read of a fifo that has no data left. After a bit of fiddling I found a way around that problem.

    use v6.c;

    # `mkfifo radio-fifo-in`
    # `echo "foo" > radio-fifo-in`
    # `echo "foo^D" > radio-fifo-in`

    my $fifo-in = open(„radio-fifo-in“, :r);

    LABEL: loop {
        react {
            whenever supply { .emit for $fifo-in.lines } {
                say .Str;
                last LABEL if /‚^D‘/;
            }
        }
    }
    

    I learned that whenever reacts to last and will teach the docs about it later today. Luckily Perl 6 got labels so we can tell last where to goto.

    UPDATE: scovit found a short expression that gets very close to the behaviour of Perl 5.

    my $fifo = open("radio-fifo-in", :r);
    while defined $_ = $fifo.get { .say }

    p6steve: perl6 Atomic Fission

    Published by p6steve on 2017-09-03T18:46:48

    I have been listening to the reaction on the web to the incorporation of an emoji as a unicode symbol in perl6 rakudo. Here’s a flavour…

    (https://p6weekly.wordpress.com/2017/08/21/2017-34-going-atomic/ )

    The rationale for the use of unicode symbols is as follows:

    BTW- ASCII versions are known as Texas versions since they are always bigger

    Certainly this has caused some consternation, ranging from "how can I type ⚛️ on my keyboard?" (hit CTRL-CMD-SPACE if you are on macOS) to "this will never be accepted for the coding standards of my company".
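
    For context, a hedged sketch of the operator family in question (this assumes a Rakudo new enough to ship the atomic ops; the plain U+269B ATOM SYMBOL is what goes into the code, and every atomic op also has an ASCII "Texas" spelling):

    my atomicint $hits = 0;
    await (^10).map: { start { $hits⚛++ for ^1_000 } };   # ten workers, atomic increments
    say ⚛$hits;                 # 10000 - no lost updates
    say atomic-fetch($hits);    # the same read, spelled in ASCII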

    On reflection, while it is understandable that programmers have a well established comfort zone of ASCII text and using English for keywords, I think that perl6 is leading the way on an irresistible path. Of the 6.5bn people on the planet, only a small fraction prefer to work in English – or even in Latin alphabets. Now, the pioneering work to embed unicode in a programming language will open the doors to all kinds of invention. What about:

    And this, in combination with perl6 Grammars, opens some interesting conceptual doors.

    ~p6steve


    samcv: Grant Status Update 4

    Published by Samantha McVey on 2017-08-31T07:00:00

    This is my fourth grant update. A productive and noteworthy month. I gave two presentations at YAPC-EU in the Netherlands earlier this month, on High End Unicode and MoarVM Internals (links to my slides). It was my first Perl conference and I had a great time meeting people in the Perl 6 and Perl community!

    Despite the conference, I made some big changes this month: big speedups for indexing operations, as well as an update to the Unicode 10 database. Before I could regenerate the database I had to fix ucd2c.pl, the script that generates the database, to be more deterministic and have unique property values per property (without this regenerating it with the Unicode 10 data would result in partially broken Unicode properties). I also implemented the Knuth-Morris-Pratt string search algorithm for our index function.

    I have added documentation to MoarVM which is an overview of how our strings are represented as well as details about normalization, synthetics and other topics. On Hacker News someone noted that this was not documented anywhere, so I made sure to add documentation for this. If you are interested in some of the internals of MoarVM, I’m hoping the document should be pretty accessible even if you are not a MoarVM developer.


    Index

    Knuth-Morris-Pratt String Search Algorithm

    Previously we did not have any natively implemented efficient string search algorithms; we only had memmem, which was optimized but would only work if both strings were flat and had the same bitwidth per grapheme.

    Now all index operations with a 4096 or under length needle are optimized and no longer use brute force methods for searching (this does not include case insensitive or ignoremark indexing operations).

    When my KMP implementation in MVM_string_index is used:
    • Strings with non-matching types

      • That have more than one codepoint in the needle

      • That don’t have a needle more than 4096 graphemes long

    • Speedup can be small, or large depending on the pattern of the needle

      • Repeating letters can cause multiple times speed increase depending on the haystack

    We still use memmem when both strings are flat and of the same data type (that path uses Knuth-Morris-Pratt or Boyer-Moore depending on the platform). Most of the strings we will work with — especially once the strings have been modified — are strands.
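
    For readers curious about the shape of the algorithm, here is a minimal, illustrative Knuth-Morris-Pratt sketch in Perl 6; the real implementation lives in MoarVM's C code and works on graphemes rather than on .comb lists:

    sub kmp-index(Str $haystack, Str $needle --> Int) {
        my @n = $needle.comb;
        my @h = $haystack.comb;
        # failure table: length of the longest proper prefix that is also a suffix
        my @fail = 0 xx @n.elems;
        my $k = 0;
        for 1 ..^ @n.elems -> $i {
            $k = @fail[$k - 1] while $k > 0 && @n[$i] ne @n[$k];
            $k++ if @n[$i] eq @n[$k];
            @fail[$i] = $k;
        }
        # scan the haystack without ever re-examining matched characters
        my $m = 0;
        for ^@h.elems -> $i {
            $m = @fail[$m - 1] while $m > 0 && @h[$i] ne @n[$m];
            $m++ if @h[$i] eq @n[$m];
            return $i - $m + 1 if $m == @n.elems;
        }
        -1
    }

    say kmp-index('ababcabcab', 'abcab');   # 2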

    Grapheme Caching

    Since the Knuth-Morris-Pratt algorithm will often request the same grapheme again, but will never request an earlier point in the haystack, I was able to optimize the KMP string search function I added so that it caches graphemes, letting us use a grapheme iterator instead of MVM_string_get_grapheme_at_nocheck, which for strands has to find its place in the haystack from the beginning on each call. What this grapheme caching does is cache the last returned grapheme: if the same grapheme is requested again, it is returned without requesting a new grapheme from the grapheme iterator; if the next grapheme, or one some number of positions further on, is requested, it either grabs the next grapheme or moves the grapheme iterator forward (skipping a certain number of graphemes) and then gets a grapheme from it.

    Here are some timings I got before and after the grapheme iterator for an English language book text file, searching for English words misspelled by one letter (so it would search the entire file but be representative of searching for actual words).

    Description                               caching      master
    index with needle (strand haystack)       0.832730     2.67385181
    index with word needle (flat haystack)    0.8357464    1.01405131

    As you can see from the table, we actually even got savings when the haystack was flat. This surprised me, since getting a grapheme with a flat haystack points to a function with one switch and then returning the integer of the blob array at the specified position. I am guessing this is likely caused by the caching function generating more efficient machine code, since the savings can’t be only explained by the savings from caching the grapheme — the speed up was seen even when I manipulated the needle so that there were no cache “hits” and it always requested different graphemes.

    ℹ️
    The grapheme caching has not yet been merged, but is ready to go after fixing some merge conflicts

    Inlining MVM_string_get_grapheme_at_nocheck

    After I finished writing the previous section, I discovered the reason for the speed difference with flat haystacks. By getting MVM_string_get_grapheme_at_nocheck to inline, I was able to speed up index operations on a flat haystack by 2x, on top of the speedups we got from the KMP algorithm! This should affect any code which uses this function, making it 2x as fast for flat strings and likely giving a slight speedup for strands as well. That has big implications, as this function is used extensively throughout our string ops. It may also change what I do with the grapheme caching code: I will likely change it to use grapheme caching for strand haystacks and the now-inlined MVM_string_get_grapheme_at_nocheck for flat haystacks.

    Single Grapheme Needle Index

    I sped up string index by 9x when the haystack is a strand and the needle is a single grapheme. In that case we walk the strand haystack with a grapheme iterator and use a simpler, faster loop, since we are only looking for one grapheme.

    Unicode

    Property Values

    Property values are now stored uniquely for each property in the Unicode database. Property value names are not unique across properties, so they must be stored per property; previously, clashing names meant that only the property whose value came last in the C data array would work. Now that property values are unique for each property code, that breakage should no longer occur.

    Unicode Database Updated to Unicode 10

    The Unicode database has been updated for Unicode 10. Now that the previous breaking point — the database not always generating properly each time — was solved, I was able to make changes to the Unicode database, including updating it to version 10. The main significant changes in this release were the addition of new scripts, symbols and Emoji. You can see the full list of changes here.

    Unicode Names

    Hangul (Korean) characters now have names in our name database. These are generated algorithmically at database creation time by decomposing each character and concatenating the Jamo names of the 2 or 3 resulting codepoints. This is needed because the Unicode data file does not include names for these codepoints and leaves it to the implementer to generate them. A roast test has been added to check for support of these Korean characters.
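
    For the curious, here is a self-contained sketch of the standard derivation from the Unicode Standard (the Jamo short names come from the Jamo.txt data file). MoarVM does the equivalent at database-generation time inside ucd2c.pl, so this is only an illustration of the algorithm, not the generator's actual code.

    #include <stdio.h>
    #include <stddef.h>

    static const char *L_NAMES[19] = { "G","GG","N","D","DD","R","M","B","BB",
        "S","SS","","J","JJ","C","K","T","P","H" };
    static const char *V_NAMES[21] = { "A","AE","YA","YAE","EO","E","YEO","YE","O",
        "WA","WAE","OE","YO","U","WEO","WE","WI","YU","EU","YI","I" };
    static const char *T_NAMES[28] = { "","G","GG","GS","N","NJ","NH","D","L","LG",
        "LM","LB","LS","LT","LP","LH","M","B","BS","S","SS","NG","J","C","K","T","P","H" };

    /* Write the algorithmic name of a precomposed Hangul syllable
     * (U+AC00..U+D7A3) into buf; returns -1 for any other codepoint. */
    int hangul_syllable_name(unsigned int cp, char *buf, size_t buflen) {
        const unsigned int SBase = 0xAC00, SCount = 11172, NCount = 588, TCount = 28;
        if (cp < SBase || cp >= SBase + SCount)
            return -1;
        unsigned int SIndex = cp - SBase;
        unsigned int LIndex = SIndex / NCount;             /* leading consonant */
        unsigned int VIndex = (SIndex % NCount) / TCount;  /* vowel             */
        unsigned int TIndex = SIndex % TCount;             /* optional trailing */
        snprintf(buf, buflen, "HANGUL SYLLABLE %s%s%s",
                 L_NAMES[LIndex], V_NAMES[VIndex], T_NAMES[TIndex]);
        return 0;
    }

    /* Example: passing U+D4DB produces "HANGUL SYLLABLE PWILH". */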

    Fixed a bug in ignoremark/ordbaseat

    Ignoremark relies on the MoarVM ordbaseat op under the hood. For graphemes made up of a single character, we decompose the character and take the resulting base character. The bug was that the code assumed that once we had a synthetic’s base character, we already had the final base character, and so only decomposed non-synthetics. That isn’t true, for example with "\c[LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW]": the "j with caron" would not itself be decomposed, because it was the base character of a synthetic.

    This also fixes a bug in the indexim and eqatim ops which caused them to fail when encountering a synthetic.

    ord_getbasechar now handles both synthetics and non-synthetics, while MVM_string_ord_basechar_at just does the needed string-based checks, grabs the grapheme from the string and passes its value on to ord_getbasechar.

    Synthetic Grapheme Rework

    I reworked MVMNFGSynthetic, the C struct which represents our synthetic graphemes. This is done to (eventually) get Prepend support working for things like ignoremark regex and the ordbaseat op which gets the base codepoint of a grapheme.

    Unlike all other marks which come after a base character, Prepend characters come before the base character. All of our current code assumed the first codepoint of the synthetic is the base character.

    For now, we also assume the first codepoint of the synthetic is the base character, but we now have a base_index which will be able to hold the index of the base character in the codepoint array.

    While this doesn’t add Prepend support to everything, this is one step toward getting that working, and decoupling base codepoints from being the first codepoint of a synthetic.
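
    To illustrate the shape of the change, here is a rough sketch. The field names are hypothetical and do not match MoarVM’s actual MVMNFGSynthetic definition, apart from base_index, which is the new field described above.

    #include <stdint.h>

    /* Illustrative only, not MoarVM's real struct layout. */
    typedef struct {
        int32_t *codepoints;      /* the codepoints making up this grapheme     */
        int32_t  num_codepoints;  /* how many codepoints there are              */
        int32_t  base_index;      /* index of the base codepoint; currently
                                     always 0, but Prepend support needs it to
                                     be able to point past prepended codepoints */
    } SyntheticSketch;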

    ℹ️
    This is not yet merged, but ready to go

    Collation

    There’s not as much to say about this, since it is almost ready. As I said last month it is fully functional, and I have since done some work cleaning it up. Left to be done is integrating the collation data generation into the other Unicode database generation script. Although the files and code used for generation are different, ideally we will have only one script to run to generate and update all of our files on new Unicode releases.

    Previously I created a script which downloads all of the Unicode files needed in the generation, so an update should hopefully only require a run of the download script to fetch the UCD, the UCA (Unicode Collation Algorithm), and Emoji data.

    One thing I had not talked about previously is some of the ways I sped up the UCA implementation so that it evaluates the collation arrays efficiently even for very large strings. If you are interested, the details are in my slides.

    NQP Documentation

    In addition to the MoarVM string documentation I mentioned in the introduction, I also ensured all of the ops I have added over my project are documented in NQP.

    I ensured all the NQP index and eqat variations (indexic, indexim, indexicim, eqatic, eqaticim) are documented, and added the couple that had not yet been added to the ops list.

    I also added a Perl 6 program which lists ops that are not mentioned in the oplist, which will hopefully be useful to other developers.

    The Goal is in Sight

    The grant is finally winding down. All the significant pieces are implemented, although not everything has been merged yet. I have also implemented additional things that were not part of the grant (the Knuth-Morris-Pratt string search algorithm).

    Left to be Done

    To complete my work and fulfill my objectives I will write any additional documentation or tests that are needed. If other Perl 6 devs or users in the community want to request any Unicode or string related documentation, you can send me an email or a message on freenode IRC (nick samcv).

    Other than this, I only need to clean up and merge the collation-arrays branch, merge the synthetic grapheme rework, and update the grapheme caching branch for KMP so it uses caching for strand haystacks and the now-inlined MVM_string_get_grapheme_at_nocheck for flat haystacks.

    Zoffix Znet: On Troll Hugging, Hole Digging, and Improving Open Source Communities

    Published on 2017-08-31T00:00:00

    How to be better

    stmuk: Swiss Perl Workshop 2017

    Published by stmuk on 2017-08-30T17:48:17


    After a perilous drive up a steep, narrow, winding road from Lake Geneva we arrived at an attractive Alpine village (Villars-sur-Ollon) to meet with fellow Perl Mongers in a small restaurant.  There followed much talk and a little clandestine drinking of exotic spirits including Swiss whisky. The following morning walking to the conference venue there was an amazing view of mountain ranges. On arrival I failed to operate the Nespresso machine which I later found was due to it simply being off.  Clearly software engineers should never try to use hardware. At least after an evening of drinking.

    Wendy’s stall was piled high with swag including new Bailador (a Perl 6 Dancer-like framework) stickers, a Shadowcat booklet about Perl 6 and the new O’Reilly “Thinking in Perl 6”. Unfortunately she had sold out of Moritz’s book “Perl 6 Fundamentals” (although there was a sample display copy present). Thankfully later that morning I discovered I had a £3 credit on Google Play Books so I bought the ebook on my phone.

    The conference started early with Damian Conway’s Three Little Words.  These were “has”, “class” and “method” from Perl 6 which he liked so much that he had added them to Perl 5 with his “Dios” – “Declarative Inside-Out Syntax” module.  PPI wasn’t fast enough so he had to replace it with a 50,000 character regex PPR. Practical everyday modules mentioned included Regexp::Optimizer and Test::Expr. If the video  doesn’t appear shortly on youtube a version of his talk dating from a few weeks earlier is available at https://www.youtube.com/watch?v=ob6YHpcXmTg

    Jonathan Worthington returned with his Perl 6 talk on “How does deoptimization help us go faster?” giving us insight into why Perl 6 was slow at the Virtual Machine level (specifically MoarVM). Even apparently simple and fast operations like indexing an array were slow due to powerful abstractions, late binding and many levels of Multiple Dispatch. In short the flexibility and power of such an extensible language also led to slowness due to the complexity of code paths. The AST optimizer helped with this at compile time but itself took time and it could be better to do this at a later compile time (like Just In Time).  Even with a simple program reading lines from a file it was very hard to determine statically what types were used (even with type annotations) and whether it was worth optimizing (since the file could be very short).

    The solution to these dynamic problems was also dynamic, but seeing what was happening needed cheap logging of execution, which was passed to another thread. This logging is made visible by setting the environment variable MVM_SPESH_LOG to a filename. Better tooling for this log would be a good project for someone.

    For execution planning we look for hot (frequently called) code, long blocks of bytecode (slow to run) and consider how many types are used (avoiding “megamorphic” cases with many types which needs many versions of code).  Also analysis of the code flow between different code blocks and SSA.  Mixins made the optimization particularly problematic.

    MoarVM’s Spesh did statistical analysis of the code in order to rewrite it in faster, simpler ways. Guards (cheap checks for things like types) were placed to catch cases where it got it wrong, and if these were triggered (infrequently) it would deoptimize, hence the counterintuitive title: “Deoptimization enables speculation”. The slides are at http://jnthn.net/papers/2017-spw-deopt.pdf with the video at https://www.youtube.com/watch?v=3umNn1KnlCY The older and more dull-witted of us (including myself) might find the latter part of the video more comprehensible at 0.75 YouTube speed.

    After a superb multi-course lunch (the food was probably the best I’d had at any Perl event) we returned promptly to hear Damian talk of “Everyday Perl 6”. He pointed out that it wasn’t necessary to code golf obfuscated extremes of Perl 6 and that the average Perl 5 programmer would see many things simpler in Perl 6.  Also a rewrite from 5 to 6 might see something like 25% fewer lines of code since 6 was more expressive in syntax (as well as more consistent) although performance problems remained (and solutions in progress as the previous talk had reminded us).

    Next Liz talked of a “gross” (in the numerical sense of 12 x 12 rather than the American teen sense) of Perl 6 Weeklies as she took us down memory lane to 2014 (just about when MoarVM was launched and when unicode support was poor!)  with some selected highlights and memories of Perl 6 developers of the past (and hopefully future again!). Her talk was recorded at https://www.youtube.com/watch?v=418QCTXmvDU


    Cal then spoke of Perl 6 maths which he thought was good with its Rats and FatRats but not quite good enough and his ideas of fixing it.  On the following day he showed us he had started some TDD work on TrimRats. He also told us that Newton’s Method wasn’t very good but generated a pretty fractal. See https://www.youtube.com/watch?v=3na_Cx-anvw

    Lee spoke about how to detect Perl 5 memory leaks with various CPAN modules and his examples are at https://github.com/leejo/Perl_memory_talk

    The day finished with Lightning Talks and a barbecue at givengain — a main sponsor.

    On the second day I noticed the robotic St Bernards dog in a tourist shop window had come to life.


    Damian kicked off the talks with my favourite of his talks,  “Standing on the Shoulders of Giants”, starting with the Countess of Lovelace and her Bernoulli number program. This generated a strange sequence with many zeros. The Perl 6 version since it used rational numbers not floating point got the zeros right whereas the Perl 5 version initially suffered from floating point rounding errors (which are fixable).

    Among other things he showed us how to define a new infix operator in Perl 6. He also showed us a Perl 6 sort program that looked exactly like LISP even down to the Lots of Irritating Superfluous Parentheses. I think this was quicksort (he certainly showed us a picture of Sir Tony Hoare at some point). Also a very functional (Haskell-like) equivalent  with heavy use of P6 Multiple Dispatch.  Also included was demonstration of P6 “before” as a sort of typeless/multi-type comparison infix. Damian then returned to his old favourite of Quantum Computing.

    My mind and notes got a bit jumbled at this point but I particularly liked the slide that explained how factorisation could work by observing the product of possible inputs since this led to a collapse that revealed the factors.  To do this on RSA etc., of course, needs real hardware support which probably only the NSA and friends have (?). Damian’s code examples are at http://www.bit.do/Perl6SOG with  an earlier version of his talk at https://www.youtube.com/watch?v=Nq2HkAYbG5o Around this point there was a road race of classic cars going on outside up the main road into the village and there were car noises in the background that strangely were more relaxing than annoying.


    After Quantum Chaos Paul Johnson brought us all back down to ground with an excellent practical talk on modernising legacy Perl 5 applications based on his war stories. Hell, of course, is “Other People’s Code”, often dating from Perl’s early days and lacking documentation and sound engineering.

    Often the original developers had long since departed or, in the worst cases, were still there. Adding tests and logging (with stack traces) were particularly useful. As was moving to git (although its steep learning curve meant mentoring was needed) and handling CPAN module versioning with pinto. Many talks had spoken of the Perl 6 future whereas this spoke of the Perl 5 past and present and the work many of us suffer to pay the bills. It’s at https://www.youtube.com/watch?v=4G5EaUNOhR0


    Jonathan then spoke of reactive distributed software.  A distributed system is an async one where “Is it working?” means “some of it is working but we don’t know which bits”.  Good OO design is “tell don’t ask” — you tell remote service to do something for you and not parse the response and do it yourself thus breaking encapsulation.  This is particularly important in building well designed distributed systems since otherwise the systems are less responsive and reliable.  Reactive (async) works better for distributed software than interactive (blocking or sync).

    We saw a table that, for reactive (async) code, used a Perl 6 Promise for one value and a Supply for many values, with a plain value and a Perl 6 Seq as the interactive equivalents. A Supply could be used for pub/sub and the Observer Pattern. A Supply could either be live (like broadcast TV) or, for most Perl 6 supplies, on-demand (like Netflix). Then samples of networking (socket) based code were discussed including a web client, web server and SSH::LibSSH (async client bindings often very useful in practical applications like port forwarding)

    https://github.com/jnthn/p6-ssh-libssh

    Much of the socket code had a pattern of “react { whenever {” blocks, with “whenever” acting as a sort of async loop. He then moved on from sockets to services (using a Supply pipeline) and amazed us by announcing the release of “cro”, a microservices library that even supports HTTP/2 and WebSockets, at http://mi.cro.services/. This is installable using Perl 6 with “zef install --/test cro”.

    Slides at http://jnthn.net/papers/2017-spw-sockets-services.pdf and video at https://www.youtube.com/watch?v=6CsBDnTUJ3A

    Next Lee showed Burp Scanner, which is payware but probably the best web vulnerabilities scanner. I wondered if anyone had dared to run it on ACT or the hotel’s captive portal.

    Wendy did some cheerleading in her “Changing Image of Perl”.  An earlier version is at https://www.youtube.com/watch?v=Jl6iJIH7HdA

    Sue’s talk was “Spiders, Gophers, Butterflies” although the latter were mostly noticeably absent. She promises me that a successor version of the talk will use them more extensively. Certainly any Perl 6 web spidering code is likely to fit better on one slide than the Go equivalent.

    During the lightning talks Timo showed us a very pretty Perl 6 program using his SDL2::Raw to draw an animated square spiral with hypnotic colour cycling type patterns. There was also a talk by the author of bif (https://bifax.org/bif/), a distributed bug tracking system (which worked offline like git).

    Later in the final evening many of us ate and chatted in another restaurant where we witnessed a dog fight being narrowly averted and learnt that Wendy didn’t like Perl 5’s bless for both technical and philosophical reasons.


    p6steve: perl6 Module How To

    Published by p6steve on 2017-08-18T16:29:28

    Some investigation has discovered great resources on how to write and then list a perl6 module….


    p6steve: Physics::Unit in perl6

    Published by p6steve on 2017-08-18T16:23:07

    First and foremost, homage to the original authors of Physics::Unit and related perl5 CPAN modules. I would be honoured to hear from you and to collaborate in any way.

    What’s the big picture? TOP down, I have in mind:

    So, that said, given my poor state of knowledge of most of these things, my thinking is to start build BOTTOM up and see the shape of things that emerges, while learning some perl6 on the way.

    So, first I am going to need some MEASUREMENTS which are VALUES expressed in UNITS with associated ERROR.

    I took a gander at the CPAN module Physics::Udunits2, which is a perl5 interface to udunits2, and felt that the richness of its units and adherence to NIST guidelines were not of sufficient benefit to overcome my sense of incoherence.

    So, to cut a long story short, I decided to take inspiration from Physics::Unit.

    Next, I needed some guidance on How to Build a perl6 module…


    p6steve: perl6, really?

    Published by p6steve on 2017-08-18T15:19:54

    I have been waiting for perl6 for over 15 years, since it was first conceived. Recently, I have had an urge to get back to hands-on coding and, having seen the latest Rakudo* release of perl6, I felt that it is now sufficiently mature for my nefarious purposes.

    No doubt I am not the only person to have been frustrated by the slow progress of perl6, and certainly many have dropped by the wayside, perhaps answering the siren call of Python (2 or 3), Ruby, Swift or Go. And now that it is finally here, the community is obviously worried that no one will adopt the fruits of their work.

    Here ‘zoffix’ makes a desperate plea to change the name from ‘perl6’ to ‘rakudo’ to reboot the brand…. https://www.reddit.com/r/perl6/comments/6lstq3/the_hot_new_language_named_rakudo/

    My rebuttal to this concept is reproduced here.

    I’m a slow-thinking guy who has had two careers: perl5 dev and marketing manager. I have been wrestling with Zoffix’s proposed change and it is certainly a well-constructed argument made with feeling. Here’s my line:

    So, I detect some natural frustration within and without the community. Keep the faith. We have a new audience, they value truth and beauty, and a story of battling the odds. They need this TECHNOLOGY. It is [perl6 | rakudo] – who cares. It may have to start in academia, it may have to start in the p5 stalwarts. It will ignite. Finish the journey. Do not deny your heritage.

    So, yes, I think that perl6 is awesome (whatever it’s called). And I believe that it will be an interesting personal journey to come to grips with the ultimate programming power tool and to deliver something interesting on the way. As a user and sometime low level contributor perhaps.

    ~p6steve


    Perl 6 Maven: Printing to Standard Error in Perl 6

    Published by szabgab

    Perlgeek.de: My Ten Years of Perl 6

    Published by Moritz Lenz on 2017-08-08T22:00:01

    Time for some old man's reminiscence. Or so it feels when I realize that I've spent more than 10 years involved with the Perl 6 community.

    How I Joined the Perl 6 Community

    It was February 2007.

    I was bored. I had lots of free time (crazy to imagine that now...), and I spent some of that answering (Perl 5) questions on perlmonks. There was a category of questions where I routinely had no good answers, and those were related to threads. So I decided to play with threads, and got frustrated pretty quickly.

    And then I remember that a friend in school had told me (about four years earlier) that there was this Perl 6 project that wanted to do concurrency really well, and even automatically parallelize some stuff. And this was some time ago, maybe they had gotten anywhere?

    So I searched the Internet, and found out about Pugs, a Perl 6 compiler written in Haskell. And I wanted to learn more, but some of the links to the presentations were dead. I joined the #perl6 IRC channel to report the broken link.

    And within three minutes I got a "thank you" for the report, the broken links were gone, and I had an invitation for a commit bit to the underlying SVN repo.

    I stayed.

    The Early Days

    Those were the wild young days of Perl 6 and Pugs. Audrey Tang was pushing Pugs (and Haskell) very hard, and often implemented a feature within 20 minutes after somebody mentioned it. Things were unstable, broken often, and usually fixed quickly. No idea was too crazy to be considered or even implemented.

    We had bots that evaluated Perl 6 and Haskell code, and gave the result directly on IRC. There were lots of cool (and sometimes somewhat frightening) automations, for example for inviting others to the SVN repo, to the shared hosting system (called feather), for searching SVN logs and so on. Since git was still obscure and very unusable, people tried to use SVK, an attempt to implement a decentralized version control system on top of the SVN protocol.

    Despite some half-hearted attempts, I didn't really make inroads into compiler developments. Having worked with neither Haskell nor compilers before proved to be a pretty steep step. Instead I focused on some early modules, documentation, tests, and asking and answering questions. When the IRC logger went offline for a while, I wrote my own, which is still in use today.

    I felt at home in that IRC channel and the community. When the community asked for mentors for the Google Summer of Code project, I stepped up. The project was a revamp of the Perl 6 test suite, and to prepare for the mentoring task, I decided to dive deeper. That made me the maintainer of the test suite.

    Pet Projects

    I can't recount a full history of Perl 6 projects during that time range, but I want to reflect on some projects that I considered my pet projects, at least for some time.

    It is not quite clear from this (very selected) timeline, but my Perl 6 related activity dropped around 2009 or 2010. This is when I started to work full time, moved in with my girlfriend (now wife), and started to plan a family.

    Relationships

    The technologies and ideas in Perl 6 are fascinating, but that's not what kept me. I came for the technology, but stayed for the community.

    There were and are many great people in the Perl 6 community, some of whom I am happy to call my friends. Whenever I get the chance to attend a Perl conference, workshop or hackathon, I find a group of Perl 6 hackers to hang out and discuss with, and generally have a good time.

    Four events stand out in my memory. In 2010 I was invited to the Open Source Days in Copenhagen. I missed most of the conference, but spent a day or two with (if memory serves right) Carl Mäsak, Patrick Michaud, Jonathan Worthington and Arne Skjærholt. We spent some fun time trying to wrap our minds around macros, the intricacies of human and computer language, and Japanese food. (Ok, the last one was easy). Later the same year, I attended my first YAPC::EU in Pisa, and met most of the same crowd again -- this time joined by Larry Wall, and over three or four days. I still fondly remember the Perl 6 hallway track from that conference. And in 2012 I flew to Oslo for a Perl 6 hackathon, with a close-knit, fabulous group of Perl 6 hackers. Finally, the Perl Reunification Summit in the beautiful town of Perl in Germany, which brought together Perl 5 and Perl 6 hackers in a very relaxed atmosphere.

    For three of these four events, different private sponsors from the Perl and Perl 6 community covered travel and/or hotel costs, with their only motivation being meeting folks they liked, and seeing the community and technology flourish.

    The Now

    The Perl 6 community has evolved a lot over the last ten years, but it is still a very friendly and welcoming place. There are lots of "new" folks (where "new" is everybody who joined after me, of course :D), and a surprising number of the old guard still hang around, some more involved, some less, all of them still very friendly and supportive.

    The Future

    I anticipate that my family and other projects will continue to occupy much of my time, and it is unlikely that I'll be writing another Perl 6 book (after the one about regexes) any time soon. But the Perl 6 community has become a second home for me, and I don't want to miss it.

    In the future, I see myself supporting the Perl 6 community through infrastructure (community servers, IRC logs, running IRC bots etc.), answering questions, writing a blog article here and there, but mostly empowering the "new" guard to do whatever they deem best.

    samcv: Grant Status Update 3

    Published on 2017-08-06T07:00:00

    This month I accomplished a lot. We now support full Unicode 9.0/Emoji 4.0 text segmentation (putting codepoints into graphemes) and have some really awesome concatenation improvements (these are both in master), as well as a fully functional Unicode Collation Algorithm (this is not yet merged into master).


    Unicode Collation Algorithm

    In my collation-arrays branch I now have implemented a fully working Unicode Collation Algorithm. Only 82/190,377 of the Unicode collation tests are failing (99.956% passing).

    What I did:

    • Collation keys that need to be generated on the fly are working

    • Characters that need to be decomposed for collation (mainly Hangul characters) are working

    • The script that generates the linked list (used for codepoints with more than one collation array in them or for sequences of more than one codepoint with collation keys) was rewritten and I have not discovered any issues with the data

    I was also able to properly keep the ability to adjust each of the collation levels (reversing or disabling primary, secondary or tertiary levels).

    Since primary > secondary > tertiary for all collation elements we are able to have a default array:

    {-1, 0, 1} /* aka Less, Same, More */

    This is the default array that is used in case we are comparing between different levels (primary vs secondary for instance).

    If we are comparing between the same level, we use a 3x3 array which holds the return values based on whether the collation values are either Less, More or Same. This is the array that is customized to allow you to reverse or disable different collation levels.

     {-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}
    /*  primary,  secondary,   tertiary */

    Below, pos_a and pos_b track moving between keys on the collation stack, while level_a and level_b track comparing the levels with each other. For more information about how this works, see my previous grant status update at /perl6/grant-status-update-2.

    if (level_a == level_b)
        effective_level_eval = level_eval_settings.a[level_a];
    else
        effective_level_eval = level_eval_default;
    
    rtrn =
    stack_a.keys[pos_a].a[level_a] < stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.Less :
    stack_a.keys[pos_a].a[level_a] > stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.More :
                                                                       effective_level_eval.s2.Same ;
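
    As a concrete illustration of the customization mentioned above, reversing the secondary level just means swapping Less and More in its row of the per-level settings, while the other rows keep the default (and disabling a level would presumably use a row of zeros). This is a hedged sketch using a plain 2D array rather than MoarVM’s actual data structures:

    /* Rows are primary, secondary, tertiary; columns are Less, Same, More. */
    static const int level_eval_reversed_secondary[3][3] = {
        { -1, 0,  1 },   /* primary:   normal            */
        {  1, 0, -1 },   /* secondary: reversed ordering */
        { -1, 0,  1 },   /* tertiary:  normal            */
    };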

    Need to do:

    • Either use the MVM_COLLATION_QC property to determine whether to look up a character in the linked list, or store the index of the character’s main node in the Unicode database

      • Currently it checks all of the main nodes before falling back to the values stored in the Unicode database or to generated values

    • Investigate country/language selective sorting

    Other than needing to improve the speed of finding the main nodes, the implementation is feature complete.

    The currently failing tests mostly relate to codepoints whose collation values differ between NFD form and NFC form. This happens when there are combining accent/extending characters after a codepoint, which would be reordered in NFD form. As the number of failing tests is quite low and the resulting variation in sort order is very slight, I find it acceptable in its current form. I may find a workaround to fix it, but since 99.956% of the tests pass and the incorrectness is only a slight variation in sorting, it is not a high priority.

    Text Segmentation/Normalization

    Emoji Fixes

    Improvements to segmentation of Emoji w/ GCB=Other

    Not all Emoji Modifiers have Grapheme_Cluster_Break = E_Base or E_Base_GAZ. In these cases we need to check the Emoji_Modifier_Base property. This fixed 25 more emoji than before.

    Commit: 4ff2f1f9

    Don’t break after ZWJ for Emoji=True + GCB=Other

    With this change we are now counting 100% of the Emoji v4.0 emoji as a single grapheme. Since ASCII numbers and the pound symbol are also counted as Emoji, we disregard any codepoints in the ASCII range. I updated the MoarVM Readme to show we have full Unicode 9.0/Emoji 4.0 text segmentation support!

    Other Improvements

    Avoid property lookups for Hangul and just use the Grapheme_Cluster_Break property.

    We look up Canonical_Combining_Class as the raw integer property value instead of getting the string and then converting it to an integer. Even though the numbers we get back are not identical to the actual CCC, the Unicode algorithms only care about the relative CCC. This results in lower CPU usage during normalization.

    Fix cmp/leg to support deterministic comparing of synthetics

    Fixes the cmp/leg operator to have deterministic results when the string contains a synthetic. Previously it compared based on the value the synthetic was stored as, which is based on which synthetic was allocated first. This makes it so cmp/leg always have the same result for the same string, no matter the order the synthetics are allocated. It also makes it so the codepoints the synthetic is made up of are compared instead of the value of the synthetic (whose value has no relation to the codepoints in the synthetic).

    Previously we compared naïvely by grapheme, and ended up comparing synthetic codepoints with non-synthetics. This caused synthetics to be sorted incorrectly, and it also made comparisons non-deterministic: if the synthetics were allocated in a different order, you would get a different result from MVM_string_compare.

    Now, we compare down the string the same way we did before, but if the first non-matching grapheme is a synthetic, we iterate over the codepoints in the synthetic and return a result based on the codepoints. If the two non-matching graphemes contain the same initial codepoints but differ in length, the grapheme with fewer codepoints is considered less than the other.
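
    Here is a minimal sketch of that tie-break, assuming the two graphemes have already been expanded into plain codepoint arrays (this is not MoarVM’s actual code, just the comparison rule it implements):

    #include <stdint.h>
    #include <stddef.h>

    /* Lexicographically compare two codepoint sequences; if one is a prefix of
     * the other, the one with fewer codepoints sorts as less. */
    static int compare_codepoint_seqs(const int32_t *a, size_t a_len,
                                      const int32_t *b, size_t b_len) {
        size_t shorter = a_len < b_len ? a_len : b_len;
        for (size_t i = 0; i < shorter; i++) {
            if (a[i] != b[i])
                return a[i] < b[i] ? -1 : 1;
        }
        if (a_len == b_len)
            return 0;
        return a_len < b_len ? -1 : 1;
    }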

    Concatenation Improvements

    When we concatenate strings normally, we create a new string which contains references to the string or strands of string a and the string or strands of string b.

    Previously if there were any of the following conditions for the last grapheme of string a (last_a) or the first grapheme of string b (first_b), we would renormalize string a and string b in their entirety:

    • last_a or first_b were synthetics (contained multiple codepoints in one grapheme) other than the CRLF synthetic

    • Didn’t pass NFG Quickcheck

    • Had a non-zero Canonical_Combining_Class

    The first improvement I made was to use the same code that we use during normalization (should_break) to find out if two adjacent codepoints should be broken up or if they are part of the same grapheme (user visible character unit). This first improvement had a very big effect, since previously we were renormalizing even if renormalization would not have caused any change in the final output. This change reduced how often we had to do full renormalization by about 80% in my tests of various different languages' scripts.
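
    The decision itself is tiny once that boundary test is available. Here is a hedged sketch with a hypothetical signature; the real should_break lives inside MoarVM’s normalizer and takes more context than shown here:

    #include <stdint.h>

    /* The normalizer's boundary predicate: does a grapheme boundary fall between
     * these two adjacent codepoints? Passed in as a function pointer so the
     * sketch is self-contained; the real function is MoarVM-internal. */
    typedef int (*should_break_fn)(int32_t before, int32_t after);

    /* Full renormalization is only needed when the graphemes at the seam would
     * actually join; otherwise the cheap strand-reference concatenation already
     * produces correct NFG. */
    static int concat_needs_renormalization(int32_t last_a, int32_t first_b,
                                             should_break_fn should_break) {
        return !should_break(last_a, first_b);
    }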

    The second change I made was more technically challenging. While the first change reduces how often we would do full renormalization, the second change’s goal was to only renormalize a small section of string a and string b instead of string a and b in their entirety.

    With the new code, we either copy the strands or set string a as the first strand of the new string (as we did previously). Then we use the should_break function, which checks whether we break between two codepoints or not. If we break between last_a and first_b, we continue as I explained at the very start of this section. If should_break returns that we need to combine the two codepoints, then we create a renormalized_section strand for those two graphemes. Since last_a is now inside the renormalized_section, we adjust the reference to string a (similar to how .substr creates a substring), or if string a was made up of multiple strands, we adjust the last strand of string a.

    We then insert the renormalized section. String b or its strands are then set after the renormalized section. The same way we adjusted a, we adjust b or its first strand to be a substring.

    Since it’s possible for the adjusted strings or strands to be fully consumed (i.e. the substring is 0 in length), we "drop" the copied strand or string (for a or b) if the adjusted substring has no more graphemes in it. This currently can only happen if string a or the last strand of string a is a single grapheme or the first strand of string b or string b is a single grapheme.

    Other minor fixes

    Fix overflow in uniprop lookups

    Makes excessively large uniprop queries not overflow into negative numbers, which would cause it to throw. Closes MoarVM issue #566.

    Commit: a3e98692.

    Made string_index 16% faster for incompatible string types

    For incompatible string types we now use string_equal_at_ignore_case_INTERNAL_loop, which results in a 16% speed boost. It was only a minor change to the INTERNAL_loop function, and it works both with and without ignorecase/ignoremark functionality.

    Commit: 161ec639.

    Add nqp::codes op to MoarVM for 3.5x faster doing .codes for short strings

    Added an nqp::codes op which gets the number of codepoints in a string. We already had nqp::chars to get the number of graphemes, but to get the number of codepoints in a string we called .NFC.codes, which is 3.5x slower for short strings and 2x slower for longer strings.

    Tests

    Added in tests to make sure that concat is stable when concatenating things which change under normalization. This was done to make sure there were no issues with the concatenation improvement.

    Next Month

    In the next month I need to get the Collation stuff merged and work on fixing some very longstanding bugs in our Unicode Database implementation. I may not be fully changing out our Unicode Database implementation as I had planned, but I will still at least fix the majority of the issues.

    This includes ucd2c.pl not being deterministic and changing every time it is run. I plan to make it generate the same code on every run. It was originally written assuming that properties and property values are unique, which is not the case. I need to make it so that when the script generates the database, all of the properties and values are functional. Currently, when the database is regenerated, it is random which property values are functional and which are not, depending on which ends up last in the hash initializer array (the property values which occur last overwrite the identically named ones which appear first).

    6guts: MoarVM Specializer Improvements Part 1: Gathering Data

    Published by jnthnwrthngtn on 2017-08-06T00:34:01

    Over the last weeks I’ve had the chance to work full time on Perl 6, and have dedicated this time to improving the MoarVM specializer. Since “specializer” is such a lot to type, but shortening it to “spec” would result in it being confused with specification (or perhaps bacon), we refer to it as “spesh”. But what is it?

    Specialization

    Specialization is the process of taking code with lots of late-binding in it, and producing one or more versions of the code with as much of that stripped out as we can. We take late binding in a very broad sense, really using it to mean any place where we defer a decision until the code is executed (that is, runtime) because we don’t know the context it will be executed in. So, if we write a module with this in it:

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    

    Then this code isn’t committed to the types of $text and $limit. Thus, we must:

    Even if we put Str and Int type constraints on the sub, that still isn’t really enough: these could be subclassed (for example, by mixing in to them) and thus override the methods that we call. (Granted in this particular case that’d be a tad odd, but in the general case, it’s not.)

    Even with nice things like method caches and multi-dispatch caches, which do provide a huge speedup, we still must do the cache lookups, call the methods, and so forth. The job of the specializer is to notice what types we actually call shorten with, and the types methods like chars return, and thus strip out the need to even do cache lookups. It can then go further, for example inlining the body of the small chars method (and maybe substr too) into the caller so that there will be no invocation overhead for them at all!

    Before we go any further, a quick thank you

    This work is funded by my current TPF grant, which was approved some weeks back (so I’ve already been working away at it), though only recently publicly announced due to an internal TPF procedure that had to be completed first. So, thanks go to TPF for administering the funding, and to those who have donated to TPF and the Perl 6 Core Development Fund for providing the funding.

    A little series

    I could just write up a list of the improvements I’ve done, but I figure that many readers here don’t know a great deal about spesh. So instead, I will write a series of posts working through the way spesh works today, after my changes, together with some notes on how it used to work and why that changed. Hopefully that will be a more interesting and useful read.

    What to optimize?

    When I’ve talked people through how spesh works in the past, I’ve tended to start by discussing how the bytecode is turned into a control flow graph in static single assignment form, and then we go on and do the various bits of analysis and transformation. We’ll get to that, but in fact there has always been a step before it.

    Constructing a CFG and putting it in SSA form takes time; transforming it by doing a load of optimizations takes more. How much depends on the size and structure of the code. So we need to decide what code to spend time on. This was done in two ways.

    The first was just incrementing a counter every time the sub, block, method, or regex was called. If the count passed a threshold (which was chosen based upon the size of the bytecode) then an optimized version of it would be produced. Easy enough.

    Trouble is that this doesn’t catch an important case: that where we have a program that spends most of its lifetime in a loop. In that case, we only enter the block holding the loop once, which is never going to trigger optimization. Thus we also count the number of iterations of loops. If that counter hits a high enough value, then we produce the optimized code and replace the running code with it, potentially moving from the interpreter into machine code. This replacement of code with an optimized version is known as “on stack replacement”, because it’s replacing code already running on the call stack.
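
    A hedged sketch of those two counters follows; the names and the way thresholds are chosen here are hypothetical, not MoarVM’s actual fields or policy.

    /* Per-static-frame hotness counters (illustrative only). */
    typedef struct {
        unsigned int entry_count;  /* times the frame was entered            */
        unsigned int osr_count;    /* loop back-edges taken within the frame */
    } HotnessCounters;

    /* Called on frame entry: past the threshold, produce a specialization. */
    int should_specialize_on_entry(HotnessCounters *c, unsigned int threshold) {
        return ++c->entry_count >= threshold;
    }

    /* Called on each loop back-edge: past the threshold, trigger on stack
     * replacement so the optimized code takes over the already-running frame. */
    int should_osr(HotnessCounters *c, unsigned int osr_threshold) {
        return ++c->osr_count >= osr_threshold;
    }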

    Gathering data

    Just throwing bytecode representing a highly dynamic program into an optimizer won’t achieve much, however. The cost is in the late binding, and to remove that we need an idea of what kinds of things are showing up when the code is run. This data came from two sources.

    The first source was the incoming parameter types. When code was determined as hot, then the shape of the callsite (how many arguments, which named arguments) and the types of the arguments were taken and turned into a “key” for the specialization. These could then be assumed inside of the optimizer.

    This was not enough on its own to produce good optimizations, however, because a lot of data comes from attribute accesses, lexical accesses, and the return values of things the code calls. Therefore, before doing any significant work, spesh would go through the code and insert logging instructions. These would record the values that were observed. After 8 runs, the specialization process would continue, using this data.

    Bad decisions and missing information

    This worked relatively well, but had notable shortcomings. The main one concerns code that is highly polymorphic in nature – that is, it is called with many different types. In this case, whichever type it was called with on, say, its 100th call, would have a specialization produced for it. If the next call used a different type, another specialization would be produced for that. And so forth, up until we had four specializations, which was the number picked as the limit.

    This meant that things like defined or sink, which are called very often but on a lot of different types, would often go unoptimized and un-JITted. Worse, the way inlining works is to inline any existing specialization. Inlining all the defined and sink calls would be ideal. But in reality, for any non-trivial program, that would not happen in most cases. Clearly, if only the specializer were able to take a step back, it could see this situation and do something smarter. But the data just wasn’t there to do it.

    The 8 samples taken when logging could also get lucky or unlucky about what values they saw. If a value returned from a method is a certain type over 99% of the time, and less than 1% of the time it is something else, then it makes sense to stick a guard in, optimize for the 99% case, and let the guard fail and trigger deoptimization for the less than 1% of the cases. (A deoptimization is like the opposite of OSR: we swap the optimized version of the code for the slower one that can handle all of the cases, and let it take care of the rare cases).

    With only 8 values, however, 1 of them being different suggests we might have to deoptimize over 10% of the time. Deoptimization takes a bit of time to do, but leaves us interpreting the code with a bunch of late binding. The stakes are too high if all the data can tell us is that there’s a 1 in 8 chance.

    Another problem in the data was a lack of knowing about the stability of the types on the callee side of a callsite. We relied on being able to infer them based on things we did know, or had inserted checks to guard against. However, what if we were just one more guard clause away from being able to inline something? Previously we could never determine that, so had to miss out on the opportunity. With better data, it would be possible to do better.

    Interruptions and stampedes

    Spesh needed to interrupt the execution of the program twice for each frame it wished to optimize: once to parse the bytecode into the SSA CFG, insert the logging and produce a logged version of the code, and again after the logging to produce an optimized version of the code. This introduced pauses into the execution. So, every time spesh makes a program run faster, it has to do so sufficiently to overcome the time it stole to do its optimization work. And if it speeds something up that doesn’t get run much in the future, it can make the program slower overall.

    This is pretty poor use of the parallel hardware available in pretty much all computers nowadays. Better would be to spend time doing the optimization work on a thread separate from the execution of the user’s program. This not only gets rid of the pauses; it also means that if the work is completed by the interpreter before the optimized version is ready then we didn’t slow the program down by stopping it to do optimization work that was never going to pay off anyway.

    Having a thread taking care of specialization work can also resolve another problem. Imagine that a bunch of threads are set off executing the same code on a range of different data items (so, data parallelism). In this case, more than one thread could see that the code is hot, notice there’s not yet any specialization, and get to work on producing it. There was detection of any duplicate ones at installation time, but that still meant there was wasted work producing specializations. If 3 different threads wasted their time this way, then the amount of time before the specializer had paid for itself was also increased.

    A new approach to data collection

    Clearly, there was plenty of room for improvement. It was fairly clear that having a better data model representing the execution of the program, which the specializer would use to make smarter decisions, was a key part of this. At the same time, it was important that threads executing user code didn’t spend much of their time building this. Ideally, they would just throw data into a sequential buffer, and toss it over to another thread – a specializer thread – when it was full.

    So, that’s the direction I headed in. Each thread would log interesting events into a thread local buffer. The append-only and thread local (for writes) nature of it should make it fairly cache friendly. It was desirable that the entries were fairly small; the main place this had an impact was parameter type logging, where we wish to log both the container type and the type held inside of the container. This was taken care of by just writing two entries, rather than all the rest being padded out (fixed size entries weren’t all that important, I guess, but it did make things easier).

    I converged on this set of events in the log. It’s 24 bytes per entry, which isn’t too bad. One early mistake I made was logging invokes. I was keen that we start being able to inline calls to closures in the future, when it’s always the same code we call but with a different environment. So I logged the invoked code object, figuring the specializer thread could then pick the interesting parts out. This turned out to quite notably extend the lifetime of the closed over data, however. As the things we cared about were what code was invoked and whether the calling frame was also the outer frame, it was quite cheap to just extract those at the point of writing the log anyway.

    The end result only logs references to static frames, types objects, and code objects where we’re told the result of the lookup can be cached because it will always be the same. It never logs values. This fixes an issue in the previous logging mechanism: during its 8 runs it would keep values alive for longer, and – much worse – if the 8 runs were never completed, it could keep them alive indefinitely. It was a bounded leak, but certainly one I’m glad to no longer have.

    A hugely important part of the log is the correlation ID. This is a per-thread incrementing counter. It is bumped each time a frame is entered with logging turned on. It is then included in all events occurring within the execution of that frame. This allows the specializer thread to work out what events go with what code.
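
    As an illustration only, a log entry might look something like the sketch below. The field names and layout are hypothetical; the post only tells us that entries are small and fixed-size at 24 bytes, never hold values, and carry the correlation ID.

    #include <stdint.h>

    /* Illustrative only: not MoarVM's actual spesh log entry definition. */
    typedef enum { SL_ENTRY, SL_PARAMETER, SL_RETURN, SL_INVOKE } SketchLogKind;

    typedef struct {
        uint32_t correlation_id;  /* bumped per logged frame entry on this thread */
        uint32_t kind;            /* which kind of event this entry records       */
        uint64_t subject;         /* e.g. a static frame or type, never a value   */
        uint64_t extra;           /* kind-specific payload                        */
    } SketchLogEntry;             /* 24 bytes on typical 64-bit platforms         */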

    The spesh worker thread

    The spesh worker thread sits in a loop, reading a log from a blocking queue (meaning it waits in an efficient way, and – once the program has been fully specialized – just never gets woken up again). Once a thread running code fills up a log buffer, it sticks it into the blocking queue, the spesh worker receives it, and it gets to work. I’ll go through the various steps it takes later in this series; for today I’ll focus only on the first one (updating the statistics model, below) and what it does before going to wait for another data buffer.

    To prevent runaway memory use if the threads running code are producing logs way faster than the specializer can make use of them, threads are given a quota of log buffers they are allowed to send. On sending, they decrement the quota. If it’s still greater than zero, then they allocate another log buffer and continue logging. Otherwise, logging is disabled for that thread. Once the specialization worker thread is done processing a buffer and acting upon its content, it increments the quotas. If it incremented it from zero, then it also installs a spesh log buffer for the thread, so it will continue logging.
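
    The quota bookkeeping itself is simple; here is a minimal sketch with hypothetical names, leaving out the actual buffer allocation, queueing and installation details.

    /* Per-thread logging quota state (illustrative only). */
    typedef struct {
        int log_quota;        /* buffers this thread may still send            */
        int logging_enabled;  /* whether the thread currently has a log buffer */
    } LogQuotaSketch;

    /* Running thread: called when its full log buffer is sent to the worker. */
    void on_buffer_sent(LogQuotaSketch *t) {
        t->log_quota--;
        t->logging_enabled = (t->log_quota > 0);   /* stop logging at zero */
    }

    /* Spesh worker: called once a buffer from this thread has been processed. */
    void on_buffer_processed(LogQuotaSketch *t) {
        if (t->log_quota++ == 0)
            t->logging_enabled = 1;   /* worker re-installs a log buffer */
    }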

    Nailing the main loop

    This scheme almost worked out fine, but there was an awkward problem. If we weren’t logging at the point that the outermost frame of the program started running (perhaps because we were specializing bits of the compiler still), then the outermost body of the program wouldn’t get a correlation ID, and so wouldn’t log any events. This would be especially unfortunate, since the first thing people measuring performance tend to do is write a hot loop in the mainline of a program! (More usefully, any perl6 -ne ... invocation has the same kind of program structure.)

    Thankfully, there was an easy way to deal with this: when we enter a new compilation unit for the first time, if there is no spesh log, then grant a temporary quota boost. Alternatively, if the log is almost full, then we send it off and then make a new one (with a quota boost if needed). This boost is temporary.

    Every heuristic written with good intentions can go rogue, and this one is no exception: imagine a program doing a load of EVALs. So, there is a fixed limit on how many times this quota boosting can happen, to ensure it handles the cases it is aimed at but doesn’t do harm.

    Building a model

    Over on the specialization thread, the events are fed into a simulation that recreates something much like the call stack of the logged program, so it can understand the relationships between callers and callees. This produces a set of statistics, hung off each invoked piece of code (known as a “static frame” in MoarVM); doing it this way has the advantage that the statistics will be garbage collected should the code also be garbage collected (this can matter in EVAL-heavy programs).

    One of the things I like most about this new approach is that, with the MVM_SPESH_LOG environment variable set to the name of a file to log to, the assembled statistical data will be dumped. This makes it possible to see the data being used to make decisions. Enough with me describing stuff, though: let’s see the stats! Here’s an example program:

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    for ^10000 {
        shorten 'foo' x 100, (20..500).pick;
    }
    

    Here’s the statistics output after a while for the shorten sub:

    Latest statistics for 'shorten' (cuid: 1, file: xxx.p6:1)
    
    Total hits: 156
    
    Callsite 0x7fdb8a1249e0 (2 args, 2 pos)
    Positional flags: obj, obj
    
        Callsite hits: 156
    
        Maximum stack depth: 13
    
        Type tuple 0
            Type 0: Str (Conc)
            Type 1: Int (Conc)
            Hits: 156
            Maximum stack depth: 13
            Logged at offset:
                226:
                    156 x type Int (Conc)
                    156 x static frame 'chars' (4197) (caller is outer: 0)
                    156 x type tuple:
                        Type 0: Scalar (Conc) of Str (Conc)
                260:
                    156 x type Bool (Conc)
                    1 x static frame 'infix:«>»' (2924) (caller is outer: 0)
                    155 x static frame 'infix:«>»' (3143) (caller is outer: 0)
                    156 x type tuple:
                        Type 0: Int (Conc)
                        Type 1: Scalar (Conc) of Int (Conc)
                340:
                    91 x type Str (Conc)
                    91 x static frame 'substr' (2541) (caller is outer: 0)
                    91 x type tuple:
                        Type 0: Scalar (Conc) of Str (Conc)
                        Type 1: Int (Conc)
                        Type 2: Scalar (Conc) of Int (Conc)
                382:
                    91 x type Str (Conc)
                            91 x static frame 'infix:<~>' (4223) (caller is outer: 0)
                            91 x type tuple:
                        Type 0: Str (Conc)
                        Type 1: Str (Conc)
    
    Static values:
        - Sub+{<anon|77645888>} (0x2a70d48) @ 194
        - Sub+{<anon|77645888>}+{Precedence} (0x23a90c0) @ 288
    

    From this we can see:

    At compile time it was also worked out that some lexical lookups will always resolve to the very same object; the results of those have been logged also (the static values section), so the optimization process can elide the lookups and do further optimizations by knowing exactly what will be called. (The lookups in question are for the > and ~ operations, which are lexical, but in this case not overridden in a nested scope and so will come from the CORE setting.)

    Here’s another example, this time for chars:

    Latest statistics for 'chars' (cuid: 4197, file: SETTING::src/core/Str.pm:2728)
    
    Total hits: 157
    
    Callsite 0x7fdb8a124a00 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 157
    
        Maximum stack depth: 33
    
        Type tuple 0
            Type 0: Str (Conc)
            Hits: 1
            Maximum stack depth: 33
    
        Type tuple 1
            Type 0: Scalar (Conc) of Str (Conc)
            Hits: 156
            Maximum stack depth: 14
    

    Here we can see that it was called once with a Str value, but a bunch of times with a Scalar holding a Str. Therefore, it’s worth producing optimized code for the latter case, but not worth bothering about the former.

    Clearing up

    But what about all of the statistics we record on one-off cold paths, such as at startup? We’ll never do optimizations based on them, and so they’d just sit around using memory. Or what about after we’ve optimized a frame in all the needed ways, and it’s no longer logging new data? We don’t really need the statistics any more.

    To alleviate this, each time a log buffer is received a statistics version number is incremented. The statistics are marked with the current version when updated. Any statistics that are not updated in a while will be presumed to be no longer interesting, and so thrown out.
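
    A tiny sketch of that ageing check, with hypothetical names and an arbitrary age limit:

    /* Statistics remember the version of the last log buffer that updated them;
     * anything too far behind the current version gets thrown out. */
    typedef struct {
        unsigned int last_update_version;
        /* ... the statistics themselves ... */
    } FrameStatsSketch;

    static int stats_are_stale(const FrameStatsSketch *s,
                               unsigned int current_version,
                               unsigned int max_age) {
        return current_version - s->last_update_version > max_age;
    }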

    Debuggability

    When I first mentioned moving specialization work off to its own thread on #moarvm, brrt (who leads development of the JIT) was immediately concerned about the unpredictability this would introduce. Threads are scheduled a bit differently between runs, which would make it a nightmare to reproduce and hunt down specializer and JIT compiler bugs, including using the bisection approach to work out exactly which specialization made things go wrong. It was an excellent point.

    Therefore, I introduced the MVM_SPESH_BLOCKING environment variable. When set, a thread executing code will send off its log, and then block until the spesh worker thread has finished processing it and installing specializations. This means that, for a single threaded program, the behavior will again be fully deterministic.
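    For example, a deterministic bug-hunting run might be started like this (the script name is just a placeholder):

    $ MVM_SPESH_BLOCKING=1 perl6 your-script.p6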

    You won’t believe what happens next!

    Alas, that’s all I’m covering this time. Next time, I’ll talk about how the statistics are used.


    Perl 6 Maven: Continuous Integration for Perl 6 modules

    Published by szabgab

    rakudo.org: Announce: Rakudo Star Release 2017.07

    Published by Steve Mynott on 2017-07-24T18:36:58

    A useful and usable production distribution of Perl 6

    On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the July 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the July 2017 release is available from https://rakudo.perl6.org/downloads/star/.

    Binaries for macOS and Windows (64 bit) are also available.

    This is the eighth post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

    IMPORTANT: “panda” is to be removed very shortly since it is deprecated. Please use “zef” instead.

    Currently, Star is on a quarterly release cycle and 2017.10 (October) will follow later this year.

    Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

    In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

    This Star release includes release 2017.07 of the Rakudo Perl 6 compiler, version 2017.07 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

    Note this Star release contains NQP version 2017.07-9-gc0abee7 rather than the release NQP 2017.07 in order to fix the --ll-exception command line flag.

    The Rakudo compiler changes since the last Rakudo Star release of 2017.04 are now listed in “2017.05.md”, “2017.06.md” and “2017.07.md” under the “rakudo/docs/announce” directory of the source distribution.

    Notable changes in modules shipped with Rakudo Star:

    + DBIish: Doc and CI updates
    + doc: Too many to list. p6doc fixed.
    + grammar-debugger: Works again now.
    + p6-io-string: New dep for doc.
    + p6-native-resources: Removed since deprecated and not used by linenoise.
    + panda: Officially deprecate panda in favour of zef.
    + perl6-Test-When: New dep for perl6-pod-to-bigpage.
    + perl6-lwp-simple: Fix breakage due to rakudo encoding refactor.
    + tap-harness6: Replaces deprecated tap-harness6-prove6.
    + zef: Too many to list.

    There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

    + advanced macros
    + non-blocking I/O (in progress)
    + some bits of Synopsis 9 and 11

    There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

    In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

    See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

    The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

    Perlgeek.de: Perl 6 Fundamentals Now Available for Purchase

    Published by Moritz Lenz on 2017-07-21T22:00:01

    After about nine months of work, my book Perl 6 Fundamentals is now available for purchase on apress.com and springer.com.

    The ebook can be purchased right now, and comes in the epub and PDF formats (with watermarks, but DRM free). The print form can be pre-ordered from Amazon, and will become ready for shipping in about a week or two.

    I will make a copy of the ebook available for free for everybody who purchased an earlier version, "Perl 6 by Example", from LeanPub.

    The book is aimed at people familiar with the basics of programming; prior Perl 5 or Perl 6 knowledge is not required. It features a practical example in most chapters (no mammal hierarchies or class Rectangle inheriting from class Shape), ranging from simple input/output and text formatting to plotting with python's matplotlib libraries. Other examples include date and time conversion, a Unicode search tool and a directory size visualization.

    I use these examples to explain a subset of Perl 6, with many pointers to more documentation where relevant. Perl 6 topics include the basic lexicographic structure, testing, input and output, multi dispatch, object orientation, regexes and grammars, usage of modules, functional programming and interaction with python libraries through Inline::Python.

    Let me finish with Larry Wall's description of this book, quoted from his foreword:

    It's not just a reference, since you can always find such materials online. Nor is it just a cookbook. I like to think of it as an extended invitation, from a well-liked and well-informed member of our circle, to people like you who might want to join in on the fun. Because joy is what's fundamental to Perl. The essence of Perl is an invitation to love, and to be loved by, the Perl community. It's an invitation to be a participant of the gift economy, on both the receiving and the giving end.

    Perl 6 Maven: LWP::Simple - a simple web client in Perl 6

    Published by szabgab

    Perlgeek.de: The Loss of Name and Orientation

    Published by Moritz Lenz on 2017-07-10T22:00:01

    The Perl 6 naming debate has started again. And I guess with good reason. Teaching people that Perl 6 is a Perl, but not the Perl requires too much effort. Two years ago, I didn't believe. Now you're reading a tired man's words.

    I'm glad that this time, we're not discussing giving up the "Perl" brand, which still has very positive connotations in my mind, and in many other minds as well.

    And yet, I can't bring myself to like "Rakudo Perl 6" as a name. There are two very shallow reasons for that: Going from two syllables, "Perl six", to five of them, seems a step in the wrong direction. And two, I remember the days when the name was pretty young, and people would misspell it all the time. That seems to have abated, though I don't know why.

    But there's also a deeper reason, probably a sentimental old man's reason. I remember the days when Pugs was actively developed, and formed the center of a vibrant community. When kp6 and SMOP and all those weird projects were around. And then, just when it looked like only a single compiler was around, Stefan O'Rear conjured up niecza, almost single-handedly, and out of thin air. Within months, it was a viable Perl 6 compiler, that people on #perl6 readily recommended.

    All of this was born out of the vision that Perl 6 was a language with no single, preferred compiler. Changing the language name to include the compiler name means abandoning this vision. How can we claim to welcome alternative implementations when the commitment to one compiler is right in the language name?

    However I can't weigh this loss of vision against a potential gain in popularity. I can't decide if it's my long-term commitment to the name "Perl 6" that makes me resent the new name, or valid objections. The lack of vision mirrors my own state of mind pretty well.

    I don't know where this leaves us. I guess I must apologize for wasting your time by publishing this incoherent mess.

    Perl 6 Maven: MongoDB with Perl 6 on Linux

    Published by szabgab

    Perl 6 Maven: Parsing command line arguments in Perl 6 - ARGS - ARGV - MAIN

    Published by szabgab

    samcv: Grant Status Update 2

    Published on 2017-07-04T07:00:00

    This is my second grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

    I got to working on collation this month. I'm going to explain a bit of how the Unicode Collation Algorithm works.

    Unicode Collation Algorithm

    The collation data for UCA is made up of arrays like so:

    [primary.secondary.tertiary]

    Each one is an integer. Primary differs between different letters, 'a' vs 'z'. Secondary covers differences such as diacritics, 'a' vs 'á'. And tertiary is case, 'a' vs 'A'. While it works differently for non-Latin characters, that is the gist of what they represent. In most cases you have one codepoint mapped to one collation array, though in many cases this is not true.

    Single codepoints can map to multiple collation array elements. Sequences of codepoints can also map to one or more than one collation array elements.

    Some sequences also can exist inside others.

    So the string xyz may have one set of collation elements but xy has another, where x, y and z are codepoints in a three-codepoint sequence with its own set of multiple collation keys.

    So, how do these collation elements translate into sorting the codepoints?

    [.0706.0020.0002], [.06D9.0020.0002]

    You take the two primary values, then append a 0 as a separator. Then push the secondary values, append another 0 as a separator, and then push on the tertiary values:

    0706, 06D9, 0, 0020, 0020, 0, 0002, 0002
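    Here is a rough Perl 6 sketch (not the MoarVM C code) of that linearization, using the two collation arrays shown above:

    my @elems = [0x0706, 0x0020, 0x0002], [0x06D9, 0x0020, 0x0002];

    # primaries, a zero separator, secondaries, another separator, then tertiaries
    my @key = flat @elems.map(*[0]), 0, @elems.map(*[1]), 0, @elems.map(*[2]);

    say @key.map(*.fmt('%04X')).join(', ');
    # 0706, 06D9, 0000, 0020, 0020, 0000, 0002, 0002  (the 0000s are the separators)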

    Now this would pose a problem since we would need to traverse the entire string before making any decisions. Instead, what I have decided to do is to keep the [primary.secondary.tertiary] arrays and push them onto a stack rather than changing them into a linear progression, iterate through the primaries, and then grab more collation elements as they are needed to resolve ties.

    Also, when collation data for the next codepoint is added to the stack, if it is the starter of a sequence we will also pull in the next codepoint, going through a linked list stored in C arrays as needed. If the next codepoint ends up not being part of a sequence, we just push the codepoint we "peeked" at onto the stack as well, so we don't have to go back over codepoints.

    Now this improved Unicode Collation Algorithm is not complete, but I am continuing to work on integrating the new C data structure I've created into MoarVM, and it currently works partially, but not as well as the current implementation.

    Improvements to the Current UCA Implementation

    In the meantime I have made improvements to the current implementation of the Unicode Collation Algorithm. Previously it was possible to enable or disable the primary, secondary or tertiary levels. This allowed you to do things such as ignore diacritics or ignore casing when sorting. What you are now able to do is reverse the sorting of the different levels. This allows you, for example, to sort uppercase letters before lowercase (default UCA sorts lowercase before uppercase, since lowercase < uppercase). It can also let you put characters with diacritics before the plain letters. Any of the three levels can be either enabled, disabled, or reversed. For anybody already using it, supplying True or False to set $*COLLATION still works the same as before, but you are now able to supply 1, 0, or -1 to enable, disable or reverse the collation for specific levels.
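    A minimal sketch of that interface, assuming the level names from the description above (collation support was still experimental at the time):

    use experimental :collation;

    # 1 = enable, 0 = disable, -1 = reverse a level; here the case (tertiary) level is reversed
    $*COLLATION.set(:primary(1), :secondary(1), :tertiary(-1));

    # collate sorts with the UCA using the levels configured in $*COLLATION,
    # so uppercase letters should now sort before their lowercase counterparts
    say <b A a B>.collate;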

    Fixes

    Grapheme Cluster Break

    As I said last week I made improvements to the script that tests our breakup of graphemes. Now we have full support for the Prepend property that was added in Unicode 9.0, as well as passing all the tests for regional indicators. The only tests we now don't pass in GraphemeClusterBreakTest.t are a few emoji tests, and I believe we only fail 3 or so of these! The Prepend mark fixes needed us to save more state across parsing the code, as Prepend is different from all other Unicode grapheme break logic in that it comes before not after a base character.

    Ignorecase+Ignoremark Regex

    The longstanding bug I mentioned in my previous status report has now been fixed. The bug was in regex when both ignorecase and ignoremark adverbs were used.

    say "All hell is breaking loose" ~~ m:i:m/"All is fine, I am sure of it"/
    # OUTPUT«「All hell is breaking loose」␤» Output before the fix. This should not have matched.
    

    This bug occurred when the entire length of the haystack was searched and all of the graphemes matched the needle.

    If the needle exceeded the length of the haystack past that point, it would erroneously think there was a match there, as it only checked that it matched the whole length of the haystack.

    This would cause 'fgh' to be found in 'abcdefg'. It only occurred at the very end of the haystack.

    The internal string_equal_at_ignore_case_INTERNAL_loop now returns -1 if there was no match and 0 or more if there was a match at that index.

    This return value provides new information: 0 if there was a match, or some positive integer when the haystack was expanded during casefolding.

    As explained in my previous post, information about when characters expand when foldcased must be retained.

    This information is planned to be exposed in some way at a future date. For example, if we are searching for 'st' inside a string 'stabc', nqp::indexic (index ignorecase) will indicate that it is located at index 0, and in Perl 6 Rakudo it will return 'sta' when it should instead have returned 'st'.

    For now this additional information is only internal and the return values of the nqp::indexic_s and nqp::equatic_s ops have not changed.

    NQP Codepath Problems…

    Previously there were way too many different codepaths handling the different variations of regex adverbs: none, ignorecase, ignoremark, and ignorecase+ignoremark. Problematically, each combination had its own codepath. To really solve this bug and improve the code quality I decided to clean it up and correct this.

    In my past work I had already added a nqp::indexic op, so it was time to add another! I added a nqp::indexicim op and a nqp::eqaticim op and was able to reuse most of the code and not increase our code burden much on the MoarVM side, and greatly reduce the possibility for bugs to get in on varying combinations of ignorecase/ignoremark ops.

    This was a very longstanding Unicode bug (I don't think both adverbs together ever worked), so it's great that it is now fixed :).

    Coming Up

    I will be continuing to fix the issues in the new Unicode Collation Algorithm implementation as I described earlier in this post. I also plan on taking stock of all of the current Grapheme Cluster Break issues, which now only exist for certain Emoji (though the vast majority of Emoji work properly).

    I will also be preparing my talks for the Amsterdam Perl conference as well!

    Sidenote

    I released a new module, Font::QueryInfo which allows you to query font information using FreeType. It can even return the codepoints a font supports as a list of ranges!

    Perlgeek.de: Living on the (b)leading edge

    Published by Moritz Lenz on 2017-06-24T22:00:01

    Perl 6 is innovative in many ways, and sometimes we don't fully appreciate all the implications, for good or for bad.

    There's one I stumbled upon recently: The use of fancy Unicode symbols for built-in stuff. In this case: the `.gist` output of Match objects. For example

    my token word { \w+ }
    say 'abc=def' ~~ /<word> '=' <word>/;
    
    produces this output:
    「abc=def」
     word => 「abc」
     word => 「def」
    

    And that's where the problems start. In my current quest to write a book on Perl 6 regexes, I noticed that the PDF that LeanPub generates from my Markdown sources doesn't correctly display those pesky 「」 characters, which are

    $ uni -c 「」
    「 - U+0FF62 - HALFWIDTH LEFT CORNER BRACKET
    」 - U+0FF63 - HALFWIDTH RIGHT CORNER BRACKET
    

    When I copied the text from the PDF and pasted into my editor, they showed up correctly, which indicates that the characters are likely missing from the monospace font.

    The toolchain allows control over the font used for displaying code, so I tried all the monospace fonts that were available. I tried them in alphabetical order. Among the earlier fonts I tried was Deja Vu Sans Mono, which I use in my terminal, and which hasn't let me down yet. No dice. I arrived at Noto, a font designed to cover all Unicode codepoints. And it didn't work either. So it turns out these two characters are part of some Noto Sans variants, but not of the monospace font.

    My terminal, and even some font viewers, use some kind of fallback where they use glyphs from other fonts to render missing characters. The book generation toolchain does not.

    The Google Group for Leanpub was somewhat helpful: if I could recommend an Open Source monospace font that fits my needs, they'd likely include it in their toolchain.

    So I searched and searched, learning more about fonts than I wanted to know. My circle of geek friends came up with several suggestions, one of them being Iosevka, which actually contains those characters. So now I wait for others to step up, either for LeanPub to include that font, or for the Noto maintainers to create a monospace variant of those characters (and then LeanPub updating their version of the font).

    And all of that because Perl 6 was being innovative, and used two otherwise little-used characters as delimiters, in an attempt to avoid collisions between delimiters and content.

    (In the mean time I've replaced the two offending characters with ones that look similar. It means the example output is technically incorrect, but at least it's readable).

    Perlgeek.de: Perl 6 Books Landscape in June 2017

    Published by Moritz Lenz on 2017-06-07T22:00:01

    There are lots of news around Perl 6 books to share these days. If you follow the community very closely, you might be aware of most of it. If not, read on :-).

    Think Perl 6 is now available for purchase, and also for download as a free ebook. Heck, it's even Open Source, with the LaTeX sources on GitHub!

    Perl 6 at a Glance, previously only available in print form, is now available as an ebook. Save paper and shipping costs!

    My own book, Perl 6 Fundamentals, is now in the "production" phase: copyediting, indexing, layout. And just before the manuscript submission deadline, Larry Wall has contributed a foreword. How awesome is that?

    I've revamped perl6book.com to provide a short overview of the current and future Perl 6 books. As a small gimmick, it contains a flow chart explaining which book to choose. And I even got input from two other Perl 6 book authors (Laurent Rosenfeld of "Think Perl 6", and Andrew Shitov of "Perl 6 at a Glance" and "Migrating to Perl 6").

    From a pull request to perl6book.com, it looks like Andrew Shitov is working on two more Perl 6 books. Keep 'em coming!

    Last but not least, Gabor Szabo has started a crowd funding campaign for a Perl 6 book on web app development. There are still a few days left, so you can help it succeed!

    And as always, if you want to keep informed about Perl 6 books, you can sign up at perl6book.com for my Perl 6 books mailing list (low volume, typically less than one email per month).

    samcv: Grant Status Update 1

    Published on 2017-06-02T07:00:00

    This is my first grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

    I was not able to work quite as many hours as I would have liked this month, but I still made quite a lot of progress.

    Improvement for Tests

    Merged In

    In Roast there is a new version of GraphemeBreakTest.t.

    The script tests the contents of each grapheme individually from the GraphemeClusterBreak.txt file from the Unicode 9.0 test suite.

    Previously we only checked the total number of ‘.chars’ for the string as a whole. Obviously we want something more precise than that, since the test specifies the location of each of the breaks between codepoints. The new code checks that codepoints are put in the correct graphemes in the proper order. In addition we also check the string length.

    This new test uses a grammar to parse the file and generally is much more robust than the previous script.

    Running the parse class generates an array of arrays. The index of the outer array indicates the grapheme, while the inner arrays indicate which codepoints should be in that grapheme.

    [[10084, 776], [9757]]

    The array above would indicate that the 1st grapheme is made up of codepoints 10084 and 776, while the 2nd grapheme is made up of codepoint 9757. This allows us to easily test the contents of each grapheme.

    The array shown above corresponds to the following line from the Unicode data file:

    ÷ 2764 × 0308 ÷ 261D ÷ where ÷ means break and × means no-break
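    Here is a rough plain-Perl 6 sketch of that per-grapheme check (the real test uses a grammar over the data file, as described above):

    my @expected = [10084, 776], [9757];    # codepoints expected in each grapheme
    my $str = chrs @expected.flat;          # build the test string from the codepoints

    say $str.chars == @expected.elems;      # True: the right number of graphemes
    for ^@expected -> $i {
        # .comb yields the graphemes; .ords gives the codepoints inside one of them
        say $str.comb[$i].ords.join(' ') eq @expected[$i].join(' ');   # True
    }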

    Work in Progress

    I have some tests which are currently unmerged, although sections of them are complete and are being incorporated into the larger Unicode Database Retrofit, reusing this code.

    I have written grammars and modules to process and provide data on the PropertyValueAliases and PropertyAliases. They will be used for testing that all of the canonical property names and all the property values themselves properly resolve to separate property codes, as well as that they are usable in regex.

    Work on the Unicode Database Retrofit

    As part of my grant work I am working on making Unicode property values distinct per property, and also on allowing all canonical Unicode property values to work. For a background on this see my previous post about Unicode Property Names. The WIP generated code can be seen in this gist here and was generated from UCD-gen.p6. The code resolves property name and property value command line arguments and matches them with property codes and property value codes. It is also case-insensitive and ignores underscores, as the Unicode spec says is permissible. In addition it is also deduplicated, meaning we only store one hash per unique set of property values.

    For example: Script and Script_Extensions both have the same values, so we don't store these more than once; likewise for the Boolean property values. The C program resolves the property string to a unique property code, and from there is able to look up the property value code. Note: aside from the property values which specify the lack of a property, these codes are internal and have no relation to the Unicode spec, for example Grapheme_Cluster_Break=Other is designated as property value 0.
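    As a hypothetical illustration of that loose matching (the real resolver is generated C code, as described above):

    # Property names and values compare case-insensitively and with underscores ignored.
    sub loose-name(Str $name --> Str) {
        $name.lc.subst('_', '', :g)
    }

    say loose-name('Grapheme_Cluster_Break') eq loose-name('graphemeclusterbreak');  # True
    say loose-name('Script_Extensions')      eq loose-name('scriptextensions');      # True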

    Docs

    I've also started adding some documentation to my Unicode-Grant wiki with information about what is enclosed in each Unicode data file; there are a few other pages as well. This wiki is planned to be expanded to have many more sections than it does currently. https://github.com/samcv/Unicode-Grant/wiki/All-Unicode-Files

    Future Work

    Next I must integrate the property name/value alias resolving code with UCD-gen.p6. UCD-gen.p6 already has a mostly functional Unicode database with a fair number of properties. When these two are integrated, the next step will be to start integrating it with the MoarVM codebase, making any changes to MoarVM or the database retrofit codebase as needed along the way.

    I will also be exploring ways of compressing the mapping of codepoints to unique combinations of Unicode property data in the bitfield. Due to the vast number of codepoints within Unicode, currently the mapping of codepoints to rows in the bitfield takes up many times more space than the actual property value data itself.

    For compressing the Unicode names, it is planned to use base 40 encoding with some additional tricks to save additional space for repeated words. I plan on making a blog post where I go into the details of the compression scheme.

    I am considering also rolling in the ignorecase/ignoremark bug into my grant. Even though it was not originally planned to be part of the Grant, I think it is important enough to warrant inclusion. Currently, using regex using both ignorecase and ignoremark together is completely broken.

    Note

    The work described above has been committed to the two repositories listed below (in addition to the test work described, which was merged into Roast).

    https://github.com/samcv/UCD

    https://github.com/samcv/Unicode-Grant

    brrt to the future: Function call return values

    Published by Bart Wiegmans on 2017-05-25T19:02:00

    Hi there, it's been about a month since last I wrote on the progress of the even-moar-jit branch, so it is probably time for another update.

    Already two months ago I wrote about adding support for function calls in the expression JIT compiler. This was a major milestone as calling C functions is essential for almost everything that is not pure numerical computing. Now we can also use the return values of function calls (picture that!) The main issue with this was something I've come to call the 'garbage restore' problem, by which I mean that the register allocator would attempt to 'restore' an earlier, possibly undefined, version of a value over a value that would result from a function call.

    This has everything to do with the spill strategy used by the compiler. When a value has to be stored to memory (spilled) in order to avoid being overwritten and lost, there are a number of things that can be done. The default, safest strategy is to store a value to memory after every instruction that computes it and to load it from memory before every instruction that uses it. I'll call this a full spill. It is safe because it effectively makes the memory location the only 'true' storage location, with the registers being merely temporary caches. It can also be somewhat inefficient, especially if the code path that forces the spill is conditional and rarely taken. In MoarVM, this happens (for instance) around memory barriers, which are only necessary when creating cross-generation object references.

    That's why around function calls the JIT uses another strategy, which I will call a point spill. What I mean by that is that the (live) values which could be overwritten by the function call are spilled to memory just before the function call, and loaded back into their original registers directly after. This is mostly safe, since under normal control flow, the code beyond the function call point will be able to continue as if nothing had changed. (A variant which is not at all safe is to store the values to memory at the point, and load them from memory in all subsequent code, because it isn't guaranteed that the original spill-point-code is reached in practice, meaning that you overwrite a good value with garbage. The original register allocator for the new JIT suffered from this problem).

    It is only safe, though, if the value that is to be spilled-and-restored is both valid (defined in a code path that always precedes the spill) and required (the value is actually used in code paths that follow the restore). This is not the case, for instance, when a value is the result of a conditional function call, as in the following piece of code:

    1:  my $x = $y + $z;
    2:  if ($y < 0) {
    3:      $x = compute-it($x, $y, $z);
    4:  }
    5:  say "\$x = $x";

    In this code, the value in $x is defined first by the addition operation and then, optionally, by the function call to compute-it. The last use of $x is in the string interpolation on line 5. Thus, according to the compiler, $x holds a 'live' value at the site of the function call on line 3, and so to avoid it from being overwritten, it must be spilled to memory and restored. But in fact, loading $x from memory after compute-it would directly overwrite the new value with the old one.

    The problem here appears to be that when the JIT decides to 'save' the value of $x around the function call, it does not take into account that - in this code path - the last use of the old value of $x is in fact when it is placed on the parameter list to the compute-it call. From the perspective of the conditional branch, it is only the new value of $x which is used on line 5. Between the use on the parameter list and the assignment from the return value, the value of $x is not 'live' at all. This is called a 'live range hole'. It is then the goal to find these holes and to make sure a value is not treated as live when it is in fact not.

    I used an algorithm from a paper by Wimmer and Franz (2010) to find the holes. However, this algorithm relies on having the control flow structure of the program available, which usually requires a separate analysis step. In my case that was fortunately not necessary since this control flow structure is in fact generated by an earlier step in the JIT compilation process, and all that was necessary is to record it. The algorithm itself is really simple and relies on the following ideas:

    I think it goes beyond the scope of this blog post to explain how it works in full, but it is really not very complicated and works very well. At any rate, it was sufficient to prevent the JIT from overwriting good values with bad ones, and allowed me to finally enable functions that return values, which is otherwise really simple.

    When that was done, I obviously tried to use it and immediately ran into some bugs. To fix that, I've improved the jit-bisect.pl script, which wasn't very robust before. The jit-bisect.pl script uses two environment variables, MVM_JIT_EXPR_LAST_FRAME and MVM_JIT_EXPR_LAST_BB, to automatically find the code sequence where the expression compiler fails and compiles wrong code. (These variables tell the JIT compiler to stop running the expression compiler after a certain number of frames and basic blocks. If we know that the program fails with N blocks compiled, we can use binary search between 0 and N to find out which frame is broken). The jit-dump.pl script then provides disassembled bytecode dumps that can be compared and with that, it is usually relatively easy to find out where the JIT compiler bug is.

    With that in hand I've spent my time mostly fixing existing bugs in the JIT compiler. I am now at a stage in which I feel like most of the core functionality is in place, and what is left is about creating extension points and fixing bugs. More on that, however, in my next post. See you then!

    Death by Perl6: Perl Toolchain Summit 2017 - CPAN and Perl6

    Published by Nick Logan on 2017-05-25T05:41:53

    At the 2017 Perl Toolchain Summit (PTS) a lot of stuff got done. This is a brief demonstration style summary of the resulting CPAN-related feature enhancements to zef.

    First I should mention that now Perl6 distributions can be uploaded to CPAN (without needing to add a special Perl6/ folder), and will have their source-url automatically set or replaced with the appropriate CPAN url. Additionally App::Mi6 now has mi6 dist and mi6 upload to make the process even simpler.

    Now let's get started by making sure we are using a version with the features developed at PTS:

    $ zef install "zef:ver(v0.1.15+)"
    All candidates are currently installed  
    No reason to proceed. Use --force to continue anyway  
    

    Perl6 distributions uploaded to CPAN are now indexed. Currently the index is generated by https://github.com/ugexe/Perl6-App--ecogen and stored at https://github.com/ugexe/Perl6-ecosystems alongside a mirror of the existing p6c ecosystem. It is also enabled by default now:

    $ zef list --max=10
    ===> Found via Zef::Repository::Ecosystems<cpan>
    Inline:ver('1.2.1')  
    Inline:ver('1.2')  
    Inline:ver('1')  
    IO::Glob:ver('0.1'):auth('github:zostay')  
    Text::CSV:ver('0.007'):auth('github:Tux')  
    Text::CSV:ver('0.008'):auth('github:Tux')  
    Data::Selector:ver('1.01')  
    Data::Selector:ver('1.02')  
    NativeCall:ver('1')  
    CompUnit::Repository::Mask:ver('0.0.1')  
    Inline::Perl5:ver('0.26'):auth('github:niner')
    
    $ zef info CompUnit::Repository::Mask
    - Info for: CompUnit::Repository::Mask
    - Identity: CompUnit::Repository::Mask:ver('0.0.1')
    - Recommended By: Zef::Repository::Ecosystems<cpan>
    Description:     hide installed modules for testing.  
    License:     Artistic-2.0  
    Source-url:     http://www.cpan.org/authors/id/N/NI/NINE/Perl6/CompUnit-Repository-Mask-0.0.1.tar.gz  
    Provides: 1 modules  
    Depends: 0 items
    

    A distribution can exist in multiple "ecosystems":

    $ zef search Inline::Perl5
    ===> Found 3 results
    -----------------------------------------------------------------------------------------------------------------------
    ID|From                             |Package                                       |Description  
    -----------------------------------------------------------------------------------------------------------------------
    1 |Zef::Repository::LocalCache      |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    2 |Zef::Repository::Ecosystems<cpan>|Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    3 |Zef::Repository::Ecosystems<p6c> |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    -----------------------------------------------------------------------------------------------------------------------
    

    Dependencies can be resolved by any and all of the available ecosystems, so distributions can be put on CPAN and still have their dependencies that aren't on CPAN get resolved:

    $ zef -v install Inline::Perl5
    ===> Searching for: Inline::Perl5
    ===> Found: Inline::Perl5:ver('0.26'):auth('github:niner') [via Zef::Repository::Ecosystems<cpan>]
    ===> Searching for missing dependencies: LibraryMake, File::Temp
    ===> Found dependencies: File::Temp [via Zef::Repository::Ecosystems<p6c>]
    ===> Found dependencies: LibraryMake:ver('1.0.0'):auth('github:retupmoca') [via Zef::Repository::LocalCache]
    ===> Searching for missing dependencies: Shell::Command, File::Directory::Tree
    ===> Found dependencies: Shell::Command, File::Directory::Tree:auth('labster') [via Zef::Repository::Ecosystems<p6c>]
    ===> Searching for missing dependencies: File::Which, File::Find
    ===> Found dependencies: File::Find:ver('0.1'), File::Which [via Zef::Repository::Ecosystems<p6c>]
    
    ...<more output>...
    

    In addition to CPAN we have access to CPAN testers. Garu worked with me to create a perl6 cpan testers report module: Zef::CPANReporter

    $ zef install Zef::CPANReporter
    ===> Searching for: Zef::CPANReporter
    ===> Searching for missing dependencies: Net::HTTP
    ===> Testing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Testing [OK] for Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Testing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    ===> Testing [OK] for Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    ===> Installing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Installing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    
    # ...and in use:
    
    $ zef -v install Grammar::Debugger
    ===> Searching for: Grammar::Debugger
    ===> Found: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') [via Zef::Repository::Ecosystems<p6c>]
    ===> Searching for missing dependencies: Terminal::ANSIColor
    ===> Found dependencies: Terminal::ANSIColor:ver('0.3') [via Zef::Repository::Ecosystems<p6c>]
    ===> Fetching [OK]: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') to /Users/ugexe/.zef/tmp/grammar-debugger.git
    ===> Fetching [OK]: Terminal::ANSIColor:ver('0.3') to /Users/ugexe/.zef/tmp/Terminal-ANSIColor.git
    ===> Testing: Terminal::ANSIColor:ver('0.3')
    t/00-load.t .. ok  
    All tests successful.  
    Files=1, Tests=1,  0 wallclock secs  
    Result: PASS  
    ===> Testing [OK] for Terminal::ANSIColor:ver('0.3')
    Report for Terminal::ANSIColor:ver('0.3') will be available at http://www.cpantesters.org/cpan/report/a9ed11ac-4108-11e7-9b92-c8514edb94d5  
    ===> Testing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    t/debugger.t .. ok  
    t/ltm.t ....... ok  
    t/tracer.t .... ok  
    All tests successful.  
    Files=3, Tests=3,  1 wallclock secs  
    Result: PASS  
    ===> Testing [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    Report for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') will be available at http://www.cpantesters.org/cpan/report/ab9c911c-4108-11e7-a777-ca182bdd3934  
    ===> Installing: Terminal::ANSIColor:ver('0.3')
    ===> Install [OK] for Terminal::ANSIColor:ver('0.3')
    ===> Installing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    ===> Install [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    

    This work (and my attendance) was made possible by a bunch of great perl companies and people:

    Booking.com, ActiveState, cPanel, FastMail, MaxMind, Perl Careers, MongoDB, SureVoIP, Campus Explorer, Bytemark, CAPSiDE, Charlie Gonzalez, Elastic, OpusVL, Perl Services, Procura, XS4ALL, Oetiker+Partner.

    rakudo.org: Announce: Rakudo Star Release 2017.04

    Published by Steve Mynott on 2017-05-01T15:35:01

    A useful and usable production distribution of Perl 6

    On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the April 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the April 2017 release is available from https://rakudo.perl6.org/downloads/star/.

    Binaries for macOS and Windows (64 bit) are also available.

    This is the seventh post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

    This release includes “zef” as module installer. “panda” is to be shortly replaced by “zef” and will be removed in the near future.

    It’s hoped to produce quarterly Rakudo Star releases during 2017 with 2017.07 (July) and 2017.10 (October) to follow.

    Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

    In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

    This Star release includes release 2017.04.3 of the Rakudo Perl 6 compiler, version 2017.04-53-g66c6dda of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

    The Rakudo compiler changes since the last Rakudo Star release of 2017.01 are now listed in “2017.02.md” and “2017.04.md” under the “rakudo/docs/announce” directory of the source distribution.

    In particular this release featured many important improvements to the IO subsystem thanks to Zoffix and the support of the Perl Foundation.

    Please see
    Part 1: http://rakudo.org/2017/04/02/upgrade
    Part 2: http://rakudo.org/2017/04/03/part-2
    Part 3: http://rakudo.org/2017/04/17/final-notes

    Note there were point releases of 2017.04 so also see “2017.04.1.md”, “2017.04.2.md” and “2017.04.3.md”.

    Notable changes in modules shipped with Rakudo Star:

    + DBIish: New version with pg-consume-input
    + doc: Too many to list. Large number of “IO Grant” doc changes.
    + json_fast: Too many to list. Big performance improvements.
    + perl6-lwp-simple: Fix for lexical require and incorrect regex for absolute URL matcher
    + test-mock: Enable concurrent use of mock objects
    + uri: Encoding fixes
    + zef: Too many to list. IO fixage.

    There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

    + advanced macros
    + non-blocking I/O (in progress)
    + some bits of Synopsis 9 and 11

    There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

    In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

    See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

    The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

    brrt to the future: Letting templates do what you mean

    Published by Bart Wiegmans on 2017-04-30T22:12:00

    Hi everybody, today I'd like to promote a minor, but important improvement in the 'expression template compiler' for the new JIT backend. This is a tool designed to make it easy to develop expression templates, which are themselves a way to make it easy to generate the 'expression tree' intermediate representation used by the new JIT backend. This is important because MoarVM instructions operate on a perl-like level of abstraction - single instructions can perform operations such as 'convert object to string', 'find first matching character in string' or 'access the last element of an array'. Such operations require rather more instructions to represent as machine code.

    This level of abstraction is rather convenient for the rakudo compiler, which doesn't have to consider low-level details when it processes your perl6 code. But it is not very convenient for the JIT compiler which does. The 'expression' intermediate representation is designed to be much closer to what hardware can support directly. Basic operations include loading from and storing to memory, memory address computation, integer arithmetic, (conditional) branching, and function calls. At some point in the future, floating point operations will also be added. But because of this difference in abstraction level, a single MoarVM instruction will often map to many expression tree nodes. So what is needed is an efficient way to convert between the two representations, and that is what expression templates are supposed to do.

    Expression templates are very much like the expression tree structure itself, in that both are represented as arrays of integers. Some of the elements represent instructions, some are constants, and some are references (indexes into the same array), forming a directed acyclic graph (not a tree). The only difference is that the template is associated with a set of instructions that indicate how it should be linked into the tree. (Instruction operands, i.e. the data that each instruction operates on, are prepared and linked by the template application process as well).

    Surprisingly, arrays of integers aren't a very user-friendly way to write instruction templates, and so the template compiler was born. It takes as input a text file with expression templates defined as symbolic expressions, best known from the LISP world, and outputs a header file that contains the templates, ready for use by the JIT compiler. Note that the word 'template' has become a bit overloaded, referring to the textual input of the template compiler as well as to the binary input to the JIT compiler. That's okay, I guess, since they're really two representations of the same thing. The following table shows how template text, binary, and expression tree relate to each other:

    Text:

    (template: unless_i
        (when
            (zr $0)
            (branch (label $1))
        ))

    'Binary':

    template: {
        MVM_JIT_ZR,
        0,
        MVM_JIT_LABEL,
        1,
        MVM_JIT_BRANCH,
        2,
        MVM_JIT_WHEN,
        0,
        4,
    },
    info: ".f.f.l.ll",
    len: 9,
    root: 6

    Tree: (diagram)

    I hope it isn't too hard to see how one maps to the other. The unless_i instruction executes a branch if its integer argument is zero, specified by a constant as its second argument. All symbols (like when, label and zr) have been replaced by uppercase prefixed constants (MVM_JIT_WHEN), and all nesting has been replaced by references (indexes) into the template array. The 'info' string specifies how the template is to be linked into the tree. Instruction operands are indicated by an 'f', and internal links by an 'l'. In the tree representation the operands have been linked into the tree by the JIT; they form the LOAD and CONST nodes and everything below them.

    Anyway, my improvement concerns a more complex form of template, such as the following example, an instruction to load an object value from the instance field of an object:

    (template: sp_p6oget_o
        (let: (($val (load (add (^p6obody $1) $2) ptr_sz)))
            (if (nz $val) $val (^vmnull))))

    This template contains a let: expression, which declares the $val variable. This value can be used in the subsequent expression by its name. Without such declarations the result of a computation could only have one reference, its immediate syntactic parent. (Or in other words, without let:, every template can only construct a tree). That is very inconvenient in case a result should be checked for null-ness, as in this case. (vmnull is a macro for the global 'null object' in MoarVM. The null object represents NULL wherever an object is needed, but isn't actually NULL, as that would mean it couldn't be dereferenced; it saves the interpreter from checking if a pointer to an object is NULL everywhere it is accessed).

    The let: construct has another purpose: it ensures the ordering of operations. Although most operations can be ordered in whatever way suits the compiler, some do not, most notably function calls. (Function calls may have numerous unpredictable side effects, after all). All statements declared in the 'let declaration body' are compiled to run before any statements in the 'expression body'. This enables the programmer to ensure that a value is not needlessly computed twice, and more importantly, it ensures that a value that is used in multiple branches of a conditional statement is defined in both of them. For instance:


    (let (($foo (...)))
        (if (...)
            (load $foo)
            $foo))

    This pseudo-snippet of template code would dereference $foo if some condition is met (e.g. $foo is not NULL) and returns $foo directly otherwise. Without let to order the computation of $foo prior to the blocks of if, the first (conditional) child of if would be the first reference to $foo. That would mean that the code to compute $foo is only compiled in the first conditional block, which would not be executed whenever the if condition was not true, meaning that $foo would be undefined in the alternative conditional block. This would mean chaos. So in fact let does order expressions. All is good.

    Except... I haven't told you how this ordering works, which is where my change comes in. Prior to commit 7fb1b10 the let expression would insert a hint to the JIT compiler to add the declared expressions as tree roots. The 'tree roots' are where the compiler starts converting the expression tree (graph) to a linear sequence of byte code. Hence the declaring expressions are compiled prior to the dependent expressions. But this has, of course, one big disadvantage, which is that the set of roots is global for the tree. Every declaration, no matter how deep into the tree, was to be compiled prior to the head of the tree. As a result, the following template code would not at all do what you want:


    (let (($foo (...)))
        (if (nz $foo)
            (let (($bar (load $foo)))  # dereference $foo !
                (... $bar))
            ...))


    The declaration of $bar would cause $foo to be dereferenced prior to checking whether it is non-null, causing a runtime failure. Chaos is back. Well, that's what I've changed. Fortunately, we have another ordering mechanism at our disposal, namely DO lists. These are nodes with a variable number of children that are also promised to be compiled in order. After the patch linked above, the compiler now transforms let expressions into the equivalent DO expressions. Because DO expressions can be nested safely, $bar is not computed prior to the null-check of $foo, as the programmer intended. I had originally intended to implement analysis to automatically order the expressions with regard to the conditionals, but I found that this was more complicated to implement and more surprising to the programmer. I think that in this case, relying on the programmer is the right thing.

    One thing that I found interesting is that this reduces the number of mechanisms in the compiler. The 'root-hint' was no longer useful, and subsequently removed. At the same time, all but the last child of a DO list must be void expressions, i.e. yield no value, because DO can only return the value of its last child. Since all expressions in a let declaration must yield some value - otherwise they would be useless - they required a new operation type: discard. Thus with a new node type (extension of data range) we can remove a class of behavior.

    After I had implemented this, I've started working on adding basic block analysis. That is a subject for a later post, though. Until next time!

    Strangely Consistent: The root of all eval

    Published by Carl Mäsak

    Ah, the eval function. Loved, hated. Mostly the latter.

    $ perl -E'my $program = q[say "OH HAI"]; eval $program'
    OH HAI
    

    I was a bit stunned when the eval function was renamed to EVAL in Perl 6 (back in 2013, after spec discussion here). I've never felt really comfortable with the rationale for doing so. I seem to be more or less alone in this opinion, though, which is fine.

    The rationale was "the function does something really weird, so we should flag it with upper case". Like we do with BEGIN and the other phasers, for example. With BEGIN and others, the upper-casing is motivated, I agree. A phaser takes you "outside of the normal control flow". The eval function doesn't.

    Other things that we upper-case are things like .WHAT, which look like attributes but are really specially code-generated at compile-time into something completely different. So even there the upper-casing is motivated because something outside of the normal is happening.

    eval in the end is just another function. Yes, it's a function with potentially quite wide-ranging side effects, that's true. But a lot of fairly standard functions have wide-ranging side effects. (To name a few: shell, die, exit.) You don't see anyone clamoring to upper-case those.

    I guess it could be argued that eval is very special because it hooks into the compiler and runtime in ways that normal functions don't, and maybe can't. (This is also how TimToady explained it in the commit message of the renaming commit.) But that's an argument from implementation details, which doesn't feel satisfactory. It applies with equal force to the lower-cased functions just mentioned.

    To add insult to injury, the renamed EVAL is also made deliberately harder to use:

    $ perl6 -e'my $program = q[say "OH HAI"]; EVAL $program'
    ===SORRY!=== Error while compiling -e
    EVAL is a very dangerous function!!! (use the MONKEY-SEE-NO-EVAL pragma to override this error,
    but only if you're VERY sure your data contains no injection attacks)
    at -e:1
    ------> program = q[say "OH HAI"]; EVAL $program⏏<EOL>
    
    $ perl6 -e'use MONKEY-SEE-NO-EVAL; my $program = q[say "OH HAI"]; EVAL $program'
    OH HAI
    

    Firstly, injection attacks are a real issue, and no laughing matter. We should educate each other and newcomers about them.

    Secondly, that error message ("EVAL is a very dangerous function!!!") is completely over-the-top in a way that damages rather than helps. I believe when we explain the dangers of code injection to people, we need to do it calmly and matter-of-factly. Not with three exclamation marks. The error message makes sense to someone who already knows about injection attacks; it provides no hints or clues for people who are unaware of the risks.

    (The Perl 6 community is not unique in eval-hysteria. Yesterday I stumbled across a StackOverflow thread about how to turn a string with a type name into the corresponding constructor in JavaScript. Some unlucky soul suggested eval, and everybody else immediately piled on to point out how irresponsible that was. Solely as a knee-jerk reaction "because eval is bad".)

    Thirdly, MONKEY-SEE-NO-EVAL. Please, can we just... not. 😓 Random reference to monkeys and the weird attempt at levity while switching on a nuclear-chainsaw function aside, I find it odd that a function that enables EVAL is called something with NO-EVAL. That's not Least Surprise.

    Anyway, the other day I realized how I can get around both the problem of the all-caps name and the problem of the necessary pragma:

    $ perl6 -e'my &eval = &EVAL; my $program = q[say "OH HAI"]; eval $program'
    OH HAI
    

    I was so happy to realize this that I thought I'd blog about it. Apparently the very dangerous function (!!!) is fine again if we just give it back its old name. 😜

    rakudo.org: PART 3: Information on Changes Due to IO Grant Work

    Published by Zoffix Znet on 2017-04-17T20:22:46

    The IO grant work is at its wrap up. This note lists some of the last-minute changes to the plans delineated in earlier communications ([1], [2], [3]). Most of the listed items do not require any changes to users’ code.

    Help and More Info

    If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel