pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2017-03-21T18:19:07


Weekly changes in and around Perl 6: 2017.12 How To Race A Hyper

Published by liztormato on 2017-03-20T21:54:17

Jonathan Worthington has focused his attention on race and hyper, two ways of making your code run in parallel more or less automatically. In his most recent blog post he describes the history behind hyper and race, and how he is (re-) considering the exact semantics for the coming 6.d version of Perl 6. Please check out this spreadsheet and let him know what you think about it.

And Yet Another Compiler Release

It was that time of the month again. Zoffix Znet and his trusty bots released Rakudo Perl 6 compiler, Release #109 (aka 2017.03). It’s getting almost as normal as landing a first stage rocket in your back garden. Anyways, this release has fixed the lexical module loading bug, which changes the way symbols exposed by a module are visible in your code. This was announced last December. Should you not have changed your code yet to take this bug fix into account, you can take a look at the Lexical require upgrade info, posted by Zoffix Znet.

Core Developments

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Ecosystem Additions

An interesting addition for packaging up your applications, an easier way to do an HTTP client, and an interface to ftplib!

Winding Down

Still very busy. Even though it apparently is spring now, it feels more like autumn. Let the sunshine in! See you next week, for more Perl 6 news


Perlgeek.de: Perl 6 By Example: Plotting using Matplotlib and Inline::Python

Published by Moritz Lenz on 2017-03-18T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Occasionally I come across git repositories, and want to know how active they are, and who the main developers are.

Let's develop a script that plots the commit history, and explore how to use Python modules in Perl 6.

Extracting the Stats

We want to plot the number of commits by author and date. Git makes it easy for us to get to this information by giving some options to git log:

my $proc = run :out, <git log --date=short --pretty=format:%ad!%an>;
my (%total, %by-author, %dates);
for $proc.out.lines -> $line {
    my ( $date, $author ) = $line.split: '!', 2;
    %total{$author}++;
    %by-author{$author}{$date}++;
    %dates{$date}++;
}

run executes an external command, and :out tells it to capture the command's output, and makes it available as $proc.out. The command is a list, with the first element being the actual executable, and the rest of the elements are command line arguments to this executable.

Here git log gets the options --date short --pretty=format:%ad!%an, which instructs it to print produce lines like 2017-03-01!John Doe. This line can be parsed with a simple call to $line.split: '!', 2, which splits on the !, and limits the result to two elements. Assigning it to a two-element list ( $date, $author ) unpacks it. We then use hashes to count commits by author (in %total), by author and date (%by-author) and finally by date. In the second case, %by-author{$author} isn't even a hash yet, and we can still hash-index it. This is due to a feature called autovivification, which automatically creates ("vivifies") objects where we need them. The use of ++ creates integers, {...} indexing creates hashes, [...] indexing and .push creates arrays, and so on.

To get from these hashes to the top contributors by commit count, we can sort %total by value. Since this sorts in ascending order, sorting by the negative value gives the list in descending order. The list contains Pair objects, and we only want the first five of these, and only their keys:

my @top-authors = %total.sort(-*.value).head(5).map(*.key);

For each author, we can extract the dates of their activity and their commit counts like this:

my @dates  = %by-author{$author}.keys.sort;
my @counts = %by-author{$author}{@dates};

The last line uses slicing, that is, indexing an array with list to return a list elements.

Plotting with Python

Matplotlib is a very versatile library for all sorts of plotting and visualization. It's written in Python and for Python programs, but that won't stop us from using it in a Perl 6 program.

But first, let's take a look at a basic plotting example that uses dates on the x axis:

import datetime
import matplotlib.pyplot as plt

fig, subplots = plt.subplots()
subplots.plot(
    [datetime.date(2017, 1, 5), datetime.date(2017, 3, 5), datetime.date(2017, 5, 5)],
    [ 42, 23, 42 ],
    label='An example',
)
subplots.legend(loc='upper center', shadow=True)
fig.autofmt_xdate()
plt.show()

To make this run, you have to install python 2.7 and matplotlib. You can do this on Debian-based Linux systems with apt-get install -y python-matplotlib. The package name is the same on RPM-based distributions such as CentOS or SUSE Linux. MacOS users are advised to install a python 2.7 through homebrew and macports, and then use pip2 install matplotlib or pip2.7 install matplotlib to get the library. Windows installation is probably easiest through the conda package manager, which offers pre-built binaries of both python and matplotlib.

When you run this scripts with python2.7 dates.py, it opens a GUI window, showing the plot and some controls, which allow you to zoom, scroll, and write the plot graphic to a file:

Basic matplotlib plotting window

Bridging the Gap

The Rakudo Perl 6 compiler comes with a handy library for calling foreign functions, which allows you to call functions written in C, or anything with a compatible binary interface.

The Inline::Python library uses the native call functionality to talk to python's C API, and offers interoperability between Perl 6 and Python code. At the time of writing, this interoperability is still fragile in places, but can be worth using for some of the great libraries that Python has to offer.

To install Inline::Python, you must have a C compiler available, and then run

$ zef install Inline::Python

(or the same with panda instead of zef, if that's your module installer).

Now you can start to run Python 2 code in your Perl 6 programs:

use Inline::Python;

my $py = Inline::Python.new;
$py.run: 'print("Hello, Pyerl 6")';

Besides the run method, which takes a string of Python code and execute it, you can also use call to call Python routines by specifying the namespace, the routine to call, and a list of arguments:

use Inline::Python;

my $py = Inline::Python.new;
$py.run('import datetime');
my $date = $py.call('datetime', 'date', 2017, 1, 31);
$py.call('__builtin__', 'print', $date);    # 2017-01-31

The arguments that you pass to call are Perl 6 objects, like three Int objects in this example. Inline::Python automatically translates them to the corresponding Python built-in data structure. It translate numbers, strings, arrays and hashes. Return values are also translated in opposite direction, though since Python 2 does not distinguish properly between byte and Unicode strings, Python strings end up as buffers in Perl 6.

Object that Inline::Python cannot translate are handled as opaque objects on the Perl 6 side. You can pass them back into python routines (as shown with the print call above), or you can also call methods on them:

say $date.isoformat().decode;               # 2017-01-31

Perl 6 exposes attributes through methods, so Perl 6 has no syntax for accessing attributes from foreign objects directly. If you try to access for example the year attribute of datetime.date through the normal method call syntax, you get an error.

say $date.year;

Dies with

'int' object is not callable

Instead, you have to use the getattr builtin:

say $py.call('__builtin__', 'getattr', $date, 'year');

Using the Bridge to Plot

We need access to two namespaces in python, datetime and matplotlib.pyplot, so let's start by importing them, and write some short helpers:

my $py = Inline::Python.new;
$py.run('import datetime');
$py.run('import matplotlib.pyplot');
sub plot(Str $name, |c) {
    $py.call('matplotlib.pyplot', $name, |c);
}

sub pydate(Str $d) {
    $py.call('datetime', 'date', $d.split('-').map(*.Int));
}

We can now call pydate('2017-03-01') to create a python datetime.date object from an ISO-formatted string, and call the plot function to access functionality from matplotlib:

my ($figure, $subplots) = plot('subplots');
$figure.autofmt_xdate();

my @dates = %dates.keys.sort;
$subplots.plot:
    $[@dates.map(&pydate)],
    $[ %dates{@dates} ],
    label     => 'Total',
    marker    => '.',
    linestyle => '';

The Perl 6 call plot('subplots') corresponds to the python code fig, subplots = plt.subplots(). Passing arrays to python function needs a bit extra work, because Inline::Python flattens arrays. Using an extra $ sigil in front of an array puts it into an extra scalar, and thus prevents the flattening.

Now we can actually plot the number of commits by author, add a legend, and plot the result:

for @top-authors -> $author {
    my @dates = %by-author{$author}.keys.sort;
    my @counts = %by-author{$author}{@dates};
    $subplots.plot:
        $[ @dates.map(&pydate) ],
        $@counts,
        label     => $author,
        marker    =>'.',
        linestyle => '';
}


$subplots.legend(loc=>'upper center', shadow=>True);

plot('title', 'Contributions per day');
plot('show');

When run in the zef git repository, it produces this plot:

Contributions to zef, a Perl 6 module installer

Summary

We've explored how to use the python library matplotlib to generate a plot from git contribution statistics. Inline::Python provides convenient functionality for accessing python libraries from Perl 6 code.

In the next installment, we'll explore ways to improve both the graphics and the glue code between Python and Perl 6.

Subscribe to the Perl 6 book mailing list

* indicates required

rakudo.org: Upgrade Information for Lexical require

Published by Zoffix Znet on 2017-03-18T01:29:32

Upgrade Information for Lexical require

What’s Happening?

Rakudo Compiler release 2017.03 includes the final piece of lexical module loading work: lexical require. This work was first announced in December, in http://rakudo.org/2016/12/17/lexical-module-loading/

There are two changes that may impact your code:

Upgrade Information

Lexical Symbols

WRONG:

# WRONG:
try { require Foo; 1 } and ::('Foo').new;

The require above is inside a block and so its symbols won’t be available
outside of it and the look up will fail.

CHANGE TO:

(try require Foo) !=== Nil and ::('Foo').new;

Now the require installs the symbols into scope that’s lexically accessible
to the ::('Foo') look up.

Optional Loading

WRONG:

# WRONG:
try require Foo;
if ::('Foo') ~~ Failure {
    say "Failed to load Foo!";
}

This construct installs a package named Foo, which would be replaced by the
loaded Foo if it were found, but if it weren’t, the package will remain a
package, not a Failure, and so the above ~~ test will always be False.

CHANGE TO:

# Use return value to test whether loading succeeded:
(try require Foo) === Nil and say "Failed to load Foo!";

# Or use a run-time symbol lookup with require, to avoid compile-time
# package installation:
try require ::('Foo');
if ::('Foo') ~~ Failure {
    say "Failed to load Foo!";
}

In the first example above, we test the return value of try isn’t Nil, since
on successful loading it will be a Foo module, class, or package.

The second example uses a run-time symbol lookup in require and so it never needs
to install the package placeholder during the compile time. Therefore, the
::('Foo') ~~ test does work as intended.

Help and More Info

If you require help or more information, please join our chat channel
#perl6 on irc.freenode.net

6guts: Considering hyper/race semantics

Published by jnthnwrthngtn on 2017-03-16T16:42:05

We got a lot of nice stuff into Perl 6.c, the version of the language released on Christmas of 2015. Since then, a lot of effort has gone on polishing the things we already had in place, and also on optimization. By this point, we’re starting to think about Perl 6.d, the next language release. Perl 6 is defined by its test suite. Even before considering additional features, the 6.d test suite will tie down a whole bunch of things that we didn’t have covered in the 6.c one. In that sense, we’ve already got a lot done towards it.

In this post, I want to talk about one of the things I’d really like to get nailed done as part of 6.d, and that is the semantics of hyper and race. Along with that I will, of course, be focusing on getting the implementation in much better shape. These two methods enable parallel processing of list operations. hyper means we can perform operations in parallel, but we must retain and respect ordering of results. For example:

say (1, 9, 6).hyper.map(* + 5); # (6 14 11)

Should always give the same results as if the hyper was not there, even it a thread computing 6 + 5 gave its result before that computing 1 + 5. (Obviously, this is not a particularly good real-world example, since the overhead of setting up parallel execution would dwarf doing 3 integer operations!) Note, however, that the order of side-effects is not guaranteed, so:

(1..1000).hyper.map(&say);

Could output the numbers in any order. By contrast, race is so keen to give you results that it doesn’t even try to retain the order of results:

say (1, 9, 6).race.map(* + 5); # (14 6 11) or (6 11 14) or ...

Back in 2015, when I was working on the various list handling changes we did in the run up to the Christmas release, my prototyping work included an initial implementation of the map operation in hyper and race mode, done primarily to figure out the API. This then escaped into Rakudo, and even ended up with a handful of tests written for it. In hindsight, that code perhaps should have been pulled out again, but it lives on in Rakudo today. Occasionally somebody shows a working example on IRC using the eval bot – usually followed by somebody just as swiftly showing a busted one!

At long last, getting these fixed up and implemented more fully has made it to the top of my todo list. Before digging into the implementation side of things, I wanted to take a step back and work out the semantics of all the various operations that might be part of or terminate a hyper or race pipeline. So, today I made a list of those operations, and then went through every single one of them and proposed the basic semantics.

The results of that effort are in this spreadsheet. Along with describing the semantics, I’ve used a color code to indicate where the result leaves you in the hyper or race paradigm afterwards (that is, a chained operation will also be performed in parallel).

I’m sure some of these will warrant further discussion and tweaks, so feel free to drop me feedback, either on the #perl6-dev IRC channel or in the comments here.


Weekly changes in and around Perl 6: 2017.11 Tidy Da Tags

Published by liztormato on 2017-03-13T21:58:53

Jeffrey Goff made the first version of Perl6::Tidy available for general usage. As he announced in his tweet, only very few lines were needed to perform this feat (so far). But that’s because all of the hard work he already put into Perl6::Parser! In any case, something that people coming from Perl 5 have frequently asked for in Perl 6 is (a step closer to) reality now.

Tag Your Distributions

Zoffix Znet started a project to make it easier to find Perl 6 distributions in the ecosystem. By adding tags to your distribution, these tags can be used as targets for search queries. He explains it all in a blog post.

Perl Developers Survey 2017

The people at Built In Perl have announced their yearly Perl Developers Survey, and they would like you to participate in it. It will only take a few minutes!

So You’re Experienced in Python

The nice folks of the Perl 6 documentation project have created a page to help programmers coming from Python. It discusses the equivalent syntax in Perl 6 for a number of Python constructs and idioms. There have been similar pages for Perl 5, Ruby and Haskell, in case you were wondering why Python was being advantaged here 🙂 .

VIP ¿Que?

There’s something hovering in the air!

Core Developments

Other Blog Posts

Meanwhile on Twitter

More than 250 followers now! 🙂

Meanwhile on StackOverflow

Ecosystem Additions

A modest catch this week:

Winding Down

It has been a busy week for yours truly. And the following weeks will also be very busy. But that’s all irrelevant for you readers 🙂 . So please check again next week for more Perl 6 news!


Perlgeek.de: What's a Variable, Exactly?

Published by Moritz Lenz on 2017-03-11T23:00:01

When you learn programming, you typically first learn about basic expressions, like 2 * 21, and then the next topic is control structures or variables. (If you start with functional programming, maybe it takes you a bit longer to get to variables).

So, every programmer knows what a variable is, right?

Turns out, it might not be that easy.

Some people like to say that in ruby, everything is an object. Well, a variable isn't really an object. The same holds true for other languages.

But let's start from the bottom up. In a low-level programming language like C, a local variable is a name that the compiler knows, with a type attached. When the compiler generates code for the function that the variable is in, the name resolves to an address on the stack (unless the compiler optimizes the variable away entirely, or manages it through a CPU register).

So in C, the variable only exists as such while the compiler is running. When the compiler is finished, and the resulting executable runs, there might be some stack offset or memory location that corresponds to our understanding of the variable. (And there might be debugging symbols that allows some mapping back to the variable name, but that's really a special case).

In case of recursion, a local variable can exist once for each time the function is called.

Closures

In programming languages with closures, local variables can be referenced from inner functions. They can't generally live on the stack, because the reference keeps them alive. Consider this piece of Perl 6 code (though we could write the same in Javascript, Ruby, Perl 5, Python or most other dynamic languages):

sub outer() {
    my $x = 42;
    return sub inner() {
        say $x;
    }
}

my &callback = outer();
callback();

The outer function has a local (lexical) variable $x, and the inner function uses it. So once outer has finished running, there's still an indirect reference to the value stored in this variable.

They say you can solve any problem in computer science through another layer of indirection, and that's true for the implementation of closures. The &callback variable, which points to a closure, actually stores two pointers under the hood. One goes to the static byte code representation of the code, and the second goes to a run-time data structure called a lexical pad, or short lexpad. Each time you invoke the outer function, a new instance of the lexpad is created, and the closure points to the new instance, and the always the same static code.

But even in dynamic languages with closures, variables themselves don't need to be objects. If a language forbids the creation of variables at run time, the compiler knows what variables exist in each scope, and can for example map each of them to an array index, so the lexpad becomes a compact array, and an access to a variable becomes an indexing operation into that array. Lexpads generally live on the heap, and are garbage collected (or reference counted) just like other objects.

Lexpads are mostly performance optimizations. You could have separate runtime representations of each variable, but then you'd have to have an allocation for each variable in each function call you perform, whereas which are generally much slower than a single allocation of the lexpad.

The Plot Thickens

To summarize, a variable has a name, a scope, and in languages that support it, a type. Those are properties known to the compiler, but not necessarily present at run time. At run time, a variable typically resolves to a stack offset in low-level languages, or to an index into a lexpad in dynamic languages.

Even in languages that boldly claim that "everything is an object", a variable often isn't. The value inside a variable may be, but the variable itself typically not.

Perl 6 Intricacies

The things I've written above generalize pretty neatly to many programming languages. I am a Perl 6 developer, so I have some insight into how Perl 6 implements variables. If you don't resist, I'll share it with you :-).

Variables in Perl 6 typically come with one more level of indirection, we which call a container. This allows two types of write operations: assignment stores a value inside a container (which again might be referenced to by a variable), and binding, which places either a value or a container directly into variable.

Here's an example of assignment and binding in action:

my $x;
my $y;
# assignment:
$x = 42;
$y = 'a string';

say $x;     # => 42
say $y;     # => a string

# binding:
$x := $y;

# now $x and $y point to the same container, so that assigning to one
# changes the other:
$y = 21;
say $x;     # => 21

Why, I hear you cry?

There are three major reasons.

The first is that makes assignment something that's not special. For example in python, if you assign to anything other than a plain variable, the compiler translates it to some special method call (obj.attr = x to setattr(obj, 'attr', x), obj[idx] = x to a __setitem__ call etc.). In Perl 6, if you want to implement something you can assign to, you simply return a container from that expression, and then assignment works naturally.

For example an array is basically just a list in which the elements are containers. This makes @array[$index] = $value work without any special cases, and allows you to assign to the return value of methods, functions, or anything else you can think of, as long as the expression returns a container.

The second reason for having both binding and assignment is that it makes it pretty easy to make things read-only. If you bind a non-container into a variable, you can't assign to it anymore:

my $a := 42;
$a = "hordor";  # => Cannot assign to an immutable value

Perl 6 uses this mechanism to make function parameters read-only by default.

Likewise, returning from a function or method by default strips the container, which avoids accidental action-at-a-distance (though an is rw annotation can prevent that, if you really want it).

This automatic stripping of containers also makes expressions like $a + 2 work, independently of whether $a holds an integer directly, or a container that holds an integer. (In the implementation of Perl 6's core types, sometimes this has to be done manually. If you ever wondered what nqp::decont does in Rakudo's source code, that's what).

The third reason relates to types.

Perl 6 supports gradual typing, which means you can optionally annotate your variables (and other things) with types, and Perl 6 enforces them for you. It detects type errors at compile time where possible, and falls back to run-time checking types.

The type of a variable only applies to binding, but it inherits this type to its default container. And the container type is enforced at run time. You can observe this difference by binding a container with a different constraint:

my Any $x;
my Int $i;
$x := $i;
$x = "foo";     # => Type check failed in assignment to $i; expected Int but got Str ("foo")

Int is a subtype of Any, which is why the binding of $i to $x succeeds. Now $x and $i share a container that is type-constrained to Int, so assigning a string to it fails.

Did you notice how the error message mentions $i as the variable name, even though we've tried to assign to $x? The variable name in the error message is really a heuristic, which works often enough, but sometimes fails. The container that's shared between $x and $i has no idea which variable you used to access it, it just knows the name of the variable that created it, here $i.

Binding checks the variable type, not the container type, so this code doesn't complain:

my Any $x;
my Int $i;
$x := $i;
$x := "a string";

This distinction between variable type and container type might seem weird for scalar variables, but it really starts to make sense for arrays, hashes and other compound data structures that might want to enforce a type constraint on its elements:

sub f($x) {
    $x[0] = 7;
}
my Str @s;
f(@s);

This code declares an array whose element all must be of type Str (or subtypes thereof). When you pass it to a function, that function has no compile-time knowledge of the type. But since $x[0] returns a container with type constraint Str, assigning an integer to it can produce the error you expect from it.

Summary

Variables typically only exists as objects at compile time. At run time, they are just some memory location, either on the stack or in a lexical pad.

Perl 6 makes the understanding of the exact nature of variables a bit more involved by introducing a layer of containers between variables and values. This offers great flexibility when writing libraries that behaves like built-in classes, but comes with the burden of additional complexity.

Zoffix Znet: Tag Your Dists

Published on 2017-03-10T00:00:00

Tags support in Perl 6 modules ecosystem

Weekly changes in and around Perl 6: 2017.10 ≤ It’s all relative ≥

Published by liztormato on 2017-03-06T22:08:49

In the past week, we saw the landing of several Unicode operators in Rakudo Perl 6: (aka <=), (aka >=), (aka !=) and finally ⁇ ‼ (aka ?? !!), courtesy of Fernando Correa de Oliveira and Aleks-Daniel Jakimenko-Aleksejev. This was tweeted by Zoffix and had quite a few (weird) responses. Well, weird for those of us not versed in Twitter Etiquette: Priorities, Relevant, APL, Mad (Irish) and Duct Tape. That was another 5 minutes of fame in our Twitterverse. Now at 220+ followers, a 10% increase in 2 weeks!

On The Book Front

The cover contest for Moritz Lenz‘s Perl 6 By Example book has a winner. Congratulations! My personal favourite, with the origami / paper ripping effect, so yours truly is happy as well.

Meanwhile, Andrew Shitov has officially announced his new Perl 6 book: Migrating To Perl 6. Which incidentally also has an origami inspired cover. Expected in paperback in May 2017.

There also was a thread on Reddit about Perl books, “incited” by the coming “barrage” of Perl 6 books. 🙂

Core Developments

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Ecosystem Additions

Quite a nice catch this week!

Winding Down

Spring is in the air! And so is rain. With some sunshine in between. And that about sums it up for the past week. See you next week for more Perl 6 news.


Perlgeek.de: Perl 6 By Example: A Unicode Search Tool

Published by Moritz Lenz on 2017-03-04T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Every so often I have to identify or research some Unicode characters. There's a tool called uni in the Perl 5 distribution App::Uni.

Let's reimplement its basic functionality in a few lines of Perl 6 code and use that as an occasion to talk about Unicode support in Perl 6.

If you give it one character on the command line, it prints out a description of the character:

$ uni 🕐
🕐 - U+1f550 - CLOCK FACE ONE OCLOCK

If you give it a longer string instead, it searches in the list of Unicode character names and prints out the same information for each character whose description matches the search string:

$ uni third|head -n5
⅓ - U+02153 - VULGAR FRACTION ONE THIRD
⅔ - U+02154 - VULGAR FRACTION TWO THIRDS
↉ - U+02189 - VULGAR FRACTION ZERO THIRDS
㆛ - U+0319b - IDEOGRAPHIC ANNOTATION THIRD MARK
𐄺 - U+1013a - AEGEAN WEIGHT THIRD SUBUNIT

Each line corresponds to what Unicode calls a "code point", which is usually a character on its own, but occasionally also something like a U+00300 - COMBINING GRAVE ACCENT, which, combined with a a - U+00061 - LATIN SMALL LETTER A makes the character à.

Perl 6 offers a method uniname in both the classes Str and Int that produces the Unicode code point name for a given character, either in its direct character form, or in the form the code point number. With that, the first part of uni's desired functionality:

#!/usr/bin/env perl6

use v6;

sub format-codepoint(Int $codepoint) {
    sprintf "%s - U+%05x - %s\n",
        $codepoint.chr,
        $codepoint,
        $codepoint.uniname;
}

multi sub MAIN(Str $x where .chars == 1) {
    print format-codepoint($x.ord);
}

Let's look at it in action:

$ uni ø
ø - U+000f8 - LATIN SMALL LETTER O WITH STROKE

The chr method turns a code point number into the character and ord is the reverse, in other words: from character to code point number.

The second part, searching in all Unicode character names, works by brute-force enumerating all possible characters and searching through their uniname:

multi sub MAIN($search is copy) {
    $search.=uc;
    for 1..0x10FFFF -> $codepoint {
        if $codepoint.uniname.contains($search) {
            print format-codepoint($codepoint);
        }
    }
}

Since all character names are in upper case, the search term is first converted to upper case with $search.=uc, which is short for $search = $search.uc. By default, parameters are read only, which is why its declaration here uses is copy to prevent that.

Instead of this rather imperative style, we can also formulate it in a more functional style. We could think of it as a list of all characters, which we whittle down to those characters that interest us, to finally format them the way we want:

multi sub MAIN($search is copy) {
    $search.=uc;
    print (1..0x10FFFF).grep(*.uniname.contains($search))
                       .map(&format-codepoint)
                       .join;
}

To make it easier to identify (rather than search for) a string of more than one character, an explicit option can help disambiguate:

multi sub MAIN($x, Bool :$identify!) {
    print $x.ords.map(&format-codepoint).join;
}

Str.ords returns the list of code points that make up the string. With this multi candidate of sub MAIN in place, we can do something like

$ uni --identify øre
ø - U+000f8 - LATIN SMALL LETTER O WITH STROKE
r - U+00072 - LATIN SMALL LETTER R
e - U+00065 - LATIN SMALL LETTER E

Code Points, Grapheme Clusters and Bytes

As alluded to above, not all code points are fully-fledged characters on their own. Or put another way, some things that we visually identify as a single character are actually made up of several code points. Unicode calls these sequences of one base character and potentially several combining characters as a grapheme cluster.

Strings in Perl 6 are based on these grapheme clusters. If you get a list of characters in string with $str.comb, or extract a substring with $str.substr(0, 4), match a regex against a string, determine the length, or do any other operation on a string, the unit is always the grapheme cluster. This best fits our intuitive understanding of what a character is and avoids accidentally tearing apart a logical character through a substr, comb or similar operation:

my $s = "ø\c[COMBINING TILDE]";
say $s;         # ø̃
say $s.chars;   # 1

The Uni type is akin to a string and represents a sequence of codepoints. It is useful in edge cases, but doesn't support the same wealth of operations as Str. The typical way to go from Str to a Uni value is to use one of the NFC, NFD, NFKC, or NFKD methods, which yield a Uni value in the normalization form of the same name.

Below the Uni level you can also represent strings as bytes by choosing an encoding. If you want to get from string to the byte level, call the encode method:

my $bytes = 'Perl 6'.encode('UTF-8');

UTF-8 is the default encoding and also the one Perl 6 assumes when reading source files. The result is something that does the Blob role; you can access individual bytes with positional indexing, such as $bytes[0]. The decode method helps you to convert a Blob to a Str.

Numbers

Number literals in Perl 6 aren't limited to the Arabic digits we are so used to in the English speaking part of the world. All Unicode code points that have the Decimal_Number (short Nd) property are allowed, so you can for example use Bengali digits:

say ৪২;             # 42

The same holds true for string to number conversions:

say "৪২".Int;       # 42

For other numeric code points you can use the unival method to obtain its numeric value:

say "\c[TIBETAN DIGIT HALF ZERO]".unival;

which produces the output -0.5 and also illustrates how to use a codepoint by name inside a string literal.

Other Unicode Properties

The uniprop method in type Str returns the general category by default:

say "ø".uniprop;                            # Ll
say "\c[TIBETAN DIGIT HALF ZERO]".uniprop;  # No

The return value needs some Unicode knowledge in order to make sense of it, or one could read Unicode's Technical Report 44 for the gory details. Ll stands for Letter_Lowercase, No is Other_Number. This is what Unicode calls the General Category, but you can ask the uniprop (or uniprop-bool method if you're only interested in a boolean result) for other properties as well:

say "a".uniprop-bool('ASCII_Hex_Digit');    # True
say "ü".uniprop-bool('Numeric_Type');       # False
say ".".uniprop("Word_Break");              # MidNumLet

Collation

Sorting strings starts to become complicated when you're not limited to ASCII characters. Perl 6's sort method uses the cmp infix operator, which does a pretty standard lexicographic comparison based on the codepoint number.

If you need to use a more sophisticated collation algorithm, Rakudo 2017.02 and newer offer the Unicode Collation Algorithm as an experimental feature:

my @list = <a ö ä Ä o ø>;
say @list.sort;                     # (a o Ä ä ö ø)

use experimental :collation;
say @list.collate;                  # (a ä Ä o ö ø)
$*COLLATION.set(:tertiary(False));
say @list.collate;                  # (a Ä ä o ö ø)

The default sort considers any character with diacritics to be larger than ASCII characters, because that's how they appear in the code point list. On the other hand, collate knows that characters with diacritics belong directly after their base character, which is not perfect in every language, but internally a good compromise.

For Latin-based scripts, the primary sorting criteria is alphabetic, the secondary diacritics, and the third is case. $*COLLATION.set(:tertiary(False)) thus makes .collate ignore case, so it doesn't force lower case characters to come before upper case characters anymore.

At the time of writing, language specification of collation is not yet implemented.

Summary

Perl 6 takes languages other than English very seriously, and goes to great lengths to facilitate working with them and the characters they use.

This includes basing strings on grapheme clusters rather than code points, support for non-Arabic digits in numbers, and access to large parts of Unicode database through built-in methods.

Subscribe to the Perl 6 book mailing list

* indicates required

Weekly changes in and around Perl 6: 2017.09 The IOs Are A-Changin’

Published by liztormato on 2017-02-28T00:50:02

Zoffix Znet has started working on the IO Standardization Grant for real. And his first grant report brings us many goodies apart from what he’s been up to so far. Worth a read! He also put up a more visible message to warn us of the coming changes.

Help Moritz

Moritz Lenz is looking for feedback with regards to the cover of his Perl 6 By Example book: tell him which ones you like from this list!

Other Core developments

Blog Posts

Meanwhile on Twitter

Meanwhile on StackOverflow

Ecosystem Additions

Winding Down

A little bit later than usual, due to a visit to the AmsterdamX.pm meeting. Please check again next week for more news about Perl 6!


rakudo.org: Advance Notice of Significant Changes

Published by Zoffix Znet on 2017-02-26T15:59:30

Advance Notice of Significant Changes

As part of the IO grant run by The Perl Foundation, we’re improving our IO-related methods and subroutines. We’ve identified several of the changes that will have moderate
impact on the users and may require you to update your code.

The exact changes to be made will be known by March 18th, 2017 and the
implementation will be part of the 2017.04 Rakudo compiler release on April,
15th, which will be followed by the 2017.04 Rakudo Star Perl 6 release.

Details

Why Are We Changing Stuff?

Our contract with the users is we don’t change anything that’s covered
by the Perl 6.c language version tests. This means most of the language
remains reliably stable, but IO features got short-changed on the love. The
tests for them are sparse—a big reason why the IO grant is running in the
first place—which gives core developers a lot of freedom to change them
and to improve them.

Despite that freedom, we realize broken code isn’t a nice thing, and will
attempt to reduce the impact of the changes, by providing backwards compatible
interface to support old API, where feasible. Where not, we will provide
information of the upcoming changes; this notice is a part of that effort.

What’s Changing?

The grant covers all of IO routines and methods (excluding sockets). All of the
final changes are yet to be deliberated and ratified and we’ll share the
details once they’re known.

Currently, it is speculated that link() routine will change the order
of arguments (no backwards compatible support will be provided) and seek()
routine will take seek reference as a named argument instead of an enum value
(backwards compatible support will be provided).

It’s very likely many more changes will be made. We’ll be using the code of
all the modules in the ecosystem to judge the potential impact of the change
and evaluate each change on a case-by-case basis.

Timeline

Help and More Info

If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix (IOninja on IRC)

Pawel bbkr Pabian: Your own template engine in 4 flavors. With Benchmarks!

Published by Pawel bbkr Pabian on 2017-02-25T23:43:36

This time on blog I'll show you how to write your own template engine - with syntax and behavior tailored for your needs. And we'll do it in four different ways to analyze pros and cons of each approach as well as code speed and complexity. Our sample task for today is to compose password reminder text for user, which can then be sent by email.

use v6;

my $template = q{
Hi [VARIABLE person]!

You can change your password by visiting [VARIABLE link] .

Best regards.
};

my %fields = (
'person' => 'John',
'link' => 'http://example.com'
);

So we decided how our template syntax should look like and for starter we'll do trivial variables (although that's not very precise name because variables in templates are almost always immutable).
We also have data to populate template fields. Let's get started!

1. Substitutions

sub substitutions ( $template is copy, %fields ) {
    for %fields.kv -> $key, $value {
        $template ~~ s:g/'[VARIABLE ' $key ']'/$value/;
    }
    return $template;
}

say substitutions($template, %fields);

Yay, works:

    Hi John!

You can change your password by visiting http://example.com .

Best regards.

Now it is time to benchmark it to get some baseline for different approaches:

use Bench;

my $template_short = $template;
my %fields_short = %fields;

my $template_long = join(
' lorem ipsum ', map( { '[VARIABLE ' ~ $_ ~ ']' }, 'a' .. 'z')
) x 100;
my %fields_long = ( 'a' .. 'z' ) Z=> ( 'lorem ipsum' xx * );

my $b = Bench.new;
$b.timethese(
1000,
{
'substitutions_short' => sub {
substitutions( $template_short, %fields_short )
},
'substitutions_long' => sub {
substitutions( $template_long, %fields_long )
},
}
);

Benchmark in this post will tests two cases for each approach. Our template from example is "short" case. And there is "long" case with 62KB template containing 2599 text fragments and 2600 variables filled by 26 fields. So here are the results:

Timing 1000 iterations of substitutions_long, substitutions_short...
substitutions_long: 221.1147 wallclock secs @ 4.5225/s (n=1000)
substitutions_short: 0.1962 wallclock secs @ 5097.3042/s (n=1000)

Whoa! That is a serious penalty for long templates. And the reason for that is because this code has three serious flaws - original template is destroyed during variables evaluation and therefore it must be copied each time we want to reuse it, template text is parsed multiple times and output is rewritten every time after populating each variable. But we can do better...

2. Substitution

sub substitution ( $template is copy, %fields ) {
    $template ~~ s:g/'[VARIABLE ' (\w+) ']'/{ %fields{$0} }/;
    return $template;
}

This time we have single substitution. Variable name is captured and we can use it to get field value on the fly. Benchmarks:

Timing 1000 iterations of substitution_long, substitution_short...
substitution_long: 71.6882 wallclock secs @ 13.9493/s (n=1000)
substitution_short: 0.1359 wallclock secs @ 7356.3411/s (n=1000)

Mediocre boost. We have less penalty on long templates because text is not parsed multiple times. However remaining flaws from previous approach still apply and regexp engine still must do plenty of memory reallocations for each piece of template text replaced.

Also it won't allow our template engine to gain new features - like conditions or loops - in the future because it is very hard to parse nested tags in single regexp. Time for completely different approach...

3. Grammars and direct Actions

If you are not familiar with Perl 6 grammars and Abstract Syntax Tree concept you should study official documentation first.

grammar Grammar {
    regex TOP { ^ [  |  ]* $ }
    regex text { <-[ [ ] >+ }
    regex variable { '[VARIABLE ' $=(\w+) ']' }
}

class Actions {

has %.fields is required;

method TOP ( $/ ) {
make [~]( map { .made }, $/{'chunk'} );
}
method text ( $/ ) {
make ~$/;
}
method variable ( $/ ) {
make %.fields{$/{'name'}};
}

}

sub grammar_actions_direct ( $template, %fields ) {
my $actions = Actions.new( fields => %fields );
return Grammar.parse($template, :$actions).made;
}

The most important thing is defining our template syntax as a grammar. Grammar is just a set of named regular expressions that can call each other. On "TOP" (where parsing starts) we see that our template is composed of chunks. Each chunk can be text or variable. Regexp for text matches everything until it hits variable start ('[' character, let's assume it is forbidden in text to make things more simple). Regexp for variable should look familiar from previous approaches, however now we capture variable name in named way instead of positional.

Action class has methods that are called whenever regexp with corresponding name is matched. When called, method gets match object ($/) from this regexp and can "make" something from it. This "made" something will be seen by upper level method when it is called. For example our "TOP" regexp calls "text" regexp which matches "Hi " part of template and calls "text" method. This "text" method just "make"s this matched string for later use. Then "TOP" regexp calls "variable" regexp which matches "[VARIABLE name]" part of template. Then "variable" method is called and it checks in match object for variable name and "makes" value of this variable from %fields hash for later use. This continues until end of template string. Then "TOP" regexp is matched and "TOP" method is called. This "TOP" method can access array of text or variable "chunks" in match object and see what was "made" for those chunks earlier. So all it has to do is to "make" those values concatenated together. And finally we get this "made" template from "parse" method. So let's look at benchmarks:

Timing 1000 iterations of grammar_actions_direct_long, grammar_actions_direct_short...
grammar_actions_direct_long: 149.5412 wallclock secs @ 6.6871/s (n=1000)
grammar_actions_direct_short: 0.2405 wallclock secs @ 4158.1981/s (n=1000)

We got rid of two more flaws from previous approaches. Original template is not destroyed when fields are filled and that means less memory copying. There is also no reallocation of memory during substitution of each field because now every action method just "make"s strings to be joined later. And we can easily extend our template syntax by adding loops, conditions and more features just by throwing some regexps into grammar and defining corresponding behavior in actions. Unfortunately we see some performance regression and this happens because every time template is processed it is parsed, match objects are created, parse tree is built and it has to track all those "make"/"made" values when it is collapsed to final output. But that was not our final word...

4. Grammars and closure Actions

Finally we reached "boss level", where we have to exterminate last and greatest flaw - re-parsing.
The idea is to use grammars and actions like in previous approach, but this time instead of getting direct output we want to generate executable and reusable code that works like this under the hood:

sub ( %fields ) {
    return join '',
        sub ( %fields ) { return "Hi "}.( %fields ),
        sub ( %fields ) { return %fields{'person'} }.( %fields ),
        ...
}

That's right, we will be converting our template body to a cascade of subroutines.
Each time this cascade is called it will get and propagate %fields to deeper subroutines.
And each subroutine is responsible for handling piece of template matched by single regexp in grammars. We can reuse grammar from previous approach and modify only actions:

class Actions {
    
    method TOP ( $/ ) {
        my @chunks = $/{'chunk'};
        make sub ( %fields ) {
           return [~]( map { .made.( %fields ) }, @chunks );
        };
    }
    method text ( $/ ) {
        my $text = ~$/;
        make sub ( %fields ) {
            return $text;
        };
    }
    method variable ( $/ ) {
        my $name = $/{'name'};
        make sub ( %fields  ) {
            return %fields{$name}
        };
    }
    
}

sub grammar_actions_closures ( $template, %fields ) {
state %cache{Str};
my $closure = %cache{$template} //= Grammar.parse(
$template, actions => Actions.new
).made;
return $closure( %fields );
}

Now every action method instead of making final output makes a subroutine that will get %fields and do final output later. To generate this cascade of subroutines template must be parsed only once. Once we have it we can call it with different set of %fields to populate in our template variables. Note how Object Hash %cache is used to determine if we already have subroutines tree for given $template. Enough talking, let's crunch some numbers:

Timing 1000 iterations of grammar_actions_closures_long, grammar_actions_closures_short...
grammar_actions_closures_long: 22.0476 wallclock secs @ 45.3563/s (n=1000)
grammar_actions_closures_short: 0.0439 wallclock secs @ 22778.8885/s (n=1000)

Nice result! We have extensible template engine that is 4 times faster for short templates and 10 times faster for long templates than our initial approach. And yes, there is bonus level...

4.1. Grammars and closure Actions in parallel

Last approach opened a new optimization possibility. If we have subroutines that will generate our template why not run them in parallel? So let's modify our action "TOP" method to process text and variable chunks simultaneously:

method TOP ( $/ ) {
    my @chunks = $/{'chunk'};
    make sub ( %fields ) {
       return [~]( @chunks.hyper.map( {.made.( %fields ) } ).list );
    };
}

Such optimization will shine if your template engine must do some lengthy operations to generate chunk of final output, for example execute heavy database query or call some API. It is perfectly fine to ask for data on the fly to populate template, because in feature rich template engine you may not be able to predict and generate complete set of data needed beforehand, like we did with our %fields. Use this optimization wisely - for fast subroutines you will see a performance drop because cost of sending and retrieving chunks to/from threads will be higher that just executing them in serial on single core.

Which approach should I use to implement my own template engine?

That depends how much you can reuse templates. For example if you send one password reminder per day - go for simple substitution and reach for grammar with direct actions if you need more complex features. But if you are using templates for example in PSGI processes to display hundreds of pages per second for different users then grammar and closure actions approach wins hands down.

You can download all approaches with benchmarks in single file here.

To be continued?

If you like this brief introduction to template engines and want to see more complex features like conditions of loops implemented leave a comment under this article on blogs.perl.org or send me a private message on irc.freenode.net #perl6 channel (nick: bbkr).

Perlgeek.de: Perl 6 By Example: Functional Refactorings for Directory Visualization Code

Published by Moritz Lenz on 2017-02-25T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


In the last installment we've seen some code that generated tree maps and flame graphs from a tree of directory and file sizes.

There's a pattern that occurs three times in that code: dividing an area based on the size of the files and directories in the tree associated with the area.

Extracting such common code into a function is a good idea, but it's slightly hindered by the fact that there is custom code inside the loop that's part of the common code. Functional programming offers a solution: Put the custom code inside a separate function and have the common code call it.

Applying this technique to the tree graph flame graph looks like this:

sub subdivide($tree, $lower, $upper, &todo) {
    my $base = ($upper - $lower ) / $tree.total-size;
    my $var  = $lower;
    for $tree.children -> $child {
        my $incremented = $var + $base * $child.total-size;
        todo($child, $var, $incremented);
        $var = $incremented,
    }
}

sub flame-graph($tree, :$x1!, :$x2!, :$y!, :$height!) {
    return if $y >= $height;
    take 'rect' => [
        x      => $x1,
        y      => $y,
        width  => $x2 - $x1,
        height => 15,
        style  => "fill:" ~ random-color(),
        title  => [$tree.name ~ ', ' ~ format-size($tree.total-size)],
    ];
    return if $tree ~~ File;
    subdivide( $tree, $x1, $x2, -> $child, $x1, $x2 {
        flame-graph( $child, :$x1, :$x2, :y($y + 15), :$height );
    });
}

sub tree-map($tree, :$x1!, :$x2!, :$y1!, :$y2) {
    return if ($x2 - $x1) * ($y2 - $y1) < 20;
    take 'rect' => [
        x      => $x1,
        y      => $y1,
        width  => $x2 - $x1,
        height => $y2 - $y1,
        style  => "fill:" ~ random-color(),
        title  => [$tree.name],
    ];
    return if $tree ~~ File;

    if $x2 - $x1 > $y2 - $y1 {
        # split along the x-axis
        subdivide $tree, $x1, $x2, -> $child, $x1, $x2 {
            tree-map $child, :$x1, :$x2, :$y1, :$y2;
        }
    }
    else {
        # split along the y-axis
        subdivide $tree, $y1, $y2, -> $child, $y1, $y2 {
            tree-map $child, :$x1, :$x2, :$y1, :$y2;
        }
    }
}

The newly introduced subroutine subdivide takes a directory tree, a start point and an end point, and finally a code object &todo. For each child of the directory tree it calculates the new coordinates and then calls the &todo function.

The usage in subroutine flame-graph looks like this:

subdivide( $tree, $x1, $x2, -> $child, $x1, $x2 {
flame-graph( $child, :$x1, :$x2, :y($y + 15), :$height );
});

The code object being passed to subdivide starts with ->, which introduces the signature of a block. The code block recurses into flame-graph, adding some extra arguments, and turning two positional arguments into named arguments along the way.

This refactoring shortened the code and made it overall more pleasant to work with. But there's still quite a bit of duplication between tree-map and flame-graph: both have an initial termination condition, a take of a rectangle, and then a call or two to subdivide. If we're willing to put all the small differences into small, separate functions, we can unify it further.

If we pass all those new functions as arguments to each call, we create an unpleasantly long argument list. Instead, we can use those functions to generate the previous functions flame-graph and tree-map:

sub svg-tree-gen(:&terminate!, :&base-height!, :&subdivide-x!, :&other!) {
    sub inner($tree, :$x1!, :$x2!, :$y1!, :$y2!) {
        return if terminate(:$x1, :$x2, :$y1, :$y2);
        take 'rect' => [
            x      => $x1,
            y      => $y1,
            width  => $x2 - $x1,
            height => base-height(:$y1, :$y2),
            style  => "fill:" ~ random-color(),
            title  => [$tree.name ~ ', ' ~ format-size($tree.total-size)],
        ];
        return if $tree ~~ File;
        if subdivide-x(:$x1, :$y1, :$x2, :$y2) {
            # split along the x-axis
            subdivide $tree, $x1, $x2, -> $child, $x1, $x2 {
                inner($child, :$x1, :$x2, :y1(other($y1)), :$y2);
            }
        }
        else {
            # split along the y-axis
            subdivide $tree, $y1, $y2, -> $child, $y1, $y2 {
                inner($child, :x1(other($x1)), :$x2, :$y1, :$y2);
            }
        }
    }
}

my &flame-graph = svg-tree-gen
    terminate   => -> :$y1, :$y2, | { $y1 > $y2 },
    base-height => -> | { 15 },
    subdivide-x => -> | { True },
    other       => -> $y1 { $y1 + 15 },

my &tree-map = svg-tree-gen
    terminate   => -> :$x1, :$y1, :$x2, :$y2 { ($x2 - $x1) * ($y2 - $y1) < 20 },
    base-height => -> :$y1, :$y2 {  $y2 - $y1 },
    subdivide-x => -> :$x1, :$x2, :$y1, :$y2 { $x2 - $x1 > $y2 - $y1 },
    other       => -> $a { $a },
    ;

So there's a new function svg-tree-gen, which returns a function. The behavior of the returned function depends on the four small functions that svg-tree-gen receives as arguments.

The first argument, terminate, determines under what condition the inner function should terminate early. For tree-map that's when the area is below 20 pixels, for flame-graph when the current y-coordinate $y1 exceeds the height of the whole image, which is stored in $y2. svg-tree-gen always calls this function with the four named arguments x1, x2, y1 and y2, so the terminate function must ignore the x1 and x2 values. It does this by adding | as a parameter, which is an anonymous capture. Such a parameter can bind arbitrary positional and named arguments, and since it's an anonymous parameter, it discards all the values.

The second configuration function, base-height, determines the height of the rectangle in the base case. For flame-graph it's a constant, so the configuration function must discard all arguments, again with a |. For tree-graph, it must return the difference between $y2 and $y1, as before the refactoring.

The third function determines when to subdivide along the x-axis. Flame graphs always divide along the x-axis, so -> | { True } accomplishes that. Our simplistic approach to tree graphs divides along the longer axis, so only along the x-axis if $x2 - $x1 > $y2 - $y1.

The fourth and final function we pass to svg-tree-gen calculates the coordinate of the axis that isn't being subdivided. In the case of flame-graph that's increasing over the previous value by the height of the bars, and for tree-map it's the unchanged coordinate, so we pass the identity function -> $a { $a }.

The inner function only needs a name because we need to call it from itself recursively; otherwise an anonymous function sub ($tree, :$x1!, :$x2!, :$y1!, :$y2!) { ... } would have worked fine.

Now that we have very compact definitions of flame-graph and tree-map, it's a good time to play with some of the parameters. For example we can introduce a bit of margin in the flame graph by having the increment in other greater than the bar height in base-height:

my &flame-graph = svg-tree-gen
    base-height => -> | { 15 },
    other       => -> $y1 { $y1 + 16 },
    # rest as before

Another knob to turn is to change the color generation to something more deterministic, and make it configurable from the outside:

sub svg-tree-gen(:&terminate!, :&base-height!, :&subdivide-x!, :&other!,
                 :&color=&random-color) {
    sub inner($tree, :$x1!, :$x2!, :$y1!, :$y2!) {
        return if terminate(:$x1, :$x2, :$y1, :$y2);
        take 'rect' => [
            x      => $x1,
            y      => $y1,
            width  => $x2 - $x1,
            height => base-height(:$y1, :$y2),
            style  => "fill:" ~ color(:$x1, :$x2, :$y1, :$y2),
            title  => [$tree.name ~ ', ' ~ format-size($tree.total-size)],
        ];
        # rest as before
}

We can, for example, keep state within the color generator and return a slightly different color during each iteration:

sub color-range(|) {
    state ($r, $g, $b) = (0, 240, 120);
    $r = ($r + 5) % 256;
    $g = ($g + 10) % 256;
    $b = ($b + 15) % 256;
    return "rgb($r,$g,$b)";
}

state variables keep their values between calls to the same subroutine and their initialization runs only on the first call. So this function slightly increases the lightness in each color channel for each invocation, except when it reaches 256, where the modulo operator % resets it back to a small value.

If we plug this into our functions by passing color => &color-range to the calls to svg-tree-gen, we get much less chaotic looking output:

Tree map with deterministic color generation

And the flame graph:

Flame graph with deterministic color generation and one pixel margin between
bars

More Language Support for Functional Programming

As you've seen in the examples above, functional programming typically involves writing lots of small functions. Perl 6 has some language features that make it very easy to write such small functions.

A common task is to write a function that calls a particular method on its argument, as we've seen here:

method total-size() {
    $!total-size //= $.size + @.children.map({.total-size}).sum;
    #                                        ^^^^^^^^^^^^^
}

This can be abbreviated to *.total-size:

method total-size() {
    $!total-size //= $.size + @.children.map(*.total-size).sum;
}

This works for chains of method calls too, so you could write @.children.map(*.total-size.round) if total-size returned a fractional number and you wanted to the call .round method on the result.

There are more cases where you can replace an expression with the "Whatever" star * to create a small function. To create a function that adds 15 to its argument, you can write * + 15 instead of -> $a { $a + 15 }.

If you need to write a function to just call another function, but pass more arguments to the second function, you can use the method assuming. For example -> $x { f(42, $x } can be replaced with &f.assuming(42). This works also for named arguments, so -> $x { f($x, height => 42 ) } can be replaced with &f.assuming(height => 42).

Summary

Functional programming offers techniques for extracting common logic into separate functions. The desired differences in behavior can be encoded in more functions that you pass in as arguments to other functions.

Perl 6 supports functional programming by making functions first class, so you can pass them around as ordinary objects. It also offers closures (access to outer lexical variables from functions), and various shortcuts that make it more pleasant to write short functions.

Subscribe to the Perl 6 book mailing list

* indicates required

Weekly changes in and around Perl 6: 2017.08 Evolving Slangs

Published by liztormato on 2017-02-20T21:21:33

TimToady has been working on providing a more stable API for handling sub languages, aka slangs, such as Slang::Piersing (allowing identifier names such as $foo? and $bar!) and Slang::Tuxic (uncuddle parens open from sub/method name when calling them). This is all part of work making parsing of grammars faster, which so far has been able to get a few seconds off of building the setting. It is also making it possible to implement new grammar related features in the future. And it should help keeping people, wanting to work on this piece of the Rakudo internals, sane.

Rakudo 2017.02 Compiler Release

Zoffix Znet and his trusty bots have done it again: this month’s Rakudo Compiler release, aka Rakudo 2017.02 is out now!

Blog Posts

Other Core Developments

Meanwhile on StackOverflow

Meanwhile on Twitter

The perl6org Twitter newsfeed now has more than 200 followers. Tweets not otherwise mentioned in the Perl 6 Weekly are:

Winding Down

Still a little jetlagged. Hope yours truly did not make too many mistakes in this episode of the Perl 6 Weekly Saga! Be sure to tune in next week, when yours truly hopes to have returned to her normal level of sanity.


Perlgeek.de: Perl 6 By Example: A File and Directory Usage Graph

Published by Moritz Lenz on 2017-02-18T23:00:01

This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


A File and Directory Usage Graph

You bought a shiny new 2TB disk just a short while ago, and you're already getting low disk space warnings. What's taking up all that space?

To answer this question, and experiment a bit with data visualization, let's write a small tool that visualizes which files use up how much disk space.

To do that, we must first recursively read all directories and files in a given directory, and record their sizes. To get a listing of all elements in a directory, we can use the dir function, which returns a lazy list of IO::Path objects.

We distinguish between directories, which can have child entries, and files, which don't. Both can have a direct size, and in the case of directories also a total size, which includes files and subdirectories, recursively:

class File {
    has $.name;
    has $.size;
    method total-size() { $.size }
}

class Directory {
    has $.name;
    has $.size;
    has @.children;
    has $!total-size;
    method total-size() {
        $!total-size //= $.size + @.children.map({.total-size}).sum;
    }
}

sub tree(IO::Path $path) {
    if $path.d {
        return Directory.new(
            name     => $path.basename,
            size     => $path.s,
            children => dir($path).map(&tree),
        );
    }
    else {
        return File.new(
            name => $path.Str,
            size => $path.s,
        );
    }
}

Method total-size in class Directory uses the construct $var //= EXPR´. The//stands for *defined-OR*, so it returns the left-hand side if it has a defined value. Otherwise, it evalutes and returns the value ofEXPR`. Combined with the assignment operator, it evaluates the right-hand side only if the variable is undefined, and then stores the value of the expression in the variable. That's a short way to write a cache.

The code for reading a file tree recursively uses the d and s methods on IO::Path. d returns True for directories, and false for files. s returns the size. (Note that .s on directories used to throw an exception in older Rakudo versions. You must use Rakudo 2017.01-169 or newer for this to work; if you are stuck on an older version of Rakudo, you could hard code the size of a directory to a typical block size, like 4096 bytes. It typically won't skew your results too much).

Just to check that we've got a sensible data structure, we can write a short routine that prints it recursively, with indention to indicate nesting of directory entries:

sub print-tree($tree, Int $indent = 0) {
    say ' ' x $indent, format-size($tree.total-size), '  ', $tree.name;
    if $tree ~~ Directory {
        print-tree($_, $indent + 2) for $tree.children
    }
}

sub format-size(Int $bytes) {
    my @units = flat '', <k M G T P>;
    my @steps = (1, { $_ * 1024 } ... *).head(6);
    for @steps.kv -> $idx, $step {
        my $in-unit = $bytes / $step;
        if $in-unit < 1024 {
            return sprintf '%.1f%s', $in-unit, @units[$idx];
        }
    }
}

sub MAIN($dir = '.') {
    print-tree(tree($dir.IO));
}

The subroutine print-tree is pretty boring, if you're used to recursion. It prins out the name and size of the current node, and if the current node is a directory, recurses into each children with an increased indention. The indention is applied through the x string repetition operator, which when called as $string x $count repeates the $string for $count times.

To get a human-readable repretation of the size of a number, format-size knows a list of six units: the empty string for one, k (kilo) for 1024, M (mega) for 1024*1024 and so on. This list is stored in the array @units. The multiplies assoziated with each unit is stored in @steps, which is iniitliazed through the series operator. .... Its structure is INITIAL, CALLABLE ... LIMIT, and repeatedly applies CALLABLE first to the initial value, and then to next value generated and so on, until it hits LIMIT. The limit here is *, a special term called Whatever, which in this means it's unlimited. So the sequence operator returns a lazy, potentially infinite list, and the tailing .head(6) call limits it to 6 values.

To find the most appropriate unit to print the size in, we have to iterate over both the values and in the indexes of the array, which for @steps.kv -> $idx, $step { .. } accomplishes. sprintf, known from other programming languages, does the actual formatting to one digit after the dot, and appends the unit.

Generating a Tree Map

One possible visualization of file and directory sizes is a tree map, which represents each directory as a rectangle, and a each file inside it as a rectangle inside directory's rectangle. The size of each rectangle is proportional to the size of the file or directory it represents.

We'll generate an SVG file containing all those rectangles. Modern browsers support displaying those files, and also show mouse-over texts for each rectangle. This alleviates the burden to actually label the rectangnles, which can be quite a hassle.

To generate the SVG, we'll use the SVG module, which you can install with

$ zef install SVG

or

$ panda install SVG

depending on the module installer you have available.

This module provides a single static method, to which you pass nested pairs. Pairs whose values are arrays are turned into XML tags, and other pairs are turned into attributes. For example this Perl 6 script

use SVG;
print SVG.serialize(
    :svg[
        width => 100,
        height => 20,
        title => [
            'example',
        ]
    ],
);

produces this output:

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="100"
     height="20">
        <title>example</title>
</svg>

(without the indention). The xmlns-tags are helpfully added by the SVG module, and are necessary for programs to recognize the file as SVG.

To return to the tree maps, a very simple way to lay out the rectangle is to recurse into areas, and for each area subdivide it either horizontally or vertically, depending on which axis is longer:

sub tree-map($tree, :$x1!, :$x2!, :$y1!, :$y2) {
    # do not produce rectangles for small files/dirs
    return if ($x2 - $x1) * ($y2 - $y1) < 20;

    # produce a rectangle for the current file or dir
    take 'rect' => [
        x      => $x1,
        y      => $y1,
        width  => $x2 - $x1,
        height => $y2 - $y1,
        style  => "fill:" ~ random-color(),
        title  => [$tree.name],
    ];
    return if $tree ~~ File;

    if $x2 - $x1 > $y2 - $y1 {
        # split along the x axis
        my $base = ($x2 - $x1) / $tree.total-size;
        my $new-x = $x1;
        for $tree.children -> $child {
            my $increment = $base * $child.total-size;
            tree-map(
                $child,
                x1 => $new-x,
                x2 => $new-x + $increment,
                :$y1,
                :$y2,
            );
            $new-x += $increment;
        }
    }
    else {
        # split along the y axis
        my $base = ($y2 - $y1) / $tree.total-size;
        my $new-y = $y1;
        for $tree.children -> $child {
            my $increment = $base * $child.total-size;
            tree-map(
                $child,
                :$x1,
                :$x2,
                y1 => $new-y,
                y2 => $new-y + $increment,
            );
            $new-y += $increment;
        }
    }
}

sub random-color {
    return 'rgb(' ~ (1..3).map({ (^256).pick }).join(',') ~ ')';
}

sub MAIN($dir = '.', :$type="flame") {
    my $tree = tree($dir.IO);
    use SVG;
    my $width = 1024;
    my $height = 768;
    say SVG.serialize(
        :svg[
            :$width,
            :$height,
            | gather tree-map $tree, x1 => 0, x2 => $width, y1 => 0, y2 => $height
        ]
    );
}


Tree map of an example directory, with random colors and a mouse-over hover identifying one of the files.

The generated file is not pretty, due to the random colors, and due to some files being identified as very narrow rectangles. But it does make it obvious that there are a few big files, and many mostly small files in a directory (which happens to be the .git directory of a repository). Viewing a file in a browser shows the name of the file on mouse over.

How did we generate this file?

Sub tree-map calls take to adds elements to a result list, so it must be called in the context of a gather statement. gather { take 1; take 2 } returns a lazy list of two elements, 1, 2. But the take calls don't have to occur in the lexical scope of the gather, they can be in any code that's directly or indirectly called from the gather. We call that the dynamic scope.

The rest of sub tree-map is mostly straight-forward. For each direction in which the remaining rectangle can be split, we calculate a base unit that signifies how many pixels a byte should take up. This is used to split up the current canvas into smaller ones, and use those to recurse into tree-map.

The random color generation uses ^256 to create a range from 0 to 256 (exclusive), and .pick returns a random element from this range. The result is a random CSS color string like rgb(120,240,5).

In sub MAIN, the gather returns a list, which would normally be nested inside the outer array. The pipe symbol | in :svg[ ..., | gather ... ] before the gather prevents the normal nesting, and flattens the list into the outer array.

Flame Graphs

The disadvantage of tree maps as generated before is that the human brain isn't very good at comparing rectangle sizes of different aspect ratios, so if their ratio of width to height is very different. Flame graphs prevent this perception errors by showing file sizes as horizontal bars. The vertical arrangement indicates the nesting of directories and files inside other directories. The disadvantage is that less of the available space is used for visualizing the file sizes.

Generating flame graphs is easier than tree maps, because you only need to subdivide in one direction, whereas the height of each bar is fixed, here to 15 pixel:

sub flame-graph($tree, :$x1!, :$x2!, :$y!, :$height!) {
    return if $y >= $height;
    take 'rect' => [
        x      => $x1,
        y      => $y,
        width  => $x2 - $x1,
        height => 15,
        style  => "fill:" ~ random-color(),
        title  => [$tree.name ~ ', ' ~ format-size($tree.total-size)],
    ];
    return if $tree ~~ File;

    my $base = ($x2 - $x1) / $tree.total-size;
    my $new-x = $x1;

    for $tree.children -> $child {
        my $increment = $base * $child.total-size;
        flame-graph(
            $child,
            x1 => $new-x,
            x2 => $new-x + $increment,
            y => $y + 15,
            :$height,
        );
        $new-x += $increment;
    }
}

We can add a switch to sub MAIN to call either tree-map or flame-graph, depending on a command line option:

sub MAIN($dir = '.', :$type="flame") {
    my $tree = tree($dir.IO);
    use SVG;
    my $width = 1024;
    my $height = 768;
    my &grapher = $type eq 'flame'
            ?? { flame-graph $tree, x1 => 0, x2 => $width, y => 0, :$height }
            !! { tree-map    $tree, x1 => 0, x2 => $width, y1 => 0, y2 => $height }
    say SVG.serialize(
        :svg[
            :$width,
            :$height,
            | gather grapher()
        ]
    );
}

Since SVG's coordinate system places the zero of the vertical axis at the top, this actually produces an inverted flame graph, sometimes called icicle graph:


Inverted flame graph with random colors, where the width of each bar represents a file/directory size, and the vertical position the nesting inside a directory.

Summary

We've explored tree maps and flame graphs to visualize which files and directories use up how much disk space.

But the code contains quite some duplications. Next week we'll explore techniques from functional programming to reduce code duplication. We'll also try to make the resulting files a bit prettier.

Subscribe to the Perl 6 book mailing list

* indicates required

brrt to the future: Register Allocator Update

Published by Bart Wiegmans on 2017-02-09T16:19:00

Hi everybody, I thought some yof you might be interested in an update regarding the JIT register allocator, which is after all the last missing piece for the new 'expression' JIT backend. Well, the last complicated piece, at least. Because register allocation is such a broad topic, I don't expect to cover all topics relevant to design decisions here, and reserve a future post for that purpose.

I think I may have mentioned earlier that I've chosen to implement linear scan register allocation, an algorithm first described in 1999. Linear scan is relatively popular for JIT compilers because it achieves reasonably good allocation results while being considerably simpler and faster than the alternatives, most notably via graph coloring (unfortunately no open access link available). Because optimal register allocation is NP-complete, all realistic algorithms are heuristic, and linear scan applies a simple heuristic to good effect. I'm afraid fully explaining the nature of that heuristic and the tradeoffs involves is beyond the scope of this post, so you'll have to remind me to do it at a later point.

Commit ab077741 made the new allocator the default after I had ironed out sufficient bugs to be feature-equivalent to the old allocator (which still exists, although I plan to remove it soon).
Commit 0e66a23d introduced support for 'PHI' node merging, which is really important and exciting to me, so I'll have to explain what it means. The expression JIT represents code in a form in which all values are immutable, called single static assignment form, or SSA form shortly. This helps simplify compilation because there is a clear correspondence between operations and the values they compute. In general in compilers, the easier it is to assert something about code, the more interesting things you can do with it, and the better code you can compile. However, in real code, variables are often assigned more than one value. A PHI node is basically an 'escape hatch' to let you express things like:

int x, y;
if (some_condition()) {
x = 5;
} else {
x = 10;
}
y = x - 3;

In this case, despite our best intentions, x can have two different values. In SSA form, this is resolved as follows:

int x1, x2, x3, y;
if (some_condition()) {
x1 = 5;
} else {
x2 = 10;
}
x3 = PHI(x1,x2);
y = x3 - 3;

The meaning of the PHI node is that it 'joins together' the values of x1 and x2 (somewhat like a junction in perl6), and represents the value of whichever 'version' of x was ultimately defined. Resolving PHI nodes means ensuring that, as far as the register allocator is concerned, x1, x2, and x3 should preferably be allocated to the same register (or memory location), and if that's not possible, it should copy x1 and x2 to x3 for correctness. To find the set of values that are 'connected' via PHI nodes, I apply a union-find data structure, which is a very useful data structure in general. Much to my amazement, that code worked the first time I tried it.

Then I had to fix a very interesting bug in commit 36f1fe94 which involves ordering between 'synthetic' and 'natural' tiles. (Tiles are the output of the tiling process about which I've written at some length, they represent individual instructions). Within the register allocator, I've chosen to identify tiles / instructions by their index in the code list, and to store tiles in a contiguous array. There are many advantages to this strategy but they are also beyond the scope of this post. One particular advantage though is that the indexes into this array make their relative order immediately apparent. This is relevant to linear scan because it relies on relative order to determine when to allocate a register and when a value is no longer necessary.

However, because of using this index, it's not so easy to squeeze in new tiles to that array - which is exactly what a register allocator does, when it decides to 'spill' a value to memory and load it when needed. (Because inserts are delayed and merged into the array a single step, the cost of insertion is constant). Without proper ordering, a value loaded from memory could overwrite another value that is still in use. The fix for that is, I think, surprisingly simple and elegant. In order to 'make space' for the synthetic tiles, before comparison all indexes are multiplied by a factor of 2, and synthetic tiles are further offset by -1 or +1, depending on whether they should be handled before or after the 'natural' tile they are inserted for. E.g. synthetic tiles that load a value should be processed before the tile that uses the value they load.

Another issue soon appeared, this time having to do with x86 being, altogether, quaint and antiquated and annoying, and specifically with the use of one operand register as source and result value. To put it simply, where you and I and the expression JIT structure might say:

a = b + c

x86 says:

a = a + b

Resolving the difference is tricky, especially for linear scan, since linear scan processes the values in the program rather than the instructions that generate them. It is therefore not suited to deal with instruction-level constraints such as these. If a, b, and c in my example above are not the same (not aliases), then this can be achieved by a copy:

a = b
a = a + c

If a and b are aliases, the first copy isn't necessary. However, if a and c are aliases, then a copy may or may not be necessary, depending on whether the operation (in this case '+') is commutative, i.e. it holds for '+' but not for '-'. Commit 349b360 attempts to fix that for 'direct' binary operations, but a fix for indirect operations is still work in progress. Unfortunately, it meant I had to reserve a register for temporary use to resolve this, meaning there are fewer available for the register allocator to use. Fortunately, that did simplify handling of a few irregular instructions, e.g. signed cast of 32 bit integers to 64 bit integers.

So that brings us to today and my future plans. The next thing to implement will be support for function calls by the register allocator, which involves shuffling values to the right registers and correct positions on the stack, and also in spilling all values that are still required after the function call since the function may overwrite them. This requires a bit of refactoring of the logic that spills variables, since currently it is only used when there are not enough registers available. I also need to change the linear scan main loop, because it processes values in order of first definition, and as such, instructions that don't create any values are skipped, even if they need special handling like function calls. I'm thinking of solving that with a special 'interesting tiles' queue that is processed alongside the main values working queue.

That was it for today. I hope to write soon with more progress.

gfldex: cale2 asked a hard question

Published by gfldex on 2017-02-02T00:02:57

Today cale2 refused to asked an easy question. Let’s start with his hard question.

 <cale2> I need a tutorial on * vs $_ vs $

What is a fairly good question to ask. Those three things are syntactic sugar that can help a great deal to keep lines from wrapping, allowing for less bugs because of context switching.

Let’s start with our good old friend the topic called $_. In Perl 5 it just appeared out of nowhere in every sub. In Perl 6 it appears out of a default for blocks.

my &block = { 'oi‽' };
&block.signature.say;
# OUTPUT«(;; $_? is raw)␤»

The default signature for a block is one positional parameter called $_ So every block has one. There are further statements that set the topic without introducing a new block like with and given (given does introduce a block but it’s special and I wont provide details here).

say $_ with 42;
# OUTPUT«42␤»

Since it’s the default Perl 6 will expect it at quite a few places. Most prominent is when and the objectless method call.

$_ = 42; say 'oi‽' when 42;
.say;
# OUTPUT«oi‽␤42␤»

A lone $ is actually two (and a bit) things. In a Signature it’s a marker for a positional parameter that we can never use (maybe because we don’t care) because it didn’t got a name and therefor doesn’t introduce a symbol in the Routine it applies to. It is useful for protos and free floating Signatures. It’s second use actually introduces a container, again without a symbol. It’s also a state variable whereby it’s initialiser (if there is one) is not handled as state and will be called more then once. We can use the anonymous state variable to count things.

my @a = <a b c d>;
for @a { say @a[$++] };
# OUTPUT«a␤b␤c␤d␤»

We can abuse an anonymous $ to skip values in list assignment. Quite handy when we got a short list returned from a routine.

my ($,$,$c) = <a b c d>;
say $c;
# OUTPUT«c␤»

I like the fact that I don’t have to come up with names for one-shot-variables. The less cognitive load the better.

The Whatever * is the hard part of the question. At times it’s a syntactic sugar marker that we use to tell Perl 6 that we don’t care what will be picked. At other times it means Inf in the sense of All Of Them.

my @a = <a b c d>;
put @a.pick(*);
# OUTPUT«c d b a␤»

If a lone * is used in an argument list it’s turned into the singleton Whatever. Since we can asked for values in signatures we can provide interfaces for multies where the user can state a lack of care.

multi sub foo(Int $i){ $i * 42 };
multi sub foo(Whatever){ <42 oi‽ ♥>.pick };
say foo(*);
# OUTPUT«♥␤» (your results may vary)

If we use * in a statement that involves operators or calls, it forms a block without a scope and acts like a placeholder variable, without losing the information that this block was spawned by a *. The type of the resulting Callable is WhateverCode and a routine could act on it differently then when provided with Sub or Method. Quite in contrast to a real placeholder variable Whatever can be used in a where-clause.

sub foo($a where * < 10){}

This is not a complete answer to his question and that’s why I would like to suggest to cole2 to read the *beep*ing docs.


Strangely Consistent: Deep Git

Published by Carl Mäsak

I am not good at chess.

I mean... "I know how the pieces move". (That's the usual phrase, isn't it?) I've even tried to understand chess better at various points in my youth, trying to improve my swing. I could probably beat some of you other self-identified "I know how the pieces move" folks out there. With a bit of luck. As long as you don't, like, cheat by having a strategy or something.

I guess what I'm getting at here is that I am not, secretly, an international chess master. OK, now that's off my chest. Phew!

Imagining what it's like to be really good at chess is very interesting, though. I can say with some confidence that a chess master never stops and asks herself "wait — how does the knight piece move, again?" Not even I do that! Obviously, the knight piece is the one that moves √5 distances on the board. 哈哈

I can even get a sense of what terms a master-level player uses internally, by reading what master players wrote. They focus on tactics and strategy. Attacks and defenses. Material and piece values. Sacrifices and piece exchange. Space and control. Exploiting weaknesses. Initiative. Openings and endgames.

Such high-level concerns leave the basic mechanics of piece movements far behind. Sure, those movements are in there somewhere. They are not irrelevant, of course. They're just taken for granted and no longer interesting in themselves. Meanwhile, the list of master-player concerns above could almost equally well apply to a professional Go player. (s:g/piece/stone/ for Go.)

Master-level players have stopped looking at individual trees, and are now focusing on the forest.

The company that employs me (Edument) has a new slogan. We've put it on the backs of sweaters which we then wear to events and conferences:

We teach what you can't google.

I really like this new slogan. Particularly, it feels like something we as a teaching company have already trended towards for a while. Some things are easy to find just by googling them, or finding a good cheat sheet. But that's not why you attend a course. We should position ourselves so as to teach answers to the deep, tricky things that only emerge after using something for a while.

You're starting to see how this post comes together now, aren't you? 😄

2017 will be my ninth year with Git. I know it quite well by now, having learned it in depth and breadth along the way. I can safely say that I'm better at Git than I am at chess at this point.

Um. I'm most certainly not an international Git grandmaster — but largely that's because such a title does not exist. (If someone reads this post and goes on to start an international Git tournament, I will be very happy. I might sign up.)

No, my point is that the basic commands have taken on the role for me that I think basic piece movements have taken on for chess grandmasters. They don't really matter much; they're a means to an end, and it's the end that I'm focusing on when I type them.

(Yes, I still type them. There are some pretty decent GUIs out there, but none of them give me the control of the command line. Sorry-not-sorry.)

Under this analogy, what are the things I value with Git, if not the commands? What are the higher-level abstractions that I tend to think in terms of nowadays?

(Yes, these are the ACID guarantees for database transactions, but made to work for Git instead.)

A colleague of mine talks a lot about "definition of done". It seems to be a Scrum thing. It's his favorite term more than mine, but I still like it for its attempt at "mechanizing" quality, which I believe can succeed in a large number of situations.

Another colleague of mine likes the Boy Scout Rule of "Always leave the campground cleaner than you found it". If you think of this in terms of code, it means something like refactoring a code base as you go, cleaning it up bit by bit and asymptotically approaching code perfection. But if you think of it in terms of process, it dovetails quite nicely with the "definition of done" above.

Instead of explaining how in the abstract, let's go through a concrete-enough example:

  1. Some regression is discovered. (Usually by some developer dogfooding the system.)
  2. If it's not immediately clear, we bisect and find the offending commit.
  3. ASAP, we revert that commit.
  4. We analyze the problematic part of the reverted commit until we understand it thoroughly. Typically, the root cause will be something that was not in our definition of done, but should've been.
  5. We write up a new commit/branch with the original (good) functionality restored, but without the discovered problem.
  6. (Possibly much later.) We attempt to add discovery of the problem to our growing set of static checks. The way we remember to do that is through a TODO list in a wiki. This list keeps growing and shrinking in fits and starts.

Note in particular the interplay between process, quality and, yes, Git. Someone could've told me at the end of step 6 that I had totalled 29 or so Git basic commands along the way, and I would've believed them. But that's not what matters to us as a team. If we could do with magic pixie dust what we do with Git — keep historic snapshots of the code while ensuring quality and isolation — we might be satisfied magic pixie dust users instead.

Somewhere along the way, I also got a much more laid-back approach to conflicts. (And I stopped saying "merge conflicts", because there are also conflicts during rebase, revert, cherry-pick, and stash — and they are basically the same deal.) A conflict happens when a patch P needs to be applied in an environment which differs too much from the one in which P was created.

Aside: in response to this post, jast++ wrote this on #perl6: "one minor nitpick: git knows two different meanings for 'merge'. one is commit-level merge, one is file-level three-way merge. the latter is used in rebase, cherry-pick etc., too, so technically those conflicts can still be called merge conflicts. :)" — TIL.

But we actually don't care so much about conflicts. Git cares about conflicts, becuase it can't just apply the patch automatically. What we care about is that the intent of the patch has survived. No software can check that for us. Since the (conflict ↔ no conflict) axis is independent from the (intent broken ↔ intent preserved) axis, we get four cases in total. Two of those are straightforward, because the (lack of) conflict corresponds to the (lack of) broken intent.

The remaining two cases happen rarely but are still worth thinking about:

If we care about quality, one lesson emerges from mst's example: always run the tests after you merge and after you've resolved conflicts. And another lesson from my example: try to introduce automatic checks for structures and relations in the code base that you care about. In this case, branch A could've put in a test or a linting step that failed as soon as it saw something according to the old naming convention.

A lot of the focus on quality also has to do with doggedly going to the bottom of things. It's in the nature of failures and exceptional circumstances to clump together and happen at the same time. So you need to handle them one at a time, carefully unraveling one effect at a time, slowly disassembling the hex like a child's rod puzzle. Git sure helps with structuring and linearizing the mess that happens in everyday development, exploration, and debugging.

As I write this, I realize even more how even when I try to describe how Git has faded into the background as something important-but-uninteresting for me, I can barely keep the other concepts out of focus. Quality being chief among them. In my opinion, the focus on improving not just the code but the process, of leaving the campground cleaner than we found it, those are the things that make it meaningful for me to work as a developer even decades later. The feeling that code is a kind of poetry that punches you back — but as it does so, we learn something valuable for next time.

I still hear people say "We don't have time to write tests!" Well, in our teams, we don't have time not to write tests! Ditto with code review, linting, and writing descriptive commit messages.

No-one but Piet Hein deserves the last word of this post:

The road to wisdom? — Well, it's plain
and simple to express:

Err
and err
and err again
but less
and less
and less.

rakudo.org: Announce: Rakudo Star Release 2017.01

Published by Steve Mynott on 2017-01-30T21:27:12

A useful and usable production distribution of Perl 6

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the January 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the January 2017 release is available from <http://rakudo.org/downloads/star/>.

Binaries for macOS and Windows (64 bit) are also available.

This is the sixth post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

This is the first Rakudo Star release to include “zef” as module installer. “panda” is to be shortly replaced by “zef” and will be removed in the near future.

It’s hoped to produce quarterly Rakudo Star releases during 2017 with 2017.04 (April), 2017.07 (July) and 2017.10 (October) to follow.

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

This Star release includes [release 2017.01] of the [Rakudo Perl 6 compiler], version 2017.01 of [MoarVM], plus various modules, documentation, and other resources collected from the Perl 6 community.

The Rakudo compiler changes since the last Rakudo Star release of 2016.11 are now listed in “2016.12.md” and “2017.01.md” under the “rakudo/docs/announce” directory of the source distribution.

Notable changes in modules shipped with Rakudo Star:

+ DBIish: Pg: TypeConverter post-merge cleanup
+ Linenoise: Remove dependency on Native::Resources
+ Pod-To-HTML: Bump version for #22 fix
+ Terminal-ANSIColor: Drop ‘nqp’ dependency
+ doc: Too many to list
+ json: Fix parsing of string literals with a leading combining character.
+ json\_fast: bump version since we now escape null bytes
+ panda: modified to warn of its removal in the short term
+ perl6-http-easy: Several pull requests merged
+ perl6-lwp-simple: Tweak tests to work with TAP::Harness
+ perl6-pod-to-bigpage: bump version
+ test-mock: Bump version.
+ zef: imported

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases.
Some of the not-quite-there features include:

* advanced macros
* non-blocking I/O (in progress)
* some bits of Synopsis 9 and 11

There is an online resource at <http://perl6.org/compilers/features> that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at <rakudobug@perl.org>.

See <http://perl6.org/> for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see <http://rakudo.org/how-to-help>, ask on the <perl6-compiler@perl.org> mailing list, or join us on IRC #perl6 on freenode.

gfldex: Once a Week a happy zef

Published by gfldex on 2017-01-22T13:54:37

Since zef installs modules to a temporary directory and may or may not set the $*CWD to what we would like, we need to open files in tests carefully. Given that files of interest to a test file are in the same or a child directory of the test, we can operate relative to $*PROGRAM.

my $gitconfig-path = $*PROGRAM.parent.child('data').child('gitconfig');

That would translate to t/data/gitconfig seen from the modules base directory.


gfldex: Once a Week

Published by gfldex on 2017-01-20T21:34:12

Rakudo is still changing quickly for the betterment of mankind (I’m quite sure of that). Once in a while commits will break code that is or was working. Checking for regressions is boring because most of the time it wont happen. As the army of bots that populate #perl6 shows, we like to offload the housekeeping to software and for testing we use travis.

Since a while travis provides build in cronjobs for rebuilding repos automatically. It’s a little hidden.

teavis-cron

I missed the cron settings at first because one needs to run a build before it even shows up in the settings tab.

With cron jobs setup to test my modules once a week I will get some angry e-mails every time Rakudo breaks my code.

After clicking the same buttons over and over again I got bored and found the need to automate that step away as well. A meta module that doesn’t do anything but having dependencies to my real modules would save me the clicking and would produce exactly one angry e-mail, what will become more important as the Perl 6 years pass by. The .travis.yml looks like this:

language: perl6
sudo: false
perl6:
- latest
install:
- rakudobrew build-zef
- zef --debug install .

And the META6.json is:

{
"name" : "gfldex-meta-zef-test",
"source-url" : "git://github.com/gfldex/gfldex-meta-zef-test",
"perl" : "v6",
"build-depends" : [ ],
"provides" : {
},
"depends" : [ "Typesafe::XHTML::Writer", "Rakudo::Slippy::Semilist", "Operator::defined-alternation", "Concurrent::Channelify", "Concurrent::File::Find", "XHTML::Writer", "Typesafe::HTML" ],
"description" : "Meta package to test installation with zef",
"test-depends" : [ ],
"version" : "0.0.1",
"authors" : [
"Wenzel P. P. Peppmeyer"
]
}

And lo and behold I found a bug. Installing XHTML::Writer on a clean system didn’t work because zef uses Build.pm differently then panda. I had to change the way Rakudos lib path is set because zef keeps dependency modules in temporary directories until the last test passed.

my @include = flat "$where/lib/Typesafe", $*REPO.repo-chain.map(*.path-spec);
my $proc = run 'perl6', '-I' «~« @include, "$where/bin/generate-function-definition.p6", "$where/3rd-party/xhtml1-strict.xsd", out => $out-file, :bin;

Please note that Build.pm will be replaced with something reasonable as soon as we figured out what reasonable means for portable module building.

There is a little bug in zef that makes the meta module fail to test. Said bug is being hunted right now.

All I need to do to get a new module tested weekly by travis is to add it to the meta module and push the commit. Laziness is a programmers virtue indeed.


Death by Perl6: Hello Web! with Purée Perl 6

Published by Tony O'Dell on 2017-01-09T18:19:56

Let's build a website.

Websites are easy to build. There are dozens of frameworks out there to use, perl has Mojo and Catalyst as its major frameworks and other languages also have quite a few decent options. Some of them come with boilerplate templates and you just go from there. Others don't and you spend your first few hours learning how to actually set up the framework and reading about how to share your DB connection between all of your controllers and blah, blah, blah. Let's look at one of P6's web frameworks.

Enter Hiker

Hiker doesn't introduce a lot of (if any) new ideas. It does use paradigms you're probably used to and it aims to make the initialization of creating your website very straight forward and easy, that way you can get straight to work sharing your content with the English.

The Framework

Hiker is intended to make things fast and easy from the development side. Here's how it works. If you're not into the bleep blop and just want to get started, skip to the Boilerplate heading.

Application Initialization

  1. Hiker reads from the subdirectories we'll look at later. The controllers and models are classes.
  2. Looking at all controllers, initializes a new object for that class, and then checks for their .path attribute
    1. If Hiker can't find the path attribute then it doesn't bind anything and produces a warning
  3. After setting up the controller routes, it instantiates a new object for the model as specified by the controller (.model)
    1. If none is given by the controller then nothing is instantiated or bound and nothing happens
    2. If a model is required by the controller but it cannot be found then Hiker refuses to bind
  4. Finally, HTTP::Server::Router is alerted to all of the paths that Hiker was able to find and verify

The Request

  1. If the path is found, then the associated class' .model.bind is called.
    1. The response (second parameter of .model.bind($req, $res)) has a hash to store information: $res.data
  2. The controller's .handler($req, $res) is then executed
    1. The $res.data hash is available in this context
  3. If the handler returns a Promise then Hiker waits for that to be kept (and expects the result to be True or False)
    1. If the response is already rendered and the Promise's status is True then the router is alerted that no more routes should be explored
    2. If the response isn't rendered and the Promise's result is True, then .render is called automagically for you
    3. If the response isn't rendered and the Promise's result is False, then the next matching route is called

Boilerplate

Ensure you have Hiker installed:

$ zef install Hiker
$ rakudobrew rehash #this may be necessary to get the bin to work

Create a new directory where you'd like to create your project's boilerplate and cd. From here we'll initialize some boilerplate and look at the content of the files.

somedir$ hiker init  
==> Creating directory controllers
==> Creating directory models
==> Creating directory templates
==> Creating route MyApp::Route1: controllers/Route1.pm6
==> Creating route MyApp::Model1: models/Model1.pm6
==> Creating template templates/Route1.mustache
==> Creating app.pl6

Neato burrito. From the output you can see that Hiker created some directories - controllers, models, templates - for us so we can start out organized. In those directories you will find a few files, let's take a look.

The Model

use Hiker::Model; 

class MyApp::Model1 does Hiker::Model {  
  method bind($req, $res) {
    $res.data<who> = 'web!';
  }
}  

Pretty straight forward. MyApp::Model1 is instantiated during Hiker initialization and .bind is called whenever the controller's corresponding path is requested. As you can see here, this Model just adds to the $res.data hash the key value pair of who => 'web!'. This data will be available in the Controller as well as available in the template files (if the controller decides to use that).

The Controller

use Hiker::Route; 

class MyApp::Route1 does Hiker::Route {  
  has $.path     = '/';
  has $.template = 'Route1.mustache';
  has $.model    = 'MyApp::Model1';

  method handler($req, $res) {
    $res.headers<Content-Type> = 'text/plain';
  }
}  

As you can see above, the Hiker::Route has a lot of information in a small space and it's a class that does a Hiker role called Hiker::Route. This let's our framework know that we should inspect that class for the path, template, model so it can handle those operations for us - path and template are the only required attributes.

As discussed above, our Route can return a Promise if there is some asynchronous operation that is to be performed. In this case all we're going to do is set the header's to indicated the Content Type and then, automagically, render the template file. Note: if you return a Falsey value from the handler method, then the router will not auto render and it will attempt to find the next route. This is so that you can cascade paths in the event that you want to chain them together, do some type of decision making real time to determine whether that's the right class for the request, or perform some other unsaid dark magic. In the controller above we return a Truethy value and it auto renders.

By specifying the Model in the Route, you're able to re-use the same Model class across multiple routes.

The Path

Quick notes about .path. You can pass a ('/staticpath'), maybe a path with a placeholder ('/api/:placeholder'), or if you're path is a little more complicated then you can pass in a regex (/ .* /). Check out the documentation for HTTP::Server::Router (repo).

The Template

The template is specified by the controller's .template attribute and Hiker checks for that file in the ./templates folder. The default template engine is Template::Mustache (repo). See that module's documentation for more info.

Running the App

Really pretty straight forward from the boilerplate:

somedir$ perl6 app.pl6  

Now you can visit http://127.0.0.1:8080/ in your favorite Internet Explorer and find a nice 'Hello web!' message waiting to greet you. If you visit any other URI you'll receive the default 'no route found' message from HTTP::Server::Router.

The Rest

The module is relatively young. With feedback from the community, practical applications, and some extra feature expansion, Hiker could be pretty great and it's a good start to taking the tediousness out of building a website in P6. I'm open to feedback and I'd love to hear/see where you think Hiker can be improved, what it's missing to be productive, and possibly anything else [constructive or otherwise] you'd like to see in a practical, rapid development P6 web server.

gfldex: Leaving out considered dangerous

Published by gfldex on 2017-01-07T19:50:17

A knowledge seeker asked us why a loop spec allows $i>10 but not $i<10. The reason is that the postcircumfix:«< >» changes the grammar in such a way that it expects a list quote after the $i. As a result you get the following.

loop (my $i=0;$i<10;$i++) {};
# OUTPUT«===SORRY!=== Error while compiling ␤Whitespace required before < operator␤at :1␤------> loop (my $i=0;$i<10;$i++) {};⏏␤    expecting any of:␤        postfix␤»

I tried to illustrate the problem by making the $i>10 case fail as well by defining a new operator.

sub postcircumfix:«> <»($a){}; loop (my $i=0;$i>10;$i++) {};
# OUTPUT«===SORRY!=== Error while compiling ␤Unable to parse expression in postcircumfix:sym«> <»; couldn't find final $stopper ␤at :1␤------> rcumfix:«> <»($a){}; loop (my $i=0;$i>10⏏;$i++) {};␤    expecting any of:␤        s…»

I concluded with the wisdom that that Perl 6 is a dynamic dynamic language. While filing a related bug report I made the new years resolution to put a white space around each and every operator. You may want to do the same.


gfldex: Perl 6 is Smalltalk

Published by gfldex on 2017-01-04T22:35:59

Masak kindly pointed the general public to a blog post that talks about how awesome Smalltalk is.
The example presented there reads:

a < b
  ifTrue: [^'a is less than b']
  ifFalse: [^'a is greater than or equal to b']

The basic idea is that ifTrue and ifFalse are methods on the class Bool. Perl 6 don’t got that and I thought it would be tricky to add because augment enum doesn’t work. After some tinkering I found that augment doesn’t really care what you hand it as long as it is a class. As it happens Rakudo doesn’t check if the class is really a class, it simply looks for a type object with the provided name. The following just works.

use MONKEY-TYPING;
augment class Bool {
    method ifTrue(&c){ self ?? c(self) !! Nil; self }
    method ifFalse(&c){ self ?? Nil !! c(self); self }
}

(1 < 2)
    .ifTrue({say ‚It's True!‘})
    .ifFalse({ say ‚It's False!‘});

If we call only one of the new methods on Bool, we could even use the colon form.

(True)
    .ifTrue: { say "It's $^a!" };

As you likely spotted I went a little further as Smalltalk by having the added methods call the blocks with the Bool in question. Since Block got a single optional positional parameter the compiler wont complain if we just hand over a block. If a pointy block or a Routine is provided it would need a Signature with a single positional or a slurpy.

Please note that augment on an enum that we call a class is not in the spec yet. A bug report was filed and judgement is pending. If that fails there is always the option to sneak the methods into the type object behind Bool at runtime via the MOP.

And so I found that Perl 6 is quite big but still nice to talk about.

UPDATE: There where complains that IfTrue contained an if statement. That’s was silly and fixed.


Steve Mynott: Rakudo Star: Past Present and Future

Published by Steve Mynott on 2017-01-02T14:07:31

At YAPC::EU 2010 in Pisa I received a business card with "Rakudo Star" and the
date July 29, 2010 which was the date of the first release -- a week earlier
with a countdown to 1200 UTC. I still have mine, although it has a tea stain
on it and I refreshed my memory over the holidays by listening again to Patrick
Michaud speaking about the launch of Rakudo Star (R*):

https://www.youtube.com/watch?v=MVb6m345J-Q

R* was originally intended as first of a number of distribution releases (as
opposed to a compiler release) -- useable for early adopters but not initially production
Quality. Other names had been considered at the time like Rakudo Beta (rejected as
sounding like "don't use this"!) and amusingly Rakudo Adventure Edition.
Finally it became Rakudo Whatever and Rakudo Star (since * means "whatever"!).

Well over 6 years later and we never did come up with a better name although there
was at least one IRC conversation about it and perhaps "Rakudo Star" is too
well established as a brand at this point anyway. R* is the Rakudo compiler, the main docs, a module installer, some modules and some further docs.

However, one radical change is happening soon and that is a move from panda to
zef as the module installer. Panda has served us well for many years but zef is
both more featureful and more actively maintained. Zef can also install Perl
6 modules off CPAN although the CPAN-side support is in its early days. There
is a zef branch (pull requests welcome!) and a tarball at:

http://pl6anet.org/drop/rakudo-star-2016.12.zef-beta2.tar.gz

Panda has been patched to warn that it will be removed and to advise the use of
zef. Of course anyone who really wants to use panda can reinstall it using zef
anyway.

The modules inside R* haven't changed much in a while. I am considering adding
DateTime::Format (shown by ecosystem stats to be widely used) and
HTTP::UserAgent (probably the best pure perl6 web client library right now).
Maybe some modules should also be removed (although this tends to be more
controversial!). I am also wondering about OpenSSL support (if the library is
available).

p6doc needs some more love as a command line utility since most of the focus
has been on the website docs and in fact some of these changes have impacted
adversely on command line use, eg. under Windows cmd.exe "perl 6" is no longer
correctly displayed by p6doc. I wonder if the website generation code should be
decoupled from the pure docs and p6doc command line (since R* has to ship any
new modules used by the website). p6doc also needs a better and faster search
(using sqlite?). R* also ships some tutorial docs including a PDF generated from perl6intro.com.
We only ship the English one and localisation to other languages could be
useful.

Currently R* is released roughly every three months (unless significant
breakage leads to a bug fix release). Problems tend to happen with the
less widely used systems (Windows and the various BSDs) and also with the
module installers and some modules. R* is useful in spotting these issues
missed by roast. Rakudo itself is still in rapid development. At some point a less frequently
updated distribution (Star LTS or MTS?) will be needed for Linux distribution
packagers and those using R* in production). There are also some question
marks over support for different language versions (6.c and 6.d).

Above all what R* (and Rakudo Perl 6 in general) needs is more people spending
more time working on it! JDFI! Hopefully this blog post might
encourage more people to get involved with github pull requests.

https://github.com/rakudo/star

Feedback, too, in the comments below is actively encouraged.


Perl 6 Advent Calendar: Day 24 – Make It Snow

Published by ab5tract on 2016-12-24T13:14:02

Hello again, fellow sixers! Today I’d like to take the opportunity to highlight a little module of mine that has grown up in some significant ways this year. It’s called Terminal::Print and I’m suspecting you might already have a hint of what it can do just from the name. I’ve learned a lot from writing this module and I hope to share a few of my takeaways.

Concurrency is hard

Earlier in the year I decided to finally try to tackle multi-threading in Terminal::Print and… succeeded more or less, but rather miserably. I wrapped the access to the underlying grid (a two-dimensional array of Cell objects) in a react block and had change-cell and print-cell emit their respective actions on a Supply. The react block then handled these actions. Rather slowly, unfortunately.

Yet, there was hope. After jnthn++ fixed a constructor bug in OO::Monitors I was able to remove all the crufty hand-crafted handling code and instead ensure that method calls to the Terminal::Print::Grid object would only run in a single thread at any given time. (This is the class which holds the two-dimensional array mentioned before and was likewise the site of my react block experiment).

Here below are the necessary changes:

- unit class Terminal::Print::Grid;
+ use OO::Monitors;
+ unit monitor Terminal::Print::Grid;

This shift not only improved the readability and robustness of the code, it was significantly faster! Win! To me this is really an amazing dynamic of Perl 6. jnthn’s brilliant, twisted mind can write a meta-programming module that makes it dead simple for me to add concurrency guarantees at a specific layer of my library. My library in turn makes it dead simple to print from multiple threads at once on the screen! It’s whipuptitude enhancers all the the way down!

That said, our example today will not be writing from multiple threads. For some example code that utilizes async, I point you to examples/async.p6 and examples/matrix-ish.p6.

Widget Hero

Terminal::Print is really my first open source library in the sense that it is the first time that I have started my own module from scratch with the specific goal of filling a gap in a given language’s ecosystem. It is also the first time that I am not the sole contributor! I would be remiss not to mention japhb++ in this post, who has contributed a great deal in a relatively short amount of time.

In addition to all the performance related work and the introduction of a span-oriented printing mechanism, japhb’s work on widgets especially deserves its own post! For now let’s just say that it has been a real pleasure to see the codebase grow and extend even as I have been too distracted to do much good. My takeaway here is a new personal milestone in my participation in libre/open software (my first core contributor!) that reinforces all the positive dynamics it can have on a code base.

Oh, and I’ll just leave this here as a teaser of what the widget work has in store for us:

rpg-ui-p6

You can check it out in real-time and read the code at examples/rpg-ui.p6.

Snow on the Golf Course

Now you are probably wondering, where is the darn, snow! Well, here we go! The full code with syntax highlighting is available in examples/snowfall.p6. I will be going through it step by step below.

use Terminal::Print;

class Coord {
    has Int $.x is rw where * <= T.columns = 0;
    has Int $.y is rw where * <= T.rows = 0 ;
}

Here we import Terminal::Print. The library takes the position that when you import it somewhere, you are planning to print to the screen. To this end we export an instantiated Terminal::Print object into the importer’s lexical scope as T. This allows me to immediately start clarifying the x and y boundaries of our coordinate system based on run-time values derived from the current terminal window.

class Snowflake {
    has $.flake = ('❆','❅','❄').roll;
    has $.pos = Coord.new;
}

sub create-flake {
    state @cols = ^T.columns .pick(*); # shuffled
    if +@cols > 0 {
        my $rand-x = @cols.pop;
        my $start-pos = Coord.new: x => $rand-x;
        return Snowflake.new: pos => $start-pos;
    } else {
        @cols = ^T.columns .pick(*);
        return create-flake;
    }
}

Here we create an extremely simple Snowflake class. What is nice here is that we can leverage the default value of the $.flake attribute to always be random at construction time.

Then in create-flake we are composing a way to make sure we have hit every x coordinate as a starting point for the snowfall. Whenever create-flake gets called, we pop a random x coordinate out of the @cols state variable. The state variable enables this cool approach because we can manually fill @cols with a new randomized set of our available x coordinates once it is depleted.

draw( -> $promise {

start {
    my @flakes = create-flake() xx T.columns;
    my @falling;
    
    Promise.at(now + 33).then: { $promise.keep };
    loop {
        # how fast is the snowfall?
        sleep 0.1; 
    
        if (+@flakes) {
            # how heavy is the snowfall?
            my $limit = @flakes > 2 ?? 2            
                                    !! +@flakes;
            # can include 0, but then *cannot* exclude $limit!
            @falling.push: |(@flakes.pop xx (0..$limit).roll);  
        } else {
            @flakes = create-flake() xx T.columns;
        }
    
        for @falling.kv -> $idx, $flake {
            with $flake.pos.y -> $y {
                if $y > 0 {
                    T.print-cell: $flake.pos.x, ($flake.pos.y - 1), ' ';
                }

                if $y < T.rows {
                    T.print-cell: $flake.pos.x, $flake.pos.y, $flake.flake;            
                }

                try {
                    $flake.pos.y += 1;
                    CATCH {
                        # the flake has fallen all the way
                        # remove it and carry on!
                        @falling.splice($idx,1,());
                        .resume;
                    }
                }
            }
        }
    }
}

});

Let’s unpack that a bit, shall we?

So the first thing to explain is draw. This is a handy helper routine that is also imported into the current lexical scope. It takes as its single argument a block which accepts a Promise. The block should include a start block so that keeping the argument promise works as expected. The implementation of draw is shockingly simple.

So draw is really just short-hand for making sure the screen is set up and torn down properly. It leverages promises as (I’m told) a “conv-var” which according to the Promises spec might be an abuse of promises. I’m not very futzed about it, to be honest, since it suits my needs quite well.

This approach also makes it quite easy to create a “time limit” for the snowfall by scheduling a promise to be kept at now + 33 — thirty three seconds from when the loop starts. then we keep the promise and draw shuts down the screen for us. This makes “escape” logic for your screensavers quite easy to implement (note that SIGINT also restores your screen properly. The most basic exit strategy works as expected, too :) ).

The rest is pretty straightforward, though I’d point to the try block as a slightly clever (but not dangerously so) combination of where constraints on Coord‘s attributes and Perl 6’s resumable exceptions.

Make it snow!

And so, sixers, I bid you farewell for today with a little unconditional love from ab5tract’s curious little corner of the universe. Cheers!

snowfall-p6


Perl 6 Advent Calendar: Day 24 – One Year On

Published by liztormato on 2016-12-24T10:51:57

This time of year invites one to look back on things that have been, things that are and things that will be.

Have Been

I was reminded of things that have been when I got my new notebook a few weeks ago. Looking for a good first sticker to put on it, I came across an old ActiveState sticker:

If you don’t know Perl
you don’t know Dick

A sticker from 2000! It struck me that that sticker was as old as Perl 6. Only very few people now remember that a guy called Dick Hardt was actually the CEO of ActiveState at the time. So even though the pun may be lost on most due to the mists of time, the premise still rings true to me: that Perl is more about a state of mind, then about versions. There will always be another version of Perl. Those who don’t know Perl are doomed to re-implement it, poorly. Which, to me, is why so many ideas were borrowed from Perl. And still are!

Are

Where are we now? Is it the moment we know, We know, we know? I don’t think we are at twenty thousand people using Perl 6 just yet. But we’re keeping our fingers crossed. Just in case.

We are now 12 compiler releases after the initial Christmas release of Perl 6. In this year, many, many areas of Rakudo Perl 6 and MoarVM have dramatically improved in speed and stability. Our canary-in-the-coalmine test has dropped from around 14 seconds a year ago to around 5.5 seconds today. A complete spectest run is now about 3 minutes, whereas it was about 4.5 minutes at the beginning of the year, while about 4000 tests were added (from about 50K to 54K). And we now have 757 modules in the Perl 6 ecosystem (aka temporary CPAN for Perl 6 modules), with a few more added every week.

The #perl6 IRC channel has been too busy for me to follow consistently. But if you have a question related to Perl 6 and you want a quick answer, the #perl6 channel is the place to be. You don’t even have to install an IRC client: you can also use a browser to chat, or just follow “live” what is being said.

There are also quite a few useful bots on that channel: they e.g. take care of running a piece of Perl 6 code for you. Or find out at which commit the behaviour of a specific piece of code changed. These are very helpful for the developers of Perl 6, who usually also hang out on the #perl6-dev IRC channel. Which could be you! The past year, at least one contributor was added to the CREDITS every month!

Will Be

The coming year will see at least three Perl 6 books being published. First one will be Think Perl 6 – How To Think Like A Computer Scientist by Laurent Rosenfeld. It is an introduction to programming using Perl 6. But even for those of you with programming experience, it will be a good book to start learning Perl 6. And I can know. Because I’ve already read it :-)

Second one will be Learning Perl 6 by veteran Perl developer and writer brian d foy. It will have the advantage of being written by a seasoned writer going through the newbie experience that most people will have when coming from Perl 5.

The third one will be Perl 6 By Example by Moritz Lenz, which will, as the title already gives away, introduce Perl 6 topics by example.

There’ll be at least two (larger) Perl Conferences apart from many smaller Perl workshops: the The Perl Conference NA on 18-23 June, and the The Perl Conference in Amsterdam on 9-11 August. Where you will meet all sorts of nice people!

And for the rest? Expect a faster, leaner, Perl 6 and MoarVM compiler release on the 3rd Saturday every month. And an update of weekly events in the Perl 6 Weekly on every Monday evening/Tuesday morning (depending on where you live).


Perl 6 Advent Calendar: Day 23 – Everything is either wrong or less than awesome

Published by AlexDaniel on 2016-12-23T00:07:12

Have you ever spent your precious time on submitting a bug report for some project, only to get a response that you’re an idiot and you should f⊄∞÷ off?

Right! Well, perhaps consider spending your time on Perl 6 to see that not every free/open-source project is like this.

In the Perl 6 community, there is a very interesting attitude towards bug reports. Is it something that was defined explicitly early on? Or did it just grow organically? This remains to be a Christmas mystery. But the thing is, if it wasn’t for that, I wouldn’t be willing to submit all the bugs that I submitted over the last year (more than 100). You made me like this.

Every time someone submits a bug report, Perl 6 hackers always try to see if there is something that can done better. Yes, sometimes the bug report is invalid. But even if it is, is there any way to improve the situation? Perhaps a warning could be thrown? Well, if so, then we treat the behavior as LTA (Less Than Awesome), and therefore the bug report is actually valid! We just have to tweak it a little bit, meaning that the ticket will now be repurposed to improve or add the error message, not change the behavior of Perl 6.

The concept of LTA behavior is probably one of the key things that keeps us from rejecting features that may seem to do too little good for the amount of effort required to implement them, but in the end become game changers. Another closely related concept that comes to mind is “Torment the implementors on behalf of the users”.

OK, but what if this behavior is well-defined and is actually valid? In this case, it is still probably our fault. Why did the user get into this situation? Maybe the documentation is not good enough? Very often that is the issue, and we acknowledge that. So in a case of a problem with the documentation, we will usually ask you to submit a bug report for the documentation, but very often we will do it ourselves.

Alright, but what if the documentation for this particular case is in place? Well, maybe the thing is not easily searchable? That could be the reason why the user didn’t find it in the first place. Or maybe we lack some links? Maybe the places that should link to this important bit of information are not doing so? In other words, perhaps there are still ways to improve the docs!

But if not, then yes, we will have to write some tests for this particular case (if there are no tests yet) and reject the ticket. This happens sometimes.

The last bit, even if obvious to some, is still worth mentioning. We do not mark tickets resolved without tests. One reason is that we want roast (which is a Perl 6 spec) to be as full as possible. The other reason is that we don’t want regressions to happen (thanks captain obvious!). As the first version of Perl 6 was released one year ago, we are no longer making any changes that would affect the behavior of your code. However, occasional regressions do happen, but we have found an easy way to deal with those!

If you are not on #perl6 channel very often, you might not know that we have a couple of interesting bots. One of them is bisectable. In short, Bisectable performs a more user-friendly version of git bisect, but instead of building Rakudo on each commit, it has done it before you even asked it to! That is, it has over 5500 rakudo builds, one for every commit done in the last year and a half. This turns the time to run git bisect from minutes to about 10 seconds (Yes, 10 seconds is less than awesome! We are working on speeding it up!). And there are other bots that help us inspect the progress. The most recent one is Statisfiable, here is one of the graphs it can produce.

So if you pop up on #perl6 with a problem that seems to be a regression, we will be able to find the cause in seconds. Fixing the issue will usually take a bit more than that though, but when the problem is significant, it will usually happen in a day or two. Sorry for breaking your code in attempts to make it faster, we will do better next time!

But as you are reading this, perhaps you may be interested in seeing some bug reports? I thought that I’d go through the list of bugs of the last year to show how horribly broken things were, just to motivate the reader to go hunting for bugs. The bad news (oops, good news I mean), it seems that the number of “horrible” bugs is decreasing a bit too fast. Thanks to many Rakudo hackers, things are getting more stable at a very rapid pace.

Anyway, there are still some interesting things I was able to dig up:

That being said, my favorite bug of all times is RT #127473. Three symbols in the source code causing it to go into an infinite loop printing stuff about QAST nodes. That’s a rather unique issue, don’t you think?

I hope this post gave you a little insight on how we approach bugs, especially if you are not hanging around on #perl6 very often. Is our approach less than awesome? Do you have some ideas for other bots that could help us work with bugs? Leave it in the comments, we would like to know!


Perl 6 Advent Calendar: Day 22 – Generative Testing

Published by SmokeMachine on 2016-12-22T00:00:47

OK! So say you finished writing your code and it’s looking good. Let’s say it’s this incredible sum function:

module Sum {
   sub sum($a, $bis export {
      $a + $b
   }
}

Great, isn’t it?! Let’s use it:

use Sum;
say sum 2, 3; # 5

That worked! We summed the number 2 with the number 3 as you saw. If you carefully read the function you’ll see the variables $a and $b haven’t a type set. If you don’t type a variable it’s, by default, of type Any. 2 and 3 are Ints… Ints are Any. So that’s OK! But do you know what’s Any too? Str (just a example)!

Let’s try using strings?

use Sum;
say sum "bla", "ble";

We got a big error:

Cannot convert string to number: base-10 number must begin with valid digits or '.' in 'bla' (indicated by ⏏)
  in sub sum at sum.p6 line 1
  in block  at sum.p6 line 7

Actually thrown at:
  in sub sum at sum.p6 line 1
  in block  at sum.p6 line 7

Looks like it does not accept Strs… It seems like Any may not be the best type to use in this case.

Worrying about every possible input type for all our functions can prove to demand way too much work, as well as still being prone to human error. Thankfully there’s a module to help us with that! Test::Fuzz is a perl6 module that implements the “technique” of generative testing/fuzz testing.

Generative testing or Fuzz Testing is a technique of generating random/extreme data and using this data to call the function being tested.

Test::Fuzz gets the signature of your functions and decides what generators it should use to test it. After that it runs your functions giving it (100, by default) different arguments and testing if it will break.

To test our function, all that’s required is:

module Sum {
   use Test::Fuzz;
   sub sum($a, $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}

And run:

perl6 Sum.pm6 --fuzz

This case will still show a lot of errors:

Use of uninitialized value of type Thread in numeric context
  in sub sum at Sum.pm6 line 4
Use of uninitialized value of type int in numeric context
  in sub sum at Sum.pm6 line 4
    ok 1 - sum(Thread, int)
Use of uninitialized value of type X::IO::Symlink in numeric context
  in sub sum at Sum.pm6 line 4
    ok 2 - sum(X::IO::Symlink, -3222031972)
Use of uninitialized value of type X::Attribute::Package in numeric context
  in sub sum at Sum.pm6 line 4
    ok 3 - sum(X::Attribute::Package, -9999999999)
Use of uninitialized value of type Routine in numeric context
  in sub sum at Sum.pm6 line 4
    not ok 4 - sum(áéíóú, (Routine))
...

What does that mean?

That means we should use one of the big features of perl6: Gradual typing. $a and $b should have types.

So, let’s modify the function and test again:

module Sum {
   use Test::Fuzz;
   sub sum(Int $a, Int $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}
    ok 1 - sum(-2991774675, 0)
    ok 2 - sum(5471569889, 7905158424)
    ok 3 - sum(8930867907, 5132583935)
    ok 4 - sum(-6390728076, -1)
    ok 5 - sum(-3558165707, 4067089440)
    ok 6 - sum(-8930867907, -5471569889)
    ok 7 - sum(3090653502, -2099633631)
    ok 8 - sum(-2255887318, 1517560219)
    ok 9 - sum(-6085119010, -3942121686)
    ok 10 - sum(-7059342689, 8930867907)
    ok 11 - sum(-2244597851, -6390728076)
    ok 12 - sum(-5948408450, 2244597851)
    ok 13 - sum(0, -5062049498)
    ok 14 - sum(-7229942697, 3090653502)
    not ok 15 - sum((Int), 1)

    # Failed test 'sum((Int), 1)'
    # at site#sources/FB587F3186E6B6BDDB9F5C5F8E73C55195B73C86 (Test::Fuzz) line 62
    # Invocant requires an instance of type Int, but a type object was passed.  Did you forget a .new?
...

A lot of OKs!  \o/

But there’re still some errors… We can’t sum undefined values…

We didn’t say the attributes should be defined (with :D). So Test::Fuzz generated every undefined sub-type of Int that it could find. It uses every generator of a sub-type of Int to generate values. It also works if you use a subset or even if you use a where in your signature. It’ll use a super-type generator and grep the valid values.

So, let’s change it again!

module Sum {
   use Test::Fuzz;
   sub sum(Int:D $a, Int:D $bis export is fuzzed {
      $a + $b
   }
}
multi MAIN(:$fuzz!) {
   run-tests
}
    ok 1 - sum(6023702597, -8270141809)
    ok 2 - sum(-8270141809, -3762529280)
    ok 3 - sum(242796759, -7408209799)
    ok 4 - sum(-5813412117, -5280261945)
    ok 5 - sum(2623325683, 2015644992)
    ok 6 - sum(-696696815, -7039670011)
    ok 7 - sum(1, -4327620877)
    ok 8 - sum(-7712774875, 349132637)
    ok 9 - sum(3956553645, -7039670011)
    ok 10 - sum(-8554836757, 7039670011)
    ok 11 - sum(1170220615, -3)
    ok 12 - sum(-242796759, 2015644992)
    ok 13 - sum(-9558159978, -8442233570)
    ok 14 - sum(-3937367230, 349132637)
    ok 15 - sum(5813412117, 1170220615)
    ok 16 - sum(-7408209799, 6565554452)
    ok 17 - sum(2474679799, -3099404826)
    ok 18 - sum(-5813412117, 9524548586)
    ok 19 - sum(-6770230387, -7408209799)
    ok 20 - sum(-7712774875, -2015644992)
    ok 21 - sum(8442233570, -1)
    ok 22 - sum(-6565554452, 9999999999)
    ok 23 - sum(242796759, 5719635608)
    ok 24 - sum(-7712774875, 7039670011)
    ok 25 - sum(7408209799, -8235752818)
    ok 26 - sum(5719635608, -8518891049)
    ok 27 - sum(8518891049, -242796759)
    ok 28 - sum(-2474679799, 2299757592)
    ok 29 - sum(5356064609, 349132637)
    ok 30 - sum(-3491438968, 3438417115)
    ok 31 - sum(-2299757592, 7580671928)
    ok 32 - sum(-8104597621, -8158438801)
    ok 33 - sum(-2015644992, -3)
    ok 34 - sum(-6023702597, 8104597621)
    ok 35 - sum(2474679799, -2623325683)
    ok 36 - sum(8270141809, 7039670011)
    ok 37 - sum(-1534092807, -8518891049)
    ok 38 - sum(3551099668, 0)
    ok 39 - sum(7039670011, 4327620877)
    ok 40 - sum(9524548586, -8235752818)
    ok 41 - sum(6151880628, 3762529280)
    ok 42 - sum(-8518891049, 349132637)
    ok 43 - sum(7580671928, 9999999999)
    ok 44 - sum(-8235752818, -7645883481)
    ok 45 - sum(6460424228, 9999999999)
    ok 46 - sum(7039670011, -7788162753)
    ok 47 - sum(-9999999999, 5356064609)
    ok 48 - sum(8510706378, -2474679799)
    ok 49 - sum(242796759, -5813412117)
    ok 50 - sum(-3438417115, 9558159978)
    ok 51 - sum(8554836757, -7788162753)
    ok 52 - sum(-9999999999, 3956553645)
    ok 53 - sum(-6460424228, -8442233570)
    ok 54 - sum(7039670011, -7712774875)
    ok 55 - sum(-3956553645, 1577669672)
    ok 56 - sum(0, 9524548586)
    ok 57 - sum(242796759, -6151880628)
    ok 58 - sum(7580671928, 3937367230)
    ok 59 - sum(-8554836757, 7712774875)
    ok 60 - sum(9524548586, 2474679799)
    ok 61 - sum(-7712774875, 2450227203)
    ok 62 - sum(3, 1257247905)
    ok 63 - sum(8270141809, -2015644992)
    ok 64 - sum(242796759, -3937367230)
    ok 65 - sum(6770230387, -6023702597)
    ok 66 - sum(2623325683, -3937367230)
    ok 67 - sum(-5719635608, -7645883481)
    ok 68 - sum(1, 6770230387)
    ok 69 - sum(3937367230, 7712774875)
    ok 70 - sum(6565554452, -5813412117)
    ok 71 - sum(7039670011, -8104597621)
    ok 72 - sum(7645883481, 9558159978)
    ok 73 - sum(-6023702597, 6770230387)
    ok 74 - sum(-3956553645, -7788162753)
    ok 75 - sum(-7712774875, 8518891049)
    ok 76 - sum(-6770230387, 6565554452)
    ok 77 - sum(-8554836757, 5356064609)
    ok 78 - sum(6460424228, 8518891049)
    ok 79 - sum(-3438417115, -9999999999)
    ok 80 - sum(-1577669672, -1257247905)
    ok 81 - sum(-5813412117, -3099404826)
    ok 82 - sum(8158438801, -3551099668)
    ok 83 - sum(-8554836757, 1534092807)
    ok 84 - sum(6565554452, -5719635608)
    ok 85 - sum(-5813412117, -2623325683)
    ok 86 - sum(-8158438801, -3937367230)
    ok 87 - sum(5813412117, -46698532)
    ok 88 - sum(9524548586, -2474679799)
    ok 89 - sum(3762529280, -2474679799)
    ok 90 - sum(7788162753, 9558159978)
    ok 91 - sum(6770230387, -46698532)
    ok 92 - sum(1577669672, 6460424228)
    ok 93 - sum(4327620877, 3762529280)
    ok 94 - sum(-6023702597, -2299757592)
    ok 95 - sum(1257247905, -8518891049)
    ok 96 - sum(-8235752818, -6151880628)
    ok 97 - sum(1577669672, 7408209799)
    ok 98 - sum(349132637, 6770230387)
    ok 99 - sum(-7788162753, 46698532)
    ok 100 - sum(-7408209799, 0)
    1..100
ok 1 - sum

No errors!!!

Currently Test::Fuzz only implement generators for Int and Str, but as I said, it will be used for its super and sub classes. If you want to have generators for your custom class, you just need to implement a “static” method called generate-samples that returns sample instances of your class, infinite number of instances if possible.

Test::Fuzz is under development and isn’t in perl6 ecosystem yet. And we’re needing some help!

EDITED: New now you can only call run-tests()


Death by Perl6: Adding on to Channels and Supplies in Perl6

Published by Tony O'Dell on 2016-12-21T16:11:13

Channels and supplies are perl6's way of implementing the Oberserver pattern. There's some significant differences behind the scenes of the two but both can be used to implement a jQuery.on("event" like experience for the user. Not a jQuery fan? Don't you worry your pretty little head because this is perl6 and it's much more fun than whatever you thought.

Why?

Uhh, why do we want this?

This adds some sugar to the basic reactive constructs and it makes the passing of messages a lot more friendly, readable, and manageable.

What in Heck Does that Look Like?

Let's have an example and then we'll dissect it.

A Basic Example

use Event::Emitter;  
my Event::Emitter $e .= new;

$e.on(/^^ .+ $$/, -> $data {
  # you can operate on $data here
  '  regex matches'.say;
});

$e.on({ True; }, -> $data {
  '  block matches'.say;
});

$e.on('event', -> $data {
  '  string matches'.say;
});

'event'.say;  
$e.emit("event", { });

'empty event'.say;  
$e.emit("", { });

'abc'.say;  
$e.emit("abc", { });

Output * this is the output for an emitter using Supply, more on this later

event  
  regex matches
  block matches
  string matches
empty event  
  block matches
abc  
  regex matches
  block matches

Okay, that looks like a lot. It is, and it's much nicer to use than a large given/when combination. It also reduces indenting, so you have that going for you, which is nice.

Let's start with the simple .on blocks we have.

  $e.on(/^^ .+ $$/, -> $data { ...

This is telling the emitter handler that whenever an event is received, run that regular expression against it and if it matches, execute the block (passed in as the second argument). As a note, and illustrated in the example above, the handler can match against a Callable, Str, or Regex. The Callable must return True or False to let the handler know whether or not to execute the block.

If that seems pretty basic, it is. But little things like this add up over time and help keep things manageable. Prepare yourself for more convenience.

The Sugar

Do you want ants? This is how you get ants.

So, now we're looking for more value in something like this. Here it is: you can inherit from the Event::Emitter::Role::Template (or roll your own) and then your classes will automatically inherit these on events.

Example
use Event::Emitter::Role::Template;

class ZefClass does Event::Emitter::Role::Template {  
  submethod TWEAK {
    $!event-emitter.on("fresh", -> $data {
      'Aint that the freshness'.say;
    });
  }
}

Then, further along in your application, whenever an object wants ZefClass to react to the 'fresh' event, all it needs to do is:

$zef-class-instance.emit('fresh');

Pretty damn cool.

Development time is reduced significantly for a few reasons right off the bat:

  1. Implementing Supplier (or Channel) methods, setup, and event handling becomes unnecessary
  2. Event naming or matching is handled so it's easy to debug
  3. Handling or adding new event handling functions during runtime (imagine a plugin that may want to add more events to handle - like an IRC client that implements a handler for channel parting messages)
  4. Messages can be multiplexed through one Channel or Supply rather easily
  5. Creates more readable code

That last reason is a big one. Imagine going back into one of your modules 2 years from now and debugging an issue where a Supplier is given an event and some data and digging through that 600 lines of given/when.

Worse, imagine debugging someone else's.

A Quick Note on Channel vs Supply

The Channel and Supply thing can take some getting used to for newcomers. The quick and dirty is that a Channel will distribute the event to only one listener (chosen by the scheduler) and order isn't guaranteed while a Supply will distribute to all listeners and the order of the messages are distributed in the order received. Because the Event::Emitter Channel based handler executes the methods registered with it directly, when it receives a message all of your methods are called with the data.

So, you've seen the example above as a Supply based event handler, check it out as a Channel based and note the difference in .say and the instantiation of the event handler.

use Event::Emitter;  
my Event::Emitter $e .= new(:threaded); # !important - this signifies a Channel based E:E

$e.on(/^^ .+ $$/, -> $data {
  # you can operate on $data here
  "  regex matches: $data".say;
});

$e.on({ True; }, -> $data {
  "  block matches: $data".say;
});

$e.on('event', -> $data {
  "  string matches: $data".say;
});

'event'.say;  
$e.emit("event", "event");

'empty event'.say;  
$e.emit("", "empty event");

'abc'.say;  
$e.emit("abc", "abc");

Output

event  
empty event  
abc  
  regex matches: event
  block matches: event
  string matches: event
  block matches: empty event
  regex matches: abc
  block matches: abc

Perl 6 Advent Calendar: Day 21 – Show me the data!

Published by nadimkhemir on 2016-12-21T00:01:18

Over the years, I have enjoyed using the different data dumpers that Perl5 offers. From the basic Data::Dumper to modules dumping in hexadecimal, JSON, with colors, handling closures, with a GUI, as graphs via dot and many other that fellow module developers have posted on CPAN (https://metacpan.org/search?q=data+dump&search_type=modules).

I always find things easier to understand when I can see data and relationships. The funkiest display belonging to ddd (https://www.gnu.org/software/ddd/) that I happen to fire up now and then just for the fun (in the example showing C data but it works as well with the Perl debugger).

ddd

Many dumpers are geared towards data transformation and data transmission/storage. A few modules specialize in generating output for the end user to read; I have worked on system that generated hundreds of thousands lines of output and it is close to impossible to read dumps generated by, say, Data::Dumper.

When I started using Perl6, I immediately felt the need to dump data structures (mainly because my noob code wasn’t doing what I expected it to do); This led me to port my Perl5 module (https://metacpan.org/pod/Data::TreeDumper  https://github.com/nkh/P6-Data-Dump-Tree) to Perl6. I am now also thinking about porting my HexDump module. I recommend warmly learning Perl6 by porting your modules (if you have any on CPAN), it’s fun, educative, useful for the Perl6 community, and your modules implement a need in a domain that you master leaving you time to concentrate on the Perl6.

My Perl5 module was ripe for a re-write and I wanted to see if and how it would be better if written in Perl6, I was not disappointed.

Perl6 is a big language, it takes time to get the pieces right, for a beginner it may seem daunting, even if one has years of experience, the secret is to take it easy, not give up and listen. Porting a module is the perfect exercise, you can take it easy because you have already done it before, you’re not going to give up because you know you can do it, and you have time to listen to people that have more experience (they also need your work), the Perl6 community has been examplary, helpful, patient, supportive and always present; if you haven visited #perl6 irc channel yet, now is a good time.

.perl

Every object in Perl6 has a ‘perl’ method, it can be used to dump the object and objects under it. The official documentation (https://docs.perl6.org/language/5to6-nutshell#Data%3A%3ADumper) provides a good example.

.gist

Every object also inherits a ‘gist’ method from Mu, the official documentation (https://docs.perl6.org/routine/gist#(Mu)_routine_gist) states: “Returns a string representation of the invocant, optimized for fast recognition by humans.”

dd, the micro dumper

It took me a while to discover this one, I saw that in a post on IRC. You know how it feel when you discover something simple after typing .perl and .gist a few hundred times, bahhh!

https://docs.perl6.org/routine/dd

The three dumpers above are built-in. They are also the fastest way to dump data but as much as their output is welcome, I know that it is possible to present data in a more legible way.

Enter Data::Dump

You can find the module on https://modules.perl6.org/ where all the Perl6 modules are. Perl6 modules link to repositories, Data::Dump source is on https://github.com/tony-o/perl6-data-dump.

Data::dump introduces color, depth limitation, and type specific dumps. The code is a compact hundred lines that is quite easy to understand. This module was quite helpful for a few cases that I had. It also dumps all the methods associated with objects. Unfortunately, it did fail on a few types of objects. Give it a try.

Data::Dump::Tree

Emboldened by the Perl6 community, the fact that I really needed a Dumper for visualization, and the experience from my Perl5 module (mainly the things that I wanted to be done differently) I started working on the module. I had some difficulties at the beginning, I knew nothing about the details of Perl6 and even if there is a resemblance with Perl5, it’s another beast. But I love it, it’s advanced, clean, and well designed, I am grateful for all the efforts that where invested in Perl6.

P6 vs P5 implementation

It’s less than half the size and does as much, which makes it clearer (as much as my newbie code can be considered clean). The old code was one monolithic module with a few long functions, the new code has a better organisation and some functionality was split out to extra modules. It may sound like bit-rot (and it probably is a little) but writing the new code in Perl6 made the changes possible, multi dispatch, traits and other built-in mechanism greatly facilitate the re-factoring.

What does it do that the other modules don’t?

I’ll only talk about a few points here and refer you to the documentation for all the details (https://raw.githubusercontent.com/nkh/P6-Data-Dump-Tree/master/lib/Data/Dump/Tree.pod); also have a look at the examples in the distribution.

The main goal for Data::Dump::Tree is readability, that is achieved with filter, type specific dumpers, colors, and dumper specialization via traits. In the examples directory, you can find JSON_parsed.pl which parses 20 lines of JSON by JSON::Tiny(https://github.com/moritz/json),. I’ll use it as an example below. The parsed data is dumped with .perl,  .gist , Data::Dump, and Data::Dump::Tree

.perl output (500 lines, unusable for any average human, Gods can manage)screenshot_20161219_185724

.gist (400 lines, quite readable, no color and long lines limit the readability a bit). Also note that it looks better here than on my terminal who has problems handling unicode properly.screenshot_20161219_190004

Data::Dump (4200 lines!, removing the methods would probably make it usable)screenshot_20161219_190439

The methods dump does not help.screenshot_20161219_190601

Data::Dump::Tree (100 lines, and you are the judge for readability as I am biased). Of course, Data::Dump::Tree is designed for this specific usage, first it understands Match objects, second it can display only part of the string that are matched, which greatly reduces the noise.
screenshot_20161219_190932

Tweeking output

The options are explained in the documentation but here is a little list
– Defining type specific dumper
screenshot_20161219_185409

– filtering to remove data or add a representation for a data set;  below the data structure is dumped as it is and then filtered (a filter that shows what it is doing).

As filtering happens on the “header” and “footer” is should be easy to make a HTML/DHTML plugin; Althoug bcat (https://rtomayko.github.io/bcat/), when using ASCII glyphs, works fine.

screenshot_20161219_191525
– set the display colors
– change the glyphs
– display address information or not
– use subscripts for indexes
– use ASCII, ANSI, or unicode for the glyphs

Diffs

I tried to implement a diff display with the Perl5 module but failed miserably as it needed architectural changes, The Perl6 version was much easier, in fact, it’s an add-on, a trait, that synchronizes two data dumps. This could be used in tests to show differences between expected and gotten data.

screenshot_20161219_184701
Of course we can eliminate the extra glyphs and the data that is equivalent (I also changed the glyph types to ASCII)screenshot_20161219_185035

From here

Above anything else, I hope many authors will start writing Perl6 modules. And I also hope to see other data dumping modules. As for Data::Dump::Tree, as it gathers more users, I hope to get requests for change, patches, and error reports.


rakudo.org: Advance Notice: Lexical Module Loading Bug Fix

Published by Zoffix Znet on 2016-12-17T15:06:40

Please note that the NEXT (January 2017) release of the Rakudo Compiler will include a fix for a bug some may be relying on under the assumption of it being a feature. For details, please see the explanation below. If you have any further questions, please ask us on our irc.freenode.net/#perl6 IRC channel.

What

Perl 6 takes great care to avoid global state, i.e. whatever you do in your module, it should not affect other code. That’s why e.g. subroutine definitions are lexically (my) scoped by default. If you want others to see them, you need to explicitly make them our scoped or export them.

Classes are exported by default on the assumption that loading a module will not be of much use when you cannot access the classes it contains. This works as advertised with a small but important caveat. Those classes are not only visible in the computation unit that loads the module, but globally. This means that as soon as some code loads a module, those classes are immediately visible everywhere.

For example, given a module Foo:

unit class Foo;
use Bar;

And your own program:

use Foo;
my $foo = Foo.new; # works as expected
my $bar = Bar.new; # huh!? Where is Bar coming from?

Why

This doesn’t sound so bad (it at least saves you some typing), except for that it makes another feature of Perl 6 impossible to have: the ability to load multiple versions of a module at the same time in different parts of your program:

{
    use Baz:ver(v1);
    my $old-baz = Baz.new;
}
{
    use Baz:ver(v2);
    my $shiny-new-baz = Baz.new;
}

This will explode as on loading Baz:ver(v2), rakudo will complain about “Baz” already being defined.

What To Do

To fix this, we no longer register loaded classes globally but only in the scope which loaded them in the first place. Coming back to our first example, we would need to explicitly load Bar in the main program:

use Foo;
use Bar;
my $foo = Foo.new; # still works of course
my $bar = Bar.new; # now it's clear where Bar is coming from

So if you suddenly get an “Undeclared name: Bar” error message after upgrading to a newer Perl 6 compiler, you will most probably just need to add a: “use Bar;” to your code.

Pawel bbkr Pabian: Let the fake times roll...

Published by Pawel bbkr Pabian on 2016-12-14T17:59:11

In my $dayjob at GetResponse I have to deal constantly with time dependent features. For example this email marketing platform allows you to use something called 'Time Travel', which is sending messages to your contacts at desired hour in their time zones. So people around the world can get email at 8:00, when they start their work and chance for those messages message being read are highest. No matter where they live.

But even such simple feature has more pitfalls that you can imagine. For example user has three contacts living in Europe/Warsaw, America/Phoenix and Australia/Sydney time zones.

The obvious validation is to exclude nonexistent days, for example user cannot select 2017-02-29 because 2017 is not a leap year. But what if he wants to send message at 2017-03-26 02:30:00? For America/Phoenix this is piece of cake - just 7 hours difference from UTC (or unix time). For Australia/Sydney things are bit more complicated because they use daylight saving time and this is their summer so additional time shift must be calculated. And for Europe/Warsaw this will fail miserably because they are just changing to summer time from 01:59:00 to 03:00:00 and 02:30 simply does not exist therefore some fallback algorithm should be used.

So for one date and time there are 3 different algorithms that have to be tested!
Unfortunately most of the time dependent code does not expose any interface to pass current time to emulate all edge cases, methods usually call time( ) or DateTime.now( ) internally. So let's test such blackbox - it takes desired date, time and time zone and it returns how many seconds are left before message should be sent.

package Timers;

use DateTime;

sub seconds_till_send {

my $when = DateTime->new( @_ )->epoch( );
my $now = time( );

return ( $when > $now ) ? $when - $now : 0;
}

Output of this method changes in time. To test it in consistent manner we must override system time( ) call:

#!/usr/bin/env perl

use strict;
use warnings;

BEGIN {
*CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
}

use Timers;
use Test::More;

# 2017-03-22 00:00:00 UTC
$::time_mock = 1490140800;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' =>30,
'time_zone' => 'America/Phoenix'
), 379800, 'America/Phoenix time zone';

Works like a charm! We have consistent test that pretends our program is ran at 2017-03-22 00:00:00 UTC and that means there are 4 days, 9 hours and 30 minutes till 2017-03-26 02:30:00 in America Phoenix.

We can also test DST case in Australia.

# 2017-03-25 16:00:00 UTC
$::time_mock = 1490457600;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' =>30,
'time_zone' => 'Australia/Sydney'
), 0, 'America/Phoenix time zone';

Because during DST Sydney has +11 hours from UTC instead of 10 that means when we run our program at 2017-03-25 16:00:00 UTC requested hour already passed there and message should be sent instantly. Great!

But what about nonexistent hour in Europe/Warsaw? We need to fix this method to return some useful values in DWIM-ness spirit instead of crashing. And I haven't told you whole, scarry truth yet, because we have to solve two issues at once here. First is nonexistent hour - in this case we want to calculate seconds to nearest possible hour after requested one - so 03:00 Europe/Warsaw should be used if 02:30 Europe/Warsaw does not exist. Second is ambiguous hour that happens when clocks are moved backwards and for example 2017-10-29 02:30 Europe/Warsaw occurs twice during this day - in this case first hour occurrence should be taken - so if 02:30 Europe/Warsaw is both at 00:30 UTC and 01:30 UTC seconds are calculated to the former one. Yuck...

For simplicity let's assume user cannot schedule message more than one year ahead, so only one time change related to DST will take place. With that assumption fix may look like this:

sub seconds_till_send {
    my %params = @_;
    my $when;

# expect ambiguous hour during summer to winter time change
if (DateTime->now( 'time_zone' => $params{'time_zone'} )->is_dst) {

# attempt to create ambiguous hour is safe
# and will always point to latest hour
$when = DateTime->new( %params );

# was the same hour one hour ago?
my $tmp = $when->clone;
$tmp->subtract( 'hours' => 1 );

# if so, correct to earliest hour
if ($when->hms eq $tmp->hms) {
$when = $when->epoch - 3600;
}
else {
$when = $when->epoch;
}
}

# expect nonexistent hour during winter to summer time change
else {

do {

# attempt to create nonexistent hour will die
$when = eval { DateTime->new( %params )->epoch( ) };

# try next minute maybe...
if ( ++$params{'minute'} > 59 ) {
$params{'minute'} = 0;
$params{'hour'}++;
}

} until defined $when;

}

my $now = time( );

return ( $when > $now ) ? $when - $now : 0;
}

If your eyes are bleeding here is TL;DR. First we have to determine which case we may encounter by checking if there is currently DST in requested time zone or not. For nonexistent hour we try to brute force it into next possible time by adding one minute periods and adjusting hours when minutes overflow. There is no need to adjust days because DST never happens on date change. For ambiguous hour we check if by subtracting one hour we get the same hour (yep). If so we have to correct unix timestamp to get earliest one.

But what about our tests? Can we still write it in deterministic and reproducible way? Luckily it occurs that DateTime->now( ) uses time( ) internally so no additional hacks are needed.

# 2017-03-26 00:00:00 UTC
$::time_mock = 1490486400;

is Timers::seconds_till_send(
'year' => 2017, 'month' => 3, 'day' => 26,
'hour' => 2, 'minute' => 30,
'time_zone' => 'Europe/Warsaw'
), 3600, 'Europe/Warsaw time zone nonexistent hour';

Which is expected result, 02:30 is not available in Europe/Warsaw so 03:00 is taken that is already in DST season and 2 hours ahead of UTC.

Now let's solve leap seconds issue where because of Moon slowing down Earth and causing it to run on irregular orbit you may encounter 23:59:60 hour every few years. OK, OK, I'm just kidding :) However in good tests you should also take leap seconds into account if needed!

I hope you learned from this post how to fake time in tests to cover weird edge cases.
Before you leave I have 3 more things to share:

  1. Dave Rolsky, maintainer of DateTime module does tremendous job. This module is a life saver. Thanks!
  2. Overwrite CORE::time before loading any module that calls time( ). If you do it this way
    use DateTime;
    
    

    BEGIN {
    *CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
    }

    $::time_mock = 123;
    say DateTime->now( );

    it won't have any effect due to sub folding.


  3. Remember to include empty time signature

    BEGIN {
        # no empty signature
        *CORE::GLOBAL::time = sub { $::time_mock // CORE::time }; 
    }
    
    

    $::time_mock = 123;

    # parsed as time( + 10 )
    # $y = 123, not what you expected !
    my $y = time + 10;

    because you cannot guarantee someone always used parenthesis when calling time( ).



6guts: Complex cocktail causes cunning crash

Published by jnthnwrthngtn on 2016-12-09T00:02:33

I did a number of things for Perl 6 yesterday. It was not, however, hard to decide which of them to write up for the blog. So, let’s dig in.

Horrible hiding heisenbugs

It all started when I was looking into this RT, reporting a segfault. It was filed a while ago, and I could not reproduce it. So, add a test and case closed? Well, not so fast. As I discussed last week, garbage collection happens when a thread fills up its nursery. Plenty of bugs only show up when the timing is just right (or is that just wrong?) What can influence when GC runs? How many allocations we’ve done. And what can influence that? Pretty much any change to the program being run, the compiler, or the environment. The two that most often have people tearing their hair out are:

While GC-related issues are not the only cause of SEGVs, in such a simple piece of code using common features it’s far and away the most likely cause. So, to give myself more confidence that the bug truly was gone, I adjusted the nursery size to be just 32KB instead of 4MB, which causes GC to run much more often. This, of course, is a huge slowdown, but it’s good for squeezing out bugs.

And…no bug! So, in goes the test. Simples!

In even better news, it was lunch time. Well, actually, that was only sort of good news. A few days ago I cooked a rather nice chicken and berry pulao. It came out pretty good, but cooking 6 portions worth of it when I’m home alone for the week wasn’t so smart. I don’t want to see chicken pulao for a couple of months now. Anyway, while I was devouring some of my pulao mountain, I set off a spectest run on the 32KB nursery stress build of MoarVM, just to see if it showed up anything.

Putrid pointers

A couple of failures did, in fact, show up, one of them in constant.t. This rang a bell. I was sure somebody had a in the last couple of weeks reported a crash in that test file, which had then vanished. I checked in with the person who I vaguely recalled mentioning it and…sure enough, it was that very test file. In their normal test runs, the bug had long since vanished. I figured, having now got a reproduction of it, I should probably hunt it down right away. Otherwise, we’d probably end up playing “where’s Wally” with it for another month or ten.

So, how did the failure look?

$ ./perl6-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
Segmentation fault (core dumped)

It actually segfaulted while compiling the test file. Sad! So, where?

$ ./perl6-gdb-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
[boring output omitted]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

That looks…ungood. That final line is meant to be a code location, which means something decided to tell the CPU to go execute code living at the NULL address. At this point, things could go two ways: the JIT spat out something terrible, or a function pointer somewhere was NULL. But which?

(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff78cacbc in MVM_coerce_smart_stringify (tc=0x6037c0, obj=0x605c10, res_reg=0x56624d8)
    at src/core/coerce.c:214
#2  0x00007ffff789dff4 in MVM_interp_run (tc=tc@entry=0x6037c0, initial_invoke=0x60ea80, 
    invoke_data=0x56624d8) at src/core/interp.c:827
#3  0x00007ffff7978b21 in MVM_vm_run_file (instance=0x603010, 
    filename=0x7fffffffe09f "/home/jnthn/dev/rakudo/perl6.moarvm") at src/moar.c:309
#4  0x000000000040108b in main (argc=9, argv=0x7fffffffdbc8) at src/main.c:192

Phew, it looks like the second case, given there’s no JIT entry stub on the stack. So, we followed a NULL function pointer. Really?

(gdb) frame 1
#1  0x00007ffff78cacbc in MVM_coerce_smart_stringify (tc=0x6037c0, obj=0x605c10, res_reg=0x56624d8)
at src/core/coerce.c:214
214     ss = REPR(obj)->get_storage_spec(tc, STABLE(obj));

Yes, really. Presumably, that get_storage_spec is bogus. (I did a p to confirm it.) So, how is obj looking?

(gdb) p *obj
$1 = {header = {sc_forward_u = {forwarder = 0x48000000000001, sc = {sc_idx = 1, idx = 4718592}, 
  st = 0x48000000000001}, owner = 6349760, flags = 0, size = 0}, st = 0x6d06c0}

Criminally corrupt; let me count the ways. For one, 6349760 looks like a very high thread ID for a program that’s only running a single thread (they are handed out sequentially). For two, 0 is not a valid object size. And for three, idx is just a nuts value too (even Rakudo’s CORE.setting isn’t made up of 4 million objects). So, where does this object live? Well, let’s try out last week’s handy object locator to figure out:

(gdb) p MVM_gc_debug_find_region(tc, obj)
In tospace of thread 1

Well. Hmpfh. That’s actually an OK place for an object to be. Of course, the GC spaces swap often enough at this nursery size that a pointer could fail to be updated, point into fromspace after one GC run, not be used until a later GC run, and then come to point into some random bit of tospace again. How to test this hypothesis? Well, instead of 32768 bytes of nursery, what if I make it…well, 40000 maybe?

Here we go again:

$ ./perl6-gdb-m -Ilib t/spec/S04-declarations/constant.rakudo.moar 
[trust me, this omitted stuff is boring]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78b00db in MVM_interp_run (tc=tc@entry=0x6037c0, initial_invoke=0x0, invoke_data=0x563a450)
    at src/core/interp.c:2855
2855                    if (obj && IS_CONCRETE(obj) && STABLE(obj)->container_spec)

Aha! A crash…somewhere else. But where is obj this time?

(gdb) p MVM_gc_debug_find_region(tc, obj)
In fromspace of thread 1

Hypothesis confirmed.

Dump diving

So…what now? Well, just turn on that wonder MVM_GC_DEBUG flag and the bug will make itself clear, of course. Alas, no. It didn’t trip a single one of the sanity checks added by enabling thee flag. So, what next?

The where in gdb tells us where in the C code we are. But what high level language code was MoarVM actually running at the time? Let’s dump the VM frame stack and find out:

(gdb) p MVM_dump_backtrace(tc)
   at <unknown>:1  (./blib/Perl6/Grammar.moarvm:initializer:sym<=>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:initializer)
 from src/Perl6/Grammar.nqp:3140  (./blib/Perl6/Grammar.moarvm:type_declarator:sym<constant>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:type_declarator)
 from <unknown>:1  (./blib/Perl6/Grammar.moarvm:term:sym<type_declarator>)
 from gen/moar/stage2/QRegex.nqp:1378  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/QRegex.moarvm:!protoregex)
 from src/Perl6/Grammar.nqp:3825  (./blib/Perl6/Grammar.moarvm:termish)
 from gen/moar/stage2/NQPHLL.nqp:886  (/home/jnthn/dev/MoarVM/install/share/nqp/lib/NQPHLL.moarvm:EXPR)
from src/Perl6/Grammar.nqp:3871  (./blib/Perl6/Grammar.moarvm:EXPR)
...

I’ve snipped out a good chunk of a fairly long stack trace. But look! We were parsing and compiling a constant at the time of the crash. That’s somewhat interesting, and explains why constant.t was a likely test file to show this bug up. But MoarVM has little idea about parsing or Perl 6’s idea of constants. Rather, something on that codepath of the compiler must run into a bug of sorts.

Looking at the location in interp.c the op being interpreted at the time was decont, which takes a value out of a Scalar container, if it happens to be in one. Combined with knowing what code we were in, I can invoke moar --dump blib/Perl6/Grammar.moarvm, and then locate the disassembly of initializer:sym<=>.

There were a few uses of the decont op in that function. All of them seemed to be on things looked up lexically or dynamically. So, I instrumented those ops with a fromspace check. Re-compiled, and…

(gdb) break MVM_panic
Breakpoint 1 at 0x7ffff78a19a0: file src/core/exceptions.c, line 779.
(gdb) r
Starting program: /home/jnthn/dev/MoarVM/install/bin/moar --execname=./perl6-gdb-m --libpath=/home/jnthn/dev/MoarVM/install/share/nqp/lib --libpath=/home/jnthn/dev/MoarVM/install/share/nqp/lib --libpath=. /home/jnthn/dev/rakudo/perl6.moarvm --nqp-lib=blib -Ilib t/spec/S04-declarations/constant.rakudo.moar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, MVM_panic (exitCode=1, 
    messageFormat=0x7ffff799bc58 "Collectable %p in fromspace accessed") at src/core/exceptions.c:779
779 void MVM_panic(MVMint32 exitCode, const char *messageFormat, ...) {
(gdb) where
#0  MVM_panic (exitCode=1, messageFormat=0x7ffff799bc58 "Collectable %p in fromspace accessed")
    at src/core/exceptions.c:779
#1  0x00007ffff78ba657 in MVM_interp_run (tc=0x1, tc@entry=0x6037c0, initial_invoke=0x0, 
    invoke_data=0x604b80) at src/core/interp.c:374
#2  0x00007ffff7979071 in MVM_vm_run_file (instance=0x603010, 
    filename=0x7fffffffe09f "/home/jnthn/dev/rakudo/perl6.moarvm") at src/moar.c:309
#3  0x000000000040108b in main (argc=9, argv=0x7fffffffdbc8) at src/main.c:192

And what’s in interp.c around that line? The getdynlex op. That’s the one that is used to lookup things like $*FOO in Perl 6. So, a dynamic lexical lookup seemed to be handing back an outdated object. How could that happen?

Interesting idea is insufficient

My next idea was to see if I could catch the moment that something bad was put into the lexical. I’d already instrumented the obvious places with no luck. But…what if I could intercept every single VM register access and see if an object from fromspace was read? Hmmm… It turned out that I could make that happen with a sufficiently cunning patch. I made it opt-in rather than the default for MVM_GC_DEBUG because it’s quite a slow-down. I’m sure that this will come in really useful for finding some GC bug some day. But for this bug? It was no direct help.

It was of some indirect help, however. It suggested strongly that at the time the lexical was set (actually, it turned out to be $*LEFTSIGIL), everything was valid. Somewhere between then and the lookup of it using the getdynlex op, things went bad.

Cache corruption

So what does getdynlex actually do? It checks if the current frame declares a lexical of the specified name. If so, it returns the value. If not, it looks in the caller for the value. If that fails, it goes on to the caller’s caller, until it runs out of stack to search and then gives up.

If that’s what it really did, then this bug would never have happened. But no, people actually want Perl 6 to run fast and stuff, so we can’t just implement the simplest possible thing and go chill. Instead, there’s a caching mechanism. And, as well all know, the two hardest problems in computer science are cache invalidation and cache invalidation.

The caching is relatively simple: each frame has slots for sticking a name, register pointer, and type in it.

MVMString   *dynlex_cache_name;
MVMRegister *dynlex_cache_reg;
MVMuint16    dynlex_cache_type;

When getdynlex finds something the slow way, it then looks down the stack again and finds a frame with an empty dynlex_cache_name. It then sticks the name of dynamic lexical into the name slot, a pointer to the MoarVM lexical into the reg slot, and what type of lexical it was (native int, native num, object, etc.) into the type slot. The most interesting of these is the reg slot. The MVMRegister type is actually a union of different types that we may store in a register or lexical. We re-use the union for registers that live while the frame is on the callstack and lexicals that may need to live longer thanks to closures. So, each frame as two arrays of these:

MVMRegister *env;   /* The lexical environment */
MVMRegister *work;  /* Working space that dies with the frame */

And so the dynlex_cache_reg ends up pointing to env somewhere in the frame that we found the lexical in.

So, the big question: was the caching to blame? I shoved in a way to disable it and…the bug vanished.

Note by this point we’re up to two pieces that contribute to the bug: the GC and the dynamic lexical cache. The thing is, the dynamic lexical cache is used very heavily. My gut feeling told me there must be at least one more factor at play here.

Suspicious specialization

So, what could the other factor be? I re-enabled the cache, verified the crash came back, and then stuck MVM_SPESH_DISABLE=1 into the environment. And…no bug. So, it appeared that dynamic optimization was somehow involved too. That’s the magic that looks at what types actually show up at runtime, and compiles specialized versions of the code that fast-paths a bunch of operations based on that (this specialization being where the name “spesh” comes from). Unfortunately, MVM_SPESH_DISABLE is a rather blunt instrument. It disables a huge range of things, leaving a massive surface area to consider for the bug. Thankfully, there are some alternative environment variables that just turn off parts of spesh.

First, I tried MVM_JIT_DISABLE=1, which results in spesh interpreting the specialized version of the code rather than turning it into machine code to remove the interpreter overhead. The bug remained.

Next, I tried MVM_SPESH_OSR_DISABLE, which disables On Stack Replacement. This is a somewhat fiddly optimization that detects hot loops as they are being interpreted, pauses execution, produces an optimized version of the code, and then recalculates the program counter so it points to the appropriate point in the optimize code and continues execution. Basically, the interpreter gets the code it’s executing replaced under it – perhaps with machine code, which the interpreter is instructed to jump into immediately. Since this also fiddles with frames “in flight”, it seemed like a good candidate. But…nope. Bug remained.

Finally, I tried MVM_SPESH_INLINE_DISABLE, which disables inlining. That’s where we spot a call to a relatively small subroutine or method, and just replace the call with the code of the sub or method itself, saving the cost of setting up callframes. And…bug vanished!

So, inlining was apparently a factor too. The trouble is, that also didn’t seem to add up to an obvious bug. Consider:

sub foo($a) {
    bar($a);
}
sub bar($b) {
    my $c = $b + 6;
    $c * 6
}

Imagine that bar was to be inlined into foo. Normally they’d have lexical slots in ->env as follows:

A:  | $_ | $! | $/ | $a |
B:  | $_ | $! | $/ | $b | $c |

The environment for the frame inline(A, B) would look like:

inline(A, B):  | $_ | $! | $/ | $a | $_ | $! | $/ | $b | $c |
                \---- from A ----/  \------- from B -------/

Now, it’s easy to imagine various bugs that could arise in the initial lookup of a dynamic lexical in such a frame. Indeed, the dynamic lexical lookup code actually has two bunches of code that deal with such frames, one in the case the specialized code is being interpreted and one in the case where it has been JIT compiled. But by the time we are hitting the cache, there’s nothing smart going on at all: it’s just a cheap pointer deference.

Dastardly deoptimization

So, it seems we need a fourth ingredient to explain the bug. By now, I had a pretty good idea what it was. MoarVM doesn’t just to optimizations based on properties it can prove will always hold. It can also do speculative optimization based on properties that it expects will probably hold up. For example, suppose we have:

sub foo($a, $b) {
    $b.some-huge-complex-call($a);
    return $a.Str;
}

Imagine we’re generating a specialization of this routine for the case $a is an object of type Product. The Str method is tiny, so we go ahead and inline it. However, some-huge-complex-call takes all kinds of code paths. We can’t be sure, from our analysis, that at some point it won’t mix in to the object in $a. What if it mixes in a role that has an alternate Str method? Our inlining would break stuff! We’d end up calling the Product.Str method, not the one from the mixin.

One reaction is to say “well, we’ll just not ever optimize stuff unless we can be REALLY sure”, which is either hugely limiting or relies on much more costly analyses. The other path, which MoarVM does, is to say “eh, let’s just assume mixins won’t happen, and if they do, we’ll fix things then!” The process of fixing things up is called deoptimization. We walk the call stack, rewriting return addresses to point to the original interpreted code instead of the optimized version of the code.

But what, you might wonder, do we do if there’s a frame on the stack that is actually the result of an inlining operation? What if we’re in the code that resulted from inline(A,B), in the bit that corresponds to the code of B? Well, we have to perform – you guessed it – uninlining! The composite call frame has to be dissected, and the call stack rewritten to look like it would have if we’d been running the original interpreted code. To do this, we’d create a call frame for B, complete with space for its lexicals, and copy the lexicals from inline(A,B) that belong to B into that new buffer.

The code that does this is one of the very few parts of MoarVM that frightens me.

For good reason, it turns out. This deoptimization, together with uninlining, was the final ingredient needed for the bug. Here’s what happened:

  1. The method EXPR in Perl6::Grammar was inlined into one of its callers. This EXPR method declares a $*LEFTSIGIL variable.
  2. While parsing the constant, the $*LEFTSIGIL is assigned to the sigil of the constant being declared, if it has one (so, in constant $a = 42 it would be set to $).
  3. Something does a lookup of $*LEFTSIGIL. It is located and cached. The cache entry points into a region of the ->env of the frame that inlined, and thus incorporated, the lexical environment of EXPR.
  4. At some point, a mixin happens, causing a deoptimization of the call stack. The frame that inlined EXPR gets pulled apart. A new EXPR frame comes to exist, with the lexicals that used to live in the composite frame copied into them. Execution continues.
  5. A GC happens. The object containing the $ substring moves. The new EXPR frame’s lexical environment is updated.
  6. Another lookup of $*LEFTSIGIL happens. It hits the cache. The cache, however, still points to the place the lexical used to live in the composite frame. This memory has not been freed, because the first part of it is still being used. However, the GC no longer cares about its contents because that content is unreachable. Therefore, it contains an outdated pointer, thus leading to accessing memory that’s being used for something else entirely by that point, leading to the eventual segmentation fault.

The most natural fix was to invalidate the cache during deoptimization.

Lessons learned

The bug I wrote up last week was thanks to a comparatively simple oversight made within the scope of a few lines of a single C function. While this one could be fixed with a small amount of code added in a single file, the segfault arose from the interaction of four distinct features existing in MoarVM:

Even when a segfault was not produced, thanks to “lucky” GC timing, the bug would lead to reading of stale data. It just so turned out that the data wasn’t ever stale enough in reality to break things on this particular code path.

All of garbage collection, inlining, and deoptimization are fairly complicated. By contrast, the dynamic lexical lookup cache is fairly easy. Interestingly, it was the addition of this easy feature that introduced the bug – not because the code that was added was wrong, but rather because it did something that some other far flung piece of code – the deoptimizer – had quietly relied on not happening.

So, what might be learned for the future?

The most immediate practical learning is that taking interior pointers into mutable data structures is risky. In this case, that data structure was a composite lexical environment, that later got taken apart. Conceptually, the environment was resized and the interior pointer was off the end of the new size. This suggests either providing a safe way to acquire such a reference, or an alternative design for the dynamic lexical cache to avoid needing to do so.

Looking at the bigger picture, this is all about managing complexity. Folks who work with me tend to observe I worry a good bit about loose coupling, to the degree that I’m much more hesitant than the typical developer when it comes to code re-use. Acutely aware that re-use means use, and use means dependency, and dependency means coupling, I tend to want things to prove they really are the same thing rather than just looking like they might be the same thing. MoarVM reflects this in various ways: to the degree I can, I try to organize it as a collection of parts that either keep themselves very much to themselves, or that collaborate over a small number of very stable data structures. One of the reasons Git works architecturally is because while all the components of it are messing with the same data structure, it’s a very stable and well-understood data structure.

In this bug, MVMFrame is the data structure in question. A whole load of components know about it and work with it because – so the theory went – it’s one of the key stable data structures of the VM. Contrast it with the design of things like the Unicode normalizer or the fixed size allocator, which nothing ever pokes into directly. These are likely to want to evolve over time to choose smarter data structures, or to get extra bits of state to cope with Unicode’s latest and greatest emoji boundary specification. Therefore, all work with them is hidden nicely behind an API.

In reality, MVMFrame has grown to contain quite a fair few things as MoarVM has evolved. At the same time, its treated as a known quantity by lots of parts of the codebase. This is only sustainable if every addition to MVMFrame is followed by considering how every other part of the VM that interacts with it will be affected by the change, and making compensating changes to those components. In this case, the addition of the dynamic lexical cache into the frame data structure was not accompanied by sufficient analysis of which other parts of the VM may need compensating changes.

The bug I wrote up last week isn’t really the kind that causes an architect a headache. It was a localized coding slip-up that could happen to anyone on a bad day. It’s a pity we didn’t catch it in code review, but code reviewers are also human. This bug, by contrast, arose as a result of the complexity of the VM – or, more to the point, insufficient management of that complexity. And no, I’m not beating myself up over this. But, as MoarVM architect, this is exactly the type of bug that catches my eye, and causes me to question assumptions. In the immediate, it tells me what kinds of patches I should be reviewing really carefully. In the longer run, the nature of the MVMFrame data structure and its level of isolation from the rest of the codebase deserves some questioning.


6guts: Taking a couple of steps backwards to fix a GC bug

Published by jnthnwrthngtn on 2016-11-30T23:16:42

When I popped up with a post here on Perl 6 OO a few days ago, somebody noted in the comments that they missed my write-ups of my bug hunting and fixing work in Rakudo and MoarVM. The good news is that the absence of posts doesn’t mean an absence of progress; I’ve fixed dozens of things over the last months. It was rather something between writers block and simply not having the energy, after a day of fixing things, to write about it too. Anyway, it seems I’ve got at least some of my desire to write back, so here goes. (Oh, and I’ll try and find a moment in the coming days to reply to the other comments people wrote on my OO post too.)

Understanding a cryptic error

There are a number of ways MoarVM can come tumbling down when memory gets corrupted. Some cases show up as segmentation faults. In other cases, the VM comes across something that simply does make any kind of sense and can infer that memory has become corrupted. Two panics commonly associated with this are “zeroed target thread ID in work pass” and “invalid thread ID XXX in GC work pass”, where XXX tends to be a sizable integer. At the start of a garbage collection – where we free up memory associated with dead objects – we do something like this:

  1. Go through all the threads that have been started, and signal those that are not blocked (e.g. waiting for I/O, a lock acquisition, or for native code to finish) to come and participate in the garbage collection run.
  2. Assign each non-blocked thread itself to work on.
  3. Assign each blocked thread’s work to a non-blocked thread.

So, every thread – blocked or not – ends up assigned to a running thread to take care of its collection work. It’s the participation of multiple threads that makes the MoarVM GC parallel (which is a different thing to having a concurrent GC; MoarVM’s GC can barely claim to be that).

The next important thing to know is that the every object, at creation, is marked with the ID of the thread that allocated it. This means that, as we perform GC, we know whether the object under consideration “belongs” to the current thread we’re doing GC work for, or some other one. In the case that the ID in the object header doesn’t match up with the thread ID we’re doing GC work for, then we stick it into a list of work to pass off to the thread that is responsible. To avoid synchronization overhead, we pass then off in batches (so there’s only synchronization overhead per batch). This is far from the only way to do parallel GC (other schemes include racing to write forwarding pointers), but it keeps the communication between participating threads down and leaves little surface area for data races in the GC.

The funny thing is that if none of that really made any sense to you, it doesn’t actually matter at all, because I only told you about it all so you’d have a clue what the “work pass” in the error message means – and even that doesn’t matter much for understanding the bug I’ll eventually get around to discussing. Anyway, TL;DR version (except you did just read it all, hah!) is that if the owner ID in an object header is either zero or an out-of-range thread ID, then we can be pretty sure there’s memory corruption afoot. The pointer under consideration is either to zeroed memory, or to somewhere in memory that does not correspond to an object header.

So, let’s debug the panic!

Getting the panic is, perhaps, marginally better than a segmentation fault. I mean, sure, I’m a bit less embarrassed when Moar panics than SEGVs, and perhaps it’s mildly less terrifying for users too. But at the end of the day, it’s not much better from a debugging perspective. At the point we spot the memory corruption, we have…a pointer. That points somewhere wrong. And, this being the GC, it just came off the worklist, which is full of a ton of pointers.

If only we could know where the pointer came from, I hear you think. Well, it turns out we can: we just need to detect the problem some steps back, where the pointer is added to the worklist. In src/gc/debug.h there’s this:

#define MVM_GC_DEBUG 0

Flip that to a 1, recompile, and magic happens. Here’s a rather cut down snippet from in worklist.h:

#if MVM_GC_DEBUG
#define MVM_gc_worklist_add(tc, worklist, item) \
    do { \
        MVMCollectable **item_to_add = (MVMCollectable **)(item); \
        if (*item_to_add) { \
            if ((*item_to_add)->owner == 0) \
                MVM_panic(1, "Zeroed owner in item added to GC worklist"); \
                /* Various other checks here.... */ 
        } \
        if (worklist->items == worklist->alloc) \
            MVM_gc_worklist_add_slow(tc, worklist, item_to_add); \
        else \
            worklist->list[worklist->items++] = item_to_add; \
    } while (0)
#else
#define MVM_gc_worklist_add(tc, worklist, item) \
    do { \
        MVMCollectable **item_to_add = (MVMCollectable **)(item); \
        if (worklist->items == worklist->alloc) \
            MVM_gc_worklist_add_slow(tc, worklist, item_to_add); \
        else \
            worklist->list[worklist->items++] = item_to_add; \
    } while (0)
#endif

So, in the debug version of the macro, we do some extra checks – including the one to detect a zeroed owner. This means that when MoarVM panics, the GC code that is placing the bad pointer into the list is on the stack. Then it’s a case of using GDB (or your favorite debugger), sticking a breakpoint on MVM_panic (spelled break MVM_panic in GDB), running the code that explodes, and then typing where. In this case, I was pointed at the last line of this bit of code from roots.c:

void MVM_gc_root_add_frame_roots_to_worklist(MVMThreadContext *tc, MVMGCWorklist *worklist,
                                             MVMFrame *cur_frame) {
    /* Add caller to worklist if it's heap-allocated. */
    if (cur_frame->caller && !MVM_FRAME_IS_ON_CALLSTACK(tc, cur_frame->caller))
        MVM_gc_worklist_add(tc, worklist, &cur_frame->caller);

    /* Add outer, code_ref and static info to work list. */
    MVM_gc_worklist_add(tc, worklist, &cur_frame->outer);

So, this tells me that the bad pointer is to an outer. The outer pointer of a call frame points to the enclosing lexical scope, which is how closures work. This provides a bit of inspiration for bug hunting; for example, it would now make sense to consider codepaths that assign outer to see if they could ever fail to keep a pointer up to date. The trouble is, for such an incredibly common language feature to be broken in that way, we’d be seeing it everywhere. It didn’t fit the pattern. In fact, both my private $dayjob application that was afflicted with this, together with the whateverable set of IRC bots, had in common that they did a bunch of concurrency work and both spawned quite a lot of subprocesses using Proc::Async.

But where does the pointer point to?

Sometimes I look at a pointer and it’s obviously totally bogus (a small integer usually suggests this). But this one looked feasible; it was relatively similar to the addresses of other valid pointers. But where exactly does it point to?

There are only a few places that a GC-managed object can live. They are:

So, it would be very interesting to know if the pointer was into one of those. Now, I could just go examining it in the debugger, but with a dozen running threads, that’s tedious as heck. Laziness is of course one of the virtues of a programmer, so I wrote a function to do the search for me. Another re-compile, reproducing the bug in GDB again, and then calling that routine from the debugger told me that the pointer was into the tospace of another thread.

Unfortunately, thinking is now required

Things get just a tad mind-bending here. Normally, when a program is running, if we see a pointer into fromspace we know we’re in big trouble. It means that the pointer points to where an object used to be, but was then moved into either tospace or the old generation. But when we’re in the middle of a GC run, the two spaces are flipped. The old tospace is now fromspace, the old fromspace becomes the new tospace, and we start evacuating living objects in to it. The space left at the end will then be zeroed later.

I should mention at this point that the crash only showed up a fraction of the time in my application. The vast majority of the time, it ran just fine. The odd time, however, it would panic – usually over a zeroed thread owner, but sometimes over a junk value being in the thread owner too. This all comes down to timing: different thread are working on GC, in different runs of the program they make progress at different paces, or get head starts, or whatever, and so whether the zeroing of the unused part of tospace happened or not yet will vary.

But wait…why didn’t it catch the problem even sooner?

When the MVM_GC_DEBUG flag is turned on, it introduces quite a few different sanity checks. One of them is in MVM_ASSIGN_REF, which happens whenever we assign a reference to one object into another. (The reason we don’t simply use the C assignment operator for that is because the inter-generational write barrier is needed.) Here’s how it looks:

#if MVM_GC_DEBUG
#define MVM_ASSIGN_REF(tc, update_root, update_addr, referenced) \
    { \
        void *_r = referenced; \
        if (_r && ((MVMCollectable *)_r)->owner == 0) \
            MVM_panic(1, "Invalid assignment (maybe of heap frame to stack frame?)"); \
        MVM_ASSERT_NOT_FROMSPACE(tc, _r); \
        MVM_gc_write_barrier(tc, update_root, (MVMCollectable *)_r); \
        update_addr = _r; \
    }
#else
#define MVM_ASSIGN_REF(tc, update_root, update_addr, referenced) \
    { \
        void *_r = referenced; \
        MVM_gc_write_barrier(tc, update_root, (MVMCollectable *)_r); \
        update_addr = _r; \
    }
#endif

Once again, the debug version does some extra checks. Those reading carefully will have spotted MVM_ASSERT_NOT_FROMSPACE in there. So, if we used this macro to assign to the ->outer that had the outdated pointer, why did it not trip this check?

It turns out, because it only cared about checking if it was in fromspace of the current thread, not all threads. (This is in turn because the GC debug bits only really get any love when I’m hunting a GC bug, and once I find it then they go back in the drawer until next time around.) So, I enriched that check and…the bug hunt came to a swift end.

Right back to the naughty deed

The next time I caught it under the debugger was not at the point that the bad ->outer assignment took place. It was even earlier than that – lo and behold, inside of some of the guts that power Proc::Async. Once I got there, the problem was clear and fixed in a minute. The problem was that the callback pointer was not rooted while an allocation took place. The function MVM_repr_alloc_init can trigger GC, which can move the object pointed to by callback. Without an MVMROOT to tell the GC where the callback pointer is so it can be updated, it’s left pointing to where the callback used to be.

So, bug fixed, but you may still be wondering how exactly this bug could have led to a bad ->outer pointer in a callframe some way down the line. Well, callback is a code object, and code objects point to an outer scope (it’s actually code objects that we clone to make closures). Since we held on to an outdated code object pointer, it in turn would point to an outdated pointer to the outer frame it closed over. When we invoked callback, the outer from the code object would be copied to be the outer of the call frame. Bingo.

Less is Moar

The hard part about GCs is not just building the collector itself. It’s that collectors bring invariants that are to be upheld, and a momentary lapse in concentration by somebody writing or reviewing a patch can let a bug like this slip through. At least 95% of the time when I handwavily say, “it was a GC bug”, what I really mean was “it was a bug that arose because some code didn’t play by the rules the GC requires”. A comparatively tiny fraction of the time, there’s actually something wrong in the code living under src/gc/.

People sometimes ask me about my plans for the future of MoarVM. I often tell them that I plan for there to be less of it. In this case, the code with the bug is something that I hope we’ll eventually write in, say, NQP, where we don’t have to worry about low-level details like getting write barriers correct. It’s just binding code to libuv, a C library, and we should be able to do that using the MoarVM native calling support (which is likely mature enough by now). Alas, that also has its own set of costs, and I suspect we’d need to improve native calling performance to not come out at a measurable loss, and that means teaching the JIT to emit native calls, but we only JIT on x64 so far. “You’re in a maze of twisty VM design trade-offs, and their funny smells are all alike.”


rakudo.org: Announce: Rakudo Star Release 2016.11

Published by Steve Mynott on 2016-11-27T18:51:38

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to
announce the November 2016 release of “Rakudo Star”, a useful and usable
production distribution of Perl 6. The tarball for the November 2016 release
is available from http://rakudo.org/downloads/star/.

This is the fifth post-Christmas (production) release of Rakudo Star and
implements Perl v6.c. It comes with support for the MoarVM backend (all
module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with
the JVM backend from the Rakudo compiler. Please use the MoarVM backend
only.

In the Perl 6 world, we make a distinction between the language (“Perl
6”) and specific implementations of the language such as “Rakudo Perl”.
This Star release includes release 2016.11 of the Rakudo Perl 6 compiler,
version 2016.11 of MoarVM, plus various modules, documentation, and
other resources collected from the Perl 6 community.

The changes in this release are outlined below:

New in 2016.11:

Fixes:
+ Various improvements to warning/error-reporting
+ Fixed assigning values to shaped arrays through iterators [839c762]
+ Fixed Str.Int not failing on numerics with combining characters [d540fc8]
+ [JVM] Fixed <a b c>.antipairs breakage [dd7b055]
+ defined routine now correctly authothreads with Junctions [189cb23]
+ Fixed poor randomness when .pick()ing on ranges with >32-bit numbers [34e515d]
+ Fixed infix:<x> silencing Failures [2dd0ddb]
+ Fixed edge case in is-approx that triggers DivByZero exception [f7770ed]
+ (Windows) Fixed returning of an error even when succeeding in mkdir [208a4c2]
+ (Windows) Fixed precomp unable to rename a newly compiled file [44a4c75]
+ (Test.pm) Fixed indent of multi-line diag() and test failure messages [43dbc96]
+ Fixed a callframe crash due to boxing of NULL filenames [200364a]
+ ∞ ≅ ∞ now gives True [4f3681b]
+ Fixed oversharing with grammars used by multiple threads [7a456ff]
+ Fixed incorrect calculations performed by acotan(num) [8e9fd0a]
+ Fixed incorrect calculations performed by asinh(num)/acosh(num) [a7e801f]
+ Fixed acosh return values for large negative numbers [5fe8cf7]
+ asinh(-∞) now returns -∞ instead of NaN [74d0e36]
+ atanh(1) now returns ∞ instead of throwing [906719c][66726e8]
+ Fixed missing close in IO::Path.slurp(:bin) [697a0ae]
+ :U QuantHashes now auto-vivify to their correct type and not Hash [79bb867]
+ Mix/MixHash.Bag/BagHash coersion now ignores negative weights [87bba04]
+ arity-0 infix:<Z> now returns a Seq instead of a List [3fdae43]
+ Fix augment of a nested package [87880ca]
+ Smartmatch with Regex variable now returns a Match instead of Bool [5ac593e]
+ Empty ()[0] now returns Nil instead of False [f50e39b]
+ Failed IO::Socket::Async connection no longer produces unexpected crash [f50e39b]
+ Quitting Supplies with no QUIT phasers no longer unexpectedly crash [f50e39b]
+ Fixed NativeCall issues on big endian machines [627a77e]
+ Fixed broken handling of $/ in some uses of `.match` [ba152bd]
+ Fixed Lock.protect not releasing the lock on control exceptions [48c2af6]
+ MoarVM now builds on any version of macOS [b4dfed2]
+ Fixed concurrency crashes due to garbage collection [6dc5074]
+ Fixed race condition in EmptyIterator [ed2631c]
+ Fixed hang with multi-threaded long-running NativeCall calls [f99d958]
+ Made my @a[10] = ^Inf work [aedb8e7]
+ Fixed [;]:delete [3b9c4c9]
+ Fixed incorrect handling of negative weights in ∪ operator [e10f767]
+ duckmap now preserves types of Iterables [43cb55f]
+ Fixed premature overflow to Inf with large Num literals [729d7e3]
+ Fixed race condition in NativeCall callsite used by multiple threads [49fd825]
+ Fixed instabilities in programs launching many threads at startup [0134132]
+ Fixed crash when reporting X::Composition::NotComposable or
X::Inheritance::Unsupported exceptions [a822bcf]
+ Fixed clock_gettime issue on macOS [ee8ae92]
+ Fixed SEGV in multi-threaded programs with strings-as-strands [395f369]
+ `regex` TOP Grammar rule will now backtrack if needed [4ccb2f3]
+ Fixed .rotate/.reverse on 1-dimmed arrays assigning to self [2d56751]
+ Fixed exponentiation involving zeros for Complex numbers [7f32243]
+ Fixed Label.gist [29db214][53d7b72]
+ Fixed certain edge cases of incorrect stringification of Rationals
with .Str, .perl, and .base [b5aa3c5]

Additions:
+ Added TWEAK submethod for object construction [fdc90a2][9409d68]
+ Added DateTime.hh-mm-ss [bf51eca]
+ Added parse-base routine [7e21a24]
+ is-approx with no explicit tolerances now uses more complex algorithm to
choose a tolerance to use (same as old is_approx) [82432a4]
+ on failure, is-approx now displays actual values received [b4fe680]
+ Added Iterator.skip-one to skip a single value [71a01e9]
+ Added Iterator.skip-at-least to skip several values [8d357af]
+ Re-imagined Str.match [b7201a8]:
+ the family of :nth is now lazy will return whatever can find
+ non-monotonically increasing :nth iterator values will now die
+ Str.match/subst/subst-mutate now have :as adverb [1b95636][c9a24d9][aaec517]
+ &infix: now works with Setty objects [d92e1ad]
+ :ov and :ex adverbs are now illegal in Str.subst [b90c741]
+ Made nextwith/samewith/nextsame/callwith to be real subroutines [70a367d]
+ Added CX::Emit and CX::Done control exceptions [07eeea8]
+ Setty.Mix/.MixHash coercers now use Int weights instead of Bool [7ba7eb4]
+ Implemented :kv,:p,:k,:v on 1-element multidim [;] [764cfcd]
+ .chrs can now also accepts codepoint numbers as Str [4ae3f23]
+ Added support for compilation of Rakudo on Solaris [a43b0c1]
+ IterationEnd.perl/gist/Str now returns text “IterationEnd” [59bb1b1]
+ Added X::Syntax::Number::InvalidCharacter exception [2faa55b]
+ .reverse/rotate on 1-dimmed arrays are now nodal [cd765e6]
+ .line and .file on core Code now references original source files [b068e3a]
+ .rethrow now works on unthrown exceptions [58a4826]
+ All Reals now accept `Whatever` as the second argument to .base() [c1d2599]
+ sub MAIN usage message shows possible Enum values if param is
named and is an Enum [a3be654]

Efficiency:
+ Made slip(@a) about 1.2x faster [37d0e46]
+ Made initialization of 2+dimmed array 10x to 16x faster [dfb58d4]
+ Str.match is now about 5% faster [4fc17df]
+ Str.match with :nth feature is now about 2x faster [41e2572]
+ my @a = Str.match(…) is now about 5% faster [e472420]
+ Str.comb(Regex) is now about 7x faster [1794328]
+ Simple Str.subst/subst-mutate is now about 30% faster [364e67b]
+ Match.Str|prematch|postmatch is now about 2x faster [e65d931]
+ Match.Bool is now about 1% faster (not much, but used a lot) [1fce095]
+ Made ~~ /foo/ faster: 2% for successful/6% for failed matches [05b65d0]
+ Made <foo bar baz> ~~ /foo/ about 2x faster [e4dc8b6]
+ Made %h ~~ /foo/ about 2x faster [33eeb32]
+ Frequent accesses of multiple adverbs (e.g. %h<a>:p:exists)
is now 2x faster [f22f894]
+ Made infix: faster: Str: 14x, type: 10x, Range: 3.5x,
Int|Seq|Hash: 1.5x, Array: 1.2x [bc7fcc6]
+ IO::Spec::Unix.canonpath is now 7x to 50x faster [f3f00fb]
+ Baggy.roll/pick is now about 10% faster [fc47bbf]
+ Made copying shaped arrays 10x to 20x faster [a1d8e93][0cf7b36][d27ecfa]
+ Made setting up a shaped array 2x as fast [f06e4c3]
+ Made creation of typed shaped arrays 15% faster [f5bf6c1]
+ Made 2d/3d array accesses about 7x as fast [d3a0907]
+ Made AT-POS on 1,2,3dim arrays about 20x faster [feb7bcb]
+ Made creating a shaped array about 50% faster [1293188][576f3a1]
+ Made .AT-POS on 3+ dimmed arrays 20% faster [1bb5aad]
+ Made over-indexed .AT-POS on 1,2,3 dimmed arrays about 10% faster [1bb5aad]
+ Made multi-dimmed ASSIGN-POS about 30% faster [5b2bdeb]
+ Made .ASSIGN-POS for 1,2,3dimmed arrays about 40x faster [050cf72]
+ Made n-dimmed .EXISTS-POS about 1.5x faster [006f008]
+ Made .EXISTS-POS for 1,2,3dimmed arrays about 10x faster [b1c41b7]
+ Made n-dimmed DELETE-POS about 1.3x faster [6ccecb1]
+ Made .DELETE-POS for 1,2,3dimmed arrays about 20x faster [55b9e90]
+ Made n-dimmed BIND-POS about 1.3x faster [2827edb]
+ Made .BIND-POS for 1,2,3dimmed arrays about 20x faster [9f94525]
+ Made @a[10].STORE at least as fast as @a.STORE [8064ff1]
+ Made .kv on shaped Arrays about 4x faster [e42b68e]
+ Made .pairs/.antipairs on shaped Arrays about 2.8x faster [0f2566a][f608a33]
+ Made List.kv about 30% faster [7a2baf4]
+ for loops on 1-dimmed arrays are now 3x faster [ed36e60]
+ .kv on 1-dimmed arrays is now 7x faster [608de26]
+ .pairs/.antipairs on 1-dimmed arrays is now 3x faster [b7d9537][1c425f9]
+ postcircumfix[;] on 2/3 dimmed arrays is now 9x faster [0b97362]
+ Assignment to [;] for 2/3 dimmed arrays is now 10x faster [ce85ba3]
+ [;]:exists for 2/3 dimmed arrays is now 7x faster [e3e3fef]
+ [;]:delete for 2/3 dimmed arrays is now 10x faster [3b9c4c9]
+ [;] := foo for 2/3 dimmed arrays is now 10x faster [eaf4132]
+ .iterator and .values on shaped arrays are now about 4x faster [736ab11]
+ Fixed optimization of shaped arrays that gives 10% gain on simple `for`
loops and likely will give larger gains on bigger programs [b7e632e]
+ Made starting MappyIterator faster, affecting all Hash/Map/Baggy iterator
methods. 2-elem Hash iteration is 1.6x faster [97fb6c2]
+ Made starting a grepper faster: .grep on with no adverbs on 2-element list
is now 20% faster [077c8f0]
+ Made Date/DateTime creation 20% faster [0e7f480]
+ Hashes now use 4 (32-bit) or 8 (64-bit) bytes less memory per stored item [395f369]
+ Rational.Str is now about 40% faster [b5aa3c5]
+ Rational.base is now about 30% faster [b5aa3c5]
+ Various streamlining of code that’s hard to measure the final impact of

Notable changes in modules shipped with Rakudo Star:

+ DBIish: Prepare tests for lexical module loading and do not rely on your dependencies to load modules
+ Pod-To-HTML: Bump version and use a separate element as anchor for “to top” links
+ Terminal-ANSIColor: Update Pod documentation and support 256-color and 24-bit RGB modes
+ doc: Large number of changes – too many to detail here
+ json_fast: Speed gains
+ panda: Explicitly use Panda::Project as we use it in class Panda
+ perl6-lwp-simple: Change to http.perl6.org since the main site is going https and add NO_NETWORK_TESTING

There are some key features of Perl 6 that Rakudo Star does not yet
handle appropriately, although they will appear in upcoming releases.
Some of the not-quite-there features include:

* advanced macros
* non-blocking I/O (in progress)
* some bits of Synopsis 9 and 11

There is an online resource at http://perl6.org/compilers/features
that lists the known implemented and missing features of Rakudo’s
backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the
programmer that a given feature isn’t implemented, but there are many
that we’ve missed. Bug reports about missing and broken features are
welcomed at rakudobug@perl.org.

See http://perl6.org/ for links to much more information about
Perl 6, including documentation, example code, tutorials, presentations,
reference materials, design documents, and other supporting resources.
Some Perl 6 tutorials are available under the “docs” directory in
the release tarball.

The development team thanks all of the contributors and sponsors for
making Rakudo Star possible. If you would like to contribute, see
http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org
mailing list, or join us on IRC #perl6 on freenode.

6guts: Perl 6 is biased towards mutators being really simple. That’s a good thing.

Published by jnthnwrthngtn on 2016-11-25T01:09:33

I’ve been meaning to write this post for a couple of years, but somehow never quite got around to it. Today, the topic of mutator methods came up again on the #perl6 IRC channel, and – at long last – conincided with me having the spare time to write this post. Finally!

At the heart of the matter is a seemingly simple question: why does Perl 6 not have something like the C# property syntax for writing complex setters? First of all, here are some answers that are either wrong or sub-optimal:

Back to OO basics

The reason the question doesn’t have a one-sentence answer is because it hinges on the nature of object orientation itself. Operationally, objects consist of:

If your eyes glazed over on the second bullet point, then I’m glad you’re reading. If I wasn’t trying to make a point, I’d have simply written “a mechanism for calling a method on an object”. So what is my point? Here’s a quote from Alan Kay, who coined the term “object oriented”:

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”…”

For years, I designed OO systems primarily thinking about what objects I’d have. In class-based languages, this really meant what classes I’d have. How did I figure that out? Well, by thinking about what fields go in which objects. Last of all, I’d write the methods.

Funnily enough, this looks very much like procedural design. How do I build a C program? By modeling the state into various structs, and then writing functions work with with those structs. Seen this way, OO looks a lot like procedural. Furthermore, since OO is often taught as “the next step up” after procedural styles of programming, this way of thinking about objects is extremely widespread.

It’s little surprise, then, that a lot of OO code in the wild might as well have been procedural code in the first place. Many so-called OO codebases are full of DTOs (“Data Transfer Objects”), which are just bundles of state. These are passed to classes with names like DogManager. And a manager is? Something that meddles with stuff – in this case, probably the Dog DTO.

Messaging thinking

This is a far cry from how OO was originally conceived: autonomous objects, with their own inner state, reacting to messages received from the outside world, and sending messages to other objects. This thinking can be found today. Of note, it’s alive and well in the actor model. These days, when people ask me how to get better at OO, one of my suggestions is that they take a look at actors.

Since I grasped that the messages are the important thing in OO, however, the way I design objects has changed dramatically. The first question I ask is: what are the behaviors? This in turn tells me what messages will be sent. I then consider the invariants – that is, rules that the behaviors must adhere to. Finally, by grouping invariants by the state they care about, I can identify the objects that will be involved, and thus classes. In this approach, the methods come first, and the state comes last, usually discovered as I TDD my way through implementing the methods.

Accessors should carry a health warning

An accessor method is a means to access, or mutate, the state held within a particular attribute of an object. This is something I believe we should do far more hesitantly than is common. Objects are intended to hide state behind a set of interesting operations. The moment the underlying state model is revealed to the outside world, our ability to refactor is diminished. The world outside of our object couples to that view of it, and it becomes far too tempting to put operations that belong inside of the object on the outside. Note that a get-accessor is a unidirectional coupling, while a mutate-accessor implies a bidirectional (and so tighter) coupling.

But it’s not just refactoring that suffers. Mutable state is one of the things that makes programs difficult to understand and reason about. Functional programming suggests abstinence. OO suggests you just stick to a pint or two, so your side-effects will be at least somewhat less obnoxious. It does this by having objects present a nice message-y view to the outside world, and keeping mutation of state locked up inside of objects. Ideas such as value objects and immutable objects take things a step further. These have objects build new objects that incorporate changes, as opposed to mutating objects in place. Perl 6 encourages these in various ways (notice how clone lets you tweak data in the resulting object, for example).

Furthermore, Perl 6 supports concurrent and parallel programming. Value objects and immutable objects are a great fit for that. But what about objects that have to mutate their state? This is where state leakage will really, really, end up hurting. Using OO::Monitors or OO::Actors, turning an existing class into a monitor (method calls are synchronous but enforce mutual exclusion) or an actor (method calls are asynchronous and performed one at a time on a given object) is – in theory – easy. It’s only that easy, however, if the object does not leak its state, and if all complex operations on the object are expressed as a single method. Contrast:

unless $seat.passenger {
    $seat.passenger = $passenger;
}

With:

$seat.assign-to($passenger);

Where the method does:

method assign-to($passenger) {
    die "Seat already taken!" if $!passenger;
    $!passenger = $passenger;
}

Making the class of which $seat is an instance into a monitor won’t do a jot of good in the accessor/mutator case; there’s still a gaping data race. With the second approach, we’d be safe.

So if mutate accessors are so bad, why does Perl 6 have them at all?

To me, the best use of is rw on attribute accessors is for procedural programming. They make it easy to create mutable record types. I’d also like to be absolutely clear that there’s no shame in procedural programming. Good OO design is hard. There’s a reason Perl 6 has sub and method, rather than calling everything a method and then coining the term static method, because subroutine sounds procedural and “that was the past”. It’s OK to write procedural code. I’d choose to deal with well organized procedural code over sort-of-but-not-really-OO code any day. OO badly used tends to put the moving parts further from each other, rather than encapsulating them.

Put another way, class is there to serve more than one purpose. As in many languages, it doubles up as the thing used for doing real OO programming, and a way to define a record type.

So what to do instead of a fancy mutator?

Write methods for semantically interesting operations that just happen to set an attribute among their various other side-effects. Give the methods appropriate and informative names so the consumer of the class knows what they will do. And please do not try to hide complex operations, potentially with side-effects like I/O, behind something that looks like an assignment. This:

$analyzer.file = 'foo.csv';

Will lead most readers of the code to think they’re simply setting a property. The = is the assignment operator. In Perl 6, we make + always mean numeric addition, and pick ~ to always mean string concatenation. It’s a language design principle that operators should have predictable semantics, because in a dynamic language you don’t statically know the types of the operands. This kind of predictability is valuable. In a sense, languages that make it easy to provide custom mutator behavior are essentially making it easy to overload the assignment operator with additional behaviors. (And no, I’m not saying that’s always wrong, simply that it’s inconsistent with how we view operators in Perl 6.)

By the way, this is also the reason Perl 6 allows definition of custom operators. It’s not because we thought building a mutable parser would be fun (I mean, it was, but in a pretty masochistic way). It’s to discourage operators from being overloaded with unrelated and surprising meanings.

And when to use Proxy?

When you really do just want more control over something that behaves like an assignment. A language binding for a C library that has a bunch of get/set functions to work with various members of a struct would be a good example.

In summary…

Language design is difficult, and involves making all manner of choices where there is no universally right or wrong answer, but just trade-offs. The aim is to make choices that form a consistent whole – which is far, far, easier said than done becuase there’s usually a dozen different ways to be consistent too. The choice to dehuffmanize (that is, make longer) the writing of complex mutators is because it:


Steve Mynott: Rakudo Star 2016.11 Release Candidate

Published by Steve Mynott on 2016-11-20T14:01:22

There is a Release Candidate for Rakudo Star 2016.11 (currently RC2) available at

http://pl6anet.org/drop/

This includes binary installers for Windows and Mac.

Usually Star is released about every three months but last month's release didn't include a Windows installer so there is another release.

I'm hoping to release the final version next weekend and would be grateful if people could try this out on as many systems as possible.

Any feedback email steve *dot* mynott *at* gmail *dot* com

Full draft announce at

https://github.com/rakudo/star/blob/master/docs/announce/2016.11.md

brrt to the future: A guide through register allocation: Introduction

Published by Bart Wiegmans on 2016-11-06T10:29:00

This is the first post in what I intend to be a series on the register allocator for the MoarVM JIT compiler. It may be a bit less polished than usual, because I also intend to write more of these posts than I have in the past few months.

The main reason to write a register allocator is that it is needed by the compiler. The original 'lego' MoarVM JIT didn't need one, because it used what is called a 'memory-to-memory' model, meaning that every operation is expected to move operands from and to memory. In this it follows closely the behavior of virtually every other interpreter existing and especially that of MoarVM. However, many of these memory operations are logically redundant (for example, when storing and immediately loading an intermediate value, or loading the same value twice). Such redundancies are inherent to a memory-to-memory code model. In theory some of that can be optimized away, but in practice that involves building an unreasonably complicated state machine.

The new 'expression' JIT compiler was designed with the explicit (well, explicit to me, at least) goals of enabling optimization and specialization of machine code. That meant that a register-to-register code model was preferable, as it makes all memory operations explicit, which in turn enables optimization to remove some of them. (Most redundant 'load' operations can already be eliminated, and I'm plotting a way to remove most redundant 'store' operations, too). However, that also means the compiler must ensure that values can fit into the limited register set of the CPU, and that they aren't accidentally overwritten (for example as a result of a subroutine call). The job of the register allocator is to translate virtual registers to physical registers in a given code segment. This may involve modifying the original code by inserting load, store and copy operations.

Register allocation is known as a hard problem in computer science, and I think there are two reasons for that. The first reason is that finding the optimal allocation for a code segment is (probably) NP-complete. (NP-complete basically means that you have to consider all possible solutions in order to find the one you are after. A common feature of NP-complete problems is that the effect of a local choice on the global solution cannot be fully predicted). However, for what I think are excellent reasons, I can sidestep most of that complexity using the 'linear scan' register allocation algorithm. The details of that algorithm are subject of a later post.

The other reason that register allocation is hard is that the output code must meet the demanding specifications of the target CPU. For instance, some instructions take input only from specific registers, and some implicitly overwrite other registers. Calling conventions can also present a significant source of complexity as values must be placed in the right registers (or on the right stack locations) where the called function may expect them. So the register allocator must somehow encode these specific demands and ensure they are not violated.

Now that I've introduced register allocation, why it is needed, and what the challenges are, the next posts can begin to describe the solutions that I'm implementing.

samcv: Multithreading in Perl 6 Part 2

Published on 2016-11-02T07:00:00

This post is not yet complete.

This is Part 2 in my Multithreading in Perl 6 series. This assumes you have read Part 1.

Concurrency and Thinking with Threads

While Perl 6 makes it very easy to work do asynchronous work with multiple threads, there are things you need to keep in mind. Let's see a simple example of where threading can go terribly wrong!

my $i = 0;
await do for ^10000 {
    start {
        $i++;
    }
}
say $i;
# 9971
# Result is not consistent on different runs of the program

Hmm.. we told it to iterate 10,000 times, but we did not get a total of 10,000 by the end. What went wrong? When you iterate a variable with $i++ Perl 6 gets the value of $i, adds one to it, and sets $i to that value. If two threads read the variable at the same time and read the value as 150 they then each add one to 150 and then set $i to 151.

You should never be writing to global variables when using threading. Just don't do it!

Let's see how we can safely do this. One very easy way is to use the return values from the threads. The object returned by the last statement is what is returned.

my @result = await do for ^10000 {
    start {
        1;
    }
}
say @result.sum;
# 10000

You could also use Channels or Supplies to do this if you needed to share information between threads.

Locks

It is best practice to not manually use Lock. Despite this, knowing how Lock works will help you better understand other Perl 6 concepts which use Locks under the hood. We create a lock with Lock.new. We can then call the .protect method on that lock to protect a section of code inside that lock. Any code protected by the same Lock object will never be running in two threads at once.

my $lock = Lock.new;
my $j = 0;
await do for ^10000 {
    start {
        $lock.protect( {
            $j++;
        } );
    }
}
say $j;
# 10000

The main issue with Lock is that any code protected will block execution until the Lock is unlocked when another protect section is done.

Supplies

In Part 1 we used the react block to react to things specified by whenever. Under the hood react works using Supplies.

There are two kinds of Supplies, live and on-demand Supplies. Live Supplies will only receive messages from the time in which they tap into the Supply. A Supplier will emit messages and anything subscribed to it will receive them. We use Supplier.new to create a live Supplier, and .Supply to get a Supply we can tap into.

# Supplier.new creates a `live` Supplier
my $supplier = Supplier.new;
my $supply   = $supplier.Supply;
say "Is Supply live? {$supply.live}";
$supply.tap( -> $v { say $v });
for 1 .. 10 {
    $supplier.emit($_);
}

.tap is not guarenteed to be executed in only one thread at a time. So make sure to not write to any shared variables. Remember react from before? React uses .act, which is similar to .tap but is guarenteed to only execute in one thread at a time.

samcv: Multithreading in Perl 6 Part 1

Published on 2016-10-31T07:00:00

Promises

The simplest way to make a Promise is using the start {} block.

start { sleep 1; say 'Two'; sleep 2; say 'Four' }
say 'One';
sleep 2;
say 'Three';

Magically, this will print out One Two Three so we can tell the code in the start block is executed concurrently with the code below it. We don't see Four printed out though. Why? We reached the end of the program with the line say 'Three' and the program ends. In reality start { ... } is the same thing as doing Promise.start( { ... } ) so it returns a Promise object. Promise's made by start will be fullfilled if the code inside the start block terminates without an exception. We don't get Four printed out because our code doesn't wait for the Promise to be fullfilled. How now to wait for the start block to complete? Enter await. We call await with the Promise returned by the start block, and use it like so:

my $promise = start { sleep 1; say 'Two'; sleep 2; say 'Four' }
say 'One';
sleep 2;
say 'Three';
await $promise;

This gives us the output we want: One Two Three Four

Here we find all markdown extension files and pass them through to the Unix/Linux file command and check if they are all either UTF-8 or ASCII and also make sure none have the BOM (Byte Order Marker):

my @files = qx{find . -name '*.markdown'}.lines;
for @files -> $file {
    my $file_type = qqx{file $file};
    say "$file is not UTF-8" if $file_type !~~ / 'UTF-8' | ASCII /;
    say "$file has BOM" if $file_type ~~ / BOM /;
}

Making this program asynchronous and multithreaded is only three words away! Perl 6 will spawn as many processes as it thinks will complete the job as quickly as possible. In my case it used 10 threads.

my @files = qx{find . -name '*.markdown'}.lines;
await do for @files -> $file {
    start {
        my $file_type = qqx{file $file};
        say "$file is not UTF-8" if $file_type !~~ / 'UTF-8' | ASCII /;
        say "$file has BOM" if $file_type ~~ / BOM /;
    }
}

You have seen how Perl 6 makes it easy to make multithreaded programs. Now lets look at using channels to communicate between parts of the program.

Channels

Channels are asynchronous queues. We create a Channel with Channel.new. Using the .closed method on a channel will return a Promise that will be kept when the channel is both closed and the queue is empty. We send data on a Channel by calling the .send method on it. We can close a Channel with the .close method. Know that $chan.closed will only evaluate as true if we have removed all the items from the queue and we have closed it with .closed.

my $chan = Channel.new;
$is_closed = $chan.closed;
$is_closed.then( { note "It's closed!" } );
for qx{ find . -name '*.markdown' }.lines -> $line {
    $chan.send($line)
}

We can call the .then method on a promise to schedule the code inside the block to be executed when the Promise is kept. In this case when the Channel is closed (.then actually returns another promise, but you don't have to use it if you don't want to).

We can poll a Channel with the .poll method which returns the next item in the channel in a nonblocking manner, but this is not very efficient:

my @promises;
while ! $is_closed  {
    if $chan.poll -> $file {
        push @promises, start {
            my $file_type = qqx{file "$file"};
            note "'$file' is not UTF-8" if $file_type !~~ / 'UTF-8' | ASCII /;
            note "'$file' has BOM" if $file_type ~~ / BOM /;
        }
    }
}
await Promise.allof(@promises);

Above we create an array called @promises and add to it each Promise returned by start. Then we await for all of the Promises of the worker threads to be kept. The .allof method returns a Promise that's only kept when all of the Promises in @promises have been kept.

React, Don't Poll

Using a react block we can create a tap on a Channel or Supply using the whenever keyword. (More on taps and Supplies later).

my @promises;
react {
    whenever $chan -> $file {
        push @promises, start {
            my $file_type = qqx{file "$file"};
            note "'$file' is not UTF-8" if $file_type !~~ / 'UTF-8' | ASCII /;
            note "'$file' has BOM" if $file_type ~~ / BOM /;
            say "'$file'";
        }
    }
}
await Promise.allof(@promises);

This has the advantage of not looping unnecessarily. If we wanted to react while code below was executing we could wrap the react in a start block (though in this case we don't need to do this).

Make sure to check out Part 2 where we will discuss Supplies and data concurrency!

samcv: Creating your Own Operators in Perl 6

Published on 2016-10-29T07:00:00

Perl 6 makes it easy to define your own operators. Here you will see how you can make your own operators in Perl 6! To get started lets first discuss the different types of operators in Perl 6: infix, prefix, postfix, circumfix and postcircumfix.

What the hell is a circumfix?

Probably the one you are most familiar with is infix. This is the type of operator we know well from its use in the standard notation for addition, subtraction and multiplication:

5 * 8 is infix. 7 + 4 is infix. 9 ÷ 3 is infix.

In fact, in Perl 6, we can actually do 9 ÷ 3 and it will do the division!

What about prefix? You are probably most familiar with using a - sign to note a negative value

my $var = 7;
say -$var; #> -7

In Perl 6 this is using the - prefix operator and will make the number after it negative. This is also prefix if you are familiar with programming languages which use ++ or -- to increase or decrease the value of the variable by one:

++$variable; Increments the variable by one and returns the new value.

This can also be used as postfix but slightly different functionality:

$variable++ This increments the variable by one and returns the original value.

Now onto circumfix and postcircumfix. Circumfix goes at the start and the end. For example we can use the list constructor [..] to create an list/array like so:

say [1..10]; #> [1 2 3 4 5 6 7 8 9 10]

Postcircumfix is similar, but it goes immediately after something. We use a postcircumfix operator to access elements of a list. Notice how in the example below it comes after the variable with no spaces. That is how postcircumfix works.

my @array = ('first', 'second', 'third' );
say @array[0]; #> 'first'

Here is a variation on the = assignment operator, but only assigns if the right side is defined!

sub infix:< ?= > ($left is rw, $right) { $left = $right if defined $right }
my $var;
my $new ?= $var; # No assignment happens because no value is set for $var yet.
$var = 24;
$new ?= $var; # $new now gets set to 24

The x infix operator usually duplicates a string the specified number of times.

say 13 x 3; #> 131313 # String is duplicated 3 times

If we want, we can redefine x to do multiplication instead!

sub infix:< x >($left, $right) { $left * $right }
say 13 x 3; #> 39 # Now they are multiplied!

Feel free to use full unicode characters. We can make our own three way comparer:

sub infix:<¯\_(ツ)_/¯>($left, $right) { $left <=> $right }
10 ¯\_(ツ)_5; #> More
5 ¯\_(ツ)_5; #> Same

Let's see some circumfix! Here we use ⟅ ⟆ to reverse each string in the list.

sub circumfix:<⟅ ⟆> ( *@array ) {
        for @array.keys -> $i {
            @array[$i] = @array[$i].flip;
    }
    @array;
}
say'cat', 'dog', 'bird'⟆; #> [tac god drib]

samcv: Syntax Highlighting in Jekyll

Published on 2016-10-27T03:45:01

The default syntax highlighter in Jekyll is Rouge. You can also use the Pygments highlighter which supports ~100 languages way more than 2x what Rouge supports (So many!). Especially great for me is Pygments supports Perl 6 highlighting ☺.

For this tutorial I am using Jekyll 3.

First install pygments.rb and redcarpet with:

gem install pygments.rb redcarpet

If you are using a Gemfile you should add the following to your Gemfile:

gem "pygments.rb"
gem "redcarpet"

Then run bundle install. Next open up your _config.yml. Make sure you have the markdown engine set to redcarpet and the highlighter set to pygments:

markdown: redcarpet
highlighter: pygments

If you are using Jekyll 2 then instead of highlighter: pygments you should put pygments: true instead of highlighter: pygments.

What we have done so far will work fine, and we can highlight syntax in a block by doing:

{% highlight perl6 %}
say "This is going to be highlighted!";
{% endhighlight %}

Showing up on the page like this:

say "This is going to be highlighted!";

Now we will make a change so that line numbers won't look terrible. Get in the root of your jekyll site's folder. If you don't already have it create an assets folder. We will need to override our themes head.html to add in a css file. So copy your theme's head.html to the _includes folder. Add this line to your head.html between the head tags:

<head>
<link rel="stylesheet" href="/assets/syntax_line.css">
</head>

Now go into the assets folder and create a file called syntax_line.css.

/* Add padding to the left of the code block */
.code { padding-left: 10px; }
/* linenodiv pre styles line numbers */
.linenodiv pre {  margin: 0px 0px 22px 0px; color: #ccc; border-radius: 0; border: none; border-right: solid 1px; word-wrap: normal; }

If you want to use some of the different themes that pygment comes with, you can generate a css file this way:

pygmentize -f html -S friendly > syntax.css

You will have to make sure that pygment and pygmentize are installed with either your package manager or using pip.

Steve Mynott: Rakudo Star 2016.10 Release Candidate

Published by Steve on 2016-10-16T13:10:00

There is a Release Candidate for Rakudo Star 2016.10 (currently RC0) available at

http://pl6anet.org/drop/

This should be quite a bit faster than previous releases and work better on OpenBSD/FreeBSD than the previous release.

It also features "prove6" which is now used by Panda -- removing a run-time dependency on Perl 5. Although it still needs Perl 5 to build.

I'm hoping to release the final version next weekend (Oct 21st) and would be grateful if people could try this out on as many systems as possible (eg. exotic systems like Solaris-like ones and Windows!) 

Full draft announce at

https://github.com/rakudo/star/blob/master/docs/announce/2016.10.md

Note compiling under Windows is possible using the gcc which comes with Strawberry Perl and gmake running under cmd.exe.  Further instructions will be added (thanks to Christopher for feedback).

Any feedback email steve *underscore* mynott *at* gmail *dot* com

Zoffix Znet: "Perl 6: What Programming In The Future Is Like?" (Lightning Talk Slides and Video)

Published on 2016-09-29T00:00:00

Brief overview of Perl 6's multi-core power

Strangely Consistent: The curious case of the disappearing test

Published by Carl Mäsak

I've recently learned a few valuable things about testing. I outline this in my Bondcon talk — Bondcon is a fictional anti-conference running alongside YAPC::Europe 2016 in a non-corporeal location but unfortunately frozen in time due to a procrastination-related mishap, awaiting the only speaker's tuits — but I thought I might blog about it, too.

Those of us who use and rely on TDD know to test the software itself: the model, the behaviors, etc. But as a side effect of attaching TravisCI to the 007, another aspect of testing came to light: testing your repository itself. Testing code-as-artifact, not code-as-effect.

Thanks to TravisCI, we currently test a lot of linter-like things that we care about, such as four spaces for indentation, no trailing whitespace, and that META.info parses as correct JSON. That in itself is not news — it's just using the test suite as a linter.

But there are also various bits of consistency that we test, and this has turned out to be very useful. I definitely want to do more of this in the future in my projects. We often talk about coupling as something bad: if you change component A and some unrelated component B breaks, then they are coupled and things are bad.

But some types of coupling are necessary. For example, part of the point of the META.info is to declare what modules the project provides. Do you know how easy it is to forget to update META.info when you add a new module? (Hint: very.) Well, we have a test which double-checks.

We also have a consistency test that makes sure a method which does a certain resource-allocating thing also remembers to do the corresponding resource-deallocating thing. (Yes, there are still a few of those, even in memory-managed languages.) This was after a bug happened where allocations/deallocations were mismatched. The test immediately discovered another location in the code that needed fixing.

All of the consistency tests are basically programmatic ways for the test suite to send you a message from a future, smarter you that remembered to do some B action immediately after doing some A action. No wonder I love them. You could call it "managed coupling", perhaps: yes, there's non-ideal coupling, but the consistency test makes it manageable.

But the best consistency check is the reason I write this post. Here's the background. 007 has a bunch of builtins: subs, operators, but also all the types. These types need to be installed into the setting by the initialization code, so that when someone actually looks up Sub from a 007 program, it actually refers to the built-in Sub type.

Every now and again, we'd notice that we'd gotten a few new types in the language, but they hadn't been added as built-ins. So we added them manually and sighed a little.

Eventually this consistency-test craze caught up, and we got a test for this. The test is text-based, which is very ironic considering the project itself; but hold that thought.

Following up on a comment by vendethiel, I realized we could do better than text-based comparison. On the Perl 6 types side, we could simply walk the appropriate module namespaces to find all the types.

There's a general rule at play here. The consistency tests are very valuable, and testing code-as-artifact is much better than nothing. But if there's a corresponding way to do it by running the program instead of just reading it, then that way invariably wins.

Anyway, the test started doing Stash traversal, and after a few more tweaks looked really nice.

And then the world paused a bit, like a good comedian, for maximal effect.

Yes, the test now contained an excellent implementation of finding all the types in Perl 6 land. This is exactly what the builtin initialization code needed to never be inconsistent in the first place. The tree walker moved into the builtins code itself. The test file vanished in the night, its job done forever.

And that is the story of the consistency check that got so good at its job that it disappeared. Because one thing that's better than managed coupling is... no coupling.

Pawel bbkr Pabian: Oh column, where art thou?

Published by Pawel bbkr Pabian on 2016-09-26T09:53:06

When ramiroencinas added FileSystem::Capacity::VolumesInfo to Perl 6 ecosystem I've spotted that it has no macOS support. And while trying to contribute to this module I've discovered how less known Perl 6 features can save the day. What FileSystem::Capacity::VolumesInfo module does is parsing output from df command, which looks like this:

$ df -k -P
Filesystem                                  1024-blocks       Used Available Capacity  Mounted on
/dev/disk3                                   1219749248  341555644 877937604    29%    /
devfs                                               343        343         0   100%    /dev
/dev/disk1s4                                  133638140  101950628  31687512    77%    /Volumes/Untitled
map -hosts                                            0          0         0   100%    /net
map auto_home                                         0          0         0   100%    /home
map -fstab                                            0          0         0   100%    /Network/Servers
//Pawel%20Pabian@biala-skrzynka.local./Data  1951417392 1837064992 114352400    95%    /Volumes/Data
/dev/disk5s2                                 1951081480 1836761848 114319632    95%    /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123           1219749248  341555644 877937604    29%    /Volumes/osxfuse

(if you see wrapped or trimmed output check raw one here)

And while this looks nice for humans, it is tough task for parser.

So let's use Perl 6 features to deal with this mess.

Capture command line output.

my ($header, @volumes) = run('df', '-k', '-P', :out).out.lines;

Method run executes shell command and returns Proc object. Method out creates Pipe object to receive shell command output. Method lines splits this output into lines, first goes to $header variable, remaining to @volumes array.

Parse header.

my $parsed_header = $header ~~ /^
    ('Filesystem')
    \s+
    ('1024-blocks')
    \s+
    ('Used')
    \s+
    ('Available')
    \s+
    ('Capacity')
    \s+
    ('Mounted on')
$/;

We do it because match object keeps each capture, and each capture knows from and to position at which it matched, for example:

say $parsed_header[1].Str;
say $parsed_header[1].from;
say $parsed_header[1].to;

Will return:

1024-blocks
44
55

That will help us a lot with dynamic columns width!

Extract row values.

First we have to look at border between Filesystem and 1024-blocks column. Because Filesystem is aligned to left and 1024-blocks is aligned to right so data from both columns can occupy space between those headers, for example:

Filesystem                      1024-blocks
/dev/someverybigdisk        111111111111111
me@host:/some directory 123    222222222222
         |                      |
         |<----- possible ----->|
         |<--- border space --->|

We cannot simply split it by space. But we know where 1024-blocks columns ends, so the number that ends at the same position is our volume size. To extract it, we can use another useful Perl 6 feature - regexp position anchor.

for @volumes -> $volume {
    $volume ~~ / (\d+) <.at($parsed_header[1].to)> /;
    say 'Volume size is ' ~ $/[0] ~ 'KB';
}

That finds sequence of digits that are aligned to the position of the end of the header. Every other column can be extracted using this trick if we know the data alignment.

$volume ~~ /
    # first column is not used by module, skip it
    \s+

    # 1024-blocks, right aligned
    (\d+) <.at($parsed_header[1].to)>

    \s+

    # Used, right aligned
    (\d+) <.at($parsed_header[2].to)>

    \s+

    # Available, right aligned
    (\d+) <.at($parsed_header[3].to)>

    \s+

    # Capacity, middle aligned, never longer than header
    <.at($parsed_header[4].from)>
        \s* (\d+ '%') \s*
    <.at($parsed_header[4].to)> 

    \s+

    # Mounted on, left aligned
    <.at($parsed_header[5].from)>(.*)
$/;

Profit!

By using header name positions and position anchors in regular expression we got bomb-proof df parser for macOS that works with regular disks, pendrives, NFS / AFS / FUSE shares, weird directory names and different escaping schemas.

Zoffix Znet: Perl 6 Core Hacking: Can Has Moar Cover?

Published on 2016-09-22T00:00:00

Discussion about roast coverage of Rakudo compiler

Strangely Consistent: Where in the sky

Published by Carl Mäsak

Our 1.5yo has a favorite planet. By a long margin, it's Mars.

I've told him he's currently on Earth, and shown him where, and he's OK with that. Mars is still the favorite. Earth's moon is OK too.

Such an interest does not occur — pun not intended — in a vacuum. We have a book at home (much like this one, but a different one, and in Swedish), and we open it sometimes to admire Mars (and then everything else).

But what really left its mark is the Space Room in our local Tekniska Museet — a complete dark room with projectors filling a wall with pictures of space. Using an Xbox controller, you decide where to fly in the solar system. If anything, this is what made Mars real for our son. In that room, we've orbited Mars. We've stood on its surface, looking at the Mars moons in the Mars sky. We've admired a Mars sunrise, standing quiet together in the red sands.

Inevitably, I got the first hard science question I couldn't answer from him a couple of weeks ago.

I really should have seen it coming. I was dropping him off at kindergarten. Before we went inside, I crouched next to him to point up at the moon above the city. He looked at it and said "Moon" in Swedish. Then he turned to me, eyes intent, and said "Mars?". It was a question.

He had put together that the planets we were admiring in the book and in the Space Room were actually up there somewhere. And now he just wanted me to point him towards Mars.

I had absolutely no idea. I told him it's not on the sky right now, but for all I knew, it might have been.

Now I really want to know how to find this out. Sure, there are calculations involved. I want to learn enough about them to be able to write a small, understandable computer program for me. It should be able to answer a question such as "Where in the sky is Mars?". Being able to ask it for all the planets in our solar system seems like an easy generalization without much extra cost.

Looking at tutorials like this one with illustrations and this one with detailed calculations, I'm heartened to learn that it's not only possible to do this, but more or less the steps I hoped:

There are many complicating factors that together make a simple calculation merely approximate, and the underlying reasons are frankly fascinating, but it seems that if I just want to be able to point roughly in the right direction and have a hope of finding a planet there, a simple method will do.

I haven't written any code yet, so consider this a kind of statement of intent. I know there must be oodles of "night sky" apps and desktop programs that already present this information... but my goal is to make the calculation myself, with a program, and to get it right. Lovingly handcrafted planetary positions.

It would also be nice to be able to ask "Where in the sky is the moon?" (that one's easy to double-check) or "Where in the sky is the International Space station?". If anything, that ought to be a much simpler calculation, since these orbit Earth.

Once I can reliably calculate all the positions, being able to know at what time things rise and set would also be very useful.

I went outside to throw the garbage last night, and it turned out it was a cloudless late evening. I saw some of the brighter stars, even from the light-polluted vantage point of our yard. I may have been gazing on Mars without knowing it. It's a nice feeling to find out how to learn something. Even nicer when it's for your child, so you can show him his favorite planet in the sky.

Zoffix Znet: Perl6.Fail, Release Robots, and Other Goodies

Published on 2016-09-17T00:00:00

Description of work done to automate Perl 6 releases