pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2017-10-17T23:11:15


Weekly changes in and around Perl 6: 2017.42 Taking Ticketing Seriously

Published by liztormato on 2017-10-16T22:02:12

Aleks-Daniel Jakimenko-Aleksejev has been ticketing away! The result can be seen in an overview of status changes in RT tickets of the past week or so. He explains it like this:

The high number of [REGRESSION] tickets is caused by the concentrated effort to detect any unintentional changes in Rakudo Perl 6. I extracted all messages to the evalbot (e.g. “m:” on the #perl6 channel) from the last few years, and passed these code snippets to different versions of Rakudo Perl 6. These snippets were filtered by comparing the output and exit code, which resulted in a list of thousands of samples that produce different results on different versions. Most of these differences are due to bug fixes or other harmless reasons (e.g. current time being shown in the output), but there were some notable regressions. All regressions noticed during the manual inspection are now filed, and some of them are already resolved (with tests, of course). This is the second time we see dedicated effort to find otherwise hard-to-notice regressions, first time being in December 2016.

So it should be easier to keep the sanity of Rakudo Perl 6 in check!

One more TPCiA Rakudo Perl 6 Video

Samantha McVey‘s High End Unicode in Rakudo Perl 6 has arrived! Of course, you can still keep checking for more uploads in the coming weeks.

Blog Posts

Meanwhile on Twitter

  • Please welcome, CPAN6 by builtinperl.
  • Rakudo Perl 6 BuildPack by brian d foy.
  • 20% shaved off by Zoffix Znet.
  • Debugging an Amazon issue by Moritz Lenz.
  • Whole day Rakudo Perl 6 Tutorial at TPCiG by Jeff Goff.
  • !perl6 at DuckDuckGo by Zoffix Znet.
  • int32.new shouldn’t give you an Int by Zoffix Znet.
    Meanwhile on StackOverflow

    Meanwhile on perl6-users

    Meanwhile on PerlMonks

    Lately, there have been some Rakudo Perl 6 related Meditations on PerlMonks. Yours truly feels they should be mentioned in the Perl 6 Weekly as well, although some troll hugging may be needed:

    Ecosystem Additions

    This week will probably be the last time you will see the ecosystem additions mentioned in the Perl 6 Weekly. Thanks to Zoffix Znet and Aleks-Daniel Jakimenko-Aleksejev, we will soon have a page that will show the recent uploads to CPAN (the new preferred way of putting your module out into the wild). So here goes for the last time:

    Winding Down

    Being too tired to investigate the latest Core Developments of the past week, I will keep them from you until next week, which will also coincide with the 2017.10 release of Rakudo Perl 6. So please check in again next week!


    Weekly changes in and around Perl 6: 2017.41 The Case for Empathy

    Published by liztormato on 2017-10-09T20:46:02

    Yours truly is glad she can finally link to Ruth Holloway‘s impressive The Case for Empathy keynote at The Perl Conference in Amsterdam. Please excuse the poor sound quality for the first 7 minutes or so: kudos to Amanda who managed to recover and process the sound from that period and supply the subtitles.

    Other videos of TPCiA related to Rakudo Perl 6 are:

    It is expected that more videos will be uploaded in the coming week, so please check again from time to time!

    CPAN6 is here

    Zoffix Znet did a lot of work on getting Rakudo Perl 6 modules out of CPAN, just as he promised last week. He explains it all in his blog post CPAN6 is here (Reddit comments). So now we finally have a front-end to browse the available Rakudo Perl 6 distributions that have been uploaded to CPAN!

    The Second Bug Squashathon

    Last weekend saw the second Community Bug Squashathon, this time focused on fixing problems in the core of Rakudo Perl 6. And the results are in! Looking forward to the next Squashathon, on 4 November 2017!

    Inside the head of Larry Wall

    Joe Armstrong conducted a Q&A with Larry Wall at the Erlang & Elixir Factory SF Bay Area 2017 conference (Hacker News comments). Also recommended as a very nice intro to Rakudo Perl 6 and the thoughts that have gone into the 100 year language.

    Rakudo Perl 6 performance analysis tooling

    Timo Paulssen‘s grant proposal for Rakudo Perl 6 performance analysis tooling has been approved by the TPF Grant Committee. Yours truly can hardly wait to see the results of this grant come to fruition!

    Call for Rakudo Perl 6 Papers!

    This is the time to propose your Rakudo Perl 6 presentations! Please select one or more conferences:

    Core Developments

    Other Blog Posts

    Meanwhile on Twitter

    Meanwhile on perl6-users

    Winding Down

    Quite an eventful week again, with the JIT being merged and the Squashathon going on. More than 100 commits in the rakudo repository alone. Please check in again next week for more Rakudo Perl 6 news!


    gfldex: It’s Classes All The Way Down

    Published by gfldex on 2017-10-08T17:23:05

    While building a cache for a web API that spits out JSON, I found myself walking over the same data twice to fix a lack of proper typing. The JSON knows only about strings, even though most of the fields are integers and timestamps. I’m fixing the types after parsing the JSON with JSON::Fast by coercively .map-ing.

    @stations.=hyper.map: { # Here be heisendragons!
        .<lastchangetime> = .<lastchangetime>
            ?? DateTime.new(.<lastchangetime>.subst(' ', 'T') ~ 'Z', :formatter(&ISO8601))
            !! DateTime;
        .<clickcount> = .<clickcount>.Int;
        .<lastcheckok> = .<lastcheckok>.Int.Bool;
    
        (note "$_/$stations-count processed" if $_ %% 1000) with $++;
    
        .Hash
    };
    

    The hyper helps a lot to speed things up but will put a lot of stress on the CPU cache. There must be a better way to do that.

    Then lizmat showed where Rakudo shows its guts.

    m: grammar A { token a { }; rule a { } }
    OUTPUT: «===SORRY!=== Error while compiling <tmp>␤Package 'A' already has a regex 'a'
    (did you mean to declare a multi-method?)␤»
    

    Tokens are regexes, or maybe methods. But if tokens are methods, then grammars must be classes. And that allows us to subclass a grammar.

    grammar WWW::Radiobrowser::JSON is JSON {
        token TOP {\s* <top-array> \s* }
        rule top-array      { '[' ~ ']' <station-list> }
        rule station-list   { <station> * % ',' }
        rule station        { '{' ~ '}' <attribute-list> }
        rule attribute-list { <attribute> * % ',' }
    
        token year { \d+ } token month { \d ** 2 } token day { \d ** 2 } token hour { \d ** 2 } token minute { \d ** 2 } token second { \d ** 2}
        token date { <year> '-' <month> '-' <day> ' ' <hour> ':' <minute> ':' <second> }
    
        token bool { <value:true> || <value:false> }
    
        token empty-string { '""' }
    
        token number { <value:number> }
    
        proto token attribute { * }
        token attribute:sym<clickcount> { '"clickcount"' \s* ':' \s* '"' <number> '"' }
        token attribute:sym<lastchangetime> { '"lastchangetime"' \s* ':' \s* '"' <date> '"' }
        token attribute:sym<lastcheckok> { '"lastcheckok"' \s* ':' \s* '"' <bool> '"' }
    }
    

    Here we overload some tokens and forward calls to tokens that got a different name in the parent grammar. The action class follows suit.

    class WWW::Radiobrowser::JSON::Actions is JSON::Actions {
        method TOP($/) {
            make $<top-array>.made;
        }
        method top-array($/) {
            make $<station-list>.made.item;
        }
        method station-list($/) {
            make $<station>.hyper.map(*.made).flat; # Here be heisendragons!
        }
        method station($/) {
            make $<attribute-list>.made.hash.item;
        }
        method attribute-list($/) {
            make $<attribute>».made.flat;
        }
        method date($_) { .make: DateTime.new(.<year>.Int, .<month>.Int, .<day>.Int, .<hour>.Int, .<minute>.Int, .<second>.Num) }
        method bool($_) { .make: .<value>.made ?? Bool::True !! Bool::False }
        method empty-string($_) { .make: Str }
    
        method attribute:sym<clickcount>($/) { make 'clickcount' => $/<number>.Int; }
        method attribute:sym<lastchangetime>($/) { make 'lastchangetime' => $/<date>.made; }
        method attribute:sym<lastcheckok>($/) { make 'lastcheckok' => $/<bool>.made; }
    }
    

    In case you wonder how to call a method with such a funky name, use the quoting version of postfix:<.>.

    class C { method m:sym<s>{} }
    C.new.'m:sym<s>'()
    

    I truncated the examples above. The full source can be found here. The .hyper version is still quite a bit faster but also heisenbuggy. In fact .hyper may not work at all when executed too fast after a program starts, or when used in a recursive Routine. This is mostly due to the grammar being one of the oldest parts of Rakudo, with the least amount of work done to make it fast. That is a solvable problem. I’m looking forward to Grammar All The Things.

    If you’ve got grammars, please don’t hide them. Somebody might need them to be classy.

     


    Zoffix Znet: CPAN6 Is Here

    Published on 2017-10-06T00:00:00

    CPAN support for Rakudo Perl 6 dists

    Weekly changes in and around Perl 6: 2017.40 Unicode Granted

    Published by liztormato on 2017-10-02T22:30:12

    Samantha McVey has completed her work on the “Improving the Robustness of Unicode Support” Grant. In an extensive blog post she describes what she has done in the past months, how all achievable goals were achieved, and how quite a number of bonus features also made it. You can leave any congratulations, support or other comments on the official TPF page about the completion of the grant. I think we can now safely say that Rakudo Perl 6 has the most complete Unicode support of any programming language in the world. Kudos to Samantha!

    Object creation 1.5x faster

    Elizabeth Mattijsen (with a lot of help by Timo Paulssen and Jonathan Worthington) completed her work on auto-generating the BUILDALL method for each class. This is an internal method that is called to create an object using the default .new method. Instead of interpreting a build plan of the class every time an object of that class is created, a custom BUILDALL method is created directly using QAST ops (basically writing source code in Abstract Syntax Trees and then “EVALling” them). This makes each call to .new 1.5x faster, making our canary about 8% faster!
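
    To make this concrete, here is a small hypothetical class; creating instances of it with the default .new is exactly the kind of call that now runs 1.5x faster, since the auto-generated BUILDALL sets the attributes directly instead of interpreting a generic build plan (a minimal sketch, not the generated code itself):

    class Point {
        has $.x;
        has $.y;
    }

    # each such call goes through the auto-generated BUILDALL under the hood
    my $p = Point.new(x => 1, y => 2);
    say $p.x;    # 1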

    Nativecall also much faster

    Stefan Seifert also finished his work on auto-generating native call function bodies, making the interface between Rakudo Perl 6 internals and Inline::Perl5 much faster. This also made the Inline::Perl5 version of our canary about 10% faster. The race continues to be on!

    Roast Versioning

    Rakudo Perl 6 has a file (spectest.data) that contains the list of test files from the official test suite (roast) it should run when doing a so-called spectest. A single file used to be used for all language versions, which caused a number of problems.

    Zoffix Znet implemented roast versioning by making a separate list of files for each language version, so that Rakudo Perl 6 can run the tests for each language version separately. The roast runner just looks at the VERSION file in roast to figure out which list of test files to load (the file lists are now in spectest.data and spectest.data.6.c; for a future release a spectest.data.6.d file will be added).
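
    A rough sketch of the idea in Perl 6 (the paths and fallback behaviour here are assumptions for illustration, not the actual roast runner code):

    # pick the list of test files matching the language version recorded in roast
    my $version    = 'roast/VERSION'.IO.slurp.trim;            # e.g. "6.c"
    my $versioned  = "roast/spectest.data.$version".IO;
    my $list       = $versioned.e ?? $versioned !! 'roast/spectest.data'.IO;
    my @test-files = $list.lines.grep({ .so && !.starts-with('#') });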

    The language that will not be named

    Zoffix Znet rekindled the discussion about the naming of Rakudo Perl 6 again, this time suggesting 6lang (to be pronounced slang). Again, quite some discussion on blogs.perl.org, /r/perl6, /r/perl and Twitter. Yours truly hopes we can settle this once and for all before Rakudo Perl 6 language version 6.d comes out (or whatever we will call it then).

    Hacktoberfest is here again

    It’s the month of October again and your chance to win a T-shirt! Simply sign up, have a look at the issues at hand, fix only 4 of them and profit!

    Community Bug SQUASHathon

    On Saturday 7 October we will have yet another Community Bug SQUASHathon. Please join us on #perl6-dev on freenode and help us squash those pesky core bugs.

    Texas to ASCII

    To avoid any political connotations, it was felt that we should be moving towards documenting operators consisting purely of ASCII characters as “ASCII” operators (such as (<=)), and operators that consist of at least one non-ASCII character (such as ⊆) as “Unicode” operators. Internal documentation has already been adapted; external documentation will follow shortly unless consensus changes.
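
    For example, the subset-or-equal operator comes in both spellings (a small illustration, not taken from the documentation itself):

    say (1, 2) (<=) (1, 2, 3);   # True -- the "ASCII" spelling
    say (1, 2) ⊆ (1, 2, 3);      # True -- the "Unicode" spelling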

    Other Core Developments

    Other Blog Posts

    Meanwhile on Twitter

    Meanwhile on StackOverflow

    Meanwhile on perl6-users

    Ecosystem Additions

    Winding Down

    Bart Wiegmans has created a Pull Request for all of the work that he has done on the new JIT: this is now going through the final review process. This work will be merged any day now, which should allow for quite a few more “so many percent faster” items in the next issue. So please check in again next week!


    Zoffix Znet: 6lang: The Naming Discussion Update

    Published on 2017-09-28T00:00:00

    A rose by any other name 2.0...

    Weekly changes in and around Perl 6: 2017.39 Smarting up the Pool

    Published by liztormato on 2017-09-25T20:02:25

    Jonathan Worthington elaborates on his new ThreadPool implementation in an extensive blog post, which was made possible by the sponsoring of Vienna.pm. A recommended read if you want to keep up to date on the features and performance of asynchronous processing with Rakudo Perl 6! In short, there are now 3 queues internally: one for timers, one for workers that sequentially process, and one for general purpose activity. There’s now also a “Sufficiently Smart Supervisor” thread that looks at the state of the other threads about 100x per second and decides if adding more threads would make sense or not. Exciting times!

    Performance analysis tooling

    Timo Paulssen submitted a grant proposal to improve the performance analysis tools of Rakudo Perl 6. Judging by the comments so far, he’s not the only one wanting to see this happen. Please leave your comments if you also have an opinion about this proposal!

    Other Core Developments

    Jonathan Worthington was responsible for most of last week’s commits. I guess everybody else was just watching in awe, or busy with things that didn’t make it into the main rakudo repo just yet. Notable exceptions are:

    Other Blog Posts

    Meanwhile on Twitter

    Meanwhile on StackOverflow

    Meanwhile on perl6-users

    Ecosystem Additions

    Winding Down

    Apart from all the visible work Jonathan and Samantha have done this week, there was a lot going on behind the scenes. One of them promises to make object creation in Rakudo Perl 6 at least 10% faster. Too bad we couldn’t report about it this week. So be sure to check out the next issue of the Perl 6 Weekly!


    samcv: Grant Final Report

    Published on 2017-09-25T07:00:00

    This contains the work since the last report as well as my final report.

    Table of Contents

    Work Since the Last Report

    Merged in Unicode Collation Algorithm

    I merged the Unicode Collation Algorithm branch into MoarVM. Now that the sort is stable, the coll and unicmp operators and the .collate method in Rakudo are no longer experimental, so use experimental :collation is no longer needed to use them.
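
    A quick illustration of the now-stable features (exact orderings depend on the active collation settings):

    say 'a' unicmp 'B';              # Less -- UCA comparison, unlike 'a' cmp 'B'
    say <pearl Perl perl>.collate;   # sorts using the Unicode Collation Algorithm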

    The $*COLLATION dynamic variable is still hidden behind experimental, since it is possible that design changes could still be made to it.

    Prepend

    In some of my other posts I talked about the difficulties of getting Prepend codepoints working properly. To do this I had changed how we store synthetics, so as not to assume that the first codepoint of a synthetic is always the base character. This month I merged in that change in synthetic representation and implemented the features which the new representation makes possible.

    The feature was to detect which character is the base character and store its index in the synthetic. Most combiners, such as diacritics, come after the base character and are Extend codepoints: a + ◌́. Prepend has the reverse functionality and comes before: ؀◌ + 9 (Arabic number sign + number).

    This required abandoning many assumptions our code had rightfully made before Unicode 9.0. When a synthetic is created, we now check whether the first codepoint is a Prepend codepoint. If so, we keep checking until we find a codepoint that is not a Prepend. In most cases, the base character is the codepoint following the last Prepend mark.

    In degenerate cases there is no base character, such as a grapheme composed only of Prepend codepoints, or only of Prepend and Extend codepoints. In these degenerate cases we set the first codepoint as the base character.
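
    The rule can be sketched in Perl 6 like this (purely illustrative; the real logic lives in MoarVM’s C internals, and &is-prepend stands in for the actual Grapheme_Cluster_Break=Prepend property check):

    sub base-codepoint-index(@codepoints, :&is-prepend!) {
        # the base character is the first codepoint that is not a Prepend
        my $idx = @codepoints.first({ !is-prepend($_) }, :k);
        # degenerate case (all Prepend, or only Prepend and Extend):
        # fall back to the first codepoint
        $idx // 0;
    }

    # toy usage: ARABIC NUMBER SIGN (a Prepend codepoint) followed by DIGIT NINE
    say base-codepoint-index([0x0600, 0x0039], is-prepend => { $_ == 0x0600 });   # 1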

    Once I had that done, I was able to fix some of our ops which did not work correctly if there were Prepend characters. This included fixing our case changing op so it would now work on graphemes with Prepend marks. Since the case change ops apply the case change to the base codepoint, it is necessary for us to have the correct base character. Similarly, ordbaseat which gets the base codepoint also needed to be fixed. This allowed ignoremark to now work for graphemes with Prepend marks.

    Documentation

    I wrote documentation on our Unicode Collation Algorithm implementation, which explains to the reader what problems the UCA solves, with examples of different single-to-multiple or multiple-to-single mappings of codepoints. It goes into a fair amount of detail on how it was implemented.

    UTF8-C8

    Bugs with Encoding into UTF8-C8

    Since MoarVM normalizes all input text, our way of dealing with text that must not be normalized is important to people who want their strings to pass through unchanged. Previously there was a bug where, if something was a valid UTF8-storable value, such as a Surrogate or a value higher than 0x10FFFF, we would create a Str with that value, even though it was not valid Unicode. It would then throw when this value was used (since the Str type shouldn’t hold values higher than 0x10FFFF or Surrogates). As this is our only way of dealing with text unaltered, this seemed like a serious issue that would prevent UTF8-C8 from being usable in a production environment attempting to encode arbitrary data into UTF8-C8. [f112fbcf]

    Bugs While Working with UTF8-C8 Strings

    Another issue I fixed was that under concatenation, text replacement or renormalization, UTF8-C8 codepoints would be "flattened". They would lose their special properties and instead start acting like any other set of Unicode codepoints (albeit an unusual one, since it contains a private-use character and a hex code of the stored value). I changed our codepoint iterator so that you can optionally choose to have UTF8-C8 synthetics passed back unchanged. We use synthetics both to store UTF8-C8 values and to store graphemes which contain multiple codepoints. When iterating by codepoint over an already existing arbitrary string, we want to retain the UTF8-C8 codepoints and make sure they are not changed during the renormalization process. This has been fixed, and UTF8-C8 strings are now drastically more reliable and, hopefully, much more production ready. [2f71945d]
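
    A small example of the kind of round trip this work makes dependable (a sketch; the point is that the invalid byte survives as a synthetic instead of throwing or being mangled):

    my $bytes = Buf.new(0xFF, 0x61, 0x62);    # 0xFF is not valid UTF-8
    my $str   = $bytes.decode('utf8-c8');     # the invalid byte becomes a synthetic
    my $again = $str.encode('utf8-c8');
    say $again.list;                          # (255 97 98) -- the original bytes back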

    Grapheme Caching and Move To

    The function which moves a grapheme iterator forward by a specified number of graphemes now works even if we aren’t starting from the very start of the string. This function has a first loop which locates the correct strand, and used to have a second loop which would find the correct grapheme inside that strand. I refactored the grapheme-locating code and was able to remove that second loop.

    In the grapheme caching implementation we can save a lot of work by not creating a new iterator for every grapheme access. Not only that, I also sped up the move_to function by about 30%. While the cached iterator reduces calls to this function for the functions I added it to, there are still many places which may seek for each grapheme requested, and this will speed those up.

    Other MoarVM Changes

    I set up automated Appveyor builds for MoarVM so we get automated builds on Windows (Travis CI only does macOS and Linux builds).

    I fixed a segfault that occurred when compiling nqp on Alpine Linux which uses musl as its libc. I ended up reducing the depth of recursion in the optimize_bb() function when compiling nqp from 99 to 29 (3x reduction). In the Rakudo build, we have a 5x reduction in the depth of recursion.

    Final Report

    As I’ve already said many things in my previous grant reports (1, 2, 3, 4), I will reiterate some of the big things I did; this list is not exhaustive. For full details of all the changes please see my other grant reports, as well as the partial list of bonus changes I made in MoarVM during the grant at the bottom of this page.

    The only thing not completed was implementing a whole new UCD backend. While I planned on doing this, I ended up choosing not to. I realized that it would not have been the best use of my time on the grant, as there were many more user-noticeable changes I could make. Despite this, I did achieve the goals that the rewrite was intended to solve; namely, making property values distinct for each property and making the UCD database generation more reproducible. While the generated database is not exactly the same on different runs, the only thing that changes is the internal property codes, which does not affect anything adversely. The database is now fully functional every time it is regenerated, instead of regeneration breaking our tests most of the time. Once the database was stable, I was able to update it to Unicode 10. Without my improvements regarding reproducibility and property values becoming distinct for each property, updating to Unicode 10 would not have been possible. In addition, all Hangul (Korean) characters now have names in the Unicode database.

    A big thing I wanted to implement was the Unicode Collation Algorithm, which ended up being a total success. I was able to still retain the ability to choose which collation levels the user wanted to sort with as well as reverse the sort order of individual collation levels.

    Yet I did not implement only one algorithm; I also implemented the Knuth-Morris-Pratt string search algorithm, which can take advantage of repeated characters in the needle (it can be multiple times faster if you have repeating sections). The Knuth-Morris-Pratt implementation was also adjusted to use either the new cached grapheme iterator or the simple lookup, depending on whether the haystack was flat or strand-based. Indexing a strand-based string with a one-grapheme-long needle was sped up by 9x by adding a special case for it.
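
    For the curious, here is what Knuth-Morris-Pratt looks like sketched in Perl 6 (the real implementation is C code in MoarVM operating on graphemes; this only illustrates the failure-table idea that lets repeated sections of the needle be skipped):

    sub kmp-index(Str $haystack, Str $needle) {
        my @n = $needle.comb;
        my @h = $haystack.comb;
        return 0 unless @n;

        # failure table: length of the longest proper prefix that is also a suffix
        my @fail = 0 xx @n.elems;
        my $k = 0;
        for 1 ..^ @n.elems -> $i {
            $k = @fail[$k - 1] while $k > 0 && @n[$i] ne @n[$k];
            $k++ if @n[$i] eq @n[$k];
            @fail[$i] = $k;
        }

        # scan the haystack, reusing matched prefix lengths on mismatch
        my $m = 0;
        for ^@h.elems -> $i {
            $m = @fail[$m - 1] while $m > 0 && @h[$i] ne @n[$m];
            $m++ if @h[$i] eq @n[$m];
            return $i - $m + 1 if $m == @n.elems;
        }
        Nil
    }

    say kmp-index('abcabcabd', 'abcabd');   # 3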

    Practically all string ops were sped up, often by multiple times, due to getting MVM_string_get_grapheme_at_nocheck inlined. In addition to this, I changed the way we access strings in many of our most used string ops, intelligently using either grapheme iterators, cached grapheme iterators or direct access depending on circumstances. With MVM_string_get_grapheme_at_nocheck inlined, accessing graphemes through this function was sped up by 1.5x for strands and up to 2x for flat strings. Ops we use a lot, like the op that backs eq and nqp::eqat, were given special casing for Strand ←→ Strand, Flat ←→ Flat and Strand ←→ Flat (which also covers Flat ←→ Strand). This special casing sped up eq by 1.7x when one string is a strand and one is flat, and by 2.5x when both strings are flat. Applying similar optimizations to index made a 2x speedup when the haystack is flat and a 1.5x speedup when the haystack is a strand (on top of the previous improvements due to the Knuth-Morris-Pratt algorithm).

    I fixed a longstanding bug in NQP which caused ignoremark+ignorecase operations to be totally broken. I fixed this by adding more MoarVM ops and refactoring the code to have far fewer branches. In MoarVM we now use a centralized function to do each variation of with/without ignorecase and ignoremark, which is also fully compatible with foldcase operations.

    Doing the ignoremark/ignorecase indexing work sped them up by multiple times, but then in addition to that, it became 2x faster when the haystack was made up of 10 strands by implementing a cached grapheme iterator.

    I implemented full Unicode 9.0 support not just in our grapheme segmentation, but also in our other ops, refactoring how we store synthetic codepoints to allow us to have the 'base' codepoint be a codepoint other than the 1st in the synthetic to support Prepend codepoints.

    Our concatenation was improved so as to make full renormalization of both input strings no longer needed in almost all cases. The x repeat operator was fixed so it always creates normalized strings. Previously it could create unnormalized strings instead, causing issues when it was used.

    I believe I have more than accomplished what I set out to do in this grant. I made tons of user facing changes: to speed, Unicode normalization support, full Unicode 9.0 support. I added awesome collation features and fixed all the major issues with decoding and working with UTF8-C8 representations. I have listed an incomplete list of bonus deliverables below the deliverables which were part of this project.

    Deliverables

    • I documented MoarVM’s string representation, with lots of good information for future developers as well as interested users.

    • Hangul syllables now have Unicode names in our database, with a test added in roast.

    • I implemented the Unicode Collation Algorithm [866623d9]

      • Tests have been added in roast for the UCA and the unicmp op

      • I wrote documentation on our Unicode Collation Algorithm implementation

      • Regarding language-specific sorting: this would involve us using data from the Unicode CLDR. Once we have this data available from MoarVM, it simply requires a conditional to override DUCET and check a different set of data before checking the DUCET data. This information is in our documentation for collation.

    • Text normalization

      • Speed of normalization was improved

      • Full Unicode 9.0 support for text segmentation and normalization was added

    • While I did not fully rewrite the database, I did solve the needed issues:

      • Property values are now unique for each property

      • Running the generation script creates a functional database every time it is run, rather than only some of the time.

      • I added Unicode Collation data to our database, generated from a Perl 6 script, which happened to be the only property required to complete my deliverables

    Bonus Deliverables

    Here is a somewhat complete list of bonus deliverables:

    • Updated our database to Unicode 10. This was only possible once I had fixed the problems with the database generation and made property values unique.

    • Implemented Knuth-Morris-Pratt string search

    • Set up Appveyor builds. Appveyor builds and tests MoarVM on Windows, similar to Travis CI.

    • Fixed ignoremark+ignorecase regexes when used together, as well as achieving huge speed increases.

    UTF8-C8/UTF-8

    • Fix UTF8-C8 encoding so it can encode values > 0x10FFFF as well as surrogates

    • Fix UTF8-C8 strings so they do not get corrupted and flattened when string operations are performed on them.

    • MVM_string_utf8_decodestream: free the buffer on malformed UTF-8 [a22f98db]

    String Ops

    • Have MVM_string_codes iterate the string with codepoint iterator [ed84a632]

    • Make eqat 1.7x-2.5x faster [3e823659]

    • Speed up index 9x when Haystack is strand, needle is 1 long [0b364cb8]

    • Implement the Knuth-Morris-Pratt string search algorithm [6915d80e]

    • Add indexim_s op and improve/fix bugs in eqatim_s [127fa2ce]

    • Fix a bug in index/eqat(im) and in ord_getbasechar causing us to not decompose the base character when the grapheme was a synthetic [712cff33]

    • Fix MVM_string_compare to support deterministic comparing of synthetics [abc38137]

    • Added a codes op which gets the number of codepoints in a string rather than the number of graphemes. Rakudo is now multiple times faster doing the .codes op. Before, it would request an array of all the codepoints and then get the number of elements, which was much slower. See the short example below.
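
    For example (using a combining mark that has no precomposed form, so normalization keeps both codepoints):

    my $s = "x\c[COMBINING ACUTE ACCENT]";   # one grapheme, two codepoints
    say $s.chars;   # 1
    say $s.codes;   # 2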

    Fix string ops with Prepend characters

    • Rework MVMNFGSynthetic to not store base separately [3bd371f1]

    • Fix case change when base cp isn’t the first cp in synthetic [49b90b99]

    • For degenerate Synth’s with Prepend and Extend set base cp to 1st cp [db3102c4]

    • Fix ignoremark with Prepend characters and ordbaseat op [f8a639e2]

    Memory/GC/Build Fixes

    • Fix segfault when compiling nqp with musl as libc [5528083d]

    • Avoid recursion in optimize_bb() when only 1 child node [6d844e00]

    • Fix various memory access/garbage collection issues in some string ops that were showing up when running in Valgrind or using Address Sanitizer

    Grapheme Iteration

    • Ensure we can move forward in a grapheme iterator even if we aren’t starting at the very beginning of a string.

    • Use grapheme iterator cached for ignorecase/ignoremark index ops [b2770e27]

    • Optimize MVM_string_gi_move_to. Optimize the loop which finds the correct location within a strand so that it isn’t a loop and is just conditionals. [c2fc14dd]

    • Use MVMGraphemeIter_cached for strands in KMP index [ce76c994]

    • Allow MVM_string_get_grapheme_at_nocheck to be inlined

    • Refactor code into iterate_gi_into_string() to reduce code duplication [1e92fc96]

    Tests Added to Roast

    • Add tests for testing collation. Tests for the unicmp operator [5568a0d63]

    • Test that Hangul syllables return the correct Unicode name [6a4bc5450]

    • Add tests for case changes when we have Prepend codepoints [00554ccbd]

    • Add tests for x operator to ensure normalization retained [1e4fd2148] [59909ca9a]

    • Add a large number of string comparison tests [51c4ac08b]

      • Add tests to make sure synthetics compare properly with cmp [649d9dc50]

    • Improve ignoremark tests to cover many different cases [810e218c8]

      • Add ignoremark tests to cover JSON::Tiny regression + other issue [c185acc57]

    • Add generated tests (from UCD data) and manually created ones to ensure strings concatenation is stable, when the concatenated string would change the normalization. [2e1f7d92a][9f304b070] [64e385faf][0976d07b9][59909ca9a][2e1f7d92a] [88635564e] [a6bbc73cf] [a363e3ff1]

    • Add test for MoarVM Issue #566 .uniprop overflow [9227bc3d8]

    • Add tests to cover RT #128875 [0a80d0d2e]

    • Make new version of GraphemeBreakTest.t to better test grapheme segmentation [54c96043c]

    NQP Work

    Below is a listing of some of the commits I made to NQP. This included adding the ops I created over the course of the grant: eqatim, indexim, indexicim, eqaticim, and codes (gets the number of codepoints in a string rather than graphemes).

    The testing in NQP was inadequate for our string ops, so I added hundreds of tests for practically all of the string ops, so we could properly test the different variations of index* and eqat*.

    NQP Documentation

    • Add docs for a few variations of index/eqat [589a3dd5c]

    • Bring the unicmp_s op docs up to date [091625799]

    • Document hasuniprop moar op [650840d74]

    NQP Tests

    • Add more index* tests to test empty string paths [8742805cb]

    • run indexicim through all of the indexic tests [26adef6be]

    • Add tests for RT #128875, ignorecase+ignoremark false positives [b96a0afe7]

    • Add tests for nqp::eqatim op on MoarVM [6fac0036e]

    Other Work

    • Added script to find undocumented NQP ops [0ead16794]

    • Added nqp::codes to QASTOperationsMAST.nqp [59421ffe1]

    • Update QASTRegexCompilerMAST to use new indexicim and eqaticim ops [18e40936a]

    • Added eqatim and indexim ops. Fix a bug when using ignoremark [9b94aae27]

    • Added nqp::hasuniprop op to QASTOperationsMAST.nqp [d03c4023a]

    6guts: Rakudo gets a new thread pool

    Published by jnthnwrthngtn on 2017-09-23T14:00:12

    Vienna.pm have funded me to work 50 hours on Perl 6. After some discussion, we decided I would first work on improving the thread pool scheduler, and then move on to continuing the work around non-blocking await, a feature of the upcoming Perl 6.d. In this post I’ll discuss the work on the thread pool scheduler, which was merged shortly after the latest Rakudo release to maximize testing time, and thus will appear in the next release (2017.10).

    What was wrong with the ThreadPoolScheduler before?

    My (by now slightly hazy) memory is that I wrote the initial Perl 6 thread pool implementation in the space of an hour or two on a train, probably heading to some Perl event or other. It was one of those “simplest thing that could possibly work” implementations that turned out to work well enough that it survived with only isolated tweaks and fixes until January this year. At that point, I added the initial bits of support for non-blocking await. Even that change was entirely additive, leaving the existing scheduling mechanism completely intact.

    When I first implemented the thread pool, there were approximately no people writing concurrent Perl 6 programs. Happily, that’s changed, but with it came a few bug reports that could be traced back to the thread pool. Along with those, there were things that, while not being outright bugs, were far from ideal.

    Before I dug into a re-design, I first summarized all of the problems I was aware of with the thread pool, so as I considered new designs I could “crash test” them against the problems that afflicted the previous design. Here’s my list of problems with the previous design.

    1. It was wasteful, spawning far too many threads in quite a lot of cases. A single use of Proc::Async would cause some threads to be created. A second use after that would add some more. A third use would add more. Even if these uses were not concurrent, threads would still be added, up until the maximum pool size. It was a very conservative way to make sure a thread would be available for each piece of work, provided the pool didn’t reach exhaustion point. But really, the pool doesn’t need more than a thread or two for many programs. RT #131915 was a ticket resulting from this behavior: each thread added to the memory consumption of the program, making memory use a good bit higher than it should have been for programs that really didn’t need more than a thread or two.
    2. Very active async I/O could swamp the thread pool with events. Since there was a single queue, timer events – which might be being used to kill off a process after a timeout – may not have fired in a very timely manner at all, since the timer event would be stuck behind all of the I/O events. RT #130370
    3. Sometimes, despite the conservative mechanism described in #1, not enough threads got started and this led to occasional deadlocks. RT #122709
    4. It didn’t try to scale the number of threads to match the available CPU cores in any way. Just because there is a lot of work in the queue does not mean that we need more threads; if we are making progress and the CPU load is pretty high then adding more threads is unlikely to lead to more progress.
    5. In high-core-count systems, the default limit of 16 was too low. 32-core CPUs at tolerable prices very much exist by now!
    6. For programs that manage to really block a lot of threads with non-CPU bound work, we could deadlock. Most of that will go away with non-blocking await, but there will still be ways to write code that really blocks a bunch of OS threads and can’t make progress without yet more threads being made available to run something. At some point, people just have to write their programs differently, but the default limit of 16 threads was not very generous.
    7. Despite wishing to raise the maximum default pool size a good bit, we couldn’t do it because issues #1 and #4 meant we’d likely end up hitting the maximum in most programs, and memory use would become totally unreasonable.
    8. We suffered from poor thread affinity for events that would inevitably queue up due to serial supplies. For example, if two packets arrived from a socket then they might be picked up by different threads in the pool. If the first packet was still being processed, then the second would contend for the lock used to enforce serial processing of messages by supplies, and block until it was available.

    3 kinds of queue

    Problems 2 (active I/O swamping the queue and delaying timers) and 8 (poor thread affinity) are resolved in the new thread pool by recognizing that not all work given to the pool is equal. Timers really want to be dealt with in a timely manner, so in programs that have timers it makes sense to have a queue, and one or more worker threads, exclusively for time-based events. Work items that will need processing sequentially anyway should be assigned to a single thread, implying that each such “affinity worker” gets a queue of its own. Finally, there is a general queue for everything else, and one or more threads eat from it.

    Adding a supervisor

    The other problems were addressed by the addition of a Sufficiently Smart Supervisor. The supervisor is a thread created by the thread pool upon its first use, living from then until the end of the program. It spends most of its time sleeping, waking up around 100 times a second to check how things are going. It is the supervisor that makes most of the decisions about how many threads to add to the pool, factoring in things such as the number of CPU cores available and whether the work already under way is actually making progress.

    Having the pool soft-limited to the number of CPU cores, and only reluctantly able to go beyond that, means that we can raise the maximum default pool size quite a lot; it now stands at 64 threads. In reality, many programs don’t even reach the CPU core count, as there’s never enough work to trigger the addition of more threads.

    Affinity workers

    Affinity worker threads aren’t added by the supervisor. Instead, when an affinity queue is requested, any existing affinity worker threads will have their queue lengths inspected. The one with the shortest queue will be picked, provided that the queue length is below a threshold. If it’s over the threshold (which increases per affinity worker thread added), then another affinity worker thread will be spawned and the work allocated to it.
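
    The selection rule sketched in Perl 6 (illustrative only; the class, helper and threshold shape here are assumptions, not the scheduler’s actual code):

    class AffinityWorker { has @.queue }   # stand-in for the real worker objects

    sub pick-or-spawn(@workers, :$threshold-step = 1) {
        # the permitted queue length grows as more affinity workers exist
        my $threshold = $threshold-step * (@workers.elems + 1);
        my $best      = @workers.min(*.queue.elems);
        return $best if @workers && $best.queue.elems < $threshold;

        # every queue is too long (or there are no workers yet): spawn another
        my $fresh = AffinityWorker.new;
        @workers.push: $fresh;
        $fresh
    }

    my @pool;
    say pick-or-spawn(@pool).queue.elems;   # 0 -- a fresh worker was added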

    Outcomes

    For all the cases I’ve tried so far, the new scheduler seems to do either at least as well or better than the one it replaced – and sometimes much better. The deadlock due to insufficient threads bug is gone, thanks to the supervisor. Programs that do the occasional bit of work needing pool threads end up with just a thread or two, greatly reducing their memory consumption. A Cro web application that is processing just a handful of requests a second will now spawn just a few threads (and so use much less memory and get better locality), while one faced with some hundreds of requests per second will spawn more. And a program doing a lot of CPU-bound work now spawns as many threads as there are cores, which gives a small speedup compared to oversubscribing the CPU under the previous scheduler. Finally, timer events are delivered and handled in a timely way, even when there is a lot of output from a process.

    And next?

    Well, most immediately I should write some extra regression tests, so I can get the RT tickets mentioned earlier in the article closed up. Feel free to beat me to doing that, though! :-)

    That aside, I’d like to mention that while the new scheduler is a decided improvement, it’s also improvable. The heuristics it uses to decide when to add further threads can surely be tuned some more. The code doing that decision making is written in Perl 6, which I hope makes it accessible to those who would like to experiment and tweak.

    Once again, thanks to Vienna.pm for making this work possible. And, finally, a mention that I’m still looking for funding to help me keep on doing Perl 6 things for a sizable chunk of my work time; a handful of companies each sponsoring 10 hours a month would soon add up!


    p6steve: Clone Wars

    Published by p6steve on 2017-09-20T18:52:04

    Apologies to those that have OO steeped in their blood. I am a wary traveller in OO space; maybe I am a technician, not an architect, at heart. So for me, no sweeping frameworks unless and until they are needed. And, frankly, one can go a long way on procedural code with subroutines to gather repetitive code sequences.

    (And don’t get me started on functional programming…)

    Some time ago, I was tasked to write a LIMS in old perl. Physical ‘aliquots’ with injections of various liquids would be combined and recombined by bio-machines. This led to a dawning realization that these aliquot objects could be modelled in an object style with parent / child relationships. After a few weeks, I proudly delivered my lowly attempt at ‘Mu’ for this (and only this) problem. Kudos to the P6 team – after a couple of weeks in here, it’s just sensational the level of OO power that the real Mu delivers:

    [Image: Screenshot 2017-09-20 19.32.21]

    Now, hard at work, on the perl6 version of Physics::Unit, I am wondering how to put the OO theory into productive practice. One of my aims was to find a medium sized (not core tools) problem that (i) I know something about and (ii) would be a good-sized problem to wrangle.

    So I am enjoying the chance to design some classes and some interfaces that will make this all hang together. But – as an explorer, it has become clear that I only have three options. The problem runs like this:

    Initially I had some success with object types ::T – but these only let you read the type and duplicate it if needed for a new left-hand-side container. Then I tried the built-in (shallow) clone method. But…

    Ultimately, thanks to rosettacode.org, I worked out that $x.perl.EVAL with some ~~ s/// substitutions on the way would do the trick!
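
    The basic round trip looks like this on plain data structures (a minimal sketch; the Physics::Unit case needed the extra s/// tweaks mentioned above):

    use MONKEY-SEE-NO-EVAL;   # EVAL of a string needs this pragma

    my %original  = a => [1, 2, 3], b => %( c => 4 );
    my %deep-copy = EVAL %original.perl;    # serialize to code, then re-evaluate

    %deep-copy<a>[0] = 99;
    say %original<a>[0];    # 1 -- nested structures were copied, not shared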

    Phew. Please comment below if you have a better way to share – or would like to point out the risks of this technique…


    Weekly changes in and around Perl 6: 2017.38 Color Me Booked

    Published by liztormato on 2017-09-18T21:40:33

    There’s been a lot of book activity in the Perl 6 world lately. Andrew Shitov announced his new book Perl 6 Deep Dive (preliminary table of contents). And Moritz Lenz also continued working on his “Parsing with Perl 6 Regexes and Grammars” book. To top it off this week, Zoffix Znet announced the Rakudo Book Project – a plan to write some Rakudo books (/r/perl and /r/perl6 comments).
    [Image: Butterflies Galore!]
    Please check out his plans and support him in any way you can!

    AlexDaniel++ for his second release

    Aleks-Daniel Jakimenko-Aleksejev has done his second Rakudo compiler release! The announcement for Rakudo Perl 6 2017.09 shows quite a number of fixes and improvements again this month. Please note there are currently no plans for creating a Rakudo Star release for this compiler release.

    London Perl Workshop – 25 November

    Saturday 25 November will see another London Perl Workshop 2017. And yours truly would love to see a lot of Rakudo Perl 6 presentations there: so please submit your presentation. Hope to see you there!

    New ThreadPoolScheduler implementation

    Jonathan Worthington started work on a new thread pool scheduler (which got merged after the 2017.09 release because of possible ecosystem fallout). This implementation has separate general and timer queues with separate workers, and also introduces affinity queues, which are intended for cases where events will be fed into a Supply, and thus there’s no point having lots of threads competing over them only to immediately stumble over each other. The separate timer queue helps when timer events are being delayed, for example if a process is producing a load of output.

    This implementation also adds a supervisor, which is where the smarts about how many threads to have in the pool will live. For now, it is already smart enough to start a lot fewer threads than the previous scheduler when they obviously aren’t needed, which helps with memory consumption. And it can add more threads on demand when needed to break deadlocks. The default maximum number of threads has been raised to 64, now that the scheduler no longer starts up the maximum number of threads when there is no work for them to do.

    For debugging, the RAKUDO_SCHEDULER_DEBUG and RAKUDO_SCHEDULER_DEBUG_STATUS environment variables can be set.

    This work has been kindly sponsored by the Vienna Perl Mongers.

    Other Core Developments

    These features made it to the 2017.09 compiler release.

    Other blog posts

    Meanwhile on Twitter

    Meanwhile on StackOverflow

    Meanwhile on perl6-users

    Ecosystem Additions

    Quite a nice catch this week!

    Winding down

    Between the rain and the wind, quite a lot happened in the Rakudo Perl 6 world yet again. Sometimes we forget how many ways we found how not to make Perl 6. With that in mind, see you next week for more Rakudo Perl 6 goodies!


    6guts: MoarVM Specializer Improvements Part 2: Planning

    Published by jnthnwrthngtn on 2017-09-17T22:39:51

    It’s been a good while since I posted part 1 of this little series. In the meantime, I paid a visit to the Swiss Perl Workshop, situated in a lovely hotel in the beautiful village of Villars. Of course, I was working on my slides until the last minute – but at least I could do it with this great view!

    [Photo: DSC08286-small]

    I gave two talks at the workshop, one of them about the MoarVM specializer (slides, video). Spoiler alert: the talk covers much of what I will write about in this blog series. :-) Now that I’ve recovered from the talk writing and the travel, it’s time to get back to writing here. Before I dig into it, I’d like to say thanks to The Perl Foundation and their sponsors, who have made this work possible.

    Where we left off

    In the last part I discussed how we collected statistics about running code, in order to understand what to optimize and how to optimize it. The running example, which I’ll continue with, was this:

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    

    And an example of the kind of information collected is:

    Latest statistics for 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)
    
    Total hits: 367
    
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
        Callsite hits: 367
    
        Maximum stack depth: 32
    
        Type tuple 0
            Type 0: Str (Conc)
            Type 1: Str (Conc)
            Hits: 367
            Maximum stack depth: 32
    

    Which, to recap, tells us that the ~ infix operator was called 367 times so far, and was always passed two Str instances, neither of which was held in a Scalar container.

    The job of the planner

    The planner is the component that takes this statistical data and decides what code to optimize and what cases to optimize it for. It turns out to be, at least at the time of writing, one of the smaller parts of the whole specialization subsystem; subtract memory management code and debug bits and it’s not much over 100 lines.

    Its inputs are a list of static frames whose statistics were updated by the latest data. There’s no point considering anything whose statistics did not change, since there will be nothing new since the last time the planner saw them, and so the conclusion would be the same. In Perl 6 terms, a static frame may represent a routine (a Sub or Method), a Block, or a Code thunk.

    Initial filtering

    The first thing the planner looks at is whether the static frame got enough hits to be worth even considering investing time on. So if it sees something like this:

    Latest statistics for 'command_eval' (cuid: 8, file: src/Perl6/Compiler.nqp:31)
    
    Total hits: 1
    
    No interned callsite
        Callsite hits: 1
    
        Maximum stack depth: 5
    

    It can simply ignore it based on the total hits alone. Something that gets a single hit isn’t worth spending optimization effort on.

    There are cases where the total hits are very low – perhaps just a single hit – but optimization is still interesting, however. Here is the loop that I used to make shorten hot so we could look at its statistics and plan:

    for 'x' x 50 xx 100000 {
        shorten($_, 42)
    }
    

    The statistics for the mainline of the program, containing the loop, start out like this:

    Latest statistics for '<unit>' (cuid: 4, file: xxx.p6:1)
    
        Total hits: 1
        OSR hits: 273
    
        Callsite 0x7f1cd6241020 (0 args, 0 pos)
    
            Callsite hits: 1
    
            OSR hits: 273
    
            Maximum stack depth: 10
    

    The total hits may well be 1, but obviously the loop inside of this code is hot and would be worth optimizing. MoarVM knows how to perform On Stack Replacement, where code running in a frame already on the callstack can be replaced with an optimized version. To figure out whether this is worth doing, it records the hits at “OSR points” – that is, each time we cross the location in the loop where we might replace the code with an optimized version. Here, the loop has racked up 273 OSR hits, which – despite only one hit in terms of entering the code at the top – makes it an interesting candidate for optimization.

    Hot callsites

    In MoarVM, a callsite (of type MVMCallsite) represents the “shape” of a set of arguments being passed: things like how many arguments there are, how many of them are positional, the flags describing each of them, and the names of any named arguments (as can be seen in the statistics dumps above).

    When bytecode is produced, callsites are interned – that is to say, if we have two calls that are both passing two positional object arguments, then they will share the callsite data. This makes them unique within a compilation unit. When bytecode is loaded, then any callsites without flattening are further interned by the VM, meaning they are globally shared across all VM instances. This means that they can then be used as a kind of “key” for things like multi-dispatch caching. The specializer’s statistics are also aggregated by callsite.

    The planner, therefore, takes a look at what callsites showed up. Often, code will be called with a consistent callsite, but things with optional parameters may not be. For example, if we were to run:

    sub foo(:$bar) { }
    for ^100000 { rand < 0.9 ?? foo(bar => 42) !! foo() }
    

    Then the statistics of foo might end up looking like this:

    Latest statistics for 'foo' (cuid: 1, file: -e:1)
    
    Total hits: 367
    
    Callsite 0x506d920 (2 args, 0 pos)
      - bar
    
        Callsite hits: 331
    
        Maximum stack depth: 13
    
        Type tuple 0
            Type 0: Int (Conc)
            Hits: 331
            Maximum stack depth: 13
    
    Callsite 0x7f3fd7461620 (0 args, 0 pos)
    
        Callsite hits: 36
    
        Maximum stack depth: 13
    
        Type tuple 0
            Hits: 36
            Maximum stack depth: 13
    

    Unsurprisingly, the case where we pass bar is significantly more common than the case where we don’t, so the planner would in this case favor spending time making a specialization for the case where bar was passed.

    As an aside, specialization by optional argument tends to be a good investment. A common pattern is:

    sub foo($bar, Bool :$some-option) {
        if $some-option {
            extra-stuff();
        }
    }
    

    In the case that $some-option is not passed, the specializer can see that the option will always have its default value, and so can strip out the conditional together with the code inside it.

    That stripping out of code might in turn bring the optimized version below the inlining limit, whereas before it might have been too big, which could bring about an additional win.

    Hot type tuples

    Once one or more hot callsites have been identified, the types that were passed will be considered. In the best case, that turns out to be very boring. For example, here:

    Latest statistics for 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)
    
    Total hits: 273
    
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 273
    
        Maximum stack depth: 14
    
        Type tuple 0
            Type 0: Scalar (Conc) of Str (Conc)
            Hits: 273
            Maximum stack depth: 14
    

    The code is consistently called with a Scalar container holding a Str instance. Therefore, we can see this code is monomorphic in usage. It is quite clearly going to be worth spending time producing a specialization for this case.

    In other cases, the code might be somewhat polymorphic:

    Latest statistics for 'sink' (cuid: 4795, file: SETTING::src/core/List.pm:694)
    
    Total hits: 856
    
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 856
    
        Maximum stack depth: 31
    
        Type tuple 0
            Type 0: Slip (Conc)
            Hits: 848
            Maximum stack depth: 31
    
        Type tuple 1
            Type 0: List (Conc)
            Hits: 7
            Maximum stack depth: 26
    

    Here we can see the sink method is being called on both Slip and List. But it turns out that the Slip case is far more common. Thus, the planner would conclude that the Slip case is worth a specialization, and the List case is – at least for now – not worth the bother.

    Sometimes, a situation like this comes up where the type distribution is more balanced. Any situation where a given type tuple accounts for 25% or more of the hits is considered to be polymorphic. In this case, we produce the various specializations that are needed.

    Yet another situation is where there are loads of different type tuples, and none of them are particularly common. One place this shows up is in multiple dispatch candidate sorting, which is seeing all kinds of types. This case is megamorphic. It’s not worth investing the time on specializing the code for any particular type, because the types are all over the place. It is still worth specialization by callsite shape, however, and there may be other useful optimizations that can be performed. It can also be compiled into machine code to get rid of the interpreter overhead.

    Sorting

    By this point, the planner will have got a list of specializations to produce. Specializations come in two forms: observed type specializations, which are based on matching a type tuple, and certain specializations, which are keyed only on an interned callsite (or the absence of one). Certain specializations are produced for megamorphic code. The list of specializations are then sorted. But why?

    One of the most valuable optimizations the specializer performs is inlining. When one specialized static frame calls another, and the callee is small, then it can often be incorporated into the caller directly. This avoids the need to create and tear down an invocation record. Being able to do this relies on a matching specialization being available – that is, given a set of types that will be used in the call, there should be a specialization for those types.

    The goal of the sorting is to try and ensure we produce specializations of callees ahead of specializations of their callers. You might have wondered why every type tuple and callsite was marked up with the maximum call depth. Its role is for use in sorting: specializing things with the deepest maximum call stack depth first means we will typically end up producing specialized code for potential inlinees ahead of the potential inliners. (It’s not perfect, but it’s cheap to compute and works out pretty well. Perhaps more precise would be to form a graph and do a topological sort.)

    Dumping

    The plan is then dumped into the specialization log, if that is enabled. Here are some examples from our shorten program:

    ==========
    
    Observed type specialization of 'infix:<~>' (cuid: 4245, file: SETTING::src/core/Str.pm:2797)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Str (Conc)
    Which received 367 hits (100% of the 367 callsite hits).
    
    The maximum stack depth is 32.
    
    ==========
    
    Observed type specialization of 'substr' (cuid: 4316, file: SETTING::src/core/Str.pm:3061)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231a60 (3 args, 3 pos)
    Positional flags: obj, obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Scalar (Conc) of Int (Conc)
        Type 2: Scalar (Conc) of Int (Conc)
    Which received 274 hits (100% of the 274 callsite hits).
    
    The maximum stack depth is 15.
    
    ==========
    
    Observed type specialization of 'chars' (cuid: 4219, file: SETTING::src/core/Str.pm:2728)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ae0 (1 args, 1 pos)
    Positional flags: obj
    
    It was planned for the type tuple:
        Type 0: Scalar (Conc) of Str (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'substr' (cuid: 2562, file: SETTING::src/core/Cool.pm:74)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231a60 (3 args, 3 pos)
    Positional flags: obj, obj, obj
    
    It was planned for the type tuple:
        Type 0: Scalar (Conc) of Str (Conc)
        Type 1: Int (Conc)
        Type 2: Scalar (Conc) of Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'infix:«>»' (cuid: 3165, file: SETTING::src/core/Int.pm:365)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Int (Conc)
        Type 1: Scalar (Conc) of Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 14.
    
    ==========
    
    Observed type specialization of 'shorten' (cuid: 1, file: xxx.p6:1)
    
    The specialization is for the callsite:
    Callsite 0x7f1cd6231ac0 (2 args, 2 pos)
    Positional flags: obj, obj
    
    It was planned for the type tuple:
        Type 0: Str (Conc)
        Type 1: Int (Conc)
    Which received 273 hits (100% of the 273 callsite hits).
    
    The maximum stack depth is 13.
    
    ==========
    

    Compared to before

    So how does this compare to the situation before my recent work in improving specialization? Previously, there was no planner at all, nor the concept of certain specializations. Once a particular static frame hit a (bytecode size based) threshold, the type tuple of the current call was taken and a specialization produced for it. There was no attempt to discern monomorphic, polymorphic, and megamorphic code. The first four distinct type tuples that were seen got specializations. The rest got nothing: no optimization, and no compilation into machine code.

    The sorting issue didn’t come up, however, because specializations were always produced at the point of a return from a frame. This meant that we naturally would produce them in deepest-first order. Or at least, that’s the theory. In practice, if we had a callee in a conditional that’s true 90% of the time, then under the old scheme there was a 10% chance we’d miss the opportunity. That’s a lot less likely to happen with the new design.

    Another problem the central planning avoids is a bunch of threads, set off executing the same code, all trying to produce and install specializations. Previously this was handled by letting threads race to install a specialization, but that could lead to a lot of throwaway work. Now the planner can produce a specialization once; when it analyzes the logs of other threads, it can see that the specialization was already produced, and not plan anything at all.

    Future plans

    One new opportunity that I would like to exploit in the future is for derived type specializations. Consider assigning into an array or hash. Of course, the types of the assigned values will, in a large application, be hugely variable. Today we’d consider ASSIGN-POS megamorphic and so not try to do any of the type-based optimizations. But if we looked more closely, we’d likely see the type tuples are things like (Array, Int, Str), (Array, Int, Piranha), (Array, Int, Catfish), and so forth. Effectively, it’s monomorphic in its first two arguments, and megamorphic in its third argument. Therefore, a specialization assuming the first two arguments are of type Array and Int, but without specialization of the third argument, would be a win.
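
    To make that concrete, here is a tiny illustrative Perl 6 sketch (reusing the made-up Piranha and Catfish types from the example above) that looks at how variable each argument position is across the observed type tuples:

    my @tuples =
        <Array Int Str>,
        <Array Int Piranha>,
        <Array Int Catfish>;

    for ^3 -> $pos {
        my @seen = @tuples.map({ .[$pos] }).unique;
        say "arg $pos: ", @seen == 1
            ?? "specialize on @seen[0]"
            !! "leave unspecialized ({ +@seen } types seen)";
    }
    # arg 0: specialize on Array
    # arg 1: specialize on Int
    # arg 2: leave unspecialized (3 types seen)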

    Next time…

    We now have a specialization plan. Let’s go forth and optimize stuff!


    Zoffix Znet: The Rakudo Book Project

    Published on 2017-09-17T00:00:00

    A plan to write some Rakudo books

    gfldex: The Siege Can Continue

    Published by gfldex on 2017-09-16T20:14:31

    A wise internet spaceship pirate once wrote: “Whining gets you stuff. That’s why humans are at the top of the food chain.” My whining got me a fix that put an end to segfaults in long-running scripts that react to HTTP requests.

    So I continued to add missing bits to my golfed httpd and found another good use for term:<>.

    constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";

    Without term:<> the compiler would think I wanted to subtract 404 from an HTTP header.
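
    A small usage sketch of those constants (not from the original post) makes the difference visible:

    constant HTTP-HEADER = "HTTP/1.1 200 OK", "Content-Type: text/plain; charset=UTF-8", "";
    constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "";

    my @msg = HTTP-HEADER-404;     # the declared term: the whole 404 header
    say @msg[0];                   # HTTP/1.1 404 Not Found

    # Without the term:<> declaration, HTTP-HEADER-404 would instead parse as
    # the infix expression HTTP-HEADER - 404, i.e. an attempt to subtract 404
    # from the (numified) header list.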

    If you got some time to spare please write a whiny blog post about a crash that is bugging you – works like a charm.


    gfldex: Goto the Last Fifo

    Published by gfldex on 2017-09-13T08:45:21

    I always admired the elegance of the sysfs. You take a text and write it to a file to talk to a function running in kernel space. As soon as you know how to work with files, you can change the systems behaviour and reuse access control mechanism without any special tools. It’s very easy to script and dump (parts) of the systems state.

    Linux and friends come with fifos that serve the same purpose. Create a fifo, set access rights and start reading from that pseudo-file. Very easy to do in Perl 5.

    open(my $fifo, '<', 'radio-fifo-in') or die $!;
    while (<$fifo>) {            # blocks until the fifo has data
        do_things_with($_);
    }

    Rakudo doesn’t really know about fifos yet; as a result it doesn’t block on a read of a fifo that has run out of data. After a bit of fiddling I found a way around that problem.

    use v6.c;

    # `mkfifo radio-fifo-in`
    # `echo "foo" > radio-fifo-in`
    # `echo "foo^D" > radio-fifo-in`

    my $fifo-in = open(„radio-fifo-in“, :r);

    LABEL: loop {
        react {
            whenever supply { .emit for $fifo-in.lines } {
                say .Str;
                last LABEL if /‚^D‘/;
            }
        }
    }
    

    I learned that whenever reacts to last and will teach the docs about it later today. Luckily Perl 6 has labels, so we can tell last where to goto.

    UPDATE: scovit found a short expression that gets very close to the behaviour of Perl 5.

    my $fifo = open("radio-fifo-in", :r);
    while defined $_ = $fifo.get { .say }

    gfldex: We Need to Siege Moar

    Published by gfldex on 2017-09-10T20:32:25

    Alas, my attempt to avoid bugs by moving with little load was not as fruitful as I hoped.

    Sieging my tiny httpd lasted for about an hour and then the gates gave way. With dmesg the culprit was swiftly revealed.

    [4361268.087988] moar[12491]: segfault at 3c8b ip 00007f86b426868b sp 00007f86aaff86b0 error 4 in libmoar.so[7f86b407b000+55d000]

    I had the suspicion that close to 70,000 tests will only get us the failure modes that are caught within minutes. Looks like there is quite some squashing left to do for the MoarVM team.


    gfldex: Golfing httpd

    Published by gfldex on 2017-09-10T19:12:17

    I’m building an internet radio player and wanted both a curses and a web interface to switch stations. Looking at the various modules in the ecosystem provided lots of options that do lots of things. I had a bug hunt in that field a while ago and didn’t like it. The amount of httpd I need is fairly small and I thought to myself: „Somebody should golf that!“ And so I did.

    The objective is to have as little HTTP as I can get away with. I want to display the names of available stations and receive a station-id to switch stations. And maybe a stop, play and record button. That can be done with lines of text. So text/plain it is.

    I can stuff channel-ids and button names into URLs, which means implementing only GET will do. Caching or other fancy stuff won’t be needed, which makes the HTTP header static. The most complex thing to do would be taking the URL apart.

    That’s what I came up with:

    my sub term:<now>() { DateTime.now.Instant but role :: { method Str { self.DateTime.hh-mm-ss } } };
    my &BOLD = $*OUT.t ?? sub (*@s) { "\e[1m{@s.join('')}\e[0m" } !! sub (|c) { c };
    
    constant HTTP-HEADER = "HTTP/1.1 200 OK", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";
    constant term:<HTTP-HEADER-404> = "HTTP/1.1 404 Not Found", "Content-Type: text/plain; charset=UTF-8", "Content-Encoding: UTF-8", "";
    
    react {
        whenever IO::Socket::Async.listen('0.0.0.0', 8080) -> $conn {
            note „{now} incoming connection from {$conn.peer-host}:{$conn.peer-port}“;
            my @msg = HTTP-HEADER;
            whenever $conn.Supply.lines {
                if /^GET  (<[\w„/“]>+) [„HTTP“ \d „/“ \d]? / {
                    note „{now} GET $0“;
                    given $0.Str {
                        @msg.push: „running since {BEGIN now} UTC“ when „/status“;
    
                        @msg.push: „Hello World!“ when „/“;
    
                    done when „/exit“;
                        default {
                            @msg = HTTP-HEADER-404;
                            @msg.push: „Resource {.Str} not found.“;
                        }
                    }
                }
    
                if /^$/ { 
                    for @msg {
                        once note .Str;
                        $conn.print(.Str ~ "\n")
                    }
                    $conn.close;
                }
            }
            CLOSE {
                note „{now} connection closed“;
            }
            CATCH { default { warn BOLD .^name, ': ', .Str; warn BOLD "handled in $?LINE"; } }
        }
    }

    The whole thing waits for a connection, then takes the URL apart and fills an Array with lines of text. That Array is then sent back to the client. Plus a few lines of logging and error handling.

    The less code there is the less ground a bug hunter has to cover. Luckily Perl 6 is very friendly to golfers.

    UPDATE: Added proper handling of the HTTP header terminator, which leads to less error handling. Also added some more log output.


    p6steve: perl6 Atomic Fission

    Published by p6steve on 2017-09-03T18:46:48

    I have been listening to the reaction on the web to the incorporation of an emoji as a unicode symbol in perl6 rakudo. Here’s a flavour…

    (https://p6weekly.wordpress.com/2017/08/21/2017-34-going-atomic/ )

    The rationale for the use of unicode symbols is as follows:

    BTW- ASCII versions are known as Texas versions since they are always bigger

    Certainly this has caused some consternation – ranging from “how can I type ⚛️ on my keyboard?” (hit CTRL-CMD-SPACE if you are on macOS) to “this will never be accepted for the coding standards of my company”.

    On reflection, while it is understandable that programmers have a well established comfort zone of ASCII text and using English for keywords, I think that perl6 is leading the way on an irresistible path. Of the 6.5bn people on the planet, only a small fraction prefer to work in English – or even in Latin alphabets. Now, the pioneering work to embed unicode in a programming language will open the doors to all kinds of invention. What about:

    And this, in combination with perl6 Grammars, opens some interesting conceptual doors.

    ~p6steve


    samcv: Grant Status Update 4

    Published by Samantha McVey on 2017-08-31T07:00:00

    This is my fourth grant update. A productive and noteworthy month. I gave two presentations at YAPC-EU in the Netherlands earlier this month, on High End Unicode and MoarVM Internals (links to my slides). It was my first Perl conference and I had a great time meeting people in the Perl 6 and Perl community!

    Despite the conference, I made some big changes this month: big speedups for indexing operations, as well as an update to the Unicode 10 database. Before I could regenerate the database I had to fix ucd2c.pl, the script that generates the database, to be more deterministic and have unique property values per property (without this regenerating it with the Unicode 10 data would result in partially broken Unicode properties). I also implemented the Knuth-Morris-Pratt string search algorithm for our index function.

    I have added documentation to MoarVM which is an overview of how our strings are represented as well as details about normalization, synthetics and other topics. On Hacker News someone noted that this was not documented anywhere, so I made sure to add documentation for this. If you are interested in some of the internals of MoarVM, I’m hoping the document should be pretty accessible even if you are not a MoarVM developer.


    Index

    Knuth-Morris-Pratt String Search Algorithm

    Previously we did not have any natively implemented efficient string search algorithms; we only had memmem, which was optimized but would only work if both strings were flat and had the same bitwidth per grapheme.

    Now all index operations with a needle of length 4096 or under are optimized and no longer use brute-force searching (this does not include case-insensitive or ignoremark indexing operations).

    When my KMP implementation in MVM_string_index is used:
    • Strings with non-matching types

      • That have more than one codepoint in the needle

      • That don’t have a needle more than 4096 graphemes long

    • Speedup can be small or large, depending on the pattern of the needle

      • Repeating letters in the needle can cause a multiple-times speed increase, depending on the haystack

    We still use memmem when both strings are flat and of the same data type (platform implementations of memmem use Knuth-Morris-Pratt or Boyer-Moore). Most of the strings we will work with — especially once the strings have been modified — are strands.
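
    For readers who have not met the algorithm itself, here is a character-level Knuth-Morris-Pratt sketch in Perl 6. It is purely illustrative: MoarVM's real implementation is C inside MVM_string_index, works on graphemes, and knows about strands and the 4096-grapheme needle limit mentioned above.

    sub kmp-index(@haystack, @needle) {
        return 0 unless @needle;
        # Failure table: length of the longest proper prefix of the needle
        # that is also a suffix of needle[0..$i].
        my @fail = 0 xx @needle.elems;
        my $k = 0;
        for 1 ..^ @needle.elems -> $i {
            $k = @fail[$k - 1] while $k > 0 && @needle[$i] ne @needle[$k];
            $k++ if @needle[$i] eq @needle[$k];
            @fail[$i] = $k;
        }
        # Scan the haystack without ever moving backwards through it.
        my $m = 0;
        for ^@haystack -> $i {
            $m = @fail[$m - 1] while $m > 0 && @haystack[$i] ne @needle[$m];
            $m++ if @haystack[$i] eq @needle[$m];
            return $i - $m + 1 if $m == @needle.elems;
        }
        Nil
    }

    say kmp-index("abcabcabd".comb, "abcabd".comb);    # 3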

    Grapheme Caching

    Since the Knuth-Morris-Pratt algorithm will often request the same grapheme again, but will never request an earlier point in the haystack, I was able to optimize the KMP string search function I added to cache the graphemes, so we can use a grapheme iterator instead of MVM_string_get_grapheme_at_nocheck, which, for strands, has to find its place in the haystack from the beginning each time. What this grapheme caching does is cache the last returned grapheme: if the same grapheme is requested again, it is returned without requesting a new grapheme from the grapheme iterator. If the next grapheme, or some number of graphemes after the current position, is requested, it will either grab the next grapheme or move the grapheme iterator forward (skipping a certain number of graphemes) and then get a grapheme from the iterator.

    Here are some timings I got before and after the grapheme caching, for an English language book text file, searching for English words misspelled by one letter (so it would search the entire file but be representative of searching for actual words).

    Description                              caching     master
    index with needle (strand haystack)      0.832730    2.67385181
    index with word needle (flat haystack)   0.8357464   1.01405131

    As you can see from the table, we actually even got savings when the haystack was flat. This surprised me, since getting a grapheme with a flat haystack points to a function with one switch and then returning the integer of the blob array at the specified position. I am guessing this is likely caused by the caching function generating more efficient machine code, since the savings can’t be only explained by the savings from caching the grapheme — the speed up was seen even when I manipulated the needle so that there were no cache “hits” and it always requested different graphemes.

    ℹ️
    The grapheme caching has not yet been merged, but is ready to go after fixing some merge conflicts

    Inlining MVM_string_get_grapheme_at_nocheck

    After I finished writing the previous section, I was able to discover the reason for the speed difference with flat haystacks. By getting MVM_string_get_grapheme_at_nocheck to inline I was able to speed up index operations for a flat haystack by 2x. This is on top of the speedups we got from the KMP algorithm! This should affect any code which uses this function, making the function 2x as fast for flat strings, and likely a slight speedup for strands as well. This has huge implications as this function is used extensively throughout our string ops. This may change what I do with the grapheme caching code. It is likely I will change it so that it uses the grapheme caching for strands, and uses MVM_string_get_grapheme_at_nocheck for flat haystacks.

    Single Grapheme Needle Index

    I sped up string index by 9x when the haystack is a strand and the needle is 1 grapheme long. For this we use a grapheme iterator when looking for a single grapheme inside of a strand, and use a simpler faster loop since we are only looking for a single grapheme.

    Unicode

    Property Values

    Property values are now unique for each property in the Unicode database. Property value names are not unique across properties, so we must store them uniquely per property. Previously this caused only the property whose value was last in the C data array to work. Now that property values are unique for each property code, that should no longer cause breakage.

    Unicode Database Updated to Unicode 10

    The Unicode database has been updated for Unicode 10. Now that the previous breaking point — the database not always generating properly each time — was solved, I was able to make changes to the Unicode database, including updating it to version 10. The main significant changes in this release were the addition of new scripts, symbols and Emoji. You can see the full list of changes here.

    Unicode Names

    Hangul (Korean) characters now have names in our name database. These are generated algorithmically on database creation by decomposing the characters and concatenating the Jamo names of the 2 or 3 resulting codepoints. This is needed since the Unicode data file does not include the names for these codepoints and leaves it to the implementer to create them. A roast test has been added to check for support of these Korean characters.
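
    As a quick illustration of what this enables (hypothetical REPL output, assuming a Rakudo built against the updated name database):

    say "한".uniname;                  # HANGUL SYLLABLE HAN
    say "\c[HANGUL SYLLABLE HAN]";     # 한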

    Fixed a bug in ignoremark/ordbaseat

    Ignoremark relies on the MoarVM ordbaseat op under the hood. For graphemes made up of a single character, we decompose the character and get the resulting base character. For synthetics, the op assumed that once we had gotten the synthetic’s base character, we already had the final base character. This wasn’t true, for example, with "\c[LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW]": the "j with caron" would not be decomposed, because it was the base character of a synthetic.

    This also fixes a bug in the indexim and eqatim ops which caused them to fail when encountering a synthetic.

    We now have ord_getbasechar handle both synthetics and non-synthetics, and have MVM_string_ord_basechar_at just handle the needed string-based checks, grabbing the grapheme from the string and passing its value on to ord_getbasechar.

    Synthetic Grapheme Rework

    I reworked MVMNFGSynthetic, the C struct which represents our synthetic graphemes. This is done to (eventually) get Prepend support working for things like ignoremark regex and the ordbaseat op which gets the base codepoint of a grapheme.

    Unlike all other marks which come after a base character, Prepend characters come before the base character. All of our current code assumed the first codepoint of the synthetic is the base character.

    For now, we also assume the first codepoint of the synthetic is the base character, but we now have a base_index which will be able to hold the index of the base character in the codepoint array.

    While this doesn’t add Prepend support to everything, this is one step toward getting that working, and decoupling base codepoints from being the first codepoint of a synthetic.

    ℹ️
    This is not yet merged, but ready to go

    Collation

    There’s not as much to say about this, since it is almost ready. As I said last month it is fully functional, and I have since done some work cleaning it up. Left to be done is integrating the data generation into the other Unicode database generation script. Although the file and code it uses to generate is different, ideally we will have only one script to run to generate and update all of our files on new Unicode releases.

    Previously I created a script which downloads all of the Unicode files needed in the generation, so an update should hopefully only require a run of the download script to fetch the UCD, the UCA (Unicode Collation Algorithm), and Emoji data.

    One thing I had not talked about previously was some of the ways I have sped up the UCA implementation to be quite fast and efficient, only having to evaluate the collation arrays efficiently even for very large strings. If you are interested, the details are in my slides.

    NQP Documentation

    In addition to the MoarVM string documentation I mentioned in the introduction, I also ensured all of the ops I have added over my project are documented in NQP.

    I ensured all the NQP index and eqat variations (indexic, indexim, indexicim, eqatic, eqaticim) are documented, and added the couple that had not yet been added to the ops list.

    I also added a Perl 6 program which lists the ops that are not mentioned in the oplist, which will hopefully be useful to other developers.

    The Goal is in Sight

    The grant is finally winding down. I have all significant things implemented although not everything has been merged yet. I also have implemented additional things that were not part of the grant (Knuth-Morris-Pratt string search algorithm).

    Left to be Done

    To finally complete my work and fulfill my objectives I will write any additional documentation or tests that need to be made. If other Perl 6 devs or users in the community want to make any requests for Unicode or string related documentation, you can send me an email or send me a message on freenode IRC (nick samcv).

    Other than this, I only need to clean up and merge the collation arrays branch, merge the synthetic grapheme rework, and update the grapheme caching for the KMP branch to use caching for strand haystacks and the now inlined MVM_string_get_grapheme_at_nocheck for flat haystacks.

    Zoffix Znet: On Troll Hugging, Hole Digging, and Improving Open Source Communities

    Published on 2017-08-31T00:00:00

    How to be better

    stmuk: Swiss Perl Workshop 2017

    Published by stmuk on 2017-08-30T17:48:17


    After a perilous drive up a steep, narrow, winding road from Lake Geneva we arrived at an attractive Alpine village (Villars-sur-Ollon) to meet with fellow Perl Mongers in a small restaurant.  There followed much talk and a little clandestine drinking of exotic spirits including Swiss whisky. The following morning walking to the conference venue there was an amazing view of mountain ranges. On arrival I failed to operate the Nespresso machine which I later found was due to it simply being off.  Clearly software engineers should never try to use hardware. At least after an evening of drinking.

    Wendy’s stall was piled high with swag including new Bailador (Perl 6 Dancer-like framework) stickers, a Shadowcat booklet about Perl 6 and the new O’Reilly “Think Perl 6”. Unfortunately she had sold out of Moritz’s book “Perl 6 Fundamentals” (although there was a sample display copy present). Thankfully later that morning I discovered I had a £3 credit on Google Play Books so I bought the ebook on my phone.

    The conference started early with Damian Conway’s Three Little Words.  These were “has”, “class” and “method” from Perl 6 which he liked so much that he had added them to Perl 5 with his “Dios” – “Declarative Inside-Out Syntax” module.  PPI wasn’t fast enough so he had to replace it with a 50,000 character regex PPR. Practical everyday modules mentioned included Regexp::Optimizer and Test::Expr. If the video  doesn’t appear shortly on youtube a version of his talk dating from a few weeks earlier is available at https://www.youtube.com/watch?v=ob6YHpcXmTg

    Jonathan Worthington returned with his Perl 6 talk on “How does deoptimization help us go faster?” giving us insight into why Perl 6 was slow at the Virtual Machine level (specifically MoarVM). Even apparently simple and fast operations like indexing an array were slow due to powerful abstractions, late binding and many levels of Multiple Dispatch. In short the flexibility and power of such an extensible language also led to slowness due to the complexity of code paths. The AST optimizer helped with this at compile time but itself took time and it could be better to do this at a later compile time (like Just In Time).  Even with a simple program reading lines from a file it was very hard to determine statically what types were used (even with type annotations) and whether it was worth optimizing (since the file could be very short).

    The solution to these dynamic problems was also dynamic but to see what was happening needed cheap logging of execution which was passed to another thread.  This logging is made visible by setting the environment variable MVM_SPESH_LOG to a filename. Better tooling for this log would be a good project for someone.

    For execution planning we look for hot (frequently called) code, long blocks of bytecode (slow to run) and consider how many types are used (avoiding “megamorphic” cases with many types which needs many versions of code).  Also analysis of the code flow between different code blocks and SSA.  Mixins made the optimization particularly problematic.

    MoarVM’s Spesh did statistical analysis of the code in order to rewrite it in faster, simpler ways. Guards (cheap check for things like types) were placed to catch cases where it got it wrong and if these were triggered (infrequently) it would deoptimize as well, hence the counterintuitive title since “Deoptimization enables speculation” The slides are at http://jnthn.net/papers/2017-spw-deopt.pdf with the video at https://www.youtube.com/watch?v=3umNn1KnlCY The older and more dull witted of us (including myself) might find the latter part of the video more comprehensible at 0.75 Youtube speed.

    After a superb multi-course lunch (the food was probably the best I’d had at any Perl event) we returned promptly to hear Damian talk of “Everyday Perl 6”. He pointed out that it wasn’t necessary to code golf obfuscated extremes of Perl 6 and that the average Perl 5 programmer would see many things simpler in Perl 6.  Also a rewrite from 5 to 6 might see something like 25% fewer lines of code since 6 was more expressive in syntax (as well as more consistent) although performance problems remained (and solutions in progress as the previous talk had reminded us).

    Next Liz talked of a “gross” (in the numerical sense of 12 x 12 rather than the American teen sense) of Perl 6 Weeklies as she took us down memory lane to 2014 (just about when MoarVM was launched and when unicode support was poor!)  with some selected highlights and memories of Perl 6 developers of the past (and hopefully future again!). Her talk was recorded at https://www.youtube.com/watch?v=418QCTXmvDU


    Cal then spoke of Perl 6 maths which he thought was good with its Rats and FatRats but not quite good enough and his ideas of fixing it.  On the following day he showed us he had started some TDD work on TrimRats. He also told us that Newton’s Method wasn’t very good but generated a pretty fractal. See https://www.youtube.com/watch?v=3na_Cx-anvw

    Lee spoke about how to detect Perl 5 memory leaks with various CPAN modules and his examples are at https://github.com/leejo/Perl_memory_talk

    The day finished with Lightning Talks and a barbecue at givengain — a main sponsor.

    On the second day I noticed the robotic St Bernards dog in a tourist shop window had come to life.


    Damian kicked off the talks with my favourite of his talks,  “Standing on the Shoulders of Giants”, starting with the Countess of Lovelace and her Bernoulli number program. This generated a strange sequence with many zeros. The Perl 6 version since it used rational numbers not floating point got the zeros right whereas the Perl 5 version initially suffered from floating point rounding errors (which are fixable).

    Among other things he showed us how to define a new infix operator in Perl 6. He also showed us a Perl 6 sort program that looked exactly like LISP even down to the Lots of Irritating Superfluous Parentheses. I think this was quicksort (he certainly showed us a picture of Sir Tony Hoare at some point). Also a very functional (Haskell-like) equivalent  with heavy use of P6 Multiple Dispatch.  Also included was demonstration of P6 “before” as a sort of typeless/multi-type comparison infix. Damian then returned to his old favourite of Quantum Computing.

    My mind and notes got a bit jumbled at this point but I particularly liked the slide that explained how factorisation could work by observing the product of possible inputs since this led to a collapse that revealed the factors.  To do this on RSA etc., of course, needs real hardware support which probably only the NSA and friends have (?). Damian’s code examples are at http://www.bit.do/Perl6SOG with  an earlier version of his talk at https://www.youtube.com/watch?v=Nq2HkAYbG5o Around this point there was a road race of classic cars going on outside up the main road into the village and there were car noises in the background that strangely were more relaxing than annoying.


    After Quantum Chaos Paul Johnson brought us all back down to ground with an excellent practical talk on modernising legacy Perl 5 applications based on his war stories. Hell, of course, is “Other People’s Code”, often dating from Perl’s early days and lacking documentation and sound engineering.

    Often the original developers had long since departed or, in the worst cases, were still there. Adding tests and logging (with stack traces) was particularly useful, as was moving to git (although its steep learning curve meant mentoring was needed) and handling CPAN module versioning with pinto. Many talks had spoken of the Perl 6 future whereas this spoke of the Perl 5 past and present and the work many of us suffer to pay the bills. It’s at https://www.youtube.com/watch?v=4G5EaUNOhR0


    Jonathan then spoke of reactive distributed software. A distributed system is an async one where “Is it working?” means “some of it is working but we don’t know which bits”. Good OO design is “tell, don’t ask” — you tell a remote service to do something for you rather than parsing the response and doing it yourself, which would break encapsulation. This is particularly important in building well-designed distributed systems, since otherwise the systems are less responsive and reliable. Reactive (async) works better for distributed software than interactive (blocking or sync).

    We saw a table that used a Perl 6 promise for one value and a supply for many values for reactive (async) code and the equivalent (one value) and a Perl 6 Seq for interactive code. A Supply could be used for pub/sub and the Observer Pattern. A Supply could either be live (like broadcast TV) or, for most Perl 6 supplies, on-demand (like Netflix). Then samples of networking (socket) based code were discussed including a web client, web server and SSH::LibSSH (async client bindings often very useful in practical applications like port forwarding)

    https://github.com/jnthn/p6-ssh-libssh

    Much of the socket code had a pattern of “react { whenever {” blocks, with “whenever” as a sort of async loop. He then moved on from sockets to services (using a Supply pipeline) and amazed us by announcing the release of “cro”, a microservices library that even supports HTTP/2 and Websockets, at http://mi.cro.services/. This is installable using Perl 6 by “zef install --/test cro”.

    Slides at http://jnthn.net/papers/2017-spw-sockets-services.pdf and video at https://www.youtube.com/watch?v=6CsBDnTUJ3A

    Next Lee showed Burp Scanner which is payware but probably the best web vulnerabilities scanner. I wondered if anyone had dared to run it on ACT or the hotel’s captive portal.

    Wendy did some cheerleading in her “Changing Image of Perl”.  An earlier version is at https://www.youtube.com/watch?v=Jl6iJIH7HdA

    Sue’s talk was “Spiders, Gophers, Butterflies” although the latter were mostly noticeably absent. She promises me that a successor version of the talk will use them more extensively. Certainly any Perl 6 web spidering code is likely to fit better on one slide than the Go equivalent.

    During the lightning talks Timo showed us a very pretty Perl 6 program using his SDL2::Raw to draw an animated square spiral with hypnotic colour cycling type patterns. Also there was a talk by the author about https://bifax.org/bif/ — a distributed bug tracking system (which worked offline like git).

    Later in the final evening many of us ate and chatted in another restaurant where we witnessed a dog fight being narrowly averted and learnt that Wendy didn’t like Perl 5’s bless for both technical and philosophical reasons.


    p6steve: perl6 Module How To

    Published by p6steve on 2017-08-18T16:29:28

    Some investigation has discovered great resources on how to write and then list a perl6 module….


    p6steve: Physics::Unit in perl6

    Published by p6steve on 2017-08-18T16:23:07

    First and foremost, homage to the original authors of Physics::Unit and related perl5 CPAN modules. I would be honoured to hear from you and to collaborate in any way.

    What’s the big picture? TOP down, I have in mind:

    So, that said, given my poor state of knowledge of most of these things, my thinking is to start building BOTTOM up and see the shape of things that emerge, while learning some perl6 on the way.

    So, first I am going to need some MEASUREMENTS which are VALUES expressed in UNITS with associated ERROR.

    I took a gander at this CPAN module Physics::Udunits2, which is a perl5 interface to udunits2, and felt that the richness of its units and adherence to NIST guidelines were not of sufficient benefit to overcome my sense of incoherence.

    So, to cut a long story short, I decided to take inspiration from Physics::Unit.

    Next, I needed some guidance on How to Build a perl6 module…


    p6steve: perl6, really?

    Published by p6steve on 2017-08-18T15:19:54

    I have been waiting for perl6 for over 15 years since it was first conceived. Recently, I have had an urge to get back to hands-on coding and, having seen the latest Rakudo* release of perl6, felt that it is now sufficiently mature for my nefarious purposes.

    No doubt I am not the only person to have been frustrated by the slow progress of perl6, and certainly many have dropped by the wayside. Perhaps answering the siren call of Python (2 or 3), Ruby, Swift or Go. And now it is finally here, and the community is obviously worried that no one will adopt the fruits of their work.

    Here ‘zoffix’ makes a desperate plea to change the name from ‘perl6’ to ‘rakudo’ to reboot the brand…. https://www.reddit.com/r/perl6/comments/6lstq3/the_hot_new_language_named_rakudo/

    My rebuttal to this concept is reproduced here.

    I’m a slow thinking guy who has had two careers:- perl5 dev and marketing manager. I have been wrestling with Zoffix’ proposed change and it is certainly a well construed argument made with feeling. Here’s my line:

    So, I detect some natural frustration within and without the community. Keep the faith. We have a new audience, they value truth and beauty, and a story of battling the odds. They need this TECHNOLOGY. It is [perl6 | rakudo] – who cares. It may have to start in academia, it may have to start in the p5 stalwarts. It will ignite. Finish the journey. Do not deny your heritage.

    So, yes, I think that perl6 is awesome (whatever it’s called). And I believe that it will be an interesting personal journey to come to grips with the ultimate programming power tool and to deliver something interesting on the way. As a user and sometime low level contributor perhaps.

    ~p6steve


    Perl 6 Maven: Printing to Standard Error in Perl 6

    Published by szabgab

    Perlgeek.de: My Ten Years of Perl 6

    Published by Moritz Lenz on 2017-08-08T22:00:01

    Time for some old man's reminiscence. Or so it feels when I realize that I've spent more than 10 years involved with the Perl 6 community.

    How I Joined the Perl 6 Community

    It was February 2007.

    I was bored. I had lots of free time (crazy to imagine that now...), and I spent some of that answering (Perl 5) questions on perlmonks. There was a category of questions where I routinely had no good answers, and those were related to threads. So I decided to play with threads, and got frustrated pretty quickly.

    And then I remembered that a friend in school had told me (about four years earlier) that there was this Perl 6 project that wanted to do concurrency really well, and even automatically parallelize some stuff. And that was some time ago, so maybe they had gotten somewhere by now?

    So I searched the Internet, and found out about Pugs, a Perl 6 compiler written in Haskell. And I wanted to learn more, but some of the links to the presentations were dead. I joined the #perl6 IRC channel to report the broken link.

    And within three minutes I got a "thank you" for the report, the broken links were gone, and I had an invitation for a commit bit to the underlying SVN repo.

    I stayed.

    The Early Days

    Those were the wild young days of Perl 6 and Pugs. Audrey Tang was pushing Pugs (and Haskell) very hard, and often implemented a feature within 20 minutes after somebody mentioned it. Things were unstable, broken often, and usually fixed quickly. No idea was too crazy to be considered or even implemented.

    We had bots that evaluated Perl 6 and Haskell code, and gave the result directly on IRC. There were lots of cool (and sometimes somewhat frightening) automations, for example for inviting others to the SVN repo, to the shared hosting system (called feather), for searching SVN logs and so on. Since git was still obscure and very unusable, people tried to use SVK, an attempt to implement a decentralized version control system on top of the SVN protocol.

    Despite some half-hearted attempts, I didn't really make inroads into compiler developments. Having worked with neither Haskell nor compilers before proved to be a pretty steep step. Instead I focused on some early modules, documentation, tests, and asking and answering questions. When the IRC logger went offline for a while, I wrote my own, which is still in use today.

    I felt at home in that IRC channel and the community. When the community asked for mentors for the Google Summer of Code project, I stepped up. The project was a revamp of the Perl 6 test suite, and to prepare for the mentoring task, I decided to dive deeper. That made me the maintainer of the test suite.

    Pet Projects

    I can't recount a full history of Perl 6 projects during that time range, but I want to reflect on some projects that I considered my pet projects, at least for some time.

    It is not quite clear from this (very selected) timeline, but my Perl 6 related activity dropped around 2009 or 2010. This is when I started to work full time, moved in with my girlfriend (now wife), and started to plan a family.

    Relationships

    The technologies and ideas in Perl 6 are fascinating, but that's not what kept me. I came for the technology, but stayed for the community.

    There were and are many great people in the Perl 6 community, some of whom I am happy to call my friends. Whenever I get the chance to attend a Perl conference, workshop or hackathon, I find a group of Perl 6 hackers to hang out and discuss with, and generally have a good time.

    Four events stand out in my memory. In 2010 I was invited to the Open Source Days in Copenhagen. I missed most of the conference, but spent a day or two with (if memory serves right) Carl Mäsak, Patrick Michaud, Jonathan Worthington and Arne Skjærholt. We spent some fun time trying to wrap our minds around macros, the intricacies of human and computer language, and Japanese food. (Ok, the last one was easy). Later the same year, I attended my first YAPC::EU in Pisa, and met most of the same crowd again -- this time joined by Larry Wall, and over three or four days. I still fondly remember the Perl 6 hallway track from that conference. And in 2012 I flew to Oslo for a Perl 6 hackathon, with a close-knit, fabulous group of Perl 6 hackers. Finally, the Perl Reunification Summit in the beautiful town of Perl in Germany, which brought together Perl 5 and Perl 6 hackers in a very relaxed atmosphere.

    For three of these four events, different private sponsors from the Perl and Perl 6 community covered travel and/or hotel costs, with their only motivation being meeting folks they liked, and seeing the community and technology flourish.

    The Now

    The Perl 6 community has evolved a lot over the last ten years, but it is still a very friendly and welcoming place. There are lots of "new" folks (where "new" is everybody who joined after me, of course :D), and a surprising number of the old guard still hang around, some more involved, some less, all of them still very friendly and supportive.

    The Future

    I anticipate that my family and other projects will continue to occupy much of my time, and it is unlikely that I'll be writing another Perl 6 book (after the one about regexes) any time soon. But the Perl 6 community has become a second home for me, and I don't want to miss it.

    In the future, I see myself supporting the Perl 6 community through infrastructure (community servers, IRC logs, running IRC bots etc.), answering questions, writing a blog article here and there, but mostly empowering the "new" guard to do whatever they deem best.

    samcv: Grant Status Update 3

    Published on 2017-08-06T07:00:00

    This month I accomplished a lot. We now support full Unicode 9.0/Emoji 4.0 text segmentation (putting codepoints into graphemes) and have some really awesome concatenation improvements (these are both in master), as well as a fully functional Unicode Collation Algorithm (this is not yet merged into master).


    Unicode Collation Algorithm

    In my collation-arrays branch I now have implemented a fully working Unicode Collation Algorithm. Only 82/190,377 of the Unicode collation tests are failing (99.956% passing).

    What I did:

    • Collation keys that need to be generated on the fly are working

    • Characters that need to be decomposed for collation (mainly Hangul characters) are working

    • The script that generates the linked list (used for codepoints with more than one collation array in them or for sequences of more than one codepoint with collation keys) was rewritten and I have not discovered any issues with the data

    I was also able to properly keep the ability to adjust each of the collation levels (reversing or disabling primary, secondary or tertiary levels).

    Since primary > secondary > tertiary for all collation elements we are able to have a default array:

    {-1, 0, 1} /* aka Less, Same, More */

    This is the default array that is used in case we are comparing between different levels (primary vs secondary for instance).

    If we are comparing between the same level, we use a 3x3 array which holds the return values based on whether the collation values are either Less, More or Same. This is the array that is customized to allow you to reverse or disable different collation levels.

     {-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}
    /*  primary,  secondary,   tertiary */

    Below, pos_a and pos_b track moving between keys on the collation stack, while level_a and level_b track comparing the levels between each other. For more information about how this works, see my previous grant status update (/perl6/grant-status-update-2).

    if (level_a == level_b)
        effective_level_eval = level_eval_settings.a[level_a];
    else
        effective_level_eval = level_eval_default
    
    rtrn =
    stack_a.keys[pos_a].a[level_a] < stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.Less :
    stack_a.keys[pos_a].a[level_a] > stack_b.keys[pos_b].a[level_b] ?  effective_level_eval.s2.More :
                                                                       effective_level_eval.s2.Same ;

    Need to do:

    • Either use MVM_COLLATION_QC property to determine whether to look up a character in the linked list or instead store the index of the value of the main node in the Unicode database instead

      • Currently it checks all of the main nodes before then reverting to the values stored in the Unicode Database or based on generated values

    • Investigate country/language selective sorting

    Other than needing to improve the speed of finding the main nodes, the implementation is feature complete.

    The currently failing tests mostly relate to codepoints whose collation values are different in NFD form compared to NFC form. This happens if there are combining accent/extending characters after the codepoint, which would be reordered in NFD form. As the number of tests failing is quite low and there is only a very slight variation in sorting order because of this, I find it acceptable in its current form. I may still find a workaround to fix it, but since 99.956% of the tests pass and the incorrectness is only a slight variation in sorting order, it is not a pressing concern.

    Text Segmentation/Normalization

    Emoji Fixes

    Improvements to segmentation of Emoji w/ GCB=Other

    Not all Emoji Modifiers have Grapheme_Cluster_Break = E_Base or E_Base_GAZ. In these cases we need to check the Emoji_Modifier_Base property. This fixed 25 more emoji than before.

    Commit: 4ff2f1f9

    Don’t break after ZWJ for Emoji=True + GCB=Other

    With this change we are now counting 100% of the Emoji v4.0 emoji as a single grapheme. Since ASCII numbers and the pound symbol are also counted as Emoji, we disregard any codepoints in the ASCII range. I updated the MoarVM Readme to show we have full Unicode 9.0/Emoji 4.0 text segmentation support!
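
    A quick way to see the effect from Perl 6 (a hypothetical check; the names are standard Unicode character names):

    my $family = "\c[MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL, ZERO WIDTH JOINER, BOY]";
    say $family.codes;    # 7 (four emoji codepoints plus three joiners)
    say $family.chars;    # 1 (a single grapheme with the ZWJ fix in place)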

    Other Improvements

    Avoid property lookups for Hangul and just use the Grapheme_Cluster_Break property.

    Lookup Canonical_Combining_Class as the raw integer property value instead of having to get the string and then converting to an integer. Even though the numbers we get back are not identical to the actual CCC, when doing the Unicode algorithms we only care about relative CCC. This results in lower CPU usage during normalization.

    Fix cmp/leg to support deterministic comparing of synthetics

    Fixes the cmp/leg operator to have deterministic results when the string contains a synthetic. Previously it compared based on the value the synthetic was stored as, which is based on which synthetic was allocated first. This makes it so cmp/leg always have the same result for the same string, no matter the order the synthetics are allocated. It also makes it so the codepoints the synthetic is made up of are compared instead of the value of the synthetic (whose value has no relation to the codepoints in the synthetic).

    Previously we compared naïvely by grapheme, and ended up comparing synthetic codepoints with non-synthetics. This would cause synthetics to be sorted incorrectly, in addition to it making comparing things non-deterministic; if the synthetics were added in a different order, you would get a different result with MVM_string_compare.

    Now, we compare down the string the same way we did before, but if the first non-matching grapheme is a synthetic, we iterate over the codepoints in the synthetic, so it returns the result based on codepoints. If the two non-matching graphemes contain the same initial codepoints but differ in length, the grapheme with fewer codepoints is considered less than the other.

    Concatenation Improvements

    When we concatenate strings normally, we create a new string which contains references to the string or strands of string a and the string or strands of string b.

    Previously if there were any of the following conditions for the last grapheme of string a (last_a) or the first grapheme of string b (first_b), we would renormalize string a and string b in their entirety:

    • last_a or first_b were synthetics (contained multiple codepoints in one grapheme) other than the CRLF synthetic

    • Didn’t pass NFG Quickcheck

    • Had a non-zero Canonical_Combining_Class

    The first improvement I made was to use the same code that we use during normalization (should_break) to find out if two adjacent codepoints should be broken up or if they are part of the same grapheme (user visible character unit). This first improvement had a very big effect, since previously we were renormalizing even if renormalization would not have caused any change in the final output. This change reduced the amount of full renormalization we had to do by about 80% in my tests of various different languages' scripts.
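
    A small Perl 6 example shows why only the boundary matters (the renormalization itself happens inside MoarVM; this just demonstrates the user-visible result):

    my $a = "cafe";
    my $b = "\c[COMBINING ACUTE ACCENT]s";
    say $a.chars, " and ", $b.chars;    # 4 and 2
    say ($a ~ $b).chars;                # 5: the "e" and the accent merge into "é"
    say $a ~ $b;                        # cafés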

    The second change I made was more technically challenging. While the first change reduces how often we would do full renormalization, the second change’s goal was to only renormalize a small section of string a and string b instead of string a and b in their entirety.

    With the new code, we either copy the strands or set string a as the first strand of the new string (as we did previously). Then the should_break function we use for checking if we break between two codepoints or not is used. If we break between last_a and first_b we continue as I explained at the very start of this section. If should_break returns that we need to combine the two codepoints, then we create a renormalized_section strand for those two graphemes. Since last_a is now inside the renormalized_section, we adjust the reference to string a (similar to how .substr creates a substring) or if string a was made up of multiple strands, we adjust the last strand of string a.

    We then insert the renormalized section. String b or its strands are then set after the renormalized section. The same way we adjusted a, we adjust b or its first strand to be a substring.

    Since it’s possible for the adjusted strings or strands to be fully consumed (i.e. the substring is 0 in length), we "drop" the copied strand or string (for a or b) if the adjusted substring has no more graphemes in it. This currently can only happen if string a or the last strand of string a is a single grapheme or the first strand of string b or string b is a single grapheme.

    Other minor fixes

    Fix overflow in uniprop lookups

    Makes excessively large uniprop queries not overflow into negative numbers, which would cause it to throw. Closes MoarVM issue #566.

    Commit: a3e98692.

    Made string_index 16% faster for incompatible string types

    For incompatible string types use string_equal_at_ignore_case_INTERNAL_loop which results in a 16% speed boost. Only a minor change to the INTERNAL_loop function and it works for non-ignorecase/ignoremark as well as with ignorecase/ignoremark functionality.

    Commit: 161ec639.

    Add nqp::codes op to MoarVM for 3.5x faster doing .codes for short strings

    Added a nqp::codes which gets the number of codepoints in the string. We had nqp::chars to get the number of graphemes already, but for getting the number of codepoints in the string we called .NFC.codes which is 3.5x slower for short strings and 2x slower for longer strings.

    Tests

    Added in tests to make sure that concat is stable when concatenating things which change under normalization. This was done to make sure there were no issues with the concatenation improvement.

    Next Month

    In the next month I need to get the Collation stuff merged and work on fixing some very longstanding bugs in our Unicode Database implementation. I may not be fully changing out our Unicode Database implementation as I had planned, but I will still at least fix the majority of the issues.

    This includes ucd2c.pl not being deterministic and changing every time it is generated. I plan to make it generate the same code on every run. It was originally written assuming properties and property values are unique. I need to make it so that when the script is generated, all of the properties and values are functional. Currently, when the script is regenerated, it is random which property values are functional and which are not, depending on which ends up last in the hash initializer array (since the property values which occur last overwrite the identical names which appear first).

    6guts: MoarVM Specializer Improvements Part 1: Gathering Data

    Published by jnthnwrthngtn on 2017-08-06T00:34:01

    Over the last weeks I’ve had the chance to work full time on Perl 6, and have dedicated this time to improving the MoarVM specializer. Since “specializer” is such a lot to type, but shortening it to “spec” would result in it being confused with specification (or perhaps bacon), we refer to it as “spesh”. But what is it?

    Specialization

    Specialization is the process of taking code with lots of late-binding in it, and producing one or more versions of the code with as much of that stripped out as we can. We take late binding in a very broad sense, really using it to mean any place where we defer a decision until the code is executed (that is, runtime) because we don’t know the context it will be executed in. So, if we write a module with this in it:

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    

    Then this code isn’t committed to the types of $text and $limit. Thus, we must:

    Even if we put Str and Int type constraints on the sub, that still isn’t really enough: these could be subclassed (for example, by mixing in to them) and thus override the methods that we call. (Granted in this particular case that’d be a tad odd, but in the general case, it’s not.)

    Even with nice things like method caches and multi-dispatch caches, which do provide a huge speedup, we still must do the cache lookups, call the methods, and so forth. The job of the specializer is to notice what types we actually call shorten with, and the types methods like chars return, and thus strip out the need to even do cache lookups. It can then go further, for example inlining the body of the small chars method (and maybe substr too) into the caller so that there will be no invocation overhead for them at all!
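
    For example, a caller like this (hypothetical driver code) always passes a Str and an Int, which is exactly the kind of stable callsite that spesh can strip the late binding out of:

    for lines() -> $line {
        say shorten($line, 72);    # every call is (Str, Int)
    }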

    Before we go any further, a quick thank you

    This work is funded by my current TPF grant, which was approved some weeks back (so I’ve already been working away at it), though only recently publicly announced due to an internal TPF procedure that had to be completed first. So, thanks go to TPF for administering the funding, and to those who have donated to TPF and the Perl 6 Core Development Fund for providing the funding.

    A little series

    I could just write up a list of the improvements I’ve done, but I figure that many readers here don’t know a great deal about spesh. So instead, I will write a series of posts working through the way spesh works today, after my changes, together with some notes on how it used to work and why that changed. Hopefully that will be a more interesting and useful read.

    What to optimize?

    When I’ve talked people through how spesh works in the past, I’ve tended to start by discussing how the bytecode is turned into a control flow graph in single static assignment form, and then we go on and do the various bits of analysis and transformation. We’ll get to that, but in fact there has always been a step before it.

    Constructing a CFG and putting it in SSA form takes time; transforming it by doing a load of optimizations takes more. How much depends on the size and structure of the code. So we need to decide what code to spend time on. This was done in two ways.

    The first was just incrementing a counter every time the sub, block, method, or regex was called. If the count passed a threshold (which was chosen based upon the size of the bytecode) then an optimized version of it would be produced. Easy enough.

    Trouble is that this doesn't catch an important case: that where we have a program that spends most of its lifetime in a loop. In that case, we only enter the block holding the loop once, which is never going to trigger optimization. Thus we also count the number of iterations of loops. If that counter hits a high enough value, then we produce the optimized code and replace the running code with it, potentially moving from the interpreter into machine code. This replacement of code with an optimized version is known as “on stack replacement” (OSR), because it's replacing code already running on the call stack.
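
    As a rough illustration, a program like the following spends nearly all of its time in a loop in the mainline, and the mainline is only entered once - so it is the loop iteration counter (and OSR) that gives the optimizer its chance:

    # the block holding this loop is entered exactly once,
    # but the loop body runs ten million times
    my $sum = 0;
    $sum += $_ for ^10_000_000;
    say $sum;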

    Gathering data

    Just throwing bytecode representing a highly dynamic program into an optimizer won’t achieve much, however. The cost is in the late binding, and to remove that we need an idea of what kinds of things are showing up when the code is run. This data came from two sources.

    The first source was the incoming parameter types. When code was determined to be hot, then the shape of the callsite (how many arguments, which named arguments) and the types of the arguments were taken and turned into a "key" for the specialization. These could then be assumed inside of the optimizer.

    This was not enough on its own to produce good optimizations, however, because a lot of data comes from attribute accesses, lexical accesses, and the return values of things the code calls. Therefore, before doing any significant work, spesh would go through the code and insert logging instructions. These would record the values that were observed. After 8 runs, the specialization process would continue, using this data.

    Bad decisions and missing information

    This worked relatively well, but had notable shortcomings. The main one concerns code that is highly polymorphic in nature – that is, it is called with many different types. In this case, whichever type it was called with on, say, its 100th call, would have a specialization produced for it. If the next call used a different type, another specialization would be produced for that. And so forth, up until we had four specializations, which was the number picked as the limit.

    This meant that things like defined or sink, which are called very often but on a lot of different types, would often go unoptimized and un-JITted. Worse, the way inlining works is to inline any existing specialization. Inlining all the defined and sink calls would be ideal. But in reality, for any non-trivial program, that would not happen in most cases. Clearly, if only the specializer were able to take a step back, it could see this situation and do something smarter. But the data just wasn’t there to do it.
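
    A hedged illustration of the problem: a tiny sub like the one below gets hit with many unrelated types, so under the old scheme the first four types seen would claim the specialization slots and every other type would stay on the slow path.

    sub describe($x) {
        $x.defined ?? "defined {$x.^name}" !! "undefined {$x.^name}"
    }
    say describe($_) for 1, "foo", 2.5, Any, [1, 2];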

    The 8 samples taken when logging could also get lucky or unlucky about what values they saw. If a value returned from a method is a certain type over 99% of the time, and less than 1% of the time it is something else, then it makes sense to stick a guard in, optimize for the 99% case, and let the guard fail and trigger deoptimization for the less than 1% of the cases. (A deoptimization is like the opposite of OSR: we swap the optimized version of the code for the slower one that can handle all of the cases, and let it take care of the rare cases).

    With only 8 values, however, 1 of them being different suggests we might have to deoptimize over 10% of the time. Deoptimization takes a bit of time to do, but leaves us interpreting the code with a bunch of late binding. The stakes are too high if all the data can tell us is that there’s a 1 in 8 chance.

    Another problem in the data was a lack of knowing about the stability of the types on the callee side of a callsite. We relied on being able to infer them based on things we did know, or had inserted checks to guard against. However, what if we were just one more guard clause away from being able to inline something? Previously we could never determine that, so had to miss out on the opportunity. With better data, it would be possible to do better.

    Interruptions and stampedes

    Spesh needed to interrupt the execution of the program twice for each frame it wished to optimize: once to parse the bytecode into the SSA CFG, insert the logging and produce a logged version of the code, and again after the logging to produce an optimized version of the code. This introduced pauses into the execution. So, every time spesh makes a program run faster, it has to do so sufficiently to overcome the time it stole to do its optimization work. And if it speeds something up that doesn’t get run much in the future, it can make the program slower overall.

    This is pretty poor use of the parallel hardware available in pretty much all computers nowadays. Better would be to spend time doing the optimization work on a thread separate from the execution of the user’s program. This not only gets rid of the pauses; it also means that if the work is completed by the interpreter before the optimized version is ready then we didn’t slow the program down by stopping it to do optimization work that was never going to pay off anyway.

    Having a thread taking care of specialization work can also resolve another problem. Imagine that a bunch of threads are set off executing the same code on a range of different data items (so, data parallelism). In this case, more than one thread could see that the code is hot, notice there’s not yet any specialization, and get to work on producing it. There was detection of any duplicate ones at installation time, but that still meant there was wasted work producing specializations. If 3 different threads wasted their time this way, then the amount of time before the specializer had paid for itself was also increased.

    A new approach to data collection

    Clearly, there was plenty of room for improvement. It was fairly clear that having a better data model representing the execution of the program, which the specializer would use to make smarter decisions, was a key part of this. At the same time, it was important that threads executing user code didn't spend much of their time building this. Ideally, they would just throw data into a sequential buffer, and toss it over to another thread – a specializer thread – when it was full.

    So, that's the direction I headed in. Each thread would log interesting events into a thread-local buffer. The append-only and thread-local (for writes) nature of it should make it fairly cache friendly. It was desirable that the entries were fairly small; the main place this had an impact was parameter type logging, where we wish to log both the container type and the type held inside of the container. This was taken care of by just writing two entries, rather than padding out all the rest (fixed size entries weren't all that important, I guess, but it did make things easier).

    I converged on this set of events in the log. It’s 24 bytes per entry, which isn’t too bad. One early mistake I made was logging invokes. I was keen that we start being able to inline calls to closures in the future, when it’s always the same code we call but with a different environment. So I logged the invoked code object, figuring the specializer thread could then pick the interesting parts out. This turned out to quite notably extend the lifetime of the closed over data, however. As the things we cared about were what code was invoked and whether the calling frame was also the outer frame, it was quite cheap to just extract those at the point of writing the log anyway.

    The end result only logs references to static frames, type objects, and code objects where we're told the result of the lookup can be cached because it will always be the same. It never logs values. This fixes an issue in the previous logging mechanism: during its 8 runs it would keep values alive for longer, and – much worse – if the 8 runs were never completed, it could keep them alive indefinitely. It was a bounded leak, but certainly one I'm glad to no longer have.

    A hugely important part of the log is the correlation ID. This is a per-thread incrementing counter. It is bumped each time a frame is entered with logging turned on. It is then included in all events occurring within the execution of that frame. This allows the specializer thread to work out what events go with what code.

    The spesh worker thread

    The spesh worker thread sits in a loop, reading a log from a blocking queue (meaning it waits in an efficient way, and – once the program has been fully specialized – just never gets woken up again). Once a thread running code fills up a log buffer, it sticks it into the blocking queue, the spesh worker receives it, and it gets to work. I’ll go through the various steps it takes later in this series; for today I’ll focus only on the first one (updating the statistics model, below) and what it does before going to wait for another data buffer.

    To prevent runaway memory use if the threads running code are producing logs way faster than the specializer can make use of them, threads are given a quota of log buffers they are allowed to send. On sending, they decrement the quota. If it's still greater than zero, then they allocate another log buffer and continue logging. Otherwise, logging is disabled for that thread. Once the specialization worker thread is done processing a buffer and acting upon its content, it increments the sending thread's quota. If it incremented it from zero, then it also installs a spesh log buffer for the thread, so it will continue logging.

    Nailing the main loop

    This scheme almost worked out fine, but there was an awkward problem. If we weren’t logging at the point that the outermost frame of the program started running (perhaps because we were specializing bits of the compiler still), then the outermost body of the program wouldn’t get a correlation ID, and so wouldn’t log any events. This would be especially unfortunate, since the first thing people measuring performance tend to do is write a hot loop in the mainline of a program! (More usefully, any perl6 -ne ... invocation has the same kind of program structure.)

    Thankfully, there was an easy way to deal with this: when we enter a new compilation unit for the first time, if there is no spesh log, then grant a temporary quota boost. Alternatively, if the log is almost full, then we send it off and then make a new one (with a quota boost if needed). This boost is temporary.

    Every heuristic written with good intentions can go rogue, and this one is no exception: imagine a program doing a load of EVALs. So, there is a fixed limit on how many times this quota boosting can happen, to ensure it handles the cases it is aimed at but doesn’t do harm.

    Building a model

    Over on the specialization thread, the events are fed into a simulation that recreates something much like the call stack of the logged program, so it can understand the relationships between callers and callees. This produces a set of statistics, hung off each invoked piece of code (known as a "static frame" in MoarVM); doing it this way has the advantage that the statistics will be garbage collected should the code also be garbage collected (this can matter in EVAL-heavy programs).

    One of the things I like most about this new approach is that, with the MVM_SPESH_LOG environment variable set to the name of a file to log to, the assembled statistical data will be dumped. This makes it possible to see the data being used to make decisions. Enough with me describing stuff, though: let’s see the stats! Here’s an example program:

    sub shorten($text, $limit) is export {
        $text.chars > $limit
            ?? $text.substr(0, $limit) ~ '...'
            !! $text
    }
    for ^10000 {
        shorten 'foo' x 100, (20..500).pick;
    }
    

    Here’s the statistics output after a while for the shorten sub:

    Latest statistics for 'shorten' (cuid: 1, file: xxx.p6:1)
    
    Total hits: 156
    
    Callsite 0x7fdb8a1249e0 (2 args, 2 pos)
    Positional flags: obj, obj
    
        Callsite hits: 156
    
        Maximum stack depth: 13
    
        Type tuple 0
            Type 0: Str (Conc)
            Type 1: Int (Conc)
            Hits: 156
            Maximum stack depth: 13
            Logged at offset:
                226:
                    156 x type Int (Conc)
                    156 x static frame 'chars' (4197) (caller is outer: 0)
                    156 x type tuple:
                        Type 0: Scalar (Conc) of Str (Conc)
                260:
                    156 x type Bool (Conc)
                    1 x static frame 'infix:«>»' (2924) (caller is outer: 0)
                    155 x static frame 'infix:«>»' (3143) (caller is outer: 0)
                    156 x type tuple:
                        Type 0: Int (Conc)
                        Type 1: Scalar (Conc) of Int (Conc)
                340:
                    91 x type Str (Conc)
                    91 x static frame 'substr' (2541) (caller is outer: 0)
                    91 x type tuple:
                        Type 0: Scalar (Conc) of Str (Conc)
                        Type 1: Int (Conc)
                        Type 2: Scalar (Conc) of Int (Conc)
                382:
                    91 x type Str (Conc)
                    91 x static frame 'infix:<~>' (4223) (caller is outer: 0)
                    91 x type tuple:
                        Type 0: Str (Conc)
                        Type 1: Str (Conc)
    
    Static values:
        - Sub+{<anon|77645888>} (0x2a70d48) @ 194
        - Sub+{<anon|77645888>}+{Precedence} (0x23a90c0) @ 288
    

    From this we can see that shorten is always called with a Str and an Int, that chars always gives back an Int, that the > comparison always produces a Bool, and that substr and ~ are only reached on the 91 calls where the text actually needed shortening, returning Str each time.

    At compile time it was also worked out that some lexical lookups will always resolve to the very same object; the results of those have been logged also (the static values section), so the optimization process can elide the lookups and do further optimizations by knowing exactly what will be called. (The lookups in question are for the > and ~ operations, which are lexical, but in this case not overridden in a nested scope and so will come from the CORE setting.)

    Here’s another example, this time for chars:

    Latest statistics for 'chars' (cuid: 4197, file: SETTING::src/core/Str.pm:2728)
    
    Total hits: 157
    
    Callsite 0x7fdb8a124a00 (1 args, 1 pos)
    Positional flags: obj
    
        Callsite hits: 157
    
        Maximum stack depth: 33
    
        Type tuple 0
            Type 0: Str (Conc)
            Hits: 1
            Maximum stack depth: 33
    
        Type tuple 1
            Type 0: Scalar (Conc) of Str (Conc)
            Hits: 156
            Maximum stack depth: 14
    

    Here we can see that it was called once with a Str value, but a bunch of times with a Scalar holding a Str. Therefore, it’s worth producing optimized code for the latter case, but not worth bothering about the former.

    Clearing up

    But what about all of the statistics we record on one-off cold paths, such as at startup? We’ll never do optimizations based on them, and so they’d just sit around using memory. Or what about after we’ve optimized a frame in all the needed ways, and it’s no longer logging new data? We don’t really need the statistics any more.

    To alleviate this, each time a log buffer is received a statistics version number is incremented. The statistics are marked with the current version when updated. Any statistics that are not updated in a while will be presumed to be no longer interesting, and so thrown out.

    Debuggability

    When I first mentioned moving specialization work off to its own thread on #moarvm, brrt (who leads development of the JIT) was immediately concerned about the unpredictability this would introduce. Threads are scheduled a bit differently between runs, which would make it a nightmare to reproduce and hunt down specializer and JIT compiler bugs, including using the bisection approach to work out exactly which specialization made things go wrong. It was an excellent point.

    Therefore, I introduced the MVM_SPESH_BLOCKING environment variable. When set, a thread executing code will send off its log, and then block until the spesh worker thread has finished processing it and installing specializations. This means that, for a single threaded program, the behavior will again be fully deterministic.
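
    In practice (a hedged example; the script name and log file name are placeholders), a deterministic debugging run looks something like:

    MVM_SPESH_BLOCKING=1 MVM_SPESH_LOG=spesh.log perl6 your-script.p6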

    You won’t believe what happens next!

    Alas, that’s all I’m covering this time. Next time, I’ll talk about how the statistics are used.


    Perl 6 Maven: Continuous Integration for Perl 6 modules

    Published by szabgab

    6guts: Shrinking MoarVM call frames

    Published by jnthnwrthngtn on 2017-07-30T19:16:35

    Last week, I did some work to greatly decrease the size of call frames, also known as invocation records, in MoarVM. In theory, a call frame is created whenever a sub, method, regex, or block is entered. In reality, scopes may be flattened away at compile time, which decreases the number of call frames needed. Further to that, dynamic optimization at runtime leads to inlining, which has the same result except it can do it with late-bound calls, even with callees in different compilation units. Even with these optimizations (and in part because of their limitations), most programs will still need to create and destroy many call frames over their lifetime, making their setup and tear-down a hot path, and their size a factor in program memory performance.

    The work I’ll describe in this post has been funded by OETIKER+PARTNER AG, who responded to my recent funding call. In fact, they’re funding around 10 hours of work per month over the space of a year, so this will be just the first of a number of posts describing work that they are making possible. So, thanks!

    Background

    The MVMFrame data structure has been there since the very earliest days of MoarVM. My memory is hazy, but I suspect it was among the first dozen data structures that I sketched out when starting to design and implement the VM. Initially, call frames were reference counted and allocated out of a special pool. Now they live either in a per-thread call stack region allocated by incrementing a pointer and deallocated by decrementing the pointer, much like a traditional call stack, or on the heap if they “escape” (as a result of a closure, exception throw, or continuation). At present, that heap promotion is always lazy (so call frames are always born on the call stack).
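
    As a small illustrative sketch, returning a closure is the classic way a frame “escapes”: the closure keeps the enclosing frame's lexicals alive, so that frame gets promoted from the call stack to the heap.

    sub make-counter() {
        my $n = 0;
        sub { ++$n }     # closes over $n, so make-counter's frame must outlive the call
    }
    my &counter = make-counter();
    say counter();   # 1
    say counter();   # 2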

    Therefore, the size of an MVMFrame impacts both how much memory call frames consume and how cache-friendly it is to set them up and tear them down.

    Over the years, MVMFrame has grown. Various things were added in support of additional features. However, not all of those things are used by all frames. Additionally, some of them, while used widely, were both quite rarely read and very cheap to compute when they were needed, meaning it was better to just re-calculate them on demand.

    Effective handlers and bytecode

    Two pointers were taken up for storing the effective handlers and bytecode. The bytecode holds the instructions to execute, and the handlers are the regions of that bytecode covered by exception handlers. At first, these fields did not exist in MVMFrame. The bytecode and handlers weren’t properties of a given call, but of the code being called (known as the “static frame” in MoarVM). So why these fields?

    They were introduced with the dynamic optimizer. This produces one or more versions of the frame's bytecode specialized by callsite and type. The specialized version of the code is selected based upon what the caller passes. Since the specialized bytecode contains different (typically far fewer) instructions, the code offsets covered by exception handlers also move, so we need to use an updated table of those when locating an exception handler.

    It was certainly convenient to hang pointers to these two off the frame. But we could always locate them just by following the spesh_cand pointer in the frame, if it was set, or the static_info pointer if not. And this isn't something that we needed to do so often that the extra dereference was going to add up. The only common path we needed to do it on was on return. But we'd also be losing the instruction that set it during invocation, so it roughly balances out anyway – and that's before considering the memory savings.

    With those gone, MVMFrame shrank by 2 pointers, or 16 bytes in a 64-bit environment.

    Throw address

    When an exception was thrown, MoarVM stored the address in the program that it was thrown at into…a pointer in the currently executing frame. It then referenced the frame from the exception object. And that was pretty much all the throw address field was used for. This was a very easy 8 bytes (64-bit pointer) to win: just store the address in the exception object, where it belongs anyway. D’oh.

    Rarely used things

    Some things are used by just a small handful of frames.

    These added up to 8 pointers and one 16-bit integer. By moving them into an “extras” data structure hung off the frame, and allocated on demand, space equivalent to 7 pointers could be saved off the frame. With 64-bit pointers, that’s 56 bytes of savings for most frames. The CORE.setting compilation ends up with only 6% of frames needing this extra storage space.

    Alignment

    With a 16-bit integer moved off into the extras I realized that, with a little care, we could re-order things, force an enum to only use a single byte, and save another 8 bytes off MVMFrame, simply by not needing some empty wasted padding space in the struct (C compilers insert this to make sure memory reads are aligned to correct boundaries).

    Saving a memset

    That’s 88 bytes of savings, which is around a cache line and a half on a typical CPU. It also means that nearly all of the things left in MVMFrame were being initialized on every invocation as part of the callframe setup. Meaning? That the memset of MVMFrame could go away, at the cost of just inserting a couple of instructions to manually zero or NULL things out (some of them on rarely taken paths).

    But wait, there’s more…

    While I was working on this, and looking at profiles, I noticed that a large number of allocations came from creating a buffer to keep track of which named arguments had been used and which had not (for the sake of error reporting and slurpy argument handling). We allocated a byte array, but only really need a bit per named argument. So, I turned the field into a union: a 64-bit bit field for when there are at most 64 named arguments (which is probably just about every real world use case), falling back to the old byte array approach otherwise.
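
    The idea is simple enough to sketch in Perl 6, even though the real thing lives in C inside MoarVM's argument handling – one bit per named argument, packed into a single integer, for callsites with at most 64 names:

    # conceptual sketch only, not MoarVM's actual code
    my $used-nameds = 0;

    sub mark-used($idx) { $used-nameds +|= 1 +< $idx }
    sub was-used($idx)  { so $used-nameds +& (1 +< $idx) }

    mark-used(0);        # pretend the first named argument was consumed
    say was-used(0);     # True
    say was-used(1);     # False - this is the kind of check that feeds error reporting and slurpies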

    All in all…

    These changes provide some memory use reductions, but more importantly are good for CPU cache locality. They also knock 2% off the number of CPU instructions run during CORE.setting compilation; many other programs should see a similar improvement (how much depends on how much of a hot path invocation is for them, and how many closures they take).


    rakudo.org: Announce: Rakudo Star Release 2017.07

    Published by Steve Mynott on 2017-07-24T18:36:58

    A useful and usable production distribution of Perl 6

    On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the July 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the July 2017 release is available from https://rakudo.perl6.org/downloads/star/.

    Binaries for macOS and Windows (64 bit) are also available.

    This is the eighth post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

    IMPORTANT: “panda” is to be removed very shortly since it is deprecated. Please use “zef” instead.
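
    For anyone still using panda, the switch is usually just a matter of changing the command; the module name below is only an example:

    # before: panda install Some::Module
    zef install Some::Module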

    Currently, Star is on a quarterly release cycle and 2017.10 (October) will follow later this year.

    Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

    In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

    This Star release includes release 2017.07 of the Rakudo Perl 6 compiler, version 2017.07 MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

    Note this Star release contains NQP version 2017.07-9-gc0abee7 rather than the release NQP 2017.07 in order to fix the --ll-exception command line flag.

    The Rakudo compiler changes since the last Rakudo Star release of 2017.01 are now listed in “2017.05.md”, “2017.06.md” and “2017.07.md” under the “rakudo/docs/announce” directory of the source distribution.

    Notable changes in modules shipped with Rakudo Star:

    + DBIish: Doc and CI updates
    + doc: Too many to list. p6doc fixed.
    + grammar-debugger: Works again now.
    + p6-io-string: New dep for doc.
    + p6-native-resources: Removed since deprecated and not used by linenoise.
    + panda: Officially deprecate panda in favour of zef.
    + perl6-Test-When: New dep for perl6-pod-to-bigpage.
    + perl6-lwp-simple: Fix breakage due to rakudo encoding refactor.
    + tap-harness6: Replaces deprecated tap-harness6-prove6.
    + zef: Too many to list.

    There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

    + advanced macros
    + non-blocking I/O (in progress)
    + some bits of Synopsis 9 and 11

    There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

    In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

    See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

    The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

    Perlgeek.de: Perl 6 Fundamentals Now Available for Purchase

    Published by Moritz Lenz on 2017-07-21T22:00:01

    After about nine months of work, my book Perl 6 Fundamentals is now available for purchase on apress.com and springer.com.

    The ebook can be purchased right now, and comes in the epub and PDF formats (with watermarks, but DRM free). The print form can be pre-ordered from Amazon, and will become ready for shipping in about a week or two.

    I will make a copy of the ebook available for free for everybody who purchased an earlier version, "Perl 6 by Example", from LeanPub.

    The book is aimed at people familiar with the basics of programming; prior Perl 5 or Perl 6 knowledge is not required. It features a practical example in most chapters (no mammal hierarchies or class Rectangle inheriting from class Shape), ranging from simple input/output and text formatting to plotting with python's matplotlib libraries. Other examples include date and time conversion, a Unicode search tool and a directory size visualization.

    I use these examples to explain a subset of Perl 6, with many pointers to more documentation where relevant. Perl 6 topics include the basic lexicographic structure, testing, input and output, multi dispatch, object orientation, regexes and grammars, usage of modules, functional programming and interaction with python libraries through Inline::Python.

    Let me finish with Larry Wall's description of this book, quoted from his foreword:

    It's not just a reference, since you can always find such materials online. Nor is it just a cookbook. I like to think of it as an extended invitation, from a well-liked and well-informed member of our circle, to people like you who might want to join in on the fun. Because joy is what's fundamental to Perl. The essence of Perl is an invitation to love, and to be loved by, the Perl community. It's an invitation to be a participant of the gift economy, on both the receiving and the giving end.

    Perl 6 Maven: LWP::Simple - a simple web client in Perl 6

    Published by szabgab

    Perlgeek.de: The Loss of Name and Orientation

    Published by Moritz Lenz on 2017-07-10T22:00:01

    The Perl 6 naming debate has started again. And I guess with good reason. Teaching people that Perl 6 is a Perl, but not the Perl requires too much effort. Two years ago, I didn't believe. Now you're reading a tired man's words.

    I'm glad that this time, we're not discussing giving up the "Perl" brand, which still has very positive connotations in my mind, and in many other minds as well.

    And yet, I can't bring myself to like "Rakudo Perl 6" as a name. There are two very shallow reasons for that: One, going from two syllables, "Perl six", to five of them, seems a step in the wrong direction. And two, I remember the days when the name was pretty young, and people would misspell it all the time. That seems to have abated, though I don't know why.

    But there's also a deeper reason, probably a sentimental old man's reason. I remember the days when Pugs was actively developed, and formed the center of a vibrant community. When kp6 and SMOP and all those weird projects were around. And then, just when it looked like there was only a single compiler around, Stefan O'Rear conjured up niecza, almost single-handedly, and out of thin air. Within months, it was a viable Perl 6 compiler, that people on #perl6 readily recommended.

    All of this was born out of the vision that Perl 6 was a language with no single, preferred compiler. Changing the language name to include the compiler name means abandoning this vision. How can we claim to welcome alternative implementations when the commitment to one compiler is right in the language name?

    However I can't weigh this loss of vision against a potential gain in popularity. I can't decide if it's my long-term commitment to the name "Perl 6" that makes me resent the new name, or valid objections. The lack of vision mirrors my own state of mind pretty well.

    I don't know where this leaves us. I guess I must apologize for wasting your time by publishing this incoherent mess.

    Perl 6 Maven: MongoDB with Perl 6 on Linux

    Published by szabgab

    Zoffix Znet: The Hot New Language Named Rakudo

    Published on 2017-07-07T00:00:00

    A rose by any other name...

    Perl 6 Maven: Parsing command line arguments in Perl 6 - ARGS - ARGV - MAIN

    Published by szabgab

    samcv: Grant Status Update 2

    Published on 2017-07-04T07:00:00

    This is my second grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

    I got to working on collation this month. I'm going to explain a bit of how the Unicode Collation Algorithm works.

    Unicode Collation Algorithm

    The collation data for UCA is made up of arrays like so:

    [primary.secondary.tertiary]

    Each one is an integer. Primary is different for different letters, 'a' vs 'z'. Secondary covers differences such as diacritics, 'a' vs 'á'. And tertiary covers case, 'a' vs 'A'. While it works differently for non-Latin characters, that is the gist of what they represent. In most cases you have one codepoint mapped to one collation array, though in many cases this is not true.

    Single codepoints can map to multiple collation array elements. Sequences of codepoints can also map to one or more than one collation array elements.

    Some sequences also can exist inside others.

    So the string xyz may have one set of collation elements but xy has another, where x, y and z are codepoints in a three-codepoint sequence that has its own set of multiple collation keys.

    So, how do these collation elements translate into sorting the codepoints?

    [.0706.0020.0002], [.06D9.0020.0002]

    You take the two primary values, then append a 0 as a separator. Then push the secondary values, append another 0 as a separator, and then push on the tertiary values:

    0706, 06D9, 0, 0020, 0020, 0, 0002, 0002

    Now this would pose a problem since we would need to traverse the entire string before making any decisions. Instead what I have decided to do is to use the arrays with [primary.secondary.tertiary] and push them onto a stack instead of changing them into a linear progression, iterate through the primaries, and then grab more collation elements as they are needed to resolve ties.

    Also, when collation data for the next codepoint is added to the stack, if that codepoint is the starter of a sequence we will also pull the next codepoint, walking through a linked list stored in C arrays as needed. If the next codepoint ends up not being part of a sequence, we just push the codepoint we "peeked" at onto the stack as well, so we don't have to go back over codepoints.

    Now this improved Unicode Collation Algorithm is not complete, but I am continuing to work on integrating the new C data structure I've created into MoarVM, and it currently works partially, but not as well as the current implementation.

    Improvements to the Current UCA Implementation

    In the meantime I have made improvements to the current implementation of the Unicode Collation Algorithm. Previously it was possible to enable or disable the primary, secondary or tertiary levels. This allowed you to do things such as ignore diacritics when sorting or ignore casing. What you are now able to do is to reverse the sorting of different levels. This allows you to for example sort uppercase letters before lowercase (default UCA sorts lowercase before uppercase, since lowercase < uppercase). It can also let you put diacritic mark containing characters before the ordinary letters. Any of the three levels can be either enabled, disabled, or reversed. For anybody already using it, supplying True or False to set $*COLLATION still works the same as before, but you are now able to supply 1, 0, or -1 to enable, disable or reverse the collation for specific levels.
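
    A hedged example of what this looks like from Perl 6 code, using the experimental collation interface as it existed around this time (details may have shifted since):

    use experimental :collation;

    $*COLLATION.set(:tertiary(-1));   # reverse the case level: uppercase now sorts first
    .say for <a B b A>.collate;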

    Fixes

    Grapheme Cluster Break

    As I said last week, I made improvements to the script that tests our breakup of graphemes. Now we have full support for the Prepend property that was added in Unicode 9.0, as well as passing all the tests for regional indicators. The only tests we now don't pass in GraphemeClusterBreakTest.t are a few emoji tests, and I believe we only fail 3 or so of these! The Prepend fixes required us to save more state while parsing, as Prepend is different from all other Unicode grapheme break logic in that it comes before, not after, a base character.

    Ignorecase+Ignoremark Regex

    The longstanding bug I mentioned in my previous status report has now been fixed. The bug was in regex when both ignorecase and ignoremark adverbs were used.

    say "All hell is breaking loose" ~~ m:i:m/"All is fine, I am sure of it"/
    # OUTPUT«「All hell is breaking loose」␤» Output before the fix. This should not have matched.
    

    This bug occurred when the entire length of the haystack was searched and all of the graphemes matched the needle.

    If the needle exceeded the length of the haystack past that point, it would erroneously think there was a match there, as it only checked that it matched the whole length of the haystack.

    This would cause 'fgh' to be found in 'abcdefg'. It only occurred at the very end of the haystack.

    The internal string_equal_at_ignore_case_INTERNAL_loop now returns -1 if there was no match and 0 or more if there was a match at that index.

    This return value provides new information: it is 0 if there was a match, and some positive integer when the haystack was expanded when casefolding it.

    As explained by my previous post, information about when characters expand when foldcased must be retained.

    This information had been planned to be exposed in some way at a future date, since if we are searching for 'st' inside a string whose first grapheme casefolds to 'st' (for example the 'ﬆ' ligature followed by 'abc'), nqp::indexic (index ignorecase) will indicate that it is located at index 0, but in Rakudo Perl 6 the match will cover the ligature plus the following character when it should instead have covered just the ligature.

    For now this additional information is only internal and the return values of the nqp::indexic_s and nqp::equatic_s ops have not changed.
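
    To make the expansion concrete, here is what casefolding does to one such character at the Perl 6 level (whether this exact ligature featured in the original bug is my assumption; it is simply a convenient example of an expanding grapheme):

    say "ﬆ".chars;        # 1 - a single grapheme
    say "ﬆ".fc;           # st - casefolding expands it to two characters
    say "ﬆabc".fc.chars;  # 5, even though the original string has only 4 graphemes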

    NQP Codepath Problems…

    Previously there were way too many different codepaths handling the different variations: no regex adverbs, ignorecase, ignoremark, and ignorecase+ignoremark. Problematically, each combination had its own codepath. To really solve this bug and improve the code quality I decided to clean it up and correct this.

    In my past work I had already added a nqp::indexic op, so it was time to add another! I added a nqp::indexicim op and a nqp::eqaticim op and was able to reuse most of the code and not increase our code burden much on the MoarVM side, and greatly reduce the possibility for bugs to get in on varying combinations of ignorecase/ignoremark ops.

    This was a very longstanding Unicode bug (I don't think both adverbs together ever worked), so it's great that it is now fixed :).

    Coming Up

    I will be continuing to iron out the issues in the new Unicode Collation Algorithm implementation as I described earlier in this post. I also plan on taking stock of all of the current Grapheme Cluster Break issues, which now exist only for certain Emoji (though the vast majority of Emoji work properly).

    I will also be preparing my talks for the Amsterdam Perl conference as well!

    Sidenote

    I released a new module, Font::QueryInfo which allows you to query font information using FreeType. It can even return the codepoints a font supports as a list of ranges!

    6guts: Optimizing reading lines from a file

    Published by jnthnwrthngtn on 2017-07-02T15:57:43

    Reading lines from a file and processing them one at a time is a hugely common scripting task. However, to date our performance at this task has been somewhat underwhelming. Happily, a grateful Perl 6 fan stepped up in response to my recent call for funding, offering 25 hours of funding to work on whatever I felt was most pressing, but with a suggestion that perhaps I could look at some aspect of I/O performance. Having recently been working on refactoring I/O anyway, this was very timely. So, I crafted a benchmark and dug in.

    The benchmark and a baseline

    Perl 5 is considered to have very good I/O performance, so I figured I'd use that as a rough measure of how close Perl 6 was to performing well at this task. A completely equivalent benchmark isn't quite possible, but I tried to pick something representative of what the average programmer would write. The task for the benchmark was to take a file with one million lines, each having 60 characters, loop over them, and add up the number of characters on each line. That number would then be printed out at the end (it's important that benchmarks calculating results return or consume the result in some way, as a sufficiently smart optimizer may otherwise manage to eliminate the work we think we're measuring). A few ground rules applied to both versions, chiefly that each had to read the file line by line and perform UTF-8 decoding.

    The Perl 5 benchmark for this came out as follows:

    perl -e 'open my $fh, "<:encoding(UTF-8)", "longfile";
             my $chars = 0;
             while ($_ = <$fh>) { chomp; $chars = $chars + length($_) };
             close $fh;
             print "$chars\n"'
    

    With the Perl 6 one looking like this:

    perl6 -e 'my $fh = open "longfile";
              my $chars = 0;
              for $fh.lines { $chars = $chars + .chars };
              $fh.close;
              say $chars'
    

    I’ll note right off that in Perl 6 there are likely ways, today, to do a bit better. For example, the $chars variable could be given a native int type, and it’s entirely possible that a while loop might come out faster than the for loop. Neither of those are representative of what a typical programmer looking at the documentation and diving in to implementing stuff would do, however. I suspect that Perl 5 experts could similarly point out some trick I’ve missed, but I’m trying to benchmark typical use.
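
    For the curious, a less typical version along those lines might look something like the following untested sketch, using a native int counter and a plain loop in place of the for; whether it actually comes out faster would need measuring.

    my $fh = open "longfile";
    my int $chars = 0;
    loop {
        my $line = $fh.get // last;   # get returns Nil at end of file
        $chars = $chars + $line.chars;
    }
    $fh.close;
    say $chars;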

    One slight unfairness is that the Perl 6 solution will actually count the number of grapheme clusters, since strings are at grapheme level. This entails some extra processing work, even in the case that there are no multi-codepoint clusters in the input file (as there were not in this case). But again, the average user making comparisons won’t much care for such technicalities.

    All measurements were made on modern hardware with an Intel Xeon 6-core CPU and a fast SSD, and on Linux.

    At the point I started work, the Perl 6 solution clocked in at 2.87s, to just 1.13s for Perl 5. This made Perl 6 a factor of 2.5 times slower.

    First hints from the profile

    The whole I/O stack recently got a good overhaul, and this was the first time I’d looked at a profile since that work was completed. Looking at the output from --profile immediately showed up some rather disappointing numbers. Of all callframes, 57.13% were JIT-compiled. Worse, basically nothing was being inlined.

    At this point, it’s worth recalling that Perl 6 is implemented in Perl 6, and that there’s quite a bit going on between the code in the benchmark and ending up in either things implemented in C or a system call. The call to lines returns an Iterator object. Reading a line means calling the pull-one method on that Iterator. That in turn calls the consume-line-chars method on a $!decoder object, and that method is what actually calls down to the VM-backed decoder to read a line (so there’s a level of indirection here to support user provided decoders). The return value of that method then has defined called on it to check we actually got a line back. If yes, then it can be returned. If not, then read-internal should be called in order to fetch data from the file handle (given buffering, this happens relatively rarely). Running the loop body is a further invocation, passing the read line as a parameter. Getting the chars count is also a method call (which, again, actually calls down to the VM guts to access the string’s grapheme count).

    That’s quite a lot of method calling. While the VM provides I/O, decoding, and finding input up to a separator, the coordination of that whole process is implemented in Perl 6, and involves a bunch of method calls. Seen that way, it’s perhaps not surprising that Perl 6 would come in slower.

    There are, however, things that we can do to make it fast anyway. One of them is JIT compilation, where instead of having to interpret the bytecode that Perl 6 is compiled in to, we further translate it into machine code that runs on the CPU. That cuts out the interpreter overhead. Only doing that for 57% of the methods or blocks we’re in is a missed opportunity.

    The other really important optimization is inlining. This is where small methods or subroutines are taken and copied into their callers by the optimizer. This isn't something we can do by static analysis; the point of method calls is polymorphism. It is something a VM doing dynamic analysis and type specialization can do, however. And the savings can be rather significant, since it cuts out the work of creating and tearing down call frames, as well as opening the door to further optimization.

    The horrors in the logs

    There are a couple of useful logs that can be written by MoarVM in order to get an idea of how it is optimizing, or failing to optimize, code. The JIT log’s main point of interest for the purpose of optimization is that it can indicate why code is not being JIT-compiled – most commonly because it contains something the JIT doesn’t know about. The first thing in this case was the call into the VM-backed decoder to extract a line, which was happily easily handled. Oddly, however, we still didn’t seem to be running the JIT-compiled version of the code. Further investigation uncovered an unfortunate mistake. When a specialized version of a method calls a specialized version of another method, we don’t need to repeat the type checks guarding the second method. This was done correctly. However, the code path that was taken in this case failed to check if there was a JIT-compiled version of the target rather than just a specialized bytecode version, and always ran the latter. I fixed that, and went from 57.13% of frames JIT-compiled to 99.86%. Far better.

    My next point of investigation is why the tiny method to grab a line from the decoder was not being inlined. When I took a look at the post-optimization code for it, it turned out to be no surprise at all: while the logic of the method was very few instructions, it was bulked out by type checking of the incoming arguments and return values. The consume-line-chars method looks like this:

    method consume-line-chars(Bool:D :$chomp = False, Bool:D :$eof = False --> Str) {
        my str $line = nqp::decodertakeline(self, $chomp, $eof);
        nqp::isnull_s($line) ?? Str !! $line
    }
    

    Specializations are always tied to a callsite object, from which we can know whether we're being passed a parameter or not. Therefore, we should be able to optimize out those checks and, in the case the parameter is being passed, throw out the code that sets the default value. Further, the *%_ that all methods get automatically should have been optimized out, but was not being.

    The latter problem was fixed largely by moving code, although tests showed a regression that needed a little more care to handle – namely, that a sufficiently complex default value might do something that causes a deoptimization, and we need to make sure we can fall back into the interpreter and have things work correctly in that case.

    While these changes weren’t enough to get consume-line-chars inlined, they did allow an inlining elsewhere, taking the inline ratio up to 28.49% of call frames.

    This initial round of changes took the Perl 6 benchmark from 2.87s to 2.77s, so about 3.5% off. Not much, but something.

    Continuing to improve code quality

    The code we were producing even pre-optimization was disappointing in a few ways. Firstly, even though a simple method like consume-line-chars, or chars, would never possibly do a return, we were still spitting out a return exception handler. A little investigation revealed that we were only doing analysis and elimination of this for subs but not methods. Adding that analysis for methods too took the time down to 2.58s. Knocking 7% off with such a small change was nice.

    Another code generation problem lay in consume-line-chars. Access to a native lexical can be compiled in two ways: either just by reading the value (fine if it's only used as an r-value) or by taking a reference to it (which is correct if it will be used as an l-value). Taking a reference is decidedly costly compared to just reading the value. However, it's always going to have the correct behavior, so it's the default. We optimize it away whenever we can (in fact, all the most common l-value usages of it never need a reference either).

    Looking at consume-line-chars again:

    method consume-line-chars(Bool:D :$chomp = False, Bool:D :$eof = False --> Str) {
        my str $line = nqp::decodertakeline(self, $chomp, $eof);
        nqp::isnull_s($line) ?? Str !! $line
    }
    

    We can see the read of $line here is, since consume-line-chars is not marked is rw, an r-value. Unfortunately, it was compiled as an l-value because the conditional compilation lost that context information. So, I addressed that and taught Rakudo to pass along the return value's r-value context.

    A native reference means an allocation, and this change cut the number of GC runs enormously, from 182 of them down to 41. That sounds like it should make a sensational difference. In fact, it got things down to 2.45s, a drop of just 5%. Takeaway lesson: allocating less stuff is good, but MoarVM's GC is also pretty good at throwing away short-lived things.

    Meanwhile, back in the specializer…

    With the worst issues of the code being fed into MoarVM addressed, it was back to seeing why the specializer wasn’t doing a better job of stripping out type checks. First of all, it turned out that optional named arguments were not properly marking the default code dead when the argument was actually passed.

    Unfortunately, that wasn’t enough to get the type tests stripped out for the named parameters to consume-line-chars. In fact, this turned out to be an issue for all optional parameters. When doing type analysis, and there are two branches, the type information has to be merged at join points in the control flow graph. So it might see something like this in the case that the argument was not passed:

        Bool (default path) \   / Unknown (from passed path)
                             \ /
                       Result: Unknown
    

    Or maybe this in the case that it was passed:

        Bool (default path) \   / Scalar holding Bool (from passed path)
                             \ /
                       Result: Unknown
    

    In both cases, the types disagree, so they merge to unknown. This is silly, as we’ve already thrown out one of the two branches, so in fact there’s really no merge to do at all! To fix this up, I marked variables (in single static assignment form) that died as a result of a basic block being removed. To make the dead basic blocks from argument analysis actually be removed, we needed to do the dead code removal earlier as well as doing it at the very end of the optimization process. With that marking in place, it was then possible to ignore now-dead code’s contribution to a merge, which meant a whole load of type checks could now be eliminated. Well, in fact, only in the case where the optional was passed; a further patch to mark the writers of individual instructions dead for the purpose of merges was needed to handle the case where it was not.

    That left the return type being checked on the way out, which also seemed a bit of a waste as we could clearly see it was a Str. After a tweak to Rakudo to better convey type information in one of its VM extension ops, that check was optimized out too.

    And for all of this effort, the time went from…2.45s to 2.41s, just 2% off. While it’s cheaper to not type check things, it’s only so costly in the first place.

    A further win was that, with the code for consume-line-chars now being so tiny, it should have been an inlining candidate. Alas, it was not, because the optional arguments were still having tracking information recorded just in case we needed to deoptimize. This seemed odd. It turned out that my earlier fix for this was too simplistic: it would leave them in if the method would ever deoptimize, not just if it would do it while handling arguments. I tightened that up and the time dropped to 2.37s, another 2% win. Again, very much worth it, but it shows that invocation – while not super cheap – is also only so costly.

    With consume-line-chars inlining now conquered, another area of the code we were producing caught my eye: boolification was, in some cases, managing to box an int into an Int only to then immediately unbox it and turn it into a Bool. Clearly this was quite a waste! It turned out that an earlier optimization to avoid taking native references had unexpected consequences. But even nicer was that my earlier work to pass down r-value context meant I could delete some analysis and just use that mechanism instead. That was worth 4%, bringing us to 2.28s.

    Taking stock

    None of these optimizations so far were specific to I/O or touched the I/O code itself. Instead, they are general optimization and code quality improvements that will benefit most Perl 6 programs. Together, they had taken the lines benchmark from 2.87s to 2.28s. Each may have been just some percent, but together they had knocked 20% off.

    By this point, the code quality – especially after optimization – was far more pleasing. It was time to look for some other sources of improvement.

    Beware associativity

    Perhaps one of the easiest wins came from spotting the pull-one method of the lines iterator seemed to be doing two calls to the defined method. See if you can spot them:

    method pull-one() {
        $!decoder.consume-line-chars(:$!chomp) // $!handle.get // IterationEnd
    }
    

    The // operator calls .defined to test for definedness. But why two calls in the common case? Because of associativity! Two added parentheses:

    method pull-one() {
        $!decoder.consume-line-chars(:$!chomp) // ($!handle.get // IterationEnd)
    }
    

    Were worth a whopping 8%. At 2.09s, the 2 second mark was in sight.

    Good idea, but…

    My next idea for an improvement was a long-planned change to the way that simple for loops are compiled. With for being defined in terms of map, this is also how it had been implemented. However, for simple cases, we can just compile:

    for some-iterable { blah }
    

    Not into:

    some-iterable.map({ blah }).sink-all;
    

    But instead in to something more like:

    my \i = some-iterable.iterator;
    while (my \v = i.pull-one) !== IterationEnd {
        blah
    }
    

    Why is this an advantage? Because – at least in theory – now the pull-one and loop body should become possible to inline. This is not the case if we call map, since that is used with dozens of different closures and iterator types. Unfortunately, however, due to limitations in MoarVM's specializer, it was not actually possible to achieve this inlining even after the change. In short, because we don't handle inlining of closure-y things, and the way the on-stack replacement works means the optimizer is devoid of the type information it needs to have a chance of doing better with pull-one. Both of these are now being investigated, but were too big to take on as part of this work.

    Even without those larger wins being possible (when they are, we’ll achieve a tasty near-100% inlining rate in this benchmark), it brought the time down to the 2.00s mark. Here’s the patch.

    Optimizing line separation and decoding

    Profiling at the C level (using callgrind) showed up some notable hot spots in the string handling code inside of MoarVM, which seemed to offer the chance to get further wins. At this point, I also started taking measurements of CPU instructions using callgrind too, which makes it easier to see the effects of changes that may come out as noise on a simple time measurement (even with taking a number of them and averaging).

    Finding the separator happens in a couple of steps. First, individual encodings are set up to decode to the point that they see the final character of any of the line separators (noting these are configurable, and multi-char separators are allowed). Then, a second check is done to check if the multi-char separator was found. This is complicated by needing to handle the case where a separator was not found, and another read needs to be done from a file handle.

    It turns out that this second pass was re-scanning the entire buffer of chars, rather than just looking close to the end of it. After checking there should not be a case where just jumping to look at the end would ever be a problem, I did the optimization and got a reduction from 18,245,144,315 instructions to 16,226,602,756, or 11%.

    A further minor hot-spot was re-resolving the CRLF grapheme each time it was needed. It turned out caching that value saved around 200 million instructions. Caching the maximum separator length saved another 78 million instructions. The wallclock time now stood at 1.79s.

    The identification of separators when decoding chars seemed the next place to find some savings. CPUs don’t much like having to do loops and dereferences on hot paths. To do better, I made a compact array of the final separator graphemes that could be quickly scanned through, and also introduced a maximum separator codepoint filter, which, given that the common case is control characters, works out really quite well. These were worth 420 million and 845 million instructions respectively.

    Next, I turned to the UTF-8 decoding and NFG process. A modest 56 million instruction win came from tweaking this logic, given that we can never both be looking for a separator and have a target number of characters to decode. But a vast win came from adding a normalization fast path for the common case where we don’t have any normalization work to do. When we do encounter such work, we simply fall into the slow path. One nice property of the way I implemented this is that, when reading line by line, one line may cause a drop into the slow path, but the next line will start back in the fast path. This change was worth a whopping 3,200 million instruction decrease. Wallclock time now stood at 1.37s.

    Better memory re-use

    Another look at the profile now showed malloc/free as significant costs. Could anything be done to reduce the number of those we did?

    Yes, it turned out. Firstly, keeping around a decoding result data structure, instead of freeing and allocating it every single line, saved a handy 450 million instructions. It turned out that we were also copying the decoded chars into a new buffer when taking a line, but in the common case that buffer would contain precisely the chars that make up the line. Therefore, this buffer could simply be stolen to use as the memory for the string. Another 400 million instructions dropped away thanks to one less malloc/free call per line.

    Micro-optimizations

    A few further micro-optimizations in the new UTF-8 decoding fast path were possible. By lifting some tests out of the main loop, reading a value into a local because the compiler couldn’t figure out it was invariant, and moving some position updates so they only happen on loop exit, a further 470 million instructions were removed. If you’re thinking that sounds like a lot, remember this is a loop that runs for every single codepoint we decode. A million line file with 60 chars per line plus a separator is 61 million iterations. These changes between them only save 7 cycles per codepoint; that just turns out to be a lot when multiplied by the number of codepoints!
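
    As a quick back-of-the-envelope check of that claim, using only the numbers quoted above:

    say 1_000_000 * (60 + 1);         # 61000000 codepoints in the benchmark file
    say 470_000_000 div 61_000_000;   # 7, roughly the saving per codepoint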

    The final result

    With these improvements, the Perl 6 version of the benchmark now ran in 1.25s, which is just 44% of the time it used to run in. The Perl 5 version still wins, but by a factor of 1.1 times, not 2.5 times. While some of the changes made during this work were specific to the benchmark in question, many were much more general. For example, the separator-finding improvements will help this benchmark in all encodings, and the code generation and specializer improvements will have far more cross-cutting effects.

    Actually, not so final…

    There’s still a decent amount of room for improvement yet. Once MoarVM’s specializer can perform the two inlinings it is not currently able to, we can expect a further improvement. That work is coming up soon. And beyond that, there will be more ways to shave off some instructions here and there. A less pleasing result is that when Perl 5 is not asked to do UTF-8 decoding, it makes a huge saving; ask Perl 6 for ASCII or Latin-1, however, and it's just a small saving. This would be a good target for some future optimization work. In the meantime, these are a nice bunch of speedups to have.


    Perlgeek.de: Living on the (b)leading edge

    Published by Moritz Lenz on 2017-06-24T22:00:01

    Perl 6 is innovative in many ways, and sometimes we don't fully appreciate all the implications, for good or for bad.

    There's one I stumbled upon recently: The use of fancy Unicode symbols for built-in stuff. In this case: the `.gist` output of Match objects. For example

    my token word { \w+ }
    say 'abc=def' ~~ /<word> '=' <word>/;
    
    produces this output:
    「abc=def」
     word => 「abc」
     word => 「def」
    

    And that's where the problems start. In my current quest to write a book on Perl 6 regexes, I noticed that the PDF that LeanPub generates from my Markdown sources doesn't correctly display those pesky 「」 characters, which are

    $ uni -c 「」
    「 - U+0FF62 - HALFWIDTH LEFT CORNER BRACKET
    」 - U+0FF63 - HALFWIDTH RIGHT CORNER BRACKET
    

    When I copied the text from the PDF and pasted into my editor, they showed up correctly, which indicates that the characters are likely missing from the monospace font.
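
    Incidentally, the same lookup can be done from Perl 6 itself; a small sketch (the output in the comments is what I would expect, shown for illustration):

    say '「」'.uninames;        # (HALFWIDTH LEFT CORNER BRACKET HALFWIDTH RIGHT CORNER BRACKET)
    say '「」'.ords».base(16);  # (FF62 FF63)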

    The toolchain allows control over the font used for displaying code, so I tried all the monospace fonts that were available. I tried them in alphabetical order. Among the earlier fonts I tried was Deja Vu Sans Mono, which I use in my terminal, and which hasn't let me down yet. No dice. I arrived at Noto, a font designed to cover all Unicode codepoints. And it didn't work either. So it turns out these two characters are part of some Noto Sans variants, but not of the monospace font.

    My terminal, and even some font viewers, use some kind of fallback where they use glyphs from other fonts to render missing characters. The book generation toolchain does not.

    The Google Group for Leanpub was somewhat helpful: if I could recommend an Open Source monospace font that fit my needs, they'd likely include it in their toolchain.

    So I searched and searched, learning more about fonts than I wanted to know. My circle of geek friends came up with several suggestions, one of them being Iosevka, which actually contains those characters. So now I wait for others to step up: either for LeanPub to include that font, or for the Noto maintainers to create a monospace variant of those characters (and then for LeanPub to update their version of the font).

    And all of that because Perl 6 was being innovative, and used two otherwise little-used characters as delimiters, in an attempt to avoid collisions between delimiters and content.

    (In the meantime I've replaced the two offending characters with ones that look similar. It means the example output is technically incorrect, but at least it's readable.)

    Perlgeek.de: Perl 6 Books Landscape in June 2017

    Published by Moritz Lenz on 2017-06-07T22:00:01

    There are lots of news around Perl 6 books to share these days. If you follow the community very closely, you might be aware of most of it. If not, read on :-).

    Think Perl 6 is now available for purchase, and also for download as a free ebook. Heck, it's even Open Source, with the LaTeX sources on GitHub!

    Perl 6 at a Glance, previously only available in print form, is now available as an ebook. Save paper and shipping costs!

    My own book, Perl 6 Fundamentals, is now in the "production" phase: copyediting, indexing, layout. And just before the manuscript submission deadline, Larry Wall has contributed a foreword. How awesome is that?

    I've revamped perl6book.com to provide a short overview of the current and future Perl 6 books. As a small gimmick, it contains a flow chart explaining which book to choose. And I even got input from two other Perl 6 book authors (Laurent Rosenfeld of "Think Perl 6", and Andrew Shitov of "Perl 6 at a Glance" and "Migrating to Perl 6").

    From a pull request to perl6book.com, it looks like Andrew Shitov is working on two more Perl 6 books. Keep 'em coming!

    Last but not least, Gabor Szabo has started a crowd funding campaign for a Perl 6 book on web app development. There are still a few days left, so you can help it succeed!

    And as always, if you want to keep informed about Perl 6 books, you can sign up at perl6book.com for my Perl 6 books mailing list (low volume, typically less than one email per month).

    samcv: Grant Status Update 1

    Published on 2017-06-02T07:00:00

    This is my first grant progress report for my Perl Foundation grant entitled "Improving the Robustness of Unicode Support in Rakudo on MoarVM".

    I was not able to work quite as many hours as I would have liked this month, but I still made quite a lot of progress.

    Improvement for Tests

    Merged In

    In Roast there is a new version of GraphemeBreakTest.t.

    The script tests the contents of each grapheme individually, using the GraphemeBreakTest.txt file from the Unicode 9.0 test suite.

    Previously we only checked the total number of .chars for the string as a whole. Obviously we want something more precise than that, since the test specifies the location of each of the breaks between codepoints. The new code checks that the codepoints are put into the correct graphemes, in the proper order. In addition, we check the string length as well.

    This new test uses a grammar to parse the file and generally is much more robust than the previous script.

    Running the parse class generates an array of arrays. The index of the outer array indicates the grapheme, while the inner arrays indicate which codepoints should be in that grapheme.

    [[10084, 776], [9757]]

    The array above would indicate that the 1st grapheme is made up of codepoints 10084 and 776, while the 2nd grapheme is made up of codepoint 9757. This allows us to easily test the contents of each grapheme.

    The array shown above corresponds to the following line from the Unicode data file:

    ÷ 2764 × 0308 ÷ 261D ÷ where ÷ means break and × means no-break
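
    That example is easy to sanity-check directly from Perl 6; a quick sketch using the codepoints from that data line:

    my $s = chr(0x2764) ~ chr(0x0308) ~ chr(0x261D);
    say $s.chars;   # 2, the combining mark joins the heart into a single grapheme
    say $s.ords;    # (10084 776 9757)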

    Work in Progress

    I have some tests which are currently unmerged and will need to wait before being merged, although sections of them are complete and are being incorporated into the larger Unicode Database Retrofit, which reuses this code.

    I have written grammars and modules to process and provide data on the PropertyValueAliases and PropertyAliases. They will be used for testing that all of the canonical property names and all the property values themselves properly resolve to separate property codes, as well as that they are usable in regex.

    Work on the Unicode Database Retrofit

    As part of my grant work I am working on making Unicode property values distinct per property, and also on allowing all canonical Unicode property values to work. For a background on this see my previous post about Unicode Property Names. The WIP generated code can be seen in this gist and was generated from UCD-gen.p6. The code resolves property name and property value command line arguments and matches them with property codes and property value codes. It is also case insensitive and ignores underscores, as the Unicode spec says is permissible. In addition, it is deduplicated, meaning we only store one hash per unique set of property values.

    For example: Script and Script_Extensions both have the same values, so we don't store these more than once; likewise for the Boolean property values. The C program resolves the property string to a unique property code, and from there is able to look up the property value code. Note: aside from the property values which specify the lack of a property, these codes are internal and have no relation to the Unicode spec, for example Grapheme_Cluster_Break=Other is designated as property value 0.
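
    For a flavour of what this feeds into at the language level, here are a couple of illustrative property matches in regexes (a sketch, not code from the grant work itself):

    say so 'A' ~~ /<:Script<Latin>>/;    # True
    say so "\x[0663]" ~~ /<:Nd>/;        # True, ARABIC-INDIC DIGIT THREE is a decimal number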

    Docs

    I've also started adding some documentation to my Unicode-Grant wiki with information about what is enclosed in each Unicode data file; there are a few other pages as well. This wiki is planned to be expanded to have many more sections than it does currently. https://github.com/samcv/Unicode-Grant/wiki/All-Unicode-Files

    Future Work

    Next I must integrate the property name/value alias resolving code with UCD-gen.p6. UCD-gen.p6 already has a mostly functional Unicode database with a fair number of properties. When these two are integrated, the next step will be to start integrating it with the MoarVM codebase, making any changes to MoarVM or the database retrofit codebase as needed along the way.

    I will also be exploring ways of compressing the mapping of codepoints to unique combinations of Unicode property data in the bitfield. Due to the vast number of codepoints within Unicode, currently the mapping of codepoints to rows in the bitfield takes up many times more space than the actual property value data itself.

    For compressing the Unicode names, it is planned to use base 40 encoding with some additional tricks to save additional space for repeated words. I plan on making a blog post where I go into the details of the compression scheme.

    I am also considering rolling the ignorecase/ignoremark bug into my grant. Even though it was not originally planned to be part of the grant, I think it is important enough to warrant inclusion. Currently, using both ignorecase and ignoremark together in a regex is completely broken.

    Note

    The work described above has been committed to the two repositories listed below (in addition to the test work described, which was merged into Roast).

    https://github.com/samcv/UCD

    https://github.com/samcv/Unicode-Grant

    brrt to the future: Function call return values

    Published by Bart Wiegmans on 2017-05-25T19:02:00

    Hi there, it's been about a month since last I wrote on the progress of the even-moar-jit branch, so it is probably time for another update.

    Already two months ago I wrote about adding support for function calls in the expression JIT compiler. This was a major milestone, as calling C functions is essential for almost everything that is not pure numerical computing. Now we can also use the return values of function calls (picture that!). The main issue with this was something I've come to call the 'garbage restore' problem, by which I mean that the register allocator would attempt to 'restore' an earlier, possibly undefined, version of a value over a value that would result from a function call.

    This has everything to do with the spill strategy used by the compiler. When a value has to be stored to memory (spilled) in order to avoid being overwritten and lost, there are a number of things that can be done. The default, safest strategy is to store a value to memory after every instruction that computes it and to load it from memory before every instruction that uses it. I'll call this a full spill. It is safe because it effectively makes the memory location the only 'true' storage location, with the registers being merely temporary caches. It can also be somewhat inefficient, especially if the code path that forces the spill is conditional and rarely taken. In MoarVM, this happens (for instance) around memory barriers, which are only necessary when creating cross-generation object references.

    That's why around function calls the JIT uses another strategy, which I will call a point spill. What I mean by that is that the (live) values which could be overwritten by the function call are spilled to memory just before the function call, and loaded back into their original registers directly after. This is mostly safe, since under normal control flow, the code beyond the function call point will be able to continue as if nothing had changed. (A variant which is not at all safe is to store the values to memory at the point, and load them from memory in all subsequent code, because it isn't guaranteed that the original spill-point-code is reached in practice, meaning that you overwrite a good value with garbage. The original register allocator for the new JIT suffered from this problem).

    It is only safe, though, if the value that is to be spilled-and-restored is both valid (defined in a code path that always precedes the spill) and required (the value is actually used in code paths that follow the restore). This is not the case, for instance, when a value is the result of a conditional function call, as in the following piece of code:

    1:  my $x = $y + $z;
    2:  if ($y < 0) {
    3:      $x = compute-it($x, $y, $z);
    4:  }
    5:  say "\$x = $x";

    In this code, the value in $x is defined first by the addition operation and then, optionally, by the function call to compute-it. The last use of $x is in the string interpolation on line 5. Thus, according to the compiler, $x holds a 'live' value at the site of the function call on line 3, and so to avoid it from being overwritten, it must be spilled to memory and restored. But in fact, loading $x from memory after compute-it would directly overwrite the new value with the old one.

    The problem here appears to be that when the JIT decides to 'save' the value of $x around the function call, it does not take into account that - in this code path - the last use of the old value of $x is in fact when it is placed on the parameter list to the compute-it call. From the perspective of the conditional branch, it is only the new value of $x which is used on line 5. Between the use on the parameter list and the assignment from the return value, the value of $x is not 'live' at all. This is called a 'live range hole'. It is then the goal to find these holes and to make sure a value is not treated as live when it is in fact not.

    I used an algorithm from a paper by Wimmer and Franz (2010) to find the holes. However, this algorithm relies on having the control flow structure of the program available, which usually requires a separate analysis step. In my case that was fortunately not necessary, since this control flow structure is in fact generated by an earlier step in the JIT compilation process, and all that was necessary was to record it. The algorithm itself is really simple and relies on a few straightforward ideas.

    I think it goes beyond the scope of this blog post to explain how it works in full, but it is really not very complicated and works very well. At any rate, it was sufficient to prevent the JIT from overwriting good values with bad ones, and allowed me to finally enable functions that return values, which is otherwise really simple.

    When that was done, I obviously tried to use it and immediately ran into some bugs. To fix that, I've improved the jit-bisect.pl script, which wasn't very robust before. The jit-bisect.pl script uses two environment variables, MVM_JIT_EXPR_LAST_FRAME and MVM_JIT_EXPR_LAST_BB, to automatically find the code sequence where the expression compiler fails and compiles wrong code. (These variables tell the JIT compiler to stop running the expression compiler after a certain number of frames and basic blocks. If we know that the program fails with N blocks compiled, we can use binary search between 0 and N to find out which frame is broken). The jit-dump.pl script then provides disassembled bytecode dumps that can be compared and with that, it is usually relatively easy to find out where the JIT compiler bug is.

    With that in hand I've spent my time mostly fixing existing bugs in the JIT compiler. I am now at a stage in which I feel like most of the core functionality is in place, and what is left is about creating extension points and fixing bugs. More on that, however, in my next post. See you then!

    Death by Perl6: Perl Toolchain Summit 2017 - CPAN and Perl6

    Published by Nick Logan on 2017-05-25T05:41:53

    At the 2017 Perl Toolchain Summit (PTS) a lot of stuff got done. This is a brief demonstration style summary of the resulting CPAN-related feature enhancements to zef.

    First I should mention that now Perl6 distributions can be uploaded to CPAN (without needing to add a special Perl6/ folder), and will have their source-url automatically set or replaced with the appropriate CPAN url. Additionally App::Mi6 now has mi6 dist and mi6 upload to make the process even simpler.

    Now let's get started by making sure we are using a version with features developed at PTS:

    $ zef install "zef:ver(v0.1.15+)"
    All candidates are currently installed  
    No reason to proceed. Use --force to continue anyway  
    

    Perl6 distributions uploaded to CPAN are now indexed. Currently the index is generated by https://github.com/ugexe/Perl6-App--ecogen and stored at https://github.com/ugexe/Perl6-ecosystems alongside a mirror of the existing p6c ecosystem. It is also enabled by default now:

    $ zef list --max=10
    ===> Found via Zef::Repository::Ecosystems<cpan>
    Inline:ver('1.2.1')  
    Inline:ver('1.2')  
    Inline:ver('1')  
    IO::Glob:ver('0.1'):auth('github:zostay')  
    Text::CSV:ver('0.007'):auth('github:Tux')  
    Text::CSV:ver('0.008'):auth('github:Tux')  
    Data::Selector:ver('1.01')  
    Data::Selector:ver('1.02')  
    NativeCall:ver('1')  
    CompUnit::Repository::Mask:ver('0.0.1')  
    Inline::Perl5:ver('0.26'):auth('github:niner')
    
    $ zef info CompUnit::Repository::Mask
    - Info for: CompUnit::Repository::Mask
    - Identity: CompUnit::Repository::Mask:ver('0.0.1')
    - Recommended By: Zef::Repository::Ecosystems<cpan>
    Description:     hide installed modules for testing.  
    License:     Artistic-2.0  
    Source-url:     http://www.cpan.org/authors/id/N/NI/NINE/Perl6/CompUnit-Repository-Mask-0.0.1.tar.gz  
    Provides: 1 modules  
    Depends: 0 items
    

    A distribution can exist in multiple "ecosystems":

    $ zef search Inline::Perl5
    ===> Found 3 results
    -----------------------------------------------------------------------------------------------------------------------
    ID|From                             |Package                                       |Description  
    -----------------------------------------------------------------------------------------------------------------------
    1 |Zef::Repository::LocalCache      |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    2 |Zef::Repository::Ecosystems<cpan>|Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    3 |Zef::Repository::Ecosystems<p6c> |Inline::Perl5:ver('0.26'):auth('github:niner')|Use Perl 5 code in a Perl 6 program  
    -----------------------------------------------------------------------------------------------------------------------
    
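    As an aside, the ver and auth adverbs shown in these identities can also be used to pin exactly which distribution gets loaded in code; an illustrative sketch (it assumes that particular version is installed):

    use Inline::Perl5:ver<0.26>:auth<github:niner>;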

    Dependencies can be resolved by any and all ecosystems available, so distributions can be put on CPAN and still have dependencies that aren't on CPAN get resolved:

    $ zef -v install Inline::Perl5
    ===> Searching for: Inline::Perl5
    ===> Found: Inline::Perl5:ver('0.26'):auth('github:niner') [via Zef::Repository::Ecosystems<cpan>]
    ===> Searching for missing dependencies: LibraryMake, File::Temp
    ===> Found dependencies: File::Temp [via Zef::Repository::Ecosystems<p6c>]
    ===> Found dependencies: LibraryMake:ver('1.0.0'):auth('github:retupmoca') [via Zef::Repository::LocalCache]
    ===> Searching for missing dependencies: Shell::Command, File::Directory::Tree
    ===> Found dependencies: Shell::Command, File::Directory::Tree:auth('labster') [via Zef::Repository::Ecosystems<p6c>]
    ===> Searching for missing dependencies: File::Which, File::Find
    ===> Found dependencies: File::Find:ver('0.1'), File::Which [via Zef::Repository::Ecosystems<p6c>]
    
    ...<more output>...
    

    In addition to CPAN we have access to CPAN testers. Garu worked with me to create a perl6 cpan testers report module: Zef::CPANReporter

    $ zef install Zef::CPANReporter
    ===> Searching for: Zef::CPANReporter
    ===> Searching for missing dependencies: Net::HTTP
    ===> Testing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Testing [OK] for Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Testing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    ===> Testing [OK] for Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    ===> Installing: Net::HTTP:ver('0.0.1'):auth('github:ugexe')
    ===> Installing: Zef::CPANReporter:ver('0.0.1'):auth('github:garu')
    
    # ...and in use:
    
    $ zef -v install Grammar::Debugger
    ===> Searching for: Grammar::Debugger
    ===> Found: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') [via Zef::Repository::Ecosystems<p6c>]
    ===> Searching for missing dependencies: Terminal::ANSIColor
    ===> Found dependencies: Terminal::ANSIColor:ver('0.3') [via Zef::Repository::Ecosystems<p6c>]
    ===> Fetching [OK]: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') to /Users/ugexe/.zef/tmp/grammar-debugger.git
    ===> Fetching [OK]: Terminal::ANSIColor:ver('0.3') to /Users/ugexe/.zef/tmp/Terminal-ANSIColor.git
    ===> Testing: Terminal::ANSIColor:ver('0.3')
    t/00-load.t .. ok  
    All tests successful.  
    Files=1, Tests=1,  0 wallclock secs  
    Result: PASS  
    ===> Testing [OK] for Terminal::ANSIColor:ver('0.3')
    Report for Terminal::ANSIColor:ver('0.3') will be available at http://www.cpantesters.org/cpan/report/a9ed11ac-4108-11e7-9b92-c8514edb94d5  
    ===> Testing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    t/debugger.t .. ok  
    t/ltm.t ....... ok  
    t/tracer.t .... ok  
    All tests successful.  
    Files=3, Tests=3,  1 wallclock secs  
    Result: PASS  
    ===> Testing [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    Report for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn') will be available at http://www.cpantesters.org/cpan/report/ab9c911c-4108-11e7-a777-ca182bdd3934  
    ===> Installing: Terminal::ANSIColor:ver('0.3')
    ===> Install [OK] for Terminal::ANSIColor:ver('0.3')
    ===> Installing: Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    ===> Install [OK] for Grammar::Debugger:ver('1.0.1'):auth('github:jnthn')
    

    This work (and my attendance) was made possible by a bunch of great perl companies and people:

    Booking.com, ActiveState, cPanel, FastMail, MaxMind, Perl Careers, MongoDB, SureVoIP, Campus Explorer, Bytemark, CAPSiDE, Charlie Gonzalez, Elastic, OpusVL, Perl Services, Procura, XS4ALL, Oetiker+Partner.

    rakudo.org: Announce: Rakudo Star Release 2017.04

    Published by Steve Mynott on 2017-05-01T15:35:01

    A useful and usable production distribution of Perl 6

    On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the April 2017 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the April 2017 release is available from https://rakudo.perl6.org/downloads/star/.

    Binaries for macOS and Windows (64 bit) are also available.

    This is the seventh post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

    This release includes “zef” as module installer. “panda” is to be shortly replaced by “zef” and will be removed in the near future.

    It’s hoped to produce quarterly Rakudo Star releases during 2017 with 2017.07 (July) and 2017.10 (October) to follow.

    Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

    In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”.

    This Star release includes release 2017.04.3 of the Rakudo Perl 6 compiler, version 2017.04-53-g66c6dda of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

    The Rakudo compiler changes since the last Rakudo Star release of 2017.01 are now listed in “2017.02.md” and “2017.04.md” under the “rakudo/docs/announce” directory of the source distribution.

    In particular this release featured many important improvements to the IO subsystem thanks to Zoffix and the support of the Perl Foundation.

    Please see
    Part 1: http://rakudo.org/2017/04/02/upgrade
    Part 2: http://rakudo.org/2017/04/03/part-2
    Part 3: http://rakudo.org/2017/04/17/final-notes

    Note there were point releases of 2017.04 so also see “2017.04.1.md”, “2017.04.2.md” and “2017.04.3.md”.

    Notable changes in modules shipped with Rakudo Star:

    + DBIish: New version with pg-consume-input
    + doc: Too many to list. Large number of “IO Grant” doc changes.
    + json_fast: Too many to list. Big performance improvements.
    + perl6-lwp-simple: Fix for lexical require and incorrect regex for absolute URL matcher
    + test-mock: Enable concurrent use of mock objects
    + uri: Encoding fixes
    + zef: Too many to list. IO fixage.

    There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

    + advanced macros
    + non-blocking I/O (in progress)
    + some bits of Synopsis 9 and 11

    There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

    In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at rakudobug@perl.org.

    See https://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

    The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the perl6-compiler@perl.org mailing list, or join us on IRC #perl6 on freenode.

    brrt to the future: Letting templates do what you mean

    Published by Bart Wiegmans on 2017-04-30T22:12:00

    Hi everybody, today I'd like to promote a minor, but important improvement in the 'expression template compiler' for the new JIT backend. This is a tool designed to make it easy to develop expression templates, which are themselves a way to make it easy to generate the 'expression tree' intermediate representation used by the new JIT backend. This is important because MoarVM instructions operate on a perl-like level of abstraction - single instructions can perform operations such as 'convert object to string', 'find first matching character in string' or 'access the last element of an array'. Such operations require rather more instructions to represent as machine code.

    This level of abstraction is rather convenient for the rakudo compiler, which doesn't have to consider low-level details when it processes your perl6 code. But it is not very convenient for the JIT compiler which does. The 'expression' intermediate representation is designed to be much closer to what hardware can support directly. Basic operations include loading from and storing to memory, memory address computation, integer arithmetic, (conditional) branching, and function calls. At some point in the future, floating point operations will also be added. But because of this difference in abstraction level, a single MoarVM instruction will often map to many expression tree nodes. So what is needed is an efficient way to convert between the two representations, and that is what expression templates are supposed to do.

    Expression templates are very much like the expression tree structure itself, in that both are represented as arrays of integers. Some of the elements represent instructions, some are constants, and some are references (indexes into the same array), forming a directed acyclic graph (not a tree). The only difference is that the template is associated with a set of instructions that indicate how it should be linked into the tree. (Instruction operands, i.e. the data that each instruction operates on, are prepared and linked by the template application process as well).

    Surprisingly, arrays of integers aren't a very user-friendly way to write instruction templates, and so the template compiler was born. It takes as input a text file with expression templates defined as symbolic expressions, best known from the LISP world, and outputs a header file that contains the templates, ready for use by the JIT compiler. Note that the word 'template' has become a bit overloaded, referring to the textual input of the template compiler as well as to the binary input to the JIT compiler. That's okay, I guess, since they're really two representations of the same thing. The following table shows how template text, binary, and expression tree relate to each other:

    Text:

    (template: unless_i
        (when
            (zr $0)
            (branch (label $1))
        ))

    'Binary':

    template: {
        MVM_JIT_ZR,
        0,
        MVM_JIT_LABEL,
        1,
        MVM_JIT_BRANCH,
        2,
          MVM_JIT_WHEN,
          0,
          4,
    },
    info: ".f.f.l.ll",
    len: 9,
    root: 6

    Tree: (shown as a diagram in the original post)

    I hope it isn't too hard to see how one maps to the other. The unless_i instruction executes a branch if its integer argument is zero, with the branch destination specified by a constant second argument. All symbols (like when, label and zr) have been replaced by uppercase prefixed constants (MVM_JIT_WHEN), and all nesting has been replaced by references (indexes) into the template array. The 'info' string specifies how the template is to be linked into the tree. Instruction operands are indicated by an 'f', and internal links by an 'l'. In the tree representation the operands have been linked into the tree by the JIT; they form the LOAD and CONST nodes and everything below them.

    Anyway, my improvement concerns a more complex form of template, such as the following example, an instruction to load an object value from the instance field of an object:

    (template: sp_p6oget_o
    (let: (($val (load (add (^p6obody $1) $2) ptr_sz)))
    (if (nz $val) $val (^vmnull))))

    This template contains a let: expression, which declares the $val variable. This value can be used in the subsequent expression by its name. Without such declarations the result of a computation could only have one reference, its immediate syntactic parent. (Or in other words, without let:, every template can only construct a tree). That is very inconvenient in case a result should be checked for null-ness, as in this case. (vmnull is a macro for the global 'null object' in MoarVM. The null object represents NULL wherever an object is needed, but isn't actually NULL, as that would mean it couldn't be dereferenced; it saves the interpreter from checking if a pointer to an object is NULL everywhere it is accessed).

    The let: construct has another purpose: it ensures the ordering of operations. Although most operations can be ordered in whatever way suits the compiler, some do not, most notably function calls. (Function calls may have numerous unpredictable side effects, after all). All statements declared in the 'let declaration body' are compiled to run before any statements in the 'expression body'. This enables the programmer to ensure that a value is not needlessly computed twice, and more importantly, it ensures that a value that is used in multiple branches of a conditional statement is defined in both of them. For instance:


    (let (($foo (...)))
    (if (...)
    (load $foo)
    $foo))

    This pseudo-snippet of template code would dereference $foo if some condition is met (e.g. $foo is not NULL) and returns $foo directly otherwise. Without let to order the computation of $foo prior to the blocks of if, the first (conditional) child of if would be the first reference to $foo. That would mean that the code to compute $foo is only compiled in the first conditional block, which would not be executed whenever the if condition was not true, meaning that $foo would be undefined in the alternative conditional block. This would mean chaos. So in fact let does order expressions. All is good.

    Except... I haven't told you how this ordering works, which is where my change comes in. Prior to commit 7fb1b10 the let expression would insert a hint to the JIT compiler to add the declared expressions as tree roots. The 'tree roots' are where the compiler starts converting the expression tree (graph) to a linear sequence of byte code. Hence the declaring expressions are compiled prior to the dependent expressions. But this has, of course, one big disadvantage, which is that the set of roots is global for the tree. Every declaration, no matter how deep into the tree, was to be compiled prior to the head of the tree. As a result, the following template code would not at all do what you want:


    (let ($foo (...))
    (if (nz $foo)
    (let (($bar (load $foo))) # dereference $foo !
    (... $bar))
    ...)


    The declaration of $bar would cause $foo to be dereferenced prior to checking whether it is non-null, causing a runtime failure. Chaos is back. Well, that's what I've changed. Fortunately, we have another ordering mechanism at our disposal, namely DO lists. These are nodes with a variable number of children that are also promised to be compiled in order. After the patch linked above, the compiler now transforms let expressions into the equivalent DO expressions. Because DO expressions can be nested safely, $bar is not computed prior to the null-check of $foo, as the programmer intended. I had originally intended to implement analysis to automatically order the expressions with regard to the conditionals, but I found that this was more complicated to implement and more surprising to the programmer. I think that in this case, relying on the programmer is the right thing.

    One thing that I found interesting is that this reduces the number of mechanisms in the compiler. The 'root-hint' was no longer useful, and subsequently removed. At the same time, all but the last child of a DO list must be void expressions, i.e. yield no value, because DO can only return the value of its last child. Since all expressions in a let declaration must yield some value - otherwise they would be useless - they required a new operation type: discard. Thus with a new node type (extension of data range) we can remove a class of behavior.

    After I had implemented this, I've started working on adding basic block analysis. That is a subject for a later post, though. Until next time!

    Strangely Consistent: The root of all eval

    Published by Carl Mäsak

    Ah, the eval function. Loved, hated. Mostly the latter.

    $ perl -E'my $program = q[say "OH HAI"]; eval $program'
    OH HAI
    

    I was a bit stunned when the eval function was renamed to EVAL in Perl 6 (back in 2013, after spec discussion here). I've never felt really comfortable with the rationale for doing so. I seem to be more or less alone in this opinion, though, which is fine.

    The rationale was "the function does something really weird, so we should flag it with upper case". Like we do with BEGIN and the other phasers, for example. With BEGIN and others, the upper-casing is motivated, I agree. A phaser takes you "outside of the normal control flow". The eval function doesn't.

    Other things that we upper-case are things like .WHAT, which look like attributes but are really specially code-generated at compile-time into something completely different. So even there the upper-casing is motivated because something outside of the normal is happening.

    eval in the end is just another function. Yes, it's a function with potentially quite wide-ranging side effects, that's true. But a lot of fairly standard functions have wide-ranging side effects. (To name a few: shell, die, exit.) You don't see anyone clamoring to upper-case those.

    I guess it could be argued that eval is very special because it hooks into the compiler and runtime in ways that normal functions don't, and maybe can't. (This is also how TimToady explained it in the commit message of the renaming commit.) But that's an argument from implementation details, which doesn't feel satisfactory. It applies with equal force to the lower-cased functions just mentioned.

    To add insult to injury, the renamed EVAL is also made deliberately harder to use:

    $ perl6 -e'my $program = q[say "OH HAI"]; EVAL $program'
    ===SORRY!=== Error while compiling -e
    EVAL is a very dangerous function!!! (use the MONKEY-SEE-NO-EVAL pragma to override this error,
    but only if you're VERY sure your data contains no injection attacks)
    at -e:1
    ------> program = q[say "OH HAI"]; EVAL $program⏏<EOL>
    
    $ perl6 -e'use MONKEY-SEE-NO-EVAL; my $program = q[say "OH HAI"]; EVAL $program'
    OH HAI
    

    Firstly, injection attacks are a real issue, and no laughing matter. We should educate each other and newcomers about them.

    Secondly, that error message ("EVAL is a very dangerous function!!!") is completely over-the-top in a way that damages rather than helps. I believe when we explain the dangers of code injection to people, we need to do it calmly and matter-of-factly. Not with three exclamation marks. The error message makes sense to someone who already knows about injection attacks; it provides no hints or clues for people who are unaware of the risks.

    (The Perl 6 community is not unique in eval-hysteria. Yesterday I stumbled across a StackOverflow thread about how to turn a string with a type name into the corresponding constructor in JavaScript. Some unlucky soul suggested eval, and everybody else immediately piled on to point out how irresponsible that was. Solely as a knee-jerk reaction "because eval is bad".)

    Thirdly, MONKEY-SEE-NO-EVAL. Please, can we just... not. 😓 Random reference to monkeys and the weird attempt at levity while switching on a nuclear-chainsaw function aside, I find it odd that a pragma that enables EVAL is called something with NO-EVAL. That's not Least Surprise.
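
    For what it's worth, the pragma is at least lexically scoped, so it can be confined to the smallest block that needs it; a small sketch:

    my $program = q[say "OH HAI"];
    {
        use MONKEY-SEE-NO-EVAL;
        EVAL $program;
    }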

    Anyway, the other day I realized how I can get around both the problem of the all-caps name and the problem of the necessary pragma:

    $ perl6 -e'my &eval = &EVAL; my $program = q[say "OH HAI"]; eval $program'
    OH HAI
    

    I was so happy to realize this that I thought I'd blog about it. Apparently the very dangerous function (!!!) is fine again if we just give it back its old name. 😜

    rakudo.org: PART 3: Information on Changes Due to IO Grant Work

    Published by Zoffix Znet on 2017-04-17T20:22:46

    The IO grant work is at its wrap up. This note lists some of the last-minute changes to the plans delineated in earlier communications ([1], [2], [3]). Most of the listed items do not require any changes to users’ code.

    Help and More Info

    If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel.

    brrt to the future: Function Call Milestone

    Published by Bart Wiegmans on 2017-03-28T16:14:00

    Hi everybody. It's high time for another update, and this time I have good news. The 'expression' JIT compiler can now compile native ('C') function calls (although it's not able to use the results). This is a major milestone because function calls are hard! (At least from the perspective of a compiler, and especially from the perspective of the register allocator). Also because native function calls are really very important in MoarVM. Most of its 'primitive' operations (like hash table access, string equality, big integer arithmetic) are implemented by invoking native functions, and so to compile almost any program the JIT has to compile many function calls.

    What makes function calls 'hard' is that they must implement the 'calling convention' of the relevant 'application binary interface' (ABI). In short, the ABI specifies the locations of function call parameters. A small number of parameters (on Windows, the first 4; on POSIX platforms, the first 6) are placed in registers, and if there are more parameters they are usually placed on the stack. Aside from the calling convention, the ABI also specifies the expected alignment of the stack pointer (per 16 bytes), the registers a function may overwrite (clobber in ABI-speak), and which registers must have their original values after the function returns. The last type of registers are called 'callee-saved'. Note that at least a few registers must be callee-saved, especially those related to call stack management, because if the callee function overwrote those it would be impossible to return control back to the caller. By the way, manipulating exactly those registers is how the setjmp and longjmp 'functions' work.

    So the compiler is tasked with generating code that ensures the correct values are placed in the correct registers. That sounds easy enough, but what if these registers are taken by other values, and what if those other values might be required for another parameter? Indeed, what if the value in the %rdx register needs to be in the %rsi register, and the value of the %rsi register is required in the %rdx register? How do we determine the correct ordering for shuffling the operands?

    One simple way to deal with this would be to eject all values from registers onto the stack, and then to load the values from registers if they are necessary. However, that would be very inefficient, especially if most function calls have no more than 6 (or 4) parameters and most of these parameters are computed for the function call only. So I thought that solution wouldn't do.

    Another way to solve this would be if the register allocator could ensure that values are placed in their correct registers directly, especially for register parameters, i.e. by 'precoloring'. (The name comes from register allocation algorithms that work by 'graph coloring', something I will try to explain in a later post.) However, that isn't an option due to my choice of 'linear scan' as the register allocation algorithm. This is a 'greedy' algorithm, meaning that it decides the allocation for a live range as soon as it encounters it, and that it cannot revert that decision once it's been made. (If it could, it would be more like a dynamic programming algorithm.) So to ensure that the allocation is valid I'd have to make sure that the information about register requirements is propagated backwards from the instructions to all values that might conflict with it... and at that point we're no longer talking about linear scan, and I would be better off engineering a new algorithm. Not a very attractive option either!

    Instead, I thought about it and it occurred to me that this problem looks a lot like unravelling a dependency graph, with a number of restrictions. That is to say, it can be solved by a topological sort, with the registers mapped to a graph structure.

    I linked to the topological sort page for an explanation of the problem, but I think my implementation is really quite different from that presented there. They use a node visitation map and a stack; I use an edge queue and an outbound count. A register transfer (edge) can be enqueued if it is clear that the destination register is not currently used. Transfers from registers to stack locations (as function call parameters) or to local memory (to save the value from being overwritten by the called function) are also enqueued directly. As soon as the outbound count of a node reaches zero, it is considered to be 'free' and the inbound edge (if any) is enqueued.


    Unlike a 'proper' dependency graph, cycles can and do occur, as in the example where '%rdx' and '%rsi' would need to swap places. Fortunately, because of the single-inbound edge rule, such cycles are 'simple' - all outbound edges not belonging to the cycle can be resolved prior to the cycle-breaking, and all remaining edges are part of the cycle. Thus, the cycle can always be broken by freeing just a single node (i.e. by copy to a temporary register).
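
    To make the idea concrete, here is a toy Perl 6 sketch of that ordering scheme; it is not MoarVM's actual C implementation, and the register names, the 'tmp' register and the textual 'mov' output are purely illustrative:

    sub order-moves(%moves) {
        my %pending  = %moves;                       # destination => source still to emit
        my %outbound;                                # how many pending moves still read each source
        %outbound{$_}++ for %pending.values;
        my @order;
        loop {
            # a move is safe once nothing still needs to read its destination
            my @ready = %pending.keys.grep({ (%outbound{$_} // 0) == 0 });
            last unless @ready;
            for @ready -> $dst {
                my $src = %pending{$dst}:delete;
                @order.push("mov $src -> $dst");
                %outbound{$src}--;
            }
        }
        # whatever remains forms simple cycles; break each one with a temporary
        while %pending {
            my $start = %pending.keys[0];
            @order.push("mov $start -> tmp");
            my $cur = $start;
            while (%pending{$cur}:exists) {
                my $src = %pending{$cur}:delete;
                @order.push($src eq $start ?? "mov tmp -> $cur" !! "mov $src -> $cur");
                $cur = $src;
            }
        }
        @order;
    }

    # %rdx and %rsi swap places; %rax simply moves to %rdi
    .say for order-moves({ rsi => 'rdx', rdx => 'rsi', rdi => 'rax' });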

    The only thing left to consider are the values that are used after the function call returns (i.e. survive the function call) and that are stored in registers that the called function can overwrite (which is all of them, since the register allocator never selects callee-saved registers). To make sure they are available afterwards, we must spill them. But there are a few spill strategies to choose from (terminology made up by me).

    The current register allocator does a full spill when it's run out of registers, and it would make some sense to apply the same logic for function-call related spills. I've decided to use spill-and-restore, however, because a full spill complicates the sorting order (a value that used to be in a register is suddenly only in memory) and it can be wasteful, especially if the call only happens in an alternative branch. This is common for instance when assigning values to object fields, as that may sometimes require a write barrier (to ensure the GC tracks all references from 'old' to 'new' objects). So I'm guessing that it's going to be better to pay the cost of spilling and restoring only in those alternative branches, and that's why I chose to use spill-and-restore.

    That was it for today. Although I think being able to call functions is a major milestone, this is not the very last thing to do. We currently cannot allocate any of the registers used for floating-point calculations, which is a relatively minor limitation since those aren't used very frequently. But I also need to do some more work to actually use function return values and apply generic register requirements of tiles. But I do think the day is coming near where we can start thinking about merging the new JIT with the MoarVM master branch, making it available to everybody. Until next time!

    rakudo.org: PART 2: Upgrade Information for Changes Due to IO Grant Work

    Published by Zoffix Znet on 2017-04-03T00:15:07

    We’re making more changes!

    Do the core developers ever sleep? Nope! We keep making Perl 6 better 24/7!

    Why?

    Not more than 24 hours ago, you may have read Upgrade Information for Changes Due to IO Grant Work. All of that is still happening.

    However, it turned out that I (Zoffix) had an incomplete understanding of how changes in the 6.d language will play along with 6.c stuff. My original assumption was that we could remove or change existing methods, but that assumption was incorrect. Pretty much the only sane way to incompatibly change a method in an object in 6.d is to add a new method with a different name.

    Since I'd rather we not have, e.g., .child and .child-but-secure for the next decade, we have a bit of an in-flight course correction:

    ORIGINAL PLAN was to minimize incompatibilities with existing 6.c language code; leave everything potentially-breaking for 6.d

    NEW PLAN is to right away add everything that does NOT break 6.c-errata specification, into 6.c language; leave everything else for 6.d. Note that current 6.c-errata specification for IO is sparse (the reason IO grant is running in the first place), so there’s lots of wiggle room to make most of the changes in 6.c.

    When?

    I (Zoffix) still hope to cram all the changes into 2017.04 release. Whether that’s overly optimistic, given the time constraints… we’ll find out on April 17th. If anything doesn’t make it into 2017.04, all of it definitely will be in 2017.05.

    What?

    Along with the original list in the first Upgrade Information Notice, the following changes may affect your code. I’m excluding any non-conflicting changes.

    Potential changes:

    Changes for 6.d language:

    Help and More Info

    If you need help or more information, please join our IRC channel and ask there. You can also contact the person performing this work via Twitter @zoffix or by talking to user Zoffix in our dev IRC channel.