2010-05-10

JSMeta became ... CSMeta

And then some.

I took the fantastic RunSharp library (see http://runsharp.googlecode.com/), a runtime code generator for the CLR that also works perfectly on Mono (>= 2.6), threw in a dynamic parser-generator (a generator of dynamic parsers, somewhat evolved from JSMeta, though entirely rewritten for the fourth time or so), and created a compiler-compiler framework for the CLR.

The first programming language created in the environment is ... Perlesque.  It is a simple language.  It tries to stick to the syntax of the strongly-typed subset of the Perl 6 specification.  Every variable is a scalar (dollar "sigil" in front of the name), and variables have exactly the same semantics as their C# counterparts (value vs reference, except that Perlesque implements full-blown closures for value types, unlike C#).  If you are familiar with any of the multitudinous CLR languages (especially C#), you will feel right at home in Perlesque.

Here is an example of Perlesque code: Knuth's Man or Boy litmus-test for a heap-frame runtime/language:


sub A(int $k, Callable[:(--> int)] $x1,
              Callable[:(--> int)] $x2,
              Callable[:(--> int)] $x3,
              Callable[:(--> int)] $x4,
              Callable[:(--> int)] $x5 --> int)
{
    my Callable[:(--> int)] $B;
    $B = sub (--> int)
    {
        $k -= 1;
        return A($k, $B, $x1, $x2, $x3, $x4)
    };
    if $k <= 0
    {
        return ($x4() + $x5())
    }
    return $B()
}

sub K(int $n --> Callable[:(--> int)])
{
    return sub (--> int)
    {
        return $n
    }
}

say( A(10, K(1), K(-1), K(-1), K(1), K(0) ) )


The ugly "Callable" type annotation happens to be the currently-spec'd syntax for strongly-typed closures in Perl 6.  TimToady (Larry Wall) says it might improve.

This program outputs the correct result: for the starting value of 10 (as shown here on the last line), it prints -67.
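
As a cross-check on that figure, here is a minimal Python port of the same test. It is a sketch for comparison only, not part of Sprixel or Perlesque; the names man_or_boy and constant are mine, standing in for A and K above.

```python
import sys

# The recursion is deep for such a small input; raise the limit to be safe.
sys.setrecursionlimit(10_000)


def constant(n):
    """Counterpart of K: return a closure that always yields n."""
    return lambda: n


def man_or_boy(k, x1, x2, x3, x4, x5):
    """Counterpart of A: the inner closure captures and mutates this call's own k."""
    def b():
        nonlocal k
        k -= 1
        return man_or_boy(k, b, x1, x2, x3, x4)

    if k <= 0:
        return x4() + x5()
    return b()


result = man_or_boy(10, constant(1), constant(-1), constant(-1), constant(1), constant(0))
print(result)  # -67
```

The point of the test is the same in both languages: each call must keep its own k alive on the heap, because the closure b outlives the moment it was created and is invoked much later through x1..x5.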

Part of CSMeta is Sprixel, which will be my Perl 6 implementation on the CLR (both Microsoft's .NET and Novell's Mono).  The front end will be a version of TimToady's STanDard Perl 6 grammar/parser (which happens to be written in Perl 6) translated (by STD.pm6 and its sister "viv") to Perlesque code, so that the parser will run in the CLR.  It remains to be seen how the middle end will look.  The back end is, of course, Perlesque/RunSharp/CLR.

Each of the components of the project's source code is redistributable under one of several very open-source licenses, namely an MIT/X11-style license for my fork of RunSharp, an Apache-style license (the Microsoft Public License, MS-PL) for some bits borrowed from the DLR (Microsoft's Dynamic Language Runtime), and the AL2 (Artistic License 2) for all of the code I wrote.  All of these licenses are certified by the OSI (Open Source Initiative - www.opensource.org), and of course the AL2 is the most liberal of them all, since it's a meta-license (it permits redistribution under any of the other licenses approved by the OSI).

All of the CLR's standard library (and any other .dll/.exe you might write, including any that provide integration with native libraries, whether on Linux, Windows, or the Mac) is available from your source code.

To build the project for Mono (you'll need at least version 2.6.3; earlier versions crash on Sprixel), svn checkout the source and run `xbuild Sprixel.sln`.  In 7-15 seconds, you'll get a Sprixel.exe, which is the first stage of the compiler chain.  Upon every invocation, Sprixel.exe builds and emits perlesque.exe, whose grammar/compiler is specified in Sprixel's source code in declarative/fluent-style C#.  perlesque.exe is then given the input file or string that was provided to Sprixel.exe, and compiles it to asmbly_1.exe, which is the compiled representation of your program.

You can then run the man_or_boy test by typing `mono bin/Release/Sprixel.exe t/man_or_boy.t`.  It emits TAP-style output.

The code that the compiler-compiler emits uses a "stackless" engine, which just means the C runtime stack is not used to represent the callframes/stack of the language.  In Sprixel's case, it uses a continuation-returning-style (to a trampoline).  If you do run the above man_or_boy test, take a look at the generated asmbly_1.exe in Reflector to see what I mean.
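
To make "continuation-returning style to a trampoline" concrete, here is a minimal Python sketch of the general technique. It is not the code Sprixel emits; trampoline and sum_to are hypothetical names chosen for illustration. Each step returns the next piece of work instead of calling it, and a small driver loop runs those pieces one after another.

```python
def trampoline(step):
    """Keep invoking returned continuations until a plain (non-callable) value appears."""
    while callable(step):
        step = step()
    return step


def sum_to(n, acc=0):
    """Sum 1..n; instead of recursing, hand the next step back to the trampoline."""
    if n == 0:
        return acc
    return lambda: sum_to(n - 1, acc + n)


# 100,000 "recursive" steps, yet the host call stack never grows past a few frames.
total = trampoline(lambda: sum_to(100_000))
print(total)  # 5000050000
```

A direct recursive version of sum_to would overflow Python's default stack long before reaching 100,000 levels. The trade-off is the one discussed in this post: every step allocates a small heap object, which is where the performance cost comes from.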

You might ask: "What about the gigantic performance hit due to reifying the language's stackframes?"  I would answer: that's the cost of the amazing benefit of "emulating" full-blown continuations on the CLR.  The "callframe" keyword in the Perlesque language is an expression that returns the current frame.  It would enable someone to write the equivalent of call/cc or let/cc in Perlesque, from which coroutines and iterators and other state machines can be constructed.
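
Why do reified frames buy you iterators and friends?  The Python sketch below is one way to picture it.  It does not show the semantics of Perlesque's callframe keyword; CounterFrame is a hypothetical class invented for this illustration.  The idea is only that once a routine's locals live in a heap object rather than on the machine stack, that object can be stored, resumed later, or copied and resumed more than once.

```python
import copy


class CounterFrame:
    """A hypothetical reified frame: the routine's state lives on the heap."""

    def __init__(self, limit):
        self.limit = limit
        self.i = 0  # the routine's only local variable

    def resume(self):
        """Run to the next yield point; return None once the routine has finished."""
        if self.i >= self.limit:
            return None
        value = self.i
        self.i += 1
        return value


frame = CounterFrame(3)
first = frame.resume()                                   # 0
saved = copy.copy(frame)                                 # capture the frame as a value
rest = [frame.resume(), frame.resume(), frame.resume()]  # [1, 2, None]
replay = saved.resume()                                  # 1 again, from the saved copy
```

Resuming frame repeatedly is an iterator; resuming the saved copy after the original has finished is the re-entrant behaviour that call/cc-style continuations rely on.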

I recommend viewing Perlesque as a human-readable assembly language (since it's fully strongly-typed using the CTS (the CLR's Common Type System)), excellent for implementing programming languages when combined with a compiler-compiler framework such as CSMeta.

Visit csmeta.org to view the code, which is safe to read, even for employees of large software companies who fear unintentionally copying source code.  Contact me if you would like to contribute, or chat with us on IRC in #perl6 on the freenode network.

There are several major things remaining in the language before I move on to porting over the necessary bits of TimToady's parser from Perl 5 to Perlesque (or C#), on which I will implement Perl 6:
  1. An actual expression-parser (using the "precedence-climbing" algorithm - Google it) - currently you specify precedence with parentheses.  This is somewhat low on the priority list, since I am viewing Perlesque as an assembly language of sorts.
  2. Declarative classes/fields/methods in Perlesque (Perl 6 calls fields "attributes", not the same as C# attributes).  This should be finished this week.
  3. External module/DLL loading.  This can happen at parse-time, due to the way RunSharp works (thanks Stefan!).
  4. Array declaration/creation.  I haven't yet decided how to co-opt the Perl 6 syntax for this.  I probably should have mentioned above that CLR generic types are directly represented in Perlesque identically to how they're represented in C#, except with [ and ] instead of < and >.  And yes, that's implemented.
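
For item 1, here is a minimal Python sketch of the precedence-climbing algorithm.  It is not Perlesque's planned parser: the operator table, the tuple-based tree, and the names tokenize and parse are all assumptions made for illustration.

```python
import re

# Hypothetical operator table: symbol -> (precedence, associativity).
BINARY_OPS = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "**": (3, "right"),
}


def tokenize(text):
    """Split into integers, operators, and parentheses ('**' is tried before '*')."""
    return re.findall(r"\d+|\*\*|[-+*/()]", text)


def parse(tokens):
    """Return a nested (op, left, right) tuple tree for the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def parse_expr(min_prec):
        left = parse_atom()
        # Absorb operators as long as they bind at least as tightly as min_prec.
        while peek() in BINARY_OPS and BINARY_OPS[peek()][0] >= min_prec:
            op = advance()
            prec, assoc = BINARY_OPS[op]
            # Left-associative operators require a strictly tighter right operand.
            right = parse_expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = parse_expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(tokenize("1 + 2 * 3")))    # ('+', 1, ('*', 2, 3))
print(parse(tokenize("2 ** 3 ** 2")))  # ('**', 2, ('**', 3, 2))
```

The whole algorithm is the single loop in parse_expr, which is why it is attractive for a generated parser: adding an operator means adding a row to the table, not a new grammar rule.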
