2010-05-28

Perlesque Obtains Reified Continuations

Yesterday Perlesque obtained reified continuations with semantics sortof a hybrid between setjmp/longjmp and let/cc/call/cc.  The _cc keyword is an expression that returns a reference to the currently-executing stack frame (activation record), which itself holds an instruction pointer as to its progress through the routine's instructions.

The resulting Frame, whose type derives FrameBase, can be "backtracked to" by using the syntax   goto $frame   as in the below:

my $e = sub (FrameBase $f-->int) {
  say(1);
  goto $f;
  say(2);
  return 1
};
sub callcc(Callable[:(FrameBase-->int)] $func, FrameBase $frame --> int) {
  $func($frame);
  say(3);
  return 1
};
callcc($e, _cc);
say(4);

Here, the subroutine named "callcc" takes two parameters: the first is a closure that ostensibly takes a stack frame and returns an int; the second is a stack frame.  "callcc" applies its first argument to its second argument, and then prints "3".  The closure that is stored in $e prints "1", does a "longjmp" (non-local goto) to the continuation stack-frame passed in as its argument, and then prints "2" (except say("2") is unreachable, since it immediately follows a goto statement).

The output of the above is:    1  4   since the "goto $f" statement in closure $e immediately returns control to the outermost scope, at the last point it called out from it.

Note: The semantics of these continuations are slightly different from other languages' reified continuations, in that the instruction pointer is tied to the stack frame itself, and not to the continuation of the stack frame only (since in this scheme, they're one-in-the-same).

This means that if a continuation reference is "goto'd" and then "returns" and that same continuation reference is then "goto'd" again from elsewhere (if, let's say, the continuation reference was stashed in a global variable somewhere), the routine will immediately return, and/or the behavior will be undefined, since other frames could exist on the "cactus stack" that have already returned.

Edit:  awwaiid pointed out that these are coroutines, not full-blown continuations, since they aren't cloned.  So I'm adding to FrameBase an abstract .Clone() method that will be generated by the compiler, so that a cloned continuation can be restarted from where it was originally captured (or cloned again). :)

Edit:  I finished adding .Clone() method generators to the compiler, so perlesque now truly has reified *full* continuations.

2010-05-22

Perlesque: Class Predeclarations Enable Cylical Dependencies

It took a bit of mental effort/stamina to fix the bug(s) in Perlesque's class declarations pointed out by ++pmurias, but now that I did, I can see the next step for Perlesque is to enable class predeclarations (a la the ones in Perl 6), which was another lack-of-a-feature that pmurias was [implicitly ;)] grousing about.

This is an issue because Perlesque is fully strongly/statically typed (though it has upward type-inference on untyped declarations with initializers) and because its parser does the type-checking as it parses (in the first/same pass).  A naïve implementation, yes, but accordingly one that was quick-to-implement.  It also helps enforce the try-to-prevent-unnecessary-backtracking or write-the-grammar-in-such-a-way-that-backtracking-rarely-occurs strategies that ++TimToady follows with the STanDard grammar.

Perl 6 class predeclarations are provided for a similar reason, so that a typename can be referenced before its declaration body is executed (appears in the input files(s)) and its members/properties/fields/slots/state/methods are declared.  Perlesque has a stronger requirement - a class' attributes and methods must also be predeclared, but interestingly, they can be declared gradually, so a dependency graph of two classes can be built up with multiple predeclarations of each interdependent class.  This is possible because, at parse-time, the CLR TypeBuilder objects hidden behind RunSharp's TypeGen objects support gradual typing.

The syntax for method predeclaration is similar to the Perl 6 ones - a fully type-annotated method signature and an empty routine body, perhaps with a ... ellipsis to denote "stub".  

On another note, I'm still trying to think about how best to help the bootstrapping effort.  ++sorear is feverishly working on fleshing out viv's emit_p5 (the Perl 5 emission target) so that gimme5 will no longer be needed to translate STanDard to Perl 5 (an edition of STanDard that has already been translated to Perl 5 by gimme5 will initially use viv's emit_p5 to translate its own Perl 6 original to Perl 5, and from then on, the translated-by-viv edition can be used).

After trying to understand sorear's code in viv (and TimToady's, for that matter), I'm starting to suspect I should stick to the "lower" level tasks (at the VM layer).  That is, the things that Parrot provides for Rakudo.  Following that to its logical conclusion, perhaps I should continue to flesh out Perlesque so that it provides all the same sorts of features that Parrot and NQP-RX provide.  Thoughts?

2010-05-20

Perlesque's Class Declarations Obtained Custom Constructors and Single Inheritance

Tonight I completed the "next tasks" from this mornng's post: constructors (including ones with parameters, of course) for CLR classes defined in Perlesque, and single inheritance of classes.  So the following Perlesque code is valid:

class Foo {
  has int $a;
  method new (--> Foo) {
    say("made a Foo");
  }
}
class Bar is Foo {
  has int $b;
  method Baz (--> int) {
    say("a is " ~ (self.a ~ (" and b is " ~ self.b)));
    return 1;
  }
  method new (int $a, int $b --> Bar) {
    say("made a Bar with a == " ~ ($a ~ (" and b == " ~ $b)));
    self.a = $a;
    self.b = $b; 
  }
}
my $bar = Bar.new(4, 6);
$bar.a += 7;
$bar.b *= 3;
$bar.Baz();


and the output is:
made a Foo
made a Bar with a == 4 and b == 6
a is 11 and b is 18

Next, I'm going to coordinate with sorear & pmurias on #perl6 to get a handle on Cursor/LazyMap/STD/viv, with the eventual (but hopefully soon!) goal of either extending Sprixel's grammar engine to cover all of Cursor's capabilities or manually porting Cursor to perlesque (which I guess is tractable), so that viv can emit STD's parse tree of itself to perlesque, using emit_perlesque.

Then STD is running on the CLR, and it's all downhill from there.... where by "downhill" I mean the gravity is strong enough to pull us all the way down to the bottom of the hill, but there are a few PetaNewtons of friction/obstacles the gravity will have to overcome between here and there.  But at least we'll be able to see the bottom.

Perlesque Obtained Class Declarations

Yesterday I finished a feature for which Paweł Murias (pmurias) has been clamoring: class declarations in Perlesque.  Class declarations act similarly to class declarations in Perl 6:

class Individual {
  has int $x_chrom; # attributes can't have initializers yet, but it's a \smop\
  has int $y_chrom;
  has Individual $xx_parent;
  has Individual $xy_parent;
  has string $dna;
  has int $id;

  my int $indiv_id; # just a unique tag


  # this tier of the block acts as a "static initializer"
  #  for the class, and runs at "declaration-time" (when
  #  the class block is encountered as the "interpreter"
  #  evaluates the program (from the programmer's perspective),
  #  so any lexicals declared here act as "private static fields"
  #  or "class fields" would for a Java/C# class, since they
  #  are accessible from the method blocks like a normal
  #  closure.  Named subs can also be declared here.


  method conceive(Individual $mate --> Individual) {
    my $offspring = Individual.new();
    # do a bunch of somewhat randomized stuff with $mate.dna and self.dna
    $offspring.dna = "CGGCTGTAGCTGTGTCGTGATGCACGTACGTG";
    $offspring.xx_parent = self; # The self keyword is like the "this" keyword in C#
    $offspring.xy_parent = $mate;
    $offspring.id = ($indiv_id += 1); # closure access to a "private static field"
    return $offspring;
  }
}
my $indiv = Individual.new();


Basically this enables Java/C#-style programming in Perlesque, in addition to the (functional-style subsets of) JavaScript/Perl5 enabled last week.

What's next:
  • Class derivation (a small matter of programming, actually - the declarative functionality is already available in RunSharp - it merely needs exposed in Perlesque syntax).  This would enable Perlesque programs to use things like the PrototypeChain<TKey, TValue> class I wrote in C# as a base class for a Perlesque class.  PrototypeChain is a generalized edition of the hashes-with-inheritance Prototype PMC (which I thought existed in Parrot, but which I was apparently imagining, according to #perl6 on freenode).  Such a thing would be quite useful for implementing lexical "pads" at runtime in a non-optimized Perl 6 implementation (and in fact the Perlesque compiler does use PrototypeChain to keep track of the lexical scopes).
  • Custom constructors, which should behave similarly to Perl 6 custom constructors, except these will just be declared using method new (optional paramlist) { } instead of the meta-object-y-ultra-flexible way Perl 6 uses

2010-05-11

What's Implemented in Perlesque, What's Not, and The Plan

This post lists the syntax and semantics already implemented in Perlesque, my aforementioned strongly-typed subset of Perl 6 on the CLR.
  • This bears repeating: keep in mind this is a strongly-typed subset of Perl 6; variables may not change type (note: the contents/referents of variables may have a type that is derived from the variable's type).
  • Named labels and the "goto LABEL" statement, and a label can appear *after* the goto, as well.  Control may not jump "down" into another block, though, of course.  I suspect, but I'm not certain, that Perlesque is the first implementation (of a subset of Perl 6) to support this feature, but please correct me if I'm wrong.
  • Control flow blocks: if/elsif/else, while, until, repeat/while, repeat/until, loop (C-style for loop), and the "next" ("continue") and "last" ("break") statements in all of those loops.  The semantics of next and last match neither the Perl 6 spec or the Perl 5 behavior, but they're quite close to the expected behavior.
  • Declaration of lexical variables:  my $scalar_root = initializer_expression();   The type of $scalar_root is inferred from the type of the initializer expression.  One can also declare the type explicitly:  my Tree::Walker $traverser;
  • Declaration of closures as lexical variables: same as the above, but the initializer expression is also a subroutine declaration, either named or anonymous:   my $routine = sub foo(List[int] $integers, str $name --> int) { say($integers.Count ~ (" items in " ~ $name)); return 1 };   This declares a routine that takes two parameters (a List of integers and a string) and returns an integer.  foo(List[int].new(), "hihi") may also be invoked after this statement, or in any lexical scopes enclosed by the lexical scope in which foo was declared.
  • Declaration of named subroutines: same as the above, but without assigning it to a variable.
  • Type literals.  You can use the name of a type as an expression, since on the CLR's CTS (Common Type System), there is a Type type.  This (sort of) maps from the Perl 6 notion of the name of a type representing the type object.
  • Decimal (signed) integer literals, string literals "double quote: \" <-- double quote"  'single quote: \' <-- single quote', and the common numeric operation operators and comparison operators, as well as the bitwise and logical operators (including negation).  The assignment (mutate in-place) editions of the relevant operators are also implemented.
These features enable Perlesque to be a functional programming language and an imperative programming language, similar to Perl 5 and JavaScript (and Perl 6), but strongly typed.

What's missing from Perlesque (that would make it a useful "human-writable" language)?
  • good (well, any appropriate) error messages on parse failures
  • the aforementioned expression parser, so that precedence can be inferred from an operator precedence table, so so many parentheses aren't necessary when typing the code
The Plan for Perlesque:

Recall my plan for Perlesque: an assembly language to which STD.pm6 will compile itself.  TimToady's "viv" program (written in Perl 5, originally stood for VI->V, 6 to 5, though cutely it can also be read as 5 to 4) will take the output of the STD.pm6 parser (TimToady's definitional grammar written in Perl 6), and will emit Perlesque code that is a direct translation of the Perl 6 code in STD.pm6 itself.

It will translate all expressions to trees of calls into the Cursor/Perl6 runtime (written in some combination of Perlesque and C#), and all control statements to their analogous control structures in Perlesque.  Allow me to gloss over the mountain of work it will take to enable this translation and the supporting runtime libraries, but keep in mind that nearly all of that work is also "implementing Perl 6", so it's work that must be done at some point anyway.

The Perlesque translation of STD.pm6 (let's call it STD.psq) will then be compiled by perlesque.exe to CIL (let's call it STD.psq.exe), and executed on a CLR (Mono on Linux, Mac, or Windows, or .NET on Windows, or Silverlight on many platforms in many browsers).  Tada.  Perl 6.  Larry Wall's definitional parser parsing your Perl 6 and executing it directly (after compiling each compilation unit to CIL), including all the fancy multi-stage compilation and phasers (ask on #perl6 if you don't know what these are) and the other parser-entwined features of Perl 6.

2010-05-10

JSMeta became ... CSMeta

And then some.

I took the fantastic RunSharp library (see http://runsharp.googlecode.com/), which is a runtime-code-generator compiler for the CLR (works perfectly on Mono (>= 2.6) too!), threw in a dynamic parser-generator (a generator of dynamic parsers, somewhat evolved from JSMeta, though entirely rewritten (again, for the 4th time or so)), and created a compiler-compiler framework for the CLR.

The first programming language created in the environment is ... Perlesque.  It is a simple language.  It tries to stick to the syntax of the strongly-typed subset of the Perl 6 specification.  Every variable is a scalar (dollar "sigil" in front of the name), and variables have exactly the same semantics as their C# counterparts (value vs reference, except that Perlesque implements full-blown closures for value types, unlike C#).  If you are familiar with any of the multitudinous CLR languages (especially C#), you will feel right at home in Perlesque.

Here is an example of Perlesque code: Knuth's Man or Boy litmus-test for a heap-frame runtime/language:


sub A(int $k, Callable[:(--> int)] $x1,
              Callable[:(--> int)] $x2,
              Callable[:(--> int)] $x3,
              Callable[:(--> int)] $x4,
              Callable[:(--> int)] $x5 --> int)
{
    my Callable[:(--> int)] $B;
    $B = sub (--> int)
    {
        $k -= 1;
        return A($k, $B, $x1, $x2, $x3, $x4)
    };
    if $k <= 0
    {
        return ($x4() + $x5())
    }
    return $B()
}

sub K(int $n --> Callable[:(--> int)])
{
    return sub (--> int)
    {
        return $n
    }
}

say( A(10, K(1), K(-1), K(-1), K(1), K(0) ) )


The ugly "Callable" type annotation happens to be the currently-spec'd syntax for strongly-typed closures in Perl 6.  TimToady (Larry Wall) says it might improve.

This program outputs the correct result, which for the starting value of 10 (as shown here on the last line), outputs -67.

Part of CSMeta is Sprixel, which will be my Perl 6 implementation on the CLR (both Microsoft's .NET and Novell's Mono).  The front end will be a version of TimToady's STanDard Perl 6 grammar/parser (which happens to be written in Perl 6) translated (by STD.pm6 and its sister "viv") to Perlesque code, so that the parser will run in the CLR.  It remains to be seen how the middle end will look.  The back end is, of course, Perlesque/RunSharp/CLR.

Each of the components of the project's source code is redistributable under one of several very open-source licenses, namely an MIT/X11-style license for my fork of RunSharp, an Apache-style license (the Microsoft Public License, MS-PL) for some bits borrowed from the DLR (Microsoft's Dynamic Language Runtime), and the AL2 (Artistic License 2) for all of the code I wrote.  All of these licenses are certified by the OSI (Open Source Initiative - www.opensource.org), and of course the AL2 is the most liberal of them all, since it's a meta-license (it permits redistribution under any of the other licenses approved by the OSI).

All of the CLR's standard library (and any other .dll/.exe you might write, including any that provide integration with native libraries, whether on Linux, Windows, or the Mac, are available from your source code.

To build the project for mono (you'll need at least version 2.6.3; earlier versions crash on Sprixel), svn checkout the source, and  `xbuild Sprixel.sln`.   In 7-15 seconds, you'll get a Sprixel.exe, which is the first stage of the compiler chain.  Upon every invocation, Sprixel.exe builds and emits perlesque.exe, the grammar/compiler of which is specified in Sprixel's source code, in declarative/fluent-style C#.  perlesque.exe is then given the input file or string provided to Sprixel.exe as its input, and it is compiled to asmbly_1.exe, which is the compiled representation of your program.

You can then run  the man_or_boy test  by typing    mono bin/Release/Sprixel.exe t/man_or_boy.t  .  It emits TAP-style output. 

The code that the compiler-compiler emits uses a "stackless" engine, which just means the C runtime stack is not used to represent the callframes/stack of the language.  In Sprixel's case, it uses a continuation-returning-style (to a trampoline).  If you do run the above man_or_boy test, take a look at the generated asmbly_1.exe in Reflector to see what I mean.

You might ask: "What about the gigantic performance hit due to reifying the language's stackframes?"  I would answer: that's the cost of the amazing benefit of "emulating" full-blown continuations on the CLR.  The "callframe" keyword in the Perlesque language is an expression that returns the current frame.  It would enable someone to write the equivalent of call/cc or let/cc in Perlesque, from which coroutines and iterators and other state machines can be constructed.

I recommend viewing Perlesque as a human-readable assembly language (since it's fully strongly-typed using the CTS (the CLR's Common Type System)), excellent for implementing programming languages when combined with a compiler-compiler framework such as CSMeta.

Visit csmeta.org to view the code, which is safe to read, even for employees of large software companies who fear unintentionally copying source code.  Contact me if you would like to contribute, or chat with us on IRC in #perl6 on the freenode network.

There are several major things remaining in the language before I move on to porting over the necessary bits of TimToady's parser from Perl 5 to Perlesque (or C#), on which I will implement Perl 6:
  1. An actual expression-parser (using the "precedence-climbing" algorithm - Google it) - currently you specify precedence with parentheses.  This is somewhat low on the priority list, since I am viewing Perlesque as an assembly language of sorts.
  2. Declarative classes/fields/methods in Perlesque (Perl 6 calls fields "attributes", not the same as C# attributes).  This should be finished this week.
  3. External module/DLL loading.  This can happen at parse-time, due to the way RunSharp works (thanks Stefan!).
  4. Array declaration/creation.  I haven't yet decided how to co-opt the Perl 6 syntax for this.  I probably should have mentioned above that CLR generic types are directly represented in Perlesque identically to how they're represented in C#, except with [ and ] instead of < and >.  And yes, that's implemented.