Thursday, June 11, 2009

L1: The Language of Parrot Internals

A while back I mentioned using a small, special-purpose language to implement some of the Parrot internals, such as PMC VTABLEs and OPS. Well, the idea is slowly taking shape, and it's going by the name L1. I don't know the exact genesis of the name, so I can't point you to a link containing a "let's call this thing L1!" quote, but that's the name, and here's some discussion. By way of disclaimer, none of this is on my personal pre-2.0 roadmap, but I do intend to offer my moral support where possible, and to look very closely at this issue if it hasn't been resolved before 2.0.

cotto and bacek have been doing a lot of great work on a PCT-based PMC parser, and ultimately we hope that such a bootstrapping tool would help enable ubiquitous use of L1. I'll talk more about that later.

To understand why this is needed, and to put into context some of the things I am going to talk about in this post, let's look at a small example that comes directly from the libJIT documentation. Let's take a small function that multiplies two numbers together, adds a third, and returns the result:
int mul_add(int x, int y, int z)
{
    return x * y + z;
}

If we want to compile this operation at runtime with libJIT, we would have to do this:

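/* Create a context to hold the JIT's state, and lock it for building. */
jit_context_t context;
context = jit_context_create();
jit_context_build_start(context);

/* Describe the signature: int mul_add(int, int, int). */
jit_type_t params[3];
jit_type_t signature;
params[0] = jit_type_int;
params[1] = jit_type_int;
params[2] = jit_type_int;
signature = jit_type_create_signature(jit_abi_cdecl, jit_type_int, params, 3, 1);

/* Create the function object; this needs the signature built above. */
jit_function_t function;
function = jit_function_create(context, signature);

/* Fetch the three parameters as JIT values. */
jit_value_t x, y, z;
x = jit_value_get_param(function, 0);
y = jit_value_get_param(function, 1);
z = jit_value_get_param(function, 2);

/* Build the body: temp1 = x * y; temp2 = temp1 + z; return temp2. */
jit_value_t temp1, temp2;
temp1 = jit_insn_mul(function, x, y);
temp2 = jit_insn_add(function, temp1, z);
jit_insn_return(function, temp2);

/* Compile to machine code, then unlock the context. */
jit_function_compile(function);
jit_context_build_end(context);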

This listing is not the function itself, but a set of instructions telling the computer how to build the function at runtime. The computer builds and compiles the function into executable machine code and can then execute it. The benefit of JIT is that the compiled pieces are reusable, so the overhead of describing the function is paid only once and the resulting machine code can be executed over and over again. The mechanics of JIT aren't really important for this discussion. What is important is that the two versions of the code are very different from each other, and if we're generating code at runtime we potentially need to write several different versions of each code snippet.

Currently, we do JIT by writing the opcode definitions in a C-like script in one place, and writing JIT versions of them in another place. When one doesn't match the other, there's a problem. One thing that we absolutely need in Parrot, for our own sanity if nothing else, is the ability to specify operations in a common, simplified, non-C small language that can be converted into multiple forms and executed in multiple ways, as necessary. This, I think, is a good use for L1: an abstraction layer that lets us write an almost behavioral description of a piece of code and have that description used at build time to produce all the various pieces of C code and other code that we need.
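To make the "one description, several outputs" idea concrete, here is a minimal sketch in C. It is not L1 and not anything from the Parrot tree: the micro-operations (OP_MUL, OP_ADD, OP_RET), the Insn struct, and the functions interpret and emit_c are all names I made up for illustration. The mul_add example is written once as a table of instructions, and two back ends consume that same table: one executes it directly, the other emits equivalent C statements as text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical micro-operations for illustration; not real Parrot or L1 opcodes. */
typedef enum { OP_MUL, OP_ADD, OP_RET } OpKind;

/* One three-address instruction: regs[dst] = regs[a] <op> regs[b].
 * OP_RET ignores dst and b and returns regs[a]. */
typedef struct { OpKind kind; int dst, a, b; } Insn;

/* The single description of mul_add. Registers r0..r2 hold x, y, z. */
const Insn MUL_ADD[] = {
    { OP_MUL, 3, 0, 1 },  /* r3 = r0 * r1 */
    { OP_ADD, 4, 3, 2 },  /* r4 = r3 + r2 */
    { OP_RET, 0, 4, 0 },  /* return r4    */
};
enum { MUL_ADD_LEN = sizeof MUL_ADD / sizeof MUL_ADD[0] };

/* Back end 1: execute the description directly. */
int interpret(const Insn *code, int n, int *regs)
{
    assert(code != NULL && regs != NULL);
    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];
        switch (in->kind) {
        case OP_MUL: regs[in->dst] = regs[in->a] * regs[in->b]; break;
        case OP_ADD: regs[in->dst] = regs[in->a] + regs[in->b]; break;
        case OP_RET: return regs[in->a];
        }
    }
    return 0;  /* no OP_RET was reached */
}

/* Back end 2: emit equivalent C statements as text.
 * Returns the number of characters written, or -1 if out is too small. */
int emit_c(const Insn *code, int n, char *out, size_t cap)
{
    size_t used = 0;
    assert(code != NULL && out != NULL);
    if (cap == 0) return -1;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];
        int w = -1;
        switch (in->kind) {
        case OP_MUL:
        case OP_ADD:
            w = snprintf(out + used, cap - used, "r%d = r%d %c r%d;\n",
                         in->dst, in->a, in->kind == OP_MUL ? '*' : '+', in->b);
            break;
        case OP_RET:
            w = snprintf(out + used, cap - used, "return r%d;\n", in->a);
            break;
        }
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

With registers initialized to {3, 5, 2, 0, 0}, interpret(MUL_ADD, MUL_ADD_LEN, regs) returns 17, and emit_c writes three lines of C text for the same table. A third back end that drives a JIT library could walk the same table, so the description only has to be kept correct in one place.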

chromatic has a slightly different conception of what L1 could be, although I don't think it's entirely incompatible with what I was talking about above. Instead of simply being a common front-end language that gets converted into other forms for compilation and execution, chromatic suggests that it could be used in the virtual machine in the same way that microcode is used in a hardware processor. A small, fast Parrot core (called "nanoparrot") would execute the L1 microcode directly. There are some different ideas here about what the relationship will be between PIR/PASM and L1, but I will talk about those differences later. chromatic also hopes that a pure L1 execution environment would be self-contained and save us from the frequent switching between C and PIR calling conventions. Let's explore that idea a little bit more.

Consider PIR code that is executing and reaches an operation which internally calls PCCINVOKE on a PIR subroutine. This creates a new runloop to execute the PIR subroutine, which returns a value to PCCINVOKE, which in turn returns results back to PIR. All the while, we are marshaling data back and forth between two very different environments (C and PIR). Likewise, consider the case of PIR code executing and throwing an exception. Parrot searches for a handler (which itself may be PIR or C, which in turn calls a function in PIR or C again, ad infinitum), executes it, and possibly returns execution to where it was. We're jumping back and forth between C and PIR for control flow, creating runloops and shuffling data all too frequently. The situation, in short, is complex and unsustainable in the long run. Plus, there are serious performance problems associated with all this jumping between PIR and C.

Now consider the alternative of a pure L1 execution environment, where PASM opcodes are little more than named sequences of L1 opcodes. L1 code is executing, and throws an exception. A return continuation is created and passed to an L1-based handler, which returns control through the return continuation. No C code involved. Almost too good to be true. Almost. The case of calling into L1 code from C is a little bit more tricky, but not impossible. Instead of passing an ordinary return continuation, we pass a C-based return continuation, which will probably consist of a cached image of the interpreter structure and a jump point or something. In any case we still save on performance because the arguments get passed in PIR registers and that's where they stay.
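The phrase "named sequences of L1 opcodes" can be sketched in a few lines of C. This is only an illustration under my own assumptions, not a proposed design: the micro-ops (M_MUL, M_ADD, M_MOV), the MacroOp table, and the functions find_op, run_micro and run_op are all hypothetical, and for simplicity each named op works on fixed registers rather than taking operands. The point is that only the small micro-op core touches the registers, and every higher-level op is just data that the core walks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical micro-ops for a tiny register machine; not actual L1 opcodes. */
typedef enum { M_MUL, M_ADD, M_MOV } MicroKind;
typedef struct { MicroKind kind; int dst, a, b; } MicroOp;

/* A higher-level op is nothing more than a name and a micro-op sequence. */
typedef struct { const char *name; const MicroOp *body; size_t len; } MacroOp;

static const MicroOp MULADD_BODY[] = {
    { M_MUL, 3, 0, 1 },  /* r3 = r0 * r1 */
    { M_ADD, 3, 3, 2 },  /* r3 = r3 + r2 */
};
static const MicroOp DOUBLE_BODY[] = {
    { M_ADD, 0, 0, 0 },  /* r0 = r0 + r0 */
};

const MacroOp OP_TABLE[] = {
    { "muladd", MULADD_BODY, sizeof MULADD_BODY / sizeof MULADD_BODY[0] },
    { "double", DOUBLE_BODY, sizeof DOUBLE_BODY / sizeof DOUBLE_BODY[0] },
};
enum { OP_TABLE_LEN = sizeof OP_TABLE / sizeof OP_TABLE[0] };

/* Look up a higher-level op by name; returns NULL if it is unknown. */
const MacroOp *find_op(const char *name)
{
    for (size_t i = 0; i < OP_TABLE_LEN; i++)
        if (strcmp(OP_TABLE[i].name, name) == 0)
            return &OP_TABLE[i];
    return NULL;
}

/* The micro-op core: the only code that reads or writes registers. */
void run_micro(const MicroOp *body, size_t len, long *regs)
{
    assert(body != NULL && regs != NULL);
    for (size_t i = 0; i < len; i++) {
        const MicroOp *m = &body[i];
        switch (m->kind) {
        case M_MUL: regs[m->dst] = regs[m->a] * regs[m->b]; break;
        case M_ADD: regs[m->dst] = regs[m->a] + regs[m->b]; break;
        case M_MOV: regs[m->dst] = regs[m->a]; break;
        }
    }
}

/* Run a named op by expanding it into its micro-op sequence.
 * Returns 0 on success, -1 if the name is not in the table. */
int run_op(const char *name, long *regs)
{
    const MacroOp *op = find_op(name);
    if (op == NULL) return -1;
    run_micro(op->body, op->len, regs);
    return 0;
}
```

Starting from registers {3, 5, 2, 0}, run_op("muladd", regs) leaves 17 in r3. Adding a new higher-level op means adding a row to OP_TABLE, not writing new C, which is the property that would make such a core attractive to JIT.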

What I personally would like from L1 is a unified solution, something that meets these requirements:
  1. Has to be easy, trivially easy, to JIT.
  2. Has to be a small set of operations capable of implementing all other ops.
  3. Should be suitable, at least in the long term, for implementing VTABLEs and METHODs in PMCs.
chromatic suggests that L1 should be a subset of PIR, but I'm not sure I agree with that. I am thinking right now (and I may change my mind as I think about this more) that it should be a separate assembly-level language entirely. There are a number of reasons for that, and I will discuss them later. I have plenty more to say about L1, and I'll do that in later posts.
