Let's take a look at the normal workflow from the perspective of the VM. The programmer writes a program, either directly in PIR or in a high-level language which is compiled down to PIR. The exact path doesn't matter as far as Parrot is concerned; in either case, Parrot takes a PIR input file to execute.
IMCC, Parrot's PIR compiler front-end, compiles the PIR code into PBC bytecode. It's this PBC that's actually executed by Parrot. Parrot's runcore (I use the singular here, but Parrot actually has several distinct cores, a fact that is not important here) reads the bytecode and separates out the various elements. The opcodes, which have a one-to-one relationship with PIR statements, are integer numbers in PBC. Each opcode number is typically used as an index into a lookup table, and the corresponding handler function is called to perform the actions of that op. In C, this operation looks similar to this:
This, in essence, is the "core" of Parrot. Each opcode is implemented by a separate C function which is called from a dispatch table by numerical index. Those opcode functions in turn call various API functions to do their calculations. Quick recap: an opcode is a numerical index into a lookup table; it selects a C function, which in turn calls API functions to perform the required operation. Execution state is saved in the interpreter object (or in one of the many structures attached to it, including the register arrays).
So what does L1 do to change this picture? Actually, not a whole hell of a lot, despite all the wonderful things I've tried to claim that L1 will do. The runcore? Doesn't really change. The internal APIs? No change at all, really (well, maybe a small change to the interface, but nothing huge). L1 ops are written in C, taking the place that PASM ops occupy now, and PASM ops instead are written in terms of L1 ops. Also, we're going to rewrite PMC VTABLEs in L1, but that's not exactly the core. The runcore now executes L1 bytecode instead of PASM bytecode, but almost everything else stays the same.
So what changes? Why add that new layer? I'll explain a few reasons below.
L1 and JIT
Let's look at the JIT system again, god knows I love talking about it. What JIT does is convert an incoming bytecode stream into a machine code stream and execute that directly. So instead of the runcore loop example I showed above, we unroll the loop entirely and call all the functions in a long precompiled sequence. Or, we can get fancier and eliminate the function calls entirely, simply executing the function guts in sequence. In this case, JIT is the runcore, not just a component of it. JIT takes over the execution of the opcodes by assembling them into machine code and executing the whole block of code directly. Basically, JIT executes the same opcode function logic, but gives us the ability to eliminate the dispatch logic that my previous runcore example used.
Also, a good JIT engine will perform machine-level optimizations, which gives us an added boost.
We want JIT because it lets us squeeze performance out of every loop iteration, and a program with one million opcodes will iterate over that runcore loop one million times. Plus, JIT gets us another benefit more or less for free: the ability to compile PIR programs into native machine code executables, like you already can with C. We already have the machine code in memory; all we need to do is output the right file structure and then dump that machine code into a file. Bada-bing, executables.
So we want JIT. No question about that.
The big problem is that the opcode definition itself is very different from the JIT version of that opcode. The opcode logic is written in C and executed directly. The JIT version is a C function that writes the opcode logic at runtime and then executes it. Currently in Parrot, we have to write these things separately. I wrote a post recently that shows examples of each. See the files in src/ops/* for the opcode definitions, and src/jit/* for the JIT versions of them (which are messy and ugly). When we change one we have to manually change the other for all platforms, which can be difficult if the person making the changes isn't familiar with all our platforms.
What we need is a way to automatically translate PASM opcodes into their JIT definitions AND their direct-executable function definitions. We do this by taking arbitrarily complex ops and decomposing them into atomic parts.
L1 and Control Flow
Let's now look at control flow. We have a PIR program that defines a class Foo and instantiates an object of that type. Here's a snippet:
$P0 = new 'Foo'
$P0 = 1
$P0 += 1
.sub add_i :vtable
The third line in this snippet calls the add_i VTABLE, which calls the function src/pmc/object.pmc:add_i(). This in turn checks to see if we have a PIR-based override. We do, so we call back into PIR to execute that override. This creates a new runloop further down on the C stack to execute the PBC of the override. When that returns, the sub-runloop exits, returns to the C code in object.pmc, returns to the op definition in the top runloop, and then returns to the top runloop to execute the next op in our hypothetical little program.
This creates all sorts of problems, and I'll give an example of one. Let's say an unhandled exception is thrown in the vtable override. The interpreter can't find a handler, so the runloop terminates and exits. This returns back to the code in object.pmc, which returns back to the opcode definition, which returns back to the top runloop, which has absolutely no knowledge of the exception. An unhandled exception should cause the VM to exit, but it gets lost in the call stack and does not have this behavior. The problem is that we have recursive calls into separate runloops, separated by an arbitrary depth of C functions in between.
The solution to this problem is to have only a single runloop active at any time in a given execution context, and to disallow recursive calls into other runloops. We do this, in turn, by having a single execution environment that is consistent with itself. This is L1.
L1 and Optimization
Besides the benefits in simplifying JIT, I haven't really discussed how L1 will help speed up Parrot. It's not what L1 itself will do that provides optimizations, it's what L1 enables that will do it. L1 provides a consistent input language where flow control and variable usage can be analyzed. We don't have that now because some of our flow control happens in C and some in PIR, and there's just no way to consistently analyze that. We can do flow analysis to optimize algorithms at the high level before we pass them to the JIT engine to optimize at the low level. We also gain the ability to do escape analysis and lifetime analysis on our PMCs for a variety of optimizations, especially in the GC, the PMC allocator, and maybe a few other places. We can also simplify (maybe eliminate entirely!) the stackwalking code in the GC. We can do all these things only after we have L1.
So I've tried to give a holistic, high-level overview of the current environment and why I think we need a new low-level code form. In terms of architecture, not a lot has to change to enable L1. However, the benefits that we can get from having it are huge in the long run.