What Changes?
Let's take a look at normal work flow from the perspective of the VM. The programmer writes a program. This could be directly in PIR, or in a high-level language which is compiled down to PIR. The exact path doesn't matter as far as Parrot is concerned, in either case it takes a PIR input file to execute.
IMCC, Parrot's PIR compiler front-end compiles the PIR code into PBC byte code. It's this PBC that's actually executed by Parrot. Parrot's runcore (I use the singular here, but Parrot actually has several distinct cores, a fact that is not important here) reads the bytecode and separates out the various elements. The opcodes, which have a one-to-one relationship with the PIR statements, are integer numbers in PBC. These opcode numbers are typically passed into a lookup table and a corresponding handler function is called to perform the actions of that op. In C, this operation looks similar to this:
while(!halt) {
(interp->opcode_table[opcode])(interp, args)
}
This, in essence, is the "core" of Parrot. Each opcode is implemented by a separate C function which is called from a dispatch table by numerical index. Those opcode functions in turn call various API functions to do their calculations. Quick Recap: Opcode is a numerical index into a lookup table. It calls a C function, which in turn calls API functions to perform the required operation. Execution state is saved in the interpreter object (or in a large variety of structures which are attached to it, including the register arrays).
So what does L1 do to change this picture? Actually, not a whole hell of a lot, despite all the wonderful things I've tried to claim that L1 will do. The runcore? doesn't really change. The internal APIs? no change at all really (well, maybe a small change to the interface, but nothing huge). L1 ops are written in C, taking the place that PASM ops occupy now, and PASM ops instead are written in terms of L1 ops. Also, we're going to rewrite PMC VTABLEs in L1, but that's not exactly the core. The runcore now executes L1 bytecode instead of PASM bytecode, but almost everything else is the same.
So what changes? Why add that new layer? I'll explain a few reasons below.
L1 and JIT
Let's look at the JIT system again, god knows I love talking about it. What JIT does is convert an incoming bytecode stream into a machine code stream and executes that directly. So instead of the runcore loop example I showed above, we unroll the loop entirely and call all the functions in a long precompiled sequence. Or, we can get more fancy and eliminate the function calls entirely and simply execute the function guts in sequence. In this case, JIT is the runcore, not just a component of it. JIT takes over the execution of the opcodes by assembling them into machine code and executing the whole block of code directly. Basically, JIT executes the same opcode function logic, but gives us the ability to eliminate the dispatch logic that my previous runcore example used.
Also, a good JIT engine will perform machine-level optimizations, which gives us an added boost.
We want JIT because it lets us squeeze performance out of every loop iteration, and a program with one million opcodes will iterate over that runcore loop one million times. Plus, JIT gets us another benefit, which comes more or less for free: the ability to compile PIR programs into native machine code executables, like you can do already with C. You already have the machine code in memory, all we need to do is output the right file structure and then dump that machine code into a file. Bada-bing, executables.
So we want JIT. No question about that.
The big problem is that the opcode definition itself is very different from the JIT version of that opcode. The opcode logic is written in C and executed directly. The JIT version is a C function that writes the opcode logic at runtime and then executes it. Currently in Parrot, we have to write these things differently. I wrote a post recently that shows examples of each. See the files in /src/ops/* for the opcode definitions, and src/jit/* for the JIT versions of them (which are messy and ugly). When we change one we have to manually change the other for all platforms, which can be difficult if the person making the changes aren't familiar with all our platforms.
What we need is a way to automatically translate PASM opcodes into their JIT definitions AND their direct-executable function definitions. We do this by taking arbitrarily complex ops and decomposing them into atomic parts.
L1 and Control Flow
Let's now look at control flow. We have a PIR program that defines a class Foo and instantiates an object of that type. Here's a snippet:
$P0 = new 'Foo'
$P0 = 1
$P0 += 1
.namespace ["Foo"]
.sub add_i :vtable
...
.end
The third line in this snippet calls the add_i VTABLE, which calls the function src/pmc/object.pmc:add_i(). This in turn checks to see if we have a PIR-based override. We do, so we call back into PIR to execute that override. This creates a new runloop further down on the C stack to execute the PBC of the override. When that returns, the sub-runloop exits, returns to the C code in object.pmc, returns to the op definition in the top runloop, and then returns to the top runloop to execute the next op in our hypothetical little program.
This creates all sorts of problems here, and I'll give an example of one. Let's say an unhanded exception is thrown in the vtable override. The interpreter can't find a handler, so the runloop terminates and exits. This returns back to the code in object.pmc, which returns back to the opcode definition, which returns back to the top runloop, which has absolutely no knowledge of the exception. An unhandled exception should cause the VM to exit, but it gets lost in the call stack and does not have this behavior. The problem is that we have recursive calls into separate runloops separated by an arbitrary depth of C functions in between.
The solution to this problem is to unify to have only a single runloop active at any time in a given execution context, and to not allow recursive calls into other runloops. We do this, in turn, by having a single execution environment that is consistent on itself. This is L1.
L1 and Optimization
Besides the benefits in simplifying JIT, I haven't really discussed how L1 will help speed up Parrot. It's not what L1 itself will do that provides optimizations, it's what L1 enables that will do it. L1 provides a consistent input language where flow control and variable usage can be analyzed. We don't have that now because some of our flow control happens in C and some in PIR, and there's just no way to consistently analyze that. We can do flow analysis to optimize algorithms at the high level before we pass it to the JIT engine to optimize at the low level. We also gain the ability to do escape analysis and lifetime analysis on our PMCs for a variety of optimizations, especially in the GC, the PMC allocator, and maybe a few other places. We can also simplify (maybe eliminate entirely!) the stackwalking code in the GC. We can do all these things only after we have L1.
Conclusion
So I've tried to give a wholistic high-level overview of the current environment and why I think we need a new low-level code form. In terms of architecture, not a lot has to change to enable L1. However, the benefits that we can get from having it are huge in the long run.
A couple questions on the example. First, why is the add_i vtable in C? I suspect the answer is "because PIR cannot manipulate the structure of the VTABLE directly"? The followup of course, is, well, isn't that a problem with PIR, that it cannot manipulate data directly? Doesn't that kind of cripple PIR? I realize PBC has to be architecture independent and run on any machine regardless of where it was compiled. That doesn't mean there aren't ways to enable it to touch data directly, though.
ReplyDeleteThe second question is, why does the C call back into PIR? Supposing the vtable lookup absolutely must be in C, why not return back the object to call, and call it from PIR, rather than allow the call to descend?
Good answers to those questions might help justify L1 more in my eyes.
A followup from previous threads -- while it's been helpful to see that at least some string operations will be considered low level enough to qualify as L1 ops, I still have yet to find an answer as to whether L1 op implementations are free to use vector register sets. I've read up a small amount on various JIT technologies -- a few like to use the vector arrays, especially for parameter type conversions. There's active speculation on how various JIT engines could better use them.
(For example, using them to pass parameters. Altivec has some pretty powerful value shuffling commands that could be of use rearranging registers quickly)
There will need to be some sort of contract between the JIT engine and the C operations it invokes over the use of these resources. At the very least, if an op implementation wants to use the registers, there needs to be a way to tell JIT to avoid them around that operation. (And for that matter this applies to normalregisters as well.)
I suspect we couldbesmarter than that, and allow registers not used by JIT to be carried from one op to the next, which might be useful if we had ops that can be used in sequentially in various combinations. That's probably further along than anyone has thought L1 I realize.
builtin PMC types are currently written in C. There are a number of reasons for this, not the least of which is that core PMC types need access to C data types and various C functions. We don't want to give all these abilities to PIR because we end up tying PIR too closely to C semantics (which we want to avoid in general). Keep in mind that Parrot is designed for use with dynamic languages and optimization strategy for dynamic languages is to make them more dynamic. Giving PIR more C semantics would be exactly the opposite of that.
ReplyDeleteAs a general rule, we want to insulate the PIR programmer from arbitrary binary data and C pointers. Less access to these things means fewer segfaults and related problems.
To answer your second question, those are just the realities of C control flow semantics. As I showed in the post, opcodes are really just C functions. So anything that happens inside the opcode happens inside C. If we call something from there that follows the normal call semantics: push the arguments and the return address onto the system stack, jump to the new location, pull the arguments off the stack, wash rinse repeat.
The opcode add_i calls the function VTABLE_add_i(interp, pmc), which in turn calls Parrot_PCCINVOKE(interp, ...), which creates a new runloop and starts executing PBC at a given pointer offset. What you're really asking here is why can't C call back into an existing function frame somehow? Well, the short answer to that is that C can indeed be made to do that through longjmp and other trickery. The problems arise when we start talking about unifying that control flow with CPS, unifying it with signals and exceptions (including those that come from external libraries, extensions, or container functions where Parrot is embedded), unifying it with threads and callbacks, and then consolidating it with the stackwalking code in the GC (if we randomly jump back up the stack we could end up with unanchored PMCs which get prematurely collected), etc.
So instead of having a C version of a coroutine (calling back into an existing stack frame), we could have a PIR execution stack and then all sorts of messy stack management code to enable all those cool features with it.
An alternative idea is to completely forbid all boundary crossings like that. We don't allow C to call recursively into PIR, which means that PIR would need to be radically extended to handle all control flow, including very low-level accesses, which would be a problem because we don't want PIR coders to have to access all those funky low-level details.
A second alternative is to introduce a new layer in there, L1, that handles all control flow, never passes flow to C, and insulates the PIR programmer from the low-level details. Instead of recursive C calls on the system stack, and instead of explicitly maintaining our own PBC control stack, we just use L1 and use CPS to handle all our control flow needs without using any stacks. I personally think that's the much better solution.
Well, I do not find the arguments for keeping PIR from ever seeing pointer-like objects and structured binary data at all convincing, I have to say. Architecture dependencies really do not need to leak in there if it such abilities are added in a careful manner.
ReplyDeleteWhenever you utter a phrase like "prevent users from" you'd better have A) a very good reason why and B) a plan to allow everything else that does not conflict with that reason, rather than killing everything within the blast radius of the thing you are trying to avoid.
The arguments about calling I buy more.
It's my (somewhat limited) understanding that the C stack really isn't suitable for the jumping around modern languages do (especially with coroutines.) I have yet to see a clear presentation of exactly what the Parrot stack strategy is. There's docs for some lexpad API interfaces, but no really much to go on --
they are just stuff like "foo_create: creates a foo" which tells me not much.
Amonst all the "aren't continuations great" material I may have run across a moderately illuminating explanation of how it used to be a couple years ago but isn't now. The Coroutine source file is about as educational as being shooken upside down over a vat of boiling inhalants... perhaps it would be less so if I had any confidence that the comments in that area of the code were at all current, but I seem to remember when reading that (admittedly some time ago) it was one of those areas with a few XXX document this comments and others that just obviously were no longer true.
But the reasons concerning the arbitrary pointers and random memory lookups by address are the most important part! Those are the reasons why we use managed environments in the first place. You think anybody uses C# for the pure execution speed? No, they use it because the more you hide the nitty gritty details about the platform, the more you abstract, the safer and more productive your code can be. I can write functionality in C#, or in Java, or in Perl 5 in 10 minutes then I could write in C in a week precisely because I don't have to be dealing with all those low-level details.
ReplyDeletePeople use things like Parrot because they don't want to be dealing with C. If people wanted all the power of C, along with all it's semantics and restraints, they would use C. Parrot doesn't give it's PIR-level users access to write arbitrary data to arbitrary memory locations for a good reason: Because arbitrary pointers cause segfaults and all sorts of other problems. Parrot manages memory for you because memory is hard to manage and is very prone to all sorts of errors.
The Parrot stack strategy is: Don't use stacks. The documentation doesn't discuss them because we don't use them (That's sort of a lie, although the very last stack in Parrot is deprecated and scheduled for removal in 1.5). Instead of stacks we use register sets, and continuations, which implicitly form a sort of linked list.
You're right that a lot of our source files, including the coroutine file, are not well documented. We can try to work on that.
I'm still learning about CPS myself, but the more I see of it the more I like it. It's certainly just as great now as it ever has been. If we need more documentation about it, that's a valid point that we need to address.
We have to differentiate between people coding Parrot internals, and end users coding applications here. Crippling internals developers for the sake of keeping "the rabble" out of trouble is not a good strategy.
ReplyDeleteWhile it might seem easy to say "PIR is user level" and use that as the boundary between which developers need code security and which ones need the power derived from working without a net, it's an artificial boundary which can be enforced by any combination of implementation or even just policy -- e.g. "these opcodes are internal, do not use them in applications"
Since many internals are written in PIR, it's obviously also the wrong place to put that boundary.
At any rate, there are ways to make some pointer-level access safe without allowing random access to random addresses. This would involve opening small windows and having the runcore bounds check accesses automatically where necessary (and the ability to turn that off when one is trying to run mature code faster), and care for GC details so memory is not yanked out while the window is open.
I have to say, while I am really psyched for perl6, if it is held back from direct, efficient memory access by virtue of running on top of Parrot, it will be next to useless for my purposes. One of the reasons much of what I write is in C is because Perl5 sometimes makes this difficult... e.g. modifying values in SHMs shared with other processes. If I cannot treat shared memory as a buf8 or a structured data class, rather than funneling all accesses through some sort of read/write function set -- I'll just have to stick to C.
Personally I do not value languages protecting me from segmentation faults in the least. If I segfault, I debug, until the code does not segfault anymore. NULL PMC accesses are just as hard to track down, and sometimes unexpected DWIMs where program flow continues despite an error are even harder -- and in most cases have even more potentially dangerous results, since the damage tends to stay topical and can be hard to notice until it's too late.
Maybe I'm atypical, but I do not think so, actually. I have zero interest in working on a language that doesn't get me down to the metal when I need to go there... so I'll be interested to see if the Parrot community shares this stringent attitude as a whole.