I had, along with chromatic and some other people, been thinking about Lorito as a very very low-level language, similar to an assembly language. The idea being that each Lorito op would share a close correspondence with an operation in the JIT engine, and would therefore be trivially easy to compile at runtime.
Plobsing was taking a different approach: He's thinking about a higher-level structured language that we would parse down into native JIT format at build time. There are some merits to this idea and since he mentioned it to me I've been thinking about it more and more. Let's take a look at things in more detail, focusing on LLVM specifically.
First, LLVM uses it's own platform-neutral LLVM bytecode format natively. High-level code is parsed and compiled down to LLVM bytecode which can then be optionally optimized before conversion to machine code. That first part ("parsed and compiled") is expensive: We don't want to be parsing and compiling to LLVM bytecode at runtime, that would eat up any potential gains we could get from JIT in the first place. What we want is for the Lorito code to be compiled only once: during the Parrot build. From there we will have LLVM .bc files which contain the raw bytecode definitions for each op, which can be combined together into large methods or traces and compiled to machine code in one go.
Ops are defined in .ops files, which are currently written in a preprocessed C-like language but which will eventually be rewritten in Lorito. During the build, we want two things to happen: First, we want to compile the Lorito down into machine code, as we currently do with our ops, for interpretted cores (fast core, switch core, etc). Second, we want to compile down the ops into LLVM bytecode for use with the JIT at runtime. By comparison:
Interpretted: Good startup time, Reasonable execution time
JIT'd: Slower startup time, better execution time. Many tradeoffs available between the two (usually in terms of adding/removing optimization stages)
For long running programs, costs necessary to startup the JIT engine can be amortized over the entire execution time, which produces benefits overall.
But, I'm digressing. The goal of Lorito was to produce a language that would be trivially easy to JIT at runtime. I was assuming that what we needed was a very small language to perform the job. However, what's really the most trivial to JIT is LLVM's native bytecode format. If we generate that bytecode at compile time, the runtime costs will stay to a minimum. This means that we can have Lorito be any language of any complexity: The only requirements are that we be able to compile it down to LLVM bytecode, hopefully in a process that isn't overly complex or fraught with error possibilities. So long as the conversion only happens once at build time, it doesn't really matter how complicated it is.
Any parser for anything that's more complicated than assembly language will take development effort. The tradeoff is reduced development effort when we start rewriting the ops in Lorito, and increased inclination to write more things in Lorito than just ops. For instance, rewriting PMCs and their VTABLEs in Lorito means that we can start exposing those to the JIT as well. The more of Parrot that we expose to the JIT, the more gains there are to be had from it.
Assuming Lorito is going to now be a relatively high-level structured language as plobsing suggests, the question now is what should it look like? Should it be C, or like C? Should it be NQP, or like NQP? NQNQP?
As a thought experiment, consider the case where Lorito is C. In that case, our ops and VTABLEs are already all written in Lorito, and we already have a Lorito compiler available (LLVM). In this case, there are only a handful of steps to do before Parrot has a working (if naive) JIT:
- Add a configure probe to detect LLVM
- During the build, compile down ops to LLVM Bytecode, in addition to everything else we already do with them
- Add a utility function to compile a stream of bytecode (such as an entire Sub) to machine code using the LLVM bytecode.
- Add a way (such as in the Sub invoke VTABLE) to cache methods that have already been compiled, and add a switching mechanism to invoke that instead of the bytecode
I'm sure this idea won't jive well with chromatic's "welcome to the new millenium" general distaste for C. I'm not 100% sure about it all myself. However, it is interesting and worthy of some consideration. It certainly does seem like a path of least resistance