Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Monday, September 27, 2010

PDD03 Calling Conventions Critique

I've wanted to get back into this habit for a while, and today is the day to do it. Following a short conversation with Parrot hacker plobsing yesterday, I've decided to tackle PDD03, the "Calling Conventions" design document first. In the coming days I would also like to take a close look at some other PDDs for systems which are receiving current developer attention. Quotes are all from the text of PDD03.

FAQ: Given Parrot's internal use of continuation-passing style ["CPS"], it
would be possible to use one pair of opcodes for both call and return, since
under CPS returns are calls.  And perhaps someday we will have only two
opcodes. But for now, certain efficiency hacks are easier with four opcodes.

"Perhaps someday"? Is this an authoritative design document or a dumping ground for assorted wishful thinking? This document should say exactly what Parrot wants in the long run. Do we want two opcodes for simplicity, or do we want four opcodes because of optimization potential? I would suggest that we have two opcodes only, and we can implement optimizing behaviors by having multiple types of Continuation object, with at least one type specifically optimized for simple subroutine returns. We used to have the RetContinuation PMC type, and while it wasn't the right solution for the problem it did have a glimmer of a good idea buried in it.
set_opcode "flags0, flags1, ..., flagsN", VAL0, VAL1, ... VALN
get_opcode "flags0, flags1, ..., flagsN", REG0, REG1, ... REGN
get_opcode "..., 0x200, flags0, ...", ..., "name", REG0, ...
It's hard to talk about needing to add extra opcodes to facilitate "efficiency hacks", and then saying that for every single parameter pass and retrieval that we need to parse integer values from a constant string. Mercifully, this isn't what we do anymore. In all these cases the first argument is a FixedIntegerArray of flags, which is computed by the assembler and serialized as a constant into the bytecode. At the very least, the design document should be updated to reflect that bit of sanity.

What's interesting here is the fact that these opcodes are variadic. That is, these opcodes (which, I believe, are unique among all 1200+ opcodes in Parrot) take an open-ended list of arguments. This makes traversing bytecode extremely difficult, which in turn makes tools for generating, reading, and analyzing bytecode extremely difficult, and needlessly so. Far superior to this would be to serialize information about the register indices into the same PMC that contains the flags for those parameters.

Right now, we use a special PMC called a CallContext to help facilitate subroutine calls. The CallContext is used to pack up all the arguments to the subroutine, and then it also serves as the dynamic context object for the subroutine. It contains the actual storage for the registers used, and handles unpacking arguments into those registers. It also manages some other things for context, like lexical associations between subs and other details.

In short, the CallContext stores some static data that we usually know about at compile time: argument lists and argument signatures, parameter signatures, etc. It also knows where arguments are coming from, whether they are coming from constants or registers, and which registers exactly are being used. All this information is needlessly calculated at runtime when it could easily be computed at compile time and stored in the constants table of  the PBC. If we split CallContext up into a dynamic part (CallContext) and a static part (CallArguments and maybe CallParameters), we could serialize the static part at compile time and avoid a lot of runtime work.

Here's an example call:

set_args callargs # A CallArguments PMC, which can be loaded from PBC as a constant
invokecc obj, meth

Or, we could cut out the separate opcode entirely:

invokecc obj, meth, callargs

Here's an example subroutine using a new mechanism:

.pir sub foo
callcontext = get_params foo
callparams = get_param_defs foo
unpack_params callcontext, callparams

Here, again, "callparams" is a PMC containing static information about the parameter signature of the subroutine, and can easily be serialized to bytecode to avoid runtime calculations.

With this kind of a system we have three PMCs which can work together in tandem to implement a calling sequence, and can be easily overridden by HLLs to implement all sorts of custom behaviors. Plus, we get good separation of concerns, which is a typical component of good design. We have CallContext, which serves as the runtime context of the subroutine and acts as storage for registers and lexical associations. CallArguments performs a mapping of callee registers into a call object. Then we have a CallParameters which performs the inverse mapping of call arguments into parameter registers. With these three things anybody could write their own calling conventions and have them be seamlessly integrated into Parrot without too much trouble. At any time you can pack arguments into a CallContext using a CallArguments PMC (which you can created at runtime or serialize to PBC), and then unpack them again using a CallParameters PMC (which can also be created at runtime or compile time). Tailcall optimizations, loop unwinding, recursion avoidance, and all sorts of other optimizations become trivially possible to implement at any level.
For documentation purposes we'll number the bits 0 (low) through 30 (high).
Bit 31 (and higher, where available) will not be used.
Is there a particular reason why bit 31 is off limits? Is this only because INTVAL is a signed quantity, and we don't want to be monkeying with the sign bit (because an optimizing compiler may monkey with it right back)? That would make sense, but is absolutely unexplained here.
The value is a literal constant, not a register.  (Don't set this bit
yourself; the assembler will do it.)
This is something that upsets me to no end, and I would be extremely happy to change this particular behavior. Here's the current behavior, in a nutshell: Every single parameter has a flag that determines whether the parameter comes from a register or from the constants table in the bytecode. That means for every single parameter, on every single function call, we need to do a test for constantness and branch to appropriate behavior. For every single parameter, for every single function call. Let that sink in for a minute.

Consider an alternative. We have a new opcode like this:

register = get_constant_p_i index

The "_p_i" suffix indicates that the opcode returns a PMC and takes an integer argument. We could have variants for "_i_i", "_n_i" and "_s_i" too.

Let's compare two code snippets. First, the current system:

$I0 = 0
loop_top:
foo($I0, "bar")
inc $I0 
if $I0 < 100 goto loop_top

And now, in a system with a get_constant opcode:

$I0 = get_constant 0
$S1 = get_constant 1
$I2 = get_constant 2
loop_top:
foo($I0, $S1)
inc $I0
if $I0 < $I2 goto loop_top

It looks like more opcodes total, but this second example would probably execute faster. Why?

The foo() function is called 100 times. In the current system, every single time the function is called, every single time, the PCC system must ask whether the first argument is a constant (it never is) and whether the second argument is a register (it never is). Then, every single time, it needs to load the value of the second argument from the constant table. This is also not to mention the fact that the "if" opcode at the bottom could be loading the value of it's second argument from the constants table if that argument was a string or a PMC.  INTVAL arguments are stored directly in the PBC stream, so they don't need to be loaded from the constants table. Luckily, I think serialized PMCs are thawed once when the PBC is loaded and don't need to be re-thawed from the PBC each time that they are loaded.Of course, I could nit-pick and suggest we should only thaw PMCs lazily on-demand and cache them, but that's a detail that probably doesn't matter a huge amount in the long run (unless we start saving a hell of a lot more PMCs to bytecode).

Of course, for code generators we're going to need a way to get the unique integer index for each stored constant, which likely means we're going to need an improved assembler, but it is doable.

If this bit [:flat] is set on a PMC value, then the PMC must be an aggregate.  The
contents of the aggregate, rather than the aggregate itself, will be passed.
If the C bit is also set, the aggregate will be used as a hash; its
contents, as key/value pairs, will be passed as named arguments.  The PMC
must implement the full hash interface.  {{ TT #1288: Limit the required interface. }}

If the C bit is not set, the aggregate will be used as an array; its
contents will be passed as positional arguments.


The meaning of this bit is undefined when applied to integer, number, and
string values.

I understand what this passage is trying to say, but it's still pretty confusing. Plus, I always find it funny when our design documents contain references to tickets (and in this case, a ticket that hasn't drawn a single comment in 10 months from anybody who could make a reasonable decision on the issue). There's probably a bigger discussion to be had about what it means when a PMC declares that it "provides hash". Parrot does have some support for roles, but that support is thin and mostly untested. Plus, nowhere do we define what any of our built-in roles like "hash" and "array" actually mean. That's a different problem for a different blog post, but it is worth mentioning here.

Here's a slightly better description: If you use the ":flat" flag, Parrot is going to create an iterator for that PMC and iterate over all elements in the aggregate, passing each to the subroutine. If the ":named" flag is used, that iteration will be done like hash iteration (values returned from the iterator are keys). Otherwise, the iteration will be normal array-like iteration. If the given PMC does not provide a get_iter VTABLE, an exception will be thrown. There's no sense talking about how the PMC must satisfy any kind of interface, since the only thing we require is that the aggregate is iterable.

It's worth noting that after looking at the code I don't think Parrot follows this design. I don't think the presence of the :named flag with the :flat flag changes the behavior at all. It appears from reading the code that hashes or hash-like PMCs are always iterated as hashes and the contents will always be passed as name/value pairs. Arrays and array-like PMCs are always iterated as arrays and their contents are always passed individually. Where a PMC is both array-like and hash-like at the same time, it's contents are iterated as an array and passed individually. I do not know whether this behavior is acceptable (and the design should be updated) or whether the implementation is lacking and the design is to be followed. I may try to put together some tests for this behavior later to illustrate.

If you're C-savvy, take a look at the "dissect_aggregate_arg" function in src/call/args.c file for the actual implementation.

As the first opcode in a subroutine that will be called with
invokecc or a method that will be called with call_methodcc, use
the get_params opcode to tell Parrot where the subroutine's or
method's arguments should be stored and how they should be expanded.

It's interesting to me that there would be any requirement on this being the first opcode. It seems to me that we should be able to unpack the call object in any place, at any time. That's a small nit, things work reasonably well with this weird restriction in place. I think we can support a wider range of behaviors, though I won't say anything about the cost/benefit ratio of the effort needed to do it.

Similarly, just before (yes, before) calling such a subroutine or
method, use the get_results opcode to tell Parrot where the return
values should be stored and how to expand them for your use.

I don't think this is the case anymore, but I do need to double-check the code that IMCC is currently generating. Obviously in a pure-CPS system we don't want to be going through this nonsense. Either Parrot does the sane thing and we need to update the docs, or Parrot doesn't do the sane thing and we need to update the design. Either way, kill this passage.

If this bit [:slurpy] is set on a P register, then it will be populated with an
aggregate that will contain all of the remaining values that have not already
been stored in other registers.

...

If the named bit is not set, the aggregate will be the HLL-specific array
type and the contents will be all unassigned positional arguments.

Which array type? We have several of them. If we mean ResizablePMCArray, we should say "ResizablePMCArray" so people know how to do the overriding.

An I register with this bit set is set to one if the immediately preceding
optional register received a value; otherwise, it is set to zero.  If the
preceding register was not marked optional, the behavior is undefined; but
we promise you won't like it.

Undefined behavior? In my Parrot? Why can't we define what the behavior is? We can promise that the behavior will be bad, but we can't even hint about what that behavior must be? That's pretty generous of us!

We could trivially identify these kinds of issues at PIR compile time, or we could catch these situations at runtime and throw an exception. Undefined behavior is precisely the kind of thing that design documents should be looking to clear up, not institutionalize. I am interested to know whether we have any tests for this, or if we could start writing some.

If this bit is set on a P register that receives a value, Parrot will ensure
that the final value in the P register is read-only (i.e. will not permit
modification).  If the received value was a mutable PMC, then Parrot will
create and set the register to a {not yet invented} read-only PMC wrapper
around the original PMC.

Future Notes: Parrot's algorithm for deciding what is writable may be
simplistic.  In initial implementations, it may assume that any PMC not of a
known read-only-wrapper type is mutable.  Later it may allow the HLL to
provide the test.  But we must beware overdesigning this; any HLL with a truly
complex notion of read-only probably needs to do this kind of wrapping itself.

Ah, something that looks pretty smart, though it's clearly listed in the PDD as "XXX - PROPOSED ONLY - XXX", which is not really a good sign. I've never been too happy with Parrot's current mechanism for marking PMCs as read-only anyway. This is a pretty interesting feature, though in current Parrot I'm not sure it could be implemented to any great effect. I may also like to see something like a ":clone" flag that forces a copy to be passed instead of a reference, or a ":cow" flag which produces a copy-on-write reference. Either way, we would probably like some kind of mechanism to specify that a caller will not be playing with data referenced by a passed PMC. This is especially true when you start to consider alternate object metamodels, or complex HLL type mappings: we don't want libraries modifying objects that the don't understand, and creating results that are going to destabilize the HLL. Having a guarantee that mistakes in the callee can't be propagated back through references to the caller would be a nice feature to have. Eventually.

Named values (arguments, or values to return) must be listed textually after
all the positional values.  fla and non-flat values may be mixed in any
order.

Is this true? I see no reason in the code why named arguments must be passed after positional arguments. I do see a reason why named parameters must be specified after positional parameters, however. Consistency is good, but I tend to prefer that Parrot not implement unnecessary restrictions. Plus, it should be very possible for an HLL to override the default CallContext and other related PMC types and implement their own behaviors for things like ordering, overflow, and underflow.

That brings me to a point of particular unhappiness with this PDD: It is extremely focused on the behavior of the combination of IMCC, PIR, and built-in data types. It's not hard, with all the Lorito talk flying around, to imagine that in a relatively short time Parrot could be PIR free: System languages like NQP and Winxed could compile directly down to Lorito bytecode, and not involve PIR at all. We could be using HLL mapped types for CallContext and other things to completely change all this behavior. Explaining what the defaults are is certainly important, and suitable for in-depth documentation. Explaining how the defaults work and interoperate with HLL overriding types, and how the system should be extremely dynamic and pluggable is absolutely missing from PDD 03.

Named targets can be filled with either positional or named values.
However, if a named target was already filled by a positional value, and
then a named value is also given, this is an overflow error.

I find this a little bit confusing. Why can a named parameter be filled with a positional argument, but not the other way around? I suggest that a named parameter only takes a named argument, and a positional parameter only takes a positional argument. We should either provide other types (like :lookahead) if we want other behaviors, or we should allow the user to subclass the PMCs that implement this behavior and allow them to put in all the crazy, complicated rules that they want. Parrot defaults should be sane and general. Everything else should be subclassable or overridable.

The details included in this PDD are almost as troubling as some of the details omitted. Information about signature strings, which are used everywhere and are the only way to call a method from an extension or embedding program, are completely omitted. Being able to specify a signature as a string is a central part of the current PCC implementation, so that makes no sense to me. Information about central PMC types, like CallContext is nowhere to be found either, much less information about how to override these types, and the interfaces that they are expected to implement. Making calls from extension or embedding programs using variadic argument lists is completely missing. The differences between method and subroutine invocations, the difference between invoke and invokecc opcodes (And the implications of CPS in general) is missing. MMD is never mentioned. Tailcalls and optimizations related to them is missing. Details about passing arguments and extracting parameters from continuations and exceptions is missing. New and proposed flags like :call_sig and :invocant are not mentioned.

In my previous blog post, I mentioned four common problems that our PDDs suffered from. This document suffers from several. First, it's more descriptive than prescriptive, doing it's best to document what the defaults in Parrot were in 2008. Second, this document is rapidly losing touch with reality as changes the the PCC system are pushing the capabilities of Parrot beyond what the document accounts for. Third, it has an extremely narrow focus on IMCC/PIR, and is vague or is completely silent about any other possibilities, especially those (such as Lorito) that may play a dramatic role in the implementation of this system in the future.

PDD 03 doesn't tell our current users how to effectively use the calling conventions system of Parrot, and does nothing to direct our developers on how to improve it going forward. It really needs to be completely deleted and rewritten from the ground up. When it is, I think we will find some gold, both in terms of exposing virtues of the current implementation and describing plenty of opportunities for drastic improvement.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.