Sunday, July 19, 2009

Rethinking Parrot Execution

I've been thinking recently about Parrot's execution strategy, prompted in part by some of the issues we've been seeing with the inferior runloops problem.

The inferior runloops problem, so far as I understand it, is this: Parrot can create multiple runloops on the C system stack, which execute code independently and can interfere with each other in strange and complicated ways. There are two main ways we could go about resolving this. The first is to painstakingly encapsulate everything properly and implement all sorts of runtime checks to make sure the various special cases can't happen and cause data corruption. The second is to prevent Parrot from recursing into multiple runloops at all (or, if it is unavoidable, to do so very rarely).

The second option seems like the obvious solution: "It causes problems? Then don't do it!" However, it's not as simple as that. The problem comes in many forms, but let's talk specifically about PMC VTABLEs. A PMC's VTABLEs can be overridden from PIR, so the VTABLE you are calling from C may turn out to be a PIR sub, which has to be called inline because the C code cannot return to the runloop first. So there's no choice but to recurse into a new runloop. We could set up some elaborate system with longjmps to handle this kind of situation, but that would create new problems that we would rather avoid.
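To make the VTABLE case concrete, here is a toy model in Python. It is only a sketch: `Interp`, `Integer`, `OverriddenInteger`, and `say_op` are made-up names, not Parrot APIs, and "bytecode" is just a list of Python callables. It shows why a native caller that needs a result immediately ends up starting a second runloop on the host stack.

```python
# Toy model only: none of these names are Parrot APIs.
class Interp:
    def __init__(self):
        self.depth = 0       # runloops currently live on the host stack
        self.max_depth = 0   # deepest nesting seen so far

    def runloop(self, ops):
        """Execute a list of ops; each op is a callable taking the interpreter."""
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            result = None
            for op in ops:
                result = op(self)
            return result
        finally:
            self.depth -= 1


class Integer:
    """A 'PMC' whose get_string vtable is implemented natively."""

    def __init__(self, value):
        self.value = value

    def get_string(self, interp):
        return str(self.value)


class OverriddenInteger(Integer):
    """Same PMC, but get_string is overridden by 'bytecode' (a list of ops)."""

    def __init__(self, value, override_ops):
        super().__init__(value)
        self.override_ops = override_ops

    def get_string(self, interp):
        # The caller is native code in the middle of an op and needs the
        # result now, so the only way to run the override is a nested runloop.
        return interp.runloop(self.override_ops)


def say_op(pmc, out):
    """Build an op that stringifies a PMC through its vtable, as native code would."""
    def op(interp):
        out.append(pmc.get_string(interp))
    return op


def demo():
    out = []
    interp = Interp()
    plain = Integer(42)
    fancy = OverriddenInteger(42, [lambda interp: "forty-two"])
    interp.runloop([say_op(plain, out)])
    depth_plain = interp.max_depth
    interp.runloop([say_op(fancy, out)])
    return out, depth_plain, interp.max_depth
```

In this model the plain PMC never pushes the nesting depth past 1, while the overridden one reaches depth 2. That second level is the inferior runloop.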

Of course, this is only one instance of the problem. Even though runloop recursion cannot be avoided in cases like this one, there are plenty of situations where we can meaningfully avoid it. One such place is runtime code compilation.

Control flow in Parrot currently goes like this: we run Parrot from the command line with the name of a PIR file to execute. Parrot passes the PIR file to IMCC, which compiles it into bytecode. IMCC immediately executes any :init, :immediate, and :postcomp functions, then executes the :main function. The big problem with this is that IMCC ends up managing control flow for Parrot. The same issue occurs when we compile code at runtime. The executing PIR code creates the PIR compreg object and invokes it with a string of PIR code. IMCC takes that string, compiles it, and executes any :init and :immediate Subs in recursive runloops before returning a reference to the :main Sub to be executed in the parent runloop. This system adds needless complication and offers plenty of opportunities for spectacular failure. Let's look at a different way to do this.

We call Parrot with the name of a PIR file. Parrot passes the PIR file to IMCC to be compiled. IMCC compiles the PIR file and returns some kind of object. Parrot then passes that object to the concurrency scheduler, which manages control flow from that point forward. That "some kind of object" couldn't just be a Sub PMC, because it would need to contain arrays of all the :init, :immediate, and :load Subs, a pointer to the :main Sub, and so on. The scheduler then fires off the Subs from this execution object in the correct order, all in the same runloop.
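A rough sketch of what that execution object and scheduler hand-off might look like, again as a Python toy rather than real Parrot code. `ExecutionUnit`, `Scheduler`, and the `as_library` switch are hypothetical names, and the ordering used here (:immediate, then :init or :load, then :main) is my assumption about what "the correct order" would be.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Sub = Callable[[], None]  # stand-in for a Sub PMC


@dataclass
class ExecutionUnit:
    """Hypothetical result of a compile: subs grouped by their flags."""
    immediate: List[Sub] = field(default_factory=list)
    init: List[Sub] = field(default_factory=list)
    load: List[Sub] = field(default_factory=list)
    main: Optional[Sub] = None


class Scheduler:
    """Toy scheduler that owns control flow once compilation is done."""

    def __init__(self):
        self.queue = deque()
        self.runloops_started = 0

    def schedule(self, unit, as_library=False):
        # Assumed ordering: :immediate, then :init (or :load when the code
        # is loaded as a library), then :main for a program.
        self.queue.extend(unit.immediate)
        self.queue.extend(unit.load if as_library else unit.init)
        if unit.main is not None and not as_library:
            self.queue.append(unit.main)

    def run(self):
        # One runloop drains the whole queue; the compiler never runs a sub.
        self.runloops_started += 1
        while self.queue:
            sub = self.queue.popleft()
            sub()


def demo():
    trace = []
    unit = ExecutionUnit(
        immediate=[lambda: trace.append("immediate")],
        init=[lambda: trace.append("init")],
        load=[lambda: trace.append("load")],
        main=lambda: trace.append("main"),
    )
    sched = Scheduler()
    sched.schedule(unit)
    sched.run()
    return trace, sched.runloops_started
```

The point of the design is that the compiler only produces data. Deciding what runs, and when, is left entirely to the scheduler, so a single runloop is enough.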

When we do a runtime compilation of PIR code using the PIR compreg object, the same mechanism would apply. IMCC compiles the code and returns the execution object. We pass that object to the scheduler, which stitches the :init and :immediate Subs into the currently executing program flow and simply executes them in the same runloop. A whole class of RT and Trac tickets will disappear overnight, the interface with IMCC gets better encapsulated, IMCC's internal logic becomes much simpler, we better integrate the scheduler with primary control flow, we get better control over where and when runloops are created, and we can reduce (though not eliminate) the need to recurse into new runloops in some situations.
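Here is one way the "stitching" could work, as a Python toy with hypothetical names (`Scheduler.splice`, `fake_compile`); nothing here is an existing Parrot interface. The running Sub hands the freshly compiled Subs, plus the rest of its own work, back to the scheduler instead of starting a nested runloop.

```python
from collections import deque


class Scheduler:
    """Toy scheduler; all names here are hypothetical, not Parrot APIs."""

    def __init__(self):
        self.queue = deque()
        self.depth = 0
        self.max_depth = 0

    def splice(self, subs):
        # Put the new subs at the front, preserving their order, so they
        # run next, ahead of whatever was already pending.
        self.queue.extendleft(reversed(subs))

    def run(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            while self.queue:
                sub = self.queue.popleft()
                sub(self)
        finally:
            self.depth -= 1


def fake_compile(source, trace):
    """Stand-in for the PIR compreg: returns (setup_subs, main_sub)."""
    setup = [
        lambda sched: trace.append(f"{source}:immediate"),
        lambda sched: trace.append(f"{source}:init"),
    ]
    main = lambda sched: trace.append(f"{source}:main")
    return setup, main


def demo():
    trace = []
    sched = Scheduler()

    def program_main(s):
        trace.append("program:before-compile")
        setup, compiled_main = fake_compile("eval", trace)
        # Instead of running the setup subs in a nested runloop, hand them
        # (and the remainder of this sub) back to the scheduler.
        s.splice(setup + [compiled_main, program_rest])

    def program_rest(s):
        trace.append("program:after-compile")

    sched.queue.append(program_main)
    sched.run()
    return trace, sched.max_depth
```

Note that the model splits the calling Sub into "before" and "after" halves by hand. A real implementation would need the interpreter to capture the rest of the caller as a continuation, which is the part that needs the most thought.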

Besides fixing those problems, we also open ourselves up to some cool new possibilities, such as being able to specify multiple threads that get launched automatically at runtime, or having the scheduler serialize the current control flow state and save it to a file to continue later. I'm sure there are more as well.

Sounds like a win-win-win-win-win-win-win to me.

Now obviously a lot of thought would need to go into this idea, and I'd love to hear any feedback that people have about it.
