Whiteknight's World — Whiteknight (http://www.blogger.com/profile/16207472474429254890)<br />
<br />
Case Study: Parrot Critique (2010-10-24)<br />
In response to my previous post about <a href="http://wknight8111.blogspot.com/2010/10/product-management-team.html">my vision for the Product Management team</a>, Parrot user shockwave posted a very long and thoughtful critique. I have a very large backlog of other topics that I want to post about, but I want to take the time to respond directly to shockwave right now. In a nutshell, for anybody who hasn't read his entire set of comments, he is trying to embed a Parrot interpreter for a custom language he's designing into a 3D game engine that he is working on. His comments were very long (worth the read!), so I will be quoting some of the most important parts, with minor corrections.<br />
<br />
<blockquote>I started to have issues embedding Parrot as soon as I tried to embed it. </blockquote>As much as I hate to admit it, this doesn't surprise me. We have very few projects attempting to use Parrot in an embedding situation. In fact, I can only really think of one: this one. Some people will mention mod_parrot. I'm not sure whether that counts as an embedding or an extending type of project, but I definitely don't think it's being actively maintained right now. Whatever, maybe two projects.<br />
<br />
It's just a fact of the programming world: features which are not often used and not thoroughly tested are going to wither. As a community we could be more proactive and show a little more foresight to ensure that this interface is capable, powerful, elegant, and maintained, but we don't. Maybe that's the single most important thing that the Product Management team should focus on for now.<br />
<br />
<blockquote>First, there's barely any documentation for embedding it. Most of the documentation I found was just the listing of function prototypes.</blockquote><br />
Dammit! I find this so infuriating. As I mentioned in a previous post about the state of the PDDs, listings of function prototypes <b>do not</b> qualify as acceptable user documentation. A documentation file which only lists function prototypes, and maybe a short, abstract blurb about each, is not adequate documentation for users. Also, the kinds of documentation that users need are often far different from the kinds that developers need. So, there's another important thing that the Product Management team should be working on.<br />
<br />
<blockquote>To be clear, Parrot is something designed to be embedded, and there is no documentation on how to do that. That's not good. </blockquote> This is a sentiment that I really couldn't agree with more. This is a huge part of my personal vision for Parrot. The "real" final product that we ship, the most important part of our binary distribution, is not the Parrot executable. That's just a thin argument-processing wrapper around the most important part: libparrot. The Parrot executable is just a small utility that <i>embeds</i> libparrot. At least, that's what it should be. libparrot is the product that we are developing. Everything else--all the command-line tools, the HLL compilers and fakecutables, extension libraries, everything--is just an add-on to libparrot.<br />
<br />
Anything that the Parrot executable can do, all the interfaces that it calls from libparrot, should be made available, just as easily, to other embedding applications. And all of it should be well documented.<br />
<blockquote><br />
the documentation for that function says this:<br />
<br />
"void Parrot_load_bytecode(PARROT_INTERP, STRING *path)<br />
Reads and load Parrot bytecode ... . Due to the void return type, the behavior of this function on error is unclear."<br />
<br />
Basically, if a file doesn't exist, the program state then becomes undefined. Not good.</blockquote><br />
<br />
"the behavior of this function on error is unclear"? Full stop. This is absolutely, unapologetically stupid and <i>wrong</i>. Close your eyes and imagine me saying a few cursewords before you continue reading the rest of this post because, trust me, I <i>am</i> saying them now.<br />
<br />
The job of any application virtual machine, be it the JVM, the .NET CLR, Neko VM, or whatever, is to make these kinds of things clear. Virtual machines provide a consistent abstraction layer over the underlying platform, and provide a standard and reliable runtime for the applications that run on them. What part of that definition allows a key interface function both to permit errors and to have undefined behavior when an error occurs? Either that function needs to be fixed to have defined and consistent behavior on error, or a new function needs to be written with that goal explicitly in mind.<br />
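To make the complaint concrete, here is a minimal sketch, in plain C, of what a loader with defined error behavior could look like. The status codes, function name, and error-buffer convention are all made up for illustration; this is not Parrot's actual API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes -- illustrative only, not Parrot's real API. */
typedef enum {
    PARROT_OK = 0,
    PARROT_ERR_FILE_NOT_FOUND,
    PARROT_ERR_BAD_MAGIC
} parrot_status_t;

/* Sketch: a bytecode loader that reports failure through its return value
 * and an error buffer, instead of leaving program state undefined. */
parrot_status_t
load_bytecode(const char *path, char *errbuf, size_t errlen)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        snprintf(errbuf, errlen, "cannot open '%s'", path);
        return PARROT_ERR_FILE_NOT_FOUND;
    }
    /* ... validate magic bytes, read segments, etc. ... */
    fclose(f);
    return PARROT_OK;
}
```

With an interface like this, the embedder can test the return value and decide for itself what a missing file means.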
<br />
<blockquote>Parrot seems to be built with a command line mentality. Not all the actual users (end-users, not developers) of Parrot will be running the end product from a command line. The video game engine I'm using, for example, runs under Windows, MacOS, XBox 360, and PS3. I'm trying to run it from Windows 7, as a GUI application; the errors are output to nowhere. Parrot can't just spit out error messages to the command line and expect that they will always be seen. There's no built-in way to place the errors in a buffer, so that I could choose how to print those errors. </blockquote><br />
Let me paraphrase, using common internet vernacular: Parrot IZ T3H FAILZ. Any Parrot hacker who reads this paragraph should immediately be able to extract a number of TODO items from it. libparrot should always assume that it is being embedded, and act accordingly. Again, the Parrot executable isn't our primary product, libparrot is. Libraries like this shouldn't be just dumping text out to STDERR or any standard handle without allowing some possibility of explicit overriding. If we dumped error text into a buffer, set a flag, and let the embedding application handle it appropriately, we would be much better off.<br />
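As a sketch of that "buffer plus flag" idea, here is one common way a C library can make its error output overridable. The handler-registration names here are hypothetical; nothing below is an actual libparrot interface:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a pluggable error sink: the library calls report_error()
 * instead of writing to stderr directly, and an embedder may install
 * its own handler.  All names are hypothetical. */

typedef void (*error_handler_fn)(const char *msg, void *userdata);

static error_handler_fn g_handler  = NULL;
static void            *g_userdata = NULL;

void set_error_handler(error_handler_fn fn, void *userdata)
{
    g_handler  = fn;
    g_userdata = userdata;
}

void report_error(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (g_handler)
        g_handler(buf, g_userdata);    /* embedder decides what to do */
    else
        fprintf(stderr, "%s\n", buf);  /* command-line default */
}

/* Example embedder handler: copy the message into a buffer the host owns,
 * so a GUI application can display it however it likes. */
void buffer_sink(const char *msg, void *userdata)
{
    strncpy((char *)userdata, msg, 255);
    ((char *)userdata)[255] = '\0';
}
```

The default still behaves like today's command-line Parrot, but a GUI host like shockwave's game engine could route everything into its own log window.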
<br />
In fact, I suggest that all functions which execute code in Parrot should be modified to return a PMC. That PMC could be an exit code, a status message, an unhandled exception, some kind of callback, or whatever. If the PMC returned is an unhandled exception, the embedding application could choose to handle it and call back in to Parrot, or propagate that error further up the call chain, or whatever. It's not libparrot's job to determine that an unhandled exception causes the application to exit, or to force a dump of the exception text to STDERR. Both of these things are very wrong in many situations.<br />
<br />
<blockquote>I want to use Parrot as the runtime, because of business and personal reasons, but I can't base my business and future on hopes and dreams alone. I'd really like to go with Parrot as the runtime for the game engine, but if it can't stand on its own feet, I'm gonna have to shelve it for some, possibly long, time. </blockquote><br />
And here it is, the heartbreaker in this whole situation. Here we have a person who likes Parrot and wants to use it. He has put in the effort to build a compiler and runtime for his language, but the embedding interface is so shitty shitty shitty that he can't use it and may have to go back and implement all his ideas in a different system. Excuse me while I go say a few more choice curse words.<br />
<br />
Parrot's embedding/extending interface has never been any good because nobody has ever taken the time to define exactly what that interface is and what it should be. Everybody has been happy to only use Parrot from the command-line, and to break encapsulation at every step because it was easier and nobody wanted to do any differently.<br />
<br />
What we need to do, and do as immediately as possible, is to realize that there are two distinct pieces of software: libparrot and the parrot executable. We need to realize that the former is our primary product, and can be used in many situations where the latter cannot. We also should realize that libparrot needs a complete, comprehensive, elegant, and properly-encapsulating API, which the Parrot executable and other embedding programs should use exclusively. We need to start making some tough decisions about what that API should contain, what it should not contain, and what kind of functionality our users are going to have access to.<br />
<br />
We need to write up a proper API, test it, and document it. Anything less is failure and means we are ignoring the very real needs of our users.<br />
<br />
Product Management Team (2010-10-21)<br />
Yesterday I talked about the new Teams concept for the Parrot community, the goal of which is to take some important jobs and assign them to small groups instead of individuals. For a team like the architecture team, this task was previously assigned to only one individual. For other teams like Project Management, Community, Product Management, and QA, these are pretty new and didn't previously belong to any one dedicated individual.<br />
<br />
With that in mind, I need to start putting together a comprehensive vision for what the Product Management Team will be, since we haven't ever had a "Product Manager" before to show me the way forward. The description in the teams proposal goes as follows:<br />
<br />
<blockquote>This team is responsible for the vision of Parrot as a user-facing product. They also act as an advocate for the needs of Parrot's users (e.g. HLLs and libraries such as Rakudo and Kakapo) and as an intermediary between Parrot and its users.</blockquote><br />
There are a few different tasks here (and I think a few more are implied), but it all really boils down to a single core concept: Making Parrot "work" for current and prospective users.<br />
<br />
Who are Parrot's users? That's really a good first question to ask. John Q Bag-of-donuts, running an instance of the November Wiki at home, isn't really a Parrot user. Sure, his software is running on top of Parrot, but the reality is that he is using November and Rakudo, not Parrot directly. I think that relatively few people would be (or should be) using Parrot directly. As far as most people should ever know or care, Parrot is just another library dependency for some higher-level software that <i>just works. </i><br />
<br />
Parrot users are the developers of HLL compilers, PIR libraries, PMC extensions, and embedding hosts. In many cases, especially historically, the group of Parrot users overlapped pretty heavily with the group of Parrot's developers, so the two groups would often get conflated. We can't do that anymore.<br />
<br />
Users need Parrot to <i>do</i>. Users need Parrot to <i>be</i>. Parrot's capabilities, its external interfaces, its execution speed, and its memory usage; our users need all these things and need them to be done correctly. Parrot, in turn, needs users. The relationship is cyclic: A better Parrot attracts and retains more satisfied users. Satisfied users, in turn, provide feedback to the development team and make Parrot appear to be a more popular and compelling platform. There's word-of-mouth advertising too that we want to bank on. Of course, all the advertising and goodwill in the world don't replace rich feature sets and improved performance.<br />
<br />
In a nutshell, these are the tasks that I think the Product Management team needs to be focused on:<br />
<br />
<ol><li><b>Communication</b>: We need to stay in touch with the users, bring their concerns to the rest of the Parrot development team, and share their successes as well.</li>
<li><b>Marketing</b>: We need to advertise Parrot. We need to attract new users, and help encourage people to build awesome things on top of our platform.</li>
<li><b>Interfaces</b>: We need to work to make and keep Parrot's external interfaces sane. Our command-line interface, extending interface, embedding interface, and PIR/PASM/PBC. These things all need to be sane and either stable or in the midst of dramatic improvements. We will elicit feedback from users to find out what features people use and what features people don't use. This will help facilitate feature deprecations and experimental feature promotions in the future. As everybody can imagine, I would love to get my hands around the throat of the deprecation policy and start squeezing.</li>
<li><b>Advocacy</b>: We need to find out exactly what the users want and advocate for that to the rest of the developer team. Parrot isn't a product developed in a vacuum, we have real users with real needs, and we need to make sure that we meet those needs. If we don't meet the needs of our users, Parrot is nothing for nobody. </li>
</ol> I'm looking for interested and dedicated members to join the Product Management team for Parrot. People who are interested in doing the kinds of things I talk about above, or people who have their own ideas for what this team should be responsible for, would make perfect additions. Ideally, I would like to have at least 3-4 people on the team besides myself, and more would be welcomed too.<br />
<br />
Parrot Teams (2010-10-20)<br />
Parrot contributor Christoph Otto (cotto) posted <a href="http://lists.parrot.org/pipermail/parrot-dev/2010-October/004916.html">an email to the list a few days ago</a> about a new <a href="http://trac.parrot.org/parrot/wiki/ParrotTeams">Teams concept</a> for the Parrot community. This idea is essentially an extension of an idea I had been kicking around in my own head, and I'm very enthusiastic about it.<br />
<br />
The idea is to organize Parrot developers into particular teams, and give each team a particular area of focus and "authority" to get things done. We obviously don't want to get overzealous and focus more on rigid organization and structure than on getting things done, but we do want to make sure that important jobs are being performed and some people get the chance to take personal responsibility for ensuring that things in the project get done.<br />
<br />
There are five teams slated so far, though I'm sure the number will change as we see how teams work and we divide up tasks that need to get done. <br />
<br />
One important team, for example, is the architectural team, which will be taking over for the position of architect. The idea is that we get more people involved in that role, increase our bus number, and become less reliant on a single person for such an important task. Let's all commend Allison for the great job she has done in this role for the past few years. However, let's also remember that it was a pretty significant drain on her time, and it's a hard job to do for just one person. It looks like Cotto will be the team leader of the architectural team, though I'm already a member of it as well.<br />
<br />
I'm going to be the team leader for the new "Product Management" team. I'm still coming to grips myself with what this new team is supposed to be doing, because the proposal is still young and there are many roles and things that need defining. At the basic level, the Product Management team is supposed to interact with users, and form a comprehensive vision of Parrot as a user-facing product. This is very likely going to have implications in several areas of Parrot, including the embedding/extension interface, compilers for PIR/Lorito, tools, libraries, etc. My goal, at least initially, is to make sure that Parrot is serving the needs of our users, and that Parrot becomes a compelling platform for people to work on. You can damn sure expect more blog posts about these things!<br />
<br />
I am going to try to solicit new membership to this team as best as I can. It's going to be important that we have some enthusiastic contributors here, especially people who have experience developing compilers, extensions, and other related parrot-dependent projects. We're also going to want to establish and maintain close contact with other developers of these kinds of projects, to ensure that we are meeting their needs and ensure that we are pushing to meet and exceed those needs in future releases.<br />
<br />
Other teams include the Project Management team currently headed by Jim Keenan (kid51), the Community team headed by Jonathan Leto (dukeleto), and a Quality Assurance team, which currently has no team lead. All of these teams are going to have to take some time to ensure that they properly define their own roles, and find the best ways to perform them going forward. We also need to attract several developers to join all of these teams, to make sure they all have enough "staff" to get the necessary jobs done.<br />
<br />
Signing up for a team is easy. Talk to the team lead and ask about how to get involved. You don't need to be a member of a team in order to be a committer for Parrot. I'm looking for people to join the Product Management Team, and I know other teams are going to be looking for new members as well, so definitely get involved if you are interested in something we are doing.<br />
<br />
I'll try to put together another post soon with a more comprehensive vision for what the Product Management team will be and what it will do.<br />
<br />
PDD03 Calling Conventions Critique (2010-09-27)<br />
I've wanted to get back into this habit for a while, and today is the day to do it. Following a short conversation with Parrot hacker plobsing yesterday, I've decided to tackle PDD03, the "Calling Conventions" design document first. In the coming days I would also like to take a close look at some other PDDs for systems which are receiving current developer attention. Quotes are all from the <a href="http://trac.parrot.org/parrot/browser/trunk/docs/pdds/pdd03_calling_conventions.pod">text of PDD03</a>.<br />
<br />
<blockquote>FAQ: Given Parrot's internal use of continuation-passing style ["CPS"], it<br />
would be possible to use one pair of opcodes for both call and return, since<br />
under CPS returns are calls. And perhaps someday we will have only two<br />
opcodes. But for now, certain efficiency hacks are easier with four opcodes.</blockquote><br />
"Perhaps someday"? Is this an authoritative design document or a dumping ground for assorted wishful thinking? This document should say exactly what Parrot wants in the long run. Do we want two opcodes for simplicity, or do we want four opcodes because of optimization potential? I would suggest that we have two opcodes only, and we can implement optimizing behaviors by having multiple types of Continuation object, with at least one type specifically optimized for simple subroutine returns. We used to have the RetContinuation PMC type, and while it wasn't the right solution for the problem it did have a glimmer of a good idea buried in it.<br />
<blockquote>set_opcode "flags0, flags1, ..., flagsN", VAL0, VAL1, ... VALN<br />
get_opcode "flags0, flags1, ..., flagsN", REG0, REG1, ... REGN<br />
get_opcode "..., 0x200, flags0, ...", ..., "name", REG0, ...</blockquote>It's hard to complain about needing extra opcodes to facilitate "efficiency hacks" and then to require, for every single parameter pass and retrieval, that we parse integer values out of a constant string. Mercifully, this isn't what we do anymore. In all these cases the first argument is a FixedIntegerArray of flags, which is computed by the assembler and serialized as a constant into the bytecode. At the very least, the design document should be updated to reflect that bit of sanity.<br />
<br />
What's interesting here is the fact that these opcodes are <i>variadic</i>. That is, these opcodes (which, I believe, are unique among all 1200+ opcodes in Parrot) take an open-ended list of arguments. This makes traversing bytecode extremely difficult, which in turn makes tools for generating, reading, and analyzing bytecode extremely difficult, and needlessly so. Far superior to this would be to serialize information about the register indices into the same PMC that contains the flags for those parameters.<br />
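A toy bytecode walker shows why variadic ops hurt: with inline register operands, the instruction width depends on the data, so the walker must decode each call op just to find the next instruction, while a constant-table index keeps every call op a fixed width. The opcode numbers and layout below are invented for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative opcode numbers, not Parrot's real ones. */
enum { OP_NOOP = 0, OP_SET_ARGS_VARIADIC = 1, OP_SET_ARGS_CONST = 2 };

/* Advance past one instruction, returning the offset of the next one. */
size_t next_instruction(const int *code, size_t pc)
{
    switch (code[pc]) {
    case OP_NOOP:
        return pc + 1;
    case OP_SET_ARGS_VARIADIC:
        /* operand 0 is the argument count; the registers follow inline,
         * so the instruction's width depends on runtime data */
        return pc + 2 + (size_t)code[pc + 1];
    case OP_SET_ARGS_CONST:
        /* operand 0 is a constant-table index: fixed width, always 2 */
        return pc + 2;
    default:
        return pc + 1;
    }
}
```

In the fixed-width form, any tool can skip over a call site without understanding calling conventions at all, which is exactly what bytecode analyzers and disassemblers want.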
<br />
Right now, we use a special PMC called a CallContext to help facilitate subroutine calls. The CallContext is used to pack up all the arguments to the subroutine, and then it also serves as the dynamic context object for the subroutine. It contains the actual storage for the registers used, and handles unpacking arguments into those registers. It also manages some other things for context, like lexical associations between subs and other details.<br />
<br />
In short, the CallContext stores some static data that we usually know about at compile time: argument lists and argument signatures, parameter signatures, etc. It also knows where arguments are coming from, whether they are coming from constants or registers, and which registers exactly are being used. All this information is needlessly calculated at runtime when it could easily be computed at compile time and stored in the constants table of the PBC. If we split CallContext up into a dynamic part (CallContext) and a static part (CallArguments and maybe CallParameters), we could serialize the static part at compile time and avoid a lot of runtime work.<br />
<br />
Here's an example call:<br />
<br />
<pre>set_args callargs # A CallArguments PMC, which can be loaded from PBC as a constant
invokecc obj, meth
</pre><br />
Or, we could cut out the separate opcode entirely:<br />
<br />
<pre>invokecc obj, meth, callargs
</pre><br />
Here's an example subroutine using a new mechanism:<br />
<br />
<pre>.pir sub foo
callcontext = get_params foo
callparams = get_param_defs foo
unpack_params callcontext, callparams
</pre><br />
Here, again, "callparams" is a PMC containing static information about the parameter signature of the subroutine, and can easily be serialized to bytecode to avoid runtime calculations.<br />
<br />
With this kind of a system we have three PMCs which work together in tandem to implement a calling sequence, and which can be easily overridden by HLLs to implement all sorts of custom behaviors. Plus, we get good separation of concerns, which is a typical component of good design. We have CallContext, which serves as the runtime context of the subroutine and acts as storage for registers and lexical associations. CallArguments performs a mapping of caller registers into a call object. Then we have CallParameters, which performs the inverse mapping of call arguments into parameter registers. With these three things anybody could write their own calling conventions and have them seamlessly integrated into Parrot without too much trouble. At any time you can pack arguments into a CallContext using a CallArguments PMC (which you can create at runtime or serialize to PBC), and then unpack them again using a CallParameters PMC (which can also be created at runtime or compile time). Tailcall optimizations, loop unwinding, recursion avoidance, and all sorts of other optimizations become trivially possible to implement at any level.<br />
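Here is a rough sketch of that static/dynamic split in C. The structs are hypothetical stand-ins for the proposed PMCs; Parrot's real CallContext carries much more (lexicals, signatures, flags, and so on):

```c
#include <assert.h>

/* Static: computed once at compile time, serializable into the PBC
 * constant table.  Hypothetical layouts, for illustration only. */
typedef struct {
    int n_args;
    int src_regs[4];   /* which caller registers supply each argument   */
} CallArguments;

typedef struct {
    int n_params;
    int dst_regs[4];   /* which callee registers receive each parameter */
} CallParameters;

/* Dynamic: one per invocation, holds the actual register storage. */
typedef struct {
    int regs[16];
} CallContext;

/* Pack caller registers into the context using the static description. */
void pack_args(CallContext *ctx, const CallArguments *args,
               const int *caller_regs)
{
    for (int i = 0; i < args->n_args; i++)
        ctx->regs[i] = caller_regs[args->src_regs[i]];
}

/* Unpack the context into callee registers using the static description. */
void unpack_params(const CallContext *ctx, const CallParameters *params,
                   int *callee_regs)
{
    for (int i = 0; i < params->n_params; i++)
        callee_regs[params->dst_regs[i]] = ctx->regs[i];
}
```

The point of the split is that CallArguments and CallParameters never change between calls, so they can live in the constants table, while only the CallContext needs to be allocated per invocation.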
<blockquote>For documentation purposes we'll number the bits 0 (low) through 30 (high).<br />
Bit 31 (and higher, where available) will not be used.</blockquote>Is there a particular reason why bit 31 is off limits? Is this only because INTVAL is a signed quantity, and we don't want to be monkeying with the sign bit (because an optimizing compiler may monkey with it right back)? That would make sense, but is absolutely unexplained here.<br />
<blockquote>The value is a literal constant, not a register. (Don't set this bit<br />
yourself; the assembler will do it.)</blockquote>This is something that upsets me to no end, and I would be extremely happy to change this particular behavior. Here's the current behavior, in a nutshell: Every single parameter has a flag that determines whether the parameter comes from a register or from the constants table in the bytecode. That means for every single parameter, on every single function call, we need to do a test for constantness and branch to appropriate behavior. <i>For every single parameter, for every single function call</i>. Let that sink in for a minute.<br />
<br />
Consider an alternative. We have a new opcode like this:<br />
<br />
<pre>register = get_constant_p_i index
</pre><br />
The "_p_i" suffix indicates that the opcode returns a PMC and takes an integer argument. We could have variants for "_i_i", "_n_i" and "_s_i" too.<br />
<br />
Let's compare two code snippets. First, the current system:<br />
<br />
<pre>$I0 = 0
loop_top:
foo($I0, "bar")
inc $I0
if $I0 < 100 goto loop_top
</pre><br />
And now, in a system with a get_constant opcode:<br />
<br />
<pre>$I0 = get_constant 0
$S1 = get_constant 1
$I2 = get_constant 2
loop_top:
foo($I0, $S1)
inc $I0
if $I0 < $I2 goto loop_top
</pre><br />
It looks like more opcodes total, but this second example would probably execute <i>faster</i>. Why?<br />
<br />
The foo() function is called 100 times. In the current system, every single time the function is called, <i>every single time</i>, the PCC system must ask whether the first argument is a constant (it never is) and whether the second argument is a register (it never is). Then, <i>every single time</i>, it needs to load the value of the second argument from the constant table. This is also not to mention the fact that the "if" opcode at the bottom could be loading the value of its second argument from the constants table if that argument were a string or a PMC. INTVAL arguments are stored directly in the PBC stream, so they don't need to be loaded from the constants table. Luckily, I think serialized PMCs are thawed once when the PBC is loaded and don't need to be re-thawed from the PBC each time they are used. Of course, I could nit-pick and suggest we should only thaw PMCs lazily on demand and cache them, but that's a detail that probably doesn't matter a huge amount in the long run (unless we start saving a hell of a lot more PMCs to bytecode). <br />
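The cost difference can be sketched in C: the current scheme pays a tag test on every operand resolution, while a get_constant-style op pays the constant-table lookup once and every later use is a plain register read. Everything below is illustrative, not Parrot's internals:

```c
/* Toy model of operand resolution, for illustration only. */

enum { FROM_REG, FROM_CONST };

typedef struct { int tag; int value; } Operand;

static const int const_table[] = { 0, 100 };  /* toy constants table */

/* Current scheme: test the operand's source tag on every single use. */
int resolve_every_time(const Operand *op, const int *regs)
{
    return (op->tag == FROM_CONST) ? const_table[op->value]
                                   : regs[op->value];
}

/* Proposed scheme: a get_constant op loads the value into a register
 * once, before the loop; every later use is just a register read. */
int get_constant(int index)
{
    return const_table[index];
}
```

In the 100-iteration loop above, the first scheme performs the tag test 200 times; the second performs three constant loads up front and none inside the loop.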
<br />
Of course, for code generators we're going to need a way to get the unique integer index for each stored constant, which likely means we're going to need an improved assembler, but it is doable.<br />
<br />
<blockquote>If this bit [:flat] is set on a PMC value, then the PMC must be an aggregate. The<br />
contents of the aggregate, rather than the aggregate itself, will be passed.<br />
If the C&lt;named&gt; bit is also set, the aggregate will be used as a hash; its<br />
contents, as key/value pairs, will be passed as named arguments. The PMC<br />
must implement the full hash interface. {{ <a href="http://trac.parrot.org/parrot/ticket/1288">TT #1288: Limit the required interface.</a> }}<br />
<br />
If the C&lt;named&gt; bit is not set, the aggregate will be used as an array; its<br />
contents will be passed as positional arguments.<br />
<br />
The meaning of this bit is undefined when applied to integer, number, and<br />
string values.</blockquote><br />
I understand what this passage is trying to say, but it's still pretty confusing. Plus, I always find it funny when our design documents contain references to tickets (and in this case, a ticket that hasn't drawn a single comment in 10 months from anybody who could make a reasonable decision on the issue). There's probably a bigger discussion to be had about what it means when a PMC declares that it "provides hash". Parrot does have some support for roles, but that support is thin and mostly untested. Plus, nowhere do we define what any of our built-in roles like "hash" and "array" actually mean. That's a different problem for a different blog post, but it is worth mentioning here.<br />
<br />
Here's a slightly better description: If you use the ":flat" flag, Parrot is going to create an iterator for that PMC and iterate over all elements in the aggregate, passing each to the subroutine. If the ":named" flag is used, that iteration will be done like hash iteration (values returned from the iterator are keys). Otherwise, the iteration will be normal array-like iteration. If the given PMC does not provide a get_iter VTABLE, an exception will be thrown. There's no sense talking about how the PMC must satisfy any kind of interface, since the only thing we require is that the aggregate is iterable.<br />
<br />
It's worth noting that after looking at the code, I don't think Parrot follows this design. I don't think the presence of the :named flag with the :flat flag changes the behavior at all. It appears from reading the code that hashes or hash-like PMCs are always iterated as hashes and their contents are always passed as name/value pairs. Arrays and array-like PMCs are always iterated as arrays and their contents are always passed individually. Where a PMC is both array-like and hash-like at the same time, its contents are iterated as an array and passed individually. I do not know whether this behavior is acceptable (and the design should be updated) or whether the implementation is lacking and the design is to be followed. I may try to put together some tests for this behavior later to illustrate.<br />
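The dispatch I believe I'm seeing in the code can be sketched like this; illustrative C only, not the real implementation:

```c
/* Sketch of the observed :flat dispatch: array-likeness wins when a PMC
 * is both array-like and hash-like.  Illustrative types and names. */

enum { PASS_POSITIONAL, PASS_NAMED };

typedef struct {
    int is_array_like;
    int is_hash_like;
} PMCShape;

int flat_pass_mode(const PMCShape *pmc)
{
    if (pmc->is_array_like)          /* arrays, and array+hash hybrids, */
        return PASS_POSITIONAL;      /* flatten to positional args      */
    if (pmc->is_hash_like)           /* pure hashes flatten to          */
        return PASS_NAMED;           /* name/value pairs                */
    return PASS_POSITIONAL;          /* non-aggregates would raise an   */
                                     /* exception in real Parrot        */
}
```

Note how the :named flag never enters into it; only the shape of the PMC itself decides the passing mode, which is what seems to contradict the PDD.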
<br />
If you're C-savvy, take a look at the "dissect_aggregate_arg" function in <a href="http://trac.parrot.org/parrot/browser/trunk/src/call/args.c">src/call/args.c</a> for the actual implementation.<br />
<br />
<blockquote>As the first opcode in a subroutine that will be called with<br />
invokecc or a method that will be called with call_methodcc, use<br />
the get_params opcode to tell Parrot where the subroutine's or<br />
method's arguments should be stored and how they should be expanded.</blockquote><br />
It's interesting to me that there would be any requirement on this being the first opcode. It seems to me that we should be able to unpack the call object in any place, at any time. That's a small nit; things work reasonably well with this weird restriction in place. I think we can support a wider range of behaviors, though I won't say anything about the cost/benefit ratio of the effort needed to do it.<br />
<br />
<blockquote>Similarly, just before (yes, before) calling such a subroutine or<br />
method, use the get_results opcode to tell Parrot where the return<br />
values should be stored and how to expand them for your use.</blockquote><br />
I don't think this is the case anymore, but I do need to double-check the code that IMCC is currently generating. Obviously in a pure-CPS system we don't want to be going through this nonsense. Either Parrot does the sane thing and we need to update the docs, or Parrot doesn't do the sane thing and we need to update the design. Either way, kill this passage.<br />
<br />
<blockquote>If this bit [:slurpy] is set on a P register, then it will be populated with an<br />
aggregate that will contain all of the remaining values that have not already<br />
been stored in other registers.<br />
<br />
...<br />
<br />
If the named bit is not set, the aggregate will be the HLL-specific array<br />
type and the contents will be all unassigned positional arguments.</blockquote><br />
Which array type? We have several of them. If we mean ResizablePMCArray, we should say "ResizablePMCArray" so people know how to do the overriding.<br />
<br />
<blockquote>An I register with this bit set is set to one if the immediately preceding<br />
optional register received a value; otherwise, it is set to zero. If the<br />
preceding register was not marked optional, the behavior is undefined; but<br />
we promise you won't like it.</blockquote><br />
Undefined behavior? In <i>my</i> Parrot? Why can't we define what the behavior is? We can promise that the behavior will be bad, but we can't even hint about what that behavior must be? That's pretty generous of us!<br />
<br />
We could trivially identify these kinds of issues at PIR compile time, or we could catch these situations at runtime and throw an exception. Undefined behavior is precisely the kind of thing that design documents should be looking to clear up, not institutionalize. I am interested to know whether we have any tests for this, or if we could start writing some.<br />
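<br />
For reference, the defined portion of this behavior in PIR, where the :opt_flag register must immediately follow the :optional parameter it reports on:<br />
<br />
<pre>.sub 'greet'
    .param string name     :optional
    .param int    has_name :opt_flag  # 1 if name received a value, 0 otherwise
    if has_name goto have_name
    name = "world"
  have_name:
    print "Hello, "
    say name
.end
</pre><br />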
<br />
<blockquote>If this bit is set on a P register that receives a value, Parrot will ensure<br />
that the final value in the P register is read-only (i.e. will not permit<br />
modification). If the received value was a mutable PMC, then Parrot will<br />
create and set the register to a {not yet invented} read-only PMC wrapper<br />
around the original PMC.<br />
<br />
Future Notes: Parrot's algorithm for deciding what is writable may be<br />
simplistic. In initial implementations, it may assume that any PMC not of a<br />
known read-only-wrapper type is mutable. Later it may allow the HLL to<br />
provide the test. But we must beware overdesigning this; any HLL with a truly<br />
complex notion of read-only probably needs to do this kind of wrapping itself.</blockquote><br />
Ah, something that looks pretty smart, though it's clearly listed in the PDD as "XXX - PROPOSED ONLY - XXX", which is not really a good sign. I've never been too happy with Parrot's current mechanism for marking PMCs as read-only anyway. This is a pretty interesting feature, though in current Parrot I'm not sure it could be implemented to any great effect. I would also like to see something like a ":clone" flag that forces a copy to be passed instead of a reference, or a ":cow" flag which produces a copy-on-write reference. Either way, we would probably like some kind of mechanism to specify that a caller will not be playing with data referenced by a passed PMC. This is especially true when you start to consider alternate object metamodels, or complex HLL type mappings: we don't want libraries modifying objects that they don't understand, and creating results that are going to destabilize the HLL. Having a guarantee that mistakes in the callee can't be propagated back through references to the caller would be a nice feature to have. Eventually.<br />
<br />
<blockquote>Named values (arguments, or values to return) must be listed textually after<br />
all the positional values. Flat and non-flat values may be mixed in any<br />
order.</blockquote><br />
Is this true? I see no reason in the code why named <i>arguments</i> must be passed after positional arguments. I do see a reason why named <i>parameters</i> must be specified after positional parameters, however. Consistency is good, but I tend to prefer that Parrot not implement unnecessary restrictions. Plus, it should be very possible for an HLL to override the default CallContext and other related PMC types and implement their own behaviors for things like ordering, overflow, and underflow.<br />
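<br />
For reference, the current PIR spelling of named parameters and arguments looks like this; note the named parameters are declared after the positional one, per the restriction discussed above:<br />
<br />
<pre>.sub 'make_window'
    .param string title                # positional parameter
    .param int width  :named("width")  # named parameters come after positionals
    .param int height :named("height")
    print title
    print ": "
    $I0 = width * height
    say $I0
.end

.sub 'main' :main
    'make_window'("demo", 640 :named("width"), 480 :named("height"))
.end
</pre><br />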
<br />
That brings me to a point of particular unhappiness with this PDD: It is extremely focused on the behavior of the combination of IMCC, PIR, and built-in data types. It's not hard, with all the Lorito talk flying around, to imagine that in a relatively short time Parrot could be PIR free: System languages like NQP and Winxed could compile directly down to Lorito bytecode, and not involve PIR at all. We could be using HLL mapped types for CallContext and other things to completely change all this behavior. Explaining what the defaults are is certainly important, and suitable for in-depth documentation. Explaining how the defaults work and interoperate with HLL overriding types, and how the system should be extremely dynamic and pluggable is absolutely missing from PDD 03.<br />
<br />
<blockquote>Named targets can be filled with either positional or named values.<br />
However, if a named target was already filled by a positional value, and<br />
then a named value is also given, this is an overflow error.</blockquote><br />
I find this a little bit confusing. Why can a named parameter be filled with a positional argument, but not the other way around? I suggest that a named parameter only takes a named argument, and a positional parameter only takes a positional argument. We should either provide other types (like :lookahead) if we want other behaviors, or we should allow the user to subclass the PMCs that implement this behavior and allow them to put in all the crazy, complicated rules that they want. Parrot defaults should be sane and general. Everything else should be subclassable or overridable.<br />
<br />
The details included in this PDD are almost as troubling as some of the details omitted. Information about signature strings, which are used everywhere and are the only way to call a method from an extension or embedding program, is completely omitted. Being able to specify a signature as a string is a central part of the current PCC implementation, so that omission makes no sense to me. Information about central PMC types like CallContext is nowhere to be found either, much less information about how to override these types and the interfaces they are expected to implement. Making calls from extension or embedding programs using variadic argument lists is completely missing. The differences between method and subroutine invocations and the difference between the invoke and invokecc opcodes (and the implications of CPS in general) are missing. MMD is never mentioned. Tailcalls and the optimizations related to them are missing. Details about passing arguments and extracting parameters from continuations and exceptions are missing. New and proposed flags like :call_sig and :invocant are not mentioned. <br />
<br />
In my previous blog post, I mentioned four common problems that our PDDs suffered from. This document suffers from several. First, it's more descriptive than prescriptive, doing its best to document what the defaults in Parrot <i>were</i> in 2008. Second, this document is rapidly losing touch with reality as changes to the PCC system push the capabilities of Parrot beyond what the document accounts for. Third, it has an extremely narrow focus on IMCC/PIR, and is vague or completely silent about any other possibilities, especially those (such as Lorito) that may play a dramatic role in the implementation of this system in the future.<br />
<br />
PDD 03 doesn't tell our current users how to effectively use the calling conventions system of Parrot, and does nothing to direct our developers on how to improve it going forward. It really needs to be completely deleted and rewritten from the ground up. When it is, I think we will find some gold, both in terms of exposing virtues of the current implementation and describing plenty of opportunities for drastic improvement.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-60862459695407337582010-09-23T08:00:00.001-04:002010-09-23T08:23:32.778-04:00PLA: Release 1I'm happy to <i>finally</i> announce the <a href="http://github.com/Whiteknight/parrot-linear-algebra/tree/Release_1">first official release</a> of <a href="http://whiteknight.github.com/parrot-linear-algebra/">Parrot-Linear-Algebra</a> (PLA). PLA is an extension project for the <a href="http://parrot.org/">Parrot Virtual Machine</a> which brings linear algebra support and bindings to the BLAS library.<br />
<br />
PLA has several dependencies (see below). Once you have those installed properly, you can obtain and install PLA using <a href="http://gitorious.com/parrot-plumage/parrot-plumage">Plumage</a>, the Parrot package manager:<br />
<br />
<pre>plumage install parrot-linear-algebra
</pre><br />
Or, if you prefer to do things the hard way, you can obtain and build PLA with this series of commands:<br />
<br />
<pre>git clone git://github.com/Whiteknight/parrot-linear-algebra.git pla
cd pla
parrot-nqp setup.nqp build
parrot-nqp setup.nqp test
parrot-nqp setup.nqp install</pre><span style="font-size: x-large;">Dependencies</span><br />
<br />
Before you get started with PLA, you must install some dependencies.<br />
<br />
<span style="font-size: large;">Parrot</span><br />
<br />
You must have an installed copy of the Parrot Virtual Machine, <a href="http://parrot.org/news/2010/Parrot-2.8.0">version 2.8.0</a> or later.<br />
<br />
<span style="font-size: large;">BLAS</span><br />
<br />
PLA links to the BLAS library. You can use either the <a href="http://netlib.org/blas/">standard reference BLAS library from netlib.org</a>, or you can use one of the <a href="http://math-atlas.sourceforge.net/">ATLAS</a> or CBLAS variants. There are <a href="http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Implementations">other implementations of BLAS</a>, though PLA may not currently be compatible with all of them (patches and error reports welcome!).<br />
<br />
<span style="font-size: large;">Kakapo</span><br />
<br />
<a href="http://www.gitorious.com/kakapo/kakapo">Kakapo</a> is a <a href="http://code.google.com/p/kakapo-parrot/">development framework for the NQP language</a>. It provides a unit testing library which PLA uses to implement its test suite. A special version of Kakapo is required for PLA release 1. You can use this sequence of commands to get and install that version:<br />
<br />
<pre>git clone git://github.com/Whiteknight/kakapo.git kakapo
cd kakapo
git checkout PLA-version-1
parrot-nqp setup.nqp build
parrot-nqp setup.nqp install
</pre><br />
<span style="font-size: large;">Linux</span><br />
<br />
Sorry Windows guys! At the moment PLA only runs on Linux and other Unixy systems. Windows support is planned for future releases, but didn't make it into this one.<br />
<br />
<span style="font-size: x-large;">Next Release</span><br />
<br />
PLA releases are not on a regular schedule. A new release might not come out again until new features are added or until something in the toolchain becomes incompatible.<br />
<br />
Here is a list of TO-DO features that will eventually make it into future releases:<br />
<ol><li>Bindings to the LAPACK library, to add more linear algebra utilities</li>
<li>Windows support, including a Windows installer</li>
<li>Writing bindings for other programming languages which run on Parrot, including <a href="http://rakudo.org/">Rakudo Perl 6</a>.</li>
<li>Adding additional types, including special vector types and multi-dimensional tensor types.</li>
</ol>If any of these things interest you, or if you have other cool ideas, please feel free to let me know and get involved!<br />
<br />
<span style="font-size: x-large;">New Participants Welcome!</span><br />
<br />
Interested in PLA or linear algebra in general? You can fork a copy of PLA and start contributing <i>right now</i>! I'm happy to talk to interested contributors, and to add patches for bugfixes and new features into the core repository. Please let me know if you're interested in helping out.<br />
<br />
If you have specific feature requests that you would like to see added to PLA, but aren't able to implement them yourself, please let me know. <br />
<br />
<span style="font-size: x-large;">Thanks</span><br />
<br />
I want to offer a big thanks to the greater Parrot community for helping in various ways. PLA relies on several tools and libraries, the result of many man-hours of work by Parrot community contributors.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-65357955862829622312010-09-21T17:00:00.114-04:002010-09-26T15:32:38.538-04:00Parrot Fundamentals: DesignSeveral days ago <a href="http://wknight8111.blogspot.com/2010/09/parrot-as-mature-platform.html">I wrote a sizable blog entry about the Parrot project</a>; many of the ideas therein were discussed at length with fellow Parrot Foundation board member Jim Keenan when he came down to my area for a face-to-face meeting. Today I'm going to expand on some of those ideas, especially with respect to Parrot's design and roadmap.<br />
<br />
Earlier this year I wrote a <a href="http://wknight8111.blogspot.com/2010/02/pdd23-exceptions-critique.html">pretty negative review of PDD23</a>, the exceptions system design document. Actually, that's something I really should repeat with some of the other design documents as well. There are certainly a few that deserve the treatment. Maybe I'll try to do that again soon.<br />
<br />
Anyway, there is nothing special about PDD23; it isn't an especially bad example. That's telling, actually. Looking through that directory, I find that the documents generally suffer from four common problems:<br />
<ol><li><b>Vague or Incomplete</b>. Some PDDs are so incomplete, vague, or filled with holes that they are absolutely unusable for informing the decisions of our developers. PDD25 (concurrency) immediately comes to mind as one that contains very little actual design content. The majority of text in PDD25 consists of descriptions of technology, and reads more like an excerpt from "Threading for Dummies" than a real design document. PDD05, PDD06, PDD10, PDD14, PDD16, and maybe even PDD30 would also get put into this category. Notice that some of these PDDs are labeled as "draft" (and have been for years). I'm not really sure what a "draft" designation means, or how we go about getting them out of draft. I'll expand on that later.</li>
<li><b>Not Forward-Thinking</b>. There are many cases of PDDs which act as little more than copies of function-level documentation. PDD09 is one that I am eminently familiar with which is in this category, and unfortunately <i>I've written much of the text in that document myself</i>. At the time I rewrote it, I really didn't have much of a concept for what a design document should be, and what it could be. PDD09 describes what the GC <i>was</i> about a year ago, and things have changed significantly since then (and are changing rapidly now). PDDs which are not forward thinking provide absolutely no direction to developers doing new work. PDD08, PDD11, PDD20, PDD24 and PDD27 are good examples of this.</li>
<li><b>Not In Sync with Reality</b>. Some PDDs do not match the current implementation of things, and never really have (unlike the case above, where the implementation once matched the design, but has surpassed it). We have to ask in these situations why the design does not match the implementation: Is the design a lofty goal which the implementation is approaching by increments? Is the design unsuitable, and practical concerns have dictated a different approach? Did the implementation pre-date the design, and no attempt has ever been made to change it? PDD17 and PDD23 have some of this. PDD18 is also in this category, by virtue of having never really been implemented. </li>
<li><b>Not Good Design</b>. Some PDDs really just don't represent the kind of good design that Parrot needs in the long term. Think about how often and how loudly people complain about certain topics: The object metamodel (PDD15). The lexicals implementation (PDD20). Parrot's bytecode format (PDD13). Other PDDs may fall into this category too, as the implementation approaches the design and we start to find the flaws.</li>
</ol>The first question I suppose we need to ask ourselves as a community is this: <i>What is the purpose of these design documents?</i> The second question we might want to ask is: <i>Do we want to keep these documents at all</i>? A good third question is, depending on the answer to the previous question: <i>How do we want to go about improving the design documents to be what we want and need them to be?</i><br />
<br />
The Parrot design documents can really be one of two things. The first is a form of summary documentation. Basically, the design documents would be a set of documents that distills what Parrot is currently capable of, and how it can be used. In other words, "interface documentation", or "man pages". A variant of this is to use the design docs as an <i>a posteriori</i> way to justify decisions that are made off-the-cuff by developers after they've already made their commits. A second possibility is that the design documents should be the forward-thinking technical goals of the project, the lofty goals that every commit and every release strives to reach, even if we never quite attain them perfectly. I think there is far more value in the latter option, and I'll explain why. <br />
<br />
First off, we have lots of function-level documentation. We have automated tests which read our source code files and verify that we have documented (at least in a minimal way) every single function. We also have lots of tests, although admittedly tests make for pretty lousy documentation in a general sense. They can be used as a kind of reference, of course, to see how something works, but you often need to know what you're looking for before you go out to find it. We have tons of code examples too. We also have an <a href="http://www.amazon.com/Parrot-Developers-Guide-Allison-Randal/dp/0977920127/ref=sr_1_1?s=gateway&ie=UTF8&qid=1284925046&sr=8-1">entire book</a> to teach some things, though it could use some work in its own right. Our documentation for how to use Parrot is not always great, but we do have it in plenty of other places, so we don't need to use the PDDs for that purpose. If documentation is lacking, we should improve the documentation, not subvert the design documents to serve as another layer of it.<br />
<br />
Forward-thinking, lofty designs are extremely valuable. Consider the example of a coder who finds a missing feature in Parrot and no design for such a feature available. So she takes it upon herself to come up with a design and implement it. Weeks later she emerges and shows the fruits of her labors to the community with a request to merge her work into trunk. We as a community look it over, find a few problems and, with good cause, we reject it as bad design. Not just something that we can tweak to become suitable, but something that is fundamentally wrong for us. <i>Thanks but no thanks</i>. <i>Smell you later, Alligator.</i> That contributor will probably storm off and will never be heard from again. Also, with good cause.<br />
<br />
Now let's look at a similar example, except we as a community have done the work ahead of time of writing out our intended designs for this fancy new feature. We describe exactly what we want, and what any implementation is going to need. Our same intrepid developer follows this design, and when she emerges with her labored fruit, it is much more acceptable. With some feedback and small tweaks, it is approved and merged to trunk. <br />
<br />
I don't want to say this is super common, but it isn't unheard of either for people to show up to an open source project, unannounced, with a gigantic patch for a new feature. It also isn't completely unheard of for those gigantic patches to be summarily rejected. The more common case is where we have interested and energetic developers showing up to the Parrot project looking for problems to tackle. Saying "We don't have JIT, we could really use one" is far more daunting than "we don't have a JIT, but we have a design, and a list of prior art that we would like to model on." Following a map to your destination is much easier than having to design your own map first and then try to follow it.<br />
<br />
There's an issue of motivation too. A person is much more willing to start working on a project when they have certainty that they are doing the correct thing, and that the software they produce will be usable and desirable. It's much easier to follow a path, even a very bare one, than it is to cut your own. Not to mention that the destination is much more certain.<br />
<br />
This year, Parrot had 5 GSoC students assigned to it. Of those five, four of them contacted me personally about specific projects <a href="http://wknight8111.blogspot.com/search/label/GSOCIdeas">I discussed here on this very blog</a> <i>before submitting applications</i>. I don't take any credit for anything; I'm certain Parrot would have had several high-quality applications and projects without me. But I do know that people are more quickly and easily able to latch onto fully-formed ideas than onto nebulous and vague ones. Also, people may not even be aware that their interests and skills align with things that Parrot needs until they know exactly what Parrot needs.<br />
<br />
If we--as a community-driven open source project--want to increase the size of our developer pool (and I suggest that we should <i>always</i> want that), we need to communicate what we need and help prospective developers align themselves with those needs.<br />
<br />
When a new person comes to the Parrot chatroom, or the Parrot mailing list, and says "I would like to get involved in a Parrot development project, what can I do?", we can say something stupid like "Look around and try to determine for yourself what needs to be done", or "We need everything". That's not helpful and not encouraging, even if it's the truth! Instead, we can say "Look, we have a <a href="http://trac.parrot.org/parrot/wiki/BigProjectIdeas">list of projects</a> that we've designed and prepared for, but we haven't been able to implement yet. Want to take a stab at it?" The former usually leads to a confused developer who never comes back. The latter can lead to a new active, empowered, permanent member of our development team.<br />
<br />
For the sake of this discussion, we'll accept the axiom that <b>more active developers in the project is generally a good thing, and losing existing developers, or raising the barrier to entry so high that new developers do not join is a bad thing</b>. I'll argue the point till I'm blue in the face, if anybody wants to take me up on it.<br />
<br />
In this sense, having better designs and plans means a lower barrier to entry for new users since it's easier for them to find a project to work on and they can begin work with more surety that what they are doing will ultimately be desirable and acceptable to the community. It's also a good motivator for existing community members. When I finish a project and have some (rare) spare time on my hands, it's better for me to be able to go right down to the next item on a checklist instead of having to look around blindly trying to find something that needs to be done. Sure, there are ways I could focus my search, but I still have to hope that something obvious appears in my focused search that I can work on.<br />
<br />
All that said, I think I can answer the next few questions pretty quickly:<i><br />
</i><br />
<i>Do we need these documents at all?</i> Yes, I think we do. They can serve as an important tool to guide new and old developers alike. They can help inform and populate specific tasklists on the wiki and elsewhere, and serve as an organizational focus for teams of developers looking to improve specific areas of Parrot. Good design documents can also be used as a tool to initiate bidirectional communication with our consumers: projects that use Parrot as an integral part of their tool chains. I'll expand on this issue in particular in a later blog post.<br />
<br />
<i>How do we want to improve these documents? </i>The time when we can all sit back and wait for a design to magically appear from nowhere is over. Good riddance. We have plenty of people in our project who are not only capable developers, but actually pretty great software engineers and software designers. Beyond that, we have lots of people, both in our core project and throughout the ecosystem, who know all the kinds of things that are going to be necessary for projects built on Parrot to be successful. Getting enough smart minds together to tackle a design challenge should be trivial.<br />
<br />
My suggestion is this, though it's only one possible suggestion and I am not going to argue at all about details. For each design, we form a team of dedicated developers who could be considered experts on the particular topic. It would be trivial, for almost any design document we have, to put together a team of 3-4 people with some expertise in that subject. With a team, we could go through a regular checklist to produce a decent design document:<br />
<br />
<ol><li>Survey existing research, and find papers and prior art that we like. We could do this as a community before even creating the design team.</li>
<li>Get input from the ecosystem, and maybe a special advisory panel, to get an idea for what kinds of features are required, which features would be nice, and what kinds of things to avoid.</li>
<li>Give the design team some time to prepare and present a draft to the community</li>
<li>We all paint the bikeshed for a few days</li>
<li>We accept the design (assuming we actually accept it), and start developing towards it</li>
</ol>If we have Parrot developer design meetings (PDS) every 6 months, that would seem like a great demarcation for this kind of process. At one PDS we identify an area that needs design help and assemble an initial team to pursue it. At the next PDS we check out the findings, maybe approve them, and start the concrete development work. If we need to look things over and push approval off, we can do it at a subsequent weekly #parrotsketch meeting.<br />
<br />
In any 6-month stretch, we are working on a set of development priorities which have already been properly designed, and we are preparing designs to work on for the next six month stretch. Our last PDS meeting was <a href="http://trac.parrot.org/parrot/wiki/pds-2010-04">April 2010</a>. That means we're due for one coming up in October or November if we want to stick to a 6 month schedule. I definitely think we should try to have a meeting thereabouts so we can re-focus on our current development and current <i>design</i> priorities.<br />
<br />
TL;DR: Parrot's PDDs are in a bad state, but they really do serve an important purpose and we need to make sure they get updated. PDDs should not be short summaries of other existing documentation. Instead, they should be forward-looking documents that describe what goals we are trying to reach. We need to get input from developers and also Parrot's consumers and end-users in shaping those PDDs. I'll post more about this later.<br />
Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com2tag:blogger.com,1999:blog-4146794174400139442.post-80366435207092769602010-09-20T17:00:00.282-04:002010-09-20T17:55:18.078-04:00Woe Is Parrot!I received two quick comments to <a href="http://wknight8111.blogspot.com/2010/09/parrot-as-mature-platform.html">my blog post from yesterday</a>. One of them was from Parrot developer <a href="http://modernperlbooks.com/mt/index.html">chromatic</a>, who always makes good points and always deserves a thoughtful reply. We did have a quick back-and-forth in the comments of that post, and he <a href="http://modernperlbooks.com/mt/index.html">wrote a blog post of his own on the topic</a>. Given all that, I wanted to write a follow up and try to be a little bit more clear about things. First, here's a part of my original post that drew his ire in particular:<br />
<blockquote>Name me one other mature, production-quality application VM or language interpreter which does not support threading. This isn't an empty set, but it's not a particularly large list either. Name me one other application VM that does not currently sport an aggressive, modern JIT. </blockquote>His first comment reply, in full, is:<br />
<blockquote>I can name lots of VMs which fit your "Woe is Parrot!" criteria (Python, Perl 5, Ruby, PHP). Consider also the first few years of the JVM. </blockquote>These are great examples and are extremely true, in no small part because I was not nearly specific enough in the point I was trying to make. <i>Mea Culpa</i>.<br />
<br />
Before going any further, I want to clearly state my thesis with this series of posts and comments. It consists of three parts:<br />
<blockquote>One, Parrot isn't nearly as competitive now in the dynamic language runtime realm as it could (and maybe should) be. Two, we can increase our rate of improvement with focused objectives and more developer motivation. Three, if we do that I project Parrot to be among the top-tier of dynamic language runtime environments within the next 1-5 years.</blockquote>If you don't agree with that, stop reading. Everything that I say after this point is in direct support of that statement. Where it might appear that something I say contradicts that, I've probably mistyped something.<br />
<br />
If we look at the case of multithreading, three of the examples that chromatic listed above do have some support. <a href="http://docs.python.org/library/threading.html">Python</a>, <a href="http://perldoc.perl.org/threads.html">Perl5</a>, and <a href="http://ruby-doc.org/core/classes/Thread.html">Ruby</a> all do support some variety of threading, at some level of usability. PHP is the odd man out in this regard, though the argument could easily be made that the webserver niche where PHP primarily lives really doesn't need and doesn't want threading. I won't expand on that topic here, but I will say this: PHP does not support threading, and is definitely a production-quality language interpreter in use by many companies. Point made.<br />
<br />
In terms of JIT, I was drawing an unnecessary and mostly fictitious line between programs which have far more similarities than differences, and I did little to clarify what I meant. So, I'll throw away that entire question.<br />
<br />
The second part of chromatic's statement is a little bit easier to respond to. The first Java Virtual Machine was released in 1995. That's 15 years ago, and even though that initial release did not stand the test of time it was temporarily considered to be the state of the art. No, Java 1.0 did not support threading as we would expect it now. But then again, in 1995 it would have been much harder to find a multicore processor capable of exploiting the scalability of threaded application. In a single-processor environment, cheap green threads were definitely a competitive and acceptable alternative to true native threading. 15 years later, multicore processors are the norm, and a threading strategy based completely on green threads is hardly acceptable by itself. This is all not to mention a dozen other subsystems in the original JVM that were immature then and would be absolutely laughable now. <br />
<br />
Trying to compare Parrot in 2010 to Java in 1995 is both telling and depressing. Sure, it's a victory for Parrot, but not one that we should ever mention again. I'm bigger and stronger than my 9 month old son. That doesn't mean I want to get into a fist fight with him (even though I would totally win).<br />
<br />
Let me pose a question though that should provoke a little bit of thought (and, I'm sure, more anger): Consider a world that had no Perl language. Larry Wall got busy and worked on other projects for 25 years, and never released Perl to the world. Then, in a flash of light and unicorns in 2010 he releases, from nowhere, a fully-formed Perl 5.12 interpreter as we know it today. Like Athena springing fully-formed from the head of Zeus. Would you use that Perl today?<br />
<br />
Would you use a language like Perl 5.12 if you hadn't already been using it, if your job wasn't using it because it had been using it for years, if neither you nor anybody in your company had prior expertise in it and were not demonstrably able to work miracles with it? Would you use Perl 5.12 today if there wasn't a huge preexisting community and if there wasn't the miracle of <a href="http://search.cpan.org/">CPAN</a>? Would you use Perl 5.12 knowing about some of its idiosyncrasies, the weaknesses of its default exceptions system, the uninspiring nothingness of its default object metamodel, or the sheer fact that in 2010 you still can't define function parameters with a meaningful function signature? Is that an investment of time, energy, and maybe money that you would make considering some of the other options available <i>today?</i><br />
<br />
Now, <i>I'm not ragging on Perl</i>. I want to state that very clearly before chromatic buys a plane ticket, travels to Pennsylvania, and punches me in the back of the head. Perl 5 is a fine language and obviously doesn't exist in the kind of vacuum that I contemplate above. There is a <i>huge</i> community of Perl programmers. There are vast amounts of institutional knowledge. There is the entirety of the CPAN. There are modules like <a href="http://search.cpan.org/%7Enuffin/Try-Tiny-0.06/lib/Try/Tiny.pm">Try::Tiny</a>, and <a href="http://search.cpan.org/%7Edrolsky/Moose-1.13/lib/Moose.pm">Moose</a>, and <a href="http://search.cpan.org/%7Emschwern/Method-Signatures-20100730/lib/Method/Signatures.pm">Method::Signatures</a> which can be used to build some pretty impressive modern (and even "post-modern") things on top of Perl 5's default <i>tabula rasa</i>. On top of all that, Perl is demonstrably stable. Robust. Flexible. Usable. Coders in other languages invent terms like "Rapid Application Development" and "Rapid Prototyping" to describe what the Perl people call "a slow day at the office". People everywhere may debate the aesthetics of sigils and the multitude of operators, but nobody questions the fitness of Perl for use as a programming tool. Its competence and utility are unassailable. <br />
<br />
Here's my point: Take a look at the Perl 5 base technology. Take a serious, hard look at it. At the very most, the stand-alone Perl 5 interpreter is flexible, but <i>technologically unimpressive</i>. Nothing that the base Perl interpreter provides is the jaw-dropping, nerd-orgasming state-of-the-art. I could point to a dozen performance benchmarks that pit modern Perl 5 against modern Python, Ruby, Java, and whatever else, and Perl 5 almost always comes in dead last (notice that we're not benchmarking the time it takes to write the code). That's fine. It is in the context of history, community and ecosystem that Perl 5 becomes a strong competitive force in the world of computer programming. We know that a great Perl coder can write more functionality in one line of apparent gibberish than a Java coder can write in a whole page of code. We know that the same great Perl coder can write his solution <i>faster</i>. It is because people have written the tools and modules, that people have identified best practices, that people can do so much in so little time, and because people have taken the time to distill down to the elegant essence of Modern Perl, that we love and use Perl. <br />
<br />
The problem I identify in Parrot is a bootstrapping problem. Perl 5 offers plenty of reasons to use it beyond just performance and a technical feature set. Parrot does not. Parrot needs to provide a massive, overwhelming technological impetus for new developers to use it and get involved with it. Attracting those new developers further accelerates the pace of development, both of the core VM and of the ecosystem. All these improved components, in turn, attract more people. Parrot needs something compelling to get that cycle started.<br />
<br />
Make a great product. Attract more minds. Develop a bigger ecosystem. Build a better product. Repeat.<br />
<br />
In his blog post, chromatic takes exception to the word "mature" I used in the previous post. I won't use that word any more. In his comments, he also expressed a dislike of the word "enterprise". I won't use that word either. They were probably bad choices.<br />
<br />
In his blog post, chromatic says:<br />
<blockquote>His argument is that a focus on threading and a focus on JIT is necessary for enterprises or language communities to consider Parrot a useful platform. <br />
I can see his point, and yet (as usual) I challenge the terms of the debate itself.</blockquote>Do challenge it. That's not really what I said. I mentioned the two particular cases of threading and JIT as things that I think Parrot is going to need to be competitive in the world of 2010. Perl 5, and Python, and Ruby, and all sorts of other things that chromatic mentions don't have both these things, so the counter-argument appears to be that none of these is a suitable, competitive platform either. That's not what I said either. Keeping up with the comparison I've been trying to make between Perl 5 and Parrot, here is a summary view of what both bring to the table:<br />
<ul><li><b>Perl 5</b>: Reasonable, but not blockbuster performance. <i>Huge</i> ecosystem of modules, add-ins, tools, and applications written in Perl. <i>Huge </i>preexisting developer base with large amounts of institutional knowledge. Long and storied history of robustness, stability and fitness. Institutional inertia (people using Perl next year, at least partially because they have been using it this year).</li>
<li><b>Parrot</b>: Reasonable, but not blockbuster performance. <i>Extremely small</i> (but growing) ecosystem of dependent projects. <i>Extremely small</i> (but growing) preexisting developer base. <i>Very little </i>history of Parrot being stable and robust, considering the huge changes that the project has to make on a regular basis to improve itself.</li>
</ul>So if you're a developer, a manager, a graduate student, a hobbyist, or anybody else who has a great idea and is looking for a platform on which to implement it, which of these two would you choose? I'll give you a hint: If you reach into your pocket for a coin to flip, or reach to the shelf for a magic 8-ball to help answer the question, you probably need to re-read the choices more carefully.<br />
<br />
JIT is a feature that Parrot can use to set itself apart from the pack, not something that's a necessary requirement to join in. JIT is a leg up that Parrot can use to gain some traction against another runtime environment like Perl 5, Python 3, Ruby, or PHP 5, each of which has so many compelling stand-apart features of its own. Stable and scalable threading is another one. And Parrot needs to be <i>fast</i>. When groups like Facebook are talking about compiling PHP code down to C, you know that performance is an issue in the world of dynamic languages. It is foolhardy to think Parrot can succeed (for any definition thereof) without dramatically improved performance over what it offers now.<br />
<br />
In the end, this is really a fallacious argument anyway. I'm sure chromatic has pointed that out by now. Parrot isn't a language like Perl 5 is a language, so the two aren't really comparable in a direct way. Parrot doesn't target the same kind of audience that Perl 5 targets. Parrot targets people like the ones who make Perl 5.<br />
<br />
The idea of porting Perl 5 to run on top of Parrot was once kicked around in a semi-serious kind of way. I don't remember the exact details, but I think the 5.14 release was supposed to be running happily on top of Parrot. Let's revive that discussion a little bit. What kind of feature set would Parrot need to have to make a compelling argument for the Perl 5 development team to focus their energy whole-hog on porting to Parrot instead of improving Perl 5 in place? What kinds of tools would Parrot need to provide to smooth the way? If I want to see Perl 5.98 be released on Parrot, what do we need to do to make that happen? In answering this, I'm more interested in hearing about the shortcomings of Parrot (which I can work to fix) than the shortcomings of Perl (which I will not).<br />
<br />
Rumors have been floating around for over a year now about a complete rewrite of PHP called PHP 6. They want unicode support built-in. We have unicode support built-in. What do we need to do to make a compelling argument in favor of building a new PHP 6 language on top of Parrot? What do the PHP designers and developers need to see to be convinced that Parrot is the way to go, instead of pulling out the old C compiler and starting from scratch?<br />
<br />
In 5 years when maybe Python and Ruby people are looking to rewrite their languages, what do we need to have on the table to convince them to use Parrot as a starting point?<br />
<br />
These are the important questions. Nothing else really matters. If language designers and compiler developers don't use Parrot and <i>don't want to use Parrot, </i>we've lost.<br />
<br />
Parrot needs honest, constructive criticism. It is neither offensive nor overly aggressive to provide it. We need to set aggressive but realistic goals as a team. There are several planned parts of Parrot that need to be implemented, and several existing parts that need to be re-implemented. Good goals will help to inform those designs and tune those implementations. Eventually, our wildest dreams can become reality.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com7tag:blogger.com,1999:blog-4146794174400139442.post-47737644260126137622010-09-19T08:42:00.001-04:002010-09-19T08:43:01.072-04:00Parrot as a Mature PlatformYesterday I met with Jim Keenan, fellow neophyte on the Parrot Foundation board. We got together informally at a Barnes and Noble, and had a highly productive little chat about Parrot: the foundation, the culture, the community, and the software. Obviously no "official" business happened at this little meetup, but we did get to know each other better and discussed a number of things. In this post, and others in the coming days, I'm going to talk about some of the points that came up in this meeting.<br />
<br />
I've said many times, on this blog and elsewhere, that I don't think Parrot is currently a mature platform. It is certainly not suitable for use in a professional, production environment. I mentioned this sentiment to Jim, and he asked for some clarification. What did I mean by that?<br />
<br />
Let me illustrate with a few examples.<br />
<br />
You're employed as a system designer, and are preparing sketches for your company's next-generation software product. The success or failure of your design will have extreme effects on your current position, and maybe even on your career in the long term. In short, the design needs to be solid, flexible, expandable, robust, and all sorts of other good things. So here's the question: Do you choose Parrot as the basis for your new system, right now? If not, why not?<br />
<br />
Thought about it? Let me share a few answers with you, in the form of more questions. Name me one other mature, production-quality <a href="http://en.wikipedia.org/wiki/Virtual_Machine#Process_virtual_machines">application VM</a> or language interpreter which does not support threading. This isn't an empty set, but it's not a particularly large list either. Name me one other application VM that does not currently sport an aggressive, modern JIT.<br />
<br />
Here's another example. Go to Google Scholar and search for papers and patents involving virtual machines. What percentage of the resulting papers use the <a href="http://en.wikipedia.org/wiki/HotSpot">Java Hotspot VM</a> as the basis for their research? The .NET CLI? Smalltalk? Now take a closer look and count up how many results are based on Parrot. How many papers even mention Parrot in the footnotes?<br />
<br />
You're a graduate student pursuing a PhD in CompSci. You want to spend the next three years of your life researching some new feature on virtual machines, the results of which will have major effects on your career and maybe will even influence whether you graduate or not. Do you choose to implement and study your fancy new feature on Java Hotspot, or on Parrot? Why?<br />
<br />
Here's another example from a different direction. You work as the president of a large company which is reasonably sympathetic to the goals of the Parrot project. A Parrot contributor comes forward with a grant proposal to implement an exciting new feature. Do you cut a sizable grant to support that developer through the production of that feature on Parrot, or do you spend your money elsewhere? In other words, do you expect to see a return on your investment, and do you expect that money spent on Parrot <i>as it currently is</i> will be well and efficiently spent? If you were doing your research and read over the long-term Parrot design, and if you were looking at the current state of the Parrot community and the community leadership, would you feel comfortable and confident investing in it? Why or why not?<br />
<br />
Turning that same example around a little bit, pretend that you're <i>me</i>. As a foundation board member, I'm going to take that grant request from the developer, put together all the necessary paperwork and approach a philanthropist with it. What do you think are the odds that I would get laughed out of the room and told never to come back?<br />
<br />
Even though it's been 10 years since the start of it, Parrot really is a young project. Our long-term designs and goals are lacking. We have some extremely talented, enthusiastic, and energetic contributors, but we don't always do a great job of organizing and motivating them. There are plenty of areas where we do pretty well, but I can't think of a single aspect of the project or the community that we couldn't tune and improve. How much better do you want Parrot to be?<br />
<br />
I want to be very clear about one thing here: I am <i>not</i> being insulting or disparaging about Parrot. It is not an insult to say that Parrot is not ready for enterprise-level production deployment. It is not disparaging to say that Parrot isn't a sure bet to make when careers and livelihoods are on the line. What we do need is honest self-assessment, and to use that as a basis for making long term plans and goals. <br />
<br />
Starting from that honest self-assessment, we can start asking the important questions. Where do we go from here?<br />
<br />
Hypothetically, we approach the Python Foundation and say "in 5 years, we want Parrot to be at a level where your premier, standard Python interpreter implementation could be implemented on top of it. What would you need to see in order to comfortably make that kind of decision in favor of Parrot?" Ask the same question of the Ruby, PHP, Perl 6, and Perl 5 interpreters. What does Parrot need to do in order to convince the leaders, architects, and developers of these projects that Parrot is a modern, competitive and even a desirable platform on which to build the next generation of their software? How long do people reasonably think it would take for Parrot to get into that condition? 1 year? 3 years? 5 years? There are no wrong answers, but the more honest we are with ourselves, the more certain we can be in laying out a comprehensive roadmap to get us there.<br />
<br />
In the coming days I'll address some of these issues. The point of this post was to get people thinking, dreaming, and doing, all of it <i>big</i>.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com8tag:blogger.com,1999:blog-4146794174400139442.post-89224521524369400842010-09-15T20:47:00.000-04:002010-09-15T20:47:04.074-04:00Parrot has Smolder Again!Parrot's smolder server, previously hosted by plusthree.com, has been down for some days. Today, thanks to some concerted effort, particle, dukeleto, and the friendly folks at OSUOSL got a new instance of smolder set up for use by Parrot.<br />
<br />
I introduce you to <a href="http://smolder.parrot.org/">http://smolder.parrot.org</a><br />
<br />
Reports for Parrot proper are already flying in, but what makes this smolder server so special is that we can add support for other projects as well. Half a dozen other projects can also be tracked on Smolder: <a href="http://smolder.parrot.org/app/projects/smoke_reports/2">PLA</a>, Lua, PLParrot, Partcl, Rakudo, and Plumage. Not all of these projects have the necessary infrastructure to perform the actual uploads yet, but within a few days I'm sure they all will.<br />
<br />
Tonight I updated the PLA setup program and test harness to support uploads to smolder, and a few minutes ago I posted the first automated report. We posted a few test reports manually before that too. Everything is looking good so far, though I do have a few tweaks to make. Specifically, I want to include more information about the version of BLAS and LAPACK that are used in the report, which should be easy enough.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-56661609390040869752010-09-10T22:15:00.001-04:002010-09-11T20:45:29.631-04:00PLA Documentation Is UpI've finally gotten some <a href="http://whiteknight.github.com/parrot-linear-algebra/">online documentation for PLA up at Github Pages</a>. The colors are the same standard dark scheme that I use for everything. There are some things in this world that I consider myself very good at, but graphic design and effective use of colors are not in that list. If somebody with a better grasp of the color wheel would like to take a stab at a redesign, I would be very happy to accept patches.<br />
<br />
Documentation for PLA had been written in <a href="http://en.wikipedia.org/wiki/Plain_Old_Documentation">POD</a>, and embedded directly in the PMC source files. This is a decent system, and is what Parrot uses, but I was becoming unhappy with it. The problem I was having was that even though converting POD to HTML is supposed to be a well-understood and oft-solved problem, I couldn't find a converter I liked that produced good-looking output. Also, I was having a hell of a time finding a tool that would give me any kind of flexibility with how the output HTML was generated, formatted, or organized. The output of the standard pod2html is horrendous, and I've found it to be very difficult to style without modifying the generated HTML by hand.<br />
<br />
I also am not really too happy with the way POD embeds in source code. It's too abusive of vertical space, and causes files to become completely bloated. Not to mention the fact that I don't think it's really the right solution to the problem. I could have looked into something like <a href="http://en.wikipedia.org/wiki/Doxygen">Doxygen</a>, but that's only marginally better. Sure, Doxygen uses less vertical space in a file, but it's still embedded documentation that attempts to do more than it should. Documentation for prospective users should be different than documentation for prospective hackers. If it's not different, I can guarantee that one of the two groups is getting the short end of the stick. If you have documentation for hackers (as you would expect to find embedded in the C source code file), the users are going to be stuck sifting through page after page of internals minutia. What the users really want to see is information about the interface and how to use the tools, not details about every function in the C code file.<br />
<br />
I thought about writing a POD parser in NQP that would do what I wanted it to do, but by the time I realized how much effort that would take me, I had basically decided against using POD entirely. I do think that a standalone POD parser written in a Parrot-targeting language (NQP comes to mind immediately) would be a good thing, but I am not inclined to make it myself just yet.<br />
<br />
I thought about adding separate POD files to form user documentation, separate from the hacker documentation embedded in the source code. However, this didn't sit well with me either. First, as I mentioned above, the generated HTML I could get always looked terrible (and I couldn't find any compelling alternate tool which might generate better-looking pages) and it didn't really give me the flexibility that I wanted to have. It got to the point that it would have taken me more effort to write the tool I needed to get the resulting output that I wanted than it would have taken me just to rewrite the documentation using a different markup language.<br />
<br />
Finally I decided to embrace Github Pages wholesale. Github Pages uses the <a href="http://github.com/mojombo/jekyll">Jekyll preprocessor</a>, which takes input in <a href="http://en.wikipedia.org/wiki/Textile_%28markup_language%29">Textile</a>, <a href="http://en.wikipedia.org/wiki/Markdown">Markdown</a>, or HTML. It gives me a lot more flexibility to break documentation up into arbitrary little chunks and keeps pages themed in a consistent way. I decided on Textile, which in my mind is easier to read and write than I ever found POD to be. So I rewrote most of the documentation in Textile with some Jekyll processor magic thrown in, and I'm pretty happy with the result.<br />
<br />
So happy, in fact, that I've considered changing my entire blog format over to Jekyll and Textile. I'm not ready for that kind of changeover just yet, but it is something I probably want to look into eventually. <br />
<br />
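As a small illustration of the format, here is roughly what a Jekyll source page written in Textile looks like. The layout name, title, and type list below are just examples of the sort of thing these pages contain, not verbatim copies of PLA's actual files:<br />
<br />
<pre>---
layout: default
title: Matrix Types
---

h1. Matrix Types

PLA provides matrix PMC types for several element types:

* NumMatrix2D (floating-point values)
* ComplexMatrix2D (complex numbers)
* PMCMatrix2D (arbitrary PMCs)
</pre><br />
The YAML block between the dashes tells Jekyll which layout template to wrap the page in; everything below it is plain Textile that gets converted into consistently-themed HTML when the pages are pushed to Github.<br />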
Along with some of the fixes I've made to the PLA test harness and test coverage in the last few days, this is basically one of <a href="http://wknight8111.blogspot.com/2010/09/pla-release-is-near.html">the last requirements I had</a> for putting out a new release of PLA. I now feel like I'm prepared to cut the release shortly after Parrot 2.8.0 comes out.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com4tag:blogger.com,1999:blog-4146794174400139442.post-909880100134493752010-09-07T17:00:00.000-04:002010-09-07T17:00:02.726-04:00PLA Release Is NearI added support for HLL mapping of autoboxed types in <a href="http://github.com/Whiteknight/parrot-linear-algebra">Parrot-Linear-Algebra</a>, and with that I feel like I'm getting pretty close to a good point for cutting a release. I don't yet have any tests for the autoboxing behavior, so I do need to write those first. Shortly thereafter I think I can get to work on the release.<br />
<br />
HLL mapping, for readers who may not be familiar with the term, is a very cool feature in Parrot. It allows the user to manually re-map basic types to user-defined types instead of the built-in varieties. When the VM would normally create an Integer PMC, for instance, it would instead create a custom "MyInteger" type (or whatever you called it). You can use the HLL mapping functionality to override many built-in types in many operations. What's super-cool about HLL mapping is that the mapping is defined in a particular HLL namespace, so programs with modules written in different languages could allow each module to define its own type mappings that do not conflict with each other.<br />
<br />
In Parrot, an "HLL namespace" is a special type of top-level namespace object which allows the use of type mappings, among other things. In PIR code, defining an HLL is simple with the .HLL directive:<br />
<br />
<pre>.HLL "MyLanguage"
</pre><br />
All regular namespaces defined below that directive will be inside the HLL namespace. This means that so long as we make proper use of .HLL directives, we can maintain almost perfect encapsulation between modules written in different languages, which can be an extremely valuable thing for proper interoperation.<br />
<br />
The default HLL is called "parrot". If you don't specify an HLL directive, you will automatically be inside the "parrot" HLL namespace. To get back there, you can type (in PIR):<br />
<br />
<pre>.HLL "parrot"
</pre><br />
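To make the mapping itself concrete, here is a rough sketch of what a type mapping could look like in PIR. The "MyInteger" class is hypothetical, and the exact mapping call has shifted between Parrot versions (I'm using the interpreter's "hll_map" method here), so treat this as an illustration rather than gospel:<br />
<br />
<pre>.HLL "MyLanguage"

.sub 'main' :main
    # Look up the built-in type and our hypothetical replacement class
    .local pmc interp, core_type, my_type
    core_type = get_class 'Integer'
    my_type   = get_class 'MyInteger'

    # Ask the interpreter to map Integer to MyInteger within this HLL
    interp = getinterp
    interp.'hll_map'(core_type, my_type)

    # Autoboxing an integer should now produce a MyInteger instead
    $P0 = box 42
.end
</pre><br />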
I started writing some tests on Saturday, and discovered two problems that brought me to a halt: First, NQP doesn't have any built-in way to specify an HLL namespace, and I wasn't able to find any crafty, sneaky way to inject one either. Second, HLL type mapping doesn't work in the parrot namespace.<br />
<br />
The second problem turned out to be the most frustrating because the test programs I was writing were silently failing for no visible reason. The mapping method appeared to execute and return correctly, but none of my types were being mapped. Dutifully, <a href="http://trac.parrot.org/parrot/ticket/1771">I filed a bug about it</a>. It turns out that this is by design, not accident, although I wish that it had been a little bit better documented. HLL mapping lookup operations can be a little bit expensive, so we don't want to be doing that all the time. Also, the parrot HLL is supposed to be a default, neutral space and one module shouldn't be able to break encapsulation and modify that HLL namespace in a way that would adversely affect other modules written in other languages. NotFound put together a commit that throws an exception when a type mapping is attempted in this HLL, which addresses my biggest concern: the operation is still disallowed, but at least it no longer fails silently.<br />
<br />
I also started <a href="http://github.com/Whiteknight/nqp-rx/commit/0156e5d76bafea8d87d0db8b8790ab15311f93b5">putting together a patch in my fork</a> of NQP-RX that adds HLL support, but the patch isn't very mature yet. If I can get this patch ready and merged into NQP-RX master in time for the 2.8.0 release, I will write up the last remaining PLA tests for this feature in NQP. Otherwise, I will write them in PIR.<br />
<br />
The PLA release is going to target Parrot 2.8.0. There are a few things that I want to do first, before the release is out the door:<br />
<ol><li>Finish writing the tests for the HLL mapping behavior, which might involve finishing up that patch to NQP</li>
<li>Write up some decent public documentation. The default output of pod2html is pretty ugly looking, so I may end up writing a custom converter. I've started experimenting with Github Pages, though my experiments so far in using the pod2html output there have not been too attractive. I may go through and reorganize all the POD documentation source anyway. I definitely want to expand documentation and examples of certain features.</li>
<li>Get a release, or pseudo-release of <a href="http://gitorious.org/kakapo/kakapo">Kakapo</a> that targets Parrot 2.8.0. If Austin isn't able to get a working release of that software that's up to his standards in the next two weeks, I may pick a revision that works well for my purposes and tag it on <a href="http://github.com/Whiteknight/kakapo">my Github fork</a>. It won't be the same as a real release of Kakapo, but at least it won't be a stumbling block for me.</li>
<li>I need to check and double-check that the setup script for PLA is doing all the correct things with respect to releasing. I need to check that the generated <a href="http://gitorious.org/parrot-plumage/parrot-plumage">Plumage</a> metadata is correct and allows complete and functional installations using Plumage. I also need to check that I can generate correct .deb and .rpm packages for those systems.</li>
<li>I want to look into creating a windows installer, but I make absolutely no promises about that. I certainly haven't done any testing whatsoever on Windows so far, and I do not have high hopes that it will work at all there. This may be a task for the next release, or later.</li>
</ol>In the span of about two weeks, we could have a release of a high-performance linear algebra toolkit for Parrot. It obviously doesn't have a huge amount of functionality yet, but it is a good start and provides a solid base of some important standard operations. I've got a lot of plans for the future of this little project, but we're at a good point right now and I think it will make for a very nice release.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com1tag:blogger.com,1999:blog-4146794174400139442.post-33356367340797796692010-09-06T21:45:00.000-04:002010-09-06T21:45:41.298-04:00Parrot Foundation: Current WorkI've only been a member of the Parrot Foundation board for a few days now, and I already feel exhausted by it. I've read through dozens of pages in IRS <a href="http://www.irs.gov/pub/irs-pdf/p557.pdf">Publication 557</a>, and I've been poring over several incomplete drafts of <a href="http://www.irs.gov/pub/irs-pdf/f1023.pdf">Form 1023</a>. I'm not certain that either of these two documents is written completely in English.<br />
<br />
Luckily for me, because of the time I spent in the Wikimedia Chapters Committee, and because of some of the work I did with <a href="http://meta.wikimedia.org/wiki/Wikimedia_New_York_City">Wikimedia NYC</a> and the now-defunct <a href="http://wikinortheast.blogspot.com/2007/06/new-name-wmf-pennsylvania.html">Wikimedia Pennsylvania</a>, I'm not unfamiliar with this documentation or the process of gaining tax exempt status for a US non-profit corporation. It has been at least a year or two since I've really looked at this kind of paperwork, however, so I'm taking the crash-course to get back up to speed with it all. Some of it does seem more daunting than I remember! <br />
<br />
I've also been digging through invoices and financial records, and lots of other paperwork as well. Some of the information and documentation that I need is readily available. Some of it might not be (or, I may be looking in all the wrong places). Either way, I'm setting a break-neck pace trying to get through it all and form a comprehensive view of the current state of the Parrot Foundation.<br />
<br />
My goal is to put together a report about the current legal and financial state of the Foundation, and present that report to the other directors at the end of the month so we can start coming up with a plan of action for the coming year. Shortly thereafter I would love to send a status update to the general foundation membership, so we can make sure everybody is well informed about the state of the foundation and also so we can solicit some input about the future direction of it.<br />
<br />
I will definitely post updates, either here or to the members mailing list as I get more information. However, don't expect too much until I am ready with my full report, probably around the end of September. As a member of the board I have a lot of plans for the year, but I don't want to start anything until I make sure the foundation is on a solid footing with all the right documents properly filled out and filed with all the right organizations. It might be a big job, but it's one that I hope to complete by the end of 2010.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com1tag:blogger.com,1999:blog-4146794174400139442.post-33276712097380579792010-08-31T20:56:00.000-04:002010-08-31T20:56:41.701-04:00Parrot Foundation BoardToday at the weekly Parrotsketch meeting there was also a meeting of members of the Parrot Foundation, with the aim of electing board members for the coming year. All but one of the current board members chose not to run for re-election, so it was a pretty open field.<br />
<br />
Four candidates were nominated, and today all of them were elected by simple majority: Jerry Gay (particle), Jim Keenan (kid51), Jonathan Leto (dukeleto), and myself.<br />
<br />
<br />
Here's a brief discussion of what I want to do this year as a Parrot Foundation board member:<br />
<br />
<ol><li>Re-read and re-reread the foundation bylaws. I read through them a while ago, back when the foundation was first forming, but I don't think I've looked at them since. Plus, I've read through so many sets of bylaws for so many young organizations in my time that the details all run together in my mind.</li>
<li><b>Money</b>. The foundation doesn't have a whole ton of money; then again, I can't think of too many non-profits of this size which do. There are lots of things that can be done with money, grants and bounties being two that immediately come to mind, but there are plenty of others. Finding ways to raise money and increase the PaFo coffers should probably be a pretty big priority for us.</li>
<li>Create a proper membership committee. Parroteer "smash" has been running the elections and doing a fantastic job, but some problems have been exposed in the membership process. Many people do not know whether they are members or not, and many people are confused by how a person becomes a member. Clearing up confusion in this area will help everybody.</li>
<li>Recruit new people. People are the fuel that runs an open-source project, and you can never have too many people. Parrot is by no means a small open source project, but it is far from being a big one. More developers create more/better software, which attracts more end users, which increases the prestige of the project, which attracts more developers, which.... It's a feedback loop that we should be trying much harder to feed.</li>
</ol>These are just four things that I would like to focus on this term; however, this is not a definitive list. I am hoping to get, within the next few days, some information from the current board members about what kinds of tasks they have left unfinished, or what kinds of things they would try to accomplish if given another term. Since there is so much new blood on the committee, we need to make sure to thoroughly pick the brains of the current board so that we can get this coming year off to a running start.<br />
<br />
<b>PLA Planning: Next Steps</b> (2010-08-23)<br />
<br />
I did a little bit more infrastructure work on <a href="http://github.com/Whiteknight/parrot-linear-algebra">Parrot-Linear-Algebra</a> this weekend. I wasn't able to get a lot done because we spent Saturday with family, but I got some nice groundwork laid for important future developments.<br />
<br />
I started thinking about bindings for the <a href="http://netlib.org/lapack/">LAPACK library</a>, and much of the work I did this weekend was in preparation for that. LAPACK is particularly tricky because it's written in <a href="http://adaptiveoptics.blogspot.com/2007/05/fortran-sucks-balls.html">FORTRAN</a> and, unlike the <a href="http://math-atlas.sourceforge.net/">ATLAS library</a> I use for my <a href="http://www.netlib.org/blas/">BLAS</a> <a href="http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Implementations">implementation</a>, there aren't any good C bindings to LAPACK that I can find. Yes, there is the <a href="http://www.netlib.org/clapack/">CLAPACK library</a>, but I am not a particular fan of the way that library is generated, and there <a href="http://packages.ubuntu.com/search?keywords=clapack&searchon=names&suite=lucid&section=all">aren't any premade packages for CLAPACK on Ubuntu</a>. When installing PLA on Linux, I would like users to be able to get all prerequisites from their distribution's package manager instead of having to compile them manually; some of these libraries can be extremely tricky to build.<br />
<br />
I've decided that, in the foreseeable future, we will be linking to the LAPACK library directly. This does mean that we are going to need to write our own bindings and wrappers for the functions we want to call, but that's not a huge issue. Plus, I can start moving forward on something I've wanted to have for a while now: more flexible and configurable interfaces to BLAS and LAPACK. There are several implementations of each of these libraries (BLAS especially), and I want PLA to support as many of them as possible.<br />
<br />
To get these kinds of flexible interfaces, we're going to need to support large swaths of conditionally-compiled code. This will get extremely unwieldy if we try to keep all the logic in the few monolithic PMC files I've been writing, so I decided that we're going to need to upgrade our build machinery to support linking additional C library files into the dynpmc build. This weekend <a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/4e23a2f7854e32641b6aba606976c01220eddc0b">I added that machinery</a>, and started the process of factoring out some of the supporting routines from the various PMC types into those files. With common functions defined in one place, I can go through and make sure all my types are making consistent use of them. Plus, I can start writing up complicated library bindings without worrying about making my large PMC files any larger.<br />
<br />
BLAS doesn't actually supply too many routines that the PLA matrix types need. The most important, and only one I call so far, is the <a href="http://en.wikipedia.org/wiki/GEMM">GEMM routine</a>, which performs the generalized matrix multiply/addition operation:<br />
<br />
Y = αAB + βC<br />
<br />
Where α and β are scalar values (FLOATVAL in NumMatrix2D and Complex in ComplexMatrix2D), and A, B, and C are matrices of compatible dimension. This is a very powerful operation, and really the only one from BLAS that I need. Other routines from BLAS are more highly optimized versions of this same routine for specific types of matrices (diagonal, triangular, etc), and operations between matrices and vectors which I really don't support yet. Because there is really only this one routine, and because it's so powerful and common, I've included it as a method directly in the PMC types themselves. I may rethink that decision later, but for now it seems like a decent fit where it is.<br />
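The operation above can be sketched in a few lines of pure Python for plain row-major matrices. This is only an illustration of the math, not PLA's implementation, which delegates the real work to the BLAS library:

```python
def gemm(alpha, A, B, beta, C):
    """Compute Y = alpha*A*B + beta*C for dense matrices (lists of lists)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    assert len(A[0]) == inner and len(C) == rows and len(C[0]) == cols
    # Start from beta*C, then accumulate alpha*A*B into it.
    Y = [[beta * C[i][j] for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = alpha * A[i][k]
            for j in range(cols):
                Y[i][j] += aik * B[k][j]
    return Y

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, A, B, 1.0, C))  # [[39.0, 45.0], [87.0, 101.0]]
```

With α, β, and all three operands configurable, a single call covers plain multiplication (β = 0), scaling, and multiply-accumulate at once, which is why this one routine earns a spot as a method on the PMC types.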
<br />
LAPACK is a little bit different, however. LAPACK contains hundreds of routines, all of which are named in an idiosyncratic and difficult-to-understand way. These routines are broken up into <b>drivers</b>, <b>computational routines</b>, and <b>auxiliary routines</b>. The driver routines perform the most important operations, typically calling sequences of the lower-level computational and auxiliary routines to do their work. Driver routines include calculating linear least squares, finding eigenvalues and eigenvectors, and singular value decomposition. These are the ones that PLA is going to target first, though access to the other, lower-level routines might be nice to have as well, later down the road.<br />
<br />
I am torn as to how I want to provide LAPACK bindings in Parrot. The obvious thing to do is write them all up as methods on the PLA matrix types. This approach provides the easiest access, but also starts to get bloated and unwieldy very quickly. Plus, if you don't need LAPACK routines for your program, I would be forcing you to load the LAPACK library and create Sub PMCs for all those methods anyway, which can be a pretty large cost to pay. I really don't like this idea because it adds huge bloat to my PMC files and adds the runtime overhead of registering all sorts of extra method PMCs when we load PLA. Plus, it doesn't scale well: any time we want to add a new LAPACK-based routine, we need to add a new method to several PMC types, present and future.<br />
<br />
One alternative is to provide a separate library of functions and dynamically load it using Parrot's NCI tools. I don't know what the current status of the NCI system is, but without significant improvement there, attempting to load a complex library like LAPACK, even if I provided some simplified Parrot-specific wrappers, would be a nightmare. This option does provide a little bit more flexibility because we can load in the methods that we want without having to deal with ones that we don't. Also, I don't think the distutils library that PLA is using supports creating standalone shared libraries that aren't dynpmc or dynops libraries, so I would have to write all that machinery myself. I'm not sure I really like this option either, for all these reasons.<br />
<br />
I'm also torn about whether I want to make some prerequisites for PLA optional. <a href="http://gitorious.com/kakapo/kakapo">Kakapo</a>, for instance, is only used for the test suite. If you don't have Kakapo installed you can still build and use PLA, you just can't run the test suite. BLAS is currently required because it's used to implement the GEMM method but also the various multiply vtables. I suppose I could make it optional, but I think it would be a little sneaky if different builds of PLA supported a different list of VTABLEs and METHODs for core types, including for something as fundamental as multiplication. LAPACK is a little bit different, because even without it we do have a pretty solid amount of core functionality. I am strongly looking into an ability to make LAPACK an optional prerequisite, and only include bindings for it if we have the library installed. If I wrote a dynamic library wrapper for LAPACK and had the user include that file manually to get those routines, you could still use the core library.<br />
<br />
I had also considered the idea (though I've been fighting it) of building some kind of PLALibrary singleton PMC type which would control this kind of stuff. In addition to providing information about library version and configuration options, a PLALibrary PMC could allow dynamic loading of methods, and could forcibly inject those methods into the necessary PMC types if requested. This would allow me to build everything together into a single dynamic library without forcing me to define all the methods upfront which I don't need. I may like this idea the best, though I've been resisting creating a library singleton PMC until now.<br />
<br />
There's a lot to consider, and I probably won't make any hard decisions about future plans for the next few days. In the interim I do want to put together some proof-of-concept wrappers for some common LAPACK routines, and start playing with those. If I could get routines to calculate eigenvalues, eigenvectors, inverses, and singular value decompositions, I think that would be a pretty great start.<br />
<br />
<b>GSoC Threads: Chandon's Results</b> (2010-08-21)<br />
<br />
The pencils-down date for the 2010 GSoC program has come and gone, and now we're starting the process of sorting through the rubble and seeing what got accomplished this summer. I have been purposefully trying not to provide too many of my own status updates for Chandon's project this summer, instead wanting him to post <a href="http://parrot.org/blog/836">his own updates</a> as he went without me getting in the way. It's his project, after all, and I wanted him to get all the credit for all the good work he was doing.<br />
<br />
The project is now over, and he's <a href="http://parrot.org/content/hybrid-threads-gsoc-project-results">posted a nice final review</a> of what he managed to accomplish this summer. While it isn't everything that he mentioned in his proposal, it still is quite a nice step in the right direction and a lot of the necessary framework is now available to bring threading to Parrot in a real and usable way.<br />
<br />
So what did he manage to do? In his branch there is now a more-or-less working implementation of green threads, which means Parrot is getting onto the same level of capability as modern Python or Ruby implementations. You can schedule multiple independent tasks to run and, while they cannot run truly in parallel on separate OS threads or even on multiple processor cores, you can get the illusion of concurrency. It also gives you the ability to run multiple tasks together without having to explicitly schedule them one after the other, or having to manually switch back and forth between them. In a later post I may go into some detail about how this all works, and how it can be used by programs.<br />
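The flavor of green threads can be sketched with Python generators, where each yield is a voluntary context switch. This is just an illustration of the concept, hypothetical code rather than anything from Chandon's branch:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin cooperative scheduler: each task is a generator that
    yields whenever it is willing to give up the single OS thread."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run the task up to its next yield
            ready.append(task)        # still alive: put it back in line
        except StopIteration:
            pass                      # task finished; drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"           # cooperative context switch

print(scheduler([worker("a", 2), worker("b", 3)]))
# ['a:0', 'b:0', 'a:1', 'b:1', 'b:2'] -- interleaved, but never truly parallel
```

All the interleaving happens on one OS thread, which is exactly why green threads give the illusion of concurrency without any of the shared-data hazards discussed below.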
<br />
This is a pretty significant step forward, and once a few of the final nits get worked out I think we will be able to merge this into Parrot trunk and start making use of the new functionality immediately. However, green threads is only one half of the promised "hybrid threads" that Chandon's proposal hoped to achieve. The other half is the ability to run Parrot code in true parallel on separate OS threads. This was the larger part of the project and definitely the more difficult piece. Today I would like to talk a little bit about why it didn't get done, and maybe motivate some people to look into help completing the work as we move forward.<br />
<br />
<br />
Let's take a look at a small snippet of normal object-oriented code:<br />
<br />
foo.bar();<br />
foo.bar();<br />
<br />
Extremely simple: we have an object "foo" and we are calling the "bar" method on it. In a very naive statically-typed language this would be a simple operation. The compiler would determine the type of foo, calculate the location of the bar routine in memory, and simply call that address twice. This would be extremely fast to execute, too, which everybody likes. It would basically be converted to this low-level pseudo-code:<br />
<br />
call bar(foo)<br />
call bar(foo)<br />
<br />
Now let's move up to a slightly more complicated example: a statically-typed language which allows subclassing and method overriding. Now, the compiler doesn't necessarily know which function in memory corresponds to "foo.bar()", since foo could be of type Foo, or of type FooSubclass, or even FooSubSubclass, and the location of the appropriate bar function would change with each different type. However, for each type, the value of bar does not change. It can be overridden by subclasses, but it cannot be changed for the class itself. Now, the compiler needs to ask foo to get the appropriate method first:<br />
<br />
method _bar = get_method(typeof(foo), "bar")<br />
call _bar(foo)<br />
call _bar(foo)<br />
<br />
Assuming foo does not change type inside the call to _bar, this code works just fine. Next let's look at a more complicated example still: the case of a dynamic language like Perl or Python. In these languages the class MyFooClass is an object of type Class, and foo is an object of type MyFooClass. Also, bar is not a compiled-in constant property of the Class, but is instead a mutable property of it. Inside the call to bar, maybe we change the definition of bar itself inside the class object. Likewise, the definition of routines like "find_method" inside Class can change. Accounting for all this dynamism, our code changes to look like this:<br />
<br />
class fooclass = find_class(foo)<br />
method _bar = find_method(fooclass, "bar")<br />
call _bar(foo)<br />
class fooclass = find_class(foo)<br />
method _bar = find_method(fooclass, "bar")<br />
call _bar(foo)<br />
<br />
Keep in mind that everything can change inside each call to bar. We can:<br />
<br />
<ul><li>Modify the inheritance hierarchy of MyFooClass to add or remove parent classes</li>
<li>Add or remove methods in MyFooClass, and in any item in its inheritance hierarchy</li>
<li>Add or remove methods on the object foo itself (like in JavaScript, where foo.bar = function() {...})</li>
<li>Change the class of foo</li>
</ul>Let's bring this back to the idea of threading before I go too far off topic. Every time we call a method or attempt to access a field on an object, we need to look it up in the associated Class object first. Those Class objects need to be global, because code on every thread needs to be able to look up methods on any object: we can pass objects between threads, and we need to be able to work with them wherever they end up.<br />
<br />
If classes are global, and need to be accessible from multiple threads, we inevitably run into the idea of contention: If one thread is changing the definition of foo.bar() at the same time that another thread is attempting to look it up, the thread doing the reading may end up with incomplete, corrupted, or incorrect information.<br />
<br />
Global data is the enemy of good multithreading. Global data is a necessity for highly dynamic object systems like those supported by Parrot. Ergo, multithreading on Parrot is hard.<br />
<br />
<br />
Look at all the global data in Parrot: there's the interpreter object itself, the packfiles, the list of Classes (and the methods/properties that they contain), the list of Namespaces (and the global values they contain), the context graph (and the contents of all registers in those contexts), the opcode table and the MMD signature cache. This is a hell of a lot of globally-available data, and we're going to need to find a sane way to limit corruptive concurrent access to these things. Not only that, but if we have to lock every single access to a global value (which means obtaining a lock every time we want to make a method call), our performance is going to be <i>terrible</i>.<br />
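To see what "a lock on every method call" looks like, here is a hypothetical sketch of a globally locked method table in Python. It is correct under concurrency, but every single call site in every thread serializes on the same lock:

```python
import threading

class ClassRegistry:
    """Global method table guarded by one lock: safe against concurrent
    redefinition, but every method call pays the lock cost."""
    def __init__(self):
        self._lock = threading.Lock()
        self._methods = {}

    def define(self, cls, name, fn):
        with self._lock:
            self._methods[(cls, name)] = fn

    def lookup(self, cls, name):
        with self._lock:  # paid on *every* method call, on every thread
            return self._methods[(cls, name)]

registry = ClassRegistry()
registry.define("Foo", "bar", lambda self: "bar called")
method = registry.lookup("Foo", "bar")  # lock, hash lookup, unlock
print(method(None))  # bar called
```

This is the naive baseline the caching schemes below are trying to beat.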
<br />
By convention, I think we can all agree that things like the opcode table should not be modified while multiple threads are running. It's much harder, though, to establish a convention that class and method definitions should be treated as constant once multiple threads are running.<br />
<br />
What we can do to alleviate some of the performance problems is create short-term caches for things like Classes and Methods in local threads, with the understanding that, without explicit synchronization, the exact ordering of read/write operations between threads cannot be guaranteed, so those operations can be scheduled for maximum benefit. If the relative execution times of two operations on separate threads cannot be guaranteed, then it makes no functional difference to the user whether those operations happen at random times with respect to each other or whether Parrot deliberately orders them to improve caching.<br />
<br />
Think about it this way: we have two threads, one writing a global value and one reading it. These threads are operating in true parallel on two separate processor cores. If we run this program a million times, sometimes the read will randomly occur before the write, sometimes the write will occur before the read, and sometimes they will happen at the exact same moment and cause corruption. Depending on a simple matter of random timing, we could get three very different results from our program. Now suppose Parrot implements a heuristic: if a write and a read happen within a very short period of time, Parrot will order them so that the read occurs before the write. And if the read always happens before the write, maybe we don't read at all, but instead use a cached version. Then, only when the write has completed, we update the cache.<br />
<br />
Another thing to think about is that we could disallow direct lookups of data in global variables, and give each thread a local copy of global values. Considering the principle I mentioned above, we can be a little liberal with the way we handle writes and reads. If a thread modifies its copy of a Class, those changes could be propagated up to a global reference copy at a convenient time, and then propagated down to the other local copies later, again only when it's convenient to do so. Remember, if things are happening at random times relative to each other, it makes no substantive difference to a thread whether its copy of a global variable is exactly up-to-date or a little bit lagged behind. That is, the reading thread can't tell whether the writing thread had a legitimate delay before writing, or whether Parrot deliberately scheduled the write at a different time.<br />
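One hypothetical shape for those per-thread copies is a version-stamped cache: each thread reads its local snapshot, and only pays for synchronization when a global version counter shows that a writer has published a change. A sketch of the idea, not Parrot code:

```python
import threading

class GlobalValue:
    """A global value with per-thread snapshots. Readers use their
    (possibly slightly stale) local copy; they re-synchronize only when
    the version counter shows that a writer has published a change."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0
        self._local = threading.local()

    def write(self, value):
        with self._lock:
            self._value = value
            self._version += 1   # publish: all local caches are now stale

    def read(self):
        if getattr(self._local, "version", -1) != self._version:
            with self._lock:     # lock only on a stale cache, not every read
                self._local.value = self._value
                self._local.version = self._version
        return self._local.value

g = GlobalValue({"bar": "original"})
assert g.read() == {"bar": "original"}  # first read fills the local cache
g.write({"bar": "redefined"})
print(g.read())  # {'bar': 'redefined'} -- stale cache detected and refreshed
```

The common case (nothing changed) touches no lock at all, which is the whole point: method lookups stay cheap, and the cost of a class redefinition is paid once per thread rather than on every call.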
<br />
To get a complete Hybrid Threads implementation in Parrot like what Chandon was envisioning, we are going to have a few steps to take:<br />
<ol><li>We have to break the <a href="http://trac.parrot.org/parrot/wiki/GSOC_ThreadsInterpreterSplit">Interpreter structure up into two parts</a>: One which is truly global and contains data which all threads need, and one which is local for each thread and contains data that the thread needs. The local interpreter may contain a cache or a mirror of data in the global interpreter.</li>
<li>We need to come up with a good sane scheme (which will likely consist of both technical solutions and programmer conventions) to manage access to global values</li>
<li>We need to come up with a good, sane scheme for sharing values between threads. Creating a local variable and passing a reference to it to another thread essentially turns it into a global variable, and opens up problems with contention. We need a good way to manage this without slowing everything down.</li>
</ol>All of these problems are solvable, and if we put forward a concerted effort we could have a decent hybrid threading system in Parrot within the next few months. We almost have a complete working Green Threads system ready to merge now, and hopefully that will be enough for our users while we get the rest of the details of the hybrid system worked out and implemented.<br />
<br />
<b>PLA: More New Methods</b> (2010-08-20)<br />
<br />
In a previous post I discussed some of the <a href="http://wknight8111.blogspot.com/2010/08/pla-status-updates.html">common behaviors</a> that the matrix types in Parrot-Linear-Algebra implement. Recently, I've added a few more which I think are pretty cool and worth discussing.<br />
<br />
First and foremost, I've finally added some methods to convert between types. There are now methods for <b>convert_to_number_matrix</b>, <b>convert_to_complex_matrix</b>, and <b>convert_to_pmc_matrix</b>. Every type implements all of these methods, even when the conversion would be a no-op. This way, if you have a PMC that "does 'matrix'", you can cast it to the type you want to deal with without worrying too much about the cost. These methods always return a new matrix, so you can keep the matrix in its original form and also have a new copy to play with. In the case where the matrix is already in the target format, I create a clone.<br />
<br />
Unfortunately, these conversion operations can be a little bit expensive when you're actually converting types. The problem is that the data for the matrices is stored internally in a very dense format. For the Number and Complex matrices, the data is stored internally in the format required by the BLAS library routines. For the number matrix, the values are basically stored together in a single large buffer. For complex matrices, the real and imaginary values are stored together also, alternating positions. Converting one to the other is not easy, since I have to allocate a completely new buffer and iterate over each space individually. So, too many conversion operations can get expensive quickly.<br />
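As a toy illustration of that layout and its conversion cost (the function names here are hypothetical, not PLA's internals): converting means allocating a brand-new buffer and visiting every element once.

```python
import array

def real_to_interleaved(real_buf):
    """Expand a flat real buffer into interleaved complex storage
    (re, im, re, im, ...): a full new allocation plus a pass over
    every element."""
    out = array.array('d', [0.0] * (2 * len(real_buf)))
    for i, v in enumerate(real_buf):
        out[2 * i] = v  # real part; the imaginary slot stays 0.0
    return out

def interleaved_to_real(cplx_buf):
    """Collapse interleaved complex storage back to a real buffer.
    This toy version simply drops the imaginary parts."""
    return array.array('d', cplx_buf[0::2])

nums = array.array('d', [1.0, 2.0, 3.0, 4.0])
cplx = real_to_interleaved(nums)
print(list(cplx))                       # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0]
print(list(interleaved_to_real(cplx)))  # [1.0, 2.0, 3.0, 4.0]
```

The O(n) copy is unavoidable because the dense layouts are incompatible, which is why it pays to avoid bouncing between matrix types in a tight loop.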
<br />
Using these new conversion methods, I have updated some previous methods, like gemm(), the routine which performs the <a href="http://en.wikipedia.org/wiki/GEMM">GEMM</a> operation from the <a href="http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms">BLAS library</a>. You can now pass any matrix types to this method, and it will perform conversions internally to ensure the types match. Here's a fun little NQP example that shows some of the capabilities of this library today:<br />
<br />
<pre>INIT { pir::load_bytecode("pla_nqp.pbc"); }
my $A := NumMatrix2D.new();
$A.initialize_from_args(2, 2, 1, 2.0, "3", 4);
$A.transpose();
pir::say("A: " ~ $A);
my $B := ComplexMatrix2D.new();
$B.initialize_from_array(2, 2, [1, 2.0, "3+3i", "7+5i"]);
$B.conjugate();
pir::say("B: " ~ $B);
my $C := PMCMatrix2D.new();
$C.fill(4.4, 2, 2);
pir::say("C: " ~ $C);
my $D := $B.gemm(0.5, $A, $B, "1+2i", $C);
pir::say("D: " ~ $D);
</pre><br />
You can try it yourself, or you can take my word for it that the result is correct. I've verified it this morning in Octave, and the results are the same (though the Octave script to produce this result is considerably shorter).<br />
<br />
PLA is finally starting to get the kind of basic functionality and test coverage that I've been hoping for. With a few more finishing touches on this base, I'm going to start adding new functionality like LAPACK bindings. Specifically, I'm hoping to add in support for some common matrix decompositions, matrix reductions, inverses and eigenvalues. I'm also hoping to get started on the module for Rakudo sometime soon.<br />
<br />
<b>PLA Test Suite Redux</b> (2010-08-19)<br />
<br />
I've been doing a hell of a lot of work on the Parrot-Linear-Algebra test suite in the past few days, and the results are quite spectacular. I've completely reworked the way the suite works, and broken a small handful of monolithic test files into a series of smaller, focused test files. Instead of describing it, I'll just show the results:<br />
<br />
<pre>t/sanity.t .............................................. ok
t/pmc/pmcmatrix2d.t ..................................... ok
t/pmc/nummatrix2d.t ..................................... ok
t/pmc/complexmatrix2d.t ................................. ok
t/methods/nummatrix2d/convert_to_complex_matrix.t ....... ok
t/methods/nummatrix2d/row_scale.t ....................... ok
t/methods/nummatrix2d/get_block.t ....................... ok
t/methods/nummatrix2d/mem_transpose.t ................... ok
t/methods/nummatrix2d/convert_to_number_matrix.t ........ ok
t/methods/nummatrix2d/initialize_from_args.t ............ ok
t/methods/nummatrix2d/row_swap.t ........................ ok
t/methods/nummatrix2d/iterate_function_inplace.t ........ ok
t/methods/nummatrix2d/fill.t ............................ ok
t/methods/nummatrix2d/initialize_from_array.t ........... ok
t/methods/nummatrix2d/iterate_function_external.t ....... ok
t/methods/nummatrix2d/transpose.t ....................... ok
t/methods/nummatrix2d/item_at.t ......................... ok
t/methods/nummatrix2d/gemm.t ............................ ok
t/methods/nummatrix2d/row_combine.t ..................... ok
t/methods/nummatrix2d/resize.t .......................... ok
t/methods/nummatrix2d/convert_to_pmc_matrix.t ........... ok
t/methods/nummatrix2d/set_block.t ....................... ok
t/methods/complexmatrix2d/convert_to_complex_matrix.t ... ok
t/methods/complexmatrix2d/row_scale.t ................... ok
t/methods/complexmatrix2d/get_block.t ................... ok
t/methods/complexmatrix2d/mem_transpose.t ............... ok
t/methods/complexmatrix2d/convert_to_number_matrix.t .... ok
t/methods/complexmatrix2d/initialize_from_args.t ........ ok
t/methods/complexmatrix2d/row_swap.t .................... ok
t/methods/complexmatrix2d/iterate_function_inplace.t .... ok
t/methods/complexmatrix2d/fill.t ........................ ok
t/methods/complexmatrix2d/initialize_from_array.t ....... ok
t/methods/complexmatrix2d/conjugate.t ................... ok
t/methods/complexmatrix2d/iterate_function_external.t ... ok
t/methods/complexmatrix2d/transpose.t ................... ok
t/methods/complexmatrix2d/item_at.t ..................... ok
t/methods/complexmatrix2d/gemm.t ........................ ok
t/methods/complexmatrix2d/row_combine.t ................. ok
t/methods/complexmatrix2d/resize.t ...................... ok
t/methods/complexmatrix2d/convert_to_pmc_matrix.t ....... ok
t/methods/complexmatrix2d/set_block.t ................... ok
t/methods/pmcmatrix2d/convert_to_complex_matrix.t ....... ok
t/methods/pmcmatrix2d/get_block.t ....................... ok
t/methods/pmcmatrix2d/mem_transpose.t ................... ok
t/methods/pmcmatrix2d/convert_to_number_matrix.t ........ ok
t/methods/pmcmatrix2d/initialize_from_args.t ............ ok
t/methods/pmcmatrix2d/iterate_function_inplace.t ........ ok
t/methods/pmcmatrix2d/fill.t ............................ ok
t/methods/pmcmatrix2d/initialize_from_array.t ........... ok
t/methods/pmcmatrix2d/iterate_function_external.t ....... ok
t/methods/pmcmatrix2d/transpose.t ....................... ok
t/methods/pmcmatrix2d/item_at.t ......................... ok
t/methods/pmcmatrix2d/resize.t .......................... ok
t/methods/pmcmatrix2d/convert_to_pmc_matrix.t ........... ok
t/methods/pmcmatrix2d/set_block.t ....................... ok
PASSED 356 tests in 55 files
</pre><br />
I refactored all the tests to delegate matrix creation and population to a series of type-specific factory objects. With these factories, creating tests for a new type goes extremely quickly:<br />
<br />
<ol><li>Write the common tests against a handful of methods and values supplied by the factory, instead of hard-coding type-specific values</li>
<li>Create a short stub test file for each matrix type, subclassing the common tests and adding the proper factory</li>
<li>There is no step 3.</li>
</ol>It's really become quite simple, and I'm much happier with the test suite now. Plus, I've added plenty of new features lately (complete with tests!), and I'm going to talk about those in a later post.<br />
<br />
<b>PLA Status Updates</b> (2010-08-16)<br />
<br />
On Friday night we dropped the kid off with his grandparents for a sleepover. With the apartment to ourselves, my wife and I did what we've been wanting to do for months: she went to bed early and I stayed up to program. I started making some real progress in Parrot-Linear-Algebra, and also uncovered some interesting bugs which needed to be fixed.<br />
<br />
I added a short file for adding PLA support to programs written in NQP. The file, <a href="http://github.com/Whiteknight/parrot-linear-algebra/blob/master/src/nqp/pla.nqp">pla.nqp</a>, can be included into an NQP program like this once it's installed:<br />
<pre>INIT { pir::load_bytecode("pla.pbc"); }
</pre>That file loads the PLA binary library, stores a global reference to the library, and registers the type names with P6metaclass. That done, you can start writing more natural-looking programs in NQP. I updated the <a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/5ec59719855c180505526d21148836a14bd6a461">Gaussian Elimination </a>example I wrote a few weeks ago to use this new bootstrap file (and some new methods I added), which means the example can now run without Kakapo or any other dependencies.<br />
<br />
Next up, I started prototyping bindings for the library for Rakudo. These are still preliminary, but I also have a working example of using numerical matrices in Rakudo. Within the next few days I will probably create a complete module for Rakudo (Math::LinearAlgebra, or something like that). I'll post more details as I do more work on it.<br />
<br />
I added a new item_at method which returns and optionally sets a value at given coordinates in a matrix. This method is common to all my core matrix types. The list of common methods for these matrix types is:<br />
<br />
<ul><li>resize() : Pre-allocate storage for the matrix, growing (never shrinking) it to hold a given size</li>
<li>fill() : Fill a matrix, or a region of a matrix, with a constant value, automatically resizing if necessary</li>
<li>transpose() : Transpose (swap rows with columns) the matrix lazily</li>
<li>mem_transpose() : Eagerly transpose the actual memory contents</li>
<li>iterate_function_inplace() : Execute a function for every element of the matrix, replacing that element with the function result</li>
<li>iterate_function_external() : Execute a function for every element of the matrix, creating a new matrix with the results. This is similar to Perl's map function.</li>
<li>initialize_from_array() : Insert values into the matrix from an array</li>
<li>initialize_from_args() : Similar to the _from_array variant, but initializes the matrix using elements from a slurpy argument list</li>
<li>get_block() : Return a block, or "submatrix" from the matrix</li>
<li>set_block() : Set a block in the matrix</li>
<li>item_at() : New this weekend, gets or sets a value in the matrix at the specified coordinates</li>
</ul>There are maybe a handful of other methods I would like to add, but in terms of the core types, this is the standard API (plus a series of VTABLE calls, which are standard). For mathematics types I also have GEMM (the BLAS-based matrix multiply operation), and elementary matrix operations (add rows, swap rows, scale rows). This is not a shabby interface at all, and can start to be used for some real-world applications.<br />
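To make the GEMM method's role concrete, here is a sketch of the standard BLAS GEMM contract (C &lt;- alpha*A*B + beta*C) that it wraps. This is plain illustrative Python with lists-of-lists standing in for NumMatrix2D; the function name and signature are stand-ins, not PLA's actual API:

```python
# Illustration only: the standard BLAS GEMM contract (C <- alpha*A*B + beta*C).
# Plain lists-of-lists stand in for PLA's NumMatrix2D; this is not PLA's API.
def gemm(alpha, A, B, beta, C):
    rows, inner, cols = len(A), len(B), len(B[0])
    assert len(A[0]) == inner and len(C) == rows and len(C[0]) == cols
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(A[i][k] * B[k][j] for k in range(inner))
            row.append(alpha * acc + beta * C[i][j])
        result.append(row)
    return result

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [0, 1]]
print(gemm(2.0, A, B, 1.0, C))  # [[39.0, 44.0], [86.0, 101.0]]
```

A real BLAS does this in optimized native code, of course; the point here is only the contract: the existing contents of C are scaled and folded into the scaled product.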
<br />
This weekend I also started seeing some very weird errors in the PLA test suite. Tests were running fine, but I was seeing exceptions (and, in one case, a segfault) occur after all tests had passed. This sounded very <a href="http://wknight8111.blogspot.com/2009/07/bugs.html">similar to another problem I've seen</a> in the past. Here's the test that set it off. Can you spot the problem?<br />
<pre>method test_METHOD_iterate_function_inplace_TRANSPOSE() {
    my $m := self.fancymatrix2x2();
    $m.transpose();
    my $n := self.matrix2x2(self.fancyvalue(0) * 2, self.fancyvalue(2) * 2,
                            self.fancyvalue(1) * 2, self.fancyvalue(3) * 2);
    my $sub := -> $matrix, $value, $x, $y {
        return ($value * 2);
    };
    $m.iterate_function_inplace($sub);
    assert_equal($m, $n, "external iteration does not respect transpose");
}
</pre><br />
What's maddening is that this test has been a problem for <i>months</i>, but never caused a failure. It was silently wrong, probably since the day I wrote it.<br />
<br />
See it yet?<br />
<br />
The problem is that return statement. What should happen is this: The pointy block creates an anonymous subroutine, which I pass to the iterate_function_inplace method. The method should loop over every element in our matrix and call that pointy block for each one. The result should be the same matrix with every element multiplied by two.<br />
<br />
What actually happens is this: the iterate_function_inplace method, in order to call the callback, must recurse on the C stack into a new runloop function. The pointy block executes in this <i>inferior runloop</i>. However, where things get weird is that return statement. In NQP, returns are performed by constructing and throwing control exceptions. In the case of the pointy block (and I'm not sure whether or not this is a bug), the return statement jumps to the return handler for the test_METHOD_iterate_function_inplace_TRANSPOSE() function, not the anonymous sub.<br />
<br />
The test sub exits after the callback has been called only once; it never hits the final assertion (which, it turns out, would have failed). Control flow continues inside the inner runloop until the end of the test program; then the C runloop function returns, tries to continue executing from that point, and things go to hell.<br />
<br />
The solution is really quite simple. Change this:<br />
<pre>my $sub := -> $matrix, $value, $x, $y {
    return ($value * 2);
};
</pre><br />
into this:<br />
<pre>my $sub := sub ($matrix, $value, $x, $y) {
    return ($value * 2);
};
</pre>Problem solved, and now more tests are legitimately passing.<br />
<br />
It's been a productive couple of days in PLA, and I'm hoping I can keep this momentum up in the days ahead. I need to finish implementing my GEMM wrapper for ComplexMatrix2D, and then get started on the Linear Algebra module for Rakudo.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com1tag:blogger.com,1999:blog-4146794174400139442.post-76554211442806789422010-08-12T07:50:00.000-04:002010-08-12T07:50:41.368-04:00Kakapo Fixed!! (Mostly)The lovable Austin Hastings has gotten Kakapo mostly working as of yesterday. It's not 100%, but I'm able to use it for some purposes such as getting the PLA test suite passing again.<br />
<br />
When I tried to build PLA last night, for the first time in a long time, the build failed. Some of the string handling functions I was using in the get_string VTABLEs were using old, deprecated functions in the string API. I took this as my opportunity to <a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/444238a4a1746b3f5291f8687a11aed6660888a4">rewrite a lot of the logic</a> to use the new <a href="http://trac.parrot.org/parrot/browser/trunk/src/pmc/stringbuilder.pmc">StringBuilder</a> type instead. This should improve performance and bring PLA up to some of the modern best-practices of the Parrot community.<br />
<br />
When I tried to run the tests, however, KABLAMO. Kakapo wasn't happy and the tests wouldn't run. I tracked down a small series of problems, and put together a hasty patch. Austin pointed out that this wasn't the real solution, so he didn't want me to push it to the master at Gitorious. Instead, <a href="http://github.com/Whiteknight/kakapo/commit/a249c7709ee3fc21bb1431faf24a9693ee0ca982">I pushed to my mirror at Github</a>. This fix out of the way, the vast majority of the PLA test suite runs, and most tests were passing. This morning, I pushed another patch to my Kakapo mirror. After that, I'm proud to say, the PLA test suite passes 100% of important tests. That means I can start making preparations for <a href="http://wknight8111.blogspot.com/2010/07/kakapo-and-parrot-linear-algebra.html">an official release</a>, and then begin development on cool new features.<br />
<br />
Some people have asked me why I've gone through all this hassle; if I had just rewritten my test suite without Kakapo, it could have been working a long time ago. The answer to that question is pretty simple: I want the abstraction and insulation that Kakapo provides.<br />
<br />
To look at it another way, consider this: NQP has a test suite of its own that needs to pass before it can make a release. If I'm using NQP, I know that the behavior NQP provides will be a little more stable than the underlying Parrot VM. However, NQP isn't a perfect overlay for Parrot: There are some features that Parrot provides that NQP doesn't cleanly wrap. To get an abstraction layer over those features, I need a framework like Kakapo. This is all not to mention that several other high-profile projects use NQP, so even if its test suite isn't all-encompassing, the test suites of projects which build on top of NQP probably cover more ground.<br />
<br />
Kakapo likewise has a test suite, and that needs to pass before Kakapo gets released. If I'm using Kakapo, I can be certain that the API that it and NQP provide together should be relatively stable and are tested to work before they are released. Since we are talking about <a href="http://wknight8111.blogspot.com/2010/06/parrot-linear-algebra-change-of-course.html">tracking Parrot trunk</a> with PLA development, this stability is very welcome.<br />
<br />
What we've run into recently is a rare confluence of events where highly disruptive changes in Parrot caused major breakages in Kakapo. When Kakapo got fixed, the PLA test suite ran perfectly without any changes to the suite itself. Considering the magnitude of changes to Parrot recently, the fact that I could keep running the test suite without alterations is a pretty big deal. Sure there was some down-time, but once our dependencies were fixed, our stuff worked again like magic.<br />
<br />
Kakapo isn't 100% yet, but Austin is still working on it and I'll be lending a hand. I'm also going to start making some progress in PLA, and see what I can do to help out with some of the performance improvements and other big tasks that are going on in Parrot too. Expect to see many more updates on things in the coming days and weeks.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-28144494598573512010-07-28T18:00:00.000-04:002010-07-28T18:00:01.804-04:00Kakapo and Parrot-Linear-Algebra<a href="http://www.parrot.org/news/2010/Parrot-2.6.0">Parrot's 2.6 release is out</a> and now everybody is holding their breath in anticipation of the new <a href="http://github.com/rakudo/star">Rakudo Star </a>release tomorrow. I'm still fighting with my own release project: <a href="http://www.gitorious.com/kakapo/kakapo">Kakapo</a>. Kakapo got broken pretty severely when Parrot was changed to <a href="http://trac.parrot.org/parrot/ticket/389">no longer store methods in namespaces</a>, and in several weeks I have been unable to make it work again. Austin has been busy with other projects during this time too, so it's been me by my lonesome to try and get this thing working again.<br />
<br />
The problems are three-fold: First, Austin's plan for Kakapo was ambitious and the framework does a hell of a lot of things, not all of which I fully understand. Second, NQP and P6object have progressed in some significant ways since Kakapo was originally inked out, and much of the workaround code that Austin wrote to bypass some missing features no longer seems necessary. Third, we have this issue with methods not working nicely with NameSpaces anymore, which makes all of Kakapo's method-injection logic break. Without method injection happening, and happening early enough, all the code in Kakapo which relies on the new methods is broken.<br />
<br />
Tene put together a <a href="http://github.com/perl6/nqp-rx/commit/96bc19114a3962c356304c11b20768ef11fe91ba">patch for NQP</a> which allows methods marked "our" to be stored in NameSpaces again. This should simplify some of the new logic in Kakapo though I've been too busy in the past few days to test it. Hopefully this patch of his represents a large portion of the fix to get Kakapo working again. I doubt it will be everything that's necessary, but it will be a nice start.<br />
<br />
Without working Kakapo I haven't done any work whatsoever on <a href="http://github.com/Whiteknight/parrot-linear-algebra">Parrot-Linear-Algebra</a>. PLA uses Kakapo to implement all its unit tests, and I'm happy enough with the new testing framework that I do not want to move it to use anything besides Kakapo. In the worst case I would consider trying to break out a testing sub-framework from Kakapo if we can't fix the entire framework, but I haven't gotten to that point yet and likely won't have the time for it in the near future. I would much rather fix the framework as a whole first, and then maybe talk about splitting out the unit test stuff into a separate sub-library later, if I still feel like that's a beneficial path to take.<br />
<br />
What I would really like to be doing is getting back into PLA development. I had hoped to get some wrappers together to make it work with Rakudo Star, but without working tests I haven't cut a suitable release and without a release it won't be included in the Rakudo Star bundle anyway. <br />
<br />
My hope is that within the next few days I can get Kakapo working again. At least I need it working enough that I can get the unit tests for PLA going. With Tene's fix, some hard work, and some trial-and-error I think it will be possible.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-15024401534934977132010-07-10T18:30:00.001-04:002010-07-10T18:42:09.228-04:00Lorito: A DesignCotto has been putting together a really great <a href="http://trac.parrot.org/parrot/wiki/LoritoRoadmap">checklist for Lorito</a>, the new microcoding approach that the Parrot developers are hoping to implement soon. The first step on the checklist, creating a proper compiler for our ops code <a href="http://en.wikipedia.org/wiki/Domain_Specific_Language">DSL</a>, is already complete. Today at #parrotsketch <a href="http://irclog.perlgeek.de/parrotsketch/2010-07-06#i_2523352">Allison also mentioned some things </a>about Lorito, and how we should start focusing more effort on it. I, for one, am very happy about that.<br />
<br />
In this post I'm going to play architect and spell out a vision for Lorito. At the very least, this will be a good play-exercise. I don't expect everybody to like all the things I say here, but I do expect this post to generate some thought and discussion.<br />
<br />
On the ground floor, Lorito needs to be able to do what C code does now. Actually, it really needs to be able to do much of what the underlying hardware machine does now. C is just a useful abstraction over the hardware, one that people are familiar with. What we don't want to be doing is creating our own <a href="http://en.wikipedia.org/wiki/Application_binary_interface">ABI</a>, or creating new low-level calling conventions or anything like that. Those kinds of problems are already resolved and if we want to be able to tap into the myriad of existing libraries we will want to stay compatible with existing conventions.<br />
<br />
We want Lorito to interact with raw integers and pointers, including pointer dereferences and pointer arithmetic. This is important so that we can easily work with C-level arrays and pointers. I can't think of a modern architecture where pointers and integers are different sizes and are treated differently, but I can't say I'm familiar with every architecture in current use. I don't foresee any huge problems with creating an opset that works transparently with integers and pointers. We also want Lorito to be able to call <a href="http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl">C-level functions</a>, including indirect calls through a function pointer. There's no hard requirement yet stated that Lorito needs to be binary compatible with compiled C code, but I think we can all agree that such would be a major benefit. In fact, if we had a utility that could compile Lorito directly into C code, that would be the best intermediate step we could take. As I will discuss next, such a tool would make the conversion of Parrot to using Lorito easier in the long run. We probably don't want Lorito-to-C conversion to be a permanent part of our build, but it does get us moving now.<br />
<br />
The PASM ops are currently written in C. If Lorito does everything that C can do, eventually they could all be rewritten in Lorito instead. That raises the interesting idea that groups of Lorito ops can be composed into more complex higher-level ops. PASM then becomes little more than a set of predefined convenience compositions, though it certainly does not represent the complete set of possible compositions. Dynops would just be custom predefined Lorito compositions and, since they wouldn't be defined in machine-code libraries, their definitions could be included directly in bytecode for later use.<br />
<br />
In fact, while we are moving down this thought path, we get to the idea that we could identify repeated patterns of Lorito ops at compile time, and define custom compositions on the fly. This allows us to compress packfiles for size without really adding all the overhead of general-purpose compression algorithms. When optimizing code, if we can apply an optimization to any composed op, even those identified on the fly, those optimizations can be immediately used anywhere the composition op is. This allows us to prune branches out of the parse tree when optimizing quickly.<br />
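As a sketch of what on-the-fly composition might look like, here is a toy Python pass that counts repeated pairs of ops in a stream and replaces each repeated pair with a synthetic composite op. The op names and the pair-based heuristic are invented for illustration; a real packfile compressor would be considerably smarter:

```python
# Toy sketch: detect repeated op pairs and fold them into composite ops.
# Op names and the pair heuristic are invented; this is not Lorito code.
from collections import Counter

def find_repeated_pairs(ops, min_count=2):
    pairs = Counter(zip(ops, ops[1:]))
    return [pair for pair, n in pairs.items() if n >= min_count]

def compress(ops):
    composites = {}
    for pair in find_repeated_pairs(ops):
        name = "composite_%d" % len(composites)
        composites[name] = pair
        out, i = [], 0
        while i < len(ops):
            if tuple(ops[i:i + 2]) == pair:
                out.append(name)   # replace the pair with one composite op
                i += 2
            else:
                out.append(ops[i])
                i += 1
        ops = out
    return ops, composites

stream = ["load", "deref", "add", "load", "deref", "mul"]
print(compress(stream))
# (['composite_0', 'add', 'composite_0', 'mul'], {'composite_0': ('load', 'deref')})
```

The composite table would travel with the bytecode, so an optimization applied to `composite_0` takes effect everywhere that composition occurs.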
<br />
Any language for which a compiler exists to convert it to Lorito can be considered an "overlay" language for Lorito. We can then use any overlay language any place where we can use Lorito. For instance, if NQP is converted to output Lorito (and composition ops) directly, we can then use NQP to define all the PASM ops, all the core PMCs, and several parts of Parrot's core. That would be quite the turnaround from how NQP is used currently.<br />
<br />
Conversely, almost any C code can be rewritten in Lorito, so long as we have a step in the build to preprocess Lorito back into valid C code. In fact, I see no reason why we cannot interleave code written in both languages, in a manner analogous to how the Q:PIR construct allows PIR code to be written in NQP source. If we had a preprocessor that scanned code files and converted Lorito ops into equivalent C code line-by-line, we could start slowly rewriting much of Parrot, if not all of it, in Lorito. Then, once we have converted all the code over that we need, we could change the build process to convert those code files to bytecode instead of machine code, and run large parts of Parrot's code through the runcore.<br />
<br />
Alternatively, instead of writing core code in Lorito directly, we could write core code in any Lorito overlay language. NQP comes to mind here. <br />
<br />
The huge benefit to be had is that if user programs are written in a Lorito overlay, and Parrot itself is written primarily in Lorito or an overlay, our eventual JIT can record traces across the API boundary, and optimize hot sections of the Parrot core on the fly at runtime. <br />
<br />
Here ends the part of the blog post concerned with overarching architecture discussions. Let's get down to the nitty-gritty details.<br />
<br />
Lorito is going to want some basic arithmetic operations: addition, subtraction, multiplication and division on integers (pointers) and floating point numbers. We're also going to want modulus, address-of (C &) and pointer dereference (C unary *) for integer/pointer types, int-to-float and float-to-int cast operations. Bitwise operations are probably essential as well: and, or, xor, not. Logical inversion (C unary !) would probably be necessary too. Assuming we can find a way to share ops for int/pointer and float operands (which I doubt we can do in an elegant way), that's still about 20 ops we're going to need just to perform basic manipulations on numbers. I can't imagine any modern, robust, mature virtual machine which doesn't perform these operations readily.<br />
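For a rough tally, here is that minimal opset written out (these mnemonics are hypothetical, chosen only to count the categories above; they are not an actual Lorito op list):

```python
# Hypothetical mnemonics, used only to tally the opset described above.
int_ptr_ops = ["add_i", "sub_i", "mul_i", "div_i", "mod_i",   # arithmetic
               "addr_of", "deref"]                            # pointer ops
float_ops = ["add_n", "sub_n", "mul_n", "div_n"]              # floating point
cast_ops = ["i_to_n", "n_to_i"]                               # int <-> float
bit_ops = ["band", "bor", "bxor", "bnot"]                     # bitwise
logic_ops = ["not"]                                           # C unary !
minimal = int_ptr_ops + float_ops + cast_ops + bit_ops + logic_ops
print(len(minimal))  # 18 -- "about 20", before sharing int/float variants
```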
<br />
On top of basic mathematical operations, we're going to need some important operations for calling subroutines: Passing and retrieving arguments, calling subroutines, returning with values. I don't know how low-level Lorito intends to get, but let's face the reality of the world: Hardware machines are stack-based. Every library function we want to call is probably going to be passing arguments on the system stack. If we want to be defiant and call these ops "pass_arg" and "retrieve_arg" to hide the fact that Parrot is using the system stack, that's fine by me. The "stacks are evil" mantra, while occasionally misguided, is definitely firmly ingrained in Parrot culture.<br />
<br />
In a minimalist world, the only ops we would really need are ops to call functions with passed arguments. We could implement every other op as a huge library of functions to call. Of course this would be extremely sub-optimal, but it does open a new possible avenue: Any operation which is sufficiently uncommon could be turned into a function call instead of having a dedicated op. We could encapsulate the call in a composite op. There are lots of options here; we have to weigh the desire to have a smaller op set against the need to have ready access to a myriad of low-level operations.<br />
<br />
Having ops for basic operations is necessary but not sufficient. I don't think we can rest on our laurels and be happy with an op set that only does a subset of what C can do. That's worthless. Parrot needs to provide access to its PMC and STRING types, and also to its interpreter. We want ops to load registers with values from the constants table in the bytecode file. We want to treat PMCs and STRINGs as opaque pointers at the user-level. They aren't like other pointers which can be manipulated; they are GCable objects that Parrot treats specially. We don't, for instance, want to be passing a PMC pointer willy-nilly to an arithmetic operation. We also don't want such a mistake to imply a call to one of the addition VTABLEs.<br />
<br />
That said, we want to be able to get the VTABLE structure from a PMC, introspect information about that, and be able to call the various VTABLE interface functions there without having to calculate pointer offsets. This might make a good use of composite ops, where we can still do the pointer derefs in Lorito, but hide the nonsense behind some PASM-level composite ops. We already have most of these ops, but I think we're going to want to rework some of them. We definitely need to radically reduce the number of VTABLE interface functions, or else interacting with all of them from Lorito will be extremely ungainly.<br />
<br />
So far as we are talking about a major redesign of the system, maybe it's time to start considering an idea chromatic has been talking about with making all vtables into methods so we can share a common lookup mechanism, common dispatch mechanism, common MMD behaviors, common inheritance behaviors, etc. That's not a bad idea, but we either need dramatic improvements to PCC performance, or we need to create a PCC fast-path for calls which are made without constructing a call object and without doing too many marshalling operations to get arguments where they need to be.<br />
<br />
In fact, on a tangentially-related topic, I definitely think PCC should have a fast-path for functions which do not participate in MMD and which have fixed argument lists containing only positional args (no :optional, no :slurpy, no :named, no :call_sig, etc). This is, I think, an extremely common case and there are plenty of optimizations to be had here if we want them.<br />
<br />
To round out the low-level opset, we need ops to move data between registers, between registers and memory, and maybe even ops to move data between memory locations directly, without moving them to a register first.<br />
<br />
<br />
That's my conception for Lorito: probably 64 ops or fewer, a capability to add composite ops, treating PASM as a series of common composite ops, and the ability to write low-level code interchangeably in C or Lorito. I think this plan for it provides a great development path without too many abrupt stops or course changes. We are quickly going to find ourselves in desperate need of a robust, modern JIT. We are going to be able to move to a proper precise GC since all the Lorito code will be using Parrot's registers to store working values, not relying on stack space which will need to be traced.<br />
<br />
I am highly interested in hearing what other people have to say about Lorito, and what changes Parrot should be considering so long as we are writing this blank check for a massive refactor.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com5tag:blogger.com,1999:blog-4146794174400139442.post-1646333296314554072010-06-23T21:10:00.000-04:002010-06-23T21:10:10.422-04:00Parrot-Linear-Algebra: Change of CourseI've added a lot of functionality to PLA in the last few days, including a large boost to the test suite this evening. Since Kakapo doesn't have a release that targets Parrot 2.3.0, I decided to refocus my efforts on Parrot 2.6.0. However, with all the changes in Parrot since 2.3.0 came out, the upgrade path was much harder than I anticipated. Luckily I have some superstars like <a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/1f0d9f839cf3060fe5b4befe98206401c29cc507">darbelo and NotFound</a> on my side, otherwise I would still be fighting with things.<br />
<br />
After talking to darbelo I've decided PLA is not going to attempt to track supported releases, at least not for normal development. It's too much of a pain in the ass, and nobody else is doing it. Sure it's best practice, but it's the kind of thing the whole ecosystem really needs to do together. If the HLL compiler is tracking Parrot HEAD because of a desperate need of new features and improved performance, then it doesn't make any sense for libraries not to be keeping up pace.<br />
<br />
Darbelo made one particularly good point:<br />
<br />
<blockquote>If we track HEAD then we'll run on 2.6 when it's released. We branch on the release day and do whatever 'release engineering' we need to do on the branch and tag a release when we're done.</blockquote><br />
That's it, in a nutshell: If we track trunk closely enough, we will always be ready with a release following Parrot's release. Spend a few days to prepare the necessary accouterments, then cut a tag and continue on with normal development. Our releases can be tied to Parrot's supported releases, but our ongoing development will target trunk as closely as possible. I don't anticipate that we will want to cut a PLA release every time Parrot does, but if we follow a 4-month schedule and follow supported releases that should be just fine. At least, fine for now.<br />
<br />
PLA is going to track Parrot trunk and hopefully be prepared to cut a release following Parrot 2.6.0. That gives us about 4 weeks to get all our ducks in a row (and our Kakapos too!).Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-54125297279129859222010-06-22T17:00:00.001-04:002010-06-22T17:00:00.654-04:00BLAS, LAPACK, and Parrot-Linear-AlgebraLast night I got back to work on Parrot-Linear-Algebra, my matrix package for Parrot. I'm really starting to get happy with the functionality it provides and am looking to start moving the project to the next level of functionality and usability.<br />
<br />
I hadn't touched it in a while, for a variety of reasons. Austin Hastings has been very busy with other projects and hasn't cut an "official" release of <a href="http://www.gitorious.com/kakapo/kakapo">Kakapo</a> that works with the last supported Parrot release, 2.3. I've applied some hotfixes that make Kakapo build and pass tests on 2.3, but that's not quite good enough. When I make a PLA release I don't want "clone the Kakapo repository from Gitorious" in the instructions, because then the instructions get out of date when Kakapo is updated to be incompatible. What I want is to say "Download Kakapo version X from this URL", which will be invariant. <br />
<br />
Last night I added some prototype new features to the NumMatrix2D PMC, the basic numeric matrix type. I'm going to mirror the majority of those changes to the two other general-purpose types; ComplexMatrix2D and PMCMatrix2D. I <a href="http://github.com/Whiteknight/parrot-linear-algebra/blob/471954c9223f57cdedc4aa48cab5b68a908ef58c/src/pmc/nummatrix2d.pmc#L1025">added methods</a> to do basic <a href="http://en.wikipedia.org/wiki/Elementary_row_operations">elementary row operations</a> (swap two rows, add a multiple of one row to another, and multiply the contents of a row by a non-zero constant), and used those operations to put together an example program to <a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/471954c9223f57cdedc4aa48cab5b68a908ef58c#diff-0">calculate both the Row Echelon and Reduced Row Echelon</a> forms of a matrix. Those methods, combined with block manipulation methods I added previously and the new <a href="http://en.wikipedia.org/wiki/GEMM">GEMM</a> method I added<a href="http://github.com/Whiteknight/parrot-linear-algebra/commit/c84ff7596f5dc2bae4d1bdf5bb5a8124ea913cbc"> last night as well</a>, create a basis for us to create a comprehensive library of linear algebra routines.<br />
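For readers who want to see the shape of those operations, here is an illustrative Python version of the three elementary row operations and a reduced-row-echelon pass built on them. The names and the list-of-lists representation are stand-ins for the NumMatrix2D methods, not PLA's actual code:

```python
# Stand-ins for NumMatrix2D's elementary row operations, plus a
# reduced-row-echelon pass built on them. Illustrative only.
def swap_rows(m, a, b):
    m[a], m[b] = m[b], m[a]

def scale_row(m, r, k):
    assert k != 0  # scaling by zero is not an elementary operation
    m[r] = [k * x for x in m[r]]

def add_scaled_row(m, src, k, dest):
    m[dest] = [d + k * s for d, s in zip(m[dest], m[src])]

def rref(m):
    rows, cols = len(m), len(m[0])
    lead = 0
    for r in range(rows):
        if lead >= cols:
            return m
        i = r
        while m[i][lead] == 0:         # find a usable pivot row
            i += 1
            if i == rows:
                i, lead = r, lead + 1
                if lead == cols:
                    return m
        swap_rows(m, i, r)
        scale_row(m, r, 1.0 / m[r][lead])
        for j in range(rows):          # clear the pivot column elsewhere
            if j != r:
                add_scaled_row(m, r, -m[j][lead], j)
        lead += 1
    return m

print(rref([[2.0, 4.0], [1.0, 3.0]]))  # [[1.0, 0.0], [0.0, 1.0]]
```

The Row Echelon variant is the same pass without the backward elimination over rows above the pivot.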
<br />
But that brings me to my next concern: How to implement the remainder of the fundamental algorithms a linear algebra package is going to want to provide? Obviously I would love to wrap the LAPACK library up and call the routines it provides directly. LAPACK provides a huge number of routines for doing things like singular value decomposition, QR, LU, and other types of decomposition, solving the eigenvalue problem, solving systems of linear equations, etc. In fact, LAPACK provides almost all of the functionality I would want PLA to have in the next few months.<br />
<br />
The problem, however, is that LAPACK is not nearly as accessible from C code as I would like. In fact, there is no "standard" for C bindings to the library, and several lame attempts are available that tend to be incompatible with each other. The reference version, and only reliably-available version, of LAPACK is written in FORTRAN. The standard CLAPACK library is just a machine-led translation of the FORTRAN source code, with a few points in the code needing to be manually tweaked after conversion. It has a few problems, including the fact that every single function parameter (even basic ints and floats) must be passed by reference. The ATLAS library, which PLA currently prefers for its BLAS bindings, provides some LAPACK C bindings of its own, but only a very small subset of all LAPACK functions are provided, and the ones it does have hardly support all the operations PLA is going to want to provide.<br />
<br />
CLAPACK, being more or less the standard, could be made to work, except that it doesn't come as any sort of user-friendly package. There are no CLAPACK packages for Debian (Ubuntu) or RedHat that I have seen, and that raises the barrier to entry significantly, something that I want to avoid.<br />
<br />
I could use the LAPACK library directly, and<a href="http://www.yolinux.com/TUTORIALS/LinuxTutorialMixingFortranAndC.html"> twist my code to match the FORTRAN calling conventions</a> and data alignments. That's not unthinkable, though it would require some more work on the part of PLA developers than any other solution.<br />
<br />
I could skip LAPACK entirely, and instead rely on something like GSL with proper C bindings built-in. GSL does provide <a href="http://www.gnu.org/software/gsl/manual/html_node/Linear-Algebra.html">several important matrix operations and decompositions</a>, though I would need to do more research into the capabilities of that library. What I don't want is to lose focus and have this project grow to try and become a general wrapper for all of GSL. I want PLA to stay focused on Linear Algebra only. We could maybe create sister projects to encapsulate more of the GSL functionality and use PLA under the hood to implement the basic data structures, of course.<br />
<br />
Maybe we will get lucky, and the new NCI system will have support for calling FORTRAN routines from shared libraries. I don't think we can just anticipate this kind of functionality, at least not during the GSoC program. <br />
<br />
A "nuclear" option, perhaps, would be to not rely on any external library for these things and instead brew up all the basics myself. I'm not against such work <i>per se</i>, but it would be a huge investment in time and the case cannot be made that it's the best use of my limited development time. I did put together a Gauss-Jordan elimination routine last night, it wouldn't be too too much effort to put together a generalized QR decomposition algorithm and a singular-value decomposition algorithm, followed by routines to calculate eigenvalues, eigenvectors, and matrix inverses from those things. If PLA had a larger team of active developers who wanted to participate in this kind of work it would be an easier decision to make, but if it's primarily me and a handful of occasional-contributors, this really isn't a doable task.<br />
<br />
My plan for PLA in the near future is this: After the 2.6 release I want to push for a stable release of Kakapo, and then using that I want to cut a release of PLA. From that point forward, PLA will target the 2.6 version of Parrot at least until 2.9 and maybe later. The first release of PLA is going to provide three basic matrix types: A 2D matrix of floats, a 2D matrix of complex numbers, and a 2D matrix of PMCs. These three matrix types will have a large base of common functionality and each type will have some additional functionality too, as required by that type. Everything provided will be well-tested (including error cases, which I haven't exercised nearly enough so far) and some example programs will be provided in both PIR and NQP.<br />
<br />
There are a few projects I am envisioning to start in the future that will rely on PLA, so I really am hoping to create a nice, stable, functional release within the next few months. I'll post more information about any other projects as they arise.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-73028383594713480572010-06-19T09:24:00.000-04:002010-06-19T09:24:00.357-04:00Parrot's Deprecation PolicyParrot user kthakore sent a <a href="http://lists.parrot.org/pipermail/parrot-dev/2010-June/004423.html">very interesting email</a> to the Parrot developers list this week to criticize the current deprecation policy. After taking some time off from development, he returned to find that his extension no longer built on Parrot HEAD and he couldn't quite figure out why. There <a href="http://trac.parrot.org/parrot/changeset/47190/trunk/DEPRECATED.pod">was a deprecation notice for the feature that changed</a>, but the notice was so vaguely worded and so short on explanation that it offered very little help to the beleaguered extension developer.<br />
<br />
When we're talking about code, there are only a handful of things that we really need to worry about: utility, performance, security, stability and maintainability. Well-written code, for the most part, can satisfy all these requirements. Sometimes trade-offs are made, such as trading a certain amount of maintainability to optimize for performance, but in those cases some well-placed comments can help alleviate or minimize any regressions. Code, while technical and deep, is often pretty easy: we follow rules and policies, make changes, measure results, wash, rinse, repeat.<br />
<br />
Not so easy are the softer sides of open source software: the people. People come in varieties: core developers, extension developers, testers, documenters, end-users and well-wishers of varying levels of technical competency. Keeping all these groups of people working together nicely and happily can be quite a difficult challenge, and there is likely no way for all groups to be 100% happy 100% of the time. The tradeoffs here are harder to understand, harder to manage, and a misstep can have huge negative consequences for the project.<br />
<br />
I've long been a detractor of Parrot's <a href="http://trac.parrot.org/parrot/browser/trunk/docs/project/support_policy.pod">current deprecation policy</a>: It's too rigid, too narrow, and doesn't really do anything to help the people who need helping. It also doesn't really take into account Parrot's current stage in the development life cycle. Parrot, as I've mentioned on occasion, has <a href="http://wknight8111.blogspot.com/2010/05/bright-blue-yonder.html">plenty of warts</a> and has suffered many <a href="http://wknight8111.blogspot.com/2010/05/fitness-of-parrot-as-target-platform.html">growing pains</a>. It is folly to think that we should be making blanket guarantees about what <a href="http://wknight8111.blogspot.com/2010/01/parrot-20-personal-retrospective.html">will or will not be present in various releases</a>, or how quickly we can or cannot make changes.<br />
<br />
When there's a problem or a bug or a misfeature, people want those things fixed quickly. A good example of this were the <a href="http://wknight8111.blogspot.com/2009/10/pcc-branch-lands.html">PCC refactors a few months ago</a>. Even though those refactors created some backwards-incompatible behavior our users (who the deprecation policy was at least nominally designed to protect) were trying to <a href="http://wknight8111.blogspot.com/2009/10/pcc-hackathon-day.html">rush them through</a>. Rakudo developers specifically were blocking on the PCC improvements and having to wait for months and months would have been bad for them. The PCC refactors were high priority and high need.<br />
<br />
The deprecation of several core ops and their conversion to dynops recently is an example from the other end of the spectrum (and the source of kthakore's frustration). While we followed the letter of the deprecation policy, these things didn't <i>need</i> to be removed with haste, and the removal created a bit of hassle for users who weren't expecting it. I'm not saying they shouldn't have been removed (I'm always <a href="http://wknight8111.blogspot.com/2010/03/lean-and-mean-parrot.html">a proponent of removing cruft</a>), but it does expose some shortcomings of our deprecation policy and process.<br />
<br />
What we have is a series of conflicting motivations, even for individual groups. Consider:<br />
<ul><li><b>Core Developers</b>: Core developers want to remove bad features, want not to maintain bad features, want to add new good features and want to create the best software for the users. Developers need to work on fun new features, but also need their work to be used and appreciated. Take away either of those pieces, and many volunteer developers will simply walk away. Moving too quickly alienates the users and creates a huge disconnect between what the developers are developing and what the users are actually using. Moving too slowly is boring and developers start to leave for greener pastures.</li>
<li><b>Extension Developers</b>: Want to add new extensions to the Parrot ecosystem for the benefit of users, but have to deal with binary compatibility at the C level, which changes much more frequently than the "normal" user-visible interfaces like PIR. Parrot has a lousy extension API right now, so necessary improvements there require extension developers to stay up-to-date with current core development. At the same time, all sorts of other changes break things in new versions, even once the features they need have been fixed. </li>
<li><b>Users</b>: For stability, it's good for users not to upgrade. To fix bugs and get new features, it's good to upgrade. Upgrading brings rewards but also hassles: Core features disappear, new bugs are added, and extensions are broken by binary incompatibilities. Upgrading Parrot means needing to upgrade (or, at least, rebuild) extensions too.</li>
</ul>It's difficult to tell an extension developer to stick to the supported releases because the extending API is so lousy and incomplete. Having to wait 3 months or more for a necessary change is hard for these small projects, and their developers can quickly lose interest and motivation. Until we reach some basic level of usability, we have to expect that extension developers are going to be tracking Parrot HEAD more or less closely. I think it's a little disingenuous to simultaneously expect fully that developers will be tracking the repository HEAD but also write in our deprecation policy that they should only track the stable releases, and you really need to ask who exactly that policy is designed to protect in this case.<br />
<br />
We really need to account for different levels of user and different levels of feature. End-users shouldn't be using Parrot directly. Joe Schmoe at home is not and definitely <a href="http://wknight8111.blogspot.com/2010/01/problem-with-pir.html">should not be writing his tools in PIR</a>. If he's writing his code in a higher-level language like Rakudo Perl 6, NQP, or Winxed, he's buffered from disruptive changes made to the Parrot core VM. It's the developers of HLL compilers and extensions that need to worry about these kinds of changes.<br />
<br />
Likewise, we need to differentiate between issues of multiple severities. When a big issue is blocking development in extensions and HLL compilers, it behooves Parrot to ignore the mandatory wait period and to make those fixes post haste. Alternatively, a change which is not necessary and would cause a block or a slow-down for extension developers should be put off for longer and made with more care.<br />
<br />
What we need, in a nutshell, is a policy that actually does what we claim the current deprecation policy does now: <b>Protect our users from disruptive changes</b>, but also <b>enable forward progress to be made without being forever tied to every bad decision ever made</b>. I suggest these changes to policy and process:<br />
<br />
<ol><li>Deprecation should be added before a disruptive change is made </li>
<li>The deadline for the deprecation shouldn't be blindly tied to the next stable release, but <b>intelligently selected with input from the affected parties</b>. We do have weekly planning meetings where these kinds of things can be decided. If we need to regularly schedule additional meetings with other parties (HLL compiler devs, extension devs, etc) we should do that as well.</li>
<li>Deprecations should be well-documented and publicized. Information about the deprecation should include what exactly is changing, how users of those features can work around the changes, and who to contact when/if problems arise.</li>
<li>Information about the deprecation should be sent to the users, not just dumped in DEPRECATED.pod where we expect people to be looking regularly. An email list for this purpose was suggested and I like that idea (other ideas were also suggested that I also like). Any way to send the information directly to the user is a good thing. </li>
<li>Where possible, both old and new versions should <a href="http://lists.parrot.org/pipermail/parrot-dev/2010-June/004430.html">be provided simultaneously</a> for a period of time while users transition. This is most important in the C-level API where function wrappers can easily be provided to translate old calls into new ones.</li>
</ol>I'm still a big proponent of the idea that the deprecation policy should be opt-in, in the sense that only features that we've put a stamp of approval onto should be covered under the deprecation policy and anything else should not be relied upon. You <i>can</i> use a feature that we haven't approved, but then you're responsible for paying the price when that feature changes or disappears completely. You would also be responsible for sending the core developers feedback about which features you would like to see be added and approved.<br />
<br />
Having a deprecation policy is a good thing and a necessary part of a mature software project. However, the policy we have currently fails on a number of counts and requires some serious re-thinking if we want to make it better. I sincerely hope, and I know several other people also hope, that we do make it much better in the future.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0tag:blogger.com,1999:blog-4146794174400139442.post-84486037285644442922010-06-18T09:54:00.000-04:002010-06-18T09:54:28.732-04:00ParrotTheory: Locks and SynchronizationI've talked a good amount in the past and recently about threading, so today I'm going to talk about some of the small bits that make threading and concurrency work. I'm thinking that we're also going to need to implement some of these primitives in Parrot eventually, so consider this blog post an extremely verbose way of planning for that.<br />
<br />
Let's start the discussion by looking at two small code snippets; a structure definition and a routine that uses that structure:<br />
<pre>typedef struct _array {
    int length;
    int *values;
} array;

void push(array *ary, int x) {
    ary->length++;
    ary->values = realloc(ary->values, ary->length * sizeof(int));
    ary->values[ary->length - 1] = x;
}
</pre>This isn't nearly as contrived an example as it may seem initially, though I purposefully made the code a little bit naive. It's worth noting here that the idea of a structure which contains both an array and the integer length of that array is very common. It's how Parrot implements its STRING type and several of its array PMC types as well. In fact, the push_int VTABLE method for the ResizableIntegerArray PMC probably looks extremely similar to this example code.<br />
<br />
Astute observers, and veterans of threading trench warfare, will see a problem with this code: there are no locks and no concurrency safeguards, so this code isn't thread-safe. Let's walk through this code with a preemptive thread switch in the middle, where another thread attempts to call this same method on the same object:<br />
<ol><li>We have an array object with length 5 and items {1, 2, 3, 4, 5}</li>
<li>Thread A attempts to push the number 6 to the array, Thread B attempts to push the number 7.</li>
<li>Thread A enters the function. We set length to 6 and realloc to gain a sixth slot. The array now contains values {1, 2, 3, 4, 5, ?}, because the last value isn't initialized by realloc.</li>
<li>At this point, there is a preemptive thread switch and Thread B takes over control.</li>
<li>Thread B enters the function and sets length to 7 and reallocs again. The array now contains values {1, 2, 3, 4, 5, ?, ?}.</li>
<li>Thread B sets the sixth element to 7. The array is now {1, 2, 3, 4, 5, ?, 7}</li>
<li>Thread B gets preempted, Thread A takes over.</li>
<li>In thread A, the value of length is still 7, so [ary->length - 1] is still 6. When we add x to the array, we now have {1, 2, 3, 4, 5, ?, 6}</li>
</ol> There are some things we could try here, such as saving the values we need from the structure to local variables, so that even if we preempt in the middle of the functions the length field will be an accurate index. Or, we can try to rearrange the order of operations so some errors appear less frequently or less obviously. The real problem here is that we have three operations that need to stay coupled: Increasing the known length of the array, reallocating array storage, and assigning an item to that storage. Any break between a set of coupled operations causes a problem. We call these areas of code where operations are sensitive to cross-thread interference <b>critical sections</b>.<br />
<br />
As another example of a very small critical section, consider this one line of code:<br />
<pre>foo = i++;
</pre>This seems pretty simple, but in fact it is not. This line of code requires several assembly language instructions, especially when we are talking about a non-optimized build:<br />
<ol><li>Fetch the value of i from memory into a processor register</li>
<li>Copy the value of i to the variable foo</li>
<li>Increment the register containing the value for i</li>
<li>Copy the value of the register back to memory where i is located.</li>
</ol>A preemptive thread switch can happen in between any of these steps. Consider the case where we break between steps 2 and 3 and switch to another thread performing the same operation: If i is 5, Thread 1's foo is 5, Thread 2's foo is 5, and at the end of the code snippet i is only 6 instead of 7! If this code snippet is, for instance, in an SQL database generating unique integer keys for rows in a particular table, we've just generated non-unique keys and created a world of hurt for ourselves.<br />
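The race above can be closed by making the read-modify-write a single indivisible step. Here is a minimal sketch using C11 atomics (assuming a C11 compiler with stdatomic.h); atomic_fetch_add returns the old value, just like i++ does:<br />

```c
#include <stdatomic.h>

atomic_int i = 5;

/* atomic_fetch_add performs the fetch, the add, and the store as one
   indivisible operation, so a preemptive thread switch can never land
   in the middle and hand two threads the same key. */
int next_key(void) {
    return atomic_fetch_add(&i, 1);   /* returns the old value, like i++ */
}
```

With this, two threads calling next_key concurrently are guaranteed distinct keys, which is exactly the property the SQL example needs.<br />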
<br />
To get around these kinds of problems, one solution is to use a <b>synchronization primitive</b>. A synchronization primitive is any of a class of algorithms and objects that are designed to synchronize and limit access to shared resources. In this sense, a <b>resource</b> is anything that multiple threads might want to access: An IO stream, a global variable, a shared pointer, even a sensitive sequence of instructions, etc. Any time we have a critical section of code that is sensitive to sharing we want to find a way to limit access to a finite number of simultaneous threads (usually one). There are several ways to do this.<br />
<br />
A <b>mutex</b> (short for "mutual exclusion") is a type of <b>lock</b> that restricts access to a critical section. To pick a pertinent example, think of a mutex like a basketball. Only the one person in the game with the basketball can do ball things: shoot, pass and dribble. Other players on the team can do other stuff like running, covering, or posting, but they cannot do ball stuff without the ball. You cannot shoot the ball if you do not have the ball, you cannot pass the ball if you do not have the ball. This is a convention of the sport. If we were playing Calvinball instead, maybe we could do these things without looking preposterous. By convention also, if we as programmers declare that a certain shared resource can only be accessed by a thread (player) with the mutex (ball), then those are the rules for our system (game) and things can move along smoothly. Here's an example of that convention in action:<br />
<pre>Mutex *m;
ACQUIRE_MUTEX(m);
// critical section code
RELEASE_MUTEX(m);
</pre>The power in this code is that the ACQUIRE_MUTEX() function will attempt to gain ownership of the mutex, and will wait indefinitely for the mutex to become available if some other thread already owns it. ACQUIRE_MUTEX is like waving your arms in the air, shouting "I'm open" until the current ball carrier passes the ball to you; until you get the ball, all you can do is stand there with your arms raised. Because of that behavior, no two threads can be inside the same critical section at once, assuming of course that the programmer (you) has properly protected that critical section with the mutex. Keep in mind that there is no intrinsic property of the critical section itself that prevents multiple threads from running it simultaneously and corrupting data. The exclusion comes from the proper and pervasive use of locks like our mutex to keep the critical section safe. Here's another example:<br />
<pre>Mutex m;

int pop(array *a) {
    ACQUIRE_MUTEX(m);
    int item = a->values[a->length - 1];
    a->length--;
    RELEASE_MUTEX(m);
    return item;
}

// push never takes the mutex -- this is the bug
void push(array *a, int item) {
    a->values[a->length] = item;
    a->length++;
}
</pre>In this example we aren't using the mutex everywhere, so we can't guarantee that we won't get corrupt data. Multiple threads can enter the unprotected push function simultaneously just as easily as they could have entered pop. If you don't use mutexes everywhere, it's almost as good as not using them anywhere. This is a convention that the coder must decide upon beforehand and follow diligently.<br />
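For comparison, here is a sketch of the convention applied consistently, using POSIX threads (pthread_mutex_t stands in for the pseudo-code Mutex, and the caller-supplied buffer is just for illustration):<br />

```c
#include <pthread.h>

typedef struct {
    int length;
    int *values;
} array;

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Both functions take the same mutex, so push and pop can never
   interleave inside the critical section. */
void push(array *a, int item) {
    pthread_mutex_lock(&m);
    a->values[a->length] = item;
    a->length++;
    pthread_mutex_unlock(&m);
}

int pop(array *a) {
    pthread_mutex_lock(&m);
    int item = a->values[a->length - 1];
    a->length--;
    pthread_mutex_unlock(&m);
    return item;
}
```

Now any thread trying to push while another is mid-pop blocks at pthread_mutex_lock until the critical section is free, which is the whole point of the convention.<br />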
<br />
<br />
There are multiple ways to implement locks and mutexes. One idea is a <b>spinlock</b>, which repeatedly tests a flag in a while-loop until it succeeds in claiming it. An empty while-loop can be very inefficient on a processor, but if we call a sleep or yield command inside the loop to allow other threads to run while we wait, it isn't such a big problem. Spinlocks implemented by the OS inside the kernel event loop can be very efficient indeed. In fact, as a general rule, if the OS implements locking primitives they tend to be much better to use than anything you can write in userspace.<br />
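A minimal spinlock can be sketched with C11 atomics (assuming a POSIX system for sched_yield); atomic_flag_test_and_set is the atomic "test and set" step that makes the loop safe:<br />

```c
#include <stdatomic.h>
#include <sched.h>

atomic_flag flag = ATOMIC_FLAG_INIT;

/* atomic_flag_test_and_set atomically sets the flag and reports its
   previous state: "true" means some other thread already holds it. */
void spin_lock(void) {
    while (atomic_flag_test_and_set(&flag))
        sched_yield();   /* let other threads run instead of burning
                            the CPU in an empty loop */
}

void spin_unlock(void) {
    atomic_flag_clear(&flag);
}
```

This is a sketch, not production code: it has no fairness guarantees, and as the paragraph above notes, an OS-supplied primitive will almost always behave better under contention.<br />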
<br />
Another type of lock primitive is a <b>semaphore</b>, though it is subtly different. A semaphore allows a finite number of threads to access a finite number of shared resources at a time. Where a normal mutex, like a spinlock, allows only one thread to enter at a time, a semaphore may allow one or more. Consider a case where we have five worker threads in a web server, and 100 incoming connections. A semaphore uses a first-come-first-served method to assign incoming connections to available threads. Each incoming connection attempts to access the semaphore. As requests are completed, threads signal their availability and the semaphore assigns the next connection in the list to that thread. A semaphore with only one shared object acts like a normal mutex or spinlock.<br />
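The web-server scenario above might be sketched with POSIX semaphores like this (init_workers and handle_connection are invented names for illustration):<br />

```c
#include <semaphore.h>

sem_t slots;

/* Initialize the semaphore with 5 slots: at most five connections
   can be inside handle_connection at once. */
void init_workers(void) {
    sem_init(&slots, 0, 5);
}

void handle_connection(void) {
    sem_wait(&slots);   /* connection #6 blocks here until a post */
    /* ... serve the request ... */
    sem_post(&slots);   /* signal that a worker is free again */
}
```

Initializing the count to 1 instead of 5 gives exactly the mutex-like behavior described at the end of the paragraph.<br />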
<br />
The <b>overhead</b> of a lock is the amount of effort it takes to acquire and manage the lock. In a uniprocessor system the lock may be very simple to obtain: First disable interrupts so we cannot be preempted by another thread, check the status of the lock, obtain the lock if it's available, and re-enable interrupts. In a multiprocessor system, especially one with shared memory, the overhead and error-checking involved can be much higher. In these systems the performance gain from using threads can be much higher too, so it's a trade-off.<br />
<br />
<b>Granularity</b> is the amount of stuff in your critical section protected by a lock. <b>Coarse Granularity</b> means that we have lots of code inside our critical section. This is good because we need fewer locks and therefore experience lower overhead. Plus, it's easier as a programmer to make sure we acquire fewer locks over large swaths of our program. The downside is that the larger our protected critical section is, the more likely other threads are going to be blocked waiting to enter it. This, in turn, can create problems like high <b>latency</b>. <b>Fine Granularity</b> is the opposite, where we lock as little code as possible. The upside is that we don't have to worry about multiple threads blocking for long on small bits of code. The downside is that acquiring more locks means more lock overhead, and more programmer effort to implement all the locks consistently and safely. Fine granularity can also lead to <b>deadlock</b>, where multiple threads are stuck waiting for locks that other threads own.<br />
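The trade-off can be sketched in code. Assuming pthreads, here is the same 16-slot table guarded both ways; coarse_table and fine_table are invented names for illustration:<br />

```c
#include <pthread.h>

/* Coarse granularity: one lock guards the whole table. Fewer locks
   and lower overhead, but every thread contends on the same mutex. */
typedef struct {
    pthread_mutex_t lock;
    int values[16];
} coarse_table;

void coarse_set(coarse_table *t, int idx, int v) {
    pthread_mutex_lock(&t->lock);
    t->values[idx] = v;
    pthread_mutex_unlock(&t->lock);
}

/* Fine granularity: one lock per slot. Threads touching different
   slots never block each other, at the cost of more lock overhead
   (and more ways to deadlock when taking several locks at once). */
typedef struct {
    pthread_mutex_t locks[16];
    int values[16];
} fine_table;

void fine_set(fine_table *t, int idx, int v) {
    pthread_mutex_lock(&t->locks[idx]);
    t->values[idx] = v;
    pthread_mutex_unlock(&t->locks[idx]);
}
```

Which structure wins depends entirely on the access pattern: heavy contention on different slots favors the fine-grained version, while mostly-single-threaded access favors the coarse one.<br />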
<br />
The Python interpreter, as an example, implements a single <b>Global Interpreter Lock</b>, which is a lock to govern the entire Python interpreter. Only one operating system thread can be running the interpreter at once, to prevent corruption of global data. I think new versions of Ruby do this too.<br />
<br />
There are other methods of synchronizing access to shared resources. One method is to make all data immutable; If you can't modify data, you can't corrupt it. Since Parrot's strings are immutable, you shouldn't ever need a lock when working with them. You may still need to worry about playing with a mutable container PMC which holds strings, or the mutable registers which point to strings, however.<br />
<br />
Parrot is definitely going to want to make use of OS-supplied locks in some fashion. Maybe we want to make a PMC wrapper around system lock primitives, or we want to create some kind of lock manager that uses a single system mutex to distribute a series of immutable tokens to worker threads. The exact details of locking are certainly up for debate, but the fact that we don't want to brew our own should be obvious.<br />
<br />
The fact that locks need to be used consistently to be of any use strongly hints that Parrot should probably do the locking internally. We probably don't want to apply locks to every single operation, since the common case programs are single-threaded applications and we don't want to apply the performance penalty of lock overhead to programs which don't need it. If Parrot can identify only those PMCs which are shared, it can apply locks selectively to those PMCs only, limiting overhead to only the places where it is necessary. For instance, if we add a synchronize op:<br />
<pre>$P0 = synchronize $P1
</pre>We can create some kind of wrapper PMC type whose vtables enter a lock, call the vtable of the synchronized PMC, and then release the lock. In this example, if everybody used $P0 when they wanted to modify $P1, all operations would be safe. The onus would be on the programmer to explicitly mark the PMC as synchronized, of course, and many programmers will probably forget to do that.<br />
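A rough sketch of what such a wrapper might look like at the C level, using a pthread mutex; sync_pmc, sync_set and sync_get are invented names for illustration, not real Parrot API:<br />

```c
#include <pthread.h>

/* A wrapper pairing a lock with the wrapped object's state.  Here a
   plain int stands in for the inner PMC. */
typedef struct {
    pthread_mutex_t lock;
    int value;
} sync_pmc;

/* Each "vtable" entry takes the lock, delegates, and releases. */
void sync_set(sync_pmc *p, int v) {
    pthread_mutex_lock(&p->lock);
    p->value = v;           /* would call the inner PMC's vtable here */
    pthread_mutex_unlock(&p->lock);
}

int sync_get(sync_pmc *p) {
    pthread_mutex_lock(&p->lock);
    int v = p->value;
    pthread_mutex_unlock(&p->lock);
    return v;
}
```

As the paragraph notes, the scheme is only safe if every thread goes through the wrapper: nothing stops a thread that holds a raw pointer to the inner PMC from bypassing the lock entirely.<br />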
<br />
Maybe instead of passing PMC references between threads directly we create and pass clones and modify them separately on different threads. Then, when we want our changes from one thread appear in another thread, we would call some kind of propagate op:<br />
<pre>propagate thread, obj
</pre>This would pause the specified thread, update the object contents, and then unpause the thread. This would be very similar to the message passing that languages like Erlang use (not exactly the same, because Erlang wouldn't pause the recipient for this, but you get the idea).<br />
<br />
Maybe we have a system where we only share read-only copies. So thread A would own the PMC, but thread B could get a read-only copy of it. This would completely obviate the need to lock the PMC since only one thread could write to it, but then we need some kind of mechanism where thread B could make modifications back if necessary, or maybe B could gain the writable copy and make A's copy read-only. This system could get very complicated very quickly, however.<br />
<br />
We could also avoid most locks if we used a transactional memory system to prevent memory corruption, but that could still add overhead to the single-threaded common case, and we would still want a system of locks for operations that fall outside the transactional system.<br />
<br />
These are only a handful of the many potential options that Parrot has ahead of it, and I can go into greater detail about any of them that people are interested in thinking about. I think Parrot is going to want at least some kind of locking mechanism, so we could start prototyping those things immediately if we wanted. How these mechanisms get implemented and applied within Parrot is obviously the bigger issue that we can ignore for now.Whiteknighthttp://www.blogger.com/profile/16207472474429254890noreply@blogger.com0