The project is now over, and he's posted a nice final review of what he managed to accomplish this summer. While it isn't everything that he mentioned in his proposal, it still is quite a nice step in the right direction and a lot of the necessary framework is now available to bring threading to Parrot in a real and usable way.
So what did he manage to do? In his branch there is now a more-or-less working implementation of green threads, which means Parrot is getting onto the same level of capability as modern Python or Ruby implementations. You can schedule multiple independent tasks to run and, while they cannot run truly in parallel on separate OS threads or even on multiple processor cores, you can get the illusion of concurrency. It also gives you the ability to run multiple tasks together without having to explicitly schedule them one after the other, or having to manually switch back and forth between them. In a later post I may go into some detail about how this all works, and how it can be used by programs.
This is a pretty significant step forward, and once a few of the final nits get worked out I think we will be able to merge this into Parrot trunk and start making use of the new functionality immediately. However, green threads are only one half of the promised "hybrid threads" that Chandon's proposal hoped to achieve. The other half is the ability to run Parrot code in true parallel on separate OS threads. This was the larger part of the project and definitely the more difficult piece. Today I would like to talk a little bit about why it didn't get done, and maybe motivate some people to help complete the work as we move forward.
Let's take a look at a small snippet of normal object-oriented code: two back-to-back method calls, foo.bar() followed by foo.bar().
Extremely simple: we have an object "foo" and we are calling the "bar" method on it twice. In a very naive statically-typed language this would be a simple operation. The compiler would determine the type of foo, calculate the location of the bar routine in memory and would simply call that address twice. This would be extremely fast to execute too, which everybody likes. In low-level pseudo-code, this basically becomes two direct calls to the fixed address of bar, each passing foo as the argument.
Now let's move up to a slightly more complicated example: a statically-typed language which allows subclassing and method overriding. Now, the compiler doesn't necessarily know which function in memory corresponds to "foo.bar()", since foo could be of type Foo, or of type FooSubclass, or even FooSubSubclass, and the location of the appropriate bar function would change with each different type. However, for each type, the value of bar does not change. It can be overridden by subclasses, but it cannot be changed for the class itself. Now, the compiler needs to ask foo to get the appropriate method first:
method _bar = get_method(typeof(foo), "bar")
Assuming foo does not change type inside the call to _bar, this code works just fine. Next let's look at a more complicated example still, the case of a dynamic language like Perl or Python. In these languages the class MyFooClass is an object of type Class, and foo is an object of type MyFooClass. Also, bar is not a compiled-in constant property of the Class, but is instead a mutable property of it. Inside the call to bar, maybe we change the definition of bar itself inside the class object. Likewise, the definition of routines like "find_method" inside Class can change. Accounting for all this dynamism, the lookups have to be repeated before every call, and our code changes to look like this:
class fooclass = find_class(foo)
method _get_method = find_method(fooclass, "bar")
class fooclass = find_class(foo)
method _get_method = find_method(fooclass, "bar")
Keep in mind that everything can change inside each call to bar. We can:
- Modify the inheritance hierarchy of MyFooClass to add or remove parent classes
- Add or remove methods in MyFooClass, and any item in its inheritance hierarchy
- Change the class of foo
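To make the difference concrete, here is a minimal Python sketch. The helper names make_class, call_twice_cached and call_twice_dynamic are made up for illustration and are not Parrot APIs. The bar method replaces itself in its class while it runs, so resolving the method once and reusing it gives a different answer than repeating the lookup before every call:

```python
def make_class():
    """Build a fresh class whose bar() redefines bar on the class when called."""
    class MyFooClass:
        def bar(self):
            def replacement(self):
                return "replacement bar"
            # Mutate the class from inside the method call.
            type(self).bar = replacement
            return "original bar"
    return MyFooClass


def call_twice_cached(obj):
    # Statically-typed style: resolve the method once, then call it twice.
    _bar = getattr(type(obj), "bar")
    return [_bar(obj), _bar(obj)]


def call_twice_dynamic(obj):
    # Dynamic style: repeat the class and method lookup before every call.
    results = []
    for _ in range(2):
        fooclass = type(obj)
        _bar = getattr(fooclass, "bar")
        results.append(_bar(obj))
    return results


cached = call_twice_cached(make_class()())
dynamic = call_twice_dynamic(make_class()())
print(cached)   # the stale method is called both times
print(dynamic)  # the second call sees the redefinition
```

The cached version never notices that bar was redefined, which is exactly why a dynamic language cannot get away with a single lookup.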
If classes are global, and need to be accessible from multiple threads, we inevitably run into the idea of contention: If one thread is changing the definition of foo.bar() at the same time that another thread is attempting to look it up, the thread doing the reading may end up with incomplete, corrupted, or incorrect information.
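As a rough illustration of the brute-force fix, the following Python sketch guards a shared method table with a single lock. MethodTable and its two-part entries (a function plus a label describing it) are hypothetical stand-ins, not Parrot structures. Because a definition updates two fields, an unlocked reader could in principle observe a new function paired with an old label; taking the lock on both sides rules that out:

```python
import threading


class MethodTable:
    """Hypothetical shared method table guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._funcs = {}
        self._labels = {}

    def define(self, name, func, label):
        # Two-step update: both fields must change together.
        with self._lock:
            self._funcs[name] = func
            self._labels[name] = label

    def lookup(self, name):
        # Readers take the same lock, so they never see a half-finished update.
        with self._lock:
            return self._funcs[name], self._labels[name]


table = MethodTable()
table.define("bar", lambda: "v0", "v0")
seen = []


def writer():
    for i in range(1, 1000):
        table.define("bar", lambda i=i: f"v{i}", f"v{i}")


def reader():
    for _ in range(1000):
        func, label = table.lookup("bar")
        seen.append((func(), label))


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

consistent = all(result == label for result, label in seen)
final = table.lookup("bar")[0]()
```

Every lookup is consistent here, but every lookup also pays for the lock, even when nobody is writing.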
Global data is the enemy of good multithreading. Global data is a necessity for highly dynamic object systems like those supported by Parrot. Ergo, multithreading on Parrot is hard.
Look at all the global data in Parrot: there's the interpreter object itself, the packfiles, the list of Classes (and the methods/properties that they contain), the list of Namespaces (and the global values they contain), the context graph (and the contents of all registers in those contexts), the opcode table and the MMD signature cache. This is a hell of a lot of globally-available data, and we're going to need to find a sane way to limit corruptive concurrent access to these things. Not only that, but if we have to lock every single access to a global value, which means we need to obtain a lock every time we want to do a method call, our performance is going to be terrible.
By convention I think we can all agree that things like the opcode table should really not be modified when we have multiple threads running. But it's harder to make such a convention that class/method definitions should be treated as constant when multiple threads are running.
What we can do to alleviate some of the performance problems is to create short-term caches for things like Classes and Methods in local threads, with the understanding that without explicit synchronization, the exact ordering of read/write operations between threads cannot be guaranteed and can be scheduled for maximum benefit. If the relative execution times of two operations on separate threads cannot be guaranteed, then it makes no difference to functionality for the user whether those operations happen at random times with respect to each other or whether Parrot manually orders them to improve caching.
Think about it this way: We have two threads, one is writing a global value and one is reading it. These threads are operating in true parallel on two separate processor cores. If we run this program a million times, sometimes the read will randomly occur before the write, sometimes the write will occur before the read, and sometimes they will happen at the exact same moment and cause corruption. Depending on a simple matter of random timing, we could get three very different results from our program. Now, suppose Parrot implements a heuristic: if a write and a read happen within a very short period of time, Parrot manually orders them so that the read occurs before the write. And, if the read always happens before the write, maybe we don't read at all, but instead use a cached version. Then, only when the write has completed, we update the cache.
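One way this heuristic could look in code is a per-thread cache keyed on a version number that the writer bumps only after its write has finished. This is a sketch under my own assumptions: GlobalClassTable, ThreadCache and the version counter are invented for illustration, and nothing like them exists in Parrot today.

```python
import threading


class GlobalClassTable:
    """Hypothetical global table; version changes only once a write is complete."""

    def __init__(self):
        self._lock = threading.Lock()
        self._methods = {}
        self.version = 0

    def write(self, name, func):
        with self._lock:
            self._methods[name] = func
            self.version += 1  # published last, after the data is in place

    def snapshot(self):
        with self._lock:
            return self.version, dict(self._methods)


class ThreadCache:
    """Per-thread cache that serves reads locally until a write has completed."""

    def __init__(self, table):
        self._table = table
        self._version, self._methods = table.snapshot()
        self.refreshes = 0

    def lookup(self, name):
        # Cheap unlocked check; a slightly stale answer is treated as acceptable.
        if self._table.version != self._version:
            self._version, self._methods = self._table.snapshot()
            self.refreshes += 1
        return self._methods[name]


table = GlobalClassTable()
table.write("bar", lambda: "old")
cache = ThreadCache(table)

first = [cache.lookup("bar")() for _ in range(3)]
refreshes_before = cache.refreshes
table.write("bar", lambda: "new")
second = cache.lookup("bar")()
```

The three early lookups never touch the lock. Only the lookup that follows a completed write pays for a refresh.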
Another thing to think about is that we could disallow direct lookups of data in global variables, and give each thread a local copy of global values. Considering the principle I mentioned above, we can be a little bit liberal with the way we handle writes and reads. If a thread modifies its copy of a Class, those changes could be propagated to a global reference copy at a convenient time, and then propagated down to the local copies later, only when it's convenient to do so. Remember, if things are happening at random times compared to each other, it makes no substantive difference to the thread whether its copy of a global variable is exactly up-to-date or whether it's a little bit lagged behind. That is, the reading thread doesn't know whether the writing thread had a legitimate delay before writing, or whether Parrot manually scheduled the write at a different time.
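Here is a small Python sketch of that local-copy idea. GlobalStore, LocalView and the sync method are hypothetical names of my own, and the "convenient time" is simply whenever the caller invokes sync. Each view reads and writes its own copy, and changes reach other views only after both sides have synced:

```python
import threading


class GlobalStore:
    """Hypothetical global reference copy of shared values."""

    def __init__(self, values):
        self.lock = threading.Lock()
        self.values = dict(values)


class LocalView:
    """Per-thread copy; reads and writes never touch the global store directly."""

    def __init__(self, store):
        self._store = store
        self._dirty = {}
        with store.lock:
            self.values = dict(store.values)

    def read(self, key):
        return self.values[key]

    def write(self, key, value):
        self.values[key] = value
        self._dirty[key] = value  # remembered until the next sync

    def sync(self):
        # Push pending local writes up, then pull the reference copy down.
        with self._store.lock:
            self._store.values.update(self._dirty)
            self._dirty.clear()
            self.values = dict(self._store.values)


store = GlobalStore({"bar": "v1"})
a = LocalView(store)
b = LocalView(store)

a.write("bar", "v2")
before_sync = b.read("bar")   # b has not seen the write yet
a.sync()
still_lagging = b.read("bar") # the global copy is updated, b's copy is not
b.sync()
after_sync = b.read("bar")
```

The only locked operations are the syncs, so the cost of contention is paid once per batch of changes instead of once per access.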
To get a complete Hybrid Threads implementation in Parrot like what Chandon was envisioning, we are going to have a few steps to take:
- We have to break the Interpreter structure up into two parts: One which is truly global and contains data which all threads need, and one which is local for each thread and contains data that the thread needs. The local interpreter may contain a cache or a mirror of data in the global interpreter.
- We need to come up with a good sane scheme (which will likely consist of both technical solutions and programmer conventions) to manage access to global values.
- We need to come up with a good sane scheme for sharing values between threads. Creating a local variable and passing a reference to it to another thread essentially turns it into a global variable and opens up problems with contention. We need a good way to manage this without slowing everything down.
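The steps above could be sketched roughly as follows. This is only an illustration in Python: GlobalInterp, LocalInterp, their field names and the queue-based inbox are all assumptions of mine, not Parrot's actual interpreter layout. The global part holds shared data behind a lock, the local part holds per-thread data plus a mirror of what it has looked up, and values move between threads as messages instead of shared references:

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class GlobalInterp:
    """Truly global state that every thread needs (field names are illustrative)."""
    classes: dict = field(default_factory=dict)
    namespaces: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class LocalInterp:
    """Per-thread state, plus a mirror of the global data this thread has used."""
    shared: GlobalInterp
    registers: dict = field(default_factory=dict)
    class_cache: dict = field(default_factory=dict)
    inbox: queue.Queue = field(default_factory=queue.Queue)

    def find_class(self, name):
        # Only the first lookup of a class touches the global lock.
        if name not in self.class_cache:
            with self.shared.lock:
                self.class_cache[name] = self.shared.classes[name]
        return self.class_cache[name]


def worker(local, results):
    # The value arrives as a message, not through a shared reference.
    value = local.inbox.get()
    cls = local.find_class("MyFooClass")
    results.put((cls, value * 2))


shared = GlobalInterp(classes={"MyFooClass": "class-object"})
local = LocalInterp(shared)
results = queue.Queue()

t = threading.Thread(target=worker, args=(local, results))
t.start()
local.inbox.put(21)
t.join()
outcome = results.get()
```

Message passing is only one possible convention for the third step. It trades some copying for not having to lock every shared value.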