On one hand it's exciting--very exciting--to have a real production-quality GC core in Parrot. On the other hand, performance is horrible with the Boehm collector. Well, maybe not "horrible", but worse than we have now. Despite the high quality of the Boehm collector and the hard work of Bacek, the new core may prove to be little more than a carnival oddity because of poor performance. The big question to ask first is "Why does Boehm perform so slowly in Parrot?", perhaps followed by "...and is there anything we can do to speed it up?". I won't really tackle the second question here; it's far too big for a single blog post.
I've given a brief overview of GC concepts in the past, but I didn't really cover all aspects of GCs. There isn't enough room in a single blog post to do it (although I am hearing very good reviews about a new book about Garbage Collection that might be able to cover things in more detail). One aspect of GCs that I did not mention in my previous post is conservativeness.
Memory gets broken into regions that we will call "spaces". Spaces can be used for all sorts of things and most programmers will already be familiar with them: stack space, heap space, program space, etc. We know that in our running program all PMCs are allocated in heap space, and are referenced from both heap space (pointers contained in PMCs to other PMCs) and stack space (PMCs being actively used by C code). If we need to find and follow all pointers to PMCs, as we do in the GC mark phase, we can accomplish this by traversing the stack and heap spaces linearly and following every value that looks like a pointer into heap space.
The problem with a linear memory traversal is that you do it blind: you do not know whether any given value is a valid pointer or not. Any value you encounter which, when treated as a pointer, points into heap space must simply be assumed to be a pointer. This creates lots of false positives, because values that are not pointers get treated and followed as though they were. We call this behavior conservative. A conservative GC, like the Boehm collector, is completely disconnected from the operations of the program, and so must blindly traverse the heap and stack spaces following all pointer-esque values, whether they are actually pointers or not.
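To make that concrete, here is a rough sketch of what a word-by-word conservative scan over a region of memory might look like. This is only an illustration in the spirit of what a conservative collector does, not Boehm's actual code; heap_lo, heap_hi and mark_possible_object are hypothetical placeholders for "the bounds of heap space" and "mark whatever this candidate points to".

#include <stdint.h>

extern char *heap_lo;                        /* hypothetical lower bound of heap space  */
extern char *heap_hi;                        /* hypothetical upper bound of heap space  */
extern void  mark_possible_object(void *p);  /* hypothetical "mark whatever this hits"  */

/* Walk an arbitrary region (the C stack, say) one word at a time. Any word
   that happens to look like a pointer into heap space gets marked, whether
   it is really a pointer or just an integer with an unlucky value. */
void conservative_scan(void *start, void *end)
{
    uintptr_t *p;
    for (p = (uintptr_t *)start; p < (uintptr_t *)end; p++) {
        char *candidate = (char *)*p;            /* treat the word as a pointer */
        if (candidate >= heap_lo && candidate < heap_hi)
            mark_possible_object(candidate);     /* possible false positive     */
    }
}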
The alternative type is a precise GC, which knows where pointers are beforehand and never has false positives. A precise GC can avoid scanning large regions of memory that are guaranteed not to contain any pointers. It can also avoid following numbers that happen to look like pointers into heap space, because it knows which fields are numbers and which are really pointers. Precise GCs are almost always faster than their conservative counterparts, though they require significantly more integration into the rest of the program. This creates an encapsulation problem and requires ongoing maintenance of the GC's algorithms as the program's data structure formats change. The GC has access to more information, but the program now has the responsibility to ensure all that information is consistently used and remains accurate.
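A minimal sketch of the kind of bookkeeping a precise collector leans on might look like this. The type_info descriptor and mark_object call are hypothetical, not any real GC's API; the point is that each object type records exactly which of its fields hold pointers, so the marker never has to guess.

#include <stddef.h>

/* Hypothetical per-type descriptor: records which byte offsets in an object
   actually hold pointers, so non-pointer data is never inspected at all. */
typedef struct type_info {
    size_t        num_pointer_slots;
    const size_t *pointer_offsets;
} type_info;

typedef struct object {
    const type_info *type;
    /* object payload follows */
} object;

extern void mark_object(object *obj);    /* hypothetical "mark this object" call */

/* Mark only the fields the descriptor says are pointers; integers, floats
   and flags are skipped without ever being looked at. */
void precise_mark_children(object *obj)
{
    size_t i;
    for (i = 0; i < obj->type->num_pointer_slots; i++) {
        object **slot = (object **)((char *)obj + obj->type->pointer_offsets[i]);
        if (*slot != NULL)
            mark_object(*slot);
    }
}

Of course, somebody has to keep that descriptor in sync with the real struct layout, which is exactly the ongoing maintenance burden described above.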
Parrot's current GC is precise. At least, it attempts to be. Each PMC type defines a "VTABLE_mark()" routine, which can be used to mark the PMC and all the pointers that it contains. Each PMC type knows exactly where its pointers are, so it can focus its efforts on marking only the things that actually need marking. Where Parrot's GC fails to be precise is in the stack-walking code. I've complained about that before and won't really rehash it here. Suffice it to say that if Parrot got cleaned up to avoid unanchored PMCs on the stack, and therefore to not need a stackwalk for a GC run, it could see a nice performance improvement.
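Roughly, a mark routine for an imaginary array-like PMC would look something like the following. This is a simplified sketch rather than actual Parrot source, and mark_pmc_alive is just a stand-in for whatever call the real GC exposes to flag a PMC as reachable.

#include <stddef.h>

typedef struct PMC PMC;
typedef struct Interp Interp;

/* Hypothetical stand-in for the GC's "this PMC is reachable" call. */
extern void mark_pmc_alive(Interp *interp, PMC *child);

/* An imaginary array-like PMC's private data: it knows its own layout,
   so it follows exactly the PMC pointers it holds and nothing else. */
typedef struct array_data {
    int   size;
    PMC **elements;   /* pointers the GC needs to follow        */
    int  *flags;      /* plain integers the GC never has to scan */
} array_data;

void array_pmc_mark(Interp *interp, array_data *data)
{
    int i;
    for (i = 0; i < data->size; i++) {
        if (data->elements[i] != NULL)
            mark_pmc_alive(interp, data->elements[i]);
    }
    /* data->flags is never scanned: we know it contains no pointers */
}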
But, I digress. The Boehm collector is a stand-alone module that obviously has no intimate knowledge of the internal workings of Parrot. Boehm doesn't know about the intentional simplifications that Parrot uses to keep track of its PMCs and to keep them efficient for use in the GC. Let's look at some C code:
int * x = (int *)malloc(sizeof(int) * 32);  /* an aggregate buffer in heap space   */
int * y = &(x[16]);                         /* an interior pointer, inside x       */
int * z = &(x[32]);                         /* points just past the end, outside x */
In this code snippet we see our aggregate object x, and a pointer y which points to a memory location inside x. When Boehm does a stackwalk here, it sees the value y, determines that it's a pointer into heap space, and then must trace backwards from there to find the start of the aggregate object x. Boehm must then compare the size of x to the distance from the start of x to y, to ensure that the pointer y is actually located inside x. Next we have the value z, which points outside the allocated buffer of x. I don't know why you would intentionally write the code above, but that's hardly the point. Boehm sees that z points into heap space, traces back until it finds x, determines the size of x, and then sees that z points outside of it. That's quite a lot of work to mark pointers, but it's necessary when you consider that arbitrary C code can create pointers to arbitrary locations inside a structure or array. Boehm needs to be very conservative in order to be usable by a wide array of software, much of which won't have the same design for memory objects or access patterns that Parrot has.
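In sketch form, the extra work for every candidate pointer looks something like this. Again, this is a hedged illustration, not Boehm's implementation: find_block_base, block_size and mark_block are hypothetical placeholders, and the real collector uses much smarter per-block metadata than a naive lookup.

#include <stddef.h>

/* Hypothetical lookups: given an arbitrary address, find the base and size of
   the heap block (if any) that was allocated around it. */
extern void  *find_block_base(void *candidate);
extern size_t block_size(void *base);
extern void   mark_block(void *base);

void mark_if_interior_pointer(void *candidate)
{
    size_t offset;
    void  *base = find_block_base(candidate);

    if (base == NULL)
        return;                        /* does not land in any allocated block */

    offset = (size_t)((char *)candidate - (char *)base);
    if (offset < block_size(base))
        mark_block(base);              /* y-style interior pointer: keep it    */
    /* otherwise (z-style, past the end of the block) the value is ignored */
}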
So back to the question at hand: Why is Boehm so slow on Parrot? The answer is that the Boehm GC is a high-quality tool that needs to be general enough for adaptation to a wide array of programs. It sacrifices intimate knowledge of Parrot's internals (which it could use to focus and optimize its behavior) in exchange for improved generality. Parrot's current GC, on the other hand, while naive, does have this intimate knowledge and can use it to improve performance. If we know where all the pointers are ahead of time, we can avoid all the effort of having to search for them.
Sigh...