Wednesday, May 6, 2009

Switching Gears: GC

I talked briefly with Allison today via email, and decided that the best idea is for me to switch gears and not focus on the Asynchronous IO system just yet. Though I've been trying to keep it out of my mind for a little while, the GC really is the most pressing need we have right now, so I'm going to start working on that instead of the AIO system for now. Given several projects that I am both willing and able to work on, I will tend to work on the one that's most needed by the project at large. I say "tend" because this is far from a hard-and-fast rule, but it does explain why I am so readily willing to change focus in this case.

My loyal followers (the null set) will remember back to last summer, when I was working on a new GC system for Parrot as part of the Google Summer of Code. People should also remember that my project was ultimately unsuccessful: I was not able to produce a GC that ran reliably. Part of it was my fault: the GC turned out to be a bigger problem than I was capable of solving at that time. However, we've also learned the lesson that Parrot's GC API is particularly messy and needs to be cleaned up significantly before we can make another honest attempt at an improved GC core.

My task, should I choose to accept it, is to clean up the GC API so we can eventually have pluggable GC cores. This is going to be a bigger task than most people realize, for a variety of reasons. The current GC is intimately intertwined with the whole codebase. There isn't clear encapsulation between the GC and other parts of the system, and as a consequence it's almost impossible to create a new GC core, because we don't even know everything the old core is doing (or where it's doing it).

What we need to define is:
  1. What interface functions (and macros) the GC should provide to do its work.
  2. What functions from the rest of Parrot the GC is permitted to call. This should be a very small subset of all functions Parrot provides.
  3. What data structures the GC provides for access by the rest of the system.
  4. What data structures the GC is allowed access to.
Not a small task at all, but potentially a very rewarding one if it's done right.
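To make the idea of a narrow boundary concrete, here is a minimal sketch in C of what a pluggable core could look like: the rest of the system calls a handful of front-door functions, and each core fills in a table of function pointers behind them. None of this is code from Parrot. The GC_Core struct, the gc_* function names and the "null" core are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: the operations every GC core would have to supply. */
typedef struct GC_Core {
    const char *name;
    void *(*allocate)(size_t size);
    void  (*release)(void *ptr);
    void  (*collect)(void);
} GC_Core;

/* A stand-in core that defers to the C allocator and only counts
   collection requests. It exists purely to exercise the interface. */
static unsigned long null_core_collections = 0;

static void *null_core_allocate(size_t size) { return calloc(1, size); }
static void  null_core_release(void *ptr)    { free(ptr); }
static void  null_core_collect(void)         { null_core_collections++; }

static const GC_Core null_core = {
    "null", null_core_allocate, null_core_release, null_core_collect
};

/* The core currently in use. Callers never touch this directly. */
static const GC_Core *active_core = &null_core;

/* Swap in a different core (hypothetical entry point). */
void gc_select_core(const GC_Core *core) {
    assert(core != NULL);
    active_core = core;
}

/* The only functions the rest of the system would be allowed to call. */
void *gc_allocate(size_t size) { return active_core->allocate(size); }
void  gc_release(void *ptr)    { active_core->release(ptr); }
void  gc_collect(void)         { active_core->collect(); }
```

With a layout like this, replacing the collector means writing one new table and passing it to gc_select_core, and nothing outside the GC needs to know which core is active.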

Yesterday I created the "gc_api" branch to start the work, and my first task is to rename the functions from "foo" to "Parrot_gc_foo", following the naming convention used by the rest of the Parrot subsystems. This isn't going to be a simple prefixing operation: some functions are going to be completely renamed to be more honest about what they do.
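One way a rename like this can be staged, sketched here in C, is to define the function under its new prefixed name and keep a temporary macro alias for the old name until every caller has been converted. This is only an assumption about how such a transition might be handled, not a description of what the gc_api branch does. The Obj struct and the OBJ_LIVE flag are hypothetical, and "foo" is just the placeholder name used above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header; not Parrot's real layout. */
typedef struct Obj { unsigned flags; } Obj;
#define OBJ_LIVE 0x1u

/* New public name carrying the subsystem prefix. The body is
   illustrative only: it sets a "live" bit on the object. */
void Parrot_gc_foo(Obj *obj) {
    assert(obj != NULL);
    obj->flags |= OBJ_LIVE;
}

/* Transitional alias so that callers still using the old name keep
   compiling. It would be deleted once the last caller is converted. */
#define foo(obj) Parrot_gc_foo(obj)
```

A macro alias like this also makes leftover callers easy to find: removing the alias turns every remaining use of the old name into a compile error.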

The next task is to work out what functions the GC needs to provide and export to the rest of the system. These functions will be located in src/gc/api.c only. Any function not in that file is not intended for use by the rest of Parrot. Here is a partial list, off the top of my head, of functions that I think the GC needs to provide:
  1. Initialize and deinitialize the system
  2. Allocate new STRING and PMC structures from the proper pools
  3. Allocate new pmc_ext and sync structures (or, given a PMC, append these structures to it)
  4. Allocate new raw data items from the fixed-size pools (or from the system, or whatever)
  5. Mark an object as being "alive"
  6. Mark an object as being "dead"
  7. Create a new pool
  8. Perform a mark phase
  9. Perform a sweep phase
  10. Perform a combined mark/sweep collection run.
There are probably a few more basic operations that I am forgetting right now, and there are definitely some "wishlist" items that I'm leaving out here for brevity. Still, this does seem like a reasonably complete yet concise interface for the GC to implement. I'll post updates as my work progresses.
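To show how several of the operations in the list fit together, here is a toy sketch in C covering initialization, allocation from a pool, marking alive and dead, and the mark, sweep and combined collection steps. It is a simplification under loud assumptions. Every function name, the GC_Object and GC_Pool types, and the single fixed-size pool are hypothetical and do not reflect Parrot's actual headers or the eventual contents of src/gc/api.c. The mark phase here only flags the roots it is handed, whereas a real collector would also trace references between objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE   8      /* toy capacity, chosen for illustration */
#define FLAG_IN_USE 0x1u
#define FLAG_LIVE   0x2u

/* Hypothetical object header; real PMC and STRING headers differ. */
typedef struct GC_Object { unsigned flags; int payload; } GC_Object;

/* Hypothetical fixed-size pool of object headers. */
typedef struct GC_Pool {
    GC_Object slots[POOL_SIZE];
    size_t    in_use;
} GC_Pool;

/* Items 1 and 7: set up and tear down the (single) pool. */
void Parrot_gc_initialize(GC_Pool *pool) { memset(pool, 0, sizeof *pool); }
void Parrot_gc_finalize(GC_Pool *pool)   { memset(pool, 0, sizeof *pool); }

/* Item 2: hand out the first free slot, or NULL if the pool is full. */
GC_Object *Parrot_gc_new_object(GC_Pool *pool) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        GC_Object *obj = &pool->slots[i];
        if (!(obj->flags & FLAG_IN_USE)) {
            obj->flags   = FLAG_IN_USE;
            obj->payload = 0;
            pool->in_use++;
            return obj;
        }
    }
    return NULL;
}

/* Items 5 and 6: set or clear the live bit on one object. */
void Parrot_gc_mark_alive(GC_Object *obj) { assert(obj != NULL); obj->flags |= FLAG_LIVE; }
void Parrot_gc_mark_dead(GC_Object *obj)  { assert(obj != NULL); obj->flags &= ~FLAG_LIVE; }

/* Item 8: mark phase. Only the given roots are marked; no tracing. */
void Parrot_gc_mark(GC_Object **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] != NULL)
            Parrot_gc_mark_alive(roots[i]);
}

/* Item 9: sweep phase. Frees every in-use object that was not marked,
   clears the mark on survivors, and returns the number freed. */
size_t Parrot_gc_sweep(GC_Pool *pool) {
    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        GC_Object *obj = &pool->slots[i];
        if (!(obj->flags & FLAG_IN_USE))
            continue;
        if (obj->flags & FLAG_LIVE) {
            Parrot_gc_mark_dead(obj);   /* reset for the next run */
        } else {
            obj->flags = 0;             /* slot becomes free again */
            pool->in_use--;
            freed++;
        }
    }
    return freed;
}

/* Item 10: a combined mark/sweep run. */
size_t Parrot_gc_collect(GC_Pool *pool, GC_Object **roots, size_t nroots) {
    Parrot_gc_mark(roots, nroots);
    return Parrot_gc_sweep(pool);
}
```

In this sketch, allocating two objects and then collecting with only the first as a root frees the second slot and leaves the first in use with its mark cleared, ready for the next run.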
