Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Tuesday, March 2, 2010

Difference between PMCs and Objects

There has been lots of talk and activity lately that has to deal with Parrot Objects. My rant about exceptions in Parrot has incited Tene to begin a flurry of development on that system, and Austin's Kakapo project has been regularly pushing the boundaries of what kinds of operations are and should be possible (and finding lots of bugs along the way!). Other people have been bringing up the topic as well, and lots of people are asking lots of questions about the implementation. I'm going to use this post to explain a bit about how Objects and PMCs work in Parrot, and maybe later I'll devote a post or two to ideas for fixing this system.

PMCs are basically objects, though extremely simple, flexible, and low-level. PMCs are interacted with, primarily, through the VTABLE interface. VTABLEs in Parrot are long lists of C function pointers that implement various behaviors. Calling the in-place addition VTABLE, add_i, is done like this in C:

VTABLE_add_i(interp, pmc, 5);

...Which translates to this:

pmc->vtable->add_i(interp, pmc, 5);

By pointing to a per-type VTABLE structure, PMCs with the same type can access a common list of function behaviors without overlapping or needing to do expensive switch/cases over a list of direct function calls. Likewise, determining the type of a PMC means finding the type of the VTABLE it points to:

pmc->vtable->base_type; // type number
pmc->vtable->whoami; // type name (Parrot STRING)
pmc->vtable->class; // Class or PMCProxy PMC for the type

Also, if we have the type number, we can look up the particular VTABLE in an array:

VTABLE * tbl = interp->vtable[index];

In a sense, that's all there is to a PMC. All interactions with a PMC happen through this interface of about 185 function pointers. A PMC, by itself, doesn't have things that we would normally associate with "objects" in higher-level systems: Attributes and Methods. Sure, PMCs do have a way to associate a C structure, and therefore maintain a list of what we call "attributes", but those aren't directly accessible from PIR without adding some kind of lookup routine to find them and maybe wrap them into one of the Parrot register types (INTVAL, FLOATVAL, STRING, PMC). PMCs also appear to have methods, but this really isn't the case when you look at it closely.

As I describe in a previous post, the long way to invoke a method on a PMC is like this:

$P0 = new ['Foo']
$P1 = find_method $P0, "bar"
callmethodcc $P0, $P1

The find_method opcode is a thin wrapper around the VTABLE_find_method interface function. If I translate this to an extremely condensed and wildly inaccurate pseudo-C listing, we get:

PMC * p0 = Parrot_pmc_new(interp, type_Foo);
PMC * p1 = VTABLE_find_method(interp, p0, "bar");
setup_method_call(interp, p0);
VTABLE_invoke(interp, p1);

This is obviously an extremely inaccurate listing, but should do well to illustrate my point. The method is actually a separate PMC type. It can be either a Sub (a .sub written in PIR) or an NCI (a wrapper type around a C function call). To make the call we set up the argument list (the invocant, $P0, is treated sort of like an argument but is kept distinct) and then invoke the method.

Before they are invoked, methods are stored inside either a Class or PMCProxy PMC associated with that type. When we call VTABLE_find_method(interp, p0, "bar"), we go through this machination:

PMC * class = pmc->vtable->class;
PMC * methods = class->data->methods;
PMC * method = VTABLE_get_pmc_keyed_str(interp, methods, "bar");

What we think of as an "object" and a "class" is actually a small collection of interoperating PMCs. The PMC itself contains a long list of VTABLEs and a small amount of data stored in a C structure, which cannot be directly accessed from PIR code. The PMCProxy PMC (like Class, which I will describe later, but designed to work with PMC types written in C) contains a hash of methods and a variety of other data. Methods themselves are their own PMCs, complete with their own type data. To really blow your mind consider that, as a PMC, you can call a method on a method, or even a method on a method on a method.

In short, a PMC is sort of like the building block that is used to create objects and a type system, though the PMCs themselves are not what we normally think of as "objects". The only way to interact with a PMC is through VTABLEs, not attributes or methods. Luckily, VTABLEs exist that allow us to query the object for related attributes and methods, though the PMC itself may not necessarily respond to these requests.

Using PMCs, Parrot does provide a proper Object system through the use of two special PMC types: Object and Class. Class, as can be guessed, is a "metaobject" that defines type information for objects of a single type. The Class uses a series of PMCs internally to manage things like method PMCs and attributes. The Object PMC is the basic building block of a class instance object. It provides a series of default vtables that allow it to interact with Class the way we expect (to find methods that are stored in the class reliably, for instance) and to provide a set of attributes that are available for access from PIR. PMCs are the almost formless building blocks, Object is a very specific PMC type that provides behaviors that we expect from an OO type system.

Now that we've covered basic definitions, what are the big operational differences between the two systems? Here's a short list:

  1. Object types are defined by Class PMCs. PMCs are defined by PMCProxy PMCs
  2. Class PMCs are created whenever we do a "newclass" or "subclass" operation from PIR. PMCProxy PMCs are created lazily, only when we actually need to introspect a built-in PMC type.
  3. Objects must be created from a Class, which means the Class PMC must exist before any Objects of that type can be created. PMCs can be created by themselves and generally don't require instantiation from another PMC.
  4. Objects have very regimented behavior: You can (and should) expect certain things when you access a named attribute or named Method. In a PMC these behaviors may be overridden to do different and unexpected things. Specifically, it can be very difficult to get access to named attributes on a PMC unless they are explicitly made visible from PIR (which can be a lot of work, and not a lot of PMC types do it completely)
  5. Inheritance between PMCs happens at the C level, so C-level attribute structures are merged together and made visible from C code. Inheritance between objects happens at the PIR level, method and attribute lists are combined and made visible as expected when accessed from PIR code. Inheritance from a PMC to an object is almost always broken, if you expect the attributes and methods from the PMC to magically become visible as attributes and methods on the Object. I've never seen inheritance from an Object to a PMC subclass, but I suspect it is broken even worse.
  6. The VTABLEs in the Object PMC all provide an option to use a PIR-based override routine to implement the behavior. To do this, every VTABLE function in the Object PMC searches the associated Class for a similarly named VTABLE Sub PMC and, if one is found, calls that. PMC types almost never search for an override in the Proxy, and if you define one it will never be called (unless you specifically implement the logic to search for and execute it). On a related note the VTABLEs of an Object, because they are stored as PMCs in a Hash in the Class, can be modified at runtime. The VTABLEs of a PMC cannot be (well, I guess you could change the pointer to call a different function if your C-foo is strong, but I would prepare for fire and brimstone. Also, I won't fix any "bugs" that arise from this misguided behavior). I estimate at least 10% of reported bugs or feature requests in Parrot come from the "this sucks worse than I would expect" behavior of subclassing Objects from PMCs. If you can get away with it, it is almost always better to delegate to a built-in type instead of inheriting from it directly. But, I can talk more about problems and workaround solutions like this in another post.
So there you have a guide to the differences between Objects and PMCs. PMCs are the low-level building blocks of an object system, and Objects are combinations of several PMCs and a large number of default VTABLEs to implement an expected set of OO behaviors. In a sense, Objects are PMCs, but in another sense they really aren't.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.