Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Sunday, August 23, 2009

PMCs: Now without PMC_EXT or UnionVal

Last night I fixed the last of the test errors that I was seeing and merged the pmc_sans_unionval branch into trunk. This branch made a number of sweeping changes, but the most important two were removing the UnionVal structure from PObjs, and merging the PMC_EXT structure into the PMC structure. The bulk of this work was provided via patch from newcomer jessevdam, I mostly did some cleanups and post-facto changes for consistency and cleanliness. I think it was much more important to get this change merged then to have it be perfectly pretty.

Almost immediately after I merged the branch reports of test failures on various platforms started rolling in. kid51 was seeing consistent failures on PPC. mikehh and GeJ were seeing consistent failures on x86 Linux too. I ran through the tests another two times before I spotted an intermittent failure on x86_64 too. I hadn't seen them when I committed, but they did appear about 1/4 of the time when I ran coretest.

The failures that kid51 was seeing were different from those that GeJ reported. It turns out that I was seeing both these failures on my machine too, about half the time when any failure manifested. A little bit of debugging this morning turned up the culprits: The first was caused by a strange situation where a PMC metadata object wasn't being marked properly by the GC and getting prematurely collected. This error was easy to spot because of a helpful debugging message that had been inserted into the GC code long ago. With this error fixed and the GC structure sufficiently changed, I removed that debugging message.

The second failure was more difficult to nail down. When I took a backtrace, I saw that the failure was happening deep inside the PCC system function call argument processing functions. This system is currently a hideous web of evil code, but luckily it is what Allison is working to refactor right now. I would have thrown up my hands in despair at seeing that, and given up on any hope of a fix but for one thing: I had seen a backtrace like this before when I was doing some early work trying to replace Parrot_PCCINVOKE with the more modern Parrot_pcc_invoke_method family of functions. The error, in a nutshell, is a PCC bug: PMC** arguments passed to a function call in order to receive a return value need to be initialized to point to a valid PMC value before being passed. This despite the fact that they should only be treated as a storage location and overwritten without ever looking at the value originally pointed to. Despite the "should" in that last sentence, somewhere along the PCC pipeline the values are being passed off to the GC which tries dutifully to mark it, and things go down hill.

This fix, unfortunately, means turning this:

PMC *retval;
Parrot_mmd_multi_dispatch_from_c_args(interp, "subtract", "PPP->P", pmc, value, dest, &retval);

into this:

PMC *retval = PMCNULL;
Parrot_mmd_multi_dispatch_from_c_args(interp, "subtract", "PPP->P", pmc, value, dest, &retval);

One line changed and now the test appears to be working perfectly on every system where I can get a report. I sincerely hope Allison can get these issues resolved in the PCC refactors work, because I hate tracking these kinds of issues down. If I hadn't recognized that backtrace and the problem that caused it, I would still not have a fix for this bug.

With this, three of the branches I mentioned before the release have landed: the auto_attrs branch, the Parrot_Sub refactors branch, and now the pmc_sans_unionval branch. Bacek is getting damn close to finishing the Context PMC work too (now in the context_pmc3 branch) too. I don't know where chromatic and cotto are in the pluggable_runcore branch, and I don't know where Allison is in the pcc_arg_unify branch either. But, I still have very high hopes that they will all land soon and Parrot will be a better place.

I'm shooting through some code today making miscellaneous cleanups. There have also been some problems recently involving the fixed-size allocator (especially on Windows) so I am going to have to dig into that issue soon too. Once I get that working perfectly again, I want to run some benchmarks. I bet we're going to see some significant speed improvements over 1.5.0 when all the numbers come in.

1 comment:

  1. Minor correction, the error I reported was on FreeBSD amd64.
    Anyway, I just updloaded an all-PASS smoke report. Thanks for taking the time to dig on the issue and fix it.

    -- G.

    ReplyDelete

Note: Only a member of this blog may post a comment.