Tuesday, December 1, 2009

GC Gets Kick Start

Parrot's garbage collector is starting to really get a lions share of developer attention recently, especially after some very interesting benchmark statistics from chromatic went public. For the benchmark of building the new NQP-RX project, a whopping 80% of execution time is spend in the GC mark phase. Actually, the real statistic is that 80% of the execution time was spend only in the Capture PMC's mark routine (and the functions that it calls). That's quite a huge amount of time, even for a naive GC like ours to spend.

Let's take a quick recap about GCs, and what causes them to be so expensive: GC is used to find and automatically reclaim unused data items so that the storage space can be reused. A good GC system means that the system programmer does not need to manually free memory when it is done being used: The GC will detect and automatically deallocate the unused memory. A good GC system, in essence, will completely eliminate memory leaks, and help to make the codebase much more clean and succinct. To do this, GC needs to first find all unused ("dead") objects and then free them, two phases known as mark and sweep.

In a naive implementation of a mark and sweep algorithm, there are two aptly named phases: mark and sweep. The mark phase is charged with finding dead objects to reclaim. We typically do this in a reverse order, by first finding all objects that are in current use ("alive"), and then declaring all other objects to be reclaimable (or already free). Starting from a root set, such as the register sets and interpreter globals in Parrot, we can construct a graph of all objects by following pointers and marking each reached object as alive. It stands to reason that if there is no pointer to a particular object, it cannot be accessed, and anything that we do not access during the mark is presumed unreachable, which is the same as dead.

In the sweep phase, we must iterate over the pool of all objects, finding objects marked dead and freeing them. Freeing an object typically involves calling a custom destructor if one is provided, and making the memory available to the allocator so the memory can be reused the next time an allocation is made.

In every mark and sweep GC collection run for a naive collector, we must first trace the entire memory graph and then iterate over the entire object pool. This is very expensive, and the expense grows large as the memory use of the program grows large. What we need for Parrot is something a little bit less naive.

What we probably can not do is make huge conceptual improvements to the idea of mark and sweep: We will always need to detect alive objects, and we will always need to traverse and free the dead ones. The general idea is sound and that's not something we want to change. What we can do, however, is to impose heuristics on the system to decrease the number of items to mark and decrease the number of objects to sweep. This is where the bulk of GC performance improvements can be made, by being much smarter about how the GC is used.

Allison sent a nice email to the list the other day essentially saying that GC has become an officially-recognized pain point and that we as a team are going to be looking at improvements after 2.0 (if we don't manage to start before that). Very interesting discussion has already started on ways to improve it.

As I mentioned above, the bulk of GC performance improvements are made by applying heuristics to decrease the number of objects to mark and sweep. A secondary set of improvements can then be made, often at the code level, to make the GC's operations run faster. I'll call the first set of improvements "algorithmic", and the second set "implementation". So what we the Parrot developers need to do first is pick the right algorithms to use and then implement and optimize them.

Here is a general list of things we can do to improve GC performance in Parrot:
  1. Allocate fewer GCable objects. This is typically the result of user-level code optimization. So, Parrot needs optimizers that are GC-sympathetic. Parrot also allocates a number of STRINGs and PMCs for internal purposes, so we need to minimize that.
  2. Mark fewer objects. This comes from a good generational GC system where we segregate items based on how stable they are.
  3. Sweep fewer objects. I think chromatic's linked-list idea will help us significantly in this regard.

What I think we are leaning towards in Parrot is a system called a generational GC. A generational system uses the heuristic that items which have lived for a long time without being GC collected will tend to stay alive longer, and items which are recently allocated tend to die quickly. It's an acknowledgement that a lot of garbage is created for very short-term uses, and relatively few things stand the test of time. Here's a quick example using explicitly non-idiomatic Perl 5:

my @array = fill_array(100); # 100 items in the array
foreach my $item (@array) {
my $new_item = mangle($item);
say $new_item;
}

In this loop we create a lot of garbage. Every new instance of $new_item is a new collectible item which can be declared dead at the bottom of the loop and allocated anew at the top. Also, all the local variables used inside the mangle function follow the same life cycle. The only items that survive through the entire snippet are @array and it's contents.

Every time we mark, we have to mark @array and all it's contents, even though they are long-lived and will survive the entire loop. Every time we sweep we need to separate dead items $new_item and all the local variables created inside mangle() from @array and its persistently live set.

Generational GC works by saying that @array is long-lived and putting it into an older generation. Older generations contain objects which are, by definition, older and therefore less likely to die. If we aren't worried about the item dieing, then we don't need to explicitly mark it. At least, we don't need to mark it as often. We also don't need to sweep it, if we can find a good fast way to avoid that.

Sunday, November 22, 2009

Running out of Free Time

I'm slowly running out of free time recently. Ongoing computer problems have diverted an unfortunate amount of my effort recently, and I haven't been able to develop as much as I would like. It's very hard to get into the swing of a hardcore coding session when your computer freezes or spontaneously reboots every 30 minutes on average. I saw "on average" because sometimes it's much more frequent than that. I had to reboot my computer twice while writing this first paragraph. Thank goodness for Blogger autosaving my drafts.

On top of that, we're having a kid. An entire new kid. The due date was on Friday, so now we're officially in over-time and very much looking forward to having an "outside" baby to play with. We both fully expect to be spending some serious time in the hospital this weekend or next week, with sooner being better than later. I've never had any of my own children before, so I'm not entirely certain how it's going to change things. But I am certain that they will change, and probably not in a way that gives me more time to hack on Parrot.

Despite my lack of time, this morning I started work on a brand new secret project. I'm not going to give any details about it quite yet (I want to get some more aspects of the design worked out before I go showing off my work), but suffice it to say that if things work out the way I hope they will this project could provide a significant benefit to Parrot and its ecosystem. I'm intending this project to be a Christmas present to the Parrot project, so I hope I can get it done by then. More details to come.

I'm also trying to do some work to get :call_sig working properly. Matrixy is definitely going to need improvements there, and I'm sure other projects will want it too. If I can keep the changes small I will just commit it directly. If it starts getting too large, I'll make a branch instead.

Matrixy and Parrot-Linear-Algebra are going to take a back seat for now while I focus on other issues and my new secret project. I've got a few cleanups I want to make and some tests to add of course, but no huge development for a little while.

Thursday, November 19, 2009

Parrot Project Proliferation Part 3

This is part 3 of my series on the cool new Parrot projects that are popping up around the interwebs. Today I'm going to introduce Markdown, NQP-RX, and nqpTAP.

Markdown

Markdown is a text markup syntax that's designed to be easy to read and edit. In some ways, it's like wikitext, except Markdown has been driven by a consistent design philosophy while wikitext has grown in a platform-dependent ad hoc way.

Parrot has it's own markdown engine now, courtesy of fperrad. It converts markdown input into properly-formatted valid HTML output. And the best part is that it runs on pure Parrot. So now, with all your cool websites you're making with mod_parrot, you can use this markdown engine to format text.

NQP-RX


It's not exactly a small project, but NQP-RX really deserves some attention. It's a rewrite of NQP and PGE from the Parrot repo, but properly integrating the grammars into the NQP language and enabling a lot of cool new features that "classic" NQP doesn't have. On top of that, NQP-RX properly bootstraps: It knows enough syntax in order to parse itself (after it's already been built from PIR source, of course). That's no small feat for a program written in the Parrot equivalent of assembly language.

The old NQP is still hanging around in the Parrot repo like it always has, and projects that were relying on NQP will still be able to work with it. However, the new NQP-RX is developed on github and snapshots of it are kept in the extensions directory in the Parrot repo too.

nqpTAP


Every project in the Parrot ecosystem that I have seen makes extensive use of unit testing. Some projects are test-driven, though the majority seem to use post-facto tests for verification and to prevent recursion. Whatever the purpose, tests are everywhere and the TAP standard is used by almost all of them.

the nqpTAP project, started by dukeleto and based on his work in Plumage, is a pure-Parrot TAP harness that executes tests and summarizes results without depending on anything besides Parrot itself. Keeping dependencies low is always a good thing, and nqpTAP helps to reduce the barrier to new projects looking to create a proper test suite. Even better, nqpTAP targets the new NQP-RX, so it will be stable and working long into the future.

These three projects are very interesting, and I think it's worthwhile to give them at least a first look.

Wednesday, November 18, 2009

Trac Ticket Hunt

Yesterday, after a herculean effort, the Parrot devs closed out the remaining old tickets from the RT system. Many of the tickets were vague and uncloseable, or unreproducible. Many of them were translated into Trac tickets for futher monitoring.

Of course, we still have a lot of open tickets in Trac. Over 600 of them, as Coke pointed out in an email this morning. That's quite a huge number of open issues, and really too many for our current development team to deal with in a reasonable amount of time. We need help dealing with this huge backlog, and this is a great opportunity for new interested people to get involved in Parrot.

Preparing for 2.0

The 1.8 release went out the door on Tuesday, and I think it went much more smoothly than 1.7 did last month. Not completely without hiccups, but better. Now we're in the home stretch for the big 2.0 release in January where the mantra is "production ready".

What does it mean to be production ready? First and foremost I think of stability and reliability. Nobody is going to invest time and effort in software that isn't stable. Next, I think about performance. Computer hardware isn't cheap, and we can't be shipping a piece of software that hogs processor cycles and costs companies more money to support.

With these goals in mind (and I would love to hear what other people think "Production Ready" means), I think there are two big things we need to focus on: Testing and Profiling.

Testing

Test reports are good, and we're starting to get a very large volume of test reports flowing in, including test reports on new or exotic system. Bravo to anybody who has set up an automatic build bot in the past few months. It is sincerely appreciated.

Test reports are a good and necessary first step, but are by no means the end. Tests are good when they all pass, but that's boring (and unfortunately, it's not usually the case). What's really interesting and important is finding the failing tests and writing up Trac tickets for them so they can get fixed. So here are some things you can do to help:
  1. Monitor the stream of incoming Smolder reports and look for failures
  2. If the failure is happening on a platform that you can test on, try to verify it
  3. See if you can isolate the code from the test that is failing. Bonus points: See if you can write your own small test that demonstrates the bug. The smaller the better.
  4. Open a Trac ticket including information about your platform, the revision where the failures first start appearing (as close as you can tell), and any test cases you've produced that exercise it.
  5. If you're able, submit patches to fix the issue, patches to improve testing of the issue, or patches for documentation to explain what the desired behavior is
All of these things would be very awesome, and are all great ways to get involved in Parrot without having to dive into the source code head first.

Profiling and Benchmarking

Let's not fool ourselves: Parrot is not speedy fast right now as far as VMs are concerned. We don't have JIT, we don't have a good GC, we don't even have PIC. We don't do enough caching, our startup time is still terrible, etc. There are lots of big optimizing features that we need to implement in the coming months and years if we truly want to be a viable and even formidable alternative to other VMs.

However, this all doesn't mean that the only performance benefits that we need come from these huge projects with weird acronyms. There are plenty of small implementation and algorithmic improvements that we can make throughout to start slimming this bird down, some of which will have serious effects on our performance. Parrot is so large though that we can't necessarily find all these little optimization opportunities easily. We need to narrow the search. This is where profiling and benchmarking come in. This is where we need you.

Parrot has a fancy new profiler tool that can be used to profile programs written in PIR or any of the HLLs that run on top of Parrot. It still needs lots of documentation, but it should be mostly easy to use for people willing to poke around in it. If you can find a good example program that demonstrates some real-word usage patterns, we would love for you to profile them and send us the reports. Knowing where the bottlenecks and slowdowns are will help us to target, refactor, and improve them, and that's a big help.

To start up the profiler, run Parrot with this incantation:

> parrot -Rprofiling

For more information about what to do with it, hop onto the IRC channel and ask around. I haven't used this much myself, but it would be cool to get started.

To prove that we are indeed making things faster, we need benchmarks. Good benchmarks are programs that perform lots of very repetitive work and target a particular Parrot subsystem. We want programs that really exercise Parrot, and can do it in a consistent way. Then, we can use timings on these benchmarks to show whether Parrot's performance is improving or getting worse over time. This is very important.

Ticket Triage

As Coke mentioned in his email, we can't sit back and congratulate ourselves now that RT is empty. We need to focus our attentions now on the growing backlog of tickets in Trac. Some of the issues documented there are very serious and will definitely prevent Parrot from being stable and "production ready" by 2.0.

As Coke outlined, we need people to go through old tickets and answer a few questions:
  1. Can we reproduce this issue now with Parrot 1.8.0? Many tickets were filed weeks or even months ago, and may have disappeared in the course of normal development
  2. Look at RFC tickets (requests for comments) and weigh in. Do the changes described make sense? Would they be beneficial? Many of these tickets are simply waiting for some kind of discussion before they get closed.
  3. If the ticket involves an HLL, see if you can reproduce the issue using pure-PIR code instead of high-level code. Parrot operates on PIR natively, so Parrot developers are most easily going to be able to fix problems that can be demonstrated in PIR
  4. If you see a ticket with a segfault, sigbus, sigfpu or other system error condition, see if you can provide a backtrace.
  5. If a ticket contains a patch, see if the patch still applies cleanly to trunk. If so, see if the patch fixes the problem.
  6. Add comments or set other information to make sure the ticket stays up-to-date and informative. Even if the information you add is small ("Still a problem!" or "fixed!"), that's still something. If nothing else, make sure the ticket is properly categorized by component, milestone, etc. You'll probably need to create a Trac account (free and easy!) in order to make modifications
  7. Look for duplicates. If two tickets describe the same problem, one of them can go.
  8. If the ticket can be legitimately closed (fixed, no longer a problem, a duplicate, etc) make sure that happens. Hop on IRC or the mailing list and harrass people until it gets closed. It may be a little bit annoying, but it will get results.
Conclusion

I haven't done a Parrot4Newbies post in a while, and I know some people have been looking for ways to get involved. With 2.0 on the horizon testing, profiling, and ticket triaging are all great and incredibly pertinent ways to get involved. And more importantly then just being involved, these are all great ways to help Parrot grow and get ready for the big milestone. So if you are interested in Parrot and have a few spare moments, take a look at some tickets and see what you can accomplish. I can guarantee that anything you get done will be much appreciated.

Tuesday, November 17, 2009

Parrot Users Mailing List

Received a message from Coke this morning: We have a new mailing list setup specifically for users of Parrot and applications running on top of Parrot. If you would like a place to chat about Parrot without getting sucked into the minutia of the developers mailing list, the parrot-users list might be the thing to look at. Subscription to the list is free and easy.

It has also been suggested that all the parrot developers join that list to help answer questions. Hopefully it will be a great place for new Parrot users to go, get help, meet other developers, and get started using our software!

Saturday, November 14, 2009

Matrixy Passing (Almost) All Tests

I went on a bit of a marathon today, and I'm pleased to announce that the pla_integration branch of Matrixy is passing almost all tests. All the tests that it is failing are relying on Complex number handling, which Parrot-Linear-Algebra doesn't support yet.

This is pretty big vindication that what I have been trying to do with Parrot-Linear-Algebra is going well: Despite the project being relatively young, the various matrix types provided by that project are turning out to be very stable and robust. The NumMatrix2D type is the most used and the most feature-full right now, but relative newcomer CharMatrix2D is proving to be very powerful and useful as well.

PMCMatrix2D is mostly unused for now. However, I do intend to use that in the near future to implement Cell Arrays in Matrixy. This is a feature that we had no good way of implementing in the old Matrixy, but in the new system it looks like a very natural extension of the other things I've been doing.

With Cell Arrays, it should be possible, if not easy, to properly implement variadic input and output function arguments. This is a huge issue that's preventing Matrixy's library of builtin functions from being accurate to the behavior provided by Octave or MATLAB. It's also preventing us from writing more functions directly in M, instead of in PIR. Even if we have to descend into inline PIR for some things, it would be great if we could write more code in M.

Starting tomorrow I'm going to try and add some preliminary Complex number support to Parrot-Linear-Algebra, and try to get the remaining Matrixy tests passing with it. With that we'll be back to where we were before the projects were split, and new development can begin in earnest.

Friday, November 13, 2009

The Path to Matrixy

I've spent a little bit of time in the last week working on the pla_integration branch for Matrixy. The goal of that branch is to update Matrixy to use Parrot-Linear-Algebra as an external dependency, and to use its matrix types natively instead of the ad hoc home-brewed aggregate types I had been using previously. In short, I'm doing things the way they should have been done in the first place.

As of Sunday evening, the branch could build, run, and execute the test suite. There are several tests that are failing still (many of which are to be expected), but a good number that are successfully passing too, which is good.

String handling is one area where I was expecting some serious regressions, and was not surprised in that regard. M uses a very idiosyncratic handling as I've mentioned before, and all tests that are relying on complicated string behavior are failing miserably. In response to this, last night I created a new matrix type in the Parrot-Linear-Algebra project: a 2D character matrix type that will be used to implement Matrixy's string handling. My task now is to improve this new matrix type (including adding a test suite for it) and integrating it into Matrixy. I started some of that last night but haven't pushed any of my work to the public repo yet.

Another thing I added last night was the "matrix" role. Any of the matrix types in Parrot-Linear-Algebra will now respond positively to a "does 'matrix'" query. An object that fulfills the matrix role will have the following properties:
  1. Contains a number of elements arranged in a rectangular NxM grid
  2. Elements can be accessed linearly using a single integer index (behavior varies by type, but it is always possible)
  3. Elements can be accessed by X,Y 2-ary coordinates to retrieve a value
  4. Matrices can grow automatically to accommodate newly inserted values
  5. Matrices should stringify in such a way where each row of the matrix will be on it's own line (rows separated by newlines). The formatting of each individual row is dependent on the type of the matrix.
With a nice standard interface like this, these matrix types should be consistently usable from HLLs, and I know Matrixy is going to be making aggressive use of these features to implement even it's most basic behaviors. I may try to add new requirements to this list, specifically there are some methods that I would like every matrix type to have (is_scalar(), is_empty(), etc.), and maybe a few other behaviors that should become standard between all our types. I'm starting to think that a templating system will become a necessity to prevent us from needing to rewrite similar algorithms for what could become dozens of matrix and vector types. The improved grammar support in NQP-RX may be the catalyst that makes these changes possible. It's another task for another day.

Speaking of tasks, on the near-term TODO list I plan to add specialized vector types to Parrot-Linear-Algebra, add tests for the new matrix and vector types, beef up the CharMatrix2D type to handle the string operations that Matrixy needs, and continue fixing the pla_integration branch for Matrixy to pass more of it's tests. On top of all that, there's a testing hackathon this weekend that I want to participate in, some work for Wittie that I need to finish, and possibly having a baby. Could turn out to be the very busy next few days for me!