Whiteknight's World: 2009

Tuesday, December 29, 2009

Great Summaries

It's been quite a busy week for me, and my activity level shows no signs of decreasing again until the new year. So, no substantial blogging from me for a little while.

In my absence, I strongly suggest everybody check out Lithos' blog. He's been posting excellent weekly summaries of Parrot and Perl6 activity. It's been a great resource for me when I've fallen out of the loop.

Saturday, December 19, 2009

Rethinking Matrixy Assignment

Variadic output arguments have been partially implemented in Matrixy in the "varargout" branch, but I'm starting to hit a wall with the current implementation. At the very least, it's not a clean or well-designed implementation of it. The parser is using all sorts of weird global variables and there are too many special cases to deal with.

A big part of the problem is the ambiguity between a matrix index and a function call. This statement can be either a call to function foo, or an index into matrix foo:


a = foo(b, c);

And, functions (but not matrices) can return multiple values:


[a, b(3), c(4, 5)] = foo(x, y);

So when we parse the bracketed list of values, we don't know until we've looked for an equals sign whether it's an assignment target or just this:


[a, b(3), c(4, 5)];

which is shorthand for this:


ans = [a, b(3), c(4, 5)];

But I've harped on all these problems before, and I won't dig into them in depth here. What I think we need to do in Matrixy to bring some sanity back to the parser is to create a matrix reference object that can be used to retrieve or modify values in a matrix object made up of a combination of constant and variable references. So this line:


x = [a, b(3), c(4, 5)];

becomes this PIR code sequence:


$P0 = '!index_ref'(a)
$P1 = '!index_ref'(b, 3)
$P2 = '!index_ref'(c, 4, 5)
$P3 = '!build_literal_matrix'($P0, $P1, $P2)
$P4 = '!index_ref'(x)
$P3.'extract_as_rvalue'($P4)

And this:


[a, b(3), c(4, 5)] = foo()

Becomes this:


$P0 = '!index_ref'(a)
$P1 = '!index_ref'(b, 3)
$P2 = '!index_ref'(c, 4, 5)
$P3 = '!matrix_accessor'($P0, $P1, $P2)
$P4 = '!index_ref'(foo)
$P3.'assign_as_lvalue'($P4)

Of course these are just first drafts of the call sequences and the method names, but it does illustrate the general idea. This is going to add a pretty significant number of function calls and created PMCs to the normal control flow too, but I can worry about optimizing things once they all work.

When we attempt to pull a value from a reference object, we'll determine whether it's a matrix or a function call and handle it accordingly. If we're pushing a value to it as in an assignment, we always just assume it's a matrix. M, to the best of my knowledge, doesn't support L-value function calls.
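To make the reference-object idea more concrete, here is a small Python sketch. Everything in it is hypothetical: `Env`, `IndexRef`, `read()`, `write()` and `assign_as_lvalue()` are illustrative stand-ins for the `!index_ref` and `assign_as_lvalue` drafts above, not Matrixy code, and a "matrix" is just a dict keyed by 1-based index tuples. Reading a reference decides at runtime between indexing and calling (checking variables first, on the assumption that they shadow functions); writing always treats the target as a matrix variable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical runtime state. A "matrix" is a dict from 1-based index
# tuples to values; a bare scalar like "a" uses the empty tuple as its key.
Matrix = Dict[Tuple[int, ...], float]


class Env:
    def __init__(self) -> None:
        self.variables: Dict[str, Matrix] = {}
        self.functions: Dict[str, Callable[..., List[float]]] = {}


class IndexRef:
    """Hypothetical stand-in for !index_ref: records name(indices...) and
    defers the matrix-vs-function decision until the reference is used."""

    def __init__(self, env: Env, name: str, *indices: int) -> None:
        self.env, self.name, self.indices = env, name, indices

    def read(self) -> float:
        # R-value: decide at runtime whether this is an index or a call.
        if self.name in self.env.variables:   # variables shadow functions
            return self.env.variables[self.name][self.indices]
        if self.name in self.env.functions:
            return self.env.functions[self.name](*self.indices)[0]
        raise NameError(f"undefined: {self.name}")

    def write(self, value: float) -> None:
        # L-value: always treat the target as a matrix variable.
        self.env.variables.setdefault(self.name, {})[self.indices] = value


def assign_as_lvalue(refs: List[IndexRef], outputs: List[float]) -> None:
    """Sketch of [a, b(3), c(4, 5)] = foo(): one output per L-value ref."""
    for ref, value in zip(refs, outputs):
        ref.write(value)


# Usage: [a, b(3), c(4, 5)] = foo(), where foo returns three values.
env = Env()
env.functions["foo"] = lambda *args: [1.0, 2.0, 3.0]
targets = [IndexRef(env, "a"), IndexRef(env, "b", 3), IndexRef(env, "c", 4, 5)]
assign_as_lvalue(targets, env.functions["foo"]())
print(IndexRef(env, "b", 3).read())  # 2.0
```

Pushing the decision into the reference object means the parser can build the same structure on both sides of the equals sign and leave the matrix-or-function question to runtime.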

Soon I'm going to merge the varargout branch where I have been working on variadic output arguments. Even though I know this isn't the approach I will be taking in the long term, this branch does include a number of important regression tests and other refactors and enhancements that I don't want to lose. I might not end up getting to this issue until after Parrot's 2.0 release, however.

Wednesday, December 16, 2009

Parrot 1.9.0

Parrot 1.9.0 "Blue-Fronted Amazon" was released yesterday, courtesy of Gerd Pokorra. As always, Parrot showed up on time and under budget, a rarity in the software world. This is the last development release before the big 2.0 milestone on 19 January 2010. chromatic will be the release manager for 2.0, followed by darbelo (2.1, 16 Feb) and then cotto (2.2, 16 March). Sometime around January I will put out a call for release managers for April and May too.

1.9.0 was a relatively conservative release in terms of entries in the changelog. The release announcement for the past month doesn't list the same number of high-profile projects that previous months had. Part of the reason for this is the ongoing focus on testing and optimization. These things are great for the project but tend not to pad the changelog much. I also think that many of our core developers are starting to focus energy on other ecosystem-related projects: compilers, libraries, and utilities. After all, Parrot is an interesting project by itself, but the various projects it enables are even more so, and that is a major draw for our developers. Success begets success, so the rapid proliferation of these side projects is just as important to the overall success of Parrot as a whole.

I expect 2.0 to be similarly conservative with much of the development team focusing on fixing bugs, improving documentation, and expanding test coverage. There are some big changes brewing that could land before 2.0, however, so it might turn out to be a big release indeed.

Monday, December 14, 2009

Parrot Developer Meeting Yesterday

Yesterday was the big online Parrot Developer Meeting, which I mentioned briefly last week. The idea was to have a meeting similar to the large Parrot Developer Summit from 2008 to reevaluate our long-term development roadmap and make sure we were on the right path for 2.0 and beyond. I had been one of the more vocal people saying that our roadmap up to that point was incomplete, outdated, and not reflective of our recent development priorities, so I'm particularly happy that this meeting was held.

It was a very productive meeting too, although due to time constraints and internet connectivity issues I wasn't able to participate as actively as I wanted. The meeting started with a 30-minute project retrospective led by chromatic, a statement of our short-term project goals, some discussion of changes to the support policy, and then an item-by-item review of some of our existing and new roadmap items.

Short-Term Goals

Rakudo* ("Rakudo Star") is due shortly after Parrot's 2.3 release, and is going to be a major milestone for both projects. James Keenan suggested, to much agreement, that Parrot should be focusing almost single-mindedly on the needs of Rakudo* between now and then to help make that release as successful as possible. The desired return on investment is that the success of Rakudo* will help to spur on increased interest from new and existing HLL developers, and demonstrate that Parrot is a viable platform to host these compiler projects.

Chief among the needs of Rakudo* is improved Parrot performance. Specifically, optimizations for PCC and an overhaul of the GC were both mentioned as paths towards this goal.

Support Policy

Several days ago I sent an email to the list proposing that we rewrite our support policy to alleviate some of the problems people have been having with our long deprecation cycles. While these long deprecation cycles are good for stability and developer peace of mind, they have been a strong negative influence on development speed. Top this off with the fact that many of our systems are still immature and in need of major overhaul, and we run into some serious problems.

My suggestion, while a catalyst for discussion, was not accepted. What we have decided to do in its stead is to shorten our deprecation cycle by adding more supported releases. Parrot will now have a supported release every 3 months instead of every 6. So next year we will support releases 2.0, 2.3, 2.6 and 2.9. Hopefully this improved turnaround time will alleviate some of our issues and increase the speed of development.

Roadmap

We did a complete item-by-item review of our roadmap items, and assigned specific releases when we would like to have each feature in place. Here is a quick listing of some of the major items on this list:

Improved GC (2.3)
Fixed L-Value semantics (2.6)
Overhaul NCI (2.6)
Asynchronous IO (2.9)
Concurrency (3.0)
Strings overhaul, including immutable strings (3.0)
Lorito (3.0)
PIRC (3.0)
JIT (3.3)

I personally suspect that some of these numbers will be rearranged in practice (AIO and Concurrency are going to go closely together, I predict Strings will become a pain point sooner than 3.0, PIRC is going to be affected by Lorito in unforeseen ways, etc.), but overall it's a decent and well thought-out list, and I won't nitpick it. We would be lucky if we could get even half of these things done before 3.3, but I hold out hope that we could do better than that. I am especially hopeful considering the way our development team is steadily growing.

So that's my quick recap of the developers meeting. I'm planning to read back over the transcripts, and I'll write more posts about any topics that I find to be of particular interest.

Thursday, December 10, 2009

Parrot Developer Meeting

I should have blogged about this on Tuesday when it was decided. On Sunday, 13 December 2009, we are going to have a virtual Parrot design meeting.

3:30 EST

13 December 2009

irc://irc.parrot.org/#parrot

We are going to talk about the long-term development roadmap and hopefully explicitly lay out our priorities for the foreseeable future. I personally would like to see us map out from now through 3.0, but I would be happy if we could just map through 2.6.

Matrixy Progress

I've been doing a lot of work on Matrixy lately. I find that recently I've been able to do a lot on that project in small bits, which is great when I want to touch it during a lunch break or between baby maintenance. I've certainly been able to work more on Matrixy this week than I have been able to write blog posts, for instance. In recent days I've:

Done major cleanup and expansion of the test suite
Added a bunch of builtins, including some new parrot-primitive functions that will allow me to write more functions in M and to possibly migrate some PIR-based builtins to M.
Refactored and cleaned up dispatch
Added Cell Array support
Added proper nargin/varargin support
Created a new branch to begin adding proper nargout/varargout support

It's the last item on the list that's been giving me a bunch of trouble recently, and the inherent difficulty of the task is probably the reason why I haven't gotten it working before now. The work I have been doing so far is mostly hackery, trying to add nargout and varargout without having to rewrite the entire grammar and dispatching mechanism. Of course, in the long run I am going to have to rewrite these things, but I'm just not ready to do that yet. I would rather have a proof of concept and some passing tests than nothing.

The problem with dispatch, or anything in M, is that everything is ambiguous until runtime. The same syntax, "X(1)", could be used to refer to the first element of the matrix X, the first element of the cell array X, or a call to the function X with an argument "1". This is further complicated by the fact that variables overshadow functions of the same name, but do not overwrite them completely. If we have a variable and a function named X, we can still access the function version using the feval builtin. In M we can also call functions without using parentheses at all, so postfix parentheses aren't the only source of ambiguity: almost every single identifier lookup requires a runtime check.
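A rough Python sketch of that runtime resolution order, under stated assumptions: `Scope`, `lookup()`, and the list-as-matrix representation are hypothetical, only single 1-based indices are handled, and `feval` is modeled simply as a lookup that skips the variable table.

```python
from typing import Any, Callable, Dict, List


class Scope:
    """Hypothetical M scope: variables shadow, but don't replace, functions."""

    def __init__(self) -> None:
        self.variables: Dict[str, List[Any]] = {}
        self.functions: Dict[str, Callable[..., Any]] = {}

    def lookup(self, name: str, *args: int) -> Any:
        # "X(1)" and a bare "X" look the same to the parser; resolve now.
        if name in self.variables:
            value = self.variables[name]
            # Variable wins: X(1) is a 1-based index, bare X is the value.
            return value[args[0] - 1] if args else value
        if name in self.functions:
            # No variable: X(1) calls X with 1, bare X calls X with no args.
            return self.functions[name](*args)
        raise NameError(f"'{name}' undefined")

    def feval(self, name: str, *args: Any) -> Any:
        # feval skips the variable table and always reaches the function.
        return self.functions[name](*args)


scope = Scope()
scope.functions["X"] = lambda *args: sum(args) + 100
print(scope.lookup("X", 1))  # 101: no variable yet, so this is a call
scope.variables["X"] = [7, 8, 9]
print(scope.lookup("X", 1))  # 7: the variable now shadows the function
print(scope.feval("X", 1))   # 101: feval still reaches the function
```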

I've talked about all these syntax issues before, and won't dwell on them now. There are also semantic issues that need attention. Let's look at the case of nargout for instance.


function [x, y, z] = getcoords()
   ...
endfunction

[x, y, z, w] = getcoords()

In this code snippet above, the "getcoords" function is called with 4 output arguments, but the definition of that function only provides for three. If "getcoords" doesn't explicitly check the number of outputs expected and throw an error, this assignment will proceed without a problem. x, y, and z will get the expected values in the caller context, and the w variable will simply be left undefined.

So what we have is really a fundamental disconnect between caller and callee. The callee can see how it was called by checking the nargin and nargout variables, and can choose to error if those numbers do not match what it wants. A function can return a different number of values than the caller expects, too. So if I just did a call to:


getcoords();
disp(ans);

nargout here would be 0, but the function could still return 3 values which would be stored in the global default variable "ans". Yesterday I started a refactor to make this possible, by trying to break assignments up into two parts: The callee returning an arbitrary array of values and the caller having to explicitly unpack those values. It's gotten me through a number of important test cases, although it is obviously not a great or pretty solution.
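The split between a callee that returns an arbitrary list and a caller that unpacks it can be sketched in Python like this. The names `call_and_unpack` and this `getcoords` are hypothetical, and storing only the first returned value in `ans` when nargout is 0 is an assumption about the semantics, not something the Matrixy code is known to do.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical convention: the callee receives nargout and may return a list
# of any length; the caller is responsible for unpacking it.
Callee = Callable[[int], List[float]]


def getcoords(nargout: int) -> List[float]:
    # Never checks nargout, so it always returns three values.
    return [1.0, 2.0, 3.0]


def call_and_unpack(scope: Dict[str, float], targets: Sequence[str],
                    fn: Callee) -> None:
    nargout = len(targets)
    results = fn(nargout)
    if nargout == 0:
        # "getcoords();" alone: assume the first result goes into "ans".
        if results:
            scope["ans"] = results[0]
        return
    # Fill as many targets as there are results; extras stay undefined.
    for name, value in zip(targets, results):
        scope[name] = value


scope: Dict[str, float] = {}
call_and_unpack(scope, ["x", "y", "z", "w"], getcoords)
print(sorted(scope))   # ['x', 'y', 'z']: w is never defined
call_and_unpack(scope, [], getcoords)
print(scope["ans"])    # 1.0
```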


[a, b, c];

is an R-value, and the generated matrix is stored in the default variable "ans". However,


[a, b, c] = foo()

is obviously an L-value, and I need to be doing some bookkeeping to keep track of the number of arguments so I can populate nargout in the call to foo (if foo is a function call, of course). So I create a global variable to store the L-values in the assignment, so that when I generate the actual assignment call I have access to that number. One problem I ran into yesterday, though, is that when a rule fails and we have to backtrack, we end up with these global variables in an inconsistent state. So the call:


foo();

doesn't have any L-values, and when I parse the function call I can't expect the global variable to exist. Likewise, when I parse:


[a, b, c];

I need to keep count of the L-values even though this isn't an assignment, because at that point the parser can't yet tell whether an equals sign follows. So yesterday I ran into this problem:


[a, b, c];
foo();     # Thinks nargout = 3

Fun, eh?
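A small Python sketch of that failure mode and one possible way around it. `parse_bracket_list_buggy`, `parse_call_buggy`, and `nargout_for` are hypothetical, statements are toy token lists (commas and parentheses omitted), and deriving nargout per statement is just one alternative to a global, not what Matrixy's grammar currently does.

```python
from typing import List, Optional

# Hypothetical parser state that mirrors the bug: one module-level counter.
_pending_lvalues: Optional[int] = None


def parse_bracket_list_buggy(names: List[str]) -> None:
    global _pending_lvalues
    # Counted eagerly, since we can't yet tell whether an "=" follows.
    _pending_lvalues = len(names)


def parse_call_buggy(name: str) -> int:
    # Returns whatever the last bracket list left behind, used as nargout.
    return _pending_lvalues or 0


# One alternative: derive nargout from the statement being compiled.
# Statements are toy token lists, e.g. ["[", "a", "b", "c", "]", "=", "foo"].
def nargout_for(statement: List[str]) -> int:
    if "=" not in statement:
        return 0                    # "foo();" or "[a, b, c];"
    lhs = statement[:statement.index("=")]
    if lhs[0] == "[":
        return len(lhs) - 2         # names between "[" and "]"
    return 1                        # "x = foo()"


parse_bracket_list_buggy(["a", "b", "c"])  # "[a, b, c];"
print(parse_call_buggy("foo"))             # 3: stale count leaks into "foo();"
print(nargout_for(["foo"]))                # 0
print(nargout_for(["[", "a", "b", "c", "]", "=", "foo"]))  # 3
```

Because the count lives only inside one statement's analysis, a failed or abandoned parse has nothing left over to leak into the next call.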

I'm not even entirely certain how I'm going to do all this right. Do I create a custom CallSignature subclass, and handle argument passing myself? This has the nice benefit that I can almost always treat "x(1)" as a function call, whether it's an actual function or an internal indexing function. The more I can abstract away the differences, the better. The "almost" in the previous sentence of course refers to "x(1) =" L-values, which would need to be indexed a little differently from a normal function call. And since I need to be manipulating indices before passing them to the PMC, I need to be calling a function to handle indexed assignments anyway.

It's all going to be a little tricky to get past this roadblock and to do it in a way that I find acceptable. However, Matrixy has good momentum right now and has a lot of great features already, so I'm hoping I don't get mired down for too long.

Friday, December 4, 2009

Pure-Parrot Testing

I had mentioned the nqpTAP project that dukeleto started a while back to provide a pure-Parrot testing harness. A few days ago he completely rewrote the project and re-released it as Tapir (a clever portmanteau of "TAP" and "PIR"). The test harness is written in PIR instead of NQP now, is more robust, and is self-testable. Dukeleto also apparently has plans to make it more configurable and maybe even pluggable, things which I am very excited about.

I would like to migrate Parrot-Linear-Algebra and Matrixy to use the new harness, and I'm sure other projects would like that as well. These two might make cool test cases. I'll post details when I have a good procedure for doing that.

This got me thinking more about a project I've been incubating in the back of my head for a while: I've been wanting to have a mock object testing framework for Parrot, and I think it would be reasonably easy to make one. So I'm going to draft out some of my ideas here.

The first thing we want is the actual mock object type. Call it "MockObject" for short. This object type, once initialized with a number of options and restrictions, should act exactly like a normal object. It should provide methods, respond to VTABLE calls, and do all the sorts of things that a normal object of the given type would do. The difference, of course, is that MockObject is just pretending.

Pretend I have a large system that needs to be tested. I pass request objects to it, and the system in turn accesses properties and methods in that request. I want to test and verify that my system is calling the correct methods with the correct arguments, and is accessing values in the proper order. To do this, I need to create a MockObject, and pass that object to my system to verify that things are happening correctly.

So a MockObject needs to respond to certain behaviors:

Must be able to respond to method calls (including handling calls to methods which should not exist), being able to expect and verify parameter lists.
Must be able to respond to vtable calls, being able to expect parameter values.
Must be able to log method calls, vtable calls, and property accesses to verify order

A MockObject needs to act exactly like the kind of object it is impersonating, so it really can't have a public interface of its own. Any interface that MockObject implements for itself is going to be unusable for testing, and even if we keep that interface small there is always going to be that overlap where we can't test. To avoid this, we need to be able to configure MockObject without having any visible interface. Sounds rough, eh?

So here's what I'm thinking. First, we have a MockManager class that contains the configuration methods for MockObject. To configure a MockObject, we don't call methods directly on it, we instead pass it to methods on the MockManager class. This saves us from overlapping interfaces in PIR-land. Second, we need to provide two interfaces: the "normal" PIR interface that perfectly imitates the target type, and the internal interface that MockManager uses for configuration. At the C level, we can have two VTABLE structures, which we swap out manually when doing configuration.

So without further ado, I would like to show some PIR code that demonstrates how I think MockObject could be used as part of a normal test:


.local pmc mock, manager
.local int result
manager = new ['MockManager']
mock = manager.'new_mock'('ResizablePMCArray')
manager.'expect'(mock, 'sort')
mock.'sort'()
result = manager.'success'(mock)
ok(result, "the sort method was called!")

So we create a MockObject that is supposed to act like a ResizablePMCArray. We set up the expectation that the sort method is going to be called with no arguments. After we've called that method, we check to see that all our expectations were met. This test above should pass.
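Python can't swap VTABLE structures, but the core idea of keeping all configuration in a manager can still be sketched there. In this hypothetical version, `MockObject` has no public methods of its own: any method the imitated type supports is logged back to its `MockManager`, and methods the type lacks raise `AttributeError`. Python's `list` stands in for ResizablePMCArray, `success()` requires exactly the expected calls in order, and only method calls are covered (Python looks up dunder methods on the type, so this approach wouldn't intercept the equivalent of VTABLE calls).

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]


class MockObject:
    """Hypothetical mock with no public methods of its own."""

    def __init__(self, manager: "MockManager", target: type) -> None:
        # Name-mangled state stays out of the imitated interface.
        self.__manager = manager
        self.__target = target

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found normally, i.e. the imitated API.
        if not hasattr(self.__target, name):
            raise AttributeError(name)  # method the real type doesn't have
        manager = self.__manager

        def record(*args: Any) -> None:
            manager._log(self, name, args)

        return record


class MockManager:
    """Hypothetical manager: all configuration and checking lives here."""

    def __init__(self) -> None:
        self._expected: Dict[int, List[Call]] = {}
        self._seen: Dict[int, List[Call]] = {}

    def new_mock(self, target: type) -> MockObject:
        mock = MockObject(self, target)
        self._expected[id(mock)] = []
        self._seen[id(mock)] = []
        return mock

    def expect(self, mock: MockObject, method: str, *args: Any) -> None:
        self._expected[id(mock)].append((method, args))

    def _log(self, mock: MockObject, method: str, args: Tuple[Any, ...]) -> None:
        self._seen[id(mock)].append((method, args))

    def success(self, mock: MockObject) -> bool:
        # Met only if exactly the expected calls happened, in order.
        return self._seen[id(mock)] == self._expected[id(mock)]


manager = MockManager()
mock = manager.new_mock(list)  # list stands in for ResizablePMCArray
manager.expect(mock, "sort")
mock.sort()
print(manager.success(mock))   # True
```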

There are obviously a lot of issues raised by this potential implementation and a lot of questions that still need to be addressed before we can use any of this. However, I do think this would be a great project and very usable for a number of projects. I would definitely love to hear what other people think about it as well.

Thursday, December 3, 2009

Computer Status and Virtualization

I mentioned a little while back that I was having computer problems. That's putting it mildly: After upgrading to Ubuntu 9.10 from 9.04, my system became completely unstable and mostly unusable. I spent several evenings rebooting my computer over and over because it froze up. I filed a bug with Ubuntu, received no fixes (though I did eventually get confirmation from another user that he was seeing a similar issue), and finally rolled back to 9.04.

When I say "rolled back", I don't mean that I had previously taken a faithful system backup image and was able to quickly and easily jump back to that without losing any data. No. I pulled out my external hard drive and began the arduous task of trying to scavenge all the data I should have been backing up regularly. It took me an entire evening because several times the computer froze mid-transfer and I had to reboot, delete the partial file fragments, and start over. I'm still not 100% certain that all the files I backed up are sane and faithful.

As much as I would like to give a stupid grin and say "Golly, I've sure learned my lesson!" I'll probably run into this same mess next time I try to upgrade my OS too. Even a monkey can be taught.

Recently I've been playing with VirtualBox on my work computer. Basically, I was trying to get access to a Linux environment from my work computer without having to deal with Cygwin. Say what you want about the merits of Cygwin, I've simply never liked using it. Besides just wanting access to a real Linux environment, I also wanted to play around with new OSes. I had really come to like Ubuntu by 9.04, but my experience with 9.10 left me pretty sour for a little while, and I wanted to entertain a few more options first.

So I set up VirtualBox on my work computer, installed a few new guest OSes, and things have worked like a charm. I tried Fedora 12, which was pretty cool. I hadn't used Fedora since version 7, and it's come a long way since then. I also tried OpenSolaris; it was nice, but it was hard to differentiate it from a Linux distro on my virtual platform, and I was also having some strange stability problems with Xorg. I also tried FreeBSD and OpenBSD, but was unable to get either of them installed and running. If anybody knows the trick to getting a BSD variant installed on VirtualBox, I would love to hear about it.

Once I got my personal computer back online, I decided to install VirtualBox here as well. I could test out a bunch of other systems, and maybe even get Parrot building and testing in those places too.

At least, that was the theory. I haven't been able to get anything working here as easily as I did on my work computer. I even tried to set up a virtual Ubuntu 9.10 installation so I could start doing some testing on it before I became brave enough to make the upgrade again, but I wasn't able to get that working either. One of the big issues I'm running into is that my system doesn't have hardware-level virtualization support, which is necessary for virtualizing 64-bit systems. This is a total bummer, and I've become very unhappy with my computer since learning about that drawback. It's only about a year old, though, so it's hard to justify buying a replacement. Maybe I can look into it for next Christmas.

I'm going to start compiling the things I've learned about VirtualBox and maybe write a blog post or two about using it to setup virtual test environments for Parrot. Could be very helpful for expanding our coverage on less-popular systems. I've already managed to post a smoke report for a platform that I cannot find any record of Parrot having a report for (OpenSolaris on i386), so that's a nice little bonus. There were a few failures in that report too, so maybe I can learn enough about the system to get those fixed. And everybody wins.

Tuesday, December 1, 2009

GC Gets Kick Start

Parrot's garbage collector has started to get the lion's share of developer attention recently, especially after some very interesting benchmark statistics from chromatic went public. For the benchmark of building the new NQP-RX project, a whopping 80% of execution time is spent in the GC mark phase. Actually, the real statistic is that 80% of the execution time was spent in the Capture PMC's mark routine alone (and the functions that it calls). That's a huge amount of time, even for a naive GC like ours.

Let's take a quick recap of GCs and what makes them so expensive: A GC finds and automatically reclaims unused data items so that the storage space can be reused. With a good GC system, the system programmer does not need to manually free memory when it is no longer being used: the GC will detect and deallocate the unused memory automatically. A good GC system, in essence, eliminates the leaks that come from forgetting to free unreachable memory (objects that stay reachable but are never used again can still pile up), and helps make the codebase much cleaner and more succinct. To do this, the GC needs to first find all unused ("dead") objects and then free them, two phases known as mark and sweep.

In a naive implementation of a mark and sweep algorithm, the mark phase is charged with finding dead objects to reclaim. We typically do this in reverse, by first finding all objects that are in current use ("alive"), and then declaring all other objects to be reclaimable (or already free). Starting from a root set, such as the register sets and interpreter globals in Parrot, we can construct a graph of all objects by following pointers and marking each reached object as alive. It stands to reason that if there is no pointer to a particular object, it cannot be accessed, and anything that we do not reach during the mark is presumed unreachable, which is the same as dead.

In the sweep phase, we must iterate over the pool of all objects, finding objects marked dead and freeing them. Freeing an object typically involves calling a custom destructor if one is provided, and making the memory available to the allocator so the memory can be reused the next time an allocation is made.
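A toy Python sketch of the naive algorithm just described: objects are plain integer ids, the `pointers` dict doubles as the object pool, and appending to `finalized` stands in for running a destructor and returning memory to the allocator. All names are hypothetical; this illustrates the algorithm, not how Parrot's C-level GC is structured.

```python
from typing import Dict, List, Set


class Heap:
    """Toy heap: objects are integer ids; edges are pointers between them."""

    def __init__(self) -> None:
        self.pointers: Dict[int, List[int]] = {}  # object pool and its edges
        self.finalized: List[int] = []            # stands in for destructors

    def alloc(self, obj: int, *refs: int) -> None:
        self.pointers[obj] = list(refs)

    def mark(self, roots: List[int]) -> Set[int]:
        # Trace from the root set; everything reached is alive.
        alive: Set[int] = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if obj in alive:
                continue
            alive.add(obj)
            stack.extend(self.pointers[obj])
        return alive

    def sweep(self, alive: Set[int]) -> None:
        # Walk the *entire* pool; unmarked objects are freed.
        for obj in list(self.pointers):
            if obj not in alive:
                self.finalized.append(obj)
                del self.pointers[obj]

    def collect(self, roots: List[int]) -> None:
        self.sweep(self.mark(roots))


heap = Heap()
heap.alloc(3)
heap.alloc(2, 3)
heap.alloc(1, 2)          # 1 -> 2 -> 3
heap.alloc(4, 1)          # 4 points at live data, but nothing points at 4
heap.collect(roots=[1])
print(sorted(heap.pointers))  # [1, 2, 3]
print(heap.finalized)         # [4]
```

Both loops touch every live object (mark) or every object in the pool (sweep) on every collection, which is exactly the cost the next paragraph complains about.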

In every collection run of a naive mark and sweep collector, we must first trace the entire memory graph and then iterate over the entire object pool. This is very expensive, and the cost grows as the program's memory use grows. What we need for Parrot is something a little bit less naive.

What we probably cannot do is make huge conceptual improvements to the idea of mark and sweep: We will always need to detect alive objects, and we will always need to traverse and free the dead ones. The general idea is sound, and that's not something we want to change. What we can do, however, is impose heuristics on the system to decrease the number of objects to mark and the number of objects to sweep. This is where the bulk of GC performance improvements can be made, by being much smarter about how the GC is used.

Allison sent a nice email to the list the other day essentially saying that GC has become an officially-recognized pain point and that we as a team are going to be looking at improvements after 2.0 (if we don't manage to start before that). Very interesting discussion has already started on ways to improve it.

As I mentioned above, the bulk of GC performance improvements are made by applying heuristics to decrease the number of objects to mark and sweep. A secondary set of improvements can then be made, often at the code level, to make the GC's operations run faster. I'll call the first set of improvements "algorithmic", and the second set "implementation". So what we, the Parrot developers, need to do first is pick the right algorithms and then implement and optimize them.

Here is a general list of things we can do to improve GC performance in Parrot:

Allocate fewer GCable objects. This is typically the result of user-level code optimization. So, Parrot needs optimizers that are GC-sympathetic. Parrot also allocates a number of STRINGs and PMCs for internal purposes, so we need to minimize that.
Mark fewer objects. This comes from a good generational GC system where we segregate items based on how stable they are.
Sweep fewer objects. I think chromatic's linked-list idea will help us significantly in this regard.

What I think we are leaning towards in Parrot is a system called a generational GC. A generational system uses the heuristic that items which have lived for a long time without being GC collected will tend to stay alive longer, and items which are recently allocated tend to die quickly. It's an acknowledgement that a lot of garbage is created for very short-term uses, and relatively few things stand the test of time. Here's a quick example using explicitly non-idiomatic Perl 5:

use feature 'say';            # "say" needs Perl 5.10 or later
my @array = fill_array(100); # 100 items in the array
foreach my $item (@array) {
    my $new_item = mangle($item);  # a fresh, short-lived value every pass
    say $new_item;
}

In this loop we create a lot of garbage. Every new instance of $new_item is a new collectible item which can be declared dead at the bottom of the loop and allocated anew at the top. Also, all the local variables used inside the mangle function follow the same life cycle. The only items that survive through the entire snippet are @array and its contents.

Every time we mark, we have to mark @array and all its contents, even though they are long-lived and will survive the entire loop. Every time we sweep, we need to separate the dead items ($new_item and all the local variables created inside mangle()) from @array and its persistently live set.

Generational GC works by saying that @array is long-lived and putting it into an older generation. Older generations contain objects which are, by definition, older and therefore less likely to die. If we aren't worried about the item dying, then we don't need to explicitly mark it. At least, we don't need to mark it as often. We also don't need to sweep it, if we can find a good fast way to avoid that.
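Here is a toy two-generation collector in Python to illustrate the idea. All names are hypothetical and the policy is simplified: survivors are promoted after `promote_after` minor collections, full collections of the old generation are left out entirely, and a write barrier records old-to-young pointers in a remembered set. That barrier isn't discussed above, but some mechanism like it is needed, otherwise a minor collection could free a young object that only an old object points to.

```python
from typing import Dict, List, Set


class GenerationalHeap:
    """Toy two-generation heap. Minor collections mark and sweep only the
    young generation; old objects are assumed alive (no full GC here)."""

    def __init__(self, promote_after: int = 2) -> None:
        self.children: Dict[int, List[int]] = {}
        self.young: Dict[int, int] = {}    # young object -> minor GCs survived
        self.old: Set[int] = set()
        self.remembered: Set[int] = set()  # old objects pointing at young ones
        self.promote_after = promote_after

    def alloc(self, obj: int) -> None:
        self.children[obj] = []
        self.young[obj] = 0

    def write(self, src: int, dst: int) -> None:
        # Write barrier: remember old -> young pointers for minor GCs.
        self.children[src].append(dst)
        if src in self.old and dst in self.young:
            self.remembered.add(src)

    def minor_collect(self, roots: List[int]) -> int:
        # Young roots, plus young children of remembered old objects.
        stack = [r for r in roots if r in self.young]
        for src in self.remembered:
            stack.extend(c for c in self.children[src] if c in self.young)
        marked: Set[int] = set()
        while stack:
            obj = stack.pop()
            if obj in marked or obj not in self.young:
                continue  # old objects are neither traced nor swept here
            marked.add(obj)
            stack.extend(self.children[obj])
        # Sweep only the young generation.
        dead = [obj for obj in self.young if obj not in marked]
        for obj in dead:
            del self.young[obj]
            del self.children[obj]
        # Age survivors; promote those that have lived long enough.
        promoted: List[int] = []
        for obj in list(self.young):
            self.young[obj] += 1
            if self.young[obj] >= self.promote_after:
                del self.young[obj]
                self.old.add(obj)
                promoted.append(obj)
        # Keep the invariant: remembered = old objects with young children.
        candidates = self.remembered | set(promoted)
        self.remembered = {o for o in candidates
                           if any(c in self.young for c in self.children[o])}
        return len(dead)


heap = GenerationalHeap(promote_after=1)
heap.alloc(1)                          # like @array: long-lived
heap.minor_collect(roots=[1])          # 1 survives once and is promoted
for i in range(2, 6):                  # like $new_item: short-lived garbage
    heap.alloc(i)
heap.alloc(10)
heap.write(1, 10)                      # old -> young pointer, caught by barrier
print(heap.minor_collect(roots=[1]))   # 4: items 2..5 freed, 10 kept
print(sorted(heap.old))                # [1, 10]
```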

Sunday, November 22, 2009

Running out of Free Time

I've been slowly running out of free time recently. Ongoing computer problems have diverted an unfortunate amount of my effort, and I haven't been able to develop as much as I would like. It's very hard to get into the swing of a hardcore coding session when your computer freezes or spontaneously reboots every 30 minutes on average. I say "on average" because sometimes it's much more frequent than that. I had to reboot my computer twice while writing this first paragraph. Thank goodness for Blogger autosaving my drafts.

On top of that, we're having a kid. An entire new kid. The due date was on Friday, so now we're officially in over-time and very much looking forward to having an "outside" baby to play with. We both fully expect to be spending some serious time in the hospital this weekend or next week, with sooner being better than later. I've never had any of my own children before, so I'm not entirely certain how it's going to change things. But I am certain that they will change, and probably not in a way that gives me more time to hack on Parrot.

Despite my lack of time, this morning I started work on a brand new secret project. I'm not going to give any details about it quite yet (I want to get some more aspects of the design worked out before I go showing off my work), but suffice it to say that if things work out the way I hope they will this project could provide a significant benefit to Parrot and its ecosystem. I'm intending this project to be a Christmas present to the Parrot project, so I hope I can get it done by then. More details to come.

I'm also trying to do some work to get :call_sig working properly. Matrixy is definitely going to need improvements there, and I'm sure other projects will want it too. If I can keep the changes small I will just commit it directly. If it starts getting too large, I'll make a branch instead.

Matrixy and Parrot-Linear-Algebra are going to take a back seat for now while I focus on other issues and my new secret project. I've got a few cleanups I want to make and some tests to add of course, but no huge development for a little while.

Thursday, November 19, 2009

Parrot Project Proliferation Part 3

This is part 3 of my series on the cool new Parrot projects that are popping up around the interwebs. Today I'm going to introduce Markdown, NQP-RX, and nqpTAP.

Markdown

Markdown is a text markup syntax that's designed to be easy to read and edit. In some ways, it's like wikitext, except Markdown has been driven by a consistent design philosophy while wikitext has grown in a platform-dependent ad hoc way.

Parrot has its own Markdown engine now, courtesy of fperrad. It converts Markdown input into properly formatted, valid HTML output. And the best part is that it runs on pure Parrot. So now, with all the cool websites you're making with mod_parrot, you can use this Markdown engine to format text.

NQP-RX

It's not exactly a small project, but NQP-RX really deserves some attention. It's a rewrite of NQP and PGE from the Parrot repo that properly integrates the grammars into the NQP language and enables a lot of cool new features that "classic" NQP doesn't have. On top of that, NQP-RX properly bootstraps: It knows enough syntax to parse itself (after it's already been built from PIR source, of course). That's no small feat for a program written in the Parrot equivalent of assembly language.

The old NQP is still hanging around in the Parrot repo like it always has, and projects that were relying on NQP will still be able to work with it. However, the new NQP-RX is developed on github and snapshots of it are kept in the extensions directory in the Parrot repo too.

nqpTAP

Every project in the Parrot ecosystem that I have seen makes extensive use of unit testing. Some projects are test-driven, though the majority seem to use post-facto tests for verification and to prevent regressions. Whatever the purpose, tests are everywhere and the TAP standard is used by almost all of them.

The nqpTAP project, started by dukeleto and based on his work in Plumage, is a pure-Parrot TAP harness that executes tests and summarizes results without depending on anything besides Parrot itself. Keeping dependencies low is always a good thing, and nqpTAP helps to lower the barrier for new projects looking to create a proper test suite. Even better, nqpTAP targets the new NQP-RX, so it will be stable and working long into the future.
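For readers who haven't looked at TAP closely: a test script prints a plan line such as "1..4", then one "ok N" or "not ok N" line per test, optionally followed by a "# SKIP" or "# TODO" directive, and the harness's job is to tally those lines. Here is a minimal Python sketch of that tallying step. It is not nqpTAP's code (nqpTAP is pure Parrot), it ignores details like "Bail out!" lines and escaped "#" characters, and "summarize" and "Summary" are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# "ok" or "not ok", an optional test number, then description and directive.
TEST_LINE = re.compile(r"(not )?ok\b\s*(\d*)\s*(.*)")
PLAN_LINE = re.compile(r"1\.\.(\d+)")


@dataclass
class Summary:
    planned: Optional[int] = None
    passed: int = 0          # includes "ok ... # SKIP" lines
    todo: int = 0            # lines carrying a TODO directive
    failed: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A run is green only if nothing failed and every planned test ran.
        ran = self.passed + self.todo + len(self.failed)
        return not self.failed and self.planned == ran


def summarize(tap: str) -> Summary:
    summary = Summary()
    for line in tap.splitlines():
        plan = PLAN_LINE.match(line)
        if plan:
            summary.planned = int(plan.group(1))
            continue
        test = TEST_LINE.match(line)
        if not test:
            continue  # comments, diagnostics, blank lines
        description, _, directive = test.group(3).partition("#")
        if directive.strip().upper().startswith("TODO"):
            summary.todo += 1          # expected failure (or surprise pass)
        elif test.group(1):
            summary.failed.append(description.strip().lstrip("- "))
        else:
            summary.passed += 1
    return summary
```

Counting TODO tests separately matters because a TODO failure is expected and shouldn't turn a run red, while a mismatch between the plan and the number of tests actually run should, since it usually means a test script died partway through.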

These three projects are very interesting, and I think it's worthwhile to give them at least a first look.

Wednesday, November 18, 2009

Trac Ticket Hunt

Yesterday, after a herculean effort, the Parrot devs closed out the remaining old tickets from the RT system. Many of the tickets were vague and uncloseable, or unreproducible. Many of them were translated into Trac tickets for further monitoring.

Of course, we still have a lot of open tickets in Trac. Over 600 of them, as Coke pointed out in an email this morning. That's quite a huge number of open issues, and really too many for our current development team to deal with in a reasonable amount of time. We need help dealing with this huge backlog, and this is a great opportunity for new interested people to get involved in Parrot.

Preparing for 2.0

The 1.8 release went out the door on Tuesday, and I think it went much more smoothly than 1.7 did last month. Not completely without hiccups, but better. Now we're in the home stretch for the big 2.0 release in January where the mantra is "production ready".

What does it mean to be production ready? First and foremost I think of stability and reliability. Nobody is going to invest time and effort in software that isn't stable. Next, I think about performance. Computer hardware isn't cheap, and we can't be shipping a piece of software that hogs processor cycles and costs companies more money to support.

With these goals in mind (and I would love to hear what other people think "Production Ready" means), I think there are two big things we need to focus on: Testing and Profiling.

Testing

Test reports are good, and we're starting to get a very large volume of test reports flowing in, including test reports on new or exotic systems. Bravo to anybody who has set up an automatic build bot in the past few months. It is sincerely appreciated.

Test reports are a good and necessary first step, but are by no means the end. Tests are good when they all pass, but that's boring (and unfortunately, it's not usually the case). What's really interesting and important is finding the failing tests and writing up Trac tickets for them so they can get fixed. So here are some things you can do to help:

Monitor the stream of incoming Smolder reports and look for failures
If the failure is happening on a platform that you can test on, try to verify it
See if you can isolate the code from the test that is failing. Bonus points: See if you can write your own small test that demonstrates the bug. The smaller the better.
Open a Trac ticket including information about your platform, the revision where the failures first start appearing (as close as you can tell), and any test cases you've produced that exercise it.
If you're able, submit patches to fix the issue, patches to improve testing of the issue, or patches for documentation to explain what the desired behavior is

All of these things would be very awesome, and are all great ways to get involved in Parrot without having to dive into the source code head first.

Profiling and Benchmarking

Let's not fool ourselves: Parrot is not speedy fast right now as far as VMs are concerned. We don't have JIT, we don't have a good GC, we don't even have PIC (polymorphic inline caching). We don't do enough caching, our startup time is still terrible, etc. There are lots of big optimizing features that we need to implement in the coming months and years if we truly want to be a viable and even formidable alternative to other VMs.

However, this all doesn't mean that the only performance benefits that we need come from these huge projects with weird acronyms. There are plenty of small implementation and algorithmic improvements that we can make throughout to start slimming this bird down, some of which will have serious effects on our performance. Parrot is so large though that we can't necessarily find all these little optimization opportunities easily. We need to narrow the search. This is where profiling and benchmarking come in. This is where we need you.

Parrot has a fancy new profiler tool that can be used to profile programs written in PIR or any of the HLLs that run on top of Parrot. It still needs lots of documentation, but it should be mostly easy to use for people willing to poke around in it. If you can find a good example program that demonstrates some real-world usage patterns, we would love for you to profile it and send us the reports. Knowing where the bottlenecks and slowdowns are will help us to target, refactor, and improve them, and that's a big help.

To start up the profiler, run Parrot with this incantation:


> parrot -Rprofiling yourprogram.pir

For more information about what to do with it, hop onto the IRC channel and ask around. I haven't used this much myself, but it would be cool to get started.

To prove that we are indeed making things faster, we need benchmarks. Good benchmarks are programs that perform lots of very repetitive work and target a particular Parrot subsystem. We want programs that really exercise Parrot, and can do it in a consistent way. Then, we can use timings on these benchmarks to show whether Parrot's performance is improving or getting worse over time. This is very important.
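A simple way to keep such timings consistent is to run each benchmark several times and compare medians rather than single runs, which smooths over noise from other processes on the machine. Here's a minimal Python sketch of that idea; "time_runs", "compare", and the 5% noise tolerance are assumptions for illustration, not part of Parrot's tooling.

```python
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run the workload several times and return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def compare(baseline: List[float], candidate: List[float], tolerance: float = 0.05) -> str:
    """Compare median timings; changes within +/- tolerance count as noise."""
    before = statistics.median(baseline)
    after = statistics.median(candidate)
    change = (after - before) / before
    if change > tolerance:
        return f"slower by {change:.1%}"
    if change < -tolerance:
        return f"faster by {-change:.1%}"
    return "no significant change"
```

To time Parrot itself, the workload can shell out, for example time_runs(lambda: subprocess.run(["parrot", "my_benchmark.pir"], check=True)), where my_benchmark.pir is a hypothetical benchmark file. Run it once against the old build and once against the new one, then pass both lists to compare().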

Ticket Triage

As Coke mentioned in his email, we can't sit back and congratulate ourselves now that RT is empty. We need to focus our attention now on the growing backlog of tickets in Trac. Some of the issues documented there are very serious and will definitely prevent Parrot from being stable and "production ready" by 2.0.

As Coke outlined, we need people to go through old tickets and answer a few questions:

Can we reproduce this issue now with Parrot 1.8.0? Many tickets were filed weeks or even months ago, and the underlying problems may have disappeared in the course of normal development.
Look at RFC tickets (requests for comments) and weigh in. Do the changes described make sense? Would they be beneficial? Many of these tickets are simply waiting for some kind of discussion before they get closed.
If the ticket involves an HLL, see if you can reproduce the issue using pure-PIR code instead of high-level code. Parrot operates on PIR natively, so Parrot developers are most easily going to be able to fix problems that can be demonstrated in PIR
If you see a ticket with a segfault, SIGBUS, SIGFPE, or other system error condition, see if you can provide a backtrace.
If a ticket contains a patch, see if the patch still applies cleanly to trunk. If so, see if the patch fixes the problem.
Add comments or set other information to make sure the ticket stays up-to-date and informative. Even if the information you add is small ("Still a problem!" or "fixed!"), that's still something. If nothing else, make sure the ticket is properly categorized by component, milestone, etc. You'll probably need to create a Trac account (free and easy!) in order to make modifications
Look for duplicates. If two tickets describe the same problem, one of them can go.
If the ticket can be legitimately closed (fixed, no longer a problem, a duplicate, etc.), make sure that happens. Hop on IRC or the mailing list and harass people until it gets closed. It may be a little bit annoying, but it will get results.

Conclusion

I haven't done a Parrot4Newbies post in a while, and I know some people have been looking for ways to get involved. With 2.0 on the horizon, testing, profiling, and ticket triaging are all great and incredibly pertinent ways to get involved. And more importantly than just being involved, these are all great ways to help Parrot grow and get ready for the big milestone. So if you are interested in Parrot and have a few spare moments, take a look at some tickets and see what you can accomplish. I can guarantee that anything you get done will be much appreciated.

Tuesday, November 17, 2009

Parrot Users Mailing List

Received a message from Coke this morning: we have a new mailing list set up specifically for users of Parrot and applications running on top of Parrot. If you would like a place to chat about Parrot without getting sucked into the minutiae of the developers mailing list, the parrot-users list might be the thing to look at. Subscription to the list is free and easy.

It has also been suggested that all the Parrot developers join that list to help answer questions. Hopefully it will be a great place for new Parrot users to go, get help, meet other developers, and get started using our software!

Saturday, November 14, 2009

Matrixy Passing (Almost) All Tests

I went on a bit of a marathon today, and I'm pleased to announce that the pla_integration branch of Matrixy is passing almost all tests. All the tests that it is failing are relying on Complex number handling, which Parrot-Linear-Algebra doesn't support yet.

This is pretty big vindication that what I have been trying to do with Parrot-Linear-Algebra is going well: despite the project being relatively young, the various matrix types provided by that project are turning out to be very stable and robust. The NumMatrix2D type is the most used and the most full-featured right now, but relative newcomer CharMatrix2D is proving to be very powerful and useful as well.

PMCMatrix2D is mostly unused for now. However, I do intend to use that in the near future to implement Cell Arrays in Matrixy. This is a feature that we had no good way of implementing in the old Matrixy, but in the new system it looks like a very natural extension of the other things I've been doing.

With Cell Arrays, it should be possible, if not easy, to properly implement variadic input and output function arguments. This is a huge issue that's preventing Matrixy's library of builtin functions from being accurate to the behavior provided by Octave or MATLAB. It's also preventing us from writing more functions directly in M, instead of in PIR. Even if we have to descend into inline PIR for some things, it would be great if we could write more code in M.

Starting tomorrow I'm going to try and add some preliminary Complex number support to Parrot-Linear-Algebra, and try to get the remaining Matrixy tests passing with it. With that we'll be back to where we were before the projects were split, and new development can begin in earnest.

Friday, November 13, 2009

The Path to Matrixy

I've spent a little bit of time in the last week working on the pla_integration branch for Matrixy. The goal of that branch is to update Matrixy to use Parrot-Linear-Algebra as an external dependency, and to use its matrix types natively instead of the ad hoc home-brewed aggregate types I had been using previously. In short, I'm doing things the way they should have been done in the first place.

As of Sunday evening, the branch could build, run, and execute the test suite. There are several tests that are failing still (many of which are to be expected), but a good number that are successfully passing too, which is good.

String handling is one area where I was expecting some serious regressions, and I was not surprised in that regard. M handles strings very idiosyncratically, as I've mentioned before, and all tests that rely on complicated string behavior are failing miserably. In response to this, last night I created a new matrix type in the Parrot-Linear-Algebra project: a 2D character matrix type that will be used to implement Matrixy's string handling. My task now is to improve this new matrix type (including adding a test suite for it) and integrate it into Matrixy. I started some of that last night but haven't pushed any of my work to the public repo yet.

Another thing I added last night was the "matrix" role. Any of the matrix types in Parrot-Linear-Algebra will now respond positively to a "does 'matrix'" query. An object that fulfills the matrix role will have the following properties:

Contains a number of elements arranged in a rectangular NxM grid
Elements can be accessed linearly using a single integer index (behavior varies by type, but it is always possible)
Elements can be accessed by X,Y 2-ary coordinates to retrieve a value
Matrices can grow automatically to accommodate newly inserted values
Matrices should stringify in such a way that each row of the matrix is on its own line (rows separated by newlines). The formatting of each individual row is dependent on the type of the matrix.
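To make the role's requirements concrete, here is a minimal Python sketch of a type that satisfies them. "Matrix", "CharMatrix", and all their methods are hypothetical stand-ins, not Parrot-Linear-Algebra's PMCs, and the column-major linear indexing (0-based here, unlike M) is an assumption, since the role only says linear access must be possible.

```python
from typing import Any, List


class Matrix:
    """Hypothetical stand-in for a type that fulfills the "matrix" role."""

    roles = {"matrix"}

    def __init__(self, fill: Any = 0.0):
        self.fill = fill          # value used for cells created by growth
        self.rows = 0
        self.cols = 0
        self.data: List[List[Any]] = []

    def does(self, role: str) -> bool:
        return role in self.roles

    def get(self, row: int, col: int) -> Any:
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError((row, col))
        return self.data[row][col]

    def set(self, row: int, col: int, value: Any) -> None:
        # Grow automatically: widen every existing row, then append new rows.
        if col >= self.cols:
            for r in self.data:
                r.extend([self.fill] * (col + 1 - self.cols))
            self.cols = col + 1
        while row >= self.rows:
            self.data.append([self.fill] * self.cols)
            self.rows += 1
        self.data[row][col] = value

    def get_linear(self, index: int) -> Any:
        # Assumption: column-major order, as in M, but 0-based.
        if self.rows == 0:
            raise IndexError(index)
        return self.get(index % self.rows, index // self.rows)

    def format_row(self, row: List[Any]) -> str:
        return " ".join(str(v) for v in row)

    def __str__(self) -> str:
        # One matrix row per line; each type decides how a row looks.
        return "\n".join(self.format_row(r) for r in self.data)


class CharMatrix(Matrix):
    def __init__(self):
        super().__init__(fill=" ")    # grow with spaces, like M strings

    def format_row(self, row: List[Any]) -> str:
        return "".join(row)
```

Keeping "format_row" overridable lets each type control how a row looks while the role still guarantees one row per line. Growing a CharMatrix fills new cells with spaces, so writing "ABCDE" on row 0 and then "F" and "G" at columns 0 and 4 of row 1 prints "ABCDE" over "F   G".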

With a nice standard interface like this, these matrix types should be consistently usable from HLLs, and I know Matrixy is going to be making aggressive use of these features to implement even its most basic behaviors. I may try to add new requirements to this list, specifically there are some methods that I would like every matrix type to have (is_scalar(), is_empty(), etc.), and maybe a few other behaviors that should become standard between all our types. I'm starting to think that a templating system will become a necessity to prevent us from needing to rewrite similar algorithms for what could become dozens of matrix and vector types. The improved grammar support in NQP-RX may be the catalyst that makes these changes possible. It's another task for another day.

Speaking of tasks, on the near-term TODO list I plan to add specialized vector types to Parrot-Linear-Algebra, add tests for the new matrix and vector types, beef up the CharMatrix2D type to handle the string operations that Matrixy needs, and continue fixing the pla_integration branch for Matrixy to pass more of its tests. On top of all that, there's a testing hackathon this weekend that I want to participate in, some work for Wittie that I need to finish, and possibly having a baby. The next few days could turn out to be very busy for me!

Thursday, November 12, 2009

String Handling in M

I've been doing a lot of work recently on the pla_integration branch for Matrixy. The goals of this branch (which is likely to just become the new master) are to integrate Matrixy with the Parrot-Linear-Algebra project, and use its new matrix types instead of home-brewing our own types.

I'm already seeing some good results: I've got lots of important tests passing and performance seems to be nice (though startup time and PCT processing time overall are worse, but that's another issue for another day). However, the one area where I am still having a lot of problems is in fixing the handling of strings.

Strings in M are very idiosyncratic. This is especially true when we start mixing strings with matrices. One good thing I am finding is that the various idiosyncrasies and even, dare I say, inconsistencies help give a certain insight into the way Octave does its parsing. We should be able to take those insights and try to produce a sane and compliant parser for Matrixy. The best way to proceed is through a series of examples. As a reminder about M, lines not terminated by a semicolon print their results to the screen, and the % sign is the comment delimiter. I'll show the output of each line in comments below it:


x = ["ok";"w00t"]
% ok
% w00t

Here is a very simple case. We have a matrix with two rows, each row contains a string. When printed, each row of the matrix is printed on a newline. One thing to notice and remember for later is that the strings on these two lines are not the same lengths.


x = ["ok"; 65, 66]
% ok
% AB

M has a lot of close ties to Fortran, which is what the original version of MATLAB was developed in. In later times I believe it was ported to C and then to Java, taking characteristics of each implementation language along the way. In any case, there are some very obvious influences from Fortran and C on the M language. One such influence is that strings are simply arrays of characters. When we are printing out a matrix that has some kind of internal "I am a matrix of strings" flag set, integer values are converted to their ASCII equivalents and treated as characters.


x = ["ok"; 65, 66, 67]
% "Error: number of columns must match (3 != 2)"

Here is a slightly strange result. In the first example I showed, we can have a matrix where the string literals on each row are different lengths. In the second example I showed that we can treat integers as characters in forming string literals inside a matrix. However here we see a surprising result: If we try to make a row of all integers and a row that is a string, they must have the same length.


x = ["A", 66; "C", 68]
% AB
% CD

Mixing integers and strings on a single row works.


x = ["A", 66; "C", 68, 69]
% "Error: number of columns must match (3 != 2)"

...but the rows must be the same lengths when we build them like this.


x = ["ABC"; "D"; "E", 70]
% "Error: number of columns must match (1 != 3)"

And we can see from this error message that suddenly line 2 ("D") throws an error because it's not the same length as line 1 ("ABC"), even though this would have worked if we hadn't included line 3 ("E", 70). As a more complicated example that clarifies how strings of uneven lengths are stored, consider this:


x = ["ABCDE"; "F"];
x(2, 5) = "G";
x
% ABCDE
% F   G

So we can see from here that strings aren't inserted into matrices with arbitrary lengths, they are padded out to be the same length with spaces. Finally:


foo = "Awesome";
x = [foo; 65]
% "Error: number of columns must match (1 != 7)"

So we can see that the checks for these matrix sizings are happening at runtime, not at parse time. (This small example could be explained away by aggressive constant propagation in the optimizer, but I will assure the reader that this holds true in "real" cases as well).

We can divine a few parser rules from all this:

If we have strings in the matrix, we flag the matrix as being a stringified one and print it as a string. This means converting any integers in the matrix to characters in the row-string.
If we have integer or number literals in the matrix, even if they can be converted to ASCII characters, the rows of the matrix must have the same lengths.
Judging from the ["ABC"; "D"; "E", 70] example, they appear to do these length checks and string checks on the matrix after parsing is completed (otherwise, why would it have errored on lines 1 and 2 not being equal length when it didn't see the integer until line 3?).
These checks happen at runtime.

What I think we need to do for parsing these literals is the following:

We parse each row separately, and pass them to a row object constructor at runtime.
If the row contains any strings, set the "has strings" flag. If the row contains any numbers, set the "has numbers" flag.
We pass all row objects to a matrix constructor
If all rows are strings, and no rows have numbers, pad the strings with spaces and insert them into a character matrix (like a normal matrix, but defaults to printing like an array of ASCII characters). Done.
Check row lengths. If they are not all the same at this point, throw an error. Done.
If any rows contain strings, create a new character matrix object and populate it with all data, no padding. Done.
If rows only contain numbers, create a normal NumMatrix2D PMC, populate it and return that. Done.
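As a sanity check on the algorithm above, here is a minimal Python sketch of the matrix constructor. It's an illustration under assumptions, not Matrixy code: "Row" and "build_matrix" are hypothetical names, rows arrive already parsed as lists of strings and numbers, and non-integral numbers are rounded to the nearest character code (the examples above don't pin down rounding versus truncation).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Item = Union[str, int, float]


@dataclass
class Row:
    """One parsed row of a matrix literal, e.g. ["E", 70]."""
    items: List[Item]

    @property
    def has_strings(self) -> bool:
        return any(isinstance(item, str) for item in self.items)

    @property
    def has_numbers(self) -> bool:
        return any(not isinstance(item, str) for item in self.items)

    def values(self) -> List[float]:
        # Strings are arrays of characters: each character becomes one
        # column holding its character code.
        out: List[float] = []
        for item in self.items:
            if isinstance(item, str):
                out.extend(float(ord(ch)) for ch in item)
            else:
                out.append(float(item))
        return out


def build_matrix(rows: List[Row]) -> Tuple[str, list]:
    # All rows are pure strings: pad with spaces into a character matrix.
    if all(r.has_strings and not r.has_numbers for r in rows):
        texts = ["".join(r.items) for r in rows]
        width = max(len(t) for t in texts)
        return "char", [t.ljust(width) for t in texts]

    # Otherwise every row must be as wide as the first one.
    widths = [len(r.values()) for r in rows]
    for w in widths[1:]:
        if w != widths[0]:
            raise ValueError(f"number of columns must match ({w} != {widths[0]})")

    # Any strings at all: character matrix, no padding needed.
    if any(r.has_strings for r in rows):
        # Assumption: non-integral values round to the nearest code.
        return "char", ["".join(chr(round(v)) for v in r.values()) for r in rows]

    # Numbers only: an ordinary numeric matrix.
    return "num", [r.values() for r in rows]
```

Run against the examples in this post, the sketch reproduces their results, including the "(3 != 2)", "(1 != 3)", and "(1 != 7)" messages, because every later row is compared against the width of the first row.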

As an aside, there's another example that is worth showing:


x = ["ABC"; 68.1, 69.2, 70.3]
% "ABC"
% "DEF"
x(2, 2)
% ans = E

We see here that floating-point numbers are converted to ASCII integers when inserted into the character matrix, and that rounding sticks: you can't get the original non-integral value back after the conversion. So all my ideas above with the character matrix type should work with this.

So that's what I think we're going to have to do if we want to faithfully reproduce the behavior of Octave. This system will make matrix literals in the code a bit of a performance drain, but that's what we're going to have to live with for now.

Wednesday, November 11, 2009

November Testing Hackathon

A quick announcement before I forget about it:

This weekend, November 14th and 15th, there will be a testing hackathon for Parrot. We want to focus our efforts on improving tests, especially opcode-related tests in t/op/*. Some tasks that we will try to work on are:

Converting tests from Perl to PIR
Improving coverage of tests for ops
Getting lots of platform test reports in anticipation of the 1.8.0 release

I would love to see lots of people on IRC this weekend to help with the festivities.

Parrot's libJIT framebuilder

I offered a bit of a challenge to my readers a while back, and I mentioned that I had received one (very good) submission from plobsing. It used libJIT to implement a frame builder for Parrot, to replace the old home-brew one that we had been using.

I also hear tell that he's planning a framebuilder using libffi too, although I'll have to talk more about that in a later post.

There were some bumps in the road, however. Regular readers will recognize that between the time he sent me the code and now the big PCC refactor landed, which changed the landscape for anything involving function calls quite substantially. But plobsing persevered, and with some configuration help from our own kid51, we ended up with a very nice working branch running the new framebuilder.

Assuming things test well, I would like to push to get the libjit_framebuilder branch merged into trunk soon. We need testing so here is something you, the reader, can do:

Test the Branch

Check out the libjit_framebuilder branch and give it a whirl. You want to do this before you have libJIT installed on your system to get a baseline. Parrot should work normally without libJIT installed.

Get libJIT

Getting libJIT is probably the hardest part of the whole process. There don't seem to be any Debian packages for downloading, and there definitely don't seem to be any Windows binary installers floating around. Google returns few results when I search for "libjit", mostly Wikipedia and plobsing's own work (not usually a good sign!).

You can download the source to the mainline project HERE, and can apparently download source from a fork of it HERE as well. If you have SVN, you can get the source of the fork from its googlecode project. I'm not really able to find a repo link for the project mainline. Maybe somebody else can help point me in the right direction for that.

From the source you can type this to get it to build on Linux:


./configure
make
sudo make install

On my machine, I also had to run "sudo ldconfig" afterwards, which refreshes the dynamic linker's cache so that the newly installed shared library can be found. If you're on Linux and it doesn't work, try that.

I have no idea how to get this working on Windows. You may be SOL.

Test the Branch Again

Now that you have libJIT, reconfigure, rebuild, and retest Parrot. It should detect libJIT, use it for the framebuilder, and (I hope) you should see some kind of performance improvement. At the very least, it shouldn't be noticeably slower.

Send Reports

If things work on your platform, let us know. If things don't work, we definitely need to know that as well. If you have any information about how to get or build libJIT on various systems, we would love to start compiling some documentation about that too. More information is always better, and if Parrot is going to use libJIT going forward (even as an optional component for some performance improvements) we should all be aware of how to get and use it.

Tuesday, November 10, 2009

Planning for Lorito

I had an interesting conversation with plobsing this morning about Lorito. He has been wrapping up his work on the libjit-based framebuilder, and is looking to break ground on another project. Specifically, he wanted to start putting together a prototype for Lorito.

I had, along with chromatic and some other people, been thinking about Lorito as a very very low-level language, similar to an assembly language. The idea being that each Lorito op would share a close correspondence with an operation in the JIT engine, and would therefore be trivially easy to compile at runtime.

Plobsing was taking a different approach: He's thinking about a higher-level structured language that we would parse down into native JIT format at build time. There are some merits to this idea and since he mentioned it to me I've been thinking about it more and more. Let's take a look at things in more detail, focusing on LLVM specifically.

First, LLVM uses its own platform-neutral LLVM bytecode format natively. High-level code is parsed and compiled down to LLVM bytecode which can then be optionally optimized before conversion to machine code. That first part ("parsed and compiled") is expensive: we don't want to be parsing and compiling to LLVM bytecode at runtime, since that would eat up any potential gains we could get from JIT in the first place. What we want is for the Lorito code to be compiled only once: during the Parrot build. From there we will have LLVM .bc files which contain the raw bytecode definitions for each op, which can be combined together into large methods or traces and compiled to machine code in one go.

Ops are defined in .ops files, which are currently written in a preprocessed C-like language but which will eventually be rewritten in Lorito. During the build, we want two things to happen: first, we want to compile the Lorito down into machine code, as we currently do with our ops, for the interpreted cores (fast core, switch core, etc.). Second, we want to compile down the ops into LLVM bytecode for use with the JIT at runtime. By comparison:

Interpreted: Good startup time, reasonable execution time
JIT'd: Slower startup time, better execution time. Many tradeoffs available between the two (usually in terms of adding/removing optimization stages)

For long-running programs, the costs necessary to start up the JIT engine can be amortized over the entire execution time, which produces benefits overall.
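The amortization argument is simple arithmetic. If starting the JIT costs S and each op costs t_interp when interpreted versus t_jit when JIT'd, the JIT'd run is cheaper once S + n*t_jit < n*t_interp, that is once n > S / (t_interp - t_jit). Here is a tiny Python sketch of that break-even point; the function name is hypothetical and the model deliberately ignores compile time per Sub, caching, and GC.

```python
import math


def break_even_ops(jit_startup: float, interp_cost: float, jit_cost: float) -> float:
    """Number of op executions n at which jit_startup + n*jit_cost equals
    n*interp_cost; beyond this point the JIT'd run is cheaper overall."""
    saving_per_op = interp_cost - jit_cost
    if saving_per_op <= 0:
        return math.inf   # the JIT never pays for itself
    return jit_startup / saving_per_op
```

With made-up numbers (not measurements of Parrot or LLVM) of 50 ms of JIT startup, 50 ns per interpreted op, and 10 ns per JIT'd op, break_even_ops(0.05, 50e-9, 10e-9) comes out to about 1.25 million op executions, which a long-running program passes quickly.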

But, I'm digressing. The goal of Lorito was to produce a language that would be trivially easy to JIT at runtime. I was assuming that what we needed was a very small language to perform the job. However, what's really the most trivial to JIT is LLVM's native bytecode format. If we generate that bytecode at compile time, the runtime costs will stay to a minimum. This means that we can have Lorito be any language of any complexity: The only requirements are that we be able to compile it down to LLVM bytecode, hopefully in a process that isn't overly complex or fraught with error possibilities. So long as the conversion only happens once at build time, it doesn't really matter how complicated it is.

Any parser for anything that's more complicated than assembly language will take development effort. The tradeoff is reduced development effort when we start rewriting the ops in Lorito, and increased inclination to write more things in Lorito than just ops. For instance, rewriting PMCs and their VTABLEs in Lorito means that we can start exposing those to the JIT as well. The more of Parrot that we expose to the JIT, the more gains there are to be had from it.

Assuming Lorito is going to now be a relatively high-level structured language as plobsing suggests, the question now is what should it look like? Should it be C, or like C? Should it be NQP, or like NQP? NQNQP?

As a thought experiment, consider the case where Lorito is C. In that case, our ops and VTABLEs are already all written in Lorito, and we already have a Lorito compiler available (LLVM). In this case, there are only a handful of steps to do before Parrot has a working (if naive) JIT:

Add a configure probe to detect LLVM
During the build, compile down ops to LLVM Bytecode, in addition to everything else we already do with them
Add a utility function to compile a stream of bytecode (such as an entire Sub) to machine code by combining the precompiled LLVM bytecode for each op it contains.
Add a way (such as in the Sub invoke VTABLE) to cache methods that have already been compiled, and add a switching mechanism to invoke that instead of the bytecode

There are a few more necessary details, of course, but if we take this approach Parrot isn't too far away from a working, proof-of-concept JIT.
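The last two steps (compile a whole Sub, then cache it and switch to the compiled version on invoke) can be modeled in a few lines. In this hypothetical Python sketch, a Sub is a list of stack ops, "jit_compile" stands in for LLVM by translating the whole op stream into one expression and compiling it once, and "Runtime.invoke" is the switching mechanism: it runs a cached compiled version when one exists and falls back to the interpreter when no compiler was detected at configure time. None of these names are Parrot's actual API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Op = Tuple[Any, ...]  # e.g. ("push", 2), ("add",), ("mul",)


def interpret(ops: List[Op]) -> int:
    """The slow path: walk the ops one at a time on a value stack."""
    stack: List[int] = []
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op[0] == "add" else a * b)
    return stack.pop()


def jit_compile(ops: List[Op]) -> Callable[[], int]:
    """Translate the whole op stream into one Python expression and compile
    it once; a stand-in for handing combined LLVM bytecode to codegen."""
    exprs: List[str] = []
    for op in ops:
        if op[0] == "push":
            exprs.append(repr(op[1]))
        else:
            b, a = exprs.pop(), exprs.pop()
            sym = "+" if op[0] == "add" else "*"
            exprs.append(f"({a} {sym} {b})")
    code = compile(exprs.pop(), "<jit>", "eval")
    return lambda: eval(code)


class Runtime:
    def __init__(self, compiler: Optional[Callable[[List[Op]], Callable[[], int]]] = None):
        self.compiler = compiler  # None models "configure did not find LLVM"
        self.cache: Dict[str, Callable[[], int]] = {}

    def invoke(self, name: str, ops: List[Op]) -> int:
        fn = self.cache.get(name)
        if fn is None and self.compiler is not None:
            fn = self.compiler(ops)      # compile once per Sub...
            self.cache[name] = fn        # ...and reuse it on later calls
        if fn is None:
            return interpret(ops)        # no JIT available: interpret
        return fn()
```

Because the compiled version is looked up by Sub, the compile cost is paid only on the first call; every later invoke skips straight to the cached code.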

I'm sure this idea won't jibe well with chromatic's "welcome to the new millennium" general distaste for C. I'm not 100% sure about it all myself. However, it is interesting and worthy of some consideration. It certainly does seem like a path of least resistance.

MediaWiki Book Designer

The other day I pushed a new repository to github. It is my visual book designer tool for MediaWiki that I've been developing on and off for years now. I wrote a post a while ago on my now-defunct Wikibooks blog about it, and things have progressed significantly since then.

I started working on this graphical book interface years ago in response to complaints I had heard from new Wikibookians. Making new books was too hard, because it had to be done page-at-a-time and the book structure (TOC, chapters, pages, subpages, etc) needed to be created manually using a complex system of relative links. On top of that, Wikitext really loses what little charm it has when you start doing complex structural, organizational, or navigational things with it, and it was another hurdle that prospective new authors couldn't always seem to clear. In short, it was hard to setup a new wiki-based ebook at all, and much much harder to do it correctly. This is the problem that I wanted to address.

The book designer took a number of forms before I settled on a graphical, outline-based approach. I had been doing work on it and tinkering in my spare time for a while before I was approached by Professor Kidd at Old Dominion University. She was looking to integrate MediaWiki and wiki-based e-books into her classroom, but was looking for an easier way to build books. Actually she was looking for a large number of usability enhancements, but book building was a big piece of it.

One thing led to another, her team picked up a nice grant to pursue the use of wikis in the classroom, and I signed on to help with some of the necessary development. The book designer grew from a curious little toy that I had been playing with privately into a full-blown extension. Development isn't 100% complete, but I decided that now was a good time to open it up to the public and host it on github. So, that's what I did.

This development project is also going to yield a few other extensions and enhancements which I will try to host publicly in due time. I'll post updates about all that when things happen.

The book designer project "works", but it isn't very pretty in a lot of places. It consists of two major components: the PHP-based backend and the JavaScript-based interface. They communicate through a quick-and-dirty (and very, very ugly) interchange format. The user builds the book outline using the provided JavaScript tools and clicks the button to save it. The JavaScript builds up a textual description of the outline and posts it to the server, where the PHP backend parses it and creates the necessary pages with a large amount of helpful boilerplate text. When the dust settles, a complete skeleton book exists that only needs content to be added by interested authors.
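The actual interchange format isn't described here, so the following is only a hypothetical illustration of the backend's job. Assume the client sends an indented outline (two spaces per level). The server turns each entry into a MediaWiki subpage title and builds a nested wikitext table of contents for the root page. "parse_outline" and "table_of_contents" are made-up names, and Python stands in for the real PHP.

```python
from typing import List


def parse_outline(text: str, indent: int = 2) -> List[str]:
    """Turn an indented outline into full subpage titles like Book/Chapter/Page."""
    pages: List[str] = []
    stack: List[str] = []          # titles of the current ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        if depth > len(stack):
            raise ValueError(f"outline skips a level at: {raw.strip()!r}")
        del stack[depth:]          # climb back up to this entry's parent
        stack.append(raw.strip())
        pages.append("/".join(stack))
    return pages


def table_of_contents(pages: List[str]) -> str:
    """Nested wikitext bullets linking every page below the book's root."""
    lines = []
    for page in pages[1:]:         # pages[0] is the book's root page
        depth = page.count("/")
        title = page.rsplit("/", 1)[-1]
        lines.append("*" * depth + f" [[{page}|{title}]]")
    return "\n".join(lines)
```

Generating the full subpage paths on the server is exactly the part that's tedious and error-prone to do by hand with relative links, which is the problem the designer was built to remove.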

I don't know how much more work I am going to do on this tool for the duration of this project. I also don't know how much I will be working on it thereafter, especially considering some of the other things I have going on in the coming weeks. But, if other people are interested in this tool and want to get involved, I will do everything I can to support that.

Monday, November 9, 2009

Parrot Project Proliferation Part 2

It's part two of my series on Parrot-related projects. There are a handful of these cool new projects that I know about, but I would love to hear some ideas from readers too. We can't give too much free publicity to cool Parrot projects!

Kakapo

NQP is a thin subset of the Perl6 syntax that's used as part of PCT to help build compilers. It's a very nice little language that tends to be relatively close to the underlying PIR code, and it's a real big help when building compilers. Part of the charm of NQP is that it doesn't include a runtime library. It's a bare-bones language, and provides only the few features necessary to build compilers.

Sometimes people using NQP want a runtime library anyway, because they need more than what the bare NQP compiler provides. To this end, the Kakapo project, started by Austin Hastings, aims to provide a runtime of useful functions that the NQP coder can use to perform common tasks. There are helpful objects, methods, and subroutines that can be used to interact with Parrot in a more natural way than descending into streams of inlined PIR.

Kakapo is a really interesting project, because it serves a few purposes:

It offers to turn NQP from just "a part of PCT" into a full-fledged and fully-capable programming language for Parrot development.
It provides some interesting and effort-reducing utilities that compiler designers can use to get off to a quicker start.
It provides those utilities in a way that can be used from PIR programs and from programs written in other languages on Parrot.

When you get a chance, you should check Kakapo out and give it a spin. It may turn out to make your Parrot development work much easier.

Blizkost

Blizkost, started by Jonathan Worthington, is a project that I never thought I would see: Perl5 on Parrot.

Let that sink in for a minute. Perl5 is so idiosyncratic that there is, and really can only be, one implementation. The official Perl5 specification is whatever the official Perl5 executable happens to parse and execute. The way around this single-implementation problem is to simply embed the Perl5 interpreter into Parrot, instead of trying to develop a new parser from the ground up.

This is what Blizkost does: it implements a handful of new PMC types that act as wrappers for the Perl5 interpreter object and the various Perl5 object types. These PMC types allow Parrot programs to call into Perl5 code and receive the results back. Current functionality is limited, but there are a handful of interested (and talented!) contributors. If you know a thing or two about Perl5 internals, I'm sure they could use some help.

Winxed

NotFound started a very interesting little project: a javascript-like language compiler named Winxed. The twist is that instead of using PCT like most other projects do, he hand-wrote the lexer, parser, and code generators in C++.

Technically the word "compiler" covers a wide range of utilities, but most casual coders would probably refer to it as a "translator" instead. Winxed takes the javascript-like language input and outputs plain-text PIR code which can be compiled and executed by Parrot.

Wednesday, November 4, 2009

Libraries And Extensions

Parrot is just a virtual machine, an engine that's intended to run other programs and facilitate inter-operation between them. It's a facilitator for other programs and compilers and technologies; a linchpin for a whole new software environment. Getting compilers to run on it and communicate with each other is one goal: Getting them to all work seamlessly with a large assortment of native libraries is another.

We don't just want Parrot to be a cool program, we want it to enable a whole ecosystem of cool languages, programs, and libraries. I write a program in Perl6, use your framework that you've written in Cardinal, include it in a PHP webpage, and everybody gets to use system libraries written in C. To do that, we need people to create cool compilers, write cool programs, and write wrappers around cool native libraries. Seriously, everything cool.

One big part of this puzzle is Plumage, Parrot's module ecosystem manager. If you haven't already, check it out. Once a compiler or library that targets Parrot is registered with Plumage, any Parrot user should be able to download it with only a few simple commands. It's like CPAN for Perl5, except it has the potential to get bigger. Much bigger. With Plumage in place, we can really start building up the addons: PIR code libraries, native code library wrappers, compilers, applications, etc. The possibilities are endless, and they provide a huge number of openings for new Parrot developers to get started on.

Library Wishlist

What native libraries do you use? What libraries do you wish Parrot had easy wrappers for? The first thing we need is a list of libraries that people want to use, and possibly some use cases for them (if they are obscure or rare enough that our developers won't be familiar with them). Post suggestions here in the comments of this blog, or on the Wiki, or in Trac, or wherever.

Darbelo put together the DecNum library wrapper for his GSoC project. I've been (with lots of help!) trying to put together wrappers for the BLAS and LAPACK linear algebra libraries. Eventually we will be able to easily install these things through Plumage, and then other projects will be able to use them as dependencies. These are just two examples, and both are math-centric, but they are certainly not the only libraries that need some attention. So ideas and suggestions for new libraries to target would be a great help.

Library Wrappers

Are you familiar with a popular library? Better yet, are you familiar with its API and internals? We could use your help writing a wrapper around that library for Parrot. The benefit, of course, is that once you write a wrapper for the library, it will be usable from programs written in all languages on Parrot. People writing in Ruby, Perl6, Tcl, and eventually even Perl5 will immediately be able to use your work.

Writing a library wrapper is relatively easy and straightforward, although there isn't a lot of good documentation about the process. Actually, that would be another cool thing for newbies to do: take a look at the documentation we do have and help point out areas that are confusing or short on details. Tell us what's hard to understand and we can help improve things so other new users will be able to get started faster.

Pure-Parrot Module

Austin has been working on a new project called Kakapo, a library of runtime helper functions specifically for NQP. This is an example of another class of extension: a pure-Parrot library module. Pure-Parrot modules are written in PIR or in another language that runs on top of Parrot. These modules have the benefit that they don't need any compiled binaries besides Parrot, which is great for platform interoperability. Write once, use anywhere. This is the power of Parrot.

Parrot ships with a small set of libraries, although these are mostly utilities that Parrot itself needs for building or testing. There is plenty of room for new libraries that do all sorts of things. Need some ideas? Look at CPAN. There are thousands of libraries there which could be rewritten in PIR or NQP, which would immediately open them up for use with other languages as well. Writing cool libraries like this is a great way to get started using Parrot, and a great way to contribute back to the rest of the community.

Conclusion

The Parrot ecosystem is growing at an incredible rate. New language compilers, libraries, projects, and applications are springing up all over the place that use the tools Parrot provides. It's a compelling platform, and is demonstrably useful for a number of different types of projects. If you're interested in getting started using Parrot, non-core projects like this are a great way to get acclimated.

If you start a cool new project, or if you join a preexisting one, please let me know so I can highlight it on my blog.

Friday, October 30, 2009

Proper Matrices in Matrixy

I started a branch in Matrixy (its own little adventure!) to start working on improved integration with parrot-linear-algebra. Matrixy currently uses nested ResizablePMCArray PMCs to implement matrices. This carries a few problems with it and is a huge performance bottleneck.

When I started work on Matrixy, I had in mind to create classes for the fundamental things like matrices and vectors, but also for the other types like cell arrays and structs. My work at the time was very naive, and I expected to be implementing all the various numerical algorithms myself. It wasn't until Blair mentioned it to me that I really thought about using an optimized library like BLAS, although that brings with it its own hardships. Libraries like BLAS save us the effort of implementing all the operations from scratch, but require that we write glue code to interface with the library functions. Tradeoffs.

Nested RPAs were easy because they were a pure-PIR solution that could be made to work quickly. The PMC manages its own memory, so our only job was to manage the individual arrays. There is a certain amount of effort involved in keeping the matrix rectangular (every row the same length) when we resize (very expensive), and a certain amount of effort involved in marshalling data into BLAS API functions (very expensive), but we were able to get a lot of development done using the quick prototype.

In Parrot-Linear-Algebra we're providing PMC types that store data in a low-level C data buffer, in a format that is amenable to use directly in BLAS. Also, since it's a single rectangular buffer by default, we don't have to do anything special to keep it rectangular on resize: we create a new buffer of the necessary size and block-copy items from the smaller old buffer to the larger new buffer. This might not be the most efficient way to handle it, but since matrices can be preallocated quickly, it's not going to be a huge bottleneck, and a good developer can avoid it almost entirely. The new PMC type also provides all the necessary VTABLEs, so we can interact with it in a very natural way from PIR. Code that we were executing in large PIR functions before can now be performed using single opcodes or single method calls.

So the general plan for the branch is to change Matrixy to have a strict dependency on parrot-linear-algebra. I will cut out all the code that manages custom RPA types and replace it with (hopefully smaller and better) code to interact with NumMatrix2D PMCs instead. Along the way, I'll be adding METHODs to the PMC type to perform the interactions with BLAS that we were previously doing in PIR. This should provide us a significant speedup in most cases, clean up the code significantly, and get us on the right path for future developments. I've already added several METHODs and VTABLEs for various purposes: clone, transpose, attribute access, fill, etc. These all need testing and improving, of course, but I've been on too much of a roll with Matrixy to stop and do these things yet.

One cool method that I added last night was an "iterate function" method. It takes a Sub object, which it invokes on every element in the matrix and replaces the current value with the result of the Sub. Since Matrixy has a lot of functions that act independently on every item in the array, this was a pretty natural thing to add. Also, since we're doing this in C now instead of PIR, we open up all sorts of possibilities like automatically threading large requests for improved performance. I think Parrot's threads system has some work to do yet before this becomes a viable optimization, but it's still interesting to think about.

Here's a PIR example of iterating a function:


.sub main :main
 $P0 = new ['NumMatrix2D']
 .const 'Sub' helper = "add_to_each"
 $P0[1;1] = 4
 $P0[0;1] = 3
 $P0[1;0] = 2
 $P0[0;0] = 1
 # 1 2
 # 3 4
 $P0.'iterate_function_inplace'(helper, 2)
 # 3 4
 # 5 6
 $P0.'iterate_function_inplace'(helper, 5)
 # 8  9
 # 10 11
.end

.sub 'add_to_each'
 .param pmc matrix
 .param num value
 .param num to_add
 $N0 = value + to_add
 .return($N0)
.end

I haven't done any testing on this yet, but this is what it will look like when it's all working properly. There are a lot of functions that act item-wise on each element in a matrix, and this little helper routine is going to make those all much easier and cleaner to implement.

Thursday, October 29, 2009

ParrotTheory: Microprocessor Design

Long before I was involved in helping with the Parrot virtual machine, I was in school learning about and even designing actual microprocessor hardware. In my posts I nonchalantly talk about things like caches, pipelines, and hazards. Now I'm going to explain a little bit about what these things are and why they matter to Parrot (and all code, for that matter).

A typical production-grade microprocessor (not a toy or a colorful explanatory diagram in a book) is broken up into several stages. Each stage represents a portion of work or computation that can be done in a single clock cycle. I'll call these "clock cycles" for simplicity, but it's worth mentioning that there is not necessarily a strict relationship between the clock frequency and the instruction frequency. It's also worth mentioning that the shorter your stages are, the faster your clock can run: the clock period can be no shorter than the time it takes to complete your longest stage.

A simple, classic design like the original MIPS microprocessor is broken up into 5 distinct stages. I'll use a slightly simplified set here: Instruction Fetch (IF), Instruction Decode (ID), Register Lookup (REG), Execute (EX) and Write Back (WB). (The textbook MIPS pipeline actually reads registers during ID and has a separate memory-access stage, MEM, between EX and WB; in this version, memory access is folded into EX.) In the IF stage, the instruction is retrieved from memory. ID breaks that instruction up into a series of individual components like control signals, register names, etc. REG retrieves the values of the operand registers from the register file (an array of register values), based on the information in the instruction word. EX performs the desired operation and produces a result. WB takes that result and stores it back into the register file. Wash, rinse, repeat.

A given instruction is only being executed in one stage at a time, so in this example it takes 5 clock cycles for the instruction to execute fully, and 1 instruction is completed every 5 cycles. Those two statements may sound redundant, but I'll explain later that they aren't necessarily. Executing this way is a bit of a waste, because at any given time 4 of the 5 stages are empty and unused. To prevent this, microprocessors use a technique called pipelining, where we stagger execution to keep all the stages full. At any given time, each stage can be processing a different instruction. In this case, it still takes 5 cycles to complete 1 instruction, but we can now complete 1 instruction every cycle. That's up to a 5x speedup in throughput!

[Note: some readers will realize that I'm talking about a multicycle processor here, not the simpler single-cycle design. This is on purpose.]

The original MIPS design is a very simple example; modern SPARC, AMD, or Intel processors might have more than 20 stages.

We run into problems called pipeline hazards if we do things like try to read a value from a register that is being calculated concurrently by another instruction in the processor. In this case, the processor bubbles, or delays for a few cycles until all the necessary data is ready. A smart optimizing compiler can automatically rearrange instructions to help reduce bubbling. To see how, take a look at this sequence:

a = b + c
d = a + 2
e = f + g

In this sequence, the value a is used immediately after it is calculated. This will cause the processor to bubble for a few cycles until the value of a is ready. Remember, it takes 5 cycles to compute a single instruction, so the first instruction must complete fully before the second instruction can begin. This means that for 4 cycles, new instructions cannot enter the pipeline, being instead replaced with empty "bubbles". By simply reordering things a little bit, we can speed things up:

a = b + c
e = f + g
d = a + 2

Same exact code, but it now executes a little bit faster. This is because the instruction "e = f + g" does not depend on the value of a, so it can enter the pipeline before the value of a is calculated, filling one of the bubbles with useful work. Under the simplified model above, we're down from a total of 11 cycles to 10 to perform the exact same work.

[Note: I'm definitely simplifying. Don't shoot me!]

The EX stage, which is the actual workhorse of the processor, is relatively complicated in itself. It consists of two separate components, the arithmetic-logic unit (ALU) and the Memory Unit (MEM). Instead of being a series of stages in a row, the ALU is a set of parallel execution cores. Each core performs a different operation, so one may add/subtract, one may multiply, one may bitshift, etc. Data comes in and is copied to every core, every operation is performed on it, and a multiplexer is used at the end to select which value we want to keep. The MEM component can write or read data in memory at a given address. It's the MEM component that is usually one of the slowest parts of the processor, and serves as a huge bottleneck for the rest of the hardware. This is because MEM needs to interface with the cache and the RAM, which both may be running at a significantly slower clock frequency.

The ALU can be remarkably fast, because there are often very straightforward tradeoffs to be made between transistor count and speed. When you hear statistics about modern CPUs having billions of transistors, you'll know that they're trading more transistors for improved performance. I can talk about this more later if people are interested.

What we can do to make things even faster is to start doubling up on parts of the pipeline. We can fetch and decode instructions in large batches and pass them to multiple parallel EX cores. So long as we have the necessary hardware to detect hazards, we can essentially execute multiple instructions together at each stage, with multiple stages in process. Some modern Intel CPUs, for instance, may contain two ALUs and two specialized MEM units (one to read and one to write). This means they can execute two mathematical instructions and two memory instructions at the same time. A smart compiler will know to order instructions to maximize this parallelism. This design, which uses parallel pipelines in a single processor core, is called superscalar. When a compiler rearranges instructions to take advantage of parallelism or to avoid hazards, it is called instruction scheduling (on the original Pentium, arranging instructions so that two could issue together was known as instruction pairing).

Consider a modern processor running at 2GHz with 20 pipeline stages that can complete 1 instruction per cycle. That's 2 billion instructions per second, with as many as 20 instructions in flight at once, if everything is running at its maximum efficiency. Not too bad. However, keeping things maximally efficient is a difficult task. Reducing hazards is a great thing, but the processor can only run fast if it can keep its pipeline full, and it can only keep the pipeline full if it knows what instructions to execute ahead of time. When we look at a conditional in C, we may not know what instructions to execute next until we've already calculated the value of the conditional:

if(a == 0) {
    a = 1;
} else {
    a = 2;
}

If the comparison a == 0 is in the pipeline now, we don't know which instruction to load next: a = 1, or a = 2. Without knowing, we would have to wait up to 20 cycles for the comparison to complete before we could load the next instruction into the pipeline. When you consider more stages and more parallel pipelines, having to stall and wait for a result to be computed can cost a lot of processor time. You can start to see huge performance decreases from these kinds of situations. To help resolve them, we have branch predictors.

Branch predictors are specialized pieces of hardware that attempt to predict where control flow will go, and use that information to speculatively load the pipeline with instructions. If the branch predictor is correct, things chug along like normal and everything is fast and wonderful. If the branch predictor is incorrect, we need to flush the incorrect instructions out of the pipeline and start fetching the correct ones. This can be expensive: we need to completely refill the pipeline, which may mean fetching more instructions from slow-moving memory.

I haven't made much mention of it so far, but memory caching is a huge deal. Memory runs at a much slower speed than the processor does. To keep the processor moving quickly, we bring small blocks of memory into a small bit of fast-access storage inside the processor called the cache. The cache is small, but information in it can be accessed very quickly. If we try to access memory that isn't in the cache, we need to stop and load that data from memory into the cache before we can continue. This is called a cache miss, and it can be very expensive. If the data isn't in RAM either, because it has been paged out to disk, we have what's called a page fault, which is more expensive still, but discussing that is well beyond the scope of this post.

If the branch predictor is wrong, we need to flush the pipeline, and if the correct instructions aren't already in the cache we take a cache miss on top of that. This can become prohibitively expensive, with slowdowns of 100x or more when misses start piling up. This is why your 3GHz quad-core processor sometimes grinds along like an old 486: poorly written code is causing all sorts of hazards and misses, and drives efficiency down to the floor. It's not the processor that's slow; it's the processor waiting for the pipeline to refill, or the cache to refill, or even the virtual memory manager to load the correct page from disk. It's the waiting that's a huge performance killer.

In a lot of cases, doing more can actually be better for performance. Consider this code, for instance:

if(a != NULL) a = NULL;

This is very straightforward, and only does the work of setting a to NULL if it isn't already. What you don't see is that this is probably a net performance loss: if a is unpredictably NULL or not, the branch predictor will guess wrong a large fraction of the time, and the comparison already has to bring the value of a into the cache, so the compare-and-branch is pure overhead before we even get to the store. In this case, it is likely much faster to just write

a = NULL;

But then again, that's not the whole story. It's not just a matter of keeping the pipeline full; it's also a matter of how much effort it takes to read from and write to the cache. If the unconditional store dirties a cache line that then has to be written back to main memory, we'll waste a lot more time than we would have with the conditional. There are always tradeoffs to make, some of them painful.

In other cases, it's better to write more code to be faster. Here's an example:

int i;
void *a[8];
for(i = 0; i < 8; i++) a[i] = NULL;

We could try to make this faster by telling C that i can avoid memory entirely, by declaring it as "register int i", but that's a small optimization at best (modern optimizing compilers usually keep a loop counter like this in a register anyway). The loop is going to cause at least one branch misprediction, either entering the loop (not knowing the first time to jump back to the top) or exiting the loop (not knowing when not to jump back to the top). By unrolling the loop, we can make things much faster:

a[0] = NULL;
a[1] = NULL;
a[2] = NULL;
a[3] = NULL;
a[4] = NULL;
a[5] = NULL;
a[6] = NULL;
a[7] = NULL;

I don't want to talk about all potential optimizations here; that would be a huge list. But by being conscious of the kinds of factors involved in performance, I hope other developers will be able to spot situations where code may be far less efficient than it could be.

Wednesday, October 28, 2009

CS Education

Let me preface this post by saying that I don't have a CS (Computer Science) degree. I majored in Electrical Engineering for my undergrad and Computer Engineering for my master's degree.

Yesterday I read a very interesting series of blog posts about the state of CS education. First up was "Joel on Software" with a post about how lousy CS education is. Next up was "Stochastic Geometry" with a post rebutting Joel. Finally, Ovid had a nice post about his reactions to both.

I can tell that the state of computer engineering education is lousy, and I have heard enough from professors and students to know that the state of computer science education is lousy too. It's hard to really put a finger on what exactly is wrong with the system, however. First, let's distill the good points of what Joel has to say:

Students aren't doing enough team work
Students aren't good at time management
Students don't write large programs
Students don't have to spend much time debugging, stabilizing, securing, or maintaining software in the long term.

These points are all true, and go largely unaddressed by Stochastic Geometry. On the flip side, however, let's look at what the response had to say:

Any particular tool is unimportant compared to learning a class of tools
Learning the theoretical fundamentals is important, because it allows the student to learn specifics more easily.
Teaching one particular methodology is harmful, because there are no standards and no proof that any one methodology is beneficial or even widely used.

Since starting my job, I've had the privilege of participating in a few interviews of new applicants. Most of the jobs we've been looking to fill are entry-level, so many of our applicants were fresh out of school. We had one candidate in particular with a dual major in CS and CE from a school that, I thought, had a decent program. He had a nice resume and a decent GPA, so we were hopeful.

We started with some low-ball sanity questions. First up: "How many bits are in a word?" There are a few good ways to answer this one, and while it may seem esoteric it is profoundly important when talking about embedded systems. Savvy readers will realize that this is sort of a trick question. The term "word" in computers has a number of different meanings depending on context. An acceptable, though naive, answer would have been "16". A better answer would have been something like "it depends on the system. 32-bit computers have a 32-bit word, 64-bit computers have a 64-bit word, etc". A great answer would have included the previous one, and included "though in some programming spheres, 'word' is defined to always mean 16-bit and double-word (32) and quad-word (64) are defined to mean other things, regardless of the size of the machine word". All of these would have been acceptable. His answer: "I don't know."

We were stunned, and tried to give him some help, but he simply didn't know the answer. So we moved to a less-difficult question, to try and coax some information that was buried a little too deep: "How many bits in a byte?". He didn't know that either. Thanks for coming in today, but the interview is now over.

Without fail, without any exceptions, the best potential applicants I've seen for my place of work, and the best software developers that I've known from other contexts, have significant programming experience outside of school. Sometimes this comes from independent learning and small personal projects. Sometimes it takes the form of Open Source Software contributions. I make no secret of the fact that my participation in Parrot has had one of the biggest beneficial effects on my skill as a coder. I am far more influenced by my work in Parrot than I was by anything I did in school. I'm more influenced by it than by my full-time job. It really is an immeasurably wonderful environment to work in, and gives the kind of practical immersion that a university degree really can't provide.

One applicant we talked to listed PHP on his resume. I asked, "Where did you learn PHP, in school?". To which he replied, "No, I taught it to myself."

"Why?"

"Just because I was interested in it and wanted to know how it works."

That applicant got the job, and has proven to be very successful at it. It didn't hurt that he knew how many bits were in a byte.

There was a time when people went into computer programming for the money. Back in the heyday of Silicon Valley and the dot-com bubble, there was money to be made hand over fist for any programmer who was able to dip into it. Reality has set in now, and this isn't the field to get into if you're only looking for some easy cash. In fact, if you don't stand out from the crowd in some way, you're more likely to be unemployed. I graduated with a lot of other EE and CS majors who settled for jobs in sales or marketing when they couldn't find a tech-related one. I knew one Masters degree recipient who had to work at Target for a while because there were no jobs for him whatsoever.

Every recent college graduate will tell you that almost every single job listing they find requires at least one year of experience. The naive will say "How am I supposed to get the experience in the first place, if I can't get a job that requires prior experience?" In some cases internships can go a long way towards greasing those skids. However, the bigger question is this: If you're fresh out of school and have all the fun facts and theory fresh in your mind, why do you need experience? You need it because school doesn't really prepare you for a job at all: it prepares you to prepare yourself to get a job.

I don't want to scare anybody away, but I do need to make this point: if you aren't good at it, you won't easily find a good job programming or web developing. Coming out of college, you will not be any good at it if you don't have any other experience. This is an indisputable fact: a college education simply does not and cannot prepare you for a real job in the real world of computer programming. You're going to have to put in the extra effort yourself. You're going to have to fill in the gaps yourself. You're going to have to learn the important lessons yourself. College made you capable of learning; now the onus is on you to actually learn. Going back for a masters degree is a good idea. Internships are good. Doing little contracts for small local businesses can be good too. Classes and training courses are good, and they happen in all sorts of places at all times of the year. Participating in an open-source software project is, in my estimation, the best.

The last option is my favorite, but so many people shy away from it: "I don't want to spend my free time writing code". To this I have two replies. First, if you don't want to write code, why are you pursuing a career in it? Second, what are you going to say when your future boss hands you a blackberry and tells you that you are on-call 24/7 and are expected to jump anytime the system goes down? Or when your boss tells you that the entire team is expected to put in 10-20 additional hours of overtime per week, every week, until the next release?

So in conclusion, my point is this: a college education, especially in the fields of CS or CE, will teach you the basics and the fundamentals, and put you into a position to learn new things and adapt to new work environments. It will not directly prepare you for a good job as a programmer. What a college education does (and this is extremely important, even if it doesn't sound like it) is prepare you to learn the new things you will need to be a programmer. You have to do the learning yourself. Joel is right to point out that students are coming out of college woefully unprepared for work. He is also right to point out that the problem belongs to the students themselves. Stochastic Geometry is right to point out that colleges are doing what they should, and that teaching the fundamentals is very important. The missing link is that the students need to go through the last few preparation steps themselves. Students need to get the extra real-world experience themselves to complement the fundamentals they've learned in college. Participating in a good open source project will teach you the necessary lessons, will look great on a resume, and will act as that final bit of preparation to get you into the good jobs you want.

Blog Closed
