Blog Closed

This blog has moved to Github. This page will not be updated and is not open for comments. Please go to the new site for updated content.

Tuesday, August 31, 2010

Parrot Foundation Board

Today at the weekly Parrotsketch meeting there was also a meeting of members of the Parrot Foundation, with the aim of electing board members for the coming year. All but one of the current board members chose not to run for re-election, so it was a pretty open field.

Four candidates were nominated, and today all of them were elected by simple majority: Jerry Gay (particle), Jim Keenan (kid51), Jonathan Leto (dukeleto), and myself.


Here's a brief discussion of what I want to do this year as a Parrot Foundation board member:

  1. Re-read (and re-re-read) the foundation bylaws. I read through them a while ago, back when the foundation was first forming, but I don't think I've looked at them since. Plus, I've read through so many sets of bylaws for so many young organizations in my time that the details all run together in my mind.
  2. Money. The foundation doesn't have a whole ton of money; of course, I can't think of too many non-profits of this size which do. There are lots of things that can be done with money, grants and bounties being two that immediately come to mind, but there are plenty of others. Finding ways to raise money and increase the PaFo coffers should probably be a pretty big priority for us.
  3. Create a proper membership committee. Parroteer "smash" has been running the elections and doing a fantastic job, but some problems have been exposed in the membership process. Many people do not know whether they are members or not, and many people are confused by how a person becomes a member. Clearing up confusion in this area will help everybody.
  4. Recruit new people. People are the fuel that runs an open-source project, and you can never have too many people. Parrot is by no means a small open source project, but it is far from being a big one. More developers create more/better software, which attracts more end users, which increases the prestige of the project, which attracts more developers, which.... It's a feedback loop that we should be trying much harder to feed.
These are just four things that I would like to focus on this term; it is not a definitive list. I am hoping to get, within the next few days, some information from the current board members about what kinds of tasks they have left unfinished, or what kinds of things they would try to accomplish if given another term. Since there is so much new blood on the committee, we need to make sure to thoroughly pick the brains of the current board so that we can get this coming year off to a running start.

Sunday, August 29, 2010

Code Anthem: What Can Change

Friday night I wrote an introduction to a site called Code Anthem, which tests the skills of a programmer and uses those results to help employers find talented individuals. As I said then, I think the idea is a pretty awesome one, and I think it has a lot of potential to help employers with the tricky technical screens that so many companies get wrong.

Today I'm going to talk a little bit about what I think Code Anthem can do differently, and how they can grow the service into something that would have real, industry-wide value.

The tests at Code Anthem tend to focus pretty narrowly on algorithms: determine whether one array is a subset of another, determine if a string is a palindrome, calculate the perimeter of a polygon, validate a password. These are all relatively simple things, and once you know the necessary algorithm it tends to be a simple matter of translating it into the target language. With a little bit of a brush-up to re-familiarize myself with the core classes and methods, I could probably do as well in the Java test as I did in the C# test, and I absolutely would not consider my mad Java skillz to be at the same level as my C# abilities. I was decent with the language in college when I was using it, but skills fade if you don't use them, and I have certainly not been using Java.

A test for problem-solving and basic language syntax tests exactly that: problem solving and basic language syntax. One question involved calculating the Fibonacci sequence, and while my solution was pretty efficient, I could easily have used the classic, naive recursive implementation and thrown performance considerations to the wind. Would this provide the correct answers to the test inputs? Yes. Would this be an acceptable answer? Absolutely not. So, testing that I can solve a problem which has several known solution algorithms doesn't necessarily mean that I will pick a good one, and doesn't necessarily mean that I've solved the problem well.
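
To make the contrast concrete, here's a quick sketch of both approaches. The signatures just mirror the int fib(int idx) style of the snippet below; they aren't the actual wording of any test question:

int fibNaive(int idx) {
    // O(2^n): recomputes the same subproblems over and over again
    if (idx <= 2)
        return 1;
    return fibNaive(idx - 1) + fibNaive(idx - 2);
}

int fibIterative(int idx) {
    // O(n): only ever keeps the last two values around
    int previous = 0, current = 1;
    for (int i = 1; i < idx; i++) {
        int next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}

Both return the same answers for small inputs; only one of them is an answer I'd want to see from a candidate.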

Looking at this another way, I know that the recursive solution (which is perfectly acceptable input) starts to go hell-crazy above a certain input level. Assuming that Code Anthem isn't evaluating solutions on a supercomputer at the NSA, it's pretty reasonable to assume that they aren't going to be testing the solution with inputs above 30ish, and absolutely no inputs above 45. With this devious knowledge in mind, I don't need an algorithmic solution at all; all I need to do is provide an answer that works for the set of inputs that they can reasonably use to test:

int fib(int idx) {
    return (new int[] {1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...})[idx - 1];
}

Again, not an acceptable solution to the problem under any circumstances, but it does work and it does require a little bit more insight than the recursive solution does.

Any programmer who really thinks that the naive recursive solution to the Fibonacci problem is an acceptable one should fail the test no matter how accurate the results are. It may be technically correct, but it is an example of extremely poor problem solving, lazy coding, and a complete disregard for real-world performance issues. It would be relatively trivial to run tests in a sandbox thread with a timed kill-switch: if the timer goes off before a solution is found, the complexity must have been too high and the result is a failure.
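
Here's a minimal sketch of what I mean, assuming a C# test harness; the TimedRunner name and the one-second budget are my own inventions, not anything Code Anthem actually uses:

using System;
using System.Threading.Tasks;

static class TimedRunner {
    // Run the candidate's solution on a worker task; if the time budget
    // expires first, the submission fails regardless of eventual correctness.
    // (A real sandbox would also abort or isolate the runaway work.)
    public static bool RunWithTimeout(Func<int, int> solution, int input,
                                      int expected, TimeSpan budget) {
        Task<int> task = Task.Factory.StartNew(() => solution(input));
        if (!task.Wait(budget))
            return false;               // timer went off: too slow, fail
        return task.Result == expected; // finished in time: check the answer
    }
}

With something like TimedRunner.RunWithTimeout(fib, 45, 1134903170, TimeSpan.FromSeconds(1)), the naive recursive version blows the budget even though its answers are technically correct.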

I've digressed a little bit: If you're only testing problem-solving and an ability to dump a solution from your brain into the syntax of the target language, you're not testing much. I've known a lot of really lousy programmers who would be able to solve many of these problems, even if the quality of the final solution was severely lacking.

When I taught Embedded C coding in college, I used to hear from students all the time that their grades should have been higher because "it compiles", as if that were always good enough. The reality is that compilation is the milestone that separates a file of source code from a file of unintelligible gibberish. Just compiling means that what you wrote is software, not that it's correct, robust, maintainable, or even acceptable. To earn those other distinctions, more work beyond simply getting the syntax right is required. The Code Anthem test checks a slightly higher level, but not by much: it can show that the code compiles and produces the correct results, but it cannot show that the code is "good", nor that the programmer is "competent", "capable", or "worth hiring".

C# is a pretty rich language, even if you stick with the popular 3.5 version and not the new 4.0. Of course, 4.0 doesn't add a whole hell of a lot to the core language, so we can really ignore it for now. Features like contravariance of generic types are certainly nice, but the new version doesn't have huge market penetration yet so we can't expect most people to know about it. Testing for a basic ability to write functions and loops, and a basic understanding of core types really misses out on many of the important features of C#, and really does nothing to separate the coders who don't know the language well from those that do.

A C# or Java test which does nothing to test knowledge of object-oriented concepts, such as classes and inheritance, really isn't a comprehensive test of those languages. If you can't use inheritance, don't understand the difference between is-a and has-a relationships, and don't know what keywords like "interface", "abstract" and "virtual" do, you really don't know C#.
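
As a small illustration of that vocabulary (the types here are hypothetical, not drawn from any actual test question):

interface IShape {                    // a contract: "this thing can compute an area"
    double Area();
}

abstract class Polygon : IShape {     // a Polygon is-a IShape
    protected double[] sides;         // a Polygon has-a collection of side lengths
    public abstract double Area();    // subclasses must supply an implementation
    public virtual string Describe() { // subclasses may override, but don't have to
        return "a polygon with " + sides.Length + " sides";
    }
}

class Rectangle : Polygon {
    public Rectangle(double w, double h) { sides = new double[] { w, h, w, h }; }
    public override double Area() { return sides[0] * sides[1]; }
}

A test that never asks for anything like this can't tell whether the candidate understands these relationships at all.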

In C#, if you don't understand delegates, you could get into trouble pretty quickly. Confusing a method group identifier with a delegate is an easy and confusing mistake for inexperienced programmers to make. Not understanding the difference between normal and multicast delegates can lead to some very weird runtime effects. Not understanding anonymous delegates, closures, and dynamic invocation really separates the entry-level C# coders from the advanced programmers and gurus.
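
A compact, made-up example of those delegate features, since they are exactly the kind of thing the current questions never touch:

using System;

class DelegateDemo {
    static void Main() {
        // Method group vs. delegate: Console.WriteLine is a method group;
        // assigning it to an Action<string> creates a delegate instance.
        Action<string> log = Console.WriteLine;

        // Multicast: += chains a second target, so one call invokes both.
        log += s => Console.WriteLine("again: " + s);
        log("hello");

        // Anonymous delegate + closure: 'counter' is captured, not copied.
        int counter = 0;
        Func<int> next = delegate { return ++counter; };
        Console.WriteLine(next());               // 1
        Console.WriteLine(next());               // 2

        // Dynamic invocation through the delegate, without a compile-time call.
        Console.WriteLine(next.DynamicInvoke()); // 3
    }
}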

Beginner C# coders will know the basics about syntax: classes, functions, loops, variables, operators. Median C# coders will be using interfaces, inheritance, delegates, exceptions, and built-in generics. Good C# coders will be doing all the previous things correctly and at a higher level, plus mixing and matching all these tools and techniques to solve complex problems.

All of the tests in Code Anthem (at least those that I saw) gave the test-taker a function prototype and some instructions, and asked for the body of the function to be filled in. A different, more comprehensive type of problem would give an interface definition and ask the user to implement an entire class. An extremely comprehensive test would present many types of questions:

  1. Implement a function to have a particular behavior, given a signature and a description.
  2. Implement a class to provide a specific interface, given the interface and a description of behavior.
  3. Given a piece of pre-written code with bugs, identify and fix the bugs.
  4. Multiple choice questions about language features which would be hard to test practically.
You'd probably want to put some kind of time limit on those multiple choice questions, but there are some things that you're not going to really be able to test without them.

There are plenty of ways to run this kind of a test. You could do it like a game show, such as "Who Wants to Be a Millionaire": people start with a score of zero, and gradually move up the scale as they answer harder and harder questions. The really good ones might have to spend some time on the test, so it would be good to break it up into stages that can be tackled at different times.

Another option would be to break up the test into several smaller tests focusing on different issues. Some tests could even be language-agnostic, such as figuring out how to solve particular problems where language-specific syntax or semantics doesn't play much of a role in the solution.

What Code Anthem does now is a basic test to weed the programming bozos from the non-bozos. This is certainly a good first step, but it really doesn't do anything to differentiate the coders who are good at what they do from the true crème de la crème of potential applicants (and everybody in between). Weeding out the obvious bozos is still a good service to render, but it falls short of the potential that a system like this has.

Friday, August 27, 2010

Code Anthem

I'm always trolling around the internet for cool new ideas and blogs about programming. A while back I found the blog for a new site called Code Anthem and started following it. The Code Anthem blog gives an often-refreshing look at programming in the workplace, with specific focus on hiring and getting hired.

Code Anthem isn't just a blog though. Recently they've started beta testing on their pay service, a coding test system that's designed to work something like an online portfolio. It works like this: Coders take standard tests in one of a variety of programming languages to gauge skill levels. Employers come to Code Anthem and ask "I want a programmer with X skill level in Y programming language". Code Anthem then matches up coders with that skill set in that geographic area with the prospective employer. Everybody wins. Well, not the really bad programmers, but I suspect that they never win.

Yesterday, as part of the private beta, I took the test in C#. They're only offering Java and C# for the beta test so far, and of the two I am much better with the latter. I went in with pretty high hopes, since I write a lot of C# code at work and consider myself to be pretty damn good at it. I logged in, answered 5 questions, and scored a solid 3/10.

Well, that sucks in a major way. I mean, I don't think I'm the best C# coder that ever graced the language, and I maybe rushed through some of the questions faster than I should have, but I have to believe I am above the 30% mark. With my pride a little wounded and a few spare moments opening up, I decided to re-take the test. Through the next 5 questions I was much more careful, took my time writing the solutions, and was sure to test each solution to verify the results. At the end I looked over everything one last time, clicked submit, and ... 3/10.

Damnit. Maybe I suck much worse at this whole programming gig than I thought I did. Maybe I'm really bad at it. Like, "time to change careers" bad, or maybe "time to join the army" bad. It was really quite a shock, and for a few moments I got kind of upset about it. I quickly got over it, and did what I think any good programmer should do in the face of failure: start debugging.

I didn't save my answers to all my questions, but did have a few and I started rigorous testing. I also had taken some notes about the various questions: some were worded in particularly difficult or vague ways, and I had made note of those things as I went. I put together my findings into a nice email and sent it off to the powers that be. After all, the point of a beta test is to find bugs and elicit feedback.

Some of the answers I had originally gotten wrong exposed bugs in the test and the results were flipped in my favor. My score didn't change too much though because the test is adaptive: as you get questions wrong it gives you easier questions later. So even though I was getting some results changed, they were the low-tier easy questions and didn't improve my scores much. What would have been needed at that point would be for me to go back and retake the test from scratch, getting positive results at the beginning which in turn would lead to harder but more valuable questions as the test progressed, improving my score substantially.

What happened instead was that Code Anthem realized that mean scores were far too low, and they fixed a few things. First, they fixed some bugs in some tests, then re-evaluated the relative difficulty (and therefore value) of tests based on how often they were legitimately answered correctly, then they recalibrated results to move the mean score up from about 30% to 50%. The end result is that my paltry 3.0 turned into a hefty 9.0, much more in line with the result I was expecting to get from the beginning. Of course, this score will probably be wiped when the beta ends and the real test gets put in place, but it's nice to see such a high number now anyway.

In the end, what do I think about Code Anthem? For starters, I think it's a really awesome idea. I've been on a number of job interviews, mostly two years ago when I graduated and was looking for my first real job. What I remember most about them is that they were almost laughably non-technical.

One interview I had was a phone interview, and I distinctly remember because of the awkwardness of trying to "write" code to an interviewer over the phone (Actually, the thing I remember most about this particular interview was a terrible blunder I made in describing the purpose of the POSIX "fork" routine, a mistake that I might not ever forgive myself for).

A second interview I went on had a written programming test that I remember as a very vaguely-worded "real world" question: a problem that the engineer administering the test had run into not long before the interview, and hadn't taken the time to formulate into a test-worthy question. I spent a little bit too much time writing out a stack class that I would have needed to solve the actual problem (the test was in embedded C, so I didn't have a fancy "stack" type to play with), and when time got short I ended up just verbally describing my solution to the guy.

Another interview I went on featured a short technical phone screen. When I got to the in-person interview, I was given a test mostly involving mathematics with very little programming. It was a company specializing in numerical computing, so it's not strange that they focused so much on my math skills.

In terms of technical interviews, these three examples are really the most rigorous demonstrations of my ability that any company saw before making a hiring decision. The only other information they would have had would be my transcripts, my resume, and the short descriptions of some past projects I gave verbally to interviewers. There would be absolutely no way for them to know, given such little reliable information, what I or any applicant would really be capable of. Saying that I worked on such-and-such a project when I was in school, or that I am "proficient" in languages X, Y, and Z, is a really uninformative and subjective personal assessment, and hardly a reliable measure of my value to a company.

So this is where Code Anthem shines. They provide answers to a very fundamental question: Can this person take a set of specifications and provide a sample of code that works? I think this is a very cool idea, and I would be very happy if the concept gained traction in the industry. It certainly is a huge improvement over the current system where professionally-crafted resumes, awkward phone screens, and off-the-cuff written tests are used to make hiring decisions which could potentially have huge effects on the company.

Of course there are some things that I think the site could improve. I'll write some critiques in another post.

Monday, August 23, 2010

PLA Planning: Next Steps

I did a little bit more infrastructure work on Parrot-Linear-Algebra this weekend. I wasn't able to get a lot done because we spent Saturday with family, but I got some nice groundwork laid for important future developments.

I started thinking about bindings for the LAPACK library, and much of the work I did this weekend was in preparation for that work. LAPACK is particularly tricky because it's written in FORTRAN and, unlike the ATLAS library I use for my BLAS implementation, there aren't any good C bindings to LAPACK that I can find. Yes, there is the CLAPACK library, but I am not a particular fan of the way that library is generated, and there aren't any premade packages for CLAPACK on Ubuntu. When installing PLA on Linux, I would like users to be able to get all prerequisites from their distribution's package manager and not have to compile these other prerequisites manually. This is a good thing, since some of them can be extremely tricky to compile.

I've decided that, in the foreseeable future, we will be linking to the LAPACK library directly. This does mean that we are going to need to write our own bindings and wrappers for the functions we want to call, but that's not a huge issue. Plus, I can start moving forward on something I've wanted to have for a while now: more flexible and configurable interfaces to BLAS and LAPACK. There are several implementations of each of these libraries (BLAS especially), and I want PLA to support as many of them as possible.

To get these kinds of flexible interfaces, we're going to need to support large swaths of conditionally-compiled code. This will get extremely unwieldy if we try to keep all the logic in the few monolithic PMC files I've been writing, so I decided that we're going to need to upgrade our build machinery to support linking additional C library files into the dynpmc build. This weekend I added that machinery, and started the process of factoring out some of the supporting routines from the various PMC types into those files. With common functions defined in one place, I can go through and make sure all my types are making consistent use of them. Plus, I can start writing up complicated library bindings without worrying about making my large PMC files any larger.

BLAS doesn't actually supply too many routines that the PLA matrix types need. The most important, and the only one I call so far, is the GEMM routine, which performs the generalized matrix multiply/addition operation:

Y = αAB + βC

Where α and β are scalar values (FLOATVAL in NumMatrix2D and Complex in ComplexMatrix2D), and A, B, and C are matrices of compatible dimension. This is a very powerful operation, and really the only one from BLAS that I need. Other routines from BLAS are more highly optimized versions of this same routine for specific types of matrices (diagonal, triangular, etc), and operations between matrices and vectors which I really don't support yet. Because there is really only this one routine, and because it's so powerful and common, I've included it as a method directly in the PMC types themselves. I may rethink that decision later, but for now it seems like a decent fit where it is.
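
As a rough illustration of what the routine computes, here is the naive triple loop, written in C# purely for readability; the real BLAS implementations (and PLA, which just calls them) are far more clever than this, and the Gemm name here is made up:

static double[,] Gemm(double alpha, double[,] A, double[,] B,
                      double beta, double[,] C) {
    int n = A.GetLength(0), k = A.GetLength(1), m = B.GetLength(1);
    var Y = new double[n, m];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i, p] * B[p, j];        // (A*B)[i,j]
            Y[i, j] = alpha * sum + beta * C[i, j];
        }
    }
    return Y;
}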

LAPACK is a little bit different, however. LAPACK contains hundreds of routines, all of which are named in an idiosyncratic and difficult-to-understand way. These routines are broken up into drivers, computational routines, and auxiliary routines. The driver routines perform the most important operations and typically call sequences of the lower-level computational and auxiliary routines to perform their work. Driver routines include calculating linear least squares, finding eigenvalues and eigenvectors, and singular value decomposition. These are the ones that PLA is going to target first, though access to other, lower-level routines might be nice to have as well, later down the road.

I am torn as to how I want to provide LAPACK bindings in Parrot. The obvious thing I could do is write them all up as methods on the PLA matrix types. This is the system that provides the easiest access, but it also starts to get bloated and unwieldy very quickly. Plus, if you don't need LAPACK routines for your program, I would be forcing you to load the LAPACK library and create Sub PMCs for all those methods. That can get to be a pretty large cost to pay. I really don't like this idea because it adds huge bloat to my PMC files and adds the extra runtime overhead of creating all sorts of extra method PMCs when we load PLA. Plus, it doesn't scale well: any time we want to add a new LAPACK-based routine, we need to add a new method to several PMC types, present and future.

One alternative is to provide a separate library of functions and dynamically load it using Parrot's NCI tools. I don't know what the current status of the NCI system is, but without significant improvement there, attempting to load a complex library like LAPACK, even if I provided some simplified Parrot-specific wrappers, would be a nightmare. This option does provide a little bit more flexibility because we can load in the methods that we want without having to deal with ones that we don't. Also, I don't think the distutils library that PLA is using supports creating standalone shared libraries that aren't dynpmc or dynops libraries, so I would have to write all that machinery myself. I'm not sure I really like this option either, for all these reasons.

I'm also torn about whether I want to make some prerequisites for PLA optional. Kakapo, for instance, is only used for the test suite. If you don't have Kakapo installed you can still build and use PLA, you just can't run the test suite. BLAS is currently required because it's used to implement the GEMM method but also the various multiply vtables. I suppose I could make it optional, but I think it would be a little sneaky if different builds of PLA supported a different list of VTABLEs and METHODs for core types, including for something as fundamental as multiplication. LAPACK is a little bit different, because even without it we do have a pretty solid amount of core functionality. I am strongly looking into an ability to make LAPACK an optional prerequisite, and only include bindings for it if we have the library installed. If I wrote a dynamic library wrapper for LAPACK and had the user include that file manually to get those routines, you could still use the core library.

I had also considered the idea (though I've been fighting it) of building some kind of PLALibrary singleton PMC type which would control this kind of stuff. In addition to providing information about library version and configuration options, a PLALibrary PMC could allow dynamic loading of methods, and could forcibly inject those methods into the necessary PMC types if requested. This would allow me to build everything together into a single dynamic library without forcing me to define all the methods upfront which I don't need. I may like this idea the best, though I've been resisting creating a library singleton PMC until now.

There's a lot to consider, and I probably won't make any hard decisions about future plans for the next few days. In the interim I do want to put together some proof-of-concept wrappers for some common LAPACK routines, and start playing with those. If I could get routines to calculate Eigenvalues, Eigenvectors, Inverses, and Singular Value Decompositions, I think that would be a pretty great start.

Saturday, August 21, 2010

GSoC Threads: Chandon's Results

The "pencils down" date for the 2010 GSoC program has come and gone, and now we're starting the process of sorting through the rubble and seeing what got accomplished this summer. I have been purposefully trying not to provide too many of my own status updates for Chandon's project this summer, instead wanting him to post his own updates as he went without me getting in the way. It's his project after all, and I wanted him to get all the credit for all the good work he was doing.

The project is now over, and he's posted a nice final review of what he managed to accomplish this summer. While it isn't everything that he mentioned in his proposal, it still is quite a nice step in the right direction and a lot of the necessary framework is now available to bring threading to Parrot in a real and usable way.

So what did he manage to do? In his branch there is now a more-or-less working implementation of green threads, which means Parrot is getting onto the same level of capability as modern Python or Ruby implementations. You can schedule multiple independent tasks to run and, while they cannot run truly in parallel on separate OS threads or even on multiple processor cores, you can get the illusion of concurrency. It also gives you the ability to run multiple tasks together without having to explicitly schedule them one after the other, or having to manually switch back and forth between them. In a later post I may go into some detail about how this all works, and how it can be used by programs.

This is a pretty significant step forward, and once a few of the final nits get worked out I think we will be able to merge this into Parrot trunk and start making use of the new functionality immediately. However, green threads are only one half of the promised "hybrid threads" that Chandon's proposal hoped to achieve. The other half is the ability to run Parrot code in true parallel on separate OS threads. This was the larger part of the project and definitely the more difficult piece. Today I would like to talk a little bit about why it didn't get done, and maybe motivate some people to look into helping complete the work as we move forward.


Let's take a look at a small snippet of normal object-oriented code:

foo.bar();
foo.bar();

Extremely simple: we have an object "foo" and we are calling the "bar" method on it. In a very naive statically-typed language this would be a simple operation. The compiler would determine the type of foo, calculate the location of the bar routine in memory, and simply call that address twice. This would be extremely fast to execute too, which everybody likes. This would basically be converted to this low-level pseudo-code:

call bar(foo)
call bar(foo)

Now let's move up to a slightly more complicated example: a statically-typed language which allows subclassing and method overriding. Now, the compiler doesn't necessarily know which function in memory corresponds to "foo.bar()", since foo could be of type Foo, or of type FooSubclass, or even FooSubSubclass, and the location of the appropriate bar function would change with each different type. However, for each type, the value of bar does not change. It can be overridden by subclasses, but it cannot be changed for the class itself. Now, the compiler needs to ask foo to get the appropriate method first:

method _bar = get_method(typeof(foo), "bar")
call _bar(foo)
call _bar(foo)

Assuming foo does not change type inside the call to _bar, this code works just fine. Next, let's look at a more complicated example still: the case of a dynamic language like Perl or Python. In these languages the class MyFooClass is an object of type Class, and foo is an object of type MyFooClass. Also, bar is not a compiled-in constant property of the Class, but is instead a mutable property of it. Inside the call to bar, maybe we change the definition of bar itself inside the class object. Likewise, the definition of routines like "find_method" inside Class can change. Accounting for all this dynamism, our code changes to look like this:

class fooclass = find_class(foo)
method _bar = find_method(fooclass, "bar")
call _bar(foo)
class fooclass = find_class(foo)
method _bar = find_method(fooclass, "bar")
call _bar(foo)

Keep in mind that everything can change inside each call to bar. We can:

  • Modify the inheritance hierarchy of MyFooClass to add or remove parent classes
  • Add or remove methods in MyFooClass, and in any item in its inheritance hierarchy
  • Add or remove methods on the object foo itself (like in JavaScript, where foo.bar = function() {...})
  • Change the class of foo
Let's bring this back to the idea of threading before I go too far off topic. Every time we call a method, or attempt to access a field on an object, we need to look those up in the associated Class object first. Those Class objects need to be global, because all code on all threads needs to be able to look up methods on any object, because we can pass objects between threads and we need to be able to work with them.

If classes are global, and need to be accessible from multiple threads, we inevitably run into the idea of contention: If one thread is changing the definition of foo.bar() at the same time that another thread is attempting to look it up, the thread doing the reading may end up with incomplete, corrupted, or incorrect information.

Global data is the enemy of good multithreading. Global data is a necessity for highly dynamic object systems like those supported by Parrot. Ergo, multithreading on Parrot is hard.


Look at all the global data in Parrot: there's the interpreter object itself, the packfiles, the list of Classes (and the methods/properties that they contain), the list of Namespaces (and the global values they contain), the context graph (and the contents of all registers in those contexts), the opcode table and the MMD signature cache. This is a hell of a lot of globally-available data, and we're going to need to find a sane way to limit corruptive concurrent access to these things. Not only that, but if we have to lock every single access to a global value, which means we need to obtain a lock every time we want to do a method call, our performance is going to be terrible.

By convention I think we can all agree that things like the opcode table should really not be modified when we have multiple threads running. But it's harder to establish a convention that class and method definitions should be treated as constant while multiple threads are running.

What we can do to alleviate some of the performance problems is to create short-term caches for things like Classes and Methods in local threads, with the understanding that without explicit synchronization, the exact ordering of read/write operations between threads cannot be guaranteed, and so they can be scheduled for maximum benefit. If the relative execution times of two operations on separate threads cannot be guaranteed, then it makes no functional difference to the user whether those operations happen at random times with respect to each other or whether Parrot deliberately orders them to improve caching.

Think about it this way: we have two threads, one writing a global value and one reading it. These threads are operating in true parallel on two separate processor cores. If we run this program a million times, sometimes the read will randomly occur before the write, sometimes the write will occur before the read, and sometimes they will happen at the exact same moment and cause corruption. Depending on a simple matter of random timing, we could get three very different results from our program. Now, suppose Parrot implements a heuristic: if a write and a read happen within a very short period of time, Parrot will deliberately order them so that the read occurs before the write. And if the read always happens before the write, maybe we don't read at all, but instead use a cached version. Then, only once the write has completed, do we update the cache.

Another thing to think about is that we could disallow direct lookups of data in global variables, and give each thread a local copy of global values. Considering the principle I mentioned above, we can be a little bit liberal with the way we handle writes and reads. If a thread modifies its copy of a Class, those changes could be propagated to a global reference copy at a convenient time, and then propagated down to the local copies later, only when it's convenient to do so. Remember, if things are happening at random times compared to each other, it makes no substantive difference to the thread whether its copy of a global variable is exactly up-to-date or whether it's a little bit lagged behind. That is, the reading thread doesn't know whether the writing thread had a legitimate delay before writing, or whether Parrot manually scheduled the write at a different time.
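
To make the idea a little more concrete, here is a purely conceptual sketch in C# (Parrot's internals are C, and none of these names exist there): each thread reads from its own snapshot of a shared table, writers publish into the shared copy under a lock, and snapshots are refreshed only when it's convenient.

using System;
using System.Collections.Generic;

static class MethodTable {
    static readonly Dictionary<string, Delegate> shared = new Dictionary<string, Delegate>();
    static readonly object sharedLock = new object();

    [ThreadStatic] static Dictionary<string, Delegate> local;

    public static Delegate Lookup(string name) {
        if (local == null)
            Refresh();             // first lookup on this thread: take a snapshot
        return local[name];        // lock-free read from the thread's own copy
    }

    public static void Publish(string name, Delegate method) {
        lock (sharedLock)          // writes are rare, so locking only here is cheap
            shared[name] = method;
    }

    public static void Refresh() { // called "when convenient", not on every read
        lock (sharedLock)
            local = new Dictionary<string, Delegate>(shared);
    }
}

A reading thread that hasn't refreshed yet simply behaves as if the write happened a moment later, which, per the argument above, it can't distinguish from ordinary thread timing anyway.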

To get a complete Hybrid Threads implementation in Parrot like what Chandon was envisioning, we are going to have a few steps to take:
  1. We have to break the Interpreter structure up into two parts: One which is truly global and contains data which all threads need, and one which is local for each thread and contains data that the thread needs. The local interpreter may contain a cache or a mirror of data in the global interpreter.
  2. We need to come up with a good sane scheme (which will likely consist of both technical solutions and programmer conventions) to manage access to global values
  3. We need to come up with a good sane scheme for sharing values between threads. Creating a local variable and passing a reference to it to another thread essentially turns it into a global variable and opens up problems with contention. We need a good way to manage this without slowing everything down.
All of these problems are solvable, and if we put forward a concerted effort we could have a decent hybrid threading system in Parrot within the next few months. We almost have a complete working Green Threads system ready to merge now, and hopefully that will be enough for our users while we get the rest of the details of the hybrid system worked out and implemented.

Friday, August 20, 2010

PLA: More New Methods

In a previous post I discussed some of the common behaviors that the matrix types in Parrot-Linear-Algebra implement. Recently, I've added a few more which I think are pretty cool and worth discussing.

First and foremost, I've finally added some methods to convert between types. There are now methods for convert_to_number_matrix, convert_to_complex_matrix, and convert_to_pmc_matrix. Every type implements all of these methods, even when the conversion would be a no-op. This is so that if you have a PMC which "does 'matrix'", you can cast it to the type you want to deal with without worrying too much about the expense. These methods always return a new matrix, so you can keep a copy of the matrix in its original form and also have a new copy to play with. In the case where the matrix is already in the target format, I create a clone.

Unfortunately, these conversion operations can be a little bit expensive when you're actually converting types. The problem is that the data for the matrices is stored internally in a very dense format. For the Number and Complex matrices, the data is stored internally in the format required by the BLAS library routines. For the number matrix, the values are basically stored together in a single large buffer. For complex matrices, the real and imaginary values are also stored together, alternating positions. Converting one to the other is not easy, since I have to allocate a completely new buffer and iterate over each element individually. So, too many conversion operations can get expensive quickly.
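
As a conceptual sketch of why (written in C# for brevity; PLA's actual code is C and uses its own buffer types, and this helper name is made up): turning a dense buffer of N reals into the interleaved real/imaginary layout means allocating a brand new buffer of 2N values and visiting every element one at a time.

static double[] ToInterleavedComplex(double[] reals) {
    var interleaved = new double[reals.Length * 2];
    for (int i = 0; i < reals.Length; i++) {
        interleaved[2 * i]     = reals[i];  // real part copied over
        interleaved[2 * i + 1] = 0.0;       // imaginary part starts at zero
    }
    return interleaved;
}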

Using these new conversion methods, I have updated some previous methods, like gemm(), the routine which performs the GEMM operation from the BLAS library. You can now pass any matrix types to this method, and it will perform conversions internally to ensure the types match. Here's a fun little NQP example that shows some of the capabilities of this library today:

INIT { pir::load_bytecode("pla_nqp.pbc"); }

my $A := NumMatrix2D.new();
$A.initialize_from_args(2, 2, 1, 2.0, "3", 4);
$A.transpose();
pir::say("A: " ~ $A);

my $B := ComplexMatrix2D.new();
$B.initialize_from_array(2, 2, [1, 2.0, "3+3i", "7+5i"]);
$B.conjugate();
pir::say("B: " ~ $B);

my $C := PMCMatrix2D.new();
$C.fill(4.4, 2, 2);
pir::say("C: " ~ $C);

my $D := $B.gemm(0.5, $A, $B, "1+2i", $C);
pir::say("D: " ~ $D);

You can try it yourself, or you can take my word for it that the result is correct. I've verified it this morning in Octave, and the results are the same (though the Octave script to produce this result is considerably shorter).

PLA is finally starting to get the kind of basic functionality and test coverage that I've been hoping for. With a few more finishing touches on this base, I'm going to start adding new functionality like LAPACK bindings. Specifically, I'm hoping to add in support for some common matrix decompositions, matrix reductions, inverses and eigenvalues. I'm also hoping to get started on the module for Rakudo sometime soon.

Thursday, August 19, 2010

PLA Test Suite Redux

I've been doing a hell of a lot of work on the Parrot-Linear-Algebra test suite in the past few days, and the results are quite spectacular. I've completely reworked the way the suite works, breaking a small handful of monolithic test files into a series of smaller, focused test files. Instead of describing it, I'll just show the results:

t/sanity.t .............................................. ok
t/pmc/pmcmatrix2d.t ..................................... ok
t/pmc/nummatrix2d.t ..................................... ok
t/pmc/complexmatrix2d.t ................................. ok
t/methods/nummatrix2d/convert_to_complex_matrix.t ....... ok
t/methods/nummatrix2d/row_scale.t ....................... ok
t/methods/nummatrix2d/get_block.t ....................... ok
t/methods/nummatrix2d/mem_transpose.t ................... ok
t/methods/nummatrix2d/convert_to_number_matrix.t ........ ok
t/methods/nummatrix2d/initialize_from_args.t ............ ok
t/methods/nummatrix2d/row_swap.t ........................ ok
t/methods/nummatrix2d/iterate_function_inplace.t ........ ok
t/methods/nummatrix2d/fill.t ............................ ok
t/methods/nummatrix2d/initialize_from_array.t ........... ok
t/methods/nummatrix2d/iterate_function_external.t ....... ok
t/methods/nummatrix2d/transpose.t ....................... ok
t/methods/nummatrix2d/item_at.t ......................... ok
t/methods/nummatrix2d/gemm.t ............................ ok
t/methods/nummatrix2d/row_combine.t ..................... ok
t/methods/nummatrix2d/resize.t .......................... ok
t/methods/nummatrix2d/convert_to_pmc_matrix.t ........... ok
t/methods/nummatrix2d/set_block.t ....................... ok
t/methods/complexmatrix2d/convert_to_complex_matrix.t ... ok
t/methods/complexmatrix2d/row_scale.t ................... ok
t/methods/complexmatrix2d/get_block.t ................... ok
t/methods/complexmatrix2d/mem_transpose.t ............... ok
t/methods/complexmatrix2d/convert_to_number_matrix.t .... ok
t/methods/complexmatrix2d/initialize_from_args.t ........ ok
t/methods/complexmatrix2d/row_swap.t .................... ok
t/methods/complexmatrix2d/iterate_function_inplace.t .... ok
t/methods/complexmatrix2d/fill.t ........................ ok
t/methods/complexmatrix2d/initialize_from_array.t ....... ok
t/methods/complexmatrix2d/conjugate.t ................... ok
t/methods/complexmatrix2d/iterate_function_external.t ... ok
t/methods/complexmatrix2d/transpose.t ................... ok
t/methods/complexmatrix2d/item_at.t ..................... ok
t/methods/complexmatrix2d/gemm.t ........................ ok
t/methods/complexmatrix2d/row_combine.t ................. ok
t/methods/complexmatrix2d/resize.t ...................... ok
t/methods/complexmatrix2d/convert_to_pmc_matrix.t ....... ok
t/methods/complexmatrix2d/set_block.t ................... ok
t/methods/pmcmatrix2d/convert_to_complex_matrix.t ....... ok
t/methods/pmcmatrix2d/get_block.t ....................... ok
t/methods/pmcmatrix2d/mem_transpose.t ................... ok
t/methods/pmcmatrix2d/convert_to_number_matrix.t ........ ok
t/methods/pmcmatrix2d/initialize_from_args.t ............ ok
t/methods/pmcmatrix2d/iterate_function_inplace.t ........ ok
t/methods/pmcmatrix2d/fill.t ............................ ok
t/methods/pmcmatrix2d/initialize_from_array.t ........... ok
t/methods/pmcmatrix2d/iterate_function_external.t ....... ok
t/methods/pmcmatrix2d/transpose.t ....................... ok
t/methods/pmcmatrix2d/item_at.t ......................... ok
t/methods/pmcmatrix2d/resize.t .......................... ok
t/methods/pmcmatrix2d/convert_to_pmc_matrix.t ........... ok
t/methods/pmcmatrix2d/set_block.t ....................... ok
PASSED 356 tests in 55 files

I refactored all the tests to delegate matrix creation and matrix population to a series of type-specific factory objects. With these factories, creating tests goes extremely quickly:

  1. Write the tests using a few methods and values from the factory instead of hard-coded types and values
  2. Create a short stub test for each Matrix type, subclassing the common tests and adding a proper factory
  3. There is no step 3.
It's really become quite simple, and I'm much happier with the test suite now. Plus, I've added plenty of new features lately (complete with tests!) and I'm going to talk about that plenty in a later post.

Wednesday, August 18, 2010

Github Wikis and EmbedVideo

Github is rolling out a new wiki system which uses a git backend to store data. This is an awesome feature, and one that I've wanted Github to add ever since I started using a similar feature at Google Code. It's great to be able to download all the wiki pages as a repository and work with it locally using my normal editing tools.

Whiteknight's rule of the internet #47: Raw TextArea controls in a browser are rarely good enough for anything, and should be avoided for any task besides extremely rare edits of small bits of text.

Only one of my projects on Github makes use of the wiki feature there: EmbedVideo. Last night I converted the existing wiki pages to the new git-based wiki system, and pulled down the repository. In a wild fit of productivity, I updated much of the documentation and filled in some of the blanks that I had left for too long. It's still not perfect (software documentation rarely is), but it's better than it was and it's probably good enough to get most users moving with some of the new features.

I love when tools get better, and compared to what they used to be the new Github wikis are certainly much much better.

Monday, August 16, 2010

PLA Status Updates

On Friday night we dropped the kid off with his grandparents for a sleepover. With the apartment to ourselves, my wife and I did what we've been wanting to do for months: she went to bed early and I stayed up to program. I started making some real progress in Parrot-Linear-Algebra, and also uncovered some interesting bugs which needed to be fixed.

I added a short file for adding PLA support to programs written in NQP. The file, pla.nqp, can be included into an NQP program like this once it's installed:
INIT { pir::load_bytecode("pla.pbc"); }
That file loads the PLA binary library, stores a global reference to the library, and registers the type names with P6metaclass. That done, you can start writing more natural-looking programs in NQP. I updated the Gaussian Elimination example I wrote a few weeks ago to use this new bootstrap file (and some new methods I added), which means the example can now run without Kakapo or any other dependencies.

Next up, I started prototyping bindings for the library for Rakudo. These are still preliminary, but I also have a working example of using numerical matrices in Rakudo. Within the next few days I will probably create a complete module for Rakudo (Math::LinearAlgebra, or something like that). I'll post more details as I do more work on it.

I added a new item_at method which returns and optionally sets a value at given coordinates in a matrix. This method is common to all my core matrix types. The list of common methods for these matrix types is:

  • resize() : pre-allocate size for the matrix, growing (never shrinking) it to hold a certain size
  • fill() : Fill a matrix, or a region of a matrix, with a constant value. Automatically resize if necessary
  • transpose() : Transpose (swap rows with columns) the matrix lazily
  • mem_transpose() : Eagerly transpose the actual memory contents
  • iterate_function_inplace() : Execute a function for every element of the matrix, replacing that element with the function result
  • iterate_function_external() : Execute a function for every element of the matrix, creating a new matrix with the results. This is similar to Perl's map function.
  • initialize_from_array() : Insert values into the matrix from an array
  • initialize_from_args() : Similar to the _from_array variant, but initializes the matrix using elements from a slurpy argument list
  • get_block() : Return a block, or "submatrix" from the matrix
  • set_block() : Set a block in the matrix
  • item_at() : New this weekend, gets or sets a value in the matrix at the specified coordinates
There are maybe a handful of other methods I would like to add, but in terms of the core types, this is the standard API (plus a series of VTABLE calls, which are standard). For mathematics types I also have GEMM (the BLAS-based matrix multiply operation), and elementary matrix operations (add rows, swap rows, scale rows). This is not a shabby interface at all, and can start to be used for some real-world applications.

This weekend I also started seeing some very weird errors in the PLA test suite. Tests were running fine, but I was seeing exceptions (and, in one case, a segfault) occur after all tests had passed. This sounded very similar to another problem I've seen in the past. Here's the test that set it off. Can you spot the problem?
method test_METHOD_iterate_function_inplace_TRANSPOSE() {
    my $m := self.fancymatrix2x2();
    $m.transpose();
    my $n := self.matrix2x2(self.fancyvalue(0) * 2, self.fancyvalue(2) * 2,
                            self.fancyvalue(1) * 2, self.fancyvalue(3) * 2);
    my $sub := -> $matrix, $value, $x, $y {
        return ($value * 2);
    };
    $m.iterate_function_inplace($sub);
    assert_equal($m, $n, "external iteration does not respect transpose");
}

What's maddening is that this test has been a problem for months, but never caused a failure. It was silently wrong, probably since the day I wrote it.

See it yet?

The problem is that return statement. What should happen is this: the pointy block creates an anonymous subroutine, which I pass to the iterate_function_inplace method. The method should loop over every element in our matrix and call that pointy block for each one. The result should be the same matrix with every element multiplied by two.

What actually happens is this: the iterate_function_inplace method, in order to call the callback, must recurse on the C stack into a new runloop function. The pointy block executes in this inferior runloop. However, where things get weird is that return statement. In NQP, returns are performed by constructing and throwing control exceptions. In the case of the pointy block (and I'm not sure whether or not this is a bug), the return statement jumps to the return handler for the test_METHOD_iterate_function_inplace_TRANSPOSE() function, not the anonymous sub.

The test sub exits after the callback has been called only once; it never hits the final assertion (which, it turns out, would have failed). Control flow continues inside the inner runloop until the end of the test program, then the C runloop function returns, tries to continue executing from that point, and things go to hell.

The solution is really quite simple. Change this:
my $sub := -> $matrix, $value, $x, $y {
    return ($value * 2);
};

into this:
my $sub := sub ($matrix, $value, $x, $y) {
    return ($value * 2);
}; 
Problem solved, and now more tests are legitimately passing.

It's been a productive couple of days in PLA, and I'm hoping I can keep this momentum up in the days ahead. I need to finish implementing my GEMM wrapper for ComplexMatrix2D, and then get started on the Linear Algebra module for Rakudo.

Thursday, August 12, 2010

Kakapo Fixed!! (Mostly)

The lovable Austin Hastings has gotten Kakapo mostly working as of yesterday. It's not 100%, but I'm able to use it for some purposes such as getting the PLA test suite passing again.

When I tried to build PLA last night, for the first time in a long time, the build failed. Some of the string handling functions I was using in the get_string VTABLEs were using old, deprecated functions in the string API. I took this as my opportunity to rewrite a lot of the logic to use the new StringBuilder type instead. This should improve performance and bring PLA up to some of the modern best-practices of the Parrot community.

When I tried to run the tests, however, KABLAMO. Kakapo wasn't happy and the tests wouldn't run. I tracked down a small series of problems and put together a hasty patch. Austin pointed out that this wasn't the real solution, so he didn't want me to push it to the master at Gitorious. Instead, I pushed to my mirror at Github. With this fix out of the way, the vast majority of the PLA test suite ran, and most tests were passing. This morning, I pushed another patch to my Kakapo mirror. After that, I'm proud to say, the PLA test suite passes 100% of important tests. That means I can start making preparations for an official release, and then begin development on cool new features.

Some people have asked me why I've gone through all this hassle; if I had just rewritten my test suite without Kakapo, it could have been working a long time ago. The answer to that question is pretty simple: I want the abstraction and insulation that Kakapo provides.

To look at it another way, consider this: NQP has a test suite of its own that needs to pass before it can make a release. If I'm using NQP, I know that the behavior NQP provides will be a little more stable than the underlying Parrot VM. However, NQP isn't a perfect overlay for Parrot: there are some features that Parrot provides that NQP doesn't cleanly wrap. To get an abstraction layer over those features, I need a framework like Kakapo. This is all not to mention that several other high-profile projects use NQP, so even if its test suite isn't all-encompassing, the test suites of projects which build on top of NQP probably cover more ground.

Kakapo likewise has a test suite, and that needs to pass before Kakapo gets released. If I'm using Kakapo, I can be certain that the API that it and NQP provide together should be relatively stable and are tested to work before they are released. Since we are talking about tracking Parrot trunk with PLA development, this stability is very welcome.

What we've run into recently is a rare confluence of events where highly disruptive changes in Parrot caused major breakages in Kakapo. When Kakapo got fixed, the PLA test suite ran perfectly without any changes to the suite itself. Considering the magnitude of changes to Parrot recently, the ability for me to continue running the test suite without alterations is a pretty big and important aspect. Sure there was some down-time, but once our dependencies were fixed, our stuff worked again like magic.

Kakapo isn't 100% yet, but Austin is still working on it and I'll be lending a hand. I'm also going to start making some progress in PLA, and see what I can do to help out with some of the performance improvements and other big tasks that are going on in Parrot too. Expect to see many more updates on things in the coming days and weeks.