
Saturday, May 29, 2010

The AOT Advantage

In my previous blog post I briefly mentioned PHP, HipHop, and AOT. For readers coming late to the party, HipHop is a native code compilation system for PHP, created by Facebook. The idea is that compiling down a relatively large subset of PHP (basically everything besides eval) to C++ code and passing it to an optimizing compiler can produce a much more efficient program than interpretation alone or even compiling to bytecode in advance and executing that.

Let's look at this same idea, but in relation to Parrot's intended LLVM backend system.

We can start off with a basic PHP compiler. The compiler doesn't need to be particularly fast since compilation happens once in advance. Sure, we don't want compilation to take forever because that slows down development, but it doesn't have the same kind of speed mandate we would have in an interpreted environment where we compile the script every single time we run it.

Our basic, feature-light compiler compiles the PHP code down to PAST. From there we kick in a series of optimization stages. Again, if we know we're doing ahead-of-time compilation, we can add more optimization stages up front and take a little more time to compile. For a release build of a high-throughput application, maybe we really want to swing for the fences with this one and enable every optimization in the book. Let's hope it's a long, well-written book.

We compile the PAST down to Parrot bytecode, after all the optimizations have run their course. At this point, we hand off the bytecode to the LLVM backend. LLVM converts the Parrot bytecode into LLVM IR, passes that through its own impressive suite of optimizations, and at the end spits out a native executable which is screaming fast.
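
To make the shape of that pipeline a bit more concrete, here's a little sketch in Python. Everything in it is hypothetical: the stage functions are placeholders standing in for the real PHP front-end, PAST optimizer, bytecode emitter, and LLVM backend, not actual Parrot or LLVM APIs.

```python
# Hypothetical sketch of the AOT pipeline described above. The stages are
# placeholders, not real Parrot or LLVM APIs.

def parse_to_past(source):            # front-end: PHP source -> PAST (placeholder)
    return {"past": source}

def constant_fold(past):              # one example optimization pass (no-op here)
    return past

def inline_small_subs(past):          # another hypothetical pass (no-op here)
    return past

def past_to_bytecode(past):           # PAST -> Parrot bytecode (placeholder)
    return b"parrot bytecode"

def bytecode_to_native(bytecode):     # hand off to LLVM: IR, optimization, native code
    return b"native executable"

RELEASE_PASSES = [constant_fold, inline_small_subs]  # "every optimization in the book"
DEBUG_PASSES = [constant_fold]                       # just the cheap, high-value ones

def compile_aot(source, release=True):
    past = parse_to_past(source)
    for optimize in (RELEASE_PASSES if release else DEBUG_PASSES):
        past = optimize(past)         # ahead of time, we can afford to linger here
    return bytecode_to_native(past_to_bytecode(past))

print(compile_aot("<?php echo 1 + 2; ?>"))
```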

The converse is the case where we want to shorten the turnaround time, such as when we are doing direct interpretation, or even building executables for debugging purposes and don't want our developers sitting around waiting for the thing to compile. We don't want to skip optimizations entirely, because after compilation the developers have huge and expensive test suites to run, which can take much, much longer.

What we do in this case is compile the code to PAST, maybe run a handful of "good bang for the buck" (GBFTB) optimizations like constant folding, and compile the PAST down to bytecode. We'll skip the last step and just execute the bytecode directly, since this is a debug build. When we run it, the JIT engine will kick in to do basic runtime compilation of detected hot spots, giving us nice speedups for programs with tight loops. The JIT compiler will likewise use a few of the GBFTB optimizations, but won't go crazy; we're doing this at runtime now and we have a user waiting.
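
Constant folding is the poster child for a GBFTB pass: cheap to run, and it removes work from every future execution. Here's a toy illustration of what such a pass does; it's my own sketch over a made-up expression tree, not Parrot's actual PAST optimizer.

```python
# A toy constant-folding pass over a tiny expression tree; this is my own
# illustration of the idea, not Parrot's actual PAST optimizer.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Collapse ('+', 2, 3) into 5 wherever both operands are constants."""
    if not isinstance(node, tuple):
        return node                      # a literal or variable name: nothing to do
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # both sides constant: fold at compile time
    return (op, left, right)             # otherwise keep the (partially folded) node

# (x * (2 + 3)) + 4  becomes  (x * 5) + 4
print(fold(("+", ("*", "x", ("+", 2, 3)), 4)))
```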

There are some people in the world of scripting languages who view the complete lack of native executable support as a virtue. After all, they say, the nature of these languages is fast turnaround; we can make changes and run them directly without needing to invoke a separate compiler to produce a separate executable. Plus, the immediate availability of human-readable source code furthers the common goals of open source software and knowledge sharing.

Fast turnaround, from script to execution, certainly is a virtue--in development. Once we're ready to cut that release, however, things change pretty dramatically. At my job, I write some applications in C#. During development, I keep things in "Debug" mode, and do lots of tests very quickly. When it's time to cut that release I change the Visual Studio configuration to "Release" and then click "Build All". Where I was doing incremental debug builds, now I'm doing a full release build with optimizations turned on and debugging symbols stripped out. I also build a separate installer project, and to test everything I need to install the software first and run it from its installed location. This is a lot of extra cost, but I don't do it frequently. I go from about a 10 second compile time turnaround to almost 10 minutes' worth of compilation and installation work. The net result is a faster, better executable for my end users.

In the case of Facebook, using HipHop to compile PHP down to native machine code improves throughput performance by 30%, which translates into lower server load, lower wait times for users and, at the end of the day, real money. Say what you want about obfuscation of the source code; 30% is nothing to ignore.

I like to think that the Parrot infrastructure will support any workflow that our end users want. Having speedy interpretation around for people who want that is a very good thing, and we do it already. But having the ability to employ tiered libraries of optimizations and create native machine binaries for fast execution is a very, very good thing as well. This is why I keep pushing for our eventual JIT system to support AOT as well. LLVM will already handle all the nitty-gritty, so the amount of scaffolding we'll have to provide in Parrot is minimal in comparison.

Friday, May 28, 2010

Compilers Targeting Parrot

I got to thinking today about Parrot and came to an interesting conclusion: Writing our own parsers for various existing languages is nice, but if we could get the language communities involved it would be even better. There's a symbiotic relationship that could form from such an arrangement and I would like to talk about that here for a little bit.

Let's take PHP as an interesting example. The PHP community has been talking about some kind of rewrite of their core engine to support Unicode from the ground-up. I'm not really involved in those discussions, so I don't know what the current state of the discussion is. I couldn't even tell you if work has started on such a rewrite. However, I do know one thing: If the next version of the PHP compiler targeted Parrot as the backend, it would have built-in Unicode support for free.

There are some reasons why PHP people might not like this option, considering PHP's niche in the web server world and Parrot's current issues with performance and memory consumption. But, as I'll describe here, those are relatively small issues.

Pulling an extremely wrong number out of a hat, let's say that it takes 10,000 man-hours to construct a modern, though bare-bones, dynamic language interpreter from the ground up. You would have some features like GC, but wouldn't have others like JIT. You'd have Unicode support, but a small or even non-existent library of built-in optimizations. You build a compiler, you build the interpreter runtime, and then you build a language library runtime so the programs you write don't have to reinvent every wheel from scratch. It's a daunting task, but it's very doable and has been done many times before.

Now, let's say you take that same developer pool of 10,000 man-hours and build a compiler on top of Parrot instead of writing a new VM from the ground up. You do get a whole bunch of the really tricky features immediately: GC, Exceptions, IO, JIT (to come), Threads (to come), Unicode, Continuations, Closures, a library of Optimizations (to come), and all sorts of other goodies. You also get a nicer environment to write your parser in than bare lex/yacc (though, I think, you could still use lex/yacc if you really wanted). You get some libraries already, though you would likely need to write wrappers at least to get things into the proper semantics of your language. You really do get a hell of a lot, and you could easily have a working compiler, at least the basics of one, up and running within a few weeks.

Let's even be conservative and say that your team can't build the compiler front-end in less than 1,000 man-hours on Parrot. Even then, you still have 9,000 man-hours left. You could spend some of that making a gigantic runtime library, but I like to think you could devote some of that effort back to Parrot too. If you spent even 5,000 hours making Parrot faster, or adding new features, or whatever you need, you'll still be spending less effort overall and getting a comparable result, if not a better one.

Keep in mind, on Parrot, you're going to get all the features that other people add, in addition to the ones you added yourself. Parrot really is a symbiosis: You hack Parrot to add what you need, and in turn Parrot gives you all sorts of other things for free.

Parrot has plenty of stuff around to get you started, even if it's not perfect right now. Remember that Parrot is extremely actively developed. You can do your development now, build your compiler and get your language ready and grow your regression test suite. When you decide that you need things improved, you can make targeted improvements incrementally. If you want a better GC, for instance, you can go in and just add that part. But, you have the benefit of knowing that Parrot already has one and your test suite team doesn't have to wait for your GC team to finish writing their first GC. That's good, because those damned GCs can take a while.

My point, in a nutshell, is this: Parrot has some really cool features, but the coolest is that it provides a standard base level, a standard toolbox, that you and your compiler team can take and use immediately if you want it. You can get started right now with the kinds of high-powered features and tools that it would take you years to get right by yourselves. Yes, there are improvements needed and some desired features yet to be added, but adding them to Parrot is a smaller, more incremental effort than trying to write an entire interpreter yourself.

Parrot is adding some really cool features this year too. I fully expect that by the 3.0 release we will have these new things:
  1. A new, fully-featured robust threading system
  2. A pluggable optimization library, with optimizations able to be written (likely with some bootstrapping) in the high-level language of your choice
  3. A new GC with radically improved performance characteristics
  4. An extremely powerful instrumentation and analysis framework (again, with tools able to be written in the language of your choice, possibly with some bootstrapping)
  5. A flexible native function call interface for interacting with existing libraries
If you can wait a little longer, I fully expect that by 3.6 or 3.9 we will be rolling out a powerful LLVM-based native code compilation core which will support JIT and (I hope!) AOT. You don't have to look much further than Facebook's HipHop project to see that compilation of dynamic languages down to native executables will be a big benefit, especially if we can aggressively optimize all the way down. I'll talk about that particular issue in a later post.

Parrot does have some problems now, like performance. I can certainly understand the criticism on that point. However, for a fraction of the effort it will take you to write a new interpreter or virtual machine from the ground up you can target Parrot instead, get a bunch of free features, and maybe spend some of your free time helping us add the optimizations and improvements that you need. I really and truly believe that it's a win-win proposition for everybody involved.

Monday, May 24, 2010

Lost Finale

I know it's not my normal topic of conversation, but the day after the big finale I'll be damned if I don't write at least something about Lost. For 6 years now that show has been my absolute favorite hour of TV each week, and for some time it's been the only hour of TV that I will watch on a regular basis. Sure, there are a few other shows I will watch if they are on when I am looking, but Lost was the only show that I would arrange my schedule around.

Last night was the big, long-awaited finale and it completely soured the rest of the series. I don't know if I will ever be able to watch any of it again, knowing now that the whole thing was some huge farce. My questions were not answered, as was promised to me in the advertising blitz this year, and I was left completely unrewarded for all the time and effort I spent following along with this story.

And I followed. I watched almost every episode when it aired. I read lostpedia cover-to-cover. I talked with other fans on the internet and tracked down clues and easter eggs in the various guerrilla media campaigns that ABC ran. I did it all.

The rest of this blog post contains spoilers. Normally I don't like to be the one to spoil the surprise for other people, but I feel that the show spoiled itself last night. There's nothing left for me to ruin.

In the normal "island" timeline, Jack, newly crowned as protector of the island, and his team meet up with Locke, Linus, and Desmond on their way to the glowing source in the center of the island. Desmond is lowered into the hole, uncorks the source, and the island starts to sink. Without the power of the light, Locke is human again and is killed by Jack and Kate. Jack enters the source himself to re-cork the island and bring the light back, but dies in the process. Hurley becomes the new protector, with Linus as his assistant. Everybody else (read: like 5 people) makes it off the island in the Ajira plane.

This part is fine. Maybe it's a little unimaginative, but it's fine. It works.

In the "sideways" alternate timeline, all the stars are meeting up with each other and having flashbacks about their time on the island. They all go to a concert, where Claire has the baby. Jack shows up late, meets Kate, and they go to the funeral home to attend the funeral of Jack's father. But lo! Jack's father is up and walking around. They have an emotional conversation, and Jack realizes that they are all already dead. Christian says something about leaving but won't say to where, there is a white light, and the series ends.

That's right: in the big "flash sideways" timeline, everybody is dead and it's apparently some sort of purgatory. Purgatory, you know, that thing that the writers said early in the series that it definitely wouldn't be. Everything in the flash-sideways timeline was basically a dream, which makes this one of the largest and most-watched usages of the "...and then they woke up" cop-out gimmick ever. According to Wikipedia:

Similar to a dream sequence is a plot device in which an entire story has been revealed to be a dream. As opposed to a segment of an otherwise real scenario, in these cases it is revealed that everything depicted was unreal. Often times this is used to explain away inexplicable events. Because it has been done, in many occasions, to resolve a storyline that seemed out of place or unexpected, it is often considered weak storytelling; and further, in-jokes are often made in writing (particularly television scripts) that refer to the disappointment a viewer might feel in finding out everything they've watched was a dream. 

If I learned anything new about the show last night, it was this: The writers were lazy and got themselves into more trouble than they could handle. They created a situation so complex and unbelievable that they couldn't deal with it, so they wrote it all away as a figment of imagination. It's not that they didn't have the time or the advanced notice they would have needed to wrap things up cleanly, they simply chose the lazy, easy way out of a story that they allowed to get out of control.

The vast majority of the finale was actually very good. We see Jack solving the mystery of how to kill the smoke monster, and then sacrificing himself for the sake of the island and ultimately the whole world. Hurley becomes the new protector of the island, Linus finds redemption for his transgressions, and a handful of people from the island were able to leave for good. Everything was going fine until the last 10 minutes. At 11:20 last night they could have kept things going, ended on a strong note, and enshrined it as one of the best possible finales for one of television's greatest shows. But, they didn't.

Let's give one possible alternate ending that I think would have been far better:

Desmond uncorks the island and dies. Jack and Locke fight each other to the death. Locke dies, Jack is mortally wounded. The plane with the rest of the survivors takes off. Jack closes his eye as the island sinks below the water. That universe blinks out of existence, with the "sideways" timeline becoming the new main timeline. All the lost survivors have their memories of their time on the island and, through their bonds of camaraderie and new life experiences, they learn to start resolving some of their personal problems to become better people.

That ending would have been much better in my opinion because it isn't a cop-out and we see the character development actually reach a conclusion for many characters. Another possible ending:

Desmond uncorks the island. Jack and Locke fight, Locke dies (Jack is wounded).  Jack enters the cave to re-cork the island, and when the light comes back, Jack is transformed into the new smoke monster (though, he is obviously a benevolent one). Hurley becomes the island protector. We see Jack and Hurley at some time in the future, fast friends, vowing together to protect the island with their newfound powers. Hurley allows all the remaining people from the island to leave, the island flashes and disappears. In the sideways timeline, all the survivors have their memories of the time on the island, get over many of their personal problems, and form a pact together to help protect what the island stands for.

A little touchy-feely and feel-good for my taste, but still far superior to the "real" ending we saw. At least we would have known that this ending was real and we weren't just watching some hastily-written dream sequence. Here's something even better:

It's the end of season 5. Juliet is in the well. She hits the nuke with the stone, the screen goes white. We see Oceanic 815 land in LAX as planned, the survivors get off the plane. While they aren't explicitly aware, they have been subtly changed because of the experiences on the island, and every one of them goes on to live long and fulfilling lives.

This ending cuts off a whole damn season, and yet would have been much more rewarding and would have left fewer issues unresolved. We wouldn't have gotten explanations for any of the weird things that happened on the island, but that wouldn't really have been so important. We could chalk it all up to the island being weird and magical, and now that the island is sunk under water and everybody is home safe, we can basically forget all about it. Maybe, we could even leave it ambiguous as to whether the events on the island even happened at all, or whether it was some sort of collective mass hallucination that changed the lives of all the people on the plane for the better. I would have been much happier with that ending, personally, even though we would have lost a whole year of the show. Hell, we could have pushed that ending back a year, and spent all of season 5 really digging into some of the mysteries of the island, the role of Jacob, the workings of the Dharma initiative, etc. That would have been cool too. And, it would have been much better than what we got.

And while I am rambling, I was promised that all my questions would be answered, but in reality very few of them were. If anything, too many problems and dangling plot threads were left completely unaddressed. I would have been completely willing to ignore some of these issues if I thought we were hurtling towards a cataclysmic ending that would shock and awe us all into a happy stupor, but with the main plotline being written off as a work of bad creative fiction, the other omissions become more striking and infuriating. Some of the more pressing points I would like to enumerate here:

What was the problem with pregnant women and babies on the island? This is a pretty serious question, considering almost two full seasons were devoted to this issue alone. Juliet, a fertility specialist who was on the island for no other reason than to address this problem, becomes a pretty pointless character without a satisfactory answer to it. Mothers died in childbirth, Linus tortured his daughter's boyfriend to prevent her from getting pregnant, Ethan kidnapped Claire at night to inject her with technicolor drugs, Rousseau had her infant daughter kidnapped from her and raised--on the very same island--without her knowledge, and so on. There were so many important plot points tied to this phenomenon, and it would have been absolutely easy to tie it up. Here's an example bit of dialog that, while short and vague, would have resolved the issue nicely:

Jack: When I get to the source, I'm going to kill you.
Locke: And how do you plan to do that? Guns and knives can't kill me. I can rip trees out of the ground and prevent women from bearing children. How do you possibly plan to kill me?
Jack: I have a plan. Wait and see.

Seriously, it wouldn't have taken much effort, just a passing reference by someone in the know and the issue would have disappeared. Maybe it wasn't the fault of the Man in Black, but was instead one of Jacob's "rules" for the island:

Locke, to Linus early in the season: Why would you be upset about killing Jacob? He's a man who never cared about you after everything you gave. He's a man who let mothers die because he made a rule against children being born on this island. His island. He created so much hardship for you and your people, even let your daughter die in front of you.

With this statement, Linus's conversation at the end with Hurley about how Hurley could be a better leader and make better rules would be much more poignant. Seriously, we could turn the whole notion of good and evil upside down and start to wonder whether Jacob was really good, whether the smoke monster was really bad, and what would be the real ramifications for everybody getting on the plane together and heading back to reality. It would have created a much more interesting metaphysical dilemma for our protagonists, tasked with deciding whether Jacob was really the side of good when he did things that seemed so evil; for the sake of the island, of course. We also might have gained a lot of insight into Linus, the quintessential character who seemingly does evil to pursue good ends. If that was the modus operandi of Jacob and even the island at large, we could have seen a lot of things in a different light.

What the hell did the atomic bomb accomplish at the end of Season 5? If the flash sideways in season 6 were just a view of purgatory where all the losties were already dead, then how could Juliet claim that detonating the bomb "worked"? If everybody died in the atomic bomb explosion, and the explosion sent everybody on the island to purgatory, then what do we make of the people still on the island in the "main" timeline? And if, considering what I said about Juliet above, the bomb didn't save anybody and didn't keep the plane from crashing, what purpose did Juliet's character really serve in the show? Sure, she created a bit of a weird and dramatic love triangle with Jack and Kate, and she did get romantically involved with Sawyer, though her death would only have reaffirmed his sense that his loved ones are always taken from him, and would have prevented any further growth of Sawyer's character. The main problem that brought her to the island is simply ignored, and the big self-sacrifice that she made didn't have any effect. If anything, apparently it created a separate reality where everybody either died or was already dead. Go Juliet!

If the atomic bomb didn't do anything worthwhile, what was the point of the survivors traveling back in time to 1977 in the first place? We didn't really learn anything new about the Dharma initiative, or about the power of the island that they were studying. We already knew that the island had the power to transport things through time and space, and those were the most important things that the Dharma people appeared to have been working on. I know that there were other things of significance that Dharma was supposed to be involved in, from other media like the Lost video games or other guerrilla media, but none of those were mentioned even in casual passing in the show, so they don't matter.

Before I let the issue with Juliet drop completely, one of the biggest problems with this show that has been gnawing at me (though it isn't really an unanswered mystery of the show) involves Juliet's death. When she died they were on their way to the Temple to heal Sayid. They put him into the pool of water and he was magically healed. Juliet's still-warm dead body is lying outside the temple, and inside the temple is a magical pool that can bring people back to life. Put two and two together and GO OUTSIDE AND GET JULIET'S BODY AND PUT HER IN THE DAMN POOL. How is it that the entire damn braintrust is sitting inside the temple simultaneously mourning the death of Juliet, witnessing the resurrection of a dead Sayid and not once questioning whether somebody else recently-deceased could be brought back as well? That seems to me to be a remarkably stupid oversight on their part. I'm not saying that the idea would even have worked, since nobody has told me any of the details of the workings of that magic pool, but you can be damn certain that if I were there I would have at least given the idea some consideration. There's certainly no harm in trying, if you can toss any old corpse into the water and some of them come back to life and the rest just stay dead. Again, this is just an annoyance and not a real pressing mystery of the show. If anything it just shows how lazy the writers were.

What the hell was the deal with the cabin, the ash circles, the temple, and Dogan? Dogan apparently had the ability to keep the monster out of the temple, but how? Did anybody else have that power? Was it just an example of Dogan being "special", like several other people were listed as being "special", or was it some kind of gift given to him by Jacob? And what was going on with Jacob's cabin? I think we're led to assume that the man in black was trapped inside the ring of ash around the cabin, but that doesn't really make sense considering the fact that we see the monster outside the ring in the very first episode of the season. How can he be outside, terrorizing the crash survivors and simultaneously trapped inside whispering "help me" to any passers-by? I also don't think Jacob was the one in the cabin, considering that we never see Jacob having a problem with ash lines in the ground, and we also never see Jacob being invisible like a ghost or needing help from somebody like John Locke. What is the significance of Horace Goodspeed having built the cabin? That's the kind of detail that could just as easily have been left out if they weren't going to explore it. There are lots of details like that throughout the show that they should have just left out if they were going to be too lazy to give even a cursory examination. Maybe the spirit in the cabin is a manifestation of the sentient island, and it was the will of the island itself that needed help from John Locke. It could also be this spirit force that the ash line was trying to protect from the smoke monster. Sure it's all speculation, but there is a lot that they could have done with this incident to gain a further understanding of the island and the relationship between it and Jacob, but they didn't.

Ilana and Brahm seemed to have expected to find Jacob inside the cabin, but were alarmed when they saw that the ring of ash was broken, and then set the house on fire. Maybe Jacob lived inside the cabin, protected by the ash ring that kept the monster out, and therefore kept Jacob safe from harm. But if Jacob was essentially barricaded in there, how would Jacob have gotten out to go make visits to the various characters on the mainland? And, if Jacob were inside that cabin and Ben knew how to find it, why hadn't Ben ever met Jacob? In fact, why was Ben surprised when things inside the cabin started shaking and going crazy? If that was Jacob, why do we see Christian Shepard and Claire in there, people who are associated with the smoke monster? People call it "Jacob's Cabin". Linus didn't expect to see anything in there, Ilana and Brahm expected Jacob to live there, but the only other-worldly being we ever see there is the smoke monster, who we also see outside the ash ring. Forgive me for harping on this particular point, but it is very confusing.

Why did the smoke monster kill some people, like Mr Eko, but couldn't kill others, like Jack? Where were the bloody goddamn rules when Mr Eko was getting bludgeoned to death, or when the Frenchman's arm was getting ripped off, or when Locke was being dragged by the leg into a pit? Everybody seems to know these rules, and despite a complete lack of stated repercussions everybody seems to follow them blindly to the death, yet they appear to be completely porous and permissive when convenient. More information about the rules would have been much appreciated, but was never given.

Why wouldn't Ben have known a little bit more about the smoke monster? Conceivably Richard would have known all about him, having met him in person, or could have asked Jacob at least once over the course of hundreds of years. Richard knew that the smoke monster was the arch-nemesis of Jacob and the ultimate evil force on the island, yet Ben seemed to view it as some sort of protective force that was at worst benign and at best benevolent.

The smoke monster was able to kill Jacob through some elaborate loophole in some kind of rule. Seriously, look at the steps the smoke monster had to take to accomplish his goal: He convinced Linus to move the island (under the guise of Jacob, and nobody was the wiser) which caused the wheel to fall off its axis. He then convinced Locke to fix the wheel and also leave the island. Off island, somehow he influenced Ben to bring all the survivors back, along with the body of John Locke, at which point he assumed the form of John's corpse. Using his new form, and the form of his dead daughter, he convinced Ben to finally kill Jacob. This is awfully convoluted for a plan, and you have to seriously wonder what kind of "rule" this solution was a workaround for. If Ben was the necessary killer, why go through the whole production of killing the mercenaries and sending Ben and Locke off the island only to bring them back? Or, maybe the "rules" stated that only a person who left the island and returned could kill Jacob, but it seems silly for Jacob to build that kind of loophole into the rules and then be surprised when it actually happened.

And while we're on the topic, why was the smoke monster trying to save the island at that point anyway? Why would the smoke monster kill the mercenaries and tell Linus to move the island? Why doesn't he convince the mercenaries to kill Jacob themselves? It would have been easy to do:

Locke: Here's a man named Jacob. If you kill him for me, I will tell you where Benjamin Linus is and I will help you kill him.
Mercenaries: Okay. Enough said. BANG BANG.

Much less convoluted and much less prone to errors than the plan he finally settled on. Or, better yet, he could have offered to save Linus' daughter in exchange for Linus killing Jacob. There are any number of other ways that the smoke monster could have asked another person to kill Jacob, but when put in the situation the smoke monster protected Ben and friends from the foreign invaders and actively tried to save the island. If the smoke monster really wanted to destroy the island, why would he help John Locke fix the broken wheel? Constant time flashes would have eventually killed all the surviving candidates on the island. Other uses of that light energy could have been used to destroy the island completely. Why didn't the monster pursue any of these alternatives earlier in the story?

What makes people special? The smoke monster apparently had the special ability to understand how things work. He was so intuitive that he turned observations about the magnetic attraction of a knife to the ground into a working time machine. How was Hurley able to see dead people, and how was Miles able to communicate with the recently-deceased? What the hell was the deal with Walt? What was really so special about John? How was Desmond able to withstand the electromagnetism of the hatch explosion, and how was he able to see the future (for a short while)? And, for that matter, when did Desmond lose the ability to see the future and why? Are these gifts that Jacob gave or, in the case of the smoke monster, that the current protector of the island can grant? Are these innate abilities that some people just have? Are these powers that the island grants to people to aid in its own protection? And, if so, wouldn't that imply that the island had intelligence?

What the hell are the rules? Who makes the rules? How are they enforced? What would have happened if the smoke monster tried to kill Jacob? We heard a whole hell of a lot about the rules throughout the show, but none of the rules were ever explicitly stated. We know that some things didn't seem to be allowed, but we don't know why or what the consequences would have been. We also know that some rules had some pretty big damn loopholes in them that could be exploited in such a way that the rule appears to be completely broken.

We were told that the light on the island was inside all people, and we were also told that if the light went out, it would go out everywhere, implying that the light going out would cause people to die. News flash: Desmond put the light out, and nothing bad appeared to happen from it. Sure, the island shook a little bit, but that was hardly irreversible. Similarly, what would really have happened if the smoke monster left the island? He started off as just a man and we have seen several men, including Jacob and Richard, leave the island. Why couldn't the man in black do the same? What was so bad about him leaving the island that the Mother had to kill all his people, fill in his well, and knock him unconscious to prevent it? I can understand that things would have been different after he turned into the smoke monster (the smoke is tied to the light, if the smoke leaves the island, the light goes out, etc.), but what would have been the harm before that?

This certainly isn't an exhaustive list; there are millions and millions of small questions that were raised but never addressed. Like I said, we could have easily glossed over all these things if we were given a proper ending. But the ending, as bad as it was, made all the other omissions, contradictions, and unresolved issues all the more glaring.

The fact is this: Lost was supposed to be better than this. Its fans were supposed to be smarter and more dedicated. Its writers more learned, daring, and uncompromising. Lost could have and should have been so much more, but the ending it received cheapened the entire thing. I wish, sincerely, that things had ended differently. Maybe then I wouldn't have to write such a long and rambling blog post about it.

Friday, May 21, 2010

The Fitness of Parrot as a Target Platform

In response to the blog post I wrote the other day, I received a very pointed question regarding Parrot's stability as a target for HLL compilers. Since the comment was anonymous and I can't ask permission, I'm going to repost a portion of it here:

However to be honest, its scary that over the last 10 years so much was done and discarded and is being rewritten. I agree with the "throw away the first" reasoning but that is something I would expect in a 2-3 year project not a 10 year old project.

Could you comment on the risk associated with for a HLL project that starts to build on Parrot with an ETA of say 1 year? Can it be said with a reasonably level of certainity that whatever sub-projects (JIT, treads etc.) are added to Parrot now, they will take an evolutionary approach rather than being scrapped and rewritten?

My intent is not to put down Parrot as I really want to use it but I want to voice my fears and hope to understand the parrot-dev perspective on making Parrot a stable (for some definition of stable) platform to build on.

This really is two questions rolled into one:
  1. Parrot has been in development for about 10 years, why are these big changes being made so late in the project? 
  2. Are these big changes going to affect the stability of the external API, and is Parrot a stable, reliable platform for HLL designers?
I'll try to tackle these two separately.

The 0.0.1 release of Parrot was cut in September 2001, according to documentation in our repository. I don't know much about the early history of the project; I've only been a Parrot contributor for about two and a half years now, about one fifth of the total life of the project. Looking through the NEWS entries starting back at 0.0.2, it does look like things were moving at a rapid pace back in 2001 and so maybe Parrot should be a lot further along than it is now.

There are a few responses to this. The obvious ones: Volunteer developers are a limited resource, and Parrot is an enabling technology so many developers have been focusing their efforts outside the core (HLL compilers, libraries, programs written using both of those, etc.). Core development effort and HLL compiler development effort are certainly not directly transferable, but in the early years there was a lot of cross-work among the central developers, and this did play a factor. There are less obvious reasons, such as attempts to support various platforms (Parrot has supported or attempted to support several architecture and compiler combinations, some of which were exotic even 10 years ago), attempts to improve performance throughout (sometimes prematurely), and the fact that the NEWS file is misleading because the early "action-packed" releases that seem to show a high development velocity were not time-based releases and often represent several months of work. Hell, there were only 6 releases in all of 2003 and 2004 combined, but the NEWS entries for those releases don't look any more impressive than the NEWS entry for 2.4. The pace of development is certainly not constant (though I would say it is increasing) and while 10 years looks like a lot, it's 10 years of inconsistent development time from contributors with other things going on as well.

The biggest factor to keep in mind is that Parrot development from 2001 until 2010 didn't happen in a straight line. Parrot started out as simply the internals engine for the Perl6 compiler and grew to become larger than that. Along the way a lot of designs and plans were hammered out, then thrown away entirely. As I mentioned last time however, there isn't a whole hell of a lot of prior art in the area of dynamic language VMs; guesses were made and some of them ended up being wrong.

We could have declared Parrot more or less "complete" some time ago and shipped a program that was known to have some serious flaws. Instead, we've been doing what I think is the more responsible thing: fixing it until it is correct. So to answer the first question: yes, it's been 10 years, but Parrot isn't perfect yet and we are going to continue working on it until it is, however long that takes.

The second question can itself be decomposed into several parts: Will things that I rely on change? If so, how quickly will they change? Will the work I do now need to be thrown out when the next big feature set lands? Is Parrot mature enough for me to build on top of it a software project that I want to lead into maturity? These are all good questions, and very concerning for HLL developers. The answer isn't clear-cut, but I will try to explain it as well as I can.

The most important point I can raise when tackling this question is to mention our support policy. Our support policy, which I have complained about as being too strong, goes something like this:
  1. We have 4 supported releases per year, every 3 months. We support these releases, including bug fixes, until the next supported release comes out. HLL compilers and other external projects are highly encouraged to target these supported releases (many do not, but they do so at their own risk).
  2. We have a defined external API which cannot be changed without notice. A deprecation notice for any change to any externally-visible feature or behavior must be included in a supported release before that interface can be changed or removed. Assuming you target supported releases as suggested in #1 above, this means you have 3 months of prior warning to prepare your project before disruptive changes to Parrot are made.
  3. New features that are added will be tagged "experimental" so projects can examine them, provide feedback, and get an idea of where things will be going in the future. If people start relying on experimental features and new problems aren't created, they tend to stay around.
  4. Not quite part of the policy, but still relevant: When we cut releases, or change features, we try to do extensive testing in HLLs and external projects. Where problems are found, we typically try to either offer fixes or workarounds.
It's interesting to note that many of our most disruptive changes in recent months were actually driven by the HLLs and external projects themselves, as fixes to bugs or longstanding problems. It's also worth mentioning that when the question is raised most HLL developers seem to want Parrot to make fixes and improvements more rapidly than it has been doing.

Another part of the question is whether the new features and things that we will be adding/changing in coming months will be gradual and be able to be inserted non-disruptively into existing software projects. Let's give a quick rundown of some systems that could be seeing major changes in the next couple months:
  1. GC: GC is a very internal thing; when it works properly, you don't even need to know it exists. Any changes here, so long as they don't introduce bugs, will be invisible to the HLL developer, and will only serve to improve performance (or improve it under certain workloads, depending on the algorithm).
  2. JIT: In the past the JIT system was separate, and you needed a separate command-line switch to activate it. In the future, I'm hoping we get a trace-based system that kicks in automatically when a need is detected. JIT, like GC, shouldn't change execution behavior, only performance, so changes here should be invisible to the HLL developer.
  3. Threads: We don't really have a good, working, reliable threads implementation now and HLLs are generally not using them. Anything we add/change/fix will be as good as a newly-implemented feature and can be added post-facto by HLL developers with no stress.
  4. NCI: This mostly affects users of external libraries, and writers of interfaces to those libraries. Improved NCI gives us improved access to more libraries. The interface here may change in a disruptive way but, I hope, this won't be a huge issue for most projects.
  5. Packfiles: Packfiles aren't really portable now, so people haven't been using them to their potential. In any case, Packfile structure and handling are mostly transparent to the HLL developer. In the future what we will see are better usage patterns, improved performance and decreased bug volume, at no expense to the HLL developer.
  6. PCC: We have a pretty extensive library of tests for PCC behavior and a defined standard interface that won't be changing any time soon. Sure, the internals may change (hopefully become faster) and new features will be added, but existing code will not notice.
  7. PASM: We're looking to add a suite of optimizations to PASM. These optimizations will likely be opt-in initially, so absolutely nothing changes for HLL developers who don't want them. If you do want them, your programs will probably only get faster at runtime at the cost of some additional processing in the compiler. This is a standard and non-exciting trade-off that most good compilers offer to their users.
In summary, Parrot is a good, stable platform for HLL developers to use. Yes there are big changes planned to the internals of the VM, but most of them are going to bring big improvements to the compilers and library projects that run on top of Parrot, without affecting the externally-visible interface much and without bringing new problems. That really is one of the driving benefits of a VM system like Parrot: Write the improvement once and get the benefits everywhere. The interface is in flux, but changes happen slowly and with plenty of prior warning (and support, where support is needed). As the big systems in Parrot get fixed, things will get even more stable and the benefits will become even more apparent.

I hope that helps answer some concerns.

Wednesday, May 19, 2010

Bright Blue Yonder

Yesterday I released 2.4, "Sulfur Crest". It's a cooler, more succinct name than "Sulfur-Crested Cockatoo". Every time I do a Parrot release I find myself searching through Wikipedia to look for interesting Parrot names (and colorful pictures too!).

I picked an interesting quote for the release announcement. Amid the broken grammar and colorful imagery was a key point that I wanted to express: Parrot is getting smarter.

If you had told me two years ago, or even a year ago, that Parrot would have the features it has now, or that it would be on the verge of adding the features that we are currently planning, I might have been incredulous. Parrot has matured a lot recently, and I really think all that effort is going to start paying off.

Parrot had a lot of features when I first joined the project. Many of these were first drafts and prototypes which have since been ripped out or reimplemented. This is a normal, natural, healthy part of the software development process, and nobody should be surprised or upset about it. Pop quiz: name one other dynamic language virtual machine, open-source or otherwise, that aims to support the same number of programming languages and runtime environments that Parrot does. Smalltalk comes to mind as a virtual machine for a dynamic language, but that VM backend was never intended to support anything besides Smalltalk. At least, not well. .NET supports a few languages, but is certainly not dynamic.

There's a great rule in programming that we always throw the first one away. The first draft is never the final manuscript, the first stab in the dark never hits the target. The original Parrot designers and developers didn't have a whole hell of a lot of prior art to copy from. They made some guesses, but there was no way they could have gotten everything right the first time, no way they could have foreseen all the problems that the project would run into over time. But that lack of information and that uncertainty can't hold people back. You grit your teeth, you write a prototype, and you resolve to yourself that you always throw the first one away.

Let's take JIT for an example. In a basic JIT system, you have a snippet of code with a runtime cost of E. A JIT attempts to compile that code into a piece of machine code with a runtime:

OE < E

Where O is some sort of optimization factor. Of course, there is some kind of overhead H, in compiling the code at runtime:

H + OE < E

H involves constructing the machine code and performing optimizations. The more optimizations we perform, the larger H gets, but the smaller O can get. If E is large enough and if we tune H properly, eventually we cross a threshold and the left side of the inequality becomes smaller than the right side. At this point, JIT is a net performance win for the application. Simple, right?
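
Plugging made-up numbers into that inequality shows where the threshold sits. This is just arithmetic on the formula above, not measurements from any real JIT:

```python
# Made-up numbers to illustrate the break-even point in H + O*E < E.
H = 50.0   # one-time cost of JIT compiling the snippet (hypothetical units)
O = 0.25   # optimized code runs at 25% of the interpreted cost
for E in (10, 50, 66, 67, 100, 1000):   # total interpreted runtime of the snippet
    jit_cost = H + O * E
    print(f"E={E:5}: interpret={E:7.1f}  jit={jit_cost:7.1f}  "
          f"{'JIT wins' if jit_cost < E else 'interpretation wins'}")
# Break-even when H + O*E = E, i.e. E = H / (1 - O) = 50 / 0.75 ~= 66.7:
# below that, the compile-time overhead dominates; above it, JIT is a net win.
```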

Anyway, I'll skip the rest of the theory lesson. The point is this: Parrot's original JIT didn't really have anything a modern compiler system would recognize as "optimization passes", so it wasn't really tunable. It also didn't readily support code generation on any platform besides x86. These things weren't going to be easy (or even possible) to add either. When the burden of maintenance became too high, and when it was clear that the old prototype system was doing more harm than good, we ripped it out.

There were a number of such systems in early Parrot, prototype code that got Parrot up to the first plateau but needed to be redesigned and reimplemented for Parrot to move up to the next one.  There are several projects either going on right now, or about to get started to work on these systems. When you look at the list, you realize that while it's ambitious, with the current team and the current state of the VM it is entirely plausible that they will all succeed. A quick overview:

Allison, Bacek and chromatic are talking about several ideas for implementing new, more efficient GC algorithms to replace our first GC prototype.

Plobsing has been hard at work redoing our freeze/thaw serialization system and, along with NotFound and others is working to fix several long-standing bugs in the PBC system.

Nat Tuck is preparing a GSoC project to implement a new threading system to replace the first prototype version of that system.

Tyler Curtis is preparing a GSoC project to implement a first prototype of a PAST optimization system.

Daniel Arbelo is preparing his GSoC project to implement a first prototype of the NFG string normalization form.

Muhd Khairul Syamil Hashim is putting together a GSoC project to implement an instrumentation framework for Parrot, so we can get some high-quality analysis and debugging tools.

John Harrison is preparing his GSoC project to develop a new NCI frame builder with LLVM, which will replace the old prototype frame builder and (hopefully) lay the groundwork to start replacing our prototype JIT system.

...and there are even other projects that I can't think of off the top of my head right now too.

If you had told me a year or two ago that all these projects would be on the table, or that we would have such high chances of them all succeeding, I would have called you crazy. But after all the work we've put in between then and now, and considering the high caliber of our development team, I'm feeling pretty confident that we will be successful and Parrot is going to gain some of these awesome new features. Cross  your fingers!

Tuesday, May 18, 2010

Parrot 2.4.0 "Sulfur Crest" Released!

"So there me was beating boulder into powder because me couldn't eat it, and magic ball land in lap. Naturally me think, "All right, free egg." because me stupid and me caveman. So me spent about three days humping and bust open with thigh bone so me could eat it good. Then magic ball shoot Oog with beam, and next thing me know me go out and invent wheel out of dinosaur brain. Magic dino wheel rolls for three short distance until me eat it. The point is, me get smarter. Soon me walk upright, me feather back dirty, matted hair into wings for style, and me stop to use bathroom as opposed to me just doing it as me walk. " -- Oog, Aqua Teen Hunger Force

On behalf of the Parrot team, I'm proud to announce Parrot 2.4.0 "Sulfur Crest." Parrot is a virtual machine aimed at running all dynamic languages.

Parrot 2.4.0 is available on Parrot's FTP site, or follow the download instructions. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Subversion on our source code repository to get the latest and best Parrot code.

Parrot 2.4.0 News:


- Core
  + Various long-standing bugs in IMCC were fixed
  + STRINGs are now immutable.
  + use STRINGNULL instead of NULL when working with strings
  + Fixed storage of methods in the NameSpace PMC
  + Added :nsentry flag to force method to be stored in the NameSpace
  + Added StringBuilder and PackfileDebug PMCs
  + Added experimental opcodes find_codepoint and unroll
- Compilers
  + Fixed reporting of line numbers in IMCC
  + Removed deprecated NQP compiler, replaced with new NQP-RX
  + Removed NCIGen compiler
- Deprecations
  + Tools to distribute on CPAN were removed
  + Deprecated dynpmcs have been removed to external repositories
  + Removed RetContinuation PMC
  + Removed CGoto, CGP, and Switch runcores
- Tests
  + Many tests for the extend/embed interface were added
  + done_testing() is now implemented in Test::More
- Tools
  + The fakexecutable tapir is renamed parrot-prove
  + Performance fixes to the pbc_to_exe tool
  + Fix data_json to work outside of trunk
  + The dynpmc GzipHandle (zlib wrapper) was added
  + The library Archive/Tar.pir was added.
  + The library Archive/Zip.pir was added.
  + The libraries LWP.pir, HTTP/Message.pir & URI.pir were added.
- Miscellaneous
  + Six Parrot-related projects accepted to GSoC
  + Improve use of const and other compiler hints

Thanks to all our contributors for making this possible, and our sponsors for supporting this project. Our next release is 15 June 2010.

Enjoy!

Wednesday, May 12, 2010

Immutable Strings branch performance

Parrot now uses immutable strings internally for its string operations. In a lot of ways this was a real improvement in terms of better code and better performance for many benchmarks. However, many HLL compilers, specifically NQP and Rakudo, suffered significant performance decreases with immutable strings. Why would this be?

It turns out that immutable strings are great for many operations. Copy operations are cheap, maybe creating a new STRING header to point to an existing buffer. There's no sense in actually copying the buffer because nobody can change it. Substrings are likewise very cheap, consisting of only a new STRING header pointing into the middle of an existing immutable buffer.
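
Here's a rough illustration of why those operations are cheap. It's only a sketch in Python; Parrot's real STRING header is a C struct and considerably more involved. The point is that copies and substrings allocate a small header but never touch the shared buffer, while an append has to build a whole new buffer.

```python
# A rough illustration of immutable-string headers; Parrot's real STRING
# struct is C and considerably more involved than this.
class ImmutableString:
    def __init__(self, buf, start=0, length=None):
        self._buf = buf                      # shared, never-modified buffer
        self._start = start
        self._len = len(buf) - start if length is None else length

    def copy(self):                          # cheap: new header, same buffer
        return ImmutableString(self._buf, self._start, self._len)

    def substring(self, offset, length):     # cheap: header points into the buffer
        return ImmutableString(self._buf, self._start + offset, length)

    def append(self, other):                 # expensive: allocate and copy everything
        return ImmutableString(str(self) + str(other))

    def __str__(self):
        return self._buf[self._start:self._start + self._len]

s = ImmutableString("Hello, Parrot!")
print(s.substring(7, 6))                     # "Parrot" -- no buffer copy at all
```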

Some operations, however, are more expensive. A great example of that is string appends. When we append two strings, we need to allocate a new buffer, copy both previous buffers into the new buffer (possibly translating charset and encoding to match), and create a new header to point to the new buffer. With the older system of COW strings, appends were far less expensive, and many pieces of code--especially the NQP and PCT code generators--used them in large numbers. After the switch to immutable strings, any code that was optimized to use lots of cheap appends began to take huge amounts of time and waste lots and lots of memory.

The solution to these problems is not to use many bare append operations on native strings, but instead to create a new StringBuilder PMC type. StringBuilder stores multiple chunks of strings together in an array or tree structure, and only coalesces them together into a single string buffer when requested. This allows StringBuilder to calculate the size of the allocated string buffer only once, only perform a single set of copies, not create lots of unnecessary temporary STRING headers, etc.
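
A minimal sketch of that strategy (again in Python; the real StringBuilder is a C-level PMC, this only mirrors its idea): appends just remember chunks, and the single sized allocation and copy happen only when the final string is requested.

```python
# Minimal sketch of the StringBuilder idea; the real StringBuilder PMC is a
# C-level Parrot PMC, this only mirrors its strategy.
class StringBuilder:
    def __init__(self):
        self._chunks = []                 # appended pieces, kept separate

    def append(self, s):
        self._chunks.append(s)            # O(1): no copying, no new buffer yet
        return self

    def build(self):
        return "".join(self._chunks)      # one sized allocation, one pass of copies

b = StringBuilder()
for piece in ("$P0 = new 'Integer'\n", "$P0 = 42\n", "say $P0\n"):
    b.append(piece)                       # a code generator emitting many small pieces
print(b.build())
```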

Several contributors have worked on a branch, "codestring", for this purpose, and some results I saw this morning are particularly telling about the performance improvements it brings. Here are numbers from the benchmark of building the Rakudo compiler:
JimmyZ: trunk with old nqp-rx real:8m5.546s user:7m37.561s sys:0m10.281s
JimmyZ: trunk with new nqp-rx real:7m48.292s user:7m11.795s sys:0m10.585s
JimmyZ: codestring with new nqp-rx real:6m58.873s user:6m22.732s sys:0m6.356s

The "new nqp-rx" he's talking about there is nqp-rx modified to make better use of the new CodeString and StringBuilder PMCs in the branch. The numbers are quite telling: build time is down about 12%. I think the finishing touches are being put on, and the branch will be merged back to trunk soon.

Tuesday, May 11, 2010

EmbedVideo MediaWiki extension

I've been doing a little bit of contract work lately on a mediawiki-based learning system. One of the features that was requested of this site was the ability to embed flash videos, primarily from YouTube, but also from other kid- and school-friendly hosting sites such as TeacherTube.com and KidsTube.com. KidsTube has since announced that they will be going offline soon, so support for them isn't particularly important now, but the idea still stands.

I did a search on mediawiki.org to find a suitable embedding extension for flash videos. A few options pop up at the top of the results list, but I notice some strange similarities: EmbedVideo, EmbedVideo++, EmbedVideoPlus, all of which appear to be forks of the same original codebase. I also found at least two other independently-developed variants of the same extension in my searches.

The original EmbedVideo extension was developed by MediaWiki hacker jimbojw some time ago, but he has completely stopped maintaining it. In his absence, the EmbedVideo++ extension was forked off his work to add some new features. However, that extension has since been abandoned as well. Add another fork from yet another developer--EmbedVideoPlus--which hasn't been modified in over a year. I saw two other variants as well, almost identical minus some small additions, which weren't listed in the released extensions list. Neither of these looks to be actively developed, at least not towards the larger goal of a generally more-useful extension.

Where the story starts to get funny is that I had to make some of my own modifications to the extension, and was about two button clicks away from hosting my own version publicly when I had an insight: Why add another version, when I could try to take over maintainership of some (or all!) of the existing versions? Two of them are listed as "abandoned" on the mediawiki.org website. One other hasn't been modified in over a year. Two more are slightly more recent but by no means under active development. Each developer has done just enough work to get their own changes added, but stopped active development as soon as that was complete. I think I can do a little better than that. At the very least, I can put the project into some kind of public source control and make sure it gets exposure and some level of development sanity.

So, Sunday afternoon I sent out a handful of emails to the involved developers: I would like to take over the project, merge all the variants together into a single extension, throw it up on Github, do some new development, and actively maintain it. I've only received a reply back from jimbojw so far, but he seemed enthusiastic.

Yesterday I created a project on Github and uploaded some initial files: the original EmbedVideo.php file with some modest modifications; LICENSE, CREDITS, and README files; and some other related stuff. I've got a few big TODO items that I want to tackle in the coming weeks and months:
  1. Refactor the code to follow current best practices.
  2. Test the code against the newest development versions of MediaWiki. I've heard reports of some incompatibilities.
  3. Add proper i18n support.
  4. Use a dedicated JavaScript library such as SWFObject for the embedding, where possible.
  5. Add support for new video hosting websites.
  6. Add in some other bits and pieces from the other extension forks, as possible.
I'm hoping to get some nice cleanup work started, and maybe get a few items ticked off this list soon so I can cut a new public release of the extension.

I'll post more news and updates as development progresses.

Thursday, May 6, 2010

The Merits of a Distributed Workflow

Long-time Parrot contributor kid51 posted a nice comment on my previous post about Git and SVN. He had some issues with the proposed distributed workflow I suggested. As my reply started to grow far too long for the stupid little blogger comment edit box, I realized I should turn it into a full blog post.

I've been pretty conflicted myself over the idea of a release manager. On one hand, releases are boring, which means anybody can do them and they don't require a huge time commitment. On the other hand, there is hardly a lot of "management" that goes on in a release: the release manager has some authority to declare a feature freeze and some ability to select which revision becomes the final release, but that's it. Sure, release managers also try to motivate people to perform regular tasks like updating the NEWS and PLATFORM files, but my experience is that the release manager ends up doing the majority of those tasks herself. I really feel like we need more direction and vision than this most months, and the release manager is a good person (though not the only possible person) to provide it.

Our releases, especially supported releases, have had certain taglines stemming back to PDS in 2008. The 1.0 release was famously labeled "Stable API for developers" and the 2.0 release was labeled "ready for production use", when even a cursory review shows that these two releases hardly met their respective goals. A release manager with more authority to shape those releases, and the releases leading up to them, might have done more to change that. A development focus, be it for weekly development between #ps meetings or for a monthly release, only matters if somebody is focusing on it and motivating the rest of the community to focus as well. That person needs to be either the architect, some other single authority (though playing cheerleader and chief motivator constantly would be quite a drain on that person), or the release manager. The benefit of using the release manager to motivate the team and shape the release is--even though it's more of a time commitment for the release manager--that we share the burden and no one person gets burnt out month after month.

A tiered development system has a number of benefits. Bleeding-edge development can occur unfettered (as it happens now in branches). From there we can pull features into integration branches, where we make sure all the assorted changes and new additions work well together. Development releases can be cherry-picked to represent the stable baseline features that we want people to play with and rely on, and long-term supported releases would represent only those features which are tested, documented, and worthy of being included under our deprecation policy. I don't think end-users of our supported releases should ever be exposed to features marked "experimental", for instance, or to any feature at all that we don't want covered by our long-term deprecation policy. Since any feature included in a supported release must go through deprecation and cannot be removed or fundamentally altered for at least three more months, we should be particularly careful about including new things in a supported release, and there should be some level of gatekeeper who is able to say "this thing isn't ready for prime time yet". That person, I think, should be the release manager.

Compare our system for dealing with new experimental features (a feature goes into trunk, maybe with a mention in #ps but with very little fanfare, and is then automatically included in the next release unless somebody removes it) to a system where features are added to a development branch, vetted, tested, documented, and then pulled into a release candidate branch only when the community knows they pass muster. I think that's a much better system and would lead us to releases with higher overall quality and stability.

All this sort of ignores your point, which is an extremely valid one, that switching to Git represents more than just a small change in the commands that a developer types into the console. Yes, it's a small difference to say "git commit -a" instead of saying "svn commit". Yes, it's a small personal change for me to commit locally and then push. The far bigger issues are the community workflow and even the community culture changes that will occur because of Git. These things don't need to change; we could lock Git down and use it exactly the way we use SVN now, but I don't think any of the Git proponents in the debate want it that way.

I would much rather plan for the changes in workflow, and even build enthusiasm for them up front, than sell the small changes in tooling and then be swept up in larger culture changes that nobody is prepared for. I think we should want these changes and embrace them, in which case Git is just the necessary vehicle, and not the end in itself.

Wednesday, May 5, 2010

SVN or Git

As I mentioned in my last post, the discussion about whether we move to Git from SVN has been raging pretty hard in the past few days, and while the general opinion seems to be in favor of a move, there certainly isn't consensus on the point just yet.

I'm in favor of Git, although I certainly wouldn't describe my preference as a "passion", as some opinions in the community have been described. In Parrot world I use SVN exclusively. I could use git-svn, or I could try to use dukeleto's Git mirror on Github, but I don't. As far as I am concerned, those add an additional layer of complexity, an additional abstraction layer that I can ignore most of the time until something goes wrong. It simply isn't worth it to me to try and force a Git interface onto Parrot's SVN repository.

Let me tell a short story.

When I was a graduate student I was given a "computer" for my "desk" in my "office" that I could use to do my normal school work and also to develop my thesis. This computer was a complete beast of a performance machine. At least, it had been several years earlier when it was first purchased. By the time I got it, circa 2006, its 12GB hard drive and screaming 128MB of RAM weren't quite as impressive by comparison.

I was paranoid about data integrity, considering I was working on the most important academic project of my whole life on the least qualified machine on campus. So I took some steps to shore up my defenses. First, I scrounged together several other small hard drives from other dusty computer carcasses. I removed the CD-ROM drive and jammed four HDDs into my case. Two of them were just hanging loose inside because there wasn't enough space to physically mount them all. I also had a shared network drive supplied by the university, a flash drive, and a personal laptop.

I set up SVN on the computer and backed up my work as follows: the second drive was where my work lived (the first drive was barely large enough to hold Windows XP and the other software I needed). I had a script that I used to simultaneously svn co my work to the third drive and xcopy my working directory to the fourth. An hourly task would then back up my svn repo from the third drive, along with the xcopy of the directory on the fourth drive, to the network shared drive. At the end of each day I would xcopy my working directory to my flash drive, go home, and store a copy on my personal laptop. About once a week I would email myself a copy of my thesis paper, in the hope that if all else failed, at least Google could safeguard the all-important final deliverable.

This may all sound excessive, but by the end of the year I had lost two hard drives (one of which actually caught on fire), my flash drive had been crushed, and the network drive had experienced several outages.

Story aside, the idea that there is only one copy of the entire history of the Parrot repository (or even two copies, assuming the server has a tape backup plan) is hardly reassuring to me. This is why I really like the Git idea that everybody has a complete copy of the entire history of the repository.

Let's look at things from a different angle: Git is a distributed version control system. While it certainly supports one, there is no real need for a single, central "master" repository to work from. We could, as a community, radically change our workflow if we had Git. If we were going to use it exactly the same way we currently use SVN, there really wouldn't be enough of a reason to switch.

What we have now is a single master trunk, which is where most of the action takes place. People make branches, do work, merge them to trunk. Then, on the second Tuesday of every month, somebody copies trunk to a tag, calls it a release, and we continue on with our normal development.

So let's imagine a different workflow. I'll posit one example, but this certainly isn't the only choice. With Git, everybody has their own local branch where they do work and make commits. Hell, "everybody" here can be a much larger group than the committers we have now: Anybody can make a fork and do work on their own local branches. We could have something like a master integration branch where trusted committers could pull changes from the entire ecosystem. The integration branch would be more like a testbed and less like a master reference copy.

From the integration branch, a monthly release manager would cherry pick the good stable features of the integration branch into a release master branch. At the end of the cycle, the release manager's master branch would get tagged as a release, and would become the new baseline integration branch. Remaining changes from the previous integration branch could be pulled in if they were worthy or if they gained more maturity. At that point, the old integration branch could disappear, and the cycle starts again.

This idea is more complex than our current system, but it does have a few nice features. First, we can strive for higher-quality releases by keeping closer track of what ends up in the release branch. Second, we can try to get more people involved by allowing everybody, not just our designated committers, to create a development branch and do "real" work. Third, the release manager has more control over releases, including being able to shift the focus of a release and drive it in a way that is not possible now.

I like Git, I like the idea of it and I like the things that it makes possible. I don't dislike SVN, but it doesn't have any compelling features that make me want to stay with it. SVN is good but not great, and there's one usage pattern for SVN that works but is limiting. SVN isn't hurting Parrot, but it's not lifting us up to the next level either.

Tuesday, May 4, 2010

Been away for a while

I've been quite absent from all things Parrot in recent weeks. Work and other engagements have been keeping me occupied, and the advent of spring has motivated my family and myself to spend more time outdoors. My son, who has started teething, has the same threshold for deciding to put things into his mouth as the Perl 6 language designers have for putting features into their language: "Doesn't matter what it is, jam it in there!". But I jest.

A lot has happened in Parrot-land since I fell off the edge of the world. Parrot 2.3 was released to much fanfare, as always. Allison fixed TT #389, though her strategy was different from what chromatic and I thought was necessary. So long as it works and there are no more bugs, I'm happy to see it closed by any means. TT #389 is one of our older bugs, stemming all the way back to the first Parrot Developer Summit in 2008. It has certainly been a large pain to Rakudo developers in particular.

Bacek and chromatic also put together an implementation of immutable strings and, following the 2.3 release, merged it into trunk. Immutable strings have some performance advantages but also, as was quickly found, some performance drawbacks. Specifically, string append operations can be a bit more expensive because a new buffer needs to be allocated instead of resizing one of the argument buffers in place. A strategy to get around that, which many of our developers have been pursuing, is to reduce the number of append operations.

The issue of version control software has been bubbling up to the surface again, and a long thread appeared on the mailing list yesterday after some lengthy chats on IRC. If I can continue clawing my way out of my little isolation hole and make another post this week, I would like to discuss that topic a little bit more.

GSoC students also got selected this week. I'll have plenty to post about that in the coming days and weeks.

I'm the release manager for 2.4, coming out on May 18th. Hopefully it's going to be a good one!