It's a part of the VM that really "just works" and so nobody spends much time playing with it. This is not to say that nobody has touched it, but in my tenure as a Parrot developer I have not seen nearly so much development on the entire subsystem as I have seen in other places. This has been changing recently with some cleanups to the freeze/thaw system, which is related to the bytecode system. There are several packfile-related PMC types waiting in the wings, but Parrot only uses them as a thin wrapper layer to provide PBC access from PIR.
There are several issues that need attention in the realm of Parrot's bytecode. Some of these issues, or combinations thereof, could make for very interesting--and very rewarding--GSoC projects.
First project that I would like to see is in making Parrot's bytecode more portable. Currently bytecode isn't really portable between different systems like it should be, nor is it portable between different versions. Everytime somebody increases the bytecode version number in the PBC_COMPAT file, all previously-generated bytecode files become invalid.
One way to tackle this problem would be to include metadata in the bytecode file. This metadata could include a list of PMC types and their corresponding type numbers, and opcodes and their corresponding index numbers. A verification step at Parrot load could check versions and, if different, could loop over this metadata and attempt to remap old numbers to the new numbers.
...And while we're talking about bytecode portability, Parrot needs a "robust testing strategy" for testing and verifying PBC files. I mention this because I see the warning several times per test run, every test run, every day. Seriously, somebody fix this warning!
The packfile system code is terrible and very complicated. We do have those PMC types that act as thin wrappers over the packfile structures and APIs. One thing that I would like to see is a complete refactor of the system that replaced all of the packfile structures with the equivalent, properly-encapsulated, PMCs.
Parrot's bytecode also stores long lists of the constants used in the code. These long lists, unfortunately, often contain duplicates. A post-processing step on the generated bytecode could perform some basic constant folding on the code to decrease the final size of the generated PBC file.
And speaking of constant folding, there are plenty of other optimizations that could be performed on bytecode. Hooks to add optional optimizations would be a most welcome addition, especially if we could notably decrease file size or even improve performance (though performance-improving optimizations would probably be better suited at the PIR or PAST levels). One such optimization that could be useful is a translator routine that converts bytecode files to a "native" format where data is in the correct byte order.
The pbc_dump program, which attempts a basic disassembly of a bytecode file, is woefully inadequate for most needs. Major improvements to this would be nice. A full-featured PBC disassembler that could produce round-trip capable code would be even better. The pbc_merge program similarly is in need of some love. Major improvements to this could be made as part of a larger set of refactors. Tangentially related but still important is a proper PIR->PASM translation utility. IMCC has an option to do this, but produces output code that is known to be incorrect and incomplete.
The freeze/thaw system converts individual PMCs to serialized strings suitable for saving to a file and then loading again at a later time. Better integration of the freeze/thaw code would enable storing all Parrot data, be it bytecode or data, to a similarly-structured file. Eventually we could see the freeze/thaw types be inheritable, so users could define how their data is serialized. Information in the file could direct Parrot which serialization types are required to read the data later. Some ideas of subtypes of the PMC serializer type are serializers that checksum data for later verification, or types which encrypt or obfuscate the data for security purposes.
So these are a few random ideas about projects that have to do with the PBC system. I think this is an area that is ripe for some GSoC-level projects. Also, I think projects in this area would be majorly beneficial to a lot of people and would help to make bytecode a compelling format for shipping portable executable files to end-users. If you like one of these ideas, or if you have ideas of your own to fix these systems, please do let me know!