I'm more familiar with Windows API programming than I am with the Linux kernel, so I'm going to talk about the AIO implementation on Windows first.
In Windows, we have two basic methods for implementing AIO: Overlapped IO and IO Completion Ports. Overlapped IO uses the normal IO API (WriteFile, ReadFile, etc). The difference is that when we open a file handle we open it with the FILE_FLAG_OVERLAPPED flag. This tells the OS that operations on this handle can be "overlapped", or asynchronous. We also have to create a special OVERLAPPED structure for each request, which carries the file offset for the operation and, optionally, an event handle that is signaled when the request finishes. Calls to WriteFile and ReadFile will then dispatch an asynchronous request instead of a synchronous one, so long as we also pass the OVERLAPPED structure to these calls. If we want a callback, the ReadFileEx and WriteFileEx variants accept a completion routine that the OS calls once the request is done and the calling thread is in an alertable wait.
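An IO Completion Port is like a queue structure that collects the results of many AIO requests. We associate one or more overlapped handles with the port, and as requests on those handles complete, the OS adds a completion message to the port's queue. The program then calls GetQueuedCompletionStatus to pull messages off the queue: with a timeout of zero this is a poll that tells us which requests have completed, if any, and with a longer timeout it blocks until something arrives. It's worth mentioning that we don't lose callbacks if we use completion ports, because Windows can bind a handle to a thread pool's completion port and run a callback routine for each completion (BindIoCompletionCallback is one way to set that up).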
In the Windows case it seems like it may be easier to retrofit the asynchronous operations into the FileHandle and Socket PMCs, instead of trying to create something new. A new FileHandle would be synchronous by default, and setting some kind of flag would open the handle as an overlapped handle instead. Maybe the simple act of setting a non-null callback function would trigger this behavior. Of course, that doesn't really leave room for integrating AIO with Parrot's concurrency scheduler, and it certainly isn't going to make it easy to have a smooth API that's usable on Windows AND Linux systems. So even though it seems like the more straightforward method on Windows, I don't think it's the way we should go in Parrot.
One thing that we could have instead is to add an "IO Request Queue" object to the scheduler, which in the Windows case would be an IO Completion Port structure. Asynchronous requests get added to the Queue, and the concurrency scheduler will regularly poll it to see when a message is received. When a message is received, the callback task is scheduled (or maybe even executed directly). There are lots of inefficiencies in this, and I don't have a lot of nice things to say about any system that blindly polls a flag, but it's a start for designing a unified AIO system.
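To make the polling idea a little more concrete, here is a minimal sketch in Python rather than Parrot code. Every name in it (IORequestQueue, Scheduler, run_once) is hypothetical, and a helper thread stands in for the OS performing the IO in the background. On Windows the queue would be backed by a real IO Completion Port, which this sketch does not attempt.

```python
import queue
import threading


class IORequestQueue:
    """Hypothetical stand-in for an OS completion queue."""

    def __init__(self):
        self._completed = queue.Queue()

    def submit(self, operation, callback):
        # Assumption: this thread plays the role of the OS finishing the
        # request in the background and posting a completion message.
        def run():
            self._completed.put((callback, operation()))

        worker = threading.Thread(target=run)
        worker.start()
        return worker

    def poll(self):
        """Return all completion messages that are ready, without blocking."""
        ready = []
        while True:
            try:
                ready.append(self._completed.get_nowait())
            except queue.Empty:
                return ready


class Scheduler:
    """Hypothetical scheduler that polls the IO queue on each pass."""

    def __init__(self):
        self.io_queue = IORequestQueue()
        self.tasks = []

    def run_once(self):
        # Turn each completion message into a scheduled callback task.
        for callback, result in self.io_queue.poll():
            self.tasks.append((callback, result))
        # Run the scheduled tasks in arrival order.
        while self.tasks:
            callback, result = self.tasks.pop(0)
            callback(result)


def demo():
    results = []
    scheduler = Scheduler()
    worker = scheduler.io_queue.submit(lambda: "hello", results.append)
    worker.join()          # wait until the fake "OS" has posted the message
    scheduler.run_once()   # one scheduler pass picks it up and runs the callback
    return results
```

Calling demo() submits one fake request and runs a single scheduler pass. In a real scheduler, run_once would be called repeatedly, and most passes would find the queue empty, which is the inefficiency mentioned above.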
So there are two basic methods that I can think of right now to implement an AIO system in Parrot. The first, as I mentioned above, uses a poll loop to keep track of completed IO events and schedules callbacks when they are received. The second, which I would like to avoid as much as possible, is to use threading.
In a threaded AIO system, every new IO request launches a new thread. That worker thread executes various blocking operations, handles the callback, and then terminates. A big problem with this is that we can get into race conditions and data corruption issues if we launch two separate requests on the same IO target, unless we do lots of costly error checking and synchronization in Parrot. Instead, I think it's much better to let the OS's AIO API (now there's an alphabet soup for you!) handle the ordering and serializing of the requests.
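For comparison, here is a rough Python sketch of the thread-per-request model, including the kind of per-target locking it forces on us. ThreadedAIO and its methods are hypothetical names made up for illustration, and the in-memory StringIO is only a stand-in for a real file or socket.

```python
import io
import threading


class ThreadedAIO:
    """Hypothetical thread-per-request AIO layer."""

    def __init__(self):
        self._locks = {}                      # one lock per IO target
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, target):
        with self._locks_guard:
            return self._locks.setdefault(id(target), threading.Lock())

    def write(self, target, chunks, callback):
        def run():
            # Without this lock, chunks from two requests on the same
            # target could interleave in the output.
            with self._lock_for(target):
                for chunk in chunks:
                    target.write(chunk)       # the blocking operation
            callback(len(chunks))             # handle the callback, then exit

        worker = threading.Thread(target=run)
        worker.start()
        return worker


def demo():
    target = io.StringIO()
    aio = ThreadedAIO()
    done = []
    workers = [
        aio.write(target, ["a"] * 3, done.append),
        aio.write(target, ["b"] * 3, done.append),
    ]
    for worker in workers:
        worker.join()
    return target.getvalue(), done
```

The lock keeps each request's chunks together, but it does not decide which request goes first, so the two writes can still land in either order. That ordering problem is exactly the bookkeeping I would rather leave to the OS.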
I have to get some resources together and do a little research, but tomorrow or the day after I'll talk about the AIO situation on unixy systems too.
Update: I found some more interesting links: