AIO .. no, nothing to do with 'Rise of Nations'

Some time ago I started to see compilation options that talked about AIO (Asynchronous IO), often as an alternative to poll or select, and wondered what all the fuss was about. After spending some time working on low-level disk IO, it turns out it's actually quite important, especially when it comes to parallel requests and reducing application-level threading and synchronisation.

Consider the traditional method of IO;

  lseek(fd, position_in_file, SEEK_SET);
  read(fd, buffer, nbytes);   /* blocks until the data arrives; process it here */
  lseek(fd, position_in_file, SEEK_SET);
  read(fd, buffer, nbytes);   /* blocks until the data arrives; process it here */

Now, if you're running this in a thread, it's not too critical: the thread will block, and so long as it's not doing anything else, you're probably good. The first problem is this: let's say this code is sat on the end of an Internet connection and you have 5000 clients .. that would be 5000 threads, all potentially blocking. Not only is this going to provide less than optimal performance, it's also going to do interesting things to your system's load average.

The other point to consider: let's say you have a whole bunch of reads and you process them sequentially, maybe getting different things from different files. You are choosing the order in which the reads happen, and if that order doesn't tie in with the ordering on disk (which it probably won't), then you'll be making the read heads do far more work than they need to, and possibly even doing more reads than you need to. For example, if you do two reads that are physically back-to-back (which you probably wouldn't know), you could have performed one read for double the block size rather than two separate reads .. this also makes a significant difference to performance.

Without threads ...

AIO essentially provides a mechanism whereby your process doesn't block, and where you can have as many clients as you want using only one thread. It does this by allowing you to queue up a bunch of read requests and submit them to the kernel as a batch, then carry on doing whatever it is you want to be doing, say preparing reads for the next client. In the meantime the OS is processing the read requests, and when each one completes, it delivers a signal that the process can pick up to finish the work and mark the job done.

One less obvious feature here: because the OS has a bunch of queued read requests, before it starts to do the actual reading it has the chance to analyse the queue and potentially re-order the requests based on its knowledge of the underlying hardware. So for example, if you queue 4 read requests, each for a 4k block, and you request blocks 1,2,3,4, then when the OS comes to process the queue it can physically make ONE read request for a 16k chunk starting at block #1. In the best case this makes the operation ~4x more efficient than the traditional alternative, and almost 4x faster.
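To make the merging idea concrete, here is a toy sketch of coalescing a queue of reads. It is only an illustration of the principle and makes no claim about how any real kernel or IO scheduler does it; the read_req type and the coalesce_reads() name are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request type for this sketch: a byte range in one file. */
struct read_req {
    long long offset;   /* start of the range, in bytes */
    size_t    length;   /* size of the range, in bytes */
};

/* qsort comparator: order requests by ascending offset. */
static int by_offset(const void *a, const void *b)
{
    const struct read_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort the queue by offset, then merge ranges that touch or overlap.
 * Merges in place and returns the number of requests that remain. */
size_t coalesce_reads(struct read_req *q, size_t n)
{
    if (n == 0)
        return 0;
    assert(q != NULL);
    qsort(q, n, sizeof q[0], by_offset);

    size_t out = 0;   /* index of the request currently being grown */
    for (size_t i = 1; i < n; i++) {
        long long end = q[out].offset + (long long)q[out].length;
        if (q[i].offset <= end) {
            /* Adjacent or overlapping: extend the current request. */
            long long new_end = q[i].offset + (long long)q[i].length;
            if (new_end > end)
                q[out].length = (size_t)(new_end - q[out].offset);
        } else {
            /* Gap on disk: start a new request. */
            q[++out] = q[i];
        }
    }
    return out + 1;
}
```

Fed four 4k requests at offsets 8192, 0, 12288 and 4096, this leaves a single 16k request starting at offset 0, which is the 4-into-1 case described above.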

POSIX AIO uses a data structure something like this;

struct aiocb {
  int             aio_fildes;     /* File descriptor */
  off_t           aio_offset;     /* File offset */
  volatile void  *aio_buf;        /* Location of buffer */
  size_t          aio_nbytes;     /* Length of transfer */
  int             aio_reqprio;    /* Request priority */
  struct sigevent aio_sigevent;   /* Notification method */
};

So you would create an array of entries like this, filling in the appropriate details for each request, and then send them to the kernel for processing with aio_read() (one request at a time) or lio_listio() (a whole batch in one call). Then just carry on with your day as per normal.

You would then need a signal handler designed to receive the completion notification; typically this is passed a pointer to the relevant aiocb structure for the completing operation.
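As a rough sketch of that notification path, again assuming POSIX AIO: the request asks for SIGEV_SIGNAL delivery and carries a pointer to itself in sigev_value, which the handler gets back through siginfo_t. The choice of SIGUSR1, the tracked_read wrapper and the function names are assumptions made for this example only, and a real event loop would carry on with other work instead of sitting in sigsuspend().

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define AIO_DONE_SIGNAL SIGUSR1   /* assumption: any free signal would do */

/* Hypothetical wrapper: the control block plus a "finished" flag. */
struct tracked_read {
    struct aiocb cb;
    volatile sig_atomic_t done;
};

/* Completion handler: the request pointer travels in si_value. */
static void on_aio_done(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    if (info->si_code == SI_ASYNCIO) {
        struct tracked_read *req = info->si_value.sival_ptr;
        req->done = 1;
    }
}

/* Read `len` bytes at `offset` from `path`; returns bytes read or -1. */
ssize_t read_file_with_signal(const char *path, void *buf, size_t len,
                              off_t offset)
{
    assert(buf != NULL);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_aio_done;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(AIO_DONE_SIGNAL, &sa, NULL) == -1)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Block the signal so it can only be delivered inside sigsuspend(). */
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, AIO_DONE_SIGNAL);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, AIO_DONE_SIGNAL);

    struct tracked_read req;
    memset(&req, 0, sizeof req);
    req.cb.aio_fildes = fd;
    req.cb.aio_buf = buf;
    req.cb.aio_nbytes = len;
    req.cb.aio_offset = offset;
    req.cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    req.cb.aio_sigevent.sigev_signo = AIO_DONE_SIGNAL;
    req.cb.aio_sigevent.sigev_value.sival_ptr = &req;

    ssize_t result = -1;
    if (aio_read(&req.cb) == 0) {
        /* ... other work could happen here ... */
        while (!req.done)
            sigsuspend(&wait_mask);   /* atomically unblock and wait */
        int err = aio_error(&req.cb);
        ssize_t n = aio_return(&req.cb);
        result = (err == 0) ? n : -1;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    close(fd);
    return result;
}
```

The handler does nothing except set a flag; checking aio_error() and collecting the result with aio_return() happens back in normal code, which keeps the handler itself trivially safe.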

Where this really comes into its own is in languages that are based on an event loop, where no one routine is allowed to block (Javascript, for example).

So to sum up, it offers;

  • The ability to handle many concurrent requests without using threads
  • Kernel based IO optimisation
  • Non-blocking IO for event-loops

Although you can achieve the same thing by using a non-blocking read followed by a select, this is a far more efficient mechanism, and the performance of select starts to drop off badly when you move into thousands of outstanding file descriptors.

So the next time ./configure offers the option, if the programmer's done their job properly, AIO is likely to be the superior choice!

If you were looking for computer games and the AIO World, you wanted this link .. ;-)