Optimise or .. well .. take longer

Once upon a time I was inclined to sprinkle hints through my code to try to speed it up, using features like "static inline" for function definitions and "register" for variable definitions, without actually thinking too much about the scale of the benefits.

Having done a little testing recently, in future I'm going to be less inclined to mess with the code and far more inclined to mess with the compiler flags. Consider the following example:

#include <stdio.h>
#include <stdlib.h>

long factorial(register int i) {
        return (i>1)?i*factorial(i-1):1;
}

int main(void)
{
        int x,y;
        long r;
        for(y=0;y<10000000;y++)
                for(x=1;x<20;x++) r=factorial(x);
        printf("Result=%ld\n",r);
}

It's a bit of nonsense, but it does use a few CPU cycles when it runs, which is the point. I've compiled and run this code in a number of ways on my machine, and the results are quite interesting.

  • compile and run as-is, time = 7.64s
  • switch function to "inline static" and variables to "register", time = 5.84s
  • compile with -O3 and run, time = 3.51s
  • compile with -Ofast and run, time = 3.5s
  • compile with -Os and run, time = 2.975s
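For anyone wanting to repeat the comparison, here's a rough sketch of the "static inline" variant with the timing built in, so the same source can simply be rebuilt with different flags. The run_benchmark helper and its parameters are my own invention for illustration and not what produced the figures above. It measures CPU time with clock(), which won't match wall-clock numbers exactly, and it assumes a 64-bit long so that 19! fits.

```c
#include <stdio.h>
#include <time.h>

/* Same recursive factorial as the example, with the "static inline" hint. */
static inline long factorial(int i) {
        return (i > 1) ? i * factorial(i - 1) : 1;
}

/* Hypothetical helper (not from the original test): runs the benchmark
   loop `reps` times, stores the last result in *out and returns the
   CPU time consumed, in seconds, as reported by clock(). */
double run_benchmark(long reps, long *out) {
        long r = 0;
        clock_t start = clock();
        for (long y = 0; y < reps; y++)
                for (int x = 1; x < 20; x++)
                        r = factorial(x);
        clock_t end = clock();
        *out = r;
        return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling run_benchmark(10000000L, &r) from main and printing both the return value and r gives the same workload as the original loop. Printing r matters, since an optimiser is free to discard work whose result is never used.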

I thought the last figure was the most interesting, albeit maybe code-specific. At first I thought it might be down to the binary fitting in the L1 instruction cache, but on checking, -Os generates 8592 bytes and -Ofast 10056 bytes, and the L1 cache on my machine is 64 KB, so both fit comfortably. Still pondering that one, but either way it hadn't occurred to me before that just defaulting to -Ofast wasn't necessarily going to give the best (or as near as dammit) results ...