I think everybody who programs has had a bug in one of their programs that they were positive was the compiler’s fault. (Perhaps “it’s the operating system’s fault” or “it’s the hardware’s fault”, though these are less common.) At this point, you learn the rules of programming:
- The compiler is always right.
- If the compiler is ever wrong, see rule #1.
The corollary, of course, is that it is your program that has the bug, not the compiler. (The third through sixth rules are restatements of these two with the operating system and the hardware, respectively.)
Yesterday was one of those occasions where I thought the compiler might be wrong and spent a little time remembering the First Rule. It’s instructive to look at the example and understand why the compiler is always right. I was looking at bug 611781, reducing the size of the NSS library shipped with Firefox, and ran across this Linux x86-64 assembly code, compiled with gcc -Os:
0000000000000000 <NSSTrustDomain_GenerateSymmetricKeyFromPassword>:
   0:   41 51                   push   %r9
   2:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
                        5: R_X86_64_GOTPCREL    NSS_ERROR_NOT_FOUND+0xfffffffffffffffc
   9:   8b 38                   mov    (%rax),%edi
   b:   e8 00 00 00 00          callq  10 <NSSTrustDomain_GenerateSymmetricKeyFromPassword+0x10>
                        c: R_X86_64_PLT32       nss_SetError+0xfffffffffffffffc
  10:   31 c0                   xor    %eax,%eax
  12:   41 5a                   pop    %r10
  14:   c3                      retq
You can find a lot of these small functions in lib/pki/trustdomain.c in an NSS source tree. Looking over this, you notice two things if you know a little bit about x86-64 assembler:
- The push and pop instructions are suspiciously mismatched; if I save a register to the stack with push, I ought to restore the same register from the stack with a pop. The compiler must have a bug!
- Also, there’s an efficiency concern here. Why is the compiler saving and restoring any registers here? We never use %r9 in the body of the function, so we shouldn’t need to save it at all.
What is the compiler doing here? It’s easiest to explain away the second issue first. If you look at the x86-64 ABI document, you’ll see that in section 3.2.2, the stack must be kept 16-byte aligned before calling other functions. Since our function calls another function, our function must ensure that the stack is properly aligned when that other function begins execution. And since the call instruction on x86-64 (which is how we would have arrived at our function) pushes an 8-byte return address, moving the stack pointer down by 8 bytes (in addition to all the other work it does), our function must adjust the stack pointer by an additional 8 bytes to restore 16-byte alignment. The compiler has chosen to use a push instruction to manipulate the stack pointer in this case. This instruction subtracts 8 bytes from the stack pointer and stores the indicated register into the memory at the new stack pointer.
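If the arithmetic is easier to follow with actual numbers, here is a small Python sketch of the stack pointer's journey. It is only a toy model: the starting address is made up, and the helper names (is_aligned, after_call, after_push) are mine, not anything from the ABI document or from NSS.

```python
# Toy model of the stack pointer around a call on x86-64.
# The starting address is an arbitrary, made-up 16-byte-aligned value.

WORD = 8          # bytes moved by call and push on x86-64
ALIGNMENT = 16    # alignment the ABI requires at the point of a call


def is_aligned(rsp: int) -> bool:
    """True when rsp is a multiple of the required alignment."""
    return rsp % ALIGNMENT == 0


def after_call(rsp: int) -> int:
    """call pushes the 8-byte return address."""
    return rsp - WORD


def after_push(rsp: int) -> int:
    """push subtracts 8 from rsp, then stores a register there."""
    return rsp - WORD


rsp_in_caller = 0x7FFF_FFFF_E000                  # aligned just before our caller's call
rsp_on_entry = after_call(rsp_in_caller)          # where our function starts
rsp_before_inner_call = after_push(rsp_on_entry)  # after push %r9

print(is_aligned(rsp_in_caller))          # True
print(is_aligned(rsp_on_entry))           # False: off by 8
print(is_aligned(rsp_before_inner_call))  # True again
```

Which register gets pushed never enters into the calculation; only the 8-byte adjustment matters.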
Another way to do this would be to subtract 8 bytes from the stack pointer (sub $0x8, %rsp), which avoids writing the register to memory at all. If you compile with -O2, optimizing for speed, instead of -Os, you would indeed see the compiler using sub $0x8, %rsp. But since we compiled this code with -Os, optimizing for size, the compiler knows that the instruction for pushing a register onto the stack (2 bytes) is smaller than the instruction for subtracting 8 bytes from the stack pointer (4 bytes). Likewise, the instruction for popping a register from the stack (2 bytes) is smaller than the instruction for adding 8 bytes to the stack pointer (4 bytes).
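To make the size comparison concrete, the following Python sketch just counts bytes. The push and pop encodings are copied from the disassembly above; the sub and add encodings are not from that listing, so treat them as my assumption about what the assembler would emit for an 8-bit immediate.

```python
# Machine-code bytes for the two ways of moving the stack pointer by 8.
# push/pop bytes come from the disassembly in this post; the sub/add bytes
# are an assumption (the usual encodings with an 8-bit immediate).

ENCODINGS = {
    "push %r9":      bytes.fromhex("4151"),
    "pop %r10":      bytes.fromhex("415a"),
    "sub $0x8,%rsp": bytes.fromhex("4883ec08"),
    "add $0x8,%rsp": bytes.fromhex("4883c408"),
}

for insn, code in ENCODINGS.items():
    print(f"{insn:<15} {code.hex(' '):<12} {len(code)} bytes")

# Total cost of each strategy for one function prologue plus epilogue.
push_pop_size = len(ENCODINGS["push %r9"]) + len(ENCODINGS["pop %r10"])
sub_add_size = len(ENCODINGS["sub $0x8,%rsp"]) + len(ENCODINGS["add $0x8,%rsp"])

print(f"push/pop: {push_pop_size} bytes, sub/add: {sub_add_size} bytes")
```

A few bytes per function is not much, but it adds up in a library with many small functions like this one.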
These “useless” instructions are therefore doing real work, which is maintaining the contract of the ABI.
OK, so the efficiency claim has been addressed. What about the correctness claim with mismatched registers? Again, if you look at the aforementioned ABI document, section 3.2.3 describes how registers are used for function calls. The registers %r9 and %r10 are caller-saved registers, which means that a called function is free to overwrite any values stored in those registers. It doesn’t matter what value, if any, our function stores in %r10, because we know that if the caller had cared about the value, the caller would have stored that value away somewhere. Since we need spare registers for maintaining stack alignment via push and pop, caller-saved registers are ideal for pushing and popping with abandon.
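Here is a rough Python model of why the mismatch is harmless. It is a sketch under heavy simplification: registers are a dict, the stack is a list, the register values are made up, and the call to nss_SetError is left out (a real call could clobber other caller-saved registers as well). The function name run_function is hypothetical.

```python
# Toy model: a dict of registers plus a list acting as the stack.
# Register contents are made-up values chosen only to be recognizable.

# Registers a called function may overwrite under the x86-64 ABI.
CALLER_SAVED = {"rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"}


def run_function(regs: dict) -> dict:
    """Mimic the push %r9 ... pop %r10 pattern from the disassembly."""
    regs = dict(regs)           # work on a copy so we can compare afterwards
    stack = []

    stack.append(regs["r9"])    # push %r9: makes room; the value is irrelevant
    # (the call to nss_SetError is omitted from this model)
    regs["rax"] = 0             # xor %eax,%eax: the return value
    regs["r10"] = stack.pop()   # pop %r10: releases the slot, clobbers r10

    assert not stack            # the stack is balanced on return
    return regs


before = {"rbx": 0xB, "rbp": 0xBB, "r9": 0x99, "r10": 0x1010, "r12": 0x12, "rax": 0xAA}
after = run_function(before)

changed = {name for name in before if before[name] != after[name]}
print(sorted(changed))          # ['r10', 'rax']
print(changed <= CALLER_SAVED)  # True
```

The only registers that change are ones the caller was never entitled to rely on, and the stack ends up exactly where it started.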
In this case, it turned out that my understanding of what the program was doing had the bug, not the compiler. It’s also worth pointing out that if the compiler really were mismatching register saves and restores, lots and lots of things would be broken. The likelihood of the code produced in this instance being wrong, without the same problem occurring in the millions of lines of code the compiler has compiled to produce the system on your computer, is vanishingly small. The next time you see the compiler doing something weird, remember that the compiler is always right and try to figure out why it’s doing that.
(I should say, of course, that compilers, just like many other computer programs, do have bugs; GCC’s bugzilla or LLVM’s bugzilla would not exist otherwise, nor would bugfix releases continue to come out for your favorite compiler. But in the vast, vast majority of cases, you have a bug to fix, not the compiler.)