faster c++ builds by building bigger groups of code

There have been a lot of spectacular build changes going into the tree lately; my personal builds have gotten about 20% faster, which is no mean feat.  One smaller change that I’ve implemented in the last couple weeks is compiling the DOM bindings and the IPDL IPC code in what we’ve been calling “unity” mode.

The idea behind unity mode is to compile a single C++ file that simply #includes your actual C++ source files.  What’s the win from this?

  • Fewer compiler processes launched.  This is a good thing on Windows, where processes are expensive; it’s even a good thing on platforms where process creation is faster.
  • Less disk I/O.  The best case is if the original C++ source files include a lot of the same files.  Compiling the single C++ file then includes those headers only once, rather than once per original C++ source file.
  • Smaller debug information.  On Linux, at least, every Gecko object file compiled with debug information is going to include information about basic types like uint32_t, FILE, and so forth.  Compiling several files together means that you cut down on multiple definitions of things in the debug information, which is good.
  • Better optimization.  The compiler is seeing more source code at once, which means it has more information to make decisions about things like inlining.  This often leads to things not getting inlined (perhaps because the compiler can see that a function is called several times across several different files rather than one time in each of several source files).

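Concretely, a unified source file is nothing more than a series of #include directives pulling the real source files into one translation unit.  A sketch of what such a generated file might look like (the file and source names here are made up for illustration):

```cpp
// UnifiedBindings0.cpp -- a hypothetical generated unified file.
// Each #include pastes an entire source file into this translation unit,
// so shared headers are parsed once and debug info is emitted once.
#include "AudioContextBinding.cpp"
#include "BlobBinding.cpp"
#include "CanvasRenderingContext2DBinding.cpp"
// ...and so on, up to the chosen group size.
```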
It’s a little like link-time code optimization, except that your compiler doesn’t need to support LTO.  SQLite, in-tree and otherwise, already provides an option (the “amalgamation”) to compile everything as one big source file and claims a ~5% speedup on benchmarks.

The concrete wins are that the DOM bindings compile roughly 5x faster, the IPC IPDL bindings compile roughly 2x faster, libxul got 100k+ smaller on Android, and the memory required for Windows PGO went down by over 4%.  (The PGO memory decrease came just from building the DOM bindings in unity mode; the effect from the IPC code appears to have been negligible.)  The downside is that incremental builds get slightly slower when WebIDL or IPDL files are modified.  We tried to minimize this effect by compiling files in groups of 16, which appeared to provide the best balance between full builds and incremental builds.
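The grouping itself is straightforward: walk the list of source files in chunks of 16 and emit one unified file per chunk.  Here’s a minimal sketch of that logic (my own function name; the actual moz.build implementation is in Python and also writes the files out and wires them into the build):

```cpp
#include <string>
#include <vector>

// Split `sources` into groups of at most `chunk_size` files and return the
// text of one unified file per group.  Smaller chunks mean less rebuilding
// when one generated source changes; bigger chunks mean faster full builds.
std::vector<std::string> make_unified_sources(
    const std::vector<std::string>& sources, size_t chunk_size = 16) {
  std::vector<std::string> unified;
  for (size_t i = 0; i < sources.size(); i += chunk_size) {
    std::string contents;
    for (size_t j = i; j < sources.size() && j < i + chunk_size; ++j) {
      contents += "#include \"" + sources[j] + "\"\n";
    }
    unified.push_back(contents);
  }
  return unified;
}
```

With 35 input files and a chunk size of 16, for example, this produces three unified files: two with 16 #includes each and a final one with 3.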

The code is in moz.build and it’s not specific to the DOM bindings or IPC IPDL code; it will work on any collection of C++ source files, modulo issues with headers being included in unexpected places.  The wins are probably highest on generated code, but I’d certainly be interested in hearing what happens if other bits of the tree are compiled in unity mode.

2 comments

  1. As I mentioned in the corresponding bug, I did that more globally a while ago, and it allowed decreasing the linked libxul.so size by about 10 to 20%. At the time I was experimenting with that, that’s the only metric I was looking at; I didn’t check whether it changed build times. In fact, back then, the build system was essentially building one unified file at a time most of the time anyway, so build time wasn’t an interesting metric to look at, as it was not very likely to go down significantly.
    There are however a lot of issues that prevent this being done for everything. Like, specific build flags for some files. Or conflicting names/defines/etc. in several files in the same directory.