gcc version comparison, part 1/n: libxul sizes

Some questions on #perf this morning got me wondering about how different versions of GCC compared in terms of the size of libxul.  (I have a lot of versions of GCC lying about for a sekret performance comparison project, so merely comparing sizes was pretty straightforward.)

The GCC versions I used are listed in the table below.  The Mozilla sources were from mozilla-central r120478, compiled with –disable-debug –disable-debug-symbols –enable-optimize.  The target was x86-64 Linux, the build did not use PGO, and the system linker, GNU ld 2.20.1, was used. Here’s what the size command has to say about libxul in each case; all sizes are in bytes:

GCC version Text size Data size Bss size .text section size .eh_frame section size
4.4.7 39120354 3410456 1611420 22969414 4212924
4.5.4 44833935 3791400 1625996 23449960 7481052
4.6.3 42819600 3774272 1625996 22970408 6467652
4.7.2 42103108 3769576 1631244 22297992 6519596
4.8 HEAD 39638390 3415424 1617260 21300806 6209220

The terms “text”, “data”, and “bss” aren’t just the similarly-named sections in binaries. “text” encompasses all code and read-only sections, so things like string constants (!) would be included in this number. “data” is everything non-constant that’s stored on disk: tables of function pointers, tables of non-constant data, and so forth. “bss” is everything that’s initialized to zero and can therefore be allocated by the system at program run time. I’ve provided the .text section sizes as a more useful (IMHO) number for the purposes of this comparison.

If you look at just the size of the .text section–that is, actual compiled code–there’s not much variation between the compiler versions. 4.5 is the outlier here, with a ~2% increase over 4.4, but 4.6 is back to 4.4′s codesize and 4.8 is smaller still. How or if these differences in code size translate into a difference in performance will have to wait until another blog post. So what’s with size reporting such a huge jump in “text” size for 4.5 through 4.7?

The .eh_frame section sizes help explain this increase. The corresponding .eh_frame_hdr sections show similar percentage-wise increases, but the absolute increases are somewhat smaller there, so I opted to not show data for those. GCC 4.5 started emitting unwind data for function epilogues and does so unconditionally whenever unwind data is emitted. This data is not needed for normal operation, since you never have to unwind the stack from epilogues. However, for unwinding the stack from arbitrary points in the program (e.g. as a sampling profiler similar to oprofile or perf might do), such data is absolutely necessary. (You could fake it by parsing the instruction stream, but that gets very messy very quickly. Been there, done that, don’t want to do it again.) So, extra unwind data leads to bigger section sizes.  No surprises there.

Before getting to other sources of the “text” size increase, we need to examine another interesting statistic: the data size increase seen in 4.5-4.7.  Why should different versions of the compiler differ so much in an essentially static figure?  I generated easy-to-compare lists of symbols from each version:

readelf --syms -W build-mozilla-gcc-${version}/dist/bin/libxul.so \
  | gawk '$4 == "OBJECT" && $7 != "UND" && $5 != "WEAK" {printf("% 6d %s\n", $3, $8); }' \
  | sort -n -k 1 -r > gcc-${version}-all-syms.txt

and diff‘d the 4.4 and the 4.5 version. (Comparing to 4.5 or 4.6 provides roughly the same data, and starting with a base of 4.7 provides the same information in the reverse direction.) While there were a few instances of user-specified variables that the compiler didn’t eliminate, the bulk of the hunks of the diff looked like this:

@@ -721,6 +791,10 @@
   1080 vtable for nsPrintSettings
   1080 keywords
   1080 sip_tcp_conn_tab
+  1072 vtable for js::ion::LIRGeneratorX86Shared
+  1072 vtable for js::ion::MInstructionVisitor
+  1072 vtable for js::ion::LIRGeneratorShared
+  1072 vtable for js::ion::LIRGeneratorX64
   1072 vtable for js::ion::LIRGenerator
   1072 vtable for nsBox
   1064 vtable for nsBaseWidget

or:

@@ -842,6 +934,10 @@
    864 vtable for imgRequestProxy
    864 mozilla::dom::NodeBinding::sAttributes_specs
    864 g_sip_table
+   856 vtable for nsIDOMSVGTextPositioningElement
+   856 vtable for nsIDOMSVGFECompositeElement
+   856 vtable for nsIDOMSVGTSpanElement
+   856 vtable for nsIDOMSVGTextElement
    855 sLayerMaskVS
    848 vtable for nsJARURI
    848 vtable for nsXMLHttpRequestUpload

And if you add up all the sizes of the vtables we’re now retaining:

diff -u gcc-44-all-syms.txt gcc-45-all-syms.txt \
  | c++filt | egrep '^\+' \
  | grep vtable | awk '{ sum += $2 } END { print sum }'

you get a total of about 325K, which accounts for a good chunk of the 375K difference between GCC 4.4′s generated data and GCC 4.5′s generated data.

How did GCC 4.4 make the vtables go away? [UPDATE: There's a simple explanation for what happened here.] I haven’t analyzed the code, but I can see two possibilities. The first is that the compiler devirtualizes all the function calls associated with those classes and can tell that instances of the classes never escape outside of the library. And if you don’t have virtual function calls, you don’t need a vtable. As a second possibility the compiler can see that instances of the associated classes are never created. This is probably what happened for all the nsIDOM* vtables in the example hunk above. So the vtables are never referenced and discarded at link time, or never generated for any compilation unit in the first place. Whether these suspicions are correct or there’s some other mechanism at work, the key point is that 4.5-4.7 lost the ability to do this in some (all?) cases and dramatically increased data sizes as a result.

Also, since 4.5-4.7 are generating spurious vtables, there’s a lot of unnecessary relocations associated with those tables: the values of the function pointers in the vtables can’t be known until the binary is loaded, so relocations are necessary. This increase in relocations can be partially seen in the “text” numbers in the table above (relocations are constant data…). Going from 4.4 to 4.5 added about 1MB of relocation data and 4.8 benefited by eliminating the need for those extra relocations.

Between the changes in .text section size, the extra relocations, and the extra .eh_frame information, we’ve accounted for a good chunk of the fluctuations seen in the “text” and “data” numbers between compiler versions.  There’s other nickel-and-dime stuff that accounts for the remainder of the fluctuations, but I’m not going to cover those bits here. This post is already long enough! Ideally, the next post will have some Talos performance comparisons.

5 comments

  1. Now that’s interesting, esp. as our official builds for Linux right now are using gcc 4.5, which apparently creates the largest size of all. Will be interesting to see how perf compares.

    Oh, and if 4.8 improves in perf as well as size, it sounds like it will be awesome!

    (Also, how does clang compare?)

  2. MSVC has declspec novtable for exactly this case.

    • Nathan Froyd

      Yeah, turns out the linker is responsible for handling this, but–on my machine, at least–Mozilla’s configury determines that those bits of the linker should be disabled so as to not harm the debugging experience. However, we don’t care about the debugging experience in this case…

  3. [...] my previous post, I discussed the size of libxul as compiled by various versions of GCC.  Due to some configuration [...]