{"id":133,"date":"2013-02-01T21:24:46","date_gmt":"2013-02-01T21:24:46","guid":{"rendered":"http:\/\/blog.mozilla.org\/nfroyd\/?p=133"},"modified":"2013-02-03T02:35:40","modified_gmt":"2013-02-03T02:35:40","slug":"gcc-version-comparison-part-1n-libxul-sizes","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nfroyd\/2013\/02\/01\/gcc-version-comparison-part-1n-libxul-sizes\/","title":{"rendered":"gcc version comparison, part 1\/n: libxul sizes"},"content":{"rendered":"<p>Some questions on #perf this morning got me wondering how different versions of <a href=\"http:\/\/gcc.gnu.org\/\">GCC<\/a> compared in terms of the size of libxul.\u00a0 (I have a lot of versions of GCC lying about for a sekret performance comparison project, so comparing sizes was pretty straightforward.)<\/p>\n<p>The GCC versions I used are listed in the table below.\u00a0 The Mozilla sources were from mozilla-central r120478, compiled with <tt>--disable-debug --disable-debug-symbols --enable-optimize<\/tt>.\u00a0 The target was x86-64 Linux, the build did not use PGO, and the linker was the system's GNU ld 2.20.1. 
Here&#8217;s what the <tt>size<\/tt> command reports for libxul in each case, along with the sizes of the <tt>.text<\/tt> and <tt>.eh_frame<\/tt> sections; all sizes are in bytes:<\/p>\n<table style=\"border-collapse: separate; border-spacing: 3px; border: 1px solid;\" border=\"1\" cellspacing=\"2\" cellpadding=\"10\">\n<tbody>\n<tr>\n<th>GCC version<\/th>\n<th>Text size<\/th>\n<th>Data size<\/th>\n<th>Bss size<\/th>\n<th><tt>.text<\/tt> section size<\/th>\n<th><tt>.eh_frame<\/tt> section size<\/th>\n<\/tr>\n<tr>\n<td>4.4.7<\/td>\n<td>39120354<\/td>\n<td>3410456<\/td>\n<td>1611420<\/td>\n<td>22969414<\/td>\n<td>4212924<\/td>\n<\/tr>\n<tr>\n<td>4.5.4<\/td>\n<td>44833935<\/td>\n<td>3791400<\/td>\n<td>1625996<\/td>\n<td>23449960<\/td>\n<td>7481052<\/td>\n<\/tr>\n<tr>\n<td>4.6.3<\/td>\n<td>42819600<\/td>\n<td>3774272<\/td>\n<td>1625996<\/td>\n<td>22970408<\/td>\n<td>6467652<\/td>\n<\/tr>\n<tr>\n<td>4.7.2<\/td>\n<td>42103108<\/td>\n<td>3769576<\/td>\n<td>1631244<\/td>\n<td>22297992<\/td>\n<td>6519596<\/td>\n<\/tr>\n<tr>\n<td>4.8 HEAD<\/td>\n<td>39638390<\/td>\n<td>3415424<\/td>\n<td>1617260<\/td>\n<td>21300806<\/td>\n<td>6209220<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The &#8220;text&#8221;, &#8220;data&#8221;, and &#8220;bss&#8221; figures reported by <tt>size<\/tt> aren&#8217;t just the sizes of the similarly named sections in the binary. &#8220;text&#8221; encompasses all code and read-only sections, so things like string constants (!) are included in this number. &#8220;data&#8221; is everything non-constant that&#8217;s stored on disk: tables of function pointers (which must be relocated at load time), tables of non-constant data, and so forth. &#8220;bss&#8221; is everything that&#8217;s initialized to zero; it takes up no space on disk and is allocated by the system at program run time. I&#8217;ve provided the <tt>.text<\/tt> section sizes because they&#8217;re a more useful number (IMHO) for the purposes of this comparison.<\/p>\n<p>If you look at just the size of the <tt>.text<\/tt> section (that is, the actual compiled code), there&#8217;s not much variation between the compiler versions. 
4.5 is the outlier here, with a ~2% increase over 4.4, but 4.6 is back to 4.4&#8217;s code size, and 4.7 and 4.8 are smaller still. Whether and how these differences in code size translate into differences in performance will have to wait for another blog post. So why does <tt>size<\/tt> report such a huge jump in &#8220;text&#8221; size for 4.5 through 4.7?<\/p>\n<p>The <tt>.eh_frame<\/tt> section sizes help explain this increase. The corresponding <tt>.eh_frame_hdr<\/tt> sections show similar percentage increases, but their absolute increases are somewhat smaller, so I haven&#8217;t included them in the table. GCC 4.5 started emitting unwind data for function epilogues, and does so unconditionally whenever unwind data is emitted. This data is not needed for normal operation: exception handling only unwinds the stack from call sites, so it never has to start from the middle of an epilogue. However, for unwinding the stack from arbitrary points in the program (as a sampling profiler like <tt>oprofile<\/tt> or <tt>perf<\/tt> might do), such data is absolutely necessary. (You could fake it by parsing the instruction stream, but that gets very messy very quickly. Been there, done that, don&#8217;t want to do it again.) So, extra unwind data leads to bigger section sizes.\u00a0 No surprises there.<\/p>\n<p>Before getting to other sources of the &#8220;text&#8221; size increase, we need to examine another interesting statistic: the data size increase seen in 4.5-4.7.\u00a0 Why should different versions of the compiler differ so much in what is essentially the program&#8217;s static data?\u00a0 I generated easy-to-compare lists of symbols from each version:<\/p>\n<pre>readelf --syms -W build-mozilla-gcc-${version}\/dist\/bin\/libxul.so \\\r\n  | gawk '$4 == \"OBJECT\" &amp;&amp; $7 != \"UND\" &amp;&amp; $5 != \"WEAK\" {printf(\"% 6d %s\\n\", $3, $8); }' \\\r\n  | sort -n -k 1 -r &gt; gcc-${version}-all-syms.txt<\/pre>\n<p>and <tt>diff<\/tt>&#8217;d the 4.4 and 4.5 lists. 
(Comparing 4.4 against 4.6 or 4.7 provides roughly the same data, and starting with a base of 4.7 provides the same information in the reverse direction.) While a few hunks were user-defined variables that the newer compiler failed to eliminate, most of the diff looked like this:<\/p>\n<pre>@@ -721,6 +791,10 @@\r\n   1080 vtable for nsPrintSettings\r\n   1080 keywords\r\n   1080 sip_tcp_conn_tab\r\n+  1072 vtable for js::ion::LIRGeneratorX86Shared\r\n+  1072 vtable for js::ion::MInstructionVisitor\r\n+  1072 vtable for js::ion::LIRGeneratorShared\r\n+  1072 vtable for js::ion::LIRGeneratorX64\r\n   1072 vtable for js::ion::LIRGenerator\r\n   1072 vtable for nsBox\r\n   1064 vtable for nsBaseWidget<\/pre>\n<p>or:<\/p>\n<pre>@@ -842,6 +934,10 @@\r\n    864 vtable for imgRequestProxy\r\n    864 mozilla::dom::NodeBinding::sAttributes_specs\r\n    864 g_sip_table\r\n+   856 vtable for nsIDOMSVGTextPositioningElement\r\n+   856 vtable for nsIDOMSVGFECompositeElement\r\n+   856 vtable for nsIDOMSVGTSpanElement\r\n+   856 vtable for nsIDOMSVGTextElement\r\n    855 sLayerMaskVS\r\n    848 vtable for nsJARURI\r\n    848 vtable for nsXMLHttpRequestUpload<\/pre>\n<p>And if you add up the sizes of all the vtables we&#8217;re now retaining:<\/p>\n<pre>diff -u gcc-44-all-syms.txt gcc-45-all-syms.txt \\\r\n  | c++filt | grep -E '^\\+' \\\r\n  | grep vtable | awk '{ sum += $2 } END { print sum }'<\/pre>\n<p>you get a total of about 325K, which accounts for most of the roughly 375K increase in data size from GCC 4.4 to GCC 4.5.<\/p>\n<p>How did GCC 4.4 make the vtables go away? [<strong>UPDATE:<\/strong> There&#8217;s <a title=\"gcc version comparison, part 1.5\/n: corrections\" href=\"http:\/\/blog.mozilla.org\/nfroyd\/2013\/02\/03\/gcc-version-comparison-part-1-5n-corrections\/\">a simple explanation<\/a> for what happened here.] I haven&#8217;t analyzed the code, but I can see two possibilities. 
The first is that the compiler devirtualizes all the function calls associated with those classes and can tell that instances of the classes never escape the library; if you don&#8217;t have virtual function calls, you don&#8217;t need a vtable. The second is that the compiler can see that instances of the associated classes are never created, which is probably what happened for the <tt>nsIDOM*<\/tt> vtables in the example hunk above. In that case, the vtables are never referenced and are either discarded at link time or never generated for any compilation unit in the first place. Whether these suspicions are correct or there&#8217;s some other mechanism at work, the key point is that 4.5-4.7 lost the ability to do this in some (all?) cases, and data sizes grew substantially as a result.<\/p>\n<p>Also, since 4.5-4.7 are generating spurious vtables, there are a lot of unnecessary relocations associated with those tables: the values of the function pointers in the vtables can&#8217;t be known until the binary is loaded, so each of those pointers needs a relocation. The relocation records are themselves read-only data, so they show up in the &#8220;text&#8221; numbers in the table above. The figures line up roughly: each 8-byte function pointer in a vtable needs its own 24-byte <tt>Elf64_Rela<\/tt> record, so ~325K of extra vtables implies on the order of 1MB of relocations. Going from 4.4 to 4.5 added about 1MB of relocation data, and 4.8 benefited by eliminating the need for those extra relocations.<\/p>\n<p>Between the changes in <tt>.text<\/tt> section size, the extra vtables and their relocations, and the extra <tt>.eh_frame<\/tt> information, we&#8217;ve accounted for a good chunk of the fluctuations seen in the &#8220;text&#8221; and &#8220;data&#8221; numbers between compiler versions.\u00a0 There&#8217;s other nickel-and-dime stuff that accounts for the rest, but I&#8217;m not going to cover it here. This post is already long enough! 
Ideally, the next post will have some Talos performance comparisons.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some questions on #perf this morning got me wondering about how different versions of GCC compared in terms of the size of libxul.\u00a0 (I have a lot of versions of GCC lying about for a sekret performance comparison project, so merely comparing sizes was pretty straightforward.) The GCC versions I used are listed in the [&hellip;]<\/p>\n","protected":false},"author":320,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/133"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/users\/320"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/comments?post=133"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/133\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/media?parent=133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/categories?post=133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/tags?post=133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}