{"id":450,"date":"2015-10-09T12:06:37","date_gmt":"2015-10-09T16:06:37","guid":{"rendered":"http:\/\/blog.mozilla.org\/nfroyd\/?p=450"},"modified":"2015-10-09T15:17:35","modified_gmt":"2015-10-09T19:17:35","slug":"gecko-include-file-statistics","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nfroyd\/2015\/10\/09\/gecko-include-file-statistics\/","title":{"rendered":"gecko include file statistics"},"content":{"rendered":"<p>I was inspired to poke at which files were most heavily <code>#include<\/code>&#8216;d and which files contributed the most text as a result of their <code>#include<\/code>&#8216;ing after seeing the simplicity of <a href=\"https:\/\/gerrit.libreoffice.org\/gitweb?p=core.git;a=blob;f=bin\/includebloat.awk;h=3792ef95072172e9bad3a6b1faff609dda17229f;hb=HEAD\">LibreOffice&#8217;s script<\/a> for doing so.  I had to rewrite it in Python, as the obvious modifications to the awk script weren&#8217;t working, and I had no taste for debugging awk code.  I&#8217;ve put the script up as a gist:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/froydnj\/8c9bc10dc0c16db1f3bd.js\"><\/script><\/p>\n<p>It&#8217;s intended to be run from a newly built objdir on Linux like so:<\/p>\n<pre>python includebloat.py .<\/pre>\n<p>The ability to pick a subdirectory of interest:<\/p>\n<pre>python includebloat.py dom\/bindings\/<\/pre>\n<p>was useful to me when I was testing the script, so I wasn&#8217;t groveling through several thousand files at a time.<\/p>\n<p>The output lines are formatted like so:<\/p>\n<pre>total_size file_size num_of_includes filename<\/pre>\n<p>and are intended to be manipulated further via <code>sort<\/code>, etc.  The script might work on Mac and Windows, but I make no promises.<\/p>\n<p>The results were&#8230;interesting, if not especially helpful at suggesting modifications for future work.  
I won&#8217;t show the entirety of the script&#8217;s output, but here are the top twenty files by total size included (size of the file on disk multiplied by number of times it appears as a dependency), done by filtering the script&#8217;s output through <code>sort -n -k 1 -r | head -n 20 | cut -f 1,4 -d ' '<\/code>:<\/p>\n<pre>332478924 \/usr\/lib\/gcc\/x86_64-linux-gnu\/4.9\/include\/avx512fintrin.h\r\n189877260 \/home\/froydnj\/src\/gecko-dev.git\/js\/src\/jsapi.h\r\n161543424 \/usr\/include\/c++\/4.9\/bits\/stl_algo.h\r\n141264528 \/usr\/include\/c++\/4.9\/bits\/random.h\r\n113475040 \/home\/froydnj\/src\/gecko-dev.git\/xpcom\/glue\/nsTArray.h\r\n105880002 \/usr\/include\/c++\/4.9\/bits\/basic_string.h\r\n92449760 \/home\/froydnj\/src\/gecko-dev.git\/xpcom\/glue\/nsISupportsImpl.h\r\n86975736 \/usr\/include\/c++\/4.9\/bits\/random.tcc\r\n76991387 \/usr\/include\/c++\/4.9\/type_traits\r\n72934768 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/TypeTraits.h\r\n68956018 \/usr\/include\/c++\/4.9\/bits\/locale_facets.h\r\n68422130 \/home\/froydnj\/src\/gecko-dev.git\/js\/src\/jsfriendapi.h\r\n66917730 \/usr\/include\/c++\/4.9\/limits\r\n66625614 \/home\/froydnj\/src\/gecko-dev.git\/xpcom\/glue\/nsCOMPtr.h\r\n66284625 \/usr\/include\/x86_64-linux-gnu\/c++\/4.9\/bits\/c++config.h\r\n63730800 \/home\/froydnj\/src\/gecko-dev.git\/js\/public\/Value.h\r\n62968512 \/usr\/include\/stdlib.h\r\n57095874 \/home\/froydnj\/src\/gecko-dev.git\/js\/public\/HashTable.h\r\n56752164 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Attributes.h\r\n56126246 \/usr\/include\/wchar.h\r\n<\/pre>\n<p>How does <code>avx512fintrin.h<\/code> get included so much?  It turns out <code>&lt;algorithm&gt;<\/code> drags in a lot of code, despite people usually only needing <code>min<\/code>, <code>max<\/code>, or <code>swap<\/code>.  
In this case, <code>&lt;algorithm&gt;<\/code> includes <code>&lt;random&gt;<\/code> because <code>std::shuffle<\/code> requires <code>std::uniform_int_distribution<\/code> from <code>&lt;random&gt;<\/code>.  This include chain is responsible for essentially all of the <code>\/usr\/include\/c++\/4.9<\/code>-related files in the above list.<\/p>\n<p>If you are compiling with SSE2 enabled (as is the default on x86-64 Linux), then <code>&lt;random&gt;<\/code> includes <code>&lt;x86intrin.h&gt;<\/code> because <code>&lt;random&gt;<\/code> contains a <a href=\"http:\/\/www.math.sci.hiroshima-u.ac.jp\/~m-mat\/MT\/SFMT\/\">SIMD Mersenne Twister<\/a> implementation.  And <code>&lt;x86intrin.h&gt;<\/code> is a clearinghouse for all sorts of x86 intrinsics, even though all we need is a few typedefs and intrinsics for SSE2 code.  Minus points for GCC header cleanliness here.<\/p>\n<p>What about the top twenty files by number of times included (filter the script&#8217;s output through <code>sort -n -k 3 -r | head -n 20 | cut -f 3,4 -d ' '<\/code>)?<\/p>\n<pre>2773 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Char16.h\r\n2268 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Attributes.h\r\n2243 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Compiler.h\r\n2234 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Types.h\r\n2204 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/TypeTraits.h\r\n2132 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Likely.h\r\n2123 \/home\/froydnj\/src\/gecko-dev.git\/memory\/mozalloc\/mozalloc.h\r\n2108 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Assertions.h\r\n2079 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/MacroArgs.h\r\n2002 \/home\/froydnj\/src\/gecko-dev.git\/xpcom\/base\/nscore.h\r\n1973 \/usr\/include\/stdc-predef.h\r\n1955 \/usr\/include\/x86_64-linux-gnu\/gnu\/stubs.h\r\n1955 \/usr\/include\/x86_64-linux-gnu\/bits\/wordsize.h\r\n1955 \/usr\/include\/x86_64-linux-gnu\/sys\/cdefs.h\r\n1955 \/usr\/include\/x86_64-linux-gnu\/gnu\/stubs-64.h\r\n1944 
\/usr\/lib\/gcc\/x86_64-linux-gnu\/4.9\/include\/stddef.h\r\n1942 \/home\/froydnj\/src\/gecko-dev.git\/mfbt\/Move.h\r\n1941 \/usr\/include\/features.h\r\n1921 \/opt\/build\/froydnj\/build-mc\/js\/src\/js-config.h\r\n1918 \/usr\/lib\/gcc\/x86_64-linux-gnu\/4.9\/include\/stdint.h\r\n<\/pre>\n<p>Not a lot of surprises here.  A lot of these are basic definitions for C++ and\/or Gecko (<code>&lt;stdint.h&gt;<\/code>, <code>mfbt\/Move.h<\/code>).<\/p>\n<p>There don&#8217;t seem to be very many obvious wins, aside from getting GCC to clean up its header files a bit.  Getting us to the point where we can use <code>&lt;type_traits&gt;<\/code> instead of our own homegrown <code>mfbt\/TypeTraits.h<\/code> would be a welcome development.  Making <code>js\/src\/jsapi.h<\/code> less of a mega-header might help some, but brings with it a burden of &#8220;did I remember to include the correct JS header files&#8221;, which probably devolves into people cutting-and-pasting complete lists, which isn&#8217;t a win.  Splitting up <code>nsISupportsImpl.h<\/code> seems like it could help a little bit, though with unified compilation, I suspect we&#8217;d likely wind up including all the split-up files at once anyway.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was inspired to poke at which files were most heavily #include&#8216;d and which files contributed the most text as a result of their #include&#8216;ing after seeing the simplicity of LibreOffice&#8217;s script for doing so. 
I had to rewrite it in Python, as the obvious modifications to the awk script weren&#8217;t working, and I [&hellip;]<\/p>\n","protected":false},"author":320,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[540,161,72442,5,72443],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/450"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/users\/320"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/comments?post=450"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/450\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/media?parent=450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/categories?post=450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/tags?post=450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}