09
Oct 15

gecko include file statistics

I was inspired to poke at which files were most heavily #include’d, and which files contributed the most text as a result of being #include’d, after seeing the simplicity of LibreOffice’s script for doing so. I had to rewrite it in Python, as the obvious modifications to the awk script weren’t working, and I had no taste for debugging awk code. I’ve put the script up as a gist:

It’s intended to be run from a newly built objdir on Linux like so:

python includebloat.py .

The ability to pick a subdirectory of interest:

python includebloat.py dom/bindings/

was useful to me when I was testing the script, so I wasn’t groveling through several thousand files at a time.

The output lines are formatted like so:

total_size file_size num_of_includes filename

and are intended to be manipulated further via sort, etc. The script might work on Mac and Windows, but I make no promises.
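
If you would rather stay in Python than pipe through sort, here is a minimal sketch of the same post-processing. It assumes the four fields are separated by single spaces, as the format line above suggests; the names IncludeStat, parse_line, and top are made up for this example and are not part of the gist.

```python
from typing import NamedTuple


class IncludeStat(NamedTuple):
    total_size: int    # file_size * num_includes
    file_size: int     # size of the file on disk, in bytes
    num_includes: int  # how many times the file shows up as a dependency
    filename: str


def parse_line(line):
    # Split on the first three spaces only, so a filename that
    # itself contains spaces stays in one piece.
    total, size, count, name = line.rstrip("\n").split(" ", 3)
    return IncludeStat(int(total), int(size), int(count), name)


def top(lines, key, n=20):
    """Rough equivalent of `sort -n -k <column> -r | head -n <n>`."""
    stats = [parse_line(line) for line in lines if line.strip()]
    return sorted(stats, key=lambda s: getattr(s, key), reverse=True)[:n]
```

Calling top(open("stats.txt"), "total_size") or top(open("stats.txt"), "num_includes") would then give the two orderings, where stats.txt is a hypothetical file holding the script’s saved output.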

The results were…interesting, if not especially helpful at suggesting modifications for future work. I won’t show the entirety of the script’s output, but here are the top twenty files by total size included (size of the file on disk multiplied by number of times it appears as a dependency), done by filtering the script’s output through sort -n -k 1 -r | head -n 20 | cut -f 1,4 -d ' ':

332478924 /usr/lib/gcc/x86_64-linux-gnu/4.9/include/avx512fintrin.h
189877260 /home/froydnj/src/gecko-dev.git/js/src/jsapi.h
161543424 /usr/include/c++/4.9/bits/stl_algo.h
141264528 /usr/include/c++/4.9/bits/random.h
113475040 /home/froydnj/src/gecko-dev.git/xpcom/glue/nsTArray.h
105880002 /usr/include/c++/4.9/bits/basic_string.h
92449760 /home/froydnj/src/gecko-dev.git/xpcom/glue/nsISupportsImpl.h
86975736 /usr/include/c++/4.9/bits/random.tcc
76991387 /usr/include/c++/4.9/type_traits
72934768 /home/froydnj/src/gecko-dev.git/mfbt/TypeTraits.h
68956018 /usr/include/c++/4.9/bits/locale_facets.h
68422130 /home/froydnj/src/gecko-dev.git/js/src/jsfriendapi.h
66917730 /usr/include/c++/4.9/limits
66625614 /home/froydnj/src/gecko-dev.git/xpcom/glue/nsCOMPtr.h
66284625 /usr/include/x86_64-linux-gnu/c++/4.9/bits/c++config.h
63730800 /home/froydnj/src/gecko-dev.git/js/public/Value.h
62968512 /usr/include/stdlib.h
57095874 /home/froydnj/src/gecko-dev.git/js/public/HashTable.h
56752164 /home/froydnj/src/gecko-dev.git/mfbt/Attributes.h
56126246 /usr/include/wchar.h
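
For the curious, the “total size” figure is just size on disk times number of appearances as a dependency. The sketch below shows one way such a figure could be computed. It is not the gist’s code: it assumes make-style dependency files of the kind gcc -MD writes, and parse_depfile and include_stats are names invented for this example. Paths containing escaped spaces are not handled.

```python
import os
from collections import Counter


def parse_depfile(text):
    """Return the prerequisites of the first rule in a make-style dependency file."""
    # Join backslash-continued lines, then look only at the first rule;
    # any later rules (e.g. phony per-header targets) are ignored.
    joined = text.strip().replace("\\\n", " ")
    first_rule = joined.split("\n", 1)[0]
    _target, _, deps = first_rule.partition(":")
    return deps.split()


def include_stats(depfile_texts, size_of=os.path.getsize):
    """Return (total_size, file_size, num_includes, filename) rows, largest first."""
    counts = Counter()
    for text in depfile_texts:
        # set(): count each file at most once per compiled object.
        counts.update(set(parse_depfile(text)))
    rows = []
    for path, n in counts.items():
        size = size_of(path)
        rows.append((size * n, size, n, path))
    return sorted(rows, reverse=True)
```

The size_of parameter is there only so the function can be exercised without touching the disk. Note that the source file of each rule is counted as a dependency too, so a real run would want to filter those out or simply ignore them in the results.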

How does avx512fintrin.h get included so much? It turns out <algorithm> drags in a lot of code, despite people usually only needing min, max, or swap. In this case, <algorithm> includes <random> because std::shuffle requires std::uniform_int_distribution from <random>. This include chain is responsible for essentially all of the /usr/include/c++/4.9-related files in the above list.

If you are compiling with SSE2 enabled (as is the default on x86-64 Linux), then <random> includes <x86intrin.h> because <random> contains a SIMD Mersenne Twister implementation. And <x86intrin.h> is a clearinghouse for all sorts of x86 intrinsics, even though all we need is a few typedefs and intrinsics for SSE2 code. Minus points for GCC header cleanliness here.
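
One way to check an include chain like this yourself is GCC’s -H option, which prints every header it opens to stderr, prefixed by one dot per level of #include nesting. The following is a small sketch for pulling a single chain out of that output; include_chain is a name invented for this example, and it assumes the "dots, space, path" line format that -H produces.

```python
def include_chain(h_output, needle):
    """Return the list of headers leading to the first header whose path
    ends with `needle`, outermost first, or None if it never appears."""
    stack = []  # stack[d - 1] is the header currently open at depth d
    for line in h_output.splitlines():
        dots, _, path = line.partition(" ")
        # Header lines look like "... /path/to/header.h"; skip everything
        # else, such as the include-guard summary printed at the end.
        if not dots or set(dots) != {"."} or not path:
            continue
        depth = len(dots)
        del stack[depth - 1:]  # close headers at this depth or deeper
        stack.append(path)
        if path.endswith(needle):
            return list(stack)
    return None
```

Something like g++ -H -fsyntax-only foo.cpp 2> includes.txt (with foo.cpp standing in for a file that includes <algorithm>) produces the input, and include_chain(open("includes.txt").read(), "avx512fintrin.h") then shows who is to blame.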

What about the top twenty files by number of times included (filter the script’s output through sort -n -k 3 -r | head -n 20 | cut -f 3,4 -d ' ')?

2773 /home/froydnj/src/gecko-dev.git/mfbt/Char16.h
2268 /home/froydnj/src/gecko-dev.git/mfbt/Attributes.h
2243 /home/froydnj/src/gecko-dev.git/mfbt/Compiler.h
2234 /home/froydnj/src/gecko-dev.git/mfbt/Types.h
2204 /home/froydnj/src/gecko-dev.git/mfbt/TypeTraits.h
2132 /home/froydnj/src/gecko-dev.git/mfbt/Likely.h
2123 /home/froydnj/src/gecko-dev.git/memory/mozalloc/mozalloc.h
2108 /home/froydnj/src/gecko-dev.git/mfbt/Assertions.h
2079 /home/froydnj/src/gecko-dev.git/mfbt/MacroArgs.h
2002 /home/froydnj/src/gecko-dev.git/xpcom/base/nscore.h
1973 /usr/include/stdc-predef.h
1955 /usr/include/x86_64-linux-gnu/gnu/stubs.h
1955 /usr/include/x86_64-linux-gnu/bits/wordsize.h
1955 /usr/include/x86_64-linux-gnu/sys/cdefs.h
1955 /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
1944 /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
1942 /home/froydnj/src/gecko-dev.git/mfbt/Move.h
1941 /usr/include/features.h
1921 /opt/build/froydnj/build-mc/js/src/js-config.h
1918 /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdint.h

Not a lot of surprises here. A lot of these are basic definitions for C++ and/or Gecko (<stdint.h>, mfbt/Move.h).

There don’t seem to be very many obvious wins, aside from getting GCC to clean up its header files a bit. Getting us to the point where we can use <type_traits> instead of our own homegrown mfbt/TypeTraits.h would be a welcome development. Making js/src/jsapi.h less of a mega-header might help some, but it brings with it a burden of “did I remember to include the correct JS header files”, which probably devolves into people cutting-and-pasting complete lists, which isn’t a win. Splitting up nsISupportsImpl.h seems like it could help a little bit, though with unified compilation, I suspect we’d likely wind up including all the split-up files at once anyway.