Mirror, Mirror on the Wall, Why is my Binary Slow?

In an earlier post I described my Fiji hack: how to use some nasty instrumentation to spit out ld scripts to speed up cold startup. This week I tried to extract more data out of the binary to lay it out even better. The trouble is that even if the functions are laid out perfectly, they still load data (for things like variable initializers), which causes more IO.

A very clever friend suggested that I write a valgrind plugin that detects both data and function accesses and writes the linker input files in one step. So, with much hand-holding, I hacked up a sample valgrind plugin to do what I want. Unfortunately, my binaries ended up not being significantly faster (if at all) than the Fiji ones. They also ended up 20% bigger.
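The bookkeeping at the heart of such a plugin is small: remember the first time each function or data symbol is touched, then list the symbols in that order so the linker places them next to each other. Here is a minimal Python sketch of that last step only, not the actual plugin. It assumes the trace has already been collected as (kind, symbol) pairs and that the code was built with -ffunction-sections -fdata-sections, so each symbol sits in its own input section. The trace format and both function names are hypothetical, and mapping every data symbol to .data.<name> is a simplification, since data can also live in .rodata.* or .bss.*.

```python
def first_touch_order(trace):
    """Return input-section names in first-access order, dropping repeats.

    trace: iterable of (kind, symbol) pairs, kind being "func" or "data".
    Assumes -ffunction-sections/-fdata-sections naming (.text.<sym>,
    .data.<sym>); real data may also live in .rodata.* or .bss.*.
    """
    prefix = {"func": ".text.", "data": ".data."}
    seen = set()
    order = []
    for kind, symbol in trace:
        section = prefix[kind] + symbol
        if section not in seen:  # only the first touch matters for layout
            seen.add(section)
            order.append(section)
    return order


def ld_sections(order):
    """Build a fragment to splice into a full linker script's SECTIONS block.

    Touched sections come first, in first-touch order; the trailing
    wildcards sweep up everything that was never touched during startup.
    """
    text = [s for s in order if s.startswith(".text.")]
    data = [s for s in order if s.startswith(".data.")]
    lines = ["  .text : {"]
    lines += [f"    *({s})" for s in text]
    lines += ["    *(.text .text.*)", "  }", "  .data : {"]
    lines += [f"    *({s})" for s in data]
    lines += ["    *(.data .data.*)", "  }"]
    return "\n".join(lines)


# Hypothetical startup trace: main runs, reads a global, calls init, and so on.
trace = [("func", "main"), ("data", "gConfig"), ("func", "init"),
         ("func", "main"), ("data", "gConfig")]
print(ld_sections(first_touch_order(trace)))
```

The output is only a fragment, not a complete linker script. GNU ld places each input section according to the first line that matches it, so listing the touched sections before the catch-all wildcards is what moves them to the front.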

Fortunately, the GCC devs were able to point out my linker mistakes and pointed me at a linker patch that does what I want without linker scripts (and with fewer binary-bloating side effects). Unfortunately, that just confirmed that the speedup I was looking for wasn’t hiding behind data symbols. So I am going to have to sit down with my IO tracing script and study what the heck is going on.

Cool Things I Learned

In the process of helping me, the GCC people name-dropped some compiler flags that may prove very helpful:

  • -freorder-blocks-and-partition: Apparently this breaks up functions into hot/cold parts and gives them different section names so they can be moved around at link time.
  • -fno-common and -fno-zero-initialized-in-bss should go well with my favourites: -ffunction-sections and -fdata-sections.
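Both the Fiji ordering and hot/cold partitioning come down to the same arithmetic: the kernel faults code in at page granularity (readahead aside), so a rarely used branch sitting between two hot ones still drags its page in. The Python sketch below counts the pages touched by hot code in two made-up layouts, one with cold blocks interleaved between hot ones and one with the cold blocks moved out of line, which is what -freorder-blocks-and-partition is described as doing above. The block sizes are invented, and the 4 KiB page size is an assumption (the usual size on x86 Linux).

```python
PAGE_SIZE = 4096  # assumed page size; the usual size on x86 Linux


def hot_pages(layout, page_size=PAGE_SIZE):
    """Count the distinct pages touched by hot blocks.

    layout: list of (size_in_bytes, is_hot) in link order; blocks are
    placed back to back starting at offset 0.
    """
    pages = set()
    offset = 0
    for size, is_hot in layout:
        if is_hot and size > 0:
            first = offset // page_size
            last = (offset + size - 1) // page_size
            pages.update(range(first, last + 1))
        offset += size
    return len(pages)


# Hypothetical code: 8 functions, each 1 KiB of hot path followed by
# 3 KiB of rarely used error handling.
interleaved = [(1024, True), (3072, False)] * 8
# Same blocks, with every cold part moved after all the hot parts.
partitioned = [b for b in interleaved if b[1]] + [b for b in interleaved if not b[1]]

print(hot_pages(interleaved), hot_pages(partitioned))  # 8 2
```

With the cold parts interleaved, every hot block lands on its own page; once they are moved out of line, the same hot code fits in two pages.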

Additionally, it may be possible to benefit from linking with large page support. I have some doubts about that.

I did learn about some cool GNU Gold flags:

  • --compress-debug-sections=zlib: Most of the overhead of linking a development libxul.so is writing out nearly a gigabyte of debug data.
  • --icf: Identical code folding. I think it matches the deduplication feature found in the MS linker. Saves 5% on my libxul.so.
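Conceptually, identical code folding finds functions whose machine code is identical and keeps a single copy, pointing every duplicate at it; template instantiations that compile to the same instructions are typical candidates. The Python sketch below shows only that grouping step over made-up function bodies; the names and bytes are hypothetical. It is a deliberate simplification: a real linker also has to compare relocations, not just raw bytes, and must be careful with functions whose addresses are compared, because folding gives two distinct functions the same address.

```python
def fold_identical(functions):
    """Map each function name to the name of the copy that is kept.

    functions: dict of name -> bytes (machine code), in link order.
    The first function with a given body is kept; later duplicates fold
    into it. Relocations are ignored here, which a real linker cannot do.
    """
    keeper_by_body = {}
    folded = {}
    for name, body in functions.items():
        keeper = keeper_by_body.setdefault(body, name)
        folded[name] = keeper
    return folded


def bytes_saved(functions):
    """Total size of the bodies that folding removes."""
    folded = fold_identical(functions)
    return sum(len(body) for name, body in functions.items() if folded[name] != name)


# Hypothetical bodies: two template-like instantiations with identical code.
functions = {
    "vector_int_size": b"\x8b\x07\xc3",
    "vector_uint_size": b"\x8b\x07\xc3",
    "log_error": b"\x31\xc0\xc3",
}
print(fold_identical(functions))
print(bytes_saved(functions))  # 3
```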

5 comments

  1. Do you know why we’re compiling everything with -fno-reorder-functions on Linux?

  2. Also, I would expect -fno-common & -fno-zero-initialized-in-bss to hurt rather than help. Nothing in the .bss section has to be read from disk at all.

  3. Huh, I didn’t know gold had ICF, we should look into using that for our official builds.

    Also, -freorder-blocks-and-partition sounds great; VC++ does that and it definitely helps. All those little-used branches wind up in separate pages that we may not have to load.

  4. Did you just consciously refer to Blind Guardian?! :)

  5. @zack: No idea about -fno-reorder-functions.

    @ted: I’m not sure if gold ships on any distros by default yet. I think it’s stable enough for us to use it.

    @gandalf, ha! Now that I think about it, the Candlemass “Mirror Mirror” has been playing on the radio, that’s probably to blame.