finding space savings

My last post talked about trimming down JSJitInfo.  But it left the question of how one might find things like this unanswered (and even unasked!).  Herein follows a short list of my strategies, with a large amount of explanation, for finding things that take up too much space.

The first thing to look at is what objects (as distinct from functions, see below) are taking up a lot of space (on Linux/Android):

readelf -sW dist/bin/libxul.so | awk '$4 == "OBJECT" { print }' | sort -k 3 -n -r | head -n 50 | c++filt

(I don’t know of a good way of doing this on Mac or Windows. Mach-O symbol tables on Mac don’t contain the sizes of objects like ELF symbol tables do. The sizes are derivable from the addresses, of course, but that’s non-trivial code to write and insert into the pipeline above. dumpbin on Windows appears to print sizes of symbols with /symbols, but its output is so verbose and there appears to be some sort of indirection between the symbol and the actual data associated with the symbol. If anybody has non-Linux/Android insight here, I would appreciate it!)

That pipeline is a mouthful, so breaking it down:

  • readelf -sW dist/bin/libxul.so: The -s argument tells readelf to print the symbol table from libxul.so. The -W argument says to format it for a wide display, so the long symbol names that are so common with C++ programming don’t get cut off.
  • awk '$4 == "OBJECT" { print }': The lines readelf prints out look like this:

    3: 00000000021ef9c0 0 FUNC GLOBAL DEFAULT 13 _fini@@xul29a1

    The interesting ones for our purposes are the third field (the size of the object), the fourth field (the kind of object), and the last field (the name of the object). The awk invocation is selecting all the symbols with a type of OBJECT.

  • sort -k 3 -n -r: Now that we have all of the information about the symbols, let’s sort them by their size. -k 3 selects the third field, which is the size, and -n says that we have a numeric field to sort on. -r is just for convenience to list the biggest symbols first.
  • head -n 50 selects the top fifty biggest symbols.
  • c++filt converts all the mangled C++ symbol names to something more human-readable. So instead of _Z25js_GetterOnlyPropertyStubP9JSContextN2JS6HandleIP8JSObjectEENS2_I4jsidEEbNS1_13MutableHandleINS1_5ValueEEE@@xul29a1, you get js_GetterOnlyPropertyStub(JSContext*, JS::Handle<JSObject*>, JS::Handle, bool, JS::MutableHandle)@@xul29a1, which is more understandable.

You don’t have to look at libxul, of course: you can look at whatever objects or executables are of interest to you.

Once you have this information in hand, you can start investigating. Maybe the object is an array of structures that need better bitfield packing.  Maybe the object is large and doesn’t get used.  Maybe we could do a better job of packing string data if there are embedded pointers which require relocations.  Maybe we could use smaller datatypes to store values (saved nearly a megabyte on 32-bit platforms!).  Maybe a structure’s fields just need to be reordered to pack better.  Maybe we have large, duplicated tables lying around.

For discovering whether structures are wasting space, I recommend pahole.  Specifically, I recommend using my fork of pahole, which contains some build fixes for non-Fedora systems and some improvements so pahole copes with modern C++ debug information better. My experience with pahole suggests that its options for only selecting structures with holes are buggy and unhelpful, so you’re going to have to examine the full output. Using the -M option to avoid printing out all the methods on individual objects is tremendously useful.

The second, but less fruitful, strategy is to look for large functions, rather than large objects:

readelf -sW dist/bin/libxul.so | awk '$4 == "FUNC" { print }' | sort -k 3 -n -r | head -n 50 | c++filt

This is less effective, because functions are often harder to trim down: there’s some amount of work that needs to be done and reducing that is difficult. The biggest functions also tend to be quite a bit smaller than the biggest objects in the program, so there’s less room for winning space back.

Even so, maybe you could find cases of excessive inlining.  Finding cases of excessive inlining is a little more difficult than just looking at the symbol table, but the principles are the same. Or you could find cases where lookup tables should be used instead of long if-else chains.  Or macro-constructed case statements could be made smaller.

The third strategy recognizes that sometimes individual functions or objects aren’t that large, but there are so many of them that they wind up costing you a significant amount of space.  For instance, if you do:

readelf -sW dist/bin/libxul.so | c++filt | grep '::Read' | grep StandardURLParams

you’ll notice that IPDL codegen has created several copies of the exact same function.  Perhaps you could find a way to teach the IPDL compiler to not do that.  Or maybe there are several copies of functions that differ only slightly; find a way to merge those functions together. Templates are great places for finding code duplication, maybe they could be restructured to duplicate less.

A word of caution: link-time optimization, which gets used on a majority of our major platforms, does essentially the same thing as some of the aforementioned by-hand merging strategies. So just because you get some huge win locally doesn’t mean that a release build is going to exhibit the same win. (I would argue that the by-hand merging is beneficial in any event, but that is a topic for some other time.)

Finally, you need to decide what is a reasonable amount of space savings for you to try and hunt down. There are likely several thousand places in Gecko where we could win back ten or twenty bytes. But finding them all is liable to be difficult. Changing them to effect the savings may require significant amounts of code gymnastics that not all reviewers would accept. My personal threshold is somewhere around 10K and varies with how difficult the patch is to write. If I can save 5K with a two-line, fairly obvious patch, I’ll do it. If I can save 12K, but the patch is more involved, it goes on the todo list for a rainy day.

3 comments

  1. Benoit Girard

    Please post a link to the top 50 or so. I don’t have a linux build handy with me at this moment but I wouldn’t mind taking a look at the biggest offenders in my modules.

  2. AnimalFriend

    Can your memory waste detection methodology be augmented by an instantiation count? Like how often a given structure is instantiated during a typical execution? I reckon that saving 4 bytes in a structure instantiated a million times is worth more than saving 40 on one you only create 100 times.

  3. […] post on space saving techniques generated a few comments.  It seemed worthwhile to highlight a few of the comments for a wider […]