invasive changes for relocation reduction

We have a lot of relocations in our libraries; almost 240k in libxul alone. Many of these relocations are due to the massive number of vtables that we have; every function pointer in a vtable requires a runtime relocation. And you can find function pointers in XPCOM CIDEntrys, function tables for PLDHashes, and so forth, which are certainly convenient.

But for other kinds of data, such as tables of strings, or tables of structures that contain strings, relocations are entirely avoidable if you’re willing to restructure things. The idea is not new; it’s been described in Ulrich Drepper’s manifesto for writing shared libraries, it’s been used for data structures in compilers and otherwise, and it’s often used in pickling data to and from disk.

The basic idea is that instead of storing pointers to data, you store offsets into some table of data. For instance, if you had:

static const char *errors[] = {
  "no error",
  "on fire",
  "timed out",
  "pebkac"
};

const char *geterr(int i)
{
  return errors[i];
}

then errors itself would be 16 bytes (assuming 32-bit pointers), plus the actual string data (34 bytes), plus the relocations for each entry in errors (32 bytes on Linux/x86, more or less depending on your platform), for a total of 82 bytes.

If instead you had something like:

static const char errtable[] = { "no error\0" "on fire\0" "timed out\0" "pebkac\0" };
static const uint8_t erridx[] = { 0, 9, 17, 27 };

const char *geterr(int i)
{
  return &errtable[erridx[i]];
}

you’d only need 34 bytes for string data plus 4 bytes for the index table, or 38 bytes total, for a savings of more than 50%. You do pay an extra instruction on each call to geterr as well; maybe you want to count that in the byte count if you’re being pedantic.

If you have character pointers embedded in structures, the savings can be more dramatic, because offsets can be smaller than pointers, which may enable you to pack structures better. For instance, if you have:

struct foo {
  const char *s;
  bool b1;
  bool b2;
};

sizeof(struct foo) would be 8 on a typical 32-bit platform, due to padding after b2. Using an offset scheme as described above, you now have:

struct foo {
  uint16_t offset;
  bool b1;
  bool b2;
};

and now struct foo is 50% smaller because padding is no longer required.

Obviously, forming the erridx array above can be tedious; Drepper’s paper linked above describes how one might use the C preprocessor to avoid such manual construction. His construction relies on the C compiler supporting string constants of arbitrary size; MSVC limits us here, as the maximum size of a string constant (post-preprocessor concatenation) is 65K bytes. While there are ways around this limitation, the limitation is not so bad as one might think: using 32-bit indices doesn’t make us much better off than we were before (it does avoid the relocations, which is worth something) and the compiler can now warn us if we’re about to walk off the deep end with 16-bit indices.

There are a number of places in Mozilla that could benefit from such reorganization: the effective TLD table (the above example about structure packing is taken from there), various tables in the CSS and HTML parsers, and so forth. Shared bits like constructing AtomTables could also have their interfaces changed to not be quite so pointer-heavy. Quickstubs could benefit from string table reorganizations and less pointer-ness. The list goes on, I’m sure.

I have a proof-of-concept bug open for nsEffectiveTLDService.cpp; I have been hesistant to dive deeply into other places; for one, the above bug doesn’t seem to have been touched for several weeks, and for two…

Is all this really worth it? I think it is; smaller binaries and faster startup times are a good thing in my book. I realize we have elfhack on mobile for reducing the size and cost of relocations; I say it’d be better to not pay for them in the first place. We also support other platforms where tricks like elfhack don’t work. But what do you think?

1 comment

  1. This has been on my todo list for a while. There are also plenty other structures that use what could be called pointer farms and that could very well be generated at runtime (the generation overhead would probably be lower than that of applying the relocations, except when elfhack is involved).