If you record every heap allocation and re-allocation done by Firefox, you find some interesting things. In particular, you find some sub-optimal buffer growth strategies that cause a lot of heap churn.
Think about a data structure that involves a contiguous, growable buffer, such as a string or a vector. If you append to it and it doesn’t have enough space for the appended elements, you need to allocate a new buffer, copy the old contents to the new buffer, and then free the old buffer. realloc() is usually used for this, because it does these three steps for you.
The crucial question: when you have to grow a buffer, how much do you grow it? One obvious answer is “just enough for the new elements”. That might seem space-efficient at first glance, but if you have to repeatedly grow the buffer it can quickly turn bad.
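To make the mechanics concrete, here’s a minimal C++ sketch of the “just enough” strategy. ByteBuffer is a hypothetical type invented for illustration, not Firefox’s actual code, and error handling is omitted.

```c++
#include <cstdlib>
#include <cstring>

// A minimal sketch of a growable byte buffer (not Firefox's actual code).
// append() uses realloc(), which does the allocate/copy/free steps for us.
struct ByteBuffer {
  char* data = nullptr;
  size_t length = 0;    // bytes in use
  size_t capacity = 0;  // bytes allocated

  void append(const char* src, size_t n) {
    if (length + n > capacity) {
      // The "just enough" strategy: grow to exactly the size needed.
      size_t newCapacity = length + n;
      data = static_cast<char*>(realloc(data, newCapacity));  // error handling omitted
      capacity = newCapacity;
    }
    memcpy(data + length, src, n);
    length += n;
  }
};
```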
Consider a simple but not outrageous example. Imagine you have a buffer that starts out 1 byte long and you add single bytes to it until it is 1 MiB long. If you use the “just-enough” strategy you’ll cumulatively allocate this much memory:
1 + 2 + 3 + … + 1,048,575 + 1,048,576 = 549,756,338,176 bytes
Ouch. That’s the arithmetic series n(n+1)/2 with n = 1,048,576, and O(n²) behaviour really hurts when n gets big enough. Of course the peak memory usage won’t be nearly this high, but all that reallocating and copying will be slow.
In practice it won’t be this bad because heap allocators round up requests, e.g. if you ask for 123 bytes you’ll likely get something larger like 128 bytes. The allocator used by Firefox (an old, extensively-modified version of jemalloc) rounds up all requests between 4 KiB and 1 MiB to the nearest multiple of 4 KiB. So you’ll actually allocate approximately this much memory:
4,096 + 8,192 + 12,288 + … + 1,044,480 + 1,048,576 = 134,742,016 bytes
(This ignores the sub-4 KiB allocations, which in total are negligible.) Much better. And if you’re lucky the OS’s virtual memory system will do some magic with page tables to make the copying cheap. But still, it’s a lot of churn.
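If you want to check that figure, here’s a small standalone sketch (my own model, not the allocator’s real code) that rounds each request up to a multiple of 4 KiB and counts a reallocation only when the rounded size changes:

```c++
#include <cstdint>
#include <cstdio>

// Model the rounding described above: requests between 4 KiB and 1 MiB are
// rounded up to the nearest multiple of 4 KiB. (The real allocator has finer
// size classes below 4 KiB, which we ignore here, as in the text.)
uint64_t RoundUp4KiB(uint64_t request) {
  return (request + 4095) & ~uint64_t(4095);
}

int main() {
  uint64_t cumulative = 0;
  uint64_t current = 0;  // current (rounded-up) allocation size
  for (uint64_t size = 1; size <= 1024 * 1024; size++) {
    uint64_t rounded = RoundUp4KiB(size);
    if (rounded != current) {
      // Only crossing into a new size class triggers a real reallocation.
      cumulative += rounded;
      current = rounded;
    }
  }
  printf("%llu\n", (unsigned long long)cumulative);  // prints 134742016
}
```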
A strategy that is usually better is exponential growth. Doubling the buffer each time is the simplest strategy:
4,096 + 8,192 + 16,384 + 32,768 + 65,536 + 131,072 + 262,144 + 524,288 + 1,048,576 = 2,093,056 bytes
That’s more like it; the cumulative size is just under twice the final size, and the series is short enough now to write it out in full, which is nice. Calls to malloc() and realloc() aren’t that cheap because they typically require acquiring a lock. I particularly like the doubling strategy because it’s simple and it also avoids wasting usable space due to slop.
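In code, doubling is a one-line change to the earlier sketch (again a hypothetical ByteBuffer, not Firefox’s implementation):

```c++
#include <algorithm>
#include <cstdlib>
#include <cstring>

// The same growable buffer as before, with one changed line: the capacity
// (at least) doubles each time it grows, instead of growing to "just enough".
struct ByteBuffer {
  char* data = nullptr;
  size_t length = 0;    // bytes in use
  size_t capacity = 0;  // bytes allocated

  void append(const char* src, size_t n) {
    if (length + n > capacity) {
      // Exponential growth: double, or more if this append is very large.
      size_t newCapacity = std::max(capacity * 2, length + n);
      data = static_cast<char*>(realloc(data, newCapacity));  // error handling omitted
      capacity = newCapacity;
    }
    memcpy(data + length, src, n);
    length += n;
  }
};
```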
Recently I’ve converted “just enough” growth strategies to exponential growth strategies in XDRBuffer and nsTArray, and I also found a case in SQLite that Richard Hipp has fixed. These pieces of code now match numerous places that already used exponential growth: pldhash, JS::HashTable, mozilla::Vector, JSString, nsString, and pdf.js.
Pleasingly, the nsTArray conversion had a clear positive effect. Not only did the exponential growth strategy reduce the amount of heap churn and the number of realloc() calls, it also reduced heap fragmentation: the “heap-overhead” part of the purple measurement on AWSY (a.k.a. “RSS: After TP5, tabs closed [+30s, forced GC]”) dropped by 4.3 MiB! This makes sense if you think about it: an allocator can fulfil power-of-two requests like 64 KiB, 128 KiB, and 256 KiB with less waste than it can awkward requests like 244 KiB, 248 KiB, 252 KiB, etc.
So, if you know of some more code in Firefox that uses a non-exponential growth strategy for a buffer, please fix it, or let me know so I can look at it. Thank you.