Back in 2007 Nick Nethercote morphed his Massif heap profiler into
its present form. Massif intercepts malloc/free et al., takes
periodic snapshots of the heap, and shows the results as stack trees.
It answers the questions “what’s in the heap?” and “who put it there?”
Since then, I’d been mulling over a heap profiler that could also tell
me something about block lifetimes and usages. This year I finally
got around to hacking something up — DHAT. Like Massif, DHAT
intercepts malloc/free, but it also inspects every (data) memory
reference, to see which block, if any, it is to. By doing that we can
identify hot blocks and under-used ones. For allocation points which
always allocate blocks of the same size, DHAT keeps count of how often
each block offset is accessed, thereby giving information on hot and
cold object fields, and showing up probable alignment holes.
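To make the per-offset counting concrete, here is a minimal sketch in Python. It is not DHAT's actual code; the class name `OffsetHistogram` and its methods are made up for illustration, and it assumes every block from the allocation point has the same size.

```python
class OffsetHistogram:
    """Hypothetical per-allocation-point table: one access counter per byte offset."""

    def __init__(self, block_size):
        self.counts = [0] * block_size

    def record_access(self, block_start, addr, nbytes):
        # Clip the access to the block, then bump every offset it touches.
        first = max(addr - block_start, 0)
        last = min(addr - block_start + nbytes, len(self.counts))
        for off in range(first, last):
            self.counts[off] += 1

    def untouched_offsets(self):
        # Offsets that are never accessed are candidate alignment holes.
        return [off for off, n in enumerate(self.counts) if n == 0]


# An 8-byte object: a 1-byte field at offset 0 and a 4-byte field at offset 4.
hist = OffsetHistogram(8)
hist.record_access(0x1000, 0x1000, 1)   # access to the 1-byte field
hist.record_access(0x1000, 0x1004, 4)   # access to the 4-byte field
print(hist.untouched_offsets())         # → [1, 2, 3]
```

Offsets 1 to 3 never get touched, which is what padding between a char and an int would look like in such a histogram.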
DHAT also records block lifetime information. Time is measured in
instructions executed, as in Massif. DHAT notes the age at death of
each block and shows the average value for each allocation point.
Doing that makes it easy to find allocation points which chew through
lots of heap, but don’t hold on to it for long, or, conversely, points
that allocate heap and hold on to it for the entire process lifetime.
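As a rough illustration of the age-at-death figure, the sketch below averages block lifetimes for one allocation point and expresses the result as a fraction of the whole run. The function name and the input layout (pairs of instruction counts) are assumptions made for the example, not DHAT internals.

```python
def average_age_at_death(lifetimes, program_instructions):
    """lifetimes: (alloc_instr, free_instr) pairs for the blocks of one allocation point.

    Returns (average age in instructions, average age as % of program lifetime).
    """
    ages = [freed - allocated for allocated, freed in lifetimes]
    avg = sum(ages) / len(ages)
    return avg, 100.0 * avg / program_instructions


# Two blocks, living for 100 and 400 instructions, in a 1000-instruction run.
avg, pct = average_age_at_death([(100, 200), (300, 700)], 1000)
print(avg, pct)   # → 250.0 25.0
```

A small percentage points at short-lived churn; a value near 100% points at blocks held for the whole run.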
DHAT tracks two kinds of entities: blocks and allocation points (APs). A block
is just a heap block. An AP is a stack trace that has allocated one
or more blocks. When a block is freed, its statistics are merged back
into its AP. At the end of the run, DHAT shows statistics for the top
N APs, as sorted by one of three user-selectable metrics. Most of the
art of using DHAT is in interpreting the mass of numbers it produces.
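The block/AP bookkeeping can be sketched roughly as follows. This is only an illustration under my own assumptions: the field names are invented, and the two maxima are tracked independently, which may not match what DHAT really does.

```python
from dataclasses import dataclass


@dataclass
class AllocPoint:
    """Hypothetical per-AP record; field names are illustrative only."""
    live_bytes: int = 0
    live_blocks: int = 0
    max_live_bytes: int = 0
    max_live_blocks: int = 0
    tot_bytes: int = 0
    tot_blocks: int = 0
    deaths: int = 0
    total_age: int = 0
    bytes_read: int = 0
    bytes_written: int = 0

    def on_alloc(self, size):
        # A new block is charged to the AP straight away.
        self.live_bytes += size
        self.live_blocks += 1
        self.tot_bytes += size
        self.tot_blocks += 1
        self.max_live_bytes = max(self.max_live_bytes, self.live_bytes)
        self.max_live_blocks = max(self.max_live_blocks, self.live_blocks)

    def on_free(self, size, age, bytes_read, bytes_written):
        # When a block dies, its per-block statistics are merged into the AP.
        self.live_bytes -= size
        self.live_blocks -= 1
        self.deaths += 1
        self.total_age += age
        self.bytes_read += bytes_read
        self.bytes_written += bytes_written


ap = AllocPoint()
ap.on_alloc(100)
ap.on_alloc(50)
ap.on_free(100, age=10, bytes_read=5, bytes_written=7)
print(ap.max_live_bytes, ap.live_bytes, ap.deaths)   # → 150 50 1
```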
DHAT perhaps ought to be merged with Massif at some point. For now,
the emphasis was to get something up and running quickly, to see if it
generates any useful information.
So does it show up anything interesting in Firefox?
Here’s a no-brainer, bug 609905:
-------------------- 32 of 5000 --------------------
max-live: 524,328 in 1 blocks
tot-alloc: 524,328 in 1 blocks (avg size 524328.00)
deaths: 1, at avg age 15,015,989,851 (99.60% of prog lifetime)
acc-ratios: 0.00 rd, 0.00 wr (192 b-read, 825 b-written)
at 0x4C27ECA: operator new(unsigned long) (vg_replace_malloc.c:261)
by 0x661D01D: js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7807)
by 0x651FBEB: JSThreadData::init() (jscntxt.cpp:497)
by 0x652049D: js_CurrentThread(JSRuntime*) (jscntxt.cpp:588)
by 0x652085C: js_InitContextThread(JSContext*) (jscntxt.cpp:659)
This is a half-megabyte block that’s allocated, held onto for the
entire browser run, but never accessed. The key is to look at the
acc-ratios field, which shows the average number of times each byte
in the block got read and written — here, 0.00 for both. Turns out
to be a leftover allocator from the regexp engine that predated YARR.
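For the record, the arithmetic behind those zeroes, assuming the ratio is simply bytes accessed divided by total bytes allocated (the function name is mine, not DHAT's):

```python
def acc_ratios(bytes_read, bytes_written, tot_bytes_allocated):
    # Average number of times each allocated byte was read / written.
    return bytes_read / tot_bytes_allocated, bytes_written / tot_bytes_allocated


# Numbers taken from the report above: 192 b-read, 825 b-written, 524,328 bytes.
rd, wr = acc_ratios(192, 825, 524_328)
print(f"{rd:.2f} rd, {wr:.2f} wr")   # → 0.00 rd, 0.00 wr
```

So the block was not literally untouched (a few hundred bytes were accessed), but at two decimal places the ratios round down to zero.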
Here’s another, bug 611400, that’s more
interesting. When profiling the browser I noticed quite a lot of heap
occupied by allocations from Jaegermonkey’s method
js::mjit::Compiler::finishThisUp. I wondered what it was but didn’t
think much more about it until I profiled the JS engine running
Kraken, and fell across this:
-------------------- 4 of 100 --------------------
max-live: 12,197,056 in 1 blocks
tot-alloc: 12,197,056 in 1 blocks (avg size 12197056.00)
deaths: 1, at avg age 1,432,667,675 (42.98% of prog lifetime)
acc-ratios: 0.00 rd, 0.00 wr (59 b-read, 129 b-written)
at 0x47B44FF: calloc (vg_replace_malloc.c:467)
by 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc
by 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit:
by 0xC2F402F: ???
Hmm, a 12.2 MB allocation which is never used! That’s around 12% of
the maximum live heap in this run. Turns out the method JIT creates
tables mapping JS bytecodes to the corresponding native code entry
points. This Kraken test (imaging-darkroom) includes huge tables of
constants, for which there are no entry points, so the 12MB allocation
is a completely empty table.
This is an extreme case of a more general problem, though.
Instrumenting the method JIT shows that about 98% of table entries are
unused when surfing at fairly complex-looking web pages, with 5 open tabs.
I changed the table representation to only store useful entries. That
saves the 12.2MB in Kraken. For a browser with 5 tabs, it saves
somewhere in the region of 1%-2% of the entire C++ heap.
And there’s more:
- We were allocating a 3MB file buffer when writing Zip files (610040), although JOrendorff beat me to it by a few minutes
- Inefficient layout of CSS style rules (596140)
The above results were obtained by looking through allocation points
sorted by decreasing maximum-live-volume. Sorting the output on other
metrics gives other perspectives:
- sorting by decreasing max-blocks-live shows places where we allocate large numbers of small blocks (CSS handling, the HTML5 parser)
- sorting by decreasing total-bytes-allocated tends to show up places that turn over a lot of heap, even if it isn’t held onto for long (on Linux, the X client libraries seem particularly bad)
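A toy illustration of how the choice of sort metric changes which allocation points float to the top. The AP names, the dictionary keys and all the numbers below are invented for the example; they are not DHAT output.

```python
# Hypothetical per-AP summaries (made-up numbers).
aps = [
    {"ap": "parser",    "max_live_bytes": 4_000,      "total_bytes_allocated": 90_000_000, "max_blocks_live": 50},
    {"ap": "css",       "max_live_bytes": 60_000,     "total_bytes_allocated": 70_000,     "max_blocks_live": 3_000},
    {"ap": "jit_table", "max_live_bytes": 12_000_000, "total_bytes_allocated": 12_000_000, "max_blocks_live": 1},
]


def top_aps(aps, metric, n):
    # Sort by the chosen metric, largest first, and keep the first n names.
    ranked = sorted(aps, key=lambda ap: ap[metric], reverse=True)
    return [ap["ap"] for ap in ranked[:n]]


print(top_aps(aps, "max_live_bytes", 1))          # → ['jit_table']
print(top_aps(aps, "max_blocks_live", 1))         # → ['css']
print(top_aps(aps, "total_bytes_allocated", 1))   # → ['parser']
```

One big long-lived block, a swarm of small blocks and a high-turnover site each win under a different metric, which is why it pays to look at all three orderings.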
DHAT is available in Valgrind-3.6.0 (--tool=exp-dhat). It is stable
but can sometimes produce misleading numbers, and is unreasonably slow
for such a simple tool. I have a fixed-up version that is 2-3x faster,
which I’ll ship in 3.6.1.