Fun ‘n’ games with DHAT

Back in 2007 Nick Nethercote morphed his Massif heap profiler into
its present form. Massif intercepts malloc/free et al, takes
periodic snapshots of the heap and shows results using stack trees.
It answers the questions “what’s in the heap?” and “who put it there?”

Since then, I’d been mulling over a heap profiler that could also tell
me something about block lifetimes and usages. This year I finally
got around to hacking something up — DHAT. Like Massif, DHAT
intercepts malloc/free, but it also inspects every (data) memory
reference, to see which block, if any, it is to. By doing that we can
identify hot blocks and under-used ones. For allocation points which
always allocate blocks of the same size, DHAT keeps count of how often
each block offset is accessed, thereby giving information on hot and
cold object fields, and showing up probable alignment holes.
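
The per-offset counting described above can be sketched as a toy model. This is not DHAT's code: the class, its method names, the addresses and the allocation point name "make_node" are all hypothetical, and it assumes every block from a given allocation point has the same size.

```python
from bisect import bisect_right


class ToyAccessProfiler:
    """Toy model: attribute each memory access to a live block and offset."""

    def __init__(self):
        self.starts = []      # sorted start addresses of live blocks
        self.blocks = {}      # start address -> (size, allocation point name)
        self.histograms = {}  # allocation point name -> per-offset access counts

    def on_malloc(self, addr, size, ap):
        self.starts.insert(bisect_right(self.starts, addr), addr)
        self.blocks[addr] = (size, ap)
        # Assumption: all blocks from this allocation point share one size.
        self.histograms.setdefault(ap, [0] * size)

    def on_access(self, addr, length):
        """Record an access; return False if it falls outside every block."""
        index = bisect_right(self.starts, addr) - 1
        if index < 0:
            return False
        start = self.starts[index]
        size, ap = self.blocks[start]
        if addr >= start + size:
            return False
        counts = self.histograms[ap]
        first = addr - start
        for offset in range(first, min(first + length, size)):
            counts[offset] += 1
        return True

    def cold_offsets(self, ap):
        """Offsets never touched: candidate alignment holes or dead fields."""
        return [i for i, count in enumerate(self.histograms[ap]) if count == 0]


# Two 16-byte blocks laid out like a 1-byte tag followed by an 8-byte value.
profiler = ToyAccessProfiler()
for base in (0x1000, 0x2000):
    profiler.on_malloc(base, 16, "make_node")
    profiler.on_access(base, 1)       # the tag at offset 0
    profiler.on_access(base + 8, 8)   # the value at offsets 8..15
outside = profiler.on_access(0x3000, 4)  # not inside any block
cold = profiler.cold_offsets("make_node")
print(cold)
```

In this toy run, offsets 1 to 7 are never touched, which is the pattern a padding hole between the two fields would leave.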

DHAT also records block lifetime information. Time is measured in
instructions executed, as in Massif. DHAT notes the age at death of
each block and shows the average value for each allocation point.
Doing that makes it easy to find allocation points which chew through
lots of heap, but don’t hold on to it for long, or, conversely, points
that allocate heap and hold on to it for the entire process lifetime.

DHAT tracks two kinds of entities: blocks and allocation points (APs). A block
is just a heap block. An AP is a stack trace that has allocated one
or more blocks. When a block is freed, its statistics are merged back
into its AP. At the end of the run, DHAT shows statistics for the top
N APs, as sorted by one of three user-selectable metrics. Most of the
art of using DHAT is in interpreting the mass of numbers it produces.
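
A minimal sketch of the block/AP bookkeeping and the lifetime measurement, under stated assumptions: the class names, field names and the `top` helper are invented for illustration, time is just an integer supplied by the caller, and the stack traces are made-up tuples. It only shows the idea of folding each block's statistics into its AP when the block dies.

```python
from dataclasses import dataclass


@dataclass
class APStats:
    cur_bytes: int = 0
    cur_blocks: int = 0
    max_bytes_live: int = 0
    max_blocks_live: int = 0
    tot_bytes: int = 0
    deaths: int = 0
    total_age: int = 0

    def avg_age_at_death(self):
        return self.total_age / self.deaths if self.deaths else 0.0


class ToyLifetimeProfiler:
    def __init__(self):
        self.aps = {}   # stack trace (tuple of names) -> APStats
        self.live = {}  # block address -> (stack trace, size, birth time)

    def on_malloc(self, addr, size, stack, now):
        ap = self.aps.setdefault(stack, APStats())
        ap.cur_bytes += size
        ap.cur_blocks += 1
        ap.tot_bytes += size
        ap.max_bytes_live = max(ap.max_bytes_live, ap.cur_bytes)
        ap.max_blocks_live = max(ap.max_blocks_live, ap.cur_blocks)
        self.live[addr] = (stack, size, now)

    def on_free(self, addr, now):
        # The dying block's statistics are merged back into its AP.
        stack, size, born = self.live.pop(addr)
        ap = self.aps[stack]
        ap.cur_bytes -= size
        ap.cur_blocks -= 1
        ap.deaths += 1
        ap.total_age += now - born

    def top(self, n, metric):
        """Return the n APs with the largest value of the named field."""
        ranked = sorted(self.aps.items(),
                        key=lambda item: getattr(item[1], metric),
                        reverse=True)
        return ranked[:n]


p = ToyLifetimeProfiler()
hoard = ("main", "init_cache")    # one block held for the whole run
churn = ("main", "parse_token")   # many short-lived blocks
p.on_malloc(0x9000, 4096, hoard, now=0)
for i in range(100):
    p.on_malloc(0x1000, 64, churn, now=10 * i)
    p.on_free(0x1000, now=10 * i + 5)
p.on_free(0x9000, now=1000)

by_live = p.top(1, "max_bytes_live")[0][0]
by_total = p.top(1, "tot_bytes")[0][0]
```

Sorting by `max_bytes_live` puts the long-lived block's AP first, while sorting by `tot_bytes` puts the churning AP first, even though its blocks die at an average age of 5 time units.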

DHAT perhaps ought to be merged with Massif at some point. For now,
the emphasis was to get something up and running quickly, to see if it
generates any useful information.

So does it show up anything interesting in Firefox?

Here’s a no-brainer, bug 609905:

-------------------- 32 of 5000 --------------------
max-live:    524,328 in 1 blocks
tot-alloc:   524,328 in 1 blocks (avg size 524328.00)
deaths:      1, at avg age 15,015,989,851 (99.60% of prog lifetime)
acc-ratios:  0.00 rd, 0.00 wr  (192 b-read, 825 b-written)
   at 0x4C27ECA: operator new(unsigned long) (vg_replace_malloc.c:261)
   by 0x661D01D: js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7807)
   by 0x651FBEB: JSThreadData::init() (jscntxt.cpp:497)
   by 0x652049D: js_CurrentThread(JSRuntime*) (jscntxt.cpp:588)
   by 0x652085C: js_InitContextThread(JSContext*) (jscntxt.cpp:659)

This is a half-megabyte block that’s allocated, held onto for the
entire browser run, but never accessed. The key is to look at the
acc-ratios field, which shows the average number of times each byte
in the block got read and written — here, 0.00 for both. Turns out
to be a leftover allocator from the regexp engine that predated YARR.
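
As a worked check of those numbers, here is the arithmetic, assuming the ratio is simply bytes accessed divided by total bytes allocated at the allocation point. That reading matches the figures shown, but the helper below is an illustration, not DHAT's own code.

```python
def acc_ratio(bytes_accessed, total_bytes_allocated):
    """Average number of accesses per allocated byte (assumed definition)."""
    return bytes_accessed / total_bytes_allocated


# Figures taken from the report for bug 609905.
read_ratio = acc_ratio(192, 524_328)
write_ratio = acc_ratio(825, 524_328)
summary = f"{read_ratio:.2f} rd, {write_ratio:.2f} wr"
print(summary)
```

Both ratios are well under 0.005, so they display as 0.00 at two decimal places: a few hundred bytes touched out of half a megabyte.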

Here’s another, bug 611400, that’s more
interesting. When profiling the browser I noticed quite a lot of heap
occupied by allocations from JaegerMonkey’s method
js::mjit::Compiler::finishThisUp. I wondered what it was but didn’t
think much more about it until I profiled the JS engine running
Kraken, and fell across this:

-------------------- 4 of 100 --------------------
max-live:    12,197,056 in 1 blocks
tot-alloc:   12,197,056 in 1 blocks (avg size 12197056.00)
deaths:      1, at avg age 1,432,667,675 (42.98% of prog lifetime)
acc-ratios:  0.00 rd, 0.00 wr  (59 b-read, 129 b-written)
   at 0x47B44FF: calloc (vg_replace_malloc.c:467)
   by 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc
   by 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit:
   by 0xC2F402F: ???

Hmm, a 12.2 MB allocation which is never used! That’s around 12% of
the maximum live heap in this run. Turns out the method JIT creates
tables mapping JS bytecodes to the corresponding native code entry
points. This Kraken test (imaging-darkroom) includes huge tables of
constants, for which there are no entry points, so the 12MB allocation
is a completely empty table.

This is an extreme case of a more general problem, though.
Instrumenting the method JIT shows that about 98% of table entries are
unused when running more “normal” JavaScript, such as when
surfing fairly complex-looking web pages with 5 open tabs.

I changed the table representation to only store useful entries. That
saves the 12.2MB in Kraken. For a browser with 5 tabs, it saves
somewhere in the region of 1%-2% of the entire C++ heap, which is a
nice outcome.
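
The post does not say what the new table representation looks like, so the following is only a sketch of the general idea: one slot per bytecode offset versus storing just the offsets that have an entry point. The class names, the sorted-array-plus-binary-search layout and the sample addresses are all assumptions made for illustration.

```python
from bisect import bisect_left


class DenseEntryTable:
    """One slot per bytecode offset, most of them empty."""

    def __init__(self, bytecode_length, entries):
        self.slots = [None] * bytecode_length
        for pc, native in entries.items():
            self.slots[pc] = native

    def lookup(self, pc):
        return self.slots[pc]

    def slot_count(self):
        return len(self.slots)


class SparseEntryTable:
    """Only the offsets that have a native entry point, kept sorted."""

    def __init__(self, entries):
        pairs = sorted(entries.items())
        self.pcs = [pc for pc, _ in pairs]
        self.natives = [native for _, native in pairs]

    def lookup(self, pc):
        i = bisect_left(self.pcs, pc)
        if i < len(self.pcs) and self.pcs[i] == pc:
            return self.natives[i]
        return None

    def slot_count(self):
        return len(self.pcs)


# Hypothetical script: 1000 bytecode offsets, 3 of which are entry points.
entries = {0: 0x4000, 17: 0x4040, 512: 0x4100}
dense = DenseEntryTable(1000, entries)
sparse = SparseEntryTable(entries)
same = all(dense.lookup(pc) == sparse.lookup(pc) for pc in range(1000))
print(dense.slot_count(), sparse.slot_count(), same)
```

The trade-off in this sketch is a binary search per lookup instead of a direct index, in exchange for storage proportional to the number of real entry points.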

And there’s more:

We were allocating a 3MB file buffer when writing Zip files (610040), although JOrendorff beat me to it by a few minutes

Inefficient layout of CSS style rules (596140)

The above results were obtained by looking through allocation points
sorted by decreasing maximum-live-volume. Sorting the output on other
metrics gives other perspectives:

sorting by decreasing max-blocks-live shows places where we allocate large numbers of small blocks (CSS handling, the HTML5 parser)

sorting by decreasing total-bytes-allocated tends to show up places that turn over a lot of heap, even if it isn’t held onto for long (on Linux, the X client libraries seem particularly bad)

DHAT is available in Valgrind-3.6.0 (--tool=exp-dhat). It is stable
but can sometimes produce misleading numbers, and is unreasonably slow
for such a simple tool. I have a fixed up and 2-3 x faster version,
which I’ll ship in 3.6.1.

6 responses

mmc wrote on December 5, 2010 at 9:43 pm:

This is great! I’ve been using valgrind for a number of years, and I’m always running the latest version, but I didn’t notice the usefulness of this tool before! Great stuff!

njn wrote on December 6, 2010 at 4:54 am:

One problem is the meaning of “heap”. Most of the allocation that Firefox does is not via malloc/new, which is what DHAT tracks.

That’s why I implemented the --pages-as-heap option for Massif, which gives results that are coarser-grained and harder to interpret than the default results, but at least they cover all memory allocations. I wonder if a similar option for DHAT would be useful; it probably would be for larger allocations.

jseward wrote on December 6, 2010 at 10:21 am:

Yes. The --pages-as-heap option has proven remarkably
useful in Massif. It identified, for example, the biggest
single culprit in bug 615199 (“Methodjit enabled causes the
browser to use almost twice as much memory”). I agree it’d
be useful for DHAT and wouldn’t be hard to implement: just
need to change the notion of “allocated block” to “allocated
thing, where thing can be either a block or a page”. Maybe
something to do for Valgrind 3.7.0.

It might also give us a way to quantify the extent to which
different mallocs (jemalloc, normal libc one etc) are able
to avoid page level fragmentation. Now that would indeed
be interesting.

jseward wrote on December 6, 2010 at 12:03 pm:

> I have a fixed up and 2-3 x faster version,
> which I’ll ship in 3.6.1.

Committed as svn trunk r11480.

Robert Kaiser wrote on December 6, 2010 at 9:14 pm:

Hmm, are those things you found perhaps clues to some of the items investigated in the “FF4 uses more memory than 3.6” bug 598466?

jseward wrote on December 6, 2010 at 11:18 pm:

@Robert Kaiser:

Yes, some of the things (particularly 611400) contribute to the
increased space use. AFAICS though, there is no single big
culprit for the increased space use. Various smaller things.