Space profiling the browser

It’s been a good month or more since we started the current round of
chasing space problems in Firefox. Considerable effort has gone into
identifying and fixing memory hogs. Although the individual fixes are
often excellent, I haven’t had the big picture on how we’re doing.
So today I did some three-way profiling, comparing:

mozilla-central of today, incorporating essentially all the
space fixes to date

mozilla-central of 1 Nov last year, before this really got going

1.9.2 of today, since that’s what we keep
getting compared against

These are release builds on x86_64 linux, using jemalloc, as that’s
presumably the least fragmentful allocator we have.

Each run loads 20 cad-comic.com tabs. I let the browser run through
60 billion machine instructions, then stopped it. By
around 40 billion instructions it has loaded the tabs completely, and
the last 20 billion are essentially idling, intended to give an
identifiable steady-state plateau. That plateau ought to indicate the
minimum achievable residency, after the cycle collector, JS garbage
collector, the method jit code thrower-awayer, the image discarder,
and any other such things, have done their thing. I regard the
plateau as more indicative of the browser’s behaviour during a
long run than the peak is.

I profiled using Valgrind’s Massif profiler, with the --pages-as-heap=yes
option. This measures all mapped pages in the process, and so
includes C++ heap, other mmap’d space, code, data and bss segments —
everything.

Consequently a lot of the measured space is the constant overhead of
the text, data and bss segments of the many shared objects involved.
That cost is the same regardless of the browser’s workload. To
quantify it, I did a fourth profile run, loading a single blank page.
This gives me a way to compute the incremental cost for each
cad-comic.com tab.

The summary results of all this are (all numbers are in MB):

Constant overhead: 526

Total costs: 1.9.2 907, MC-Nov10 1149, MC-now 1077

Hence incremental per-tab costs are:
1.9.2 19.0,
MC-Nov10 31.1 (63% above 1.9.2),
MC-now 27.5 (45% above 1.9.2)
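The per-tab figures follow from subtracting the constant overhead from each total and dividing by the 20 tabs. A minimal sketch of that arithmetic, using only the numbers quoted above (the function and variable names are illustrative, not part of any tool):

```python
# Totals and constant overhead in MB, taken from the summary above.
CONSTANT_OVERHEAD_MB = 526
TABS = 20
TOTALS_MB = {"1.9.2": 907, "MC-Nov10": 1149, "MC-now": 1077}


def per_tab_cost(total_mb, overhead_mb=CONSTANT_OVERHEAD_MB, tabs=TABS):
    """Incremental cost of one tab: (total - constant overhead) / tabs."""
    return (total_mb - overhead_mb) / tabs


costs = {name: per_tab_cost(total) for name, total in TOTALS_MB.items()}
baseline = costs["1.9.2"]

for name, cost in costs.items():
    # Percentage by which this build's per-tab cost exceeds 1.9.2's.
    excess = 100.0 * (cost / baseline - 1.0)
    print(f"{name}: {cost:.2f} MB per tab, {excess:.1f}% above 1.9.2")
```

The unrounded per-tab values are 19.05, 31.15 and 27.55 MB, so the percentages printed here can differ by a point or so from ones derived from the one-decimal figures.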

So we’ve made considerable improvements since November. But we’re
still worse than 1.9.2. Nick Nethercote tells me that bug 623428 should
bring further improvements when it lands.

Here are the top-level visualisations for the three profiles.

Firstly, 1.9.2 (picture below). What surprised me is the massive peak
of around 1.6GB during page load. Once that’s done, it falls back to a
series of modest trough-peak variations. I took the steady-state
measurement above at the lowest trough, around 54 billion instructions
on the horizontal axis.
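Picking the lowest trough can also be done mechanically from the Massif output file rather than by eye. The sketch below is only an illustration: it assumes the plain-text output layout in which each snapshot has a `time=` line and a `mem_heap_B=` line, and that with --pages-as-heap=yes the mapped-page total is reported in `mem_heap_B`. The helper names and the sample data are invented for the example and are not measurements from these runs.

```python
import re


def parse_snapshots(text):
    """Return (time, bytes) pairs from Massif output text.

    Assumes every snapshot carries one 'time=' and one 'mem_heap_B=' line.
    """
    times = [int(t) for t in re.findall(r"^time=(\d+)$", text, re.MULTILINE)]
    sizes = [int(b) for b in re.findall(r"^mem_heap_B=(\d+)$", text, re.MULTILINE)]
    return list(zip(times, sizes))


def lowest_trough(snapshots, after_time):
    """Snapshot with the smallest footprint at or after `after_time`."""
    tail = [s for s in snapshots if s[0] >= after_time]
    if not tail:
        raise ValueError("no snapshots at or after the requested time")
    return min(tail, key=lambda s: s[1])


# Invented sample data, for illustration only.
SAMPLE = """\
time_unit: i
snapshot=0
time=10
mem_heap_B=500
snapshot=1
time=30
mem_heap_B=1600
snapshot=2
time=45
mem_heap_B=950
snapshot=3
time=54
mem_heap_B=900
"""

snapshots = parse_snapshots(SAMPLE)
trough = lowest_trough(snapshots, after_time=40)
print(f"lowest trough: {trough[1]} bytes at time {trough[0]}")
```

Restricting the search to snapshots after the page loads have finished keeps the start-up footprint, which is smaller still, from being mistaken for the steady state.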

Also interesting is that steady-state is reached before 25 billion
instructions. The M-C runs below took longer to get there.

Profile for 1.9.2

The M-C Nov10 picture (below) is less dramatic. It lacks the 1.6GB peak,
instead climbing to pretty much the final level of around 1.2GB and
staying there, with a slight decline into steady-state at around 44
billion insns.

The M-C-of-now picture (below) is similar, although steady state
is less steady, and somewhat lower, reflecting the fixes of the past
few weeks. Observe how the orange band steps down slightly in
three stages after about 24 billion instructions. I believe that’s Brian
Hackett’s code discard patch, bug 617656. Also, note the gradual
slope up from around 38 billion to 53 billion insns. That might be
the excessively-infrequent GC problem investigated in bug 619822.

So what’s with the 1.6GB peak for 1.9.2? It gives the interesting
effect that, although M-C is worse in steady state than 1.9.2, M-C
has more modest peak requirements, at least for this test case.

On investigation, the 1.9.2 spike seems to be caused by thread stacks.
The implication is that it has more simultaneously live threads than
M-C. Why this should be, I don’t know. I did however notice that
1.9.2 seems to load all 20 tabs at the same time, whereas M-C appears
to pull them in in smaller groups. Related? I don’t know.

8 responses

Nicholas Nethercote wrote on January 7, 2011 at 4:15 am:

Cool! Thanks for the info.

My measurements of the same workload show that fixing bug 623428 could save as much as 100MB. That would get the per-tab overhead down to 22.6, which is 19% higher than in 1.9.2. That’s a lot better than 63%!

As for the thread stacks, I’m pretty sure I’ve read that Firefox 4.0 throttles the loading of tabs on start-up so that only a few (3?) tabs are loaded at a time.

Nicholas Nethercote wrote on January 7, 2011 at 4:18 am:

Oh, you should really use the command line from http://blog.mozilla.org/nnethercote/2011/01/07/memory-profiling-firefox-with-massif-part-2/. The bands in the graph are so much more useful that way.

Screwtape wrote on January 7, 2011 at 5:18 am:

Yeah, the browser.sessionstore.max_concurrent_tabs preference defaults to 3.

Boris wrote on January 7, 2011 at 6:29 am:

Was this in the default configuration, or with image discarding disabled? The 1.9.2 impl of image discarding discarded everything, while the m-c tip one keeps images in the currently viewed tab; that might or might not matter in this case depending on the size of those images…

Justin Dolske wrote on January 7, 2011 at 7:52 am:

“I did however notice that 1.9.2 seems to load all 20 tabs at the same time, whereas M-C appears to pull them in in smaller groups.”

How were you loading the tabs? I know that for Firefox 4, Paul O’Shannessy has implemented cascaded loads for session restore tabs (bug 586068), so that restoring a pile of tabs makes startup less painful. If that’s how you’re triggering the page loads, that would explain the grouping.

jseward wrote on January 7, 2011 at 12:15 pm:

@Screwtape: that’s definitely a good thing.

jseward wrote on January 7, 2011 at 4:32 pm:

@Boris: it was a completely default configuration.

jseward wrote on January 7, 2011 at 4:33 pm:

@Justin Dolske: I didn’t interact with the browser at all
after starting it (no mouse nor KB activity). I just let
it run for the 60 billion instructions, then clicked on
‘X’ to quit. (So to speak).