We’ve been chipping away at memory use of Firefox 4 for a couple of
months now, with good results. Recently, though, I’ve been wondering
if we’re measuring the right things. It seems to me there are two
important things to measure:
- Maximum virtual address space use for the process. Why is this
important? Because if the process runs out of address space, it’s
in serious trouble. Ditto, perhaps worse, if the process uses up
all the machine’s swap.
- But the normal case is different: we don’t run out of address space
or swap. In this case I don’t care how much memory the browser
uses. Really. When we talk about memory use in the non-OOM
situation, we’re using that measure as a proxy for responsiveness.
Excessive memory use isn’t intrinsically bad. Rather, it’s the side
effect that’s the problem: it causes paging, both for the browser
and for everything else running on the machine, slowing
everything down.
Trying to gauge responsiveness by looking at peak RSS figures strikes
me as a losing prospect. The RSS values are set by some more-or-less
opaque kernel page discard algorithm, and depend on the behaviour of
all processes in the system, not just Firefox. Worse, it’s uninformative:
we get no information about which parts of our code base are causing
paging.
So I hacked up a VM profiler. This tells me the page fault behaviour
when running Firefox using a given amount of real memory. It isn’t as
big a task as it sounds, since we already have 99.9% of the required
code in place: Valgrind’s Cachegrind tool. It just required replacing
the cache simulator with a virtual-to-physical address map simulator.
The profiler does a pretty much textbook pseudo-LRU clock algorithm
simulation. It differentiates between page faults caused by data and
instruction accesses, since these require different fixes — make the
data smaller vs make the code smaller. It also differentiates between
clean (page unmodified) and dirty (page modified, requires writeback)
faults.
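For the curious, here’s a minimal sketch of the sort of simulation I
mean. It is not the code I actually bolted onto Cachegrind: the class
name, the 4KB page size, and the rule that a fault counts as dirty when
the evicted page needs writing back are all just illustrative
assumptions.

// vm_clock_sim.cpp -- a minimal sketch of a clock-style paging simulator.
// NOT the actual Cachegrind modification; names, the 4KB page size and the
// fault classification rule are assumptions made purely for illustration.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Frame {
    uint64_t page;       // virtual page number currently held in this frame
    bool     referenced; // cleared as the clock hand sweeps past
    bool     dirty;      // set by any store to the page
    bool     valid;      // does this frame hold a page at all?
};

class ClockSim {
public:
    // mem_bytes: the amount of "real memory" being simulated, e.g. 100MB.
    // Assumed to be at least one page.
    explicit ClockSim(size_t mem_bytes, size_t page_bytes = 4096)
        : frames_(mem_bytes / page_bytes), page_shift_(log2_of(page_bytes)) {}

    // One simulated access.  is_insn selects the I vs D counters; is_store
    // marks the page dirty so that evicting it later forces a writeback.
    void access(uint64_t addr, bool is_store, bool is_insn) {
        uint64_t page = addr >> page_shift_;
        auto it = map_.find(page);
        if (it != map_.end()) {                   // hit: just refresh the bits
            Frame& f = frames_[it->second];
            f.referenced = true;
            f.dirty = f.dirty || is_store;
            return;
        }
        bool writeback = evict_and_map(page, is_store);   // page fault
        (is_insn ? (writeback ? i_dirty_ : i_clean_)
                 : (writeback ? d_dirty_ : d_clean_))++;
    }

    void report() const {
        std::printf("I faults: %llu clean + %llu dirty\n",
                    (unsigned long long)i_clean_, (unsigned long long)i_dirty_);
        std::printf("D faults: %llu clean + %llu dirty\n",
                    (unsigned long long)d_clean_, (unsigned long long)d_dirty_);
    }

private:
    static unsigned log2_of(size_t n) {
        unsigned s = 0;
        while ((size_t(1) << s) < n) s++;
        return s;
    }

    // Textbook clock / second-chance replacement: advance the hand, clearing
    // referenced bits, until an unreferenced (or empty) frame turns up, then
    // map the faulting page into it.  Returns true if the victim page was
    // dirty, i.e. the fault would have required a writeback.
    bool evict_and_map(uint64_t page, bool is_store) {
        for (;;) {
            size_t idx = hand_;
            hand_ = (hand_ + 1) % frames_.size();
            Frame& f = frames_[idx];
            if (!f.valid || !f.referenced) {
                bool writeback = f.valid && f.dirty;
                if (f.valid) map_.erase(f.page);
                f = Frame{page, true, is_store, true};
                map_[page] = idx;
                return writeback;
            }
            f.referenced = false;                 // give it a second chance
        }
    }

    std::vector<Frame> frames_;                   // one entry per physical frame
    std::unordered_map<uint64_t, size_t> map_;    // virtual page -> frame index
    size_t   hand_ = 0;                           // clock hand position
    unsigned page_shift_;
    uint64_t i_clean_ = 0, i_dirty_ = 0, d_clean_ = 0, d_dirty_ = 0;
};

Driven by the instruction and data address streams that Cachegrind
already traces, and instantiated as, say, ClockSim sim(100 * 1024 * 1024)
to match the 100MB runs below, something along these lines produces the
four counters (I/D crossed with clean/dirty) that the results report.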
Here are some preliminary results. Bear in mind the profiler has only
just started to work, so the potential for bogosity is still large.
The first question is: we know that 4.0 uses more memory than 3.6.x. But
does that result in more paging? I profiled both, loading 5 cad-comic
tabs (http://www.cad-comic.com/cad/random) and idling for a while, for
about 8 billion instructions. Results, simulating 100MB of real memory:
3.6.x, release build, using jemalloc:
VM I accesses: 8,250,840,547  ( 3,186 clean faults +    350 dirty faults)
VM D accesses: 3,089,412,941  ( 5,239 clean faults +    552 dirty faults)
M-C, release build, using jemalloc:
VM I accesses: 8,473,182,041  ( 8,140 clean faults +  4,979 dirty faults)
VM D accesses: 3,372,806,043  (22,720 clean faults + 14,335 dirty faults)
Apparently it does page more: 9,327 faults in total for 3.6.x against
50,174 for M-C, more than a five-fold increase. Most of the paging is due
to data rather than instruction accesses. Requires further investigation.
The second question is: where does that paging come from? Are we missing
any easy wins? From a somewhat longer run with a bigger workload, I got
this:
Da (# data accesses)  Dfc (# clean data faults)  function
---------------------------------------------------------
      18,921,574,436                    382,023  PROGRAM TOTALS
          19,339,625                     60,583  js::Shape::trace
           2,228,649                     51,635  JSCompartment::purge
          32,583,809                     22,223  js_TraceScript
          16,306,348                     18,404  js::mjit::JITScript::purgePICs
          18,160,249                     12,847  js::mjit::JITScript::purgePICs
          52,155,631                     11,727  memset
          27,229,391                     10,813  js::PropertyTree::sweepShapes
         120,482,308                     10,256  js::gc::MarkChildren
         138,049,859                      9,134  memcpy
           2,228,649                      8,779  JSCompartment::sweep
         179,083,731                      8,057  js_TraceObject
           6,269,454                      5,949  js::mjit::JITScript::sweepCallICs
About 18% of the faults come from js::Shape::trace, and quite a few
come from js::mjit::JITScript::purgePICs (two
versions) and js::mjit::JITScript::sweepCallICs. According to Dave
Anderson and Chris Leary, there might be some opportunity to poke
the code pages in a less jumping-around-y fashion.