Memory profiling Firefox with Massif, part 2

To follow up from this post: we’ve made some good progress on reducing JaegerMonkey’s memory consumption in Firefox 4, though there’s still a way to go.  Julian Seward will blog about this shortly.  In the meantime, I thought I’d share a particularly useful Massif invocation that Rob Sayre inspired me to concoct:

  valgrind \
  --smc-check=all --trace-children=yes \
  --tool=massif \
  --pages-as-heap=yes --detailed-freq=1000000 \
  --threshold=0.5 \
  --alloc-fn=mmap \
  --alloc-fn=syscall \
  --alloc-fn=pages_map \
  --alloc-fn=chunk_alloc \
  --alloc-fn=arena_run_alloc \
  --alloc-fn=arena_bin_malloc_hard \
  --alloc-fn=malloc \
  --alloc-fn=realloc \
  --alloc-fn='operator new(unsigned long)' \
  --alloc-fn=huge_malloc \
  --alloc-fn=posix_memalign \
  --alloc-fn=moz_xmalloc \
  --alloc-fn=JS_ArenaAllocate \
  --alloc-fn=PL_ArenaAllocate \
  --alloc-fn=NS_Alloc_P \
  --alloc-fn=NS_Realloc_P \
  --alloc-fn='XPConnectGCChunkAllocator::doAlloc()' \
  --alloc-fn='PickChunk(JSRuntime*)' \
  --alloc-fn='RefillFinalizableFreeList(JSContext*, unsigned int)' \
  --alloc-fn=sqlite3MemMalloc \
  --alloc-fn=mallocWithAlarm \
  --alloc-fn=sqlite3Malloc \
  <insert-firefox-command-here>

Good grief!  What a mess.  Don’t blame Massif for this, though;  it’s because Firefox has so many custom memory allocators.

With that invocation, the output of ms_print becomes something that is comprehensible to people other than Massif’s author :)  Here’s an extraction of the output which gives a high-level view of Firefox’s memory consumption on 64-bit Linux after loading 20 tabs, each with a random comic from http://www.cad-comic.com/cad/, which is a JavaScript-heavy site:

31.04% (366,878,720B) _dl_map_object_from_fd (dl-load.c:1195)
15.73% (185,998,724B) in 3693 places, all below massif's threshold (00.00%)
15.62% (184,639,488B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
05.68% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
04.35% (51,372,032B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
03.30% (38,993,920B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7644)
03.11% (36,741,120B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7643)
02.87% (33,935,360B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
02.84% (33,554,432B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
02.79% (32,923,648B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7642)
01.99% (23,555,684B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:213)
01.69% (19,934,784B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) (jsutil.h:209)
01.53% (18,067,456B) pcache1Alloc (sqlite3.c:33368)
01.48% (17,457,388B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:206)
01.31% (15,478,784B) g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
00.89% (10,486,784B) JS_NewObject (jsgcinlines.h:127)
00.71% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
00.68% (8,093,696B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
00.68% (8,024,064B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:495)
00.67% (7,974,936B) js::Vector<unsigned short, 32ul, js::ContextAllocPolicy>::growStorageBy(unsigned long) (jsutil.h:217)
00.53% (6,291,456B) js_CloneRegExpObject(JSContext*, JSObject*, JSObject*) (jsgcinlines.h:127)
00.52% (6,190,836B) nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:68)

The total is 1,182,094,880 bytes.

  • 31.04% is from _dl_map_object_from_fd.  This corresponds to code and data segments, mostly from libraries.
  • 15.73% is from allocation points small enough that they fell below the threshold (0.5%) that I used for this run.
  • 15.62% is from pthread_create, i.e. thread stacks.  Hopefully most of this space also won’t be mapped in.
  • 5.68% is from pa_shm_create_rwBug 617852 is open about this.  It won’t be fixed until after Firefox 4.0, but that’s not so bad because /proc/pid/smaps tells me that hardly any of it is mapped into physical memory.
  • That leaves 31.93% of big, heap-ish allocations.  It’s pretty obvious that for this workload, the JS engine is being greedy, accounting for 26.42% of that 31.83%.  One piece of good news is that the three js::InitJIT() entries, which together account for 9.2%, will be greatly improved by bug 623428;  I’m hoping to reduce them by a factor of 10 or more.

If anyone wants Massif’s full output, I’ll be happy to give it to them.  The full output contains full stack traces, which can be useful.

Some conclusions.

  • I’m still worred about our memory consumption, and I intend to keep pushing on it, both before Firefox 4.0 is released and afterwards.
  • Massif takes a bit of getting used to, particularly when you are profiling a huge, messy program like Firefox.  But it’s the only space profiler I know of that gives information that is detailed enough to be really useful in reducing memory consumption.  Without it, we wouldn’t have made much progress on reducing Firefox 4.0’s space consumption.  I’d love for other people to run it, it works on Linux and Mac (not Windows, unfortunately).  I’m happy to help anyone who wants to try it via IRC or email.  For all the improvements done lately, I’ve only looked at a single workload on a single machine!  There’s much more analysis to be done.
  • If anyone knows of other decent memory profilers that can handle programs as complex as Firefox, I’d love to hear about it.  In particular, note that if you only measure the heap (malloc et al) you’re only getting part of the story;  this is again because we have multiple allocators which bypass malloc and use mmap/VirtualAlloc directly.
  • I wonder if we need better memory benchmarks.  I’d like to have some that are as easy to run as, say, SunSpider.  Better telemetry would also be great.

2 Responses to Memory profiling Firefox with Massif, part 2

  1. Wow, https://bugzilla.mozilla.org/show_bug.cgi?id=623428 was very interesting to read.

    — “Otherwise you’re leaving a known memory-error in a product. If I have to pick
    between potential-DoS and potential-wild-write, I’ll go with DoS.”. Loved ;)

    Keep the good work on SM.

  2. Back in 2007 we used Purify at Joost to do memory profiling on windows. I remember that it helped us find things. But I clearly don’t remember all the details. You should ping :bas and ask him as he was working on that.