To follow up from this post: we’ve made some good progress on reducing JaegerMonkey’s memory consumption in Firefox 4, though there’s still a way to go. Julian Seward will blog about this shortly. In the meantime, I thought I’d share a particularly useful Massif invocation that Rob Sayre inspired me to concoct:
    valgrind \
      --smc-check=all --trace-children=yes \
      --tool=massif \
      --pages-as-heap=yes --detailed-freq=1000000 \
      --threshold=0.5 \
      --alloc-fn=mmap \
      --alloc-fn=syscall \
      --alloc-fn=pages_map \
      --alloc-fn=chunk_alloc \
      --alloc-fn=arena_run_alloc \
      --alloc-fn=arena_bin_malloc_hard \
      --alloc-fn=malloc \
      --alloc-fn=realloc \
      --alloc-fn='operator new(unsigned long)' \
      --alloc-fn=huge_malloc \
      --alloc-fn=posix_memalign \
      --alloc-fn=moz_xmalloc \
      --alloc-fn=JS_ArenaAllocate \
      --alloc-fn=PL_ArenaAllocate \
      --alloc-fn=NS_Alloc_P \
      --alloc-fn=NS_Realloc_P \
      --alloc-fn='XPConnectGCChunkAllocator::doAlloc()' \
      --alloc-fn='PickChunk(JSRuntime*)' \
      --alloc-fn='RefillFinalizableFreeList(JSContext*, unsigned int)' \
      --alloc-fn=sqlite3MemMalloc \
      --alloc-fn=mallocWithAlarm \
      --alloc-fn=sqlite3Malloc \
      <insert-firefox-command-here>
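Since most of that invocation is one repeated --alloc-fn flag, it can be easier to keep the allocator names in a list and generate the command line. The following is a minimal sketch, not part of the original workflow: the helper name massif_argv and the "./firefox" target are placeholders, the list is only a subset of the names above, and every flag is copied from the invocation above.

```python
import shlex

# A subset of the allocator-like functions named in the invocation above.
ALLOC_FNS = [
    "mmap",
    "malloc",
    "operator new(unsigned long)",
    "moz_xmalloc",
    "PickChunk(JSRuntime*)",
]


def massif_argv(target, alloc_fns, threshold=0.5):
    """Build an argv list for valgrind using only flags from the command above.

    `target` is the command to profile, given as a list of strings.
    """
    argv = [
        "valgrind",
        "--smc-check=all",
        "--trace-children=yes",
        "--tool=massif",
        "--pages-as-heap=yes",
        "--detailed-freq=1000000",
        f"--threshold={threshold}",
    ]
    # One --alloc-fn flag per allocator name.
    argv += [f"--alloc-fn={fn}" for fn in alloc_fns]
    argv += list(target)
    return argv


if __name__ == "__main__":
    # "./firefox" is a placeholder for the real Firefox command.
    # shlex.join quotes names containing spaces or parentheses for the shell.
    print(shlex.join(massif_argv(["./firefox"], ALLOC_FNS)))
```

Passing the list to subprocess.run directly, rather than going through a shell, avoids the quoting problem altogether for names such as 'operator new(unsigned long)'.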
Good grief! What a mess. Don’t blame Massif for this, though; it’s because Firefox has so many custom memory allocators.
    31.04% (366,878,720B) _dl_map_object_from_fd (dl-load.c:1195)
    15.73% (185,998,724B) in 3693 places, all below massif's threshold (00.00%)
    15.62% (184,639,488B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
    05.68% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
    04.35% (51,372,032B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
    03.30% (38,993,920B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7644)
    03.11% (36,741,120B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7643)
    02.87% (33,935,360B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
    02.84% (33,554,432B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
    02.79% (32,923,648B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7642)
    01.99% (23,555,684B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:213)
    01.69% (19,934,784B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) (jsutil.h:209)
    01.53% (18,067,456B) pcache1Alloc (sqlite3.c:33368)
    01.48% (17,457,388B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:206)
    01.31% (15,478,784B) g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
    00.89% (10,486,784B) JS_NewObject (jsgcinlines.h:127)
    00.71% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
    00.68% (8,093,696B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
    00.68% (8,024,064B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:495)
    00.67% (7,974,936B) js::Vector<unsigned short, 32ul, js::ContextAllocPolicy>::growStorageBy(unsigned long) (jsutil.h:217)
    00.53% (6,291,456B) js_CloneRegExpObject(JSContext*, JSObject*, JSObject*) (jsgcinlines.h:127)
    00.52% (6,190,836B) nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:68)
The total is 1,182,094,880 bytes.
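Each entry above has the same shape: a percentage, a byte count in parentheses, then the function and its location. A rough sketch of pulling those fields apart and cross-checking the percentages against the total might look like this. The regular expression and the name parse_entry are illustrative assumptions based only on the lines shown here, not on a full specification of Massif's output format.

```python
import re

# Percentage, byte count, then the rest of the line (function plus location).
ENTRY_RE = re.compile(r"^\s*(\d+\.\d+)% \(([\d,]+)B\) (.+)$")

# Total quoted in the post.
TOTAL_BYTES = 1_182_094_880


def parse_entry(line):
    """Return (percent, nbytes, where) for one entry, or None if it doesn't match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    percent = float(m.group(1))
    nbytes = int(m.group(2).replace(",", ""))
    return percent, nbytes, m.group(3)


# Three entries copied verbatim from the output above.
SAMPLE = [
    "31.04% (366,878,720B) _dl_map_object_from_fd (dl-load.c:1195)",
    "15.62% (184,639,488B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)",
    "05.68% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)",
]

if __name__ == "__main__":
    for line in SAMPLE:
        percent, nbytes, where = parse_entry(line)
        recomputed = 100 * nbytes / TOTAL_BYTES
        print(f"{percent:5.2f}% reported, {recomputed:5.2f}% recomputed: {where}")
```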
- 31.04% is from _dl_map_object_from_fd. This corresponds to code and data segments, mostly from libraries.
- 15.73% is from allocation points small enough that they fell below the threshold (0.5%) that I used for this run.
- 15.62% is from pthread_create, i.e. thread stacks. Hopefully most of this space also won’t be mapped in.
- 5.68% is from pa_shm_create_rw. Bug 617852 is open about this. It won’t be fixed until after Firefox 4.0, but that’s not so bad because /proc/pid/smaps tells me that hardly any of it is mapped into physical memory.
- That leaves 31.93% of big, heap-ish allocations. It’s pretty obvious that for this workload, the JS engine is being greedy, accounting for 26.42 percentage points of that 31.93%. One piece of good news is that the three js::InitJIT() entries, which together account for 9.2%, will be greatly improved by bug 623428; I’m hoping to reduce them by a factor of 10 or more.
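The arithmetic behind those figures can be reproduced from the Massif output. In the sketch below, the four non-heap shares are the ones discussed in the list above. Which entries count as "JS engine" is an assumption on my part: the post does not enumerate them, and the selection here was chosen because it adds up to the quoted 26.42%.

```python
# Shares (percent of total) that the list above sets aside as not heap-ish.
NON_HEAP = {
    "_dl_map_object_from_fd": 31.04,
    "below threshold": 15.73,
    "pthread_create": 15.62,
    "pa_shm_create_rw": 5.68,
}

# ASSUMPTION: these are the entries attributed to the JS engine.
JS_ENGINE = {
    "JSC::ExecutablePool::systemAlloc": 4.35,
    "js::InitJIT (jstracer.cpp:7644)": 3.30,
    "js::InitJIT (jstracer.cpp:7643)": 3.11,
    "js::PropertyTree::newShape": 2.87,
    "js_NewFunction": 2.84,
    "js::InitJIT (jstracer.cpp:7642)": 2.79,
    "js::mjit::Compiler::finishThisUp": 1.99,
    "JSScript::NewScript": 1.69,
    "JS_NewObject": 0.89,
    "js::StackSpace::init": 0.71,
    "NewOrRecycledNode": 0.68,
    "js::Vector::growStorageBy": 0.67,
    "js_CloneRegExpObject": 0.53,
}

# What is left once the four non-heap shares are subtracted.
heapish = 100.0 - sum(NON_HEAP.values())
# Total over the assumed JS engine entries.
js_total = sum(JS_ENGINE.values())
# The three js::InitJIT entries on their own.
init_jit = sum(v for k, v in JS_ENGINE.items() if k.startswith("js::InitJIT"))

if __name__ == "__main__":
    print(f"heap-ish remainder: {heapish:.2f}%")
    print(f"JS engine share:    {js_total:.2f}%")
    print(f"js::InitJIT share:  {init_jit:.2f}%")
```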
If anyone wants Massif’s full output, I’ll be happy to give it to them. The full output contains full stack traces, which can be useful.
- I’m still worried about our memory consumption, and I intend to keep pushing on it, both before Firefox 4.0 is released and afterwards.
- Massif takes a bit of getting used to, particularly when you are profiling a huge, messy program like Firefox. But it’s the only space profiler I know of that gives information detailed enough to be really useful in reducing memory consumption. Without it, we wouldn’t have made much progress on reducing Firefox 4.0’s space consumption. I’d love for other people to run it; it works on Linux and Mac (not Windows, unfortunately). I’m happy to help anyone who wants to try it, via IRC or email. For all the improvements made lately, I’ve only looked at a single workload on a single machine! There’s much more analysis to be done.
- If anyone knows of other decent memory profilers that can handle programs as complex as Firefox, I’d love to hear about them. In particular, note that if you only measure the heap (malloc et al) you’re only getting part of the story; this is again because we have multiple allocators which bypass malloc and use
- I wonder if we need better memory benchmarks. I’d like to have some that are as easy to run as, say, SunSpider. Better telemetry would also be great.
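The /proc/pid/smaps check mentioned earlier for pa_shm_create_rw, comparing how much address space a mapping reserves with how much of it is resident in physical memory, can be sketched as follows. This is a Linux-only illustration rather than the exact procedure used for this post: the function name is hypothetical, and the sample text uses invented addresses, sizes and path purely to show the file's layout (a header line per mapping followed by "Size:" and "Rss:" fields in kB).

```python
import re
from collections import defaultdict

# Mapping header: address range, perms, offset, device, inode, optional pathname.
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")
# Only the reserved size and the resident set size are of interest here.
FIELD_RE = re.compile(r"^(Size|Rss):\s+(\d+) kB$")


def mapped_vs_resident(smaps_text):
    """Return {pathname: [size_kb, rss_kb]}, summed over all mappings."""
    totals = defaultdict(lambda: [0, 0])
    name = None
    for line in smaps_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            name = header.group(1) or "[anonymous]"
            continue
        field = FIELD_RE.match(line)
        if field and name is not None:
            index = 0 if field.group(1) == "Size" else 1
            totals[name][index] += int(field.group(2))
    return dict(totals)


# INVENTED sample data, for illustration only.
SAMPLE = """\
7f0000000000-7f0004000000 rw-s 00000000 00:15 12345 /dev/shm/example
Size:              65536 kB
Rss:                  64 kB
7ffd10000000-7ffd10021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                   8 kB
"""

if __name__ == "__main__":
    for path, (size_kb, rss_kb) in mapped_vs_resident(SAMPLE).items():
        print(f"{path}: {size_kb} kB mapped, {rss_kb} kB resident")
```

For a live process, the text would come from reading /proc/<pid>/smaps, where <pid> is the process ID of the Firefox process being examined.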