Memory profiling Firefox on TechCrunch

Rob Sayre suggested TechCrunch to me as a good stress test for Firefox’s memory usage:

  {sayrer} take a look at if you have a chance. that one is brutal.

So I measured space usage with Massif for a single TechCrunch tab, on 64-bit Linux. Here’s the high-level result:

  34.65% (371,376,128B) _dl_map_object_from_fd (dl-load.c:1199)
  21.14% (226,603,008B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
  08.93% (95,748,276B) in 4043 places, all below massif's threshold (00.20%)
  06.26% (67,112,960B) pa_shm_create_rw (in /usr/lib/
  03.10% (33,263,616B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
  02.67% (28,618,752B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:670)
  01.90% (20,414,464B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
  01.57% (16,777,216B) GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) (nsCycleCollector.cpp:596)
  01.48% (15,841,208B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short, JSVersion) (jsutil.h:210)
  01.45% (15,504,752B) ChangeTable (pldhash.c:563)
  01.44% (15,478,784B) g_mapped_file_new (in /lib/
  01.41% (15,167,488B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
  01.37% (14,680,064B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
  00.97% (10,383,040B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:214)
  00.78% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
  00.69% (7,360,512B) pcache1Alloc (sqlite3.c:33491)
  00.62% (6,601,324B) PL_DHashTableInit (pldhash.c:268)
  00.59% (6,291,456B) js_NewStringCopyN(JSContext*, unsigned short const*, unsigned long) (jsgcinlines.h:127)
  00.59% (6,287,516B) nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:88)
  00.52% (5,589,468B) gfxImageSurface::gfxImageSurface(gfxIntSize const&, gfxASurface::gfxImageFormat) (gfxImageSurface.cpp:111)
  00.49% (5,292,184B) js::Vector::growStorageBy(unsigned long) (jsutil.h:218)
  00.49% (5,283,840B) nsHTTPCompressConv::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) (nsMemory.h:68)
  00.49% (5,255,168B) js::Parser::markFunArgs(JSFunctionBox*) (jsutil.h:210)
  00.49% (5,221,320B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:209)
  00.43% (4,558,848B) _dl_map_object_from_fd (dl-load.c:1250)
  00.42% (4,554,740B) nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:84)
  00.39% (4,194,304B) js_NewGCString(JSContext*) (jsgcinlines.h:127)
  00.39% (4,194,304B) js::NewCallObject(JSContext*, js::Bindings*, JSObject&, JSObject*) (jsgcinlines.h:127)
  00.39% (4,194,304B) js::NewNativeClassInstance(JSContext*, js::Class*, JSObject*, JSObject*) (jsgcinlines.h:127)
  00.39% (4,194,304B) JS_NewObject (jsgcinlines.h:127)
  00.35% (3,770,972B) js::PropertyTable::init(JSRuntime*, js::Shape*) (jsutil.h:214)
  00.35% (3,743,744B) NS_NewStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*, nsPresContext*) (nsPresContext.h:306)
  00.34% (3,621,704B) XPT_ArenaMalloc (xpt_arena.c:221)
  00.31% (3,346,532B) nsCSSSelectorList::AddSelector(unsigned short) (mozalloc.h:229)
  00.30% (3,227,648B) js::InitJIT(js::TraceMonitor*) (jsutil.h:210)
  00.30% (3,207,168B) js::InitJIT(js::TraceMonitor*) (jsutil.h:210)
  00.30% (3,166,208B) js_alloc_temp_space(void*, unsigned long) (jsatom.cpp:689)
  00.28% (2,987,548B) nsCSSExpandedDataBlock::Compress(nsCSSCompressedDataBlock**, nsCSSCompressedDataBlock**) (mozalloc.h:229)
  00.27% (2,883,584B) js::detail::HashTable::SetOps, js::SystemAllocPolicy>::add(js::detail::HashTable::SetOps, js::SystemAllocPolicy>::AddPtr&, unsigned long const&) (jsutil.h:210)
  00.26% (2,752,512B) FT_Stream_Open (in /usr/lib/
  00.24% (2,564,096B) PresShell::AllocateFrame(nsQueryFrame::FrameIID, unsigned long) (nsPresShell.cpp:2098)
  00.21% (2,236,416B) nsRecyclingAllocator::Malloc(unsigned long, int) (nsRecyclingAllocator.cpp:170)

Total memory usage at peak was 1,071,940,088 bytes. Let’s go through some of these entries one by one.

  34.65% (371,376,128B) _dl_map_object_from_fd (dl-load.c:1199)
  21.14% (226,603,008B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
  06.26% (67,112,960B) pa_shm_create_rw (in /usr/lib/

These three, although the biggest single entries, can be more or less ignored;  I explained why previously.
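
To put those three entries in perspective, here is a small arithmetic check. It uses only the byte counts and the peak total quoted above; the names `percent_of_peak`, `ignorable_bytes` and `remaining_bytes` are my own, chosen for illustration.

```python
# Byte counts copied from the Massif output above.
TOTAL_AT_PEAK = 1_071_940_088

# The three entries that can be more or less ignored.
ignorable = {
    "_dl_map_object_from_fd (dl-load.c:1199)": 371_376_128,
    "pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)": 226_603_008,
    "pa_shm_create_rw": 67_112_960,
}


def percent_of_peak(nbytes, total=TOTAL_AT_PEAK):
    """Return nbytes as a percentage of the peak total."""
    return 100.0 * nbytes / total


ignorable_bytes = sum(ignorable.values())
remaining_bytes = TOTAL_AT_PEAK - ignorable_bytes

for name, nbytes in ignorable.items():
    print(f"{percent_of_peak(nbytes):6.2f}%  {name}")
print(f"{percent_of_peak(ignorable_bytes):6.2f}%  (three entries combined)")
print(f"{remaining_bytes:,} bytes remain for everything else")
```

The three entries add up to 665,092,096 bytes, about 62% of the peak, which leaves roughly 407 MB for everything else in the list.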

  03.10% (33,263,616B) JSC::ExecutablePool::systemAlloc() (ExecutableAllocatorPosix.cpp:43)

This is for code generated by JaegerMonkey.  I know very little about JaegerMonkey’s code generation so I don’t have any good suggestions for reducing it.  As I understand it, very little effort has been made to minimize the size of the generated code, so there may well be some easy wins there.

  02.67% (28,618,752B) NewOrRecycledNode() (jsparse.cpp:670)

This is for JSParseNode, the basic type from which JS parse trees are constructed.  Bug 626932 is open to shrink JSParseNode;  there are a couple of good ideas but not much progress has been made.  I hope to do more here but probably not in time for Firefox 4.0.

  01.90% (20,414,464B) js::PropertyTree::newShape() (jspropertytree.cpp:97)

Shapes are a structure used to speed up JS property accesses.  Increasing the MAX_HEIGHT constant from 64 to 128 (which reduces the number of JS objects that are converted to “dictionary mode”, and thus the number of Shapes that are allocated) may reduce this by 3 or 4 MB with negligible speed cost.  I opened bug 630456 for this.
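
Here is a toy model of why a larger MAX_HEIGHT could mean fewer Shapes. It is not SpiderMonkey’s actual logic. It assumes every object adds the same properties in the same order, that objects within the limit share one Shape per property, and that an object exceeding the limit is converted to dictionary mode and gets a private Shape for each of its properties. The function name and the example workload are hypothetical.

```python
def shapes_allocated(property_counts, max_height):
    """Count Shapes in a toy model of shape sharing.

    property_counts: number of properties on each object (assumed to be
    added in the same order on every object, so lineages are shared).
    max_height: lineage length beyond which an object is converted to
    dictionary mode (a stand-in for MAX_HEIGHT).
    """
    shared_depth = 0   # Shapes in the single shared lineage
    private = 0        # Shapes owned by dictionary-mode objects
    for n in property_counts:
        if n > max_height:
            # Assumption: the shared lineage was built up to the limit
            # before conversion, then the object copies all n Shapes.
            shared_depth = max(shared_depth, max_height)
            private += n
        else:
            shared_depth = max(shared_depth, n)
    return shared_depth + private


# Hypothetical workload: ten objects with 100 properties each.
objects = [100] * 10
print(shapes_allocated(objects, 64))    # limit exceeded: private copies
print(shapes_allocated(objects, 128))   # all objects share one lineage
```

In this model the ten objects need 1,064 Shapes with a limit of 64 but only 100 with a limit of 128. Real pages will have far messier property patterns, so the actual saving has to be measured.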

  01.57% (16,777,216B) GCGraphBuilder::AddNode() (nsCycleCollector.cpp:596)
  01.41% (15,167,488B) GCGraphBuilder::NoteScriptChild() (mozalloc.h:229)

This is the cycle collector.  I know almost nothing about it, but I see it allocates 32,768 PtrInfo structs at a time.  I wonder if that strategy could be improved.
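
To see why the block size matters, here is a rough model of allocating structs in fixed-size blocks. The 32,768 figure comes from the paragraph above. The 32-byte struct size in the example is an assumption made purely for illustration (I have not checked the real size of PtrInfo), and `block_pool_cost` is a hypothetical helper, not anything in the cycle collector.

```python
BLOCK_LEN = 32_768  # PtrInfo structs allocated at a time, as noted above


def block_pool_cost(num_nodes, struct_size, block_len=BLOCK_LEN):
    """Return (blocks, bytes_allocated, bytes_unused) for a block pool.

    Memory is requested one whole block at a time, so the last block is
    usually only partly filled.
    """
    blocks = (num_nodes + block_len - 1) // block_len  # integer ceiling
    allocated = blocks * block_len * struct_size
    unused = allocated - num_nodes * struct_size
    return blocks, allocated, unused


# Assumed struct size of 32 bytes; one node more than fits in one block.
blocks, allocated, unused = block_pool_cost(32_769, struct_size=32)
print(blocks, allocated, unused)
```

Under that assumed size, a graph with one node more than a block holds pays for a second full block, almost all of which sits unused. Smaller blocks, or blocks that grow gradually, would trade that slack against more frequent allocations.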

  01.48% (15,841,208B) JSScript::NewScript() (jsutil.h:210)
  01.37% (14,680,064B) js_NewFunction() (jsgcinlines.h:127)

Each JS function has a JSFunction associated with it, and each JSFunction has a JSScript associated with it.  Each of them stores various bits of information about the function.  I don’t have any good ideas for how to shrink these structures.  Both of them are reasonably large, with lots of fields.

  00.97% (10,383,040B) js::mjit::Compiler::finishThisUp() (jsutil.h:214)

Each function compiled by JaegerMonkey also has some additional information associated with it, including all the inline caches.  This is allocated here.  Some good progress has already been made here, and I have some more ideas for getting it down a bit further.

  00.59% (6,291,456B) js_NewStringCopyN() (jsgcinlines.h:127)
  00.49% (5,292,184B) js::Vector::growStorageBy() (jsutil.h:218)

These entries are for space used during JS scanning (a.k.a. lexing, tokenizing).  Identifiers and strings get atomized, i.e. put into a table so there’s a single copy of each one.  Take an identifier as an example.  It starts off stored in a buffer of characters.  It gets scanned and copied into a js::Vector, with any escaped chars being converted along the way.  Then the copy in the js::Vector is atomized, which involves copying it again into a malloc’d buffer of just the right size.  I thought about avoiding this copying in bug 588648, but it turned out to be difficult.  (I did manage to remove another extra copy of every character, though!)
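
The scan-then-atomize sequence described above can be sketched roughly as follows. This is a simplified illustration in Python rather than the engine’s C++: the list stands in for the js::Vector, the dictionary stands in for the atom table, and `scan_identifier` and `AtomTable` are names I made up. It only handles well-formed `\uXXXX` escapes.

```python
class AtomTable:
    """Keeps a single shared copy of each distinct string."""

    def __init__(self):
        self._atoms = {}

    def atomize(self, chars):
        # Second copy: the scanned characters are copied into a new
        # string of exactly the right size before being looked up.
        text = "".join(chars)
        # Return the existing atom if there is one, else store this one.
        return self._atoms.setdefault(text, text)


def scan_identifier(source, start):
    """Copy one identifier out of source, decoding \\uXXXX escapes.

    Returns (buffer, end_index). The buffer is the first copy.
    """
    buf = []
    i = start
    while i < len(source):
        if source.startswith("\\u", i) and i + 6 <= len(source):
            # Escaped char: convert the four hex digits along the way.
            buf.append(chr(int(source[i + 2:i + 6], 16)))
            i += 6
        elif source[i].isalnum() or source[i] in "_$":
            buf.append(source[i])
            i += 1
        else:
            break
    return buf, i


atoms = AtomTable()
plain, _ = scan_identifier("foo = 1", 0)
escaped, _ = scan_identifier("f\\u006fo;", 0)
# Both spellings end up as the same atom object.
print(atoms.atomize(plain) is atoms.atomize(escaped))
```

Every identifier is copied twice here, once into the buffer and once into the exact-size string, which mirrors the two copies described above.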

In summary, there is definitely room for more improvement.  I hope to get a few more space optimizations in before Firefox 4.0 is released, but there’ll be plenty of other work to do afterwards.  If anyone can see other easy wins for the entries above, I’d love to hear about them.