Categories
about:memory AdBlock Plus add-ons compartments DMD Firefox Garbage Collection JägerMonkey Massif Memory allocation Memory consumption MemShrink Performance SQLite Talks Valgrind

Notes on Reducing Firefox’s Memory Consumption

I gave a talk yesterday at the linux.conf.au Browser MiniConf, held in Ballarat, Australia.  Its title was “Notes On Reducing Firefox’s Memory Consumption”.

Below are the slides and notes in a SlideShare embedding. If you find that embedding problematic (some people do) you may prefer to download the PDF version directly.

Categories
JägerMonkey MemShrink Tracemonkey

MemShrink progress report, week 23

The only significant MemShrink-related change that landed this week was that David Anderson removed TraceMonkey, the tracing JIT compiler.  In fact, TraceMonkey was disabled a while ago, so the effects on code size and memory consumption of its removal have been felt since then.  But it feels more real now that the source code is gone (all 67,000 lines of it!), so I figure it’s worth mentioning.  (BTW, many thanks to Ryan VanderMeulen who has been going through Bugzilla, closing many old TraceMonkey-related bugs that are no longer relevant.)

People have asked why TraceMonkey isn’t needed any more.  In my opinion, tracing compilation can be a good strategy for certain kinds of code, such as very tight, non-branchy loops.  But it tends to do badly on other kinds of code.  Before JaegerMonkey, JS code in Firefox ran in one of two modes: interpreted (super slow), or trace-compiled (usually fast).  This kind of bimodal performance is bad, because you lose more when slow than you gain when fast.  Also, because tracing was the only way to make code fast, huge amounts of effort were put into tracing code that shouldn’t really be traced, which made TraceMonkey really complicated.

Once JaegerMonkey was added, the performance was still bimodal, but in a better way:  method-compiled (fairly fast) or trace-compiled (usually fast).  But the heuristics to switch between the two modes were quite hairy.  Then type inference was added to JaegerMonkey, which made it faster on average than JaegerMonkey+TraceMonkey.  Combine that with the fact that TraceMonkey was actively getting in the way of various additional JaegerMonkey and type inference improvements, and it was clear it was time for TraceMonkey to go.

It might sound like there’s been a lot of wasted effort with all these different JITs.  There’s some truth to that.  But JavaScript is a difficult language to compile well, and people have only been writing JITs for it for a few years, which isn’t long when it comes to compilers.  Each new JIT has taught the JS team about ideas that do and don’t work, and those lessons have been incorporated into the next, better JIT.  That’s why IonMonkey is now being developed — because JaegerMonkey with type inference still has a number of shortcomings that can’t be remedied incrementally.

In fact, it’s possible that IonMonkey might end up one day with a trace compiler for really hot, tight loops.  If it does, this trace compiler would be much simpler than TraceMonkey because it would only target code that trace-compiles easily;  trace compilation would be the icing on the cake, not the whole cake.

Enough about JITs.  Time for this week’s MemShrink bug counts.

  • P1: 31 (-0/+2)
  • P2: 132 (-3/+8)
  • P3: 60 (-0/+2)
  • Unprioritized: 4 (-0/+4)

Not a great deal of movement there.  The quietness is at least partly explained by the fact that Thanksgiving is happening in the US this week.  Next week will probably be quieter than usual for the same reason.

Categories
Firefox JägerMonkey Memory consumption MemShrink Tracemonkey Uncategorized

MemShrink progress, week 20

Surprise of the week

[Update: This analysis about livemarks may be wrong.  Talos results from early September show that the MaxHeaps number increased, and the reduction when the “Latest Headlines” livemark was removed has undone that increase.  So livemarks may not be responsible at all, it could just be a coincidence.  More investigation is needed!]

Jeff Muizelaar removed the “Latest Headlines” live bookmark from new profiles.  This was in the Bookmarks Toolbar, and so hasn’t been visible since Firefox 4.0, and most people don’t use it.  And it uses a non-zero amount of memory and CPU.  Just how non-zero was unclear until Marco Bonardo noticed some big performance improvements.  First, in the Talos “MaxHeap” results on WinNT5.2:

Talos MaxHeap graph

And secondly in the Talos “Allocs” results on WinNT5.2 and Mac10.5.2:

Talos Allocs graph

In the WinNT5.2 cases, it looks like we had a bi-modal distribution previously, and this patch changed things so that the higher of the two cases never occurred.  In the Mac10.5.2 case we just had a simple reduction in the number of allocations.  On Linux the results were less conclusive, but there may have been a similar if smaller effect.

This surprised me greatly.  I’ve done a lot of memory profiling of Firefox and never seen anything that pointed to the feed reader as using a lot of memory.  This may be because the feed reader’s usage is falling into a larger, innocuous bucket, such as JS or generic strings.  Or maybe I just missed the signs altogether.

Some conclusions and questions:

  • If you have live bookmarks in your bookmarks toolbar, remove them! [Update: I meant to say “unused live bookmarks”.]
  • We need to work out what is going on with the feed reader, and optimize its memory usage.
  • Can we disable unused live bookmarks for existing users?

Apparently nobody really owns the feed reader, because previous contributors to it have all moved on.  So I’m planning to investigate, but I don’t know the first thing about it.  Any help would be welcome!

 Other stuff

There was a huge memory leak in the Video DownloadHelper add-on v4.9.5, and possibly earlier versions.  This has been fixed in v4.9.6a3 and the fix will make it into the final version v4.9.6 when it is released.  That’s one more add-on leak down, I wonder how many more there are to go.

TraceMonkey, the trace JIT, is no longer built by default.  This means it’s no longer used, and this saves both code and data space.  The old combination of TraceMonkey and JaegerMonkey is slower than the new combination of JaegerMonkey with type inference, and TraceMonkey is also preventing various clean-ups and simplifications, so it’ll be removed entirely soon.

I refined the JS memory reporters in about:memory to give more information about objects, shapes and property tables.

I avoided creating small property tables, removed KidHashes when possible, and reduced the size of KidHashes with few entries.

I wrote about various upcoming memory optimizations in the JS engine.

Justin Lebar enabled jemalloc on MacOS 10.5 builds.  This was expected to be a space improvement, but it also reduced the “Tp5 MozAfterPaint” page loading benchmark by 8%.

Robert O’Callahan avoided excessive memory usage in certain DOM animations on Windows.

Drew Willcoxon avoided excessive memory usage in context menu items created by the add-on SDK.

Bug Counts

  • P1: 35 (-1, +1)
  • P2: 116 (-2, +5)
  • P3: 55 (-2, +3)
  • Unprioritized: 5 (-4, +5)

At this week’s MemShrink meeting we only had 9 bugs to triage, which is the lowest we’ve had in a long time.  It feels like the MemShrink bug list is growing slower than in the past.

Categories
about:memory Firefox Garbage Collection JägerMonkey Memory consumption MemShrink Tracemonkey

SpiderMonkey is on a diet

One thing I’ve learnt while working for Mozilla is that a web browser can be characterized as a JavaScript execution environment that happens to have some multimedia capabilities.  In particular, if you look at Firefox’s about:memory page, the JS engine is very often the component responsible for consuming the most memory.

Consider the following snapshot from about:memory of the memory used by a single JavaScript compartment.

about:memory snapshot

(For those of you who have looked at about:memory before, some of those entries may look unfamiliar, because I landed a patch to refine the JS memory reporters late last week.)

There is work underway to reduce many of the entries in that snapshot.  SpiderMonkey is on a diet.

Objects

Objects are the primary data structure used in JS programs;  after all, it is an object-oriented language.  Inside SpiderMonkey, each object is represented by a JSObject, which holds basic information, and possibly a slots array, which holds the object’s properties. The memory consumption for all JSObjects is measured by the “gc-heap/objects/non-function” and “gc-heap/objects/function” entries in about:memory, and the slots arrays are measured by the “object-slots” entries.

The size of a non-function JSObject is currently 40 bytes on 32-bit platforms and 72 bytes on 64-bit platforms.  Brian Hackett is working to reduce that to 16 bytes and 32 bytes respectively. Function JSObjects are a little larger, being (internally) a sub-class of JSObject called JSFunction.  JSFunctions will therefore benefit from the shrinking of JSObject, and Brian is slimming down the function-specific parts as well.  In fact, these changes are complete in the JaegerMonkey repository, and will likely be merged into mozilla-central early in the Firefox 11 development period.

As for the slots arrays, they are currently arrays of “fatvals” A fatval is a 64-bit internal representation that can hold any JS value — number, object, string, whatever.  (See here for details, scroll down to “Mozilla’s New JavaScript Value Representation”;  the original blog entry is apparently no longer available).  64-bits per entry is overkill if you know, for example, that you have an array full entirely of integers that could fit into 32 bits.  Luke Wagner and Brian Hackett have been discussing a specialized representation to take advantage of such cases.  Variations on this idea have been tried twice before and failed, but perhaps SpiderMonkey’s new type inference support will provide the right infrastructure for it to happen.

Shapes

There are a number of data structures within SpiderMonkey dedicated to making object property accesses fast.  The most important of these are Shapes.  Each Shape corresponds to a particular property that is present in one or more JS objects.  Furthermore, Shapes are linked into linear sequences called “shape lineages”, which describe object layouts.  Some shape lineages are shared and live in “property trees”.  Other shape lineages are unshared and belong to a single JS object;  these are “in dictionary mode”.

The “shapes/tree” and “shapes/dict” entries in about:memory measure the memory consumption for all Shapes.  Shapes of both kinds are the same size;  currently they are 40 bytes on 32-bit platforms and 64 bytes on 64-bit platforms.  But Brian Hackett has also been taking a hatchet to Shape, reducing them to 24 bytes and 40 bytes respectively.  This has required the creation of a new auxiliary BaseShape type, but there should be many fewer BaseShapes than there are Shapes.  This change will also increase the number of Shapes, but should result in a space saving overall.

SpiderMonkey often has to search shape lineages, and for lineages that are hot it creates an auxiliary hash table, called a “property table”, that makes lookups faster.  The “shapes-extra/tree-tables” and “shapes-extra/dict-tables” entries in about:memory measure these tables.  Last Friday I landed a patch that avoids building these tables if they only have a few items in them;  in that case a linear search is just as good.  This reduced the amount of memory consumed by property tables by about 20%.

I mentioned that many Shapes are in property trees.  These are N-ary trees, but most Shapes in them have zero or one child;  only a small fraction have more than that, but the maximum N can be hundreds or even thousands.  So there’s a long-standing space optimization where each shape contains (via a union) a single Shape pointer which is used if it has zero or one child.  But if the number of children increases to 2 or more, this is changed into a pointer to a hash table, which contains pointers to the N children.  Until recently, if a Shape had a child deleted and that reduced the number of children from 2 to 1, it wouldn’t be converted from the hash form back to the single-pointer.  I changed this last Friday.  I also reduced the minimum size of these hash tables from 16 to 4, which saves a lot of space because most of them only have 2 or 3 entries.  These two changes together reduced the size of the “shapes-extra/tree-shape-kids” entry in about:memory by roughly 30–50%.

Scripts

Internally, a JSScript represents (more or less) the code of a JS function, including things like the internal bytecode that SpiderMonkey generates for it.  The memory used by JSScripts is measured by the “gc-heap/scripts” and “script-data” entries in about:memory.

Luke Wagner did some measurements recently that showed that most (70–80%) JSScripts created in the browser are never run.  In hindsight, this isn’t so surprising — many websites load libraries like jQuery but only use a fraction of the functions in those libraries.  It wouldn’t be easy, but if SpiderMonkey could be changed to generate bytecode for scripts lazily, it could reduce “script-data” memory usage by 60–70%, as well as shaving non-trivial amounts of time when rendering pages.

Trace JIT

TraceMonkey is SpiderMonkey’s original JIT compiler, which was introduced in Firefox 3.5.  Its memory consumption is measured by the “tjit-*” entries in about:memory.

With the improvements that type inference made to JaegerMonkey, TraceMonkey simply isn’t needed any more.  Furthermore, it’s a big hairball that few if any JS team members will be sad to say goodbye to.  (js/src/jstracer.cpp alone is over 17,000 lines and over half a megabyte of code!)

TraceMonkey was turned off for web content JS code when type inference landed.  And then it was turned off for chrome code.  And now it is not even built by default.  (The about:memory snapshot above was from a build just before it was turned off.)  And it will be removed entirely early in the Firefox 11 development period.

As well as saving memory for trace JIT code and data (including the wasteful ballast hack required to avoid OOM crashes in Nanojit, ugh), removing all that code will significantly shrink the size of Firefox’s code.  David Anderson told me the binary of the standalone JS shell is about 0.5MB smaller with the trace JIT removed.

Method JIT

JaegerMonkey is SpiderMonkey’s second JIT compiler, which was introduced in Firefox 4.0.  Its memory consumption is measured by the “mjit-code/*” and “mjit-data” entries in about:memory.

JaegerMonkey generates a lot of code.  This situation will hopefully improve with the introduction of IonMonkey, which is SpiderMonkey’s third JIT compiler.  IonMonkey is still in early development and won’t be integrated for some time, but it should generate code that is not only much faster, but much smaller.

GC HEAP

There is a great deal of work being done on the JS garbage collector, by Bill McCloskey, Chris Leary, Terrence Cole, and others.  I’ll just point out two long-term goals that should reduce memory consumption significantly.

First, the JS heap currently has a great deal of wasted space due to fragmentation, i.e. intermingling of used and unused memory.  Once moving GC — i.e. the ability to move things on the heap — is implemented, it will pave the way for a compacting GC, which is one that can move live things that are intermingled with unused memory into contiguous chunks of memory.  This is a challenging goal, especially given Firefox’s high level of interaction between JS and C++ code (because moving C++ objects is not feasible), but one that could result in very large savings, greatly reducing the “gc-heap/arena/unused” and “gc-heap-chunk-*-unused” measurements in about:memory.

Second, a moving GC is a prerequisite for a generational GC, which allocates new things in a small chunk of memory called a “nursery”.  The nursery is garbage-collected frequently (this is cheap because it’s small), and objects in the nursery that survive a collection are promoted to a “tenured generation”.  Generational GC is a win because in practice the majority of things allocated die quickly and are not promoted to the tenured generation.  This means the heap will grow more slowly.

Is that all?

It’s all I can think of right now.  If I’ve missed anything, please add details in the comments.

There’s an incredible amount of work being done on SpiderMonkey at the moment, and a lot of it will help reduce Firefox’s memory consumption.  I can’t wait to see what SpiderMonkey looks like in 6 months!

Categories
about:memory Firefox JägerMonkey Memory consumption MemShrink SQLite

MemShrink progress, week 11

This week was quiet in terms of patches landed.

  • Marco Bonardo changed the way the places.sqlite database is handled. I’m reluctant to describe the change in much detail because I’ll probably get something wrong, and Marco told me he’s planning to write a blog post about it soon.  So I’ll just quote from the bug: “Globally on my system (8GBs) I’ve often seen places.sqlite cache going over 100MB, with the patch I plan to force a maximum of 60MB (remember this will vary based on hardware specs), that is a >40% improvement. We may further reduce in future but better being on the safe side for now.”  This was a MemShrink:P1 bug.
  • New contributor Sander van Veen knocked off another bug (with help from his friend Bas Weelinck) when he added more detail to the “mjit-code” entries in about:memory.  This makes it clear how much of JaegerMonkey’s code memory usage is for normal methods vs. memory for compiled regular expressions.
  • I rearranged nsCSSCompressedDataBlock to avoid some unnecessary padding on 64-bit platforms.  This can save a megabyte or two if you have several CSS-heavy (e.g. Gmail) tabs open.   It makes no difference on 32-bit platforms.

But it was a very busy week in terms of bug activity.  Let’s look at the numbers.

  • P1: 29 (-2, +2)
  • P2: 76 (-10, +20)
  • P3: 38 (-1, +2)
  • Unprioritized: 22 (-5, +23)

Several things happened here.

  • Marco Castelluccio looked through old bugs and found a lot (30 or more) that were related to memory usage and tagged them with “MemShrink”.
  • Nine new bugs were filed to reduce about:memory’s “heap-unclassified” number by adding memory reporters;  many of these were thanks to Boris Zbarsky’s insights into the output produced by DMD.
  • I closed out a number of bugs that were incomplete, stale, or finished;  this included some of those newly marked by Marco, and some ones that were already tagged with “MemShrink”.
  • I tagged five leaks that were found with the cppcheck static analysis tool.

We spent the entire MemShrink meeting today triaging unprioritized bugs and we got through 23 of them.  Of the remaining unprioritized bugs, the older ones tagged by Marco and the cppcheck ones (which I tagged after the meeting) constitute most of them.

It’s clear that the rate of problem/improvement identification is outstripping the rate of fixes.  We have a standing agenda item in MemShrink meetings to go through Steve Fink’s ideas list, but we haven’t touched it in the past two meetings because we’ve spent the entire time on triage.  And when we do go through that list, it will only result in more bugs being filed.  I’m hoping that this glut of MemShrink-tagged bugs is temporary and the new bug rate will slow again in the coming weeks.

In the meantime, if you want to help, please look through the lists of open bugs, or contact me if you aren’t sure where to start, and I’ll do my best to find something you can work on.  Thanks!

Categories
about:memory JägerMonkey Memory consumption Tracemonkey

You make what you measure

Inspired by Shaver’s patch, I implemented some per-compartment stats in about:memory.  I then visited TechCrunch because I know it stresses the JS engine.  Wow, there are over 70 compartments!  There are 20 stories on the front page.  Every story has a Facebook “like” button, a Facebook “send” button, and a Google “+1” button, and every button gets its own compartment.

That sounds like a bug, but it’s probably not.  Nonetheless, every one of those buttons had an entry in about:memory that looked like this:

Old compartment measurements

(The ‘object-slots’ and ‘scripts’ and ‘string-chars’ measurements are also new, courtesy of bug 571249.)

Ugh, 255,099 bytes for a compartment that has only 971 bytes (i.e. not much) of scripts?  Even worse, this is actually an underestimate because there is another 68KB of tjit-data memory that isn’t being measured for each compartment.  That gives a total of about 323KB per compartment.  And it turns out that no JIT compilation is happening for these compartments, so all that tjit-data and mjit-code space is allocated but unused.

Fortunately, it’s not hard to avoid this wasted space, by lazily initializing each compartment’s TraceMonitor and JaegerCompartment.  With that done, the entry in about:memory looks like this:

New compartment measurements

That’s an easy memory saving of over 20MB for a single webpage.  The per-compartment memory reporters haven’t landed yet, and may not even land in their current form, but they’ve already demonstrated their worth.  You make what you measure.

Categories
JägerMonkey Memory consumption

The JavaScript interpreter isn’t dead yet

During most of the development of JaegerMonkey, the JavaScript engine was configured so that JaegerMonkey compiled all executed JavaScript code.  Which meant that the old JavaScript interpreter was unused (with the exception that it’s still used during the recording phase during trace compilation).

However, we eventually discovered that lots of JavaScript functions are only run a small number of times.  Compiling such functions is a bad idea — compilation is slow, and the generated code can be quite large.  Bug 631951 changed things so that a function is only compiled once it has run 16 times, or any loop within it has run 16 times.

This didn’t end up making much speed difference, but it reduced the amount of code generated by JaegerMonkey by a lot;  I saw numbers ranging from 2.5x to 6.5x on different workloads.  For techcrunch.com, which is JS-intensive, this translates into roughly a 30MB saving on a 64-bit machine.

And the interpreter lives another day.

Categories
Firefox JägerMonkey Massif Memory consumption Valgrind

Memory profiling Firefox with Massif, part 2

To follow up from this post: we’ve made some good progress on reducing JaegerMonkey’s memory consumption in Firefox 4, though there’s still a way to go.  Julian Seward will blog about this shortly.  In the meantime, I thought I’d share a particularly useful Massif invocation that Rob Sayre inspired me to concoct:

  valgrind \
  --smc-check=all --trace-children=yes \
  --tool=massif \
  --pages-as-heap=yes --detailed-freq=1000000 \
  --threshold=0.5 \
  --alloc-fn=mmap \
  --alloc-fn=syscall \
  --alloc-fn=pages_map \
  --alloc-fn=chunk_alloc \
  --alloc-fn=arena_run_alloc \
  --alloc-fn=arena_bin_malloc_hard \
  --alloc-fn=malloc \
  --alloc-fn=realloc \
  --alloc-fn='operator new(unsigned long)' \
  --alloc-fn=huge_malloc \
  --alloc-fn=posix_memalign \
  --alloc-fn=moz_xmalloc \
  --alloc-fn=JS_ArenaAllocate \
  --alloc-fn=PL_ArenaAllocate \
  --alloc-fn=NS_Alloc_P \
  --alloc-fn=NS_Realloc_P \
  --alloc-fn='XPConnectGCChunkAllocator::doAlloc()' \
  --alloc-fn='PickChunk(JSRuntime*)' \
  --alloc-fn='RefillFinalizableFreeList(JSContext*, unsigned int)' \
  --alloc-fn=sqlite3MemMalloc \
  --alloc-fn=mallocWithAlarm \
  --alloc-fn=sqlite3Malloc \
  <insert-firefox-command-here>

Good grief!  What a mess.  Don’t blame Massif for this, though;  it’s because Firefox has so many custom memory allocators.

With that invocation, the output of ms_print becomes something that is comprehensible to people other than Massif’s author 🙂  Here’s an extraction of the output which gives a high-level view of Firefox’s memory consumption on 64-bit Linux after loading 20 tabs, each with a random comic from http://www.cad-comic.com/cad/, which is a JavaScript-heavy site:

31.04% (366,878,720B) _dl_map_object_from_fd (dl-load.c:1195)
15.73% (185,998,724B) in 3693 places, all below massif's threshold (00.00%)
15.62% (184,639,488B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
05.68% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
04.35% (51,372,032B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
03.30% (38,993,920B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7644)
03.11% (36,741,120B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7643)
02.87% (33,935,360B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
02.84% (33,554,432B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
02.79% (32,923,648B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7642)
01.99% (23,555,684B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:213)
01.69% (19,934,784B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) (jsutil.h:209)
01.53% (18,067,456B) pcache1Alloc (sqlite3.c:33368)
01.48% (17,457,388B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:206)
01.31% (15,478,784B) g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
00.89% (10,486,784B) JS_NewObject (jsgcinlines.h:127)
00.71% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
00.68% (8,093,696B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
00.68% (8,024,064B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:495)
00.67% (7,974,936B) js::Vector<unsigned short, 32ul, js::ContextAllocPolicy>::growStorageBy(unsigned long) (jsutil.h:217)
00.53% (6,291,456B) js_CloneRegExpObject(JSContext*, JSObject*, JSObject*) (jsgcinlines.h:127)
00.52% (6,190,836B) nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:68)

The total is 1,182,094,880 bytes.

  • 31.04% is from _dl_map_object_from_fd.  This corresponds to code and data segments, mostly from libraries.
  • 15.73% is from allocation points small enough that they fell below the threshold (0.5%) that I used for this run.
  • 15.62% is from pthread_create, i.e. thread stacks.  Hopefully most of this space also won’t be mapped in.
  • 5.68% is from pa_shm_create_rwBug 617852 is open about this.  It won’t be fixed until after Firefox 4.0, but that’s not so bad because /proc/pid/smaps tells me that hardly any of it is mapped into physical memory.
  • That leaves 31.93% of big, heap-ish allocations.  It’s pretty obvious that for this workload, the JS engine is being greedy, accounting for 26.42% of that 31.83%.  One piece of good news is that the three js::InitJIT() entries, which together account for 9.2%, will be greatly improved by bug 623428;  I’m hoping to reduce them by a factor of 10 or more.

If anyone wants Massif’s full output, I’ll be happy to give it to them.  The full output contains full stack traces, which can be useful.

Some conclusions.

  • I’m still worred about our memory consumption, and I intend to keep pushing on it, both before Firefox 4.0 is released and afterwards.
  • Massif takes a bit of getting used to, particularly when you are profiling a huge, messy program like Firefox.  But it’s the only space profiler I know of that gives information that is detailed enough to be really useful in reducing memory consumption.  Without it, we wouldn’t have made much progress on reducing Firefox 4.0’s space consumption.  I’d love for other people to run it, it works on Linux and Mac (not Windows, unfortunately).  I’m happy to help anyone who wants to try it via IRC or email.  For all the improvements done lately, I’ve only looked at a single workload on a single machine!  There’s much more analysis to be done.
  • If anyone knows of other decent memory profilers that can handle programs as complex as Firefox, I’d love to hear about it.  In particular, note that if you only measure the heap (malloc et al) you’re only getting part of the story;  this is again because we have multiple allocators which bypass malloc and use mmap/VirtualAlloc directly.
  • I wonder if we need better memory benchmarks.  I’d like to have some that are as easy to run as, say, SunSpider.  Better telemetry would also be great.
Categories
Firefox JägerMonkey Performance Tracemonkey

Multi-faceted JavaScript speed improvements

Firefox 4.0 beta 7’s release announcement was accompanied by the following graphs that show great improvements in JavaScript speed:

Fx4b7 JavaScript speed-ups

Impressive!  The graphs claim speed-ups of 3x, 3x and 5x;  by my calculations the more precise numbers are 3.49x, 2.94x and 5.24x.

The Sunspider and V8bench results are no surprise to anyone who knows about JägerMonkey and has been following AWFY, but the excellent Kraken results really surprised me.  Why?

  • Sunspider and V8bench have been around for ages.  They are the benchmarks most commonly used (for better or worse) to gauge JavaScript performance and so they have been the major drivers of performance improvements.  To put it more bluntly, like all the other browser vendors, we tune for these benchmarks a lot. In contrast, Kraken was only released on September 14th, and so we’ve done very little tuning for it yet.
  • Unlike Sunspider and V8bench, Kraken contains a lot of computationally intensive code such as image and audio processing. These benchmarks are dominated by tight loops containing numerous array accesses.  As a result, they trace really well, and so even 4b7 spends most of its Kraken time (I’d estimate 90%+) in code generated by TraceMonkey, the trace JIT.

We can draw two happy conclusions from Kraken’s improvement.

  • Our speed-ups apply widely, not just to Sunspider and V8bench.
  • Our future performance eggs are not all in one basket: the JavaScript team has made and will continue to make great improvements to the non-JägerMonkey parts of the JavaScript engine.

Firefox 4.0 is going to be great release!