Category Archives: Firefox

MemShrink’s 3rd birthday

June 14, 2014 was the third anniversary of the first MemShrink meeting. MemShrink is a mature effort at this point, and many of the problems that motivated its creation have been fixed. Nonetheless, there are still some areas for improvement. So, as I did at this time last year, I’ll take the opportunity to update the “big ticket items” list.

The Old Big Ticket Items List

#5: pdf.js

pdf.js is the PDF viewer that now ships by default in Firefox. I greatly reduced its memory usage in two rounds, first in February, and again in June. The first round of improvements were released in Firefox 29, and the second round is due to be released in Firefox 33, which should be out in mid-October.

pdf.js will still use more memory than native PDF viewers, but the situation has improved enough that it can be removed from this list.

#4: Dev tools

See below.

#3: B2G Nuwa

Cervantes Yu, Thinker Li, and others succeeded in landing Nuwa, an impressive technical achievement that increased the amount of memory sharing between different processes on Firefox OS. This allows many more apps to run at once. I think this is present in Firefox OS 1.3 and later.

#2: Compacting Generational GC

See below.

#1: Better Foreground Tab Image Handling

Timothy Nikkel completely fixed the massive spikes in decoded image data that we used to get on image-heavy pages. This was a fantastic improvement that shipped in Firefox 26.

The New Big Ticket Items List

In a break from tradition, I will present the items in alphabetical order, rather than ranking them. This reflects the fact that I don’t have a clear opinion on the relevant importance of these different items. (I view this as a good thing, because it means that I feel there aren’t any hair-on-fire problems.)

Better regression detection

areweslimyet.com (a.k.a. AWSY) is our best tool for detecting memory usage regressions. It currently tracks Firefox and Firefox for Android, and there are plans to integrate Firefox OS.

Unfortunately, although AWSY has been successful at detecting large regressions, leading to them being fixed, its measurements are noisy enough that detecting smaller regressions is difficult. As a result, the general trend of the graphs has been upward. I do not think this is as bad as it looks at first glance, because Firefox’s behaviour in the worst cases — which matter more than the average case — is much better than it used to be. Nonetheless, improved sensitivity here would be an excellent thing, and Eric Rahm is actively working on this.

Developer tools

Developer-oriented memory profiling tools are under active development by Nick Fitzgerald, Jim Blandy, and others. A lot of the necessary profiling infrastructure is in place, and this seems to be progressing well, though I don’t know when it’s expected to be finished.

By the way, if you haven’t tried Firefox’s dev tools recently, you should! They have improved an incredible amount in the past year or so.

GC Arena Fragmentation

Generational GC landed a few months ago. I had been hoping that this would help reduce GC fragmentation, but the effect was small. It looks like compacting GC is what’s needed to make a big difference here, though some smaller tweaks may help things a little.

Tarako

Tarako is the codename for the “$25 phone” running Firefox OS that will ship in India and other countries later this year. It has a paltry 128 MiB of RAM, so memory usage is critical. Since the hardware first became available to Mozilla employees at the start of the year, Tarako has improved from “almost unusable” to “not bad”. But apps still close due to OOMs fairly frequently, especially when a user does something that involves two apps working together in some way.

There is no single change that will decisively fix this problem for Tarako;  further improvements will require an ongoing grind of work from many engineers. Improvements made for Tarako are also likely to benefit higher-end Firefox OS devices as well.

Windows OOM crashes

The number of out-of-memory (OOM) crashes that occur on Windows is relatively high. A sizeable fraction of these appear to be 32-bit virtual OOM crashes, caused by running out of address space.

Things that will mitigate this include modifications to jemalloc and the JS allocator, as well as Electrolysis. And some additional data and analysis may help us understand the problem better.

The ultimate solution, for users on 64-bit versions of Windows, is 64-bit Firefox builds for Windows, though it’ll be a while before those builds are in good enough shape to ship to regular users.

Summary

Three items from the old list (pdf.js, Nuwa, image handling) have been ticked off.  Two items remain (devtools, GC fragmentation), the latter in altered form, and both are a lot closer to completion than they were. Three new items (regression detection, Tarako, and Windows OOMs) have been added.

Let me know if I’ve omitted anything important!

AdBlock Plus’s effect on Firefox’s memory usage

[Update: Wladimir Palant has posted a response on the AdBlock Plus blog. Also, a Chrome developer using the handle “Klathmon” has posted numerous good comments in the Reddit discussion of this post, explaining why ad-blockers are inherently CPU- and memory-intensive, and why integrating ad-blocking into a browser wouldn’t necessarily help.]

AdBlock Plus (ABP) is the most popular add-on for Firefox. AMO says that it has almost 19 million users, which is almost triple the number of the second most popular add-on. I have happily used it myself for years — whenever I use a browser that doesn’t have an ad blocker installed I’m always horrified by the number of ads there are on the web.

But we recently learned that ABP can greatly increase the amount of memory used by Firefox.

First, there’s a constant overhead just from enabling ABP of something like 60–70 MiB. (This is on 64-bit builds; on 32-bit builds the number is probably a bit smaller.) This appears to be mostly due to additional JavaScript memory usage, though there’s also some due to extra layout memory.

Second, there’s an overhead of about 4 MiB per iframe, which is mostly due to ABP injecting a giant stylesheet into every iframe. Many pages have multiple iframes, so this can add up quickly. For example, if I load TechCrunch and roll over the social buttons on every story (thus triggering the loading of lots of extra JS code), without ABP, Firefox uses about 194 MiB of physical memory. With ABP, that number more than doubles, to 417 MiB. This is despite the fact that ABP prevents some page elements (ads!) from being loaded.

An even more extreme example is this page, which contains over 400 iframes. Without ABP, Firefox uses about 370 MiB. With ABP, that number jumps to 1960 MiB. Unsurprisingly, the page also loads more slowly with ABP enabled.

So, it’s clear that ABP greatly increases Firefox’s memory usage. Now, this isn’t all bad. Many people (including me!) will be happy with this trade-off — they will gladly use extra memory in order to block ads. But if you’re using a low-end machine without much memory, you might have different priorities.

I hope that the ABP authors can work with us to reduce this overhead, though I’m not aware of any clear ideas on how to do so. In the meantime, it’s worth keeping these measurements in mind. In particular, if you hear people complaining about Firefox’s memory usage, one of the first questions to ask is whether they have ABP installed.

[A note about the comments: I have deleted 17 argumentative, repetitive, borderline-spam comments from a single commenter — after giving him a warning via email — and I will delete any further comments from him on this post. As a result, I also had to delete three replies to his comments from others, for which I apologize.]

Generational GC has landed

Big news: late last week, generational garbage collection landed. It was backed out at first due to some test failures, but then re-landed and appears to have stuck.

This helps with performance. There are certain workloads where generational GC makes the code run much faster, and Firefox hasn’t been able to keep up with Chrome on these. For example, it has made Firefox slightly faster on the Octane benchmark, and there is apparently quite a bit of headroom for additional improvements.

Interestingly, its effect on memory usage has been small. I was hoping that the early filtering of many short-lived objects would make the tenured heap grow more slowly and thus reduce memory usage, but the addition of other structures (such as the nursery and store buffers) appears to have balanced that out.

The changes to the graphs at AWSY have been all within the noise, with the exception of the “Fresh start” and “Fresh start [+30s]” measurements in the “explicit” graph, both of which ticked up slightly. This isn’t cause for concern, however, because the corresponding “resident” graph hasn’t increased accordingly, and “resident” is the real metric of interest.

“Compacting Generational GC” is the #1 item on the current MemShrink “Big Ticket Items” list. Hopefully the “compacting” part of that, which still remains to be done, will produce some sizeable memory wins.

DMD now works on Windows

DMD is our tool for improving Firefox’s memory reporting.  It helps identify where new memory reporters need to be added in order to reduce the “heap-unclassified” value in about:memory.

DMD has always worked well on Linux, and moderately well on Mac (it is crashy for some people).  And it works on Android and B2G.  But it has never worked on Windows.

So I’m happy to report that DMD now does work on Windows, thanks to the excellent efforts of Catalin Iacob.  If you’re on Windows and you’ve been seeing high “heap-unclassified” values, and you’re able to build Firefox yourself, please give DMD a try.

MemShrink progress, week 121–124

It’s been a quiet but steady four weeks for MemShrink with 19 bugs fixed, including several leaks.

The only fix that I feel is worth highlighting is bug 918207, in which I added support for fast, coarse-grained measurement of a tab’s memory consumption.  The implemented machinery isn’t currently exposed through the UI, though there are two bugs open that will use it:  a simple one that will implement a command for the developer toolbar, and a more complex one that will implement a constantly-updating memory monitor widget for the devtools pane.

See you next time!

Warning for Firefox devs planning to upgrade to Ubuntu 13.10

I just upgraded from Ubuntu 13.04 to Ubuntu 13.10, and Firefox wouldn’t build with either clang or GCC.

clang was initially failing during configure, complaining about not being able to find joystick.h, though the underlying failure was an inability to find stddef.h.  This Ubuntu bug describes a workaround, which is to do the following.

cd /usr/lib/clang/3.2/
sudo ln -s /usr/lib/llvm-3.2/lib/clang/3.2/include

With that in place, I clobbered and rebuilt, and clang complained about a problem in allocator.h relating to a name __allocator_base, and GCC complained about C++11 support being insufficient.

Both failures had the same underlying cause, which is that both compilers are hardwired to look for some GCC-4.7 headers (which they shouldn’t) as well as GCC-4.8 headers.  I filed a bug with Ubuntu about this.

I worked around the problem just by renaming /usr/include/c++/4.7/ and /usr/include/x86_64-linux-gnu/c++/4.7/.  There may be more elegant workarounds, but that was good enough for me.

How to trigger a child process in desktop Firefox

Firefox is now multi-process, and not just for the plugin-container process.  For example, there is now (present but disabled in Firefox 25, and likely to be released in Firefox 27) a separate process that is used to update the thumbnails shown in a new tab.

As a result, sometimes you might want to test something in the presence of multiple processes.  Here’s how I’ve been doing it.

  • Delete the images in the thumbnails/ directory within the profile’s temporary directory.
    • On Linux it’s ~/.cache/mozilla/firefox/<profile>/thumbnails/.
    • On Mac it’s ~/Library/Caches/Firefox/Profiles/<profile>/thumbnails/.
    • On Windows it’s C:\Users\<username>\AppData\Local\Mozilla\Firefox\Profiles\<profile>\thumbnails\.
    • I’m not sure about Android.
  • Open about:newtab.  This triggers a thumbnails process.  It’ll live for about 60 seconds.  (If you’ve configured about:newtab to be blank rather than showing thumbnails, this might not work, though I’m not sure.)

Please let me know if there’s a better way!

(And if anyone can give me extra info on the things I’m not sure about, I’ll update the text above accordingly.  Thanks!)

MemShrink progress, week 117–120

Lots of important MemShrink stuff has happened in the last 27 days:  22 bugs were fixed, and some of them were very important indeed.

Images

Timothy Nikkel fixed bug 847223, which greatly reduces peak memory consumption when loading image-heavy pages.  The combination of this fix and the fix from bug 689623 — which Timothy finished earlier this year and which shipped in Firefox 24 — have completely solved our longstanding memory consumption problems with image-heavy pages!  This was the #1 item on the MemShrink big ticket items list.

To give you an idea of the effect of these two fixes, I did some rough measurements on a page containing thousands of images, which are summarized in the graph below.

Improvements in Firefox's Memory Consumption on One Image-heavy Page

First consider Firefox 23, which had neither fix, and which is represented by the purple line in the graph.  When loading the page, physical memory consumption would jump to about 3 GB, because every image in the page was decoded (a.k.a. decompressed).  That decoded data was retained so long as the page was in the foreground.

Next, consider Firefox 24 (and 25), which had the first fix, and which is represented by the green line on the graph.  When loading the page, physical memory consumption would still jump to almost 3 GB, because the images are still decoded.  But it would soon drop down to a few hundred MB, as the decoded data for non-visible images was discarded, and stay there (with some minor variations) while scrolling around the page. So the scrolling behaviour was much improved, but the memory consumption spike still occurred, which could still cause paging, out-of-memory problems, and the like.

Finally consider Firefox 26 (currently in the Aurora channel), which has both fixes, and which is represented by the red line on the graph.  When loading the page, physical memory jumps to a few hundred MB and stays there.  Furthermore, the loading time for the page dropped from ~5 seconds to ~1 second, because the unnecessary decoding of most of the images is skipped.

These measurements were quite rough, and there was quite a bit of variation, but the magnitude of the improvement is obvious.  And all these memory consumption improvements have occurred without hurting scrolling performance.  This is fantastic work by Timothy, and great news for all Firefox users who visit image-heavy pages.

[Update: Timothy emailed me this:  “Only minor thing is that we still need to turn it on for b2g. We flipped the pref for fennec on central (it’s not on aurora though). I’ve been delayed in testing b2g though, hopefully we can flip the pref on b2g soon. That’s the last major thing before declaring it totally solved.”]

[Update 2: This has hit Hacker News.]

NuWa

Cervantes Yu landed Nuwa, which is a low-level optimization of B2G.  Quoting from the big ticket items list (where this was item #3):

Nuwa… aims to give B2G a pre-initialized template process from which every subsequent process will be forked… it greatly increases the ability for B2G processes to share unchanging data.  In one test run, this increased the number of apps that could be run simultaneously from five to nine

Nuwa is currently disabled by default, so that Cervantes can fine-tune it, but I believe it’s intended to ship with B2G version 1.3.  Fingers crossed it makes it!

Memory Reporting

I made some major simplifications to our memory reporting infrastructure, paving the way for future improvements.

First, we used to have two kinds of memory reporters:  uni-reporters (which report a single measurement) and multi-reporters (which report multiple measurements).  Multi-reporters, unsurprisingly, subsume uni-reporters, and so I got rid of uni-reporters, which simplified quite a bit of code.

Second, I removed about:compartments and folded its functionality into about:memory.  I originally created about:compartments at the height of our zombie compartment problem.  But ever since Kyle Huey made it more or less impossible for add-ons to cause zombie compartments, about:compartments has hardly been used.   I was able to fold about:compartments’ data into about:memory, so there’s no functionality loss, and this change simplified quite a bit more code.  If you visit about:compartments now you’ll get a message telling you to visit about:memory.

Third, I removed the smaps (size/rss/pss/swap) memory reporters.  These were only present on Linux, they were of questionable utility, and they complicated about:memory significantly.

Finally, I fixed a leak in about:memory.  Yeah, it was my fault.  Sorry!

Summit

The Mozilla summit is coming up!  In fact, I’m writing this report a day earlier than normal because I will be travelling to Toronto tomorrow.  Please forgive any delayed responses to comments, because I will be travelling for almost 24 hours to get there.

MemShrink progress, week 113–116

It’s been a relatively quiet four weeks for MemShrink, with 17 bugs fixed.  (Relatedly, in today’s MemShrink meeting we only had to triage 10 bugs, which is the lowest we’ve had for ages.)  Among the fixed bugs were lots for B2G leaks and leak-like things, many of which are hard to explain, but are important for the phone’s stability.

Fabrice Desré made a couple of notable B2G non-leak fixes.

On desktop, Firefox users who view about:memory may notice that it now sometimes mentions more than one process.  This is due to the thumbnails child process, which generates the thumbnails seen on the new tab page, and which occasionally is spawned and runs briefly in the background.  about:memory copes with this child process ok, but the mechanism it uses is sub-optimal, and I’m planning to rewrite it to be nicer and scale better in the presence of multiple child processes, because that’s a direction we’re heading in.

Finally, some sad news:  Justin Lebar, whose name should be familiar to any regular reader of these MemShrink reports, has left Mozilla.  Justin was a core MemShrink-er from the very beginning, and contributed greatly to the success of the project.  Thanks, Justin, and best of luck in the future!

Using include-what-you-use

include-what-you-use (a.k.a. IWYU) is a clang tool that tells you which #include statements should be added and removed from a file.  Nicholas Cameron used it to speed up the building of gfx/layers by 12.5%.  I’ve also used it quite a bit within SpiderMonkey;  I’ve seen smaller build speed improvements but I’ve also been doing it in chunks over time.  Ms2ger started a tracking bug for all places where IWYU has been used in Mozilla code.

Ehsan asked for instructions on setting up IWYU.  There are official instructions, but I thought it might be helpful to document exactly what I did.

First, here is how I installed clang, based on Ehsan’s instructions.  I put the source code under $HOME/local/src and installed the build under $HOME/local.

  mkdir $HOME/local/src
  cd $HOME/local/src
  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  cd llvm/tools
  svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
  cd ../..
  mkdir build
  cd build/
  ../configure --enable-optimized --disable-assertions --prefix=$HOME/local
  make
  sudo make install

Then I followed the “Building in-tree” instructions.  The first part is to get the IWYU code.

   cd $HOME/local/src/llvm/tools/clang/tools
   svn checkout http://include-what-you-use.googlecode.com/svn/trunk/ include-what-you-use

The second part was to do the following steps.

  • Edit tools/clang/tools/Makefile and add |include-what-you-use| to the DIRS variable.
  • Edit tools/clang/tools/CMakeLists.txt and add |add_subdirectory(include-what-you-use)|.
  • Re-build clang as per the above instructions.

After that, configure a Mozilla tree for a clang build and then run make with the following options: -j1 -k CXX=/home/njn/local/src/llvm/build/Release/bin/include-what-you-use. I’m not certain if the -j1 is necessary, but since IWYU spits out lots of output, it seemed wise. The -k tells make to keep building even after errors; for some reason, IWYU triggers compilation failure on every file it looks at.

Pipe the output to a file, and you’ll see lots of stuff like this.

../jsarray.h should add these lines:
#include <stdint.h>                     // for uint32_t
#include <sys/types.h>                  // for int32_t
#include "dist/include/js/RootingAPI.h"  // for HandleObject, Handle, etc
#include "jsapi.h"                      // for Value, HandleObject, etc
#include "jsfriendapi.h"                // for JSID_TO_ATOM
#include "jstypes.h"                    // for JSBool
#include "vm/String.h"                  // for JSAtom
namespace JS { class Value; }
namespace js { class ExclusiveContext; }
struct JSContext;

../jsarray.h should remove these lines:

The full include-list for ../jsarray.h:
#include <stdint.h>                     // for uint32_t
#include <sys/types.h>                  // for int32_t
#include "dist/include/js/RootingAPI.h"  // for HandleObject, Handle, etc
#include "jsapi.h"                      // for Value, HandleObject, etc
#include "jsfriendapi.h"                // for JSID_TO_ATOM
#include "jsobj.h"                      // for JSObject (ptr only), etc
#include "jspubtd.h"                    // for jsid
#include "jstypes.h"                    // for JSBool
#include "vm/String.h"                  // for JSAtom
namespace JS { class Value; }
namespace js { class ArrayObject; }  // lines 44-44
namespace js { class ExclusiveContext; }
struct JSContext;

I focused on addressing the “should remove these lines” #includes, and I did it manually. There’s also a script you can use to automatically do everything for you;  I don’t know how well it works.

Note that IWYU’s output is just plain wrong about 5% of the time — i.e. it says you can remove #includes that you clearly cannot.  (A lot of the time this seems to be because it hasn’t realized that a macro is needed.)  I also found that, while it produced output for all .cpp files, it only produced output for some of the .h files.  No idea why. Finally, it doesn’t know about local idioms; in particular, if you have platform-dependent code, its suggestions are often terrible because it only sees the files for the platform you are building on.

Good luck!