Category Archives: MemShrink

Generational GC has landed

Big news: late last week, generational garbage collection landed. It was backed out at first due to some test failures, but then re-landed and appears to have stuck.

This helps with performance. There are certain workloads where generational GC makes the code run much faster, and Firefox hasn’t been able to keep up with Chrome on these. For example, it has made Firefox slightly faster on the Octane benchmark, and there is apparently quite a bit of headroom for additional improvements.

Interestingly, its effect on memory usage has been small. I was hoping that the early filtering of many short-lived objects would make the tenured heap grow more slowly and thus reduce memory usage, but the addition of other structures (such as the nursery and store buffers) appears to have balanced that out.

The changes to the graphs at AWSY have been all within the noise, with the exception of the “Fresh start” and “Fresh start [+30s]” measurements in the “explicit” graph, both of which ticked up slightly. This isn’t cause for concern, however, because the corresponding “resident” graph hasn’t increased accordingly, and “resident” is the real metric of interest.

“Compacting Generational GC” is the #1 item on the current MemShrink “Big Ticket Items” list. Hopefully the “compacting” part of that, which still remains to be done, will produce some sizeable memory wins.

Nuwa has landed

A big milestone for Firefox OS was reached this week: after several bounces spread over several weeks, Nuwa finally landed and stuck.

Nuwa is a special Firefox OS process from which all other app processes are forked. (The name “Nuwa” comes from the Chinese creation goddess.) It allows lots of unchanging data (such as low-level Gecko things like XPCOM structures) to be shared among app processes, thanks to Linux’s copy-on-write forking semantics. This greatly increases the number of app processes that can be run concurrently, which is why it was the #3 item on the MemShrink “big ticket items” list.

One downside of this increased sharing is that it renders about:memory’s measurements less accurate than before, because about:memory does not know about the sharing, and so will over-report shared memory. Unfortunately, this is very difficult to fix, because about:memory’s reports are generated entirely within Firefox, whereas the sharing information is only available at the OS level. Something to be aware of.

Thanks to Cervantes Yu (Nuwa’s primary author), along with those who helped, including Thinker Li, Fabrice Desré, and Kyle Huey.

A slimmer and faster pdf.js

TL;DR: Firefox’s built-in PDF viewer is on track to gain some drastic improvements in memory consumption and speed when Firefox 29 is released in late April.

Firefox 19 introduced a built-in PDF viewer which allows PDF files to be viewed directly within Firefox. This is made possible by the pdf.js project, which implements a PDF viewer entirely in HTML and JavaScript.

This is a wonderful feature that makes the reading of PDFs on websites much less disruptive. However, pdf.js unfortunately suffers at times from high memory consumption. Enough, in fact, that it is currently the #5 item on the MemShrink project’s “big ticket items” list.

Recently, I made four improvements to pdf.js, each of which reduces its memory consumption greatly on certain kinds of PDF documents.

Image masks

The first improvement involved documents that use image masks, which are bitmaps that augment an image and dictate which pixels of the image should be drawn. Previously, the 1-bit-per-pixel (a.k.a. 1bpp) image mask data was being expanded into 32bpp RGBA form (a typed array) in a web worker, such that every RGB element was 0 and the A element was either 0 or 255. This typed array was then passed to the main thread, which copied the data into an ImageData object and then put that data to a canvas.

The change was simple: instead of expanding the bitmap in the worker, just transfer it as-is to the main thread, and expand its contents directly into the ImageData object. This removes the RGBA typed array entirely.
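As a rough illustration of that expansion step (a sketch, not the actual pdf.js code), the function below writes a packed 1bpp mask straight into an RGBA buffer of the kind an ImageData object exposes as its data property. The name expandMaskInto, the row padding to whole bytes, the most-significant-bit-first ordering, and the choice that a set bit means opaque are all assumptions made for the example.

```javascript
// Hypothetical helper: expands a packed 1bpp mask directly into an RGBA
// buffer (4 bytes per pixel), so no intermediate RGBA typed array is needed.
// Assumptions: each row is padded to a whole number of bytes, the most
// significant bit of a byte is the leftmost pixel, and a set bit is opaque.
// `dest` is assumed to be zero-filled, so the RGB bytes are left untouched.
function expandMaskInto(dest, mask, width, height) {
  const rowBytes = (width + 7) >> 3;
  let destPos = 3; // index of the first alpha byte
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const byte = mask[y * rowBytes + (x >> 3)];
      const bit = (byte >> (7 - (x & 7))) & 1;
      dest[destPos] = bit ? 255 : 0; // only the A element varies
      destPos += 4;
    }
  }
  return dest;
}

// One row of three pixels: on, off, on.
const mask = new Uint8Array([0b10100000]);
const rgba = expandMaskInto(new Uint8ClampedArray(3 * 1 * 4), mask, 3, 1);
```

In a browser, dest would be the data array of an ImageData object created on the main thread, which is what lets the separate RGBA typed array disappear.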

I tested two documents on my Linux desktop, using a 64-bit trunk build of Firefox. Initially, when loading and then scrolling through the documents, physical memory consumption peaked at about 650 MiB for one document and about 800 MiB for the other. (The measurements varied somewhat from run to run, but were typically within 10 or 20 MiB of those numbers.) After making the improvement, the peak for both documents was about 400 MiB.

Image copies

The second improvement involved documents that use images. This includes scanned documents, which consist purely of one image per page.

Previously, we would make five copies of the 32bpp RGBA data for every image.

  1. The web worker would decode the image’s colour data (which can be in several different colour forms: RGB, grayscale, CMYK, etc.) from the PDF file into a 24bpp RGB typed array, and the opacity (a.k.a. alpha) data into an 8bpp A array.
  2. The web worker then combined the RGB and A arrays into a new 32bpp RGBA typed array. The web worker then transferred this copy to the main thread. (This was a true transfer, not a copy, which is possible because it’s a typed array.)
  3. The main thread then created an ImageData object of the same dimensions as the typed array, and copied the typed array’s contents into it.
  4. The main thread then called putImageData() on the ImageData object. The C++ code within Gecko that implements putImageData() then created a new gfxImageSurface object and copied the data into it.
  5. Finally, the C++ code also created a Cairo surface from the gfxImageSurface.

Copies 4 and 5 were in C++ code and are both very short-lived. Copies 1, 2 and 3 were in JavaScript code and so lived for longer; at least until the next garbage collection occurred.

The change was in two parts. The first part involved putting the image data to the canvas in tiny strips, rather than doing the whole image at once. This was a fairly simple change, and it allowed copies 3, 4 and 5 to be reduced to a tiny fraction of their former size (typically 100x or more smaller). Fortunately, this caused no slow-down.
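The strip idea can be sketched as follows. This is an illustration under assumptions, not the pdf.js implementation: putImageInStrips and stripHeight are hypothetical names, and ctx stands for anything that offers createImageData() and putImageData(), such as a canvas 2D context.

```javascript
// Hypothetical sketch: put a full RGBA image to a canvas in strips, so the
// ImageData object (and the copies made from it) only ever covers a few rows.
function putImageInStrips(ctx, rgba, width, height, stripHeight) {
  const bytesPerRow = width * 4;
  // One small ImageData object is reused for every full-height strip.
  const strip = ctx.createImageData(width, stripHeight);
  for (let y = 0; y < height; y += stripHeight) {
    const rows = Math.min(stripHeight, height - y);
    const src = rgba.subarray(y * bytesPerRow, (y + rows) * bytesPerRow);
    // The final strip may be shorter, so it gets its own ImageData.
    const target =
      rows === stripHeight ? strip : ctx.createImageData(width, rows);
    target.data.set(src);
    ctx.putImageData(target, 0, y);
  }
}

// Stand-in for a canvas context so the sketch can run outside a browser.
const drawn = new Uint8ClampedArray(2 * 5 * 4);
let putCalls = 0;
const fakeCtx = {
  createImageData(w, h) {
    return { width: w, height: h, data: new Uint8ClampedArray(w * h * 4) };
  },
  putImageData(img, x, y) {
    putCalls++;
    drawn.set(img.data, y * img.width * 4);
  },
};

const source = new Uint8ClampedArray(2 * 5 * 4).map((_, i) => i);
putImageInStrips(fakeCtx, source, 2, 5, 2);
```

With a 2x5 image and a strip height of 2, the image is drawn in three calls, the last one covering a single row.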

The second part involved decoding the colour and opacity data directly into a 32bpp RGBA array in simple cases (e.g. when no resizing is involved), skipping the creation of the intermediate RGB and A arrays. This was fiddly, but not too difficult.

If you scan a US letter document at 300 dpi, you get about 8.4 million pixels, which is about 1 MiB of data. (A4 paper is slightly larger.) If you expand this 1bpp data to 32bpp, you get about 32 MiB per page. So if you reduce five copies of this data to one, you avoid about 128 MiB of allocations per page.
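The arithmetic behind those figures can be checked with a few lines of JavaScript. The only inputs are the US letter page size (8.5 by 11 inches) and the 300 dpi scan resolution; the variable names are just for illustration.

```javascript
// Worked version of the scanned-page arithmetic.
const MiB = 1024 * 1024;
const widthPx = 8.5 * 300; // 2550 pixels across
const heightPx = 11 * 300; // 3300 pixels down
const pixels = widthPx * heightPx; // about 8.4 million

const oneBppMiB = pixels / 8 / MiB; // 1 bit per pixel: about 1 MiB
const rgbaMiB = (pixels * 4) / MiB; // 4 bytes per pixel: about 32 MiB
const savedMiB = 4 * rgbaMiB; // five copies reduced to one: about 128 MiB

console.log(pixels, oneBppMiB.toFixed(2), rgbaMiB.toFixed(1), savedMiB.toFixed(1));
```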

Black and white scanned documents

The third improvement also involved images. Avoiding unnecessary RGBA copies seemed like a big win, but when I scrolled through large scanned documents the memory consumption still grew quickly as I scrolled through more pages. I eventually realized that although four of those five copies had been short-lived, one of them was very long-lived. More specifically, once you scroll past a page, its RGBA data is held onto until all pages that are subsequently scrolled past have been decoded. (The memory is eventually freed; it just takes longer than we’d like.) And fixing it is not easy, because it involves page-prioritization code that isn’t easy to change without hurting other aspects of pdf.js’s performance.

However, I was able to optimize the common case of simple (e.g. unmasked, with no resizing) black and white images. Instead of expanding the 1bpp image data to 32bpp RGBA form in the web worker and passing that to the main thread, the code now just passes the 1bpp form directly. (Yep, that’s the same optimization that I used for the image masks.) The main thread can now handle both forms, and for the 1bpp form the expansion to the 32bpp form also only happens in tiny strips.

I used a 226 page scanned document to test this. At about 34 MiB per page, that’s over 7,200 MiB of pixel data when expanded to 32bpp RGBA form. And sure enough, prior to my change, scrolling quickly through the whole document caused Firefox’s physical memory consumption to reach about 7,800 MiB. With the fix applied, this number reduced to about 700 MiB. Furthermore, the time taken to render the final page dropped from about 200 seconds to about 25 seconds. Big wins!

The same optimization could be done for some non-black and white images (though the improvement will be smaller). But all the examples from bug reports were black and white, so that’s all I’ve done for now.

Parsing

The fourth and final improvement was unrelated to images. It involved the parsing of the PDF files. The parsing code reads files one byte at a time, and constructs lots of JavaScript strings by appending one character at a time. SpiderMonkey’s string implementation has an optimization that handles this kind of string construction efficiently, but the optimization doesn’t kick in until the strings have reached a certain length; on 64-bit platforms, this length is 24 characters. Unfortunately, many of the strings constructed during PDF parsing are shorter than this, so in order to create a string of length 20, for example, we would also create strings of length 1, 2, 3, …, 19.

It’s possible to change the threshold at which the optimization applies, but this would hurt the performance of some other workloads. The easier thing to do was to modify pdf.js itself. My change was to build up strings by appending single-char strings to an array, and then using Array.join to concatenate them together once the token’s end is reached. This works because JavaScript arrays are mutable (unlike strings which are immutable) and Array.join is efficient because it knows exactly how long the final string will be.
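A minimal sketch of the pattern, assuming a simplified tokenizer that reads characters until it hits a space (readToken and its delimiter rule are hypothetical, not taken from parser.js):

```javascript
// Hypothetical tokenizer fragment. Characters are pushed onto an array and
// joined once at the end, so no intermediate strings of length 1, 2, 3, ...
// are created along the way.
function readToken(input, start) {
  const chars = [];
  let pos = start;
  while (pos < input.length && input[pos] !== " ") {
    chars.push(input[pos]);
    pos++;
  }
  return { token: chars.join(""), end: pos };
}

const result = readToken("endstream endobj", 0);
console.log(result.token, result.end);
```

The version that uses repeated += on a string would produce the same token, but would allocate a new short string for every character appended.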

On a 4,155 page PDF, this change reduced the peak memory consumption during file loading from about 1,130 MiB to about 800 MiB.

Profiling

The fact that I was able to make a number of large improvements in a short time indicates that pdf.js’s memory consumption has not previously been closely looked at. I think the main reason for this is that Firefox currently doesn’t have much in the way of tools for profiling the memory consumption of JavaScript code (though the devtools team is working right now to rectify this). So I will explain the tricks I used to find the places that needed optimization.

Choosing test cases

First I had to choose some test cases. Fortunately, this was easy, because we had numerous bug reports about high memory consumption which included test files. So I just used them.

Debugging print statements, part 1

For each test case, I looked first at about:memory. There were some very large “objects/malloc-heap/elements/non-asm.js” entries, which indicate that lots of memory is being used by JavaScript array elements. And looking at pdf.js code, typed arrays are used heavily, especially Uint8Array. The question is then: which typed arrays are taking up space?

To answer this question, I introduced the following new function.

function newUint8Array(length, context) {
  dump("newUint8Array(" + context + "): " + length + "\n");
  return new Uint8Array(length);
}

I then replaced every instance like this:

var a = new Uint8Array(n);

with something like this:

var a = newUint8Array(n, 1);

I used a different second argument for each instance. With this in place, when the code ran, I got a line printed for every allocation, identifying its length and location. With a small amount of post-processing, it was easy to identify which parts of the code were allocating large typed arrays. (This technique provides cumulative allocation measurements, not live data measurements, because it doesn’t know when these arrays are freed. Nonetheless, it was good enough.) I used this data in the first three optimizations.
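The post-processing step is not shown in the post, so the following is only one way it might be done: a small script that groups the printed lines by call site and totals the bytes requested at each. The function name summarizeAllocations and the sample log are made up for the example; the line format matches what the newUint8Array() wrapper prints.

```javascript
// Hypothetical post-processing: total the cumulative allocations per call
// site from lines of the form "newUint8Array(<context>): <length>".
function summarizeAllocations(log) {
  const totals = new Map();
  const pattern = /^newUint8Array\((.+)\): (\d+)$/;
  for (const line of log.split("\n")) {
    const match = pattern.exec(line);
    if (!match) continue; // skip unrelated output
    const [, context, length] = match;
    const entry = totals.get(context) || { count: 0, bytes: 0 };
    entry.count++;
    entry.bytes += Number(length);
    totals.set(context, entry);
  }
  // Largest cumulative allocators first.
  return [...totals.entries()].sort((a, b) => b[1].bytes - a[1].bytes);
}

// Made-up sample output, not real measurements.
const sampleLog =
  "newUint8Array(1): 100\nnewUint8Array(2): 4096\nnewUint8Array(1): 50\n";
const summary = summarizeAllocations(sampleLog);
console.log(summary);
```

As with the raw output, the totals are cumulative and say nothing about how much of that memory was live at any one time.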

Debugging print statements, part 2

Another trick involved modifying jemalloc, the heap allocator that Firefox uses. I instrumented jemalloc’s huge_malloc() function, which is responsible for allocations greater than 1 MiB. I printed the sizes of allocations, and at one point I also used gdb to break on every call to huge_malloc(). It was by doing this that I was able to work out that we were making five copies of the RGBA pixel data for each image. In particular, I wouldn’t have known about the C++ copies of that data if I hadn’t done this.

Notable strings

Finally, while looking again at about:memory, I saw some entries like the following, which are found by the “notable strings” detection.

> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=10, copies=6174, "http://sta")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=11, copies=6174, "http://stac")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=12, copies=6174, "http://stack")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=13, copies=6174, "http://stacks")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=14, copies=6174, "http://stacks.")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=15, copies=6174, "http://stacks.m")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=16, copies=6174, "http://stacks.ma")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=17, copies=6174, "http://stacks.mat")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=18, copies=6174, "http://stacks.math")/gc-heap

It doesn’t take much imagination to realize that strings were being built up one character at a time. This looked like the kind of thing that would happen during tokenization, so I found a file called parser.js and looked there. I knew about SpiderMonkey’s optimization of string concatenation, so I asked on IRC why it might not be happening, and Shu-yu Guo was able to tell me about the threshold. Once I knew that, switching to use Array.join wasn’t difficult.

What about Chrome’s heap profiler?

I’ve heard good things in the past about Chrome/Chromium’s heap profiling tools. And because pdf.js is just HTML and JavaScript, you can run it in other modern browsers. So I tried using Chromium’s tools, but the results were very disappointing.

Remember the 226 page scanned document I mentioned earlier, where over 7,200 MiB of pixel data was created? I loaded that document into Chromium and used the “Take Heap Snapshot” tool, which gave the following snapshot.

[Image: Heap Snapshot from Chromium]

At the top left, it claims that the heap was just over 50 MiB in size. Near the bottom, it claims that 225 Uint8Array objects had a “shallow” size of 19,608 bytes, and a “retained” size of 26,840 bytes. This seemed bizarre, so I double-checked. Sure enough, the operating system (via top) reported that the relevant chromium-browser process was using over 8 GiB of physical memory at this point.

So why the tiny measurements? I suspect what’s happening is that typed arrays are represented by a small header struct which is allocated on the GC heap, and it points to the (much larger) element data which is allocated on the malloc heap. So if the snapshot is just measuring the GC heap, in this case it’s accurate but not useful. (I’d love to hear if anyone can confirm or refute this hypothesis.) I also tried the “Record Heap Allocations” tool but it gave much the same results.

Status

These optimizations have landed in the master pdf.js repository, and were imported into Firefox 29, which is currently on the Aurora branch, and is on track to be released on April 29.

The optimizations are also on track to be imported into the Firefox OS 1.3 and 1.3T branches. I had hoped to show that some PDFs that were previously unloadable on Firefox OS would now be loadable. Unfortunately, I am unable to load even the simplest PDFs on my Buri (a.k.a. Alcatel OneTouch), because the PDF viewer app appears to consistently run out of gralloc memory just before the first page is displayed. Ben Kelly suggested that Async pan zoom (APZ) might be responsible, but disabling it didn’t help. If anybody knows more about this please contact me.

Finally, I’ve fixed most of the major memory consumption problems with the PDFs that I’m aware of. If you know of other PDFs that still cause pdf.js to consume large amounts of memory, please let me know. Thanks.

A big step towards generational and compacting GC

People frequently ask me for status updates on generational GC, and I usually say I’ll tell them when something notable happens. Well, something notable just happened: exact rooting landed.

What is exact rooting? In order to support generational and/or compacting GC, you need to be able to move GC-allocated things such as objects around. This means you can’t have raw C++ pointers to any objects that might move; instead, you need some kind of indirect pointer that can be updated when necessary.

Unfortunately, both the JS engine and Gecko have a lot of pointers to GC-allocated things. The process of checking and converting them has been the main part of a task called “exact rooting”, and that’s what just finished. This has required an enormous amount of what is essentially very tedious work. Jim Blandy summarized it nicely, as follows.

I’ve never heard of a major project escaping from conservative GC once it had entered that state of sin; nor have I heard of anyone implementing a moving collector after starting with a non-moving collector. So, doing *both* is impressive. I hope it pays off big!

Major kudos to Terrence Cole, Steve Fink, Jon Coppeard, Brian Hackett, and the small army of other helpers who did this. Now that they’ve finished eating this gigantic serving of vegetables, they can move onto dessert, i.e. making the GC generational and compacting.

System-wide memory measurement for Firefox OS

Have you ever wondered exactly how all the physical memory in a Firefox OS device is used?  Wonder no more.  I just landed a system-wide memory reporter which works on any Firefox product running on a Linux system.  This includes desktop Firefox builds on Linux, Firefox for Android, and Firefox OS.

This memory reporter is a bit different to the existing ones, which work entirely within Mozilla processes.  The new reporter provides measurements for the entire system, including every user-space process (Mozilla or non-Mozilla) that is running.  It’s aimed primarily at profiling Firefox OS devices, because we have full control over the code running on those devices, and so it’s there that a system-wide view is most useful.

Here is some example output from a GeeksPhone Keon.

System
Other Measurements 
397.24 MB (100.0%) -- mem
├──215.41 MB (54.23%) ── free
├──105.72 MB (26.61%) -- processes
│  ├───57.59 MB (14.50%) -- process(/system/b2g/b2g, pid=709)
│  │   ├──42.29 MB (10.65%) -- anonymous
│  │   │  ├──42.25 MB (10.63%) -- outside-brk
│  │   │  │  ├──41.94 MB (10.56%) ── [rw-p] [69]
│  │   │  │  └───0.31 MB (00.08%) ++ (2 tiny)
│  │   │  └───0.05 MB (00.01%) ── brk-heap/[rw-p]
│  │   ├──13.03 MB (03.28%) -- shared-libraries
│  │   │  ├───8.39 MB (02.11%) -- libxul.so
│  │   │  │   ├──6.05 MB (01.52%) ── [r-xp]
│  │   │  │   └──2.34 MB (00.59%) ── [rw-p]
│  │   │  └───4.64 MB (01.17%) ++ (69 tiny)
│  │   └───2.27 MB (00.57%) ++ (2 tiny)
│  ├───21.73 MB (05.47%) -- process(/system/b2g/plugin-container, pid=756)
│  │   ├──12.49 MB (03.14%) -- anonymous
│  │   │  ├──12.48 MB (03.14%) -- outside-brk
│  │   │  │  ├──12.41 MB (03.12%) ── [rw-p] [30]
│  │   │  │  └───0.07 MB (00.02%) ++ (2 tiny)
│  │   │  └───0.02 MB (00.00%) ── brk-heap/[rw-p]
│  │   ├───8.88 MB (02.23%) -- shared-libraries
│  │   │   ├──7.33 MB (01.85%) -- libxul.so
│  │   │   │  ├──4.99 MB (01.26%) ── [r-xp]
│  │   │   │  └──2.34 MB (00.59%) ── [rw-p]
│  │   │   └──1.54 MB (00.39%) ++ (50 tiny)
│  │   └───0.36 MB (00.09%) ++ (2 tiny)
│  ├───14.08 MB (03.54%) -- process(/system/b2g/plugin-container, pid=836)
│  │   ├───7.53 MB (01.89%) -- shared-libraries
│  │   │   ├──6.02 MB (01.52%) ++ libxul.so
│  │   │   └──1.51 MB (00.38%) ++ (47 tiny)
│  │   ├───6.24 MB (01.57%) -- anonymous
│  │   │   ├──6.23 MB (01.57%) -- outside-brk
│  │   │   │  ├──6.23 MB (01.57%) ── [rw-p] [22]
│  │   │   │  └──0.00 MB (00.00%) ── [r--p]
│  │   │   └──0.01 MB (00.00%) ── brk-heap/[rw-p]
│  │   └───0.31 MB (00.08%) ++ (2 tiny)
│  └───12.32 MB (03.10%) ++ (23 tiny)
└───76.11 MB (19.16%) ── other

The data is obtained entirely from the operating system, specifically from /proc/meminfo and the /proc/<pid>/smaps files, which are files provided by the Linux kernel specifically for measuring memory consumption.

I wish that the mem entry at the top was the amount of physical memory available. Unfortunately there is no way to get that on a Linux system, and so it’s instead the MemTotal value from /proc/meminfo, which is “Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code)”.  And if you’re wondering about the exact meaning of the other entries, as usual if you hover the cursor over an entry in about:memory you’ll get a tool-tip explaining what it means.

The measurements given for each process are the PSS (proportional set size) measurements.  These attribute any shared memory equally among all processes that share it, and so PSS is the only measurement that can be sensibly summed across processes (unlike “Size” or “RSS”, for example).
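To make the PSS idea concrete, here is a small sketch that totals the Pss lines from the text of a /proc/<pid>/smaps file. It is not how the reporter itself is implemented; totalPssKb is a hypothetical name and the sample text is an invented, heavily abbreviated excerpt (real smaps files also contain a header line per mapping and many more fields).

```javascript
// Hypothetical sketch: sum the "Pss:" lines of an smaps file, in kB.
// The pattern deliberately matches only "Pss:", not fields such as "Rss:".
function totalPssKb(smapsText) {
  let total = 0;
  for (const line of smapsText.split("\n")) {
    const match = /^Pss:\s+(\d+) kB$/.exec(line.trim());
    if (match) total += Number(match[1]);
  }
  return total;
}

// Invented sample: the first mapping has 300 kB resident but is shared by
// three processes, so only 100 kB is attributed to this one. The second
// mapping is private, so its Pss equals its Rss.
const sampleSmaps = [
  "Size:                400 kB",
  "Rss:                 300 kB",
  "Pss:                 100 kB",
  "Size:                 64 kB",
  "Rss:                  64 kB",
  "Pss:                  64 kB",
].join("\n");

const pssKb = totalPssKb(sampleSmaps);
console.log(pssKb + " kB");
```

Summing Rss across the three sharing processes would count the shared 300 kB three times, whereas summing Pss counts it once, which is why PSS totals can be added up across processes.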

For each process there is a wealth of detail about static code and data.  (The above example only shows a tiny fraction of it, because a number of the sub-trees are collapsed.  If you were viewing it in about:memory, you could expand and collapse sub-trees to your heart’s content.)  Unfortunately, there is little information about anonymous mappings, which constitute much of the non-static memory consumption.  I have some patches that will add an extra level of detail there, distinguishing major regions such as the jemalloc heap, the JS GC heap, and JS JIT code.  For more detail than that, the existing per-process memory reports in about:memory can be consulted.  Unfortunately the new system-wide reporter cannot be sensibly combined with the existing per-process memory reporters because the latter are unaware of implicit sharing between processes.  (And note that the amount of implicit sharing is increased significantly by the new Nuwa process.)

Because this works with our existing memory reporting infrastructure, anyone already using the get_about_memory.py script with Firefox OS will automatically get these reports along with all the usual ones once they update their source code, and the system-wide reports can be loaded and viewed in about:memory as usual. On Firefox and Firefox for Android, you’ll need to set the memory.system_memory_reporter flag in about:config to enable it.

My hope is that this reporter will supplant most or all of the existing tools that are commonly used to understand system-wide memory consumption on Firefox OS devices, such as ps, top and procrank.  And there will certainly be other interesting, available OS-level measurements that are not currently obtained. For example, Jed Davis has plans to measure the pmem subsystem.  Please file a bug or email me if you have other suggestions for adding such measurements.

DMD now works on Windows

DMD is our tool for improving Firefox’s memory reporting.  It helps identify where new memory reporters need to be added in order to reduce the “heap-unclassified” value in about:memory.

DMD has always worked well on Linux, and moderately well on Mac (it is crashy for some people).  And it works on Android and B2G.  But it has never worked on Windows.

So I’m happy to report that DMD now does work on Windows, thanks to the excellent efforts of Catalin Iacob.  If you’re on Windows and you’ve been seeing high “heap-unclassified” values, and you’re able to build Firefox yourself, please give DMD a try.

MemShrink progress, final

I was due to write a MemShrink progress report today, but I’ve decided that after almost 2.5 years, my reserves of enthusiasm for these regular reports have been exhausted.  Sorry!

I do still plan to write posts when significant fixes relating to memory consumption are made.  (For example, when generational GC lands, you’ll hear about it here.)  I will also continue to periodically update the MemShrink “big ticket items” list.  And MemShrink meetings will continue, so MemShrink-tagged bugs will still be triaged.  And for those of you who read the weekly Platform meeting notes, I will continue to write MemShrink updates there.  So don’t despair — good things will continue to happen, but they’ll just be marginally less visible.

Libraries should permit custom allocators

Some C and C++ libraries permit the use of custom allocators, which are registered through some kind of external API.  For example, the following libraries used by Firefox provide this facility.

  • FreeType provides this via the FT_MemoryRec_ argument of the FT_New_Library() function.
  • ICU provides this via the u_setMemoryFunctions() function.
  • SQLite provides this via the sqlite3_config() function.

This gives the users of these libraries additional flexibility that can be very helpful.  For example, in Firefox we provide custom allocators that measure the size of all the live allocations done by the library;  these measurements are shown in about:memory.

In contrast, libraries that don’t allow custom allocators are very hard to account for in about:memory.  Such libraries are major contributors to the dreaded “heap-unclassified” value in about:memory.  These include Cairo and the WebRTC libraries.

Now, supporting custom allocators in a library takes some effort.  You have to be careful to always allocate in a fashion that will use the custom allocators if they have been registered.  Direct calls to vanilla allocation/free functions like malloc(), realloc(), and free() must be avoided.  For example, SpiderMonkey allows custom allocators (although Firefox doesn’t need to use that functionality), and I just fixed a handful of cases where it was accidentally using vanilla allocation/free functions.

But, it’s a very useful facility to provide, and I encourage all library writers to consider it.

MemShrink progress, week 121–124

It’s been a quiet but steady four weeks for MemShrink with 19 bugs fixed, including several leaks.

The only fix that I feel is worth highlighting is bug 918207, in which I added support for fast, coarse-grained measurement of a tab’s memory consumption.  The implemented machinery isn’t currently exposed through the UI, though there are two bugs open that will use it:  a simple one that will implement a command for the developer toolbar, and a more complex one that will implement a constantly-updating memory monitor widget for the devtools pane.

See you next time!

MemShrink progress, week 117–120

Lots of important MemShrink stuff has happened in the last 27 days:  22 bugs were fixed, and some of them were very important indeed.

Images

Timothy Nikkel fixed bug 847223, which greatly reduces peak memory consumption when loading image-heavy pages.  The combination of this fix and the fix from bug 689623, which Timothy finished earlier this year and which shipped in Firefox 24, has completely solved our longstanding memory consumption problems with image-heavy pages!  This was the #1 item on the MemShrink big ticket items list.

To give you an idea of the effect of these two fixes, I did some rough measurements on a page containing thousands of images, which are summarized in the graph below.

[Graph: Improvements in Firefox's Memory Consumption on One Image-heavy Page]

First consider Firefox 23, which had neither fix, and which is represented by the purple line in the graph.  When loading the page, physical memory consumption would jump to about 3 GB, because every image in the page was decoded (a.k.a. decompressed).  That decoded data was retained so long as the page was in the foreground.

Next, consider Firefox 24 (and 25), which had the first fix, and which is represented by the green line on the graph.  When loading the page, physical memory consumption would still jump to almost 3 GB, because the images are still decoded.  But it would soon drop down to a few hundred MB, as the decoded data for non-visible images was discarded, and stay there (with some minor variations) while scrolling around the page. So the scrolling behaviour was much improved, but the memory consumption spike still occurred, which could still cause paging, out-of-memory problems, and the like.

Finally consider Firefox 26 (currently in the Aurora channel), which has both fixes, and which is represented by the red line on the graph.  When loading the page, physical memory jumps to a few hundred MB and stays there.  Furthermore, the loading time for the page dropped from ~5 seconds to ~1 second, because the unnecessary decoding of most of the images is skipped.

These measurements were quite rough, and there was quite a bit of variation, but the magnitude of the improvement is obvious.  And all these memory consumption improvements have occurred without hurting scrolling performance.  This is fantastic work by Timothy, and great news for all Firefox users who visit image-heavy pages.

[Update: Timothy emailed me this:  "Only minor thing is that we still need to turn it on for b2g. We flipped the pref for fennec on central (it's not on aurora though). I've been delayed in testing b2g though, hopefully we can flip the pref on b2g soon. That's the last major thing before declaring it totally solved."]

[Update 2: This has hit Hacker News.]

Nuwa

Cervantes Yu landed Nuwa, which is a low-level optimization of B2G.  Quoting from the big ticket items list (where this was item #3):

Nuwa… aims to give B2G a pre-initialized template process from which every subsequent process will be forked… it greatly increases the ability for B2G processes to share unchanging data.  In one test run, this increased the number of apps that could be run simultaneously from five to nine

Nuwa is currently disabled by default, so that Cervantes can fine-tune it, but I believe it’s intended to ship with B2G version 1.3.  Fingers crossed it makes it!

Memory Reporting

I made some major simplifications to our memory reporting infrastructure, paving the way for future improvements.

First, we used to have two kinds of memory reporters:  uni-reporters (which report a single measurement) and multi-reporters (which report multiple measurements).  Multi-reporters, unsurprisingly, subsume uni-reporters, and so I got rid of uni-reporters, which simplified quite a bit of code.

Second, I removed about:compartments and folded its functionality into about:memory.  I originally created about:compartments at the height of our zombie compartment problem.  But ever since Kyle Huey made it more or less impossible for add-ons to cause zombie compartments, about:compartments has hardly been used.  I was able to fold about:compartments’ data into about:memory, so there’s no functionality loss, and this change simplified quite a bit more code.  If you visit about:compartments now you’ll get a message telling you to visit about:memory.

Third, I removed the smaps (size/rss/pss/swap) memory reporters.  These were only present on Linux, they were of questionable utility, and they complicated about:memory significantly.

Finally, I fixed a leak in about:memory.  Yeah, it was my fault.  Sorry!

Summit

The Mozilla summit is coming up!  In fact, I’m writing this report a day earlier than normal because I will be travelling to Toronto tomorrow.  Please forgive any delayed responses to comments, because I will be travelling for almost 24 hours to get there.