Category Archives: MemShrink

An even slimmer pdf.js

TL;DR: Firefox’s built-in PDF viewer is on track to benefit from additional large reductions in memory consumption when Firefox 33 is released in mid-October.

Earlier this year I made some major improvements to the memory usage and speed of pdf.js, Firefox’s built-in PDF viewer. At the time I was quite pleased with myself. I had picked all the low-hanging fruit and reached a point of diminishing returns.

Some unexpected measurements

Shortly after that, while on a plane I tried pdf.js on my Mac laptop. I was horrified to find that memory usage was much higher than on the Linux desktop machine on which I had done my optimizations. The first document I measured made Firefox’s RSS (resident set size, a.k.a. physical memory usage) peak at over 1,000 MiB. The corresponding number on my Linux desktop had been 400 MiB! What was going on?

The short answer is “canvases”. pdf.js uses HTML canvas elements to render each PDF page. These can be millions of pixels in size, and each canvas pixel takes 32 bits of RGBA data. But on Linux this data is not stored within the Firefox process. Instead, it gets passed to the X server. Firefox’s about:memory page does report the canvas usage under the “canvas-2d-pixels” and “gfx-surface-xlib” labels, but I hadn’t seen those — during my optimization work I had only measured the RSS of the Firefox process. RSS is usually the best single number to focus on, but in this case it was highly misleading.

In contrast, on Mac the canvas data is stored within the process, which is why Firefox’s RSS is so much higher on Mac. (There’s also the fact that my MacBook has a retina screen, which means the canvases used have approximately four times as many pixels as the canvases used on my Linux machine.)

Fixing the problem

It turns out that pdf.js was intentionally caching an overly generous number of canvases (20) and then unintentionally failing to dispose of them in a timely manner. This could result in hundreds of canvases being held onto unnecessarily if you scrolled quickly through a large document. On my MacBook each canvas is over 20 MiB, so the resultant memory spike could be enormous.

Happily, I was able to fix this behaviour with four tiny patches. pdf.js now caches only 10 canvases, and disposes of any excess ones immediately.
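
For illustration, here is a minimal sketch of the kind of bounded cache this implies. It isn't pdf.js's actual code (the names and structure are mine), but it shows the two ingredients: a hard cap on the number of cached canvases, and eager disposal of any excess. Zeroing a canvas's dimensions is a common way to encourage the browser to release the backing pixel memory promptly.

// Illustrative sketch of a bounded canvas cache with eager disposal.
// MAX_CACHED_CANVASES and the cache structure are hypothetical, not pdf.js's.
var MAX_CACHED_CANVASES = 10;
var cachedCanvases = [];

function getCanvas(width, height) {
  var canvas = cachedCanvases.pop() || document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  return canvas;
}

function releaseCanvas(canvas) {
  if (cachedCanvases.length < MAX_CACHED_CANVASES) {
    cachedCanvases.push(canvas);
  } else {
    // Dispose immediately: zeroing the dimensions lets the browser free the
    // backing pixel data (potentially tens of MiB) without waiting for GC.
    canvas.width = 0;
    canvas.height = 0;
  }
}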


I measured the effects of both rounds of optimization on the following three documents on my Mac.

  • TraceMonkey: This is an academic paper about the TraceMonkey JIT compiler. It’s a 14 page Latex document containing several pictures and graphs.
  • Telefónica: This is the 2009 annual report from Telefónica. It’s a 122 page full-colour document with many graphs, tables and pictures.
  • Decontamination: This is an old report about the decontamination of an industrial site in New Jersey. It’s a 226 page scan of a paper document that mostly consists of black and white typewritten text and crude graphs, with only a few pages featuring colour.

For each document, I measure the peak RSS while scrolling quickly from the first page to the last by holding down the “fn” and down keys at the same time.  Although this is a stress test, it’s not unrealistic; it’s easy to imagine people quickly scrolling through 10s or even 100s of pages of a PDF to find a particular page.

I did this three times for each document and picked the highest value. The results are shown by the following graph. The “Fx28” bars show the original peak RSS values; the “Fx29” bars show the peak RSS values after my first round of optimizations; and the “Fx33” bars show the peak RSS values after my second round of optimizations.


Comparing Firefox 28 and Firefox 33, the reductions in peak RSS for the three documents are 1.2x, 4.4x, and 7.8x. It makes sense that the relative improvements increase with the length of the documents.

Despite these major improvements, pdf.js still uses substantially more memory than native PDF viewers. So we need to keep chipping away… but it’s also worth recognizing how far we’ve come.


These patches have landed in the master pdf.js repository. They have not yet been imported into Firefox’s code, but this will happen at some point during the development cycle for Firefox 33. Firefox 33 is on track to be released in mid-October.

AWSY data is now exportable and diffable

areweslimyet.com (a.k.a. AWSY) tracks Firefox’s memory usage on a basic workload that opens lots of websites. It’s not a perfect tool — you shouldn’t consider its measurements as a reliable proxy for Firefox’s memory usage in general — but it does help detect regressions.

One thing it doesn’t support is doing diffs between separate runs. Until now, that is! Thanks to work done by Eric Rahm, it’s now possible to download the data for each snapshot done during a run. This file can then be loaded in about:memory. It’s also possible to download the data for two snapshots and diff them in about:memory. Yay! This diff workflow isn’t super slick, as it requires the downloading of two files and then the loading of them in about:memory. But it’s a lot better than manually eyeballing two sets of data in two separate AWSY tabs, which was the best we could do previously. Furthermore, AWSY and about:memory already duplicate some functionality, and this implementation avoids increasing the amount of duplication.

To do the export, select a single run (zooming in on the graph appropriately) and click on the red “[export]” link next to the appropriate snapshot, as seen in the following screenshot.

Screenshot of AWSY showing the export links

Once it has finished generating the data, the “[export]” link changes to “[download]”, and you can click on it again to do the download.

This is a first step towards improving AWSY so that it can detect memory usage regressions with much higher sensitivity than it currently has.

Generational GC has landed

Big news: late last week, generational garbage collection landed. It was backed out at first due to some test failures, but then re-landed and appears to have stuck.

This helps with performance. There are certain workloads where generational GC makes the code run much faster, and Firefox hasn’t been able to keep up with Chrome on these. For example, it has made Firefox slightly faster on the Octane benchmark, and there is apparently quite a bit of headroom for additional improvements.

Interestingly, its effect on memory usage has been small. I was hoping that the early filtering of many short-lived objects would make the tenured heap grow more slowly and thus reduce memory usage, but the addition of other structures (such as the nursery and store buffers) appears to have balanced that out.

The changes to the graphs at AWSY have been all within the noise, with the exception of the “Fresh start” and “Fresh start [+30s]” measurements in the “explicit” graph, both of which ticked up slightly. This isn’t cause for concern, however, because the corresponding “resident” graph hasn’t increased accordingly, and “resident” is the real metric of interest.

“Compacting Generational GC” is the #1 item on the current MemShrink “Big Ticket Items” list. Hopefully the “compacting” part of that, which still remains to be done, will produce some sizeable memory wins.

Nuwa has landed

A big milestone for Firefox OS was reached this week: after several bounces spread over several weeks, Nuwa finally landed and stuck.

Nuwa is a special Firefox OS process from which all other app processes are forked. (The name “Nuwa” comes from the Chinese creation goddess.) It allows lots of unchanging data (such as low-level Gecko things like XPCOM structures) to be shared among app processes, thanks to Linux’s copy-on-write forking semantics. This greatly increases the number of app processes that can be run concurrently, which is why it was the #3 item on the MemShrink “big ticket items” list.

One downside of this increased sharing is that it renders about:memory’s measurements less accurate than before, because about:memory does not know about the sharing, and so will over-report shared memory. Unfortunately, this is very difficult to fix, because about:memory’s reports are generated entirely within Firefox, whereas the sharing information is only available at the OS level. Something to be aware of.

Thanks to Cervantes Yu (Nuwa’s primary author), along with those who helped, including Thinker Li, Fabrice Desré, and Kyle Huey.

A slimmer and faster pdf.js

TL;DR: Firefox’s built-in PDF viewer is on track to gain some drastic improvements in memory consumption and speed when Firefox 29 is released in late April.

Firefox 19 introduced a built-in PDF viewer which allows PDF files to be viewed directly within Firefox. This is made possible by the pdf.js project, which implements a PDF viewer entirely in HTML and JavaScript.

This is a wonderful feature that makes the reading of PDFs on websites much less disruptive. However, pdf.js unfortunately suffers at times from high memory consumption. Enough, in fact, that it is currently the #5 item on the MemShrink project’s “big ticket items” list.

Recently, I made four improvements to pdf.js, each of which reduces its memory consumption greatly on certain kinds of PDF documents.

Image masks

The first improvement involved documents that use image masks, which are bitmaps that augment an image and dictate which pixels of the image should be drawn. Previously, the 1-bit-per-pixel (a.k.a. 1bpp) image mask data was being expanded into 32bpp RGBA form (a typed array) in a web worker, such that every RGB element was 0 and the A element was either 0 or 255. This typed array was then passed to the main thread, which copied the data into an ImageData object and then put that data to a canvas.

The change was simple: instead of expanding the bitmap in the worker, just transfer it as-is to the main thread, and expand its contents directly into the ImageData object. This removes the RGBA typed array entirely.
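
As a simplified sketch (not the exact pdf.js code; it assumes tightly packed mask bits and ignores per-row padding and the mask's decode array), the main thread can expand the transferred 1bpp mask straight into the ImageData buffer, touching only the alpha bytes:

// Simplified sketch: expand a transferred 1bpp image mask directly into an
// ImageData object on the main thread. Assumes tightly packed mask bits.
function putImageMask(ctx, maskBytes, width, height) {
  var imgData = ctx.createImageData(width, height);
  var rgba = imgData.data;                    // 32bpp RGBA, zero-initialized
  var numPixels = width * height;
  for (var i = 0; i < numPixels; i++) {
    var bit = (maskBytes[i >> 3] >> (7 - (i & 7))) & 1;
    rgba[i * 4 + 3] = bit ? 255 : 0;          // set only the alpha byte
  }
  ctx.putImageData(imgData, 0, 0);
}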

I tested two documents on my Linux desktop, using a 64-bit trunk build of Firefox. Initially, when loading and then scrolling through the documents, physical memory consumption peaked at about 650 MiB for one document and about 800 MiB for the other. (The measurements varied somewhat from run to run, but were typically within 10 or 20 MiB of those numbers.) After making the improvement, the peak for both documents was about 400 MiB.

Image copies

The second improvement involved documents that use images. This includes scanned documents, which consist purely of one image per page.

Previously, we would make five copies of the 32bpp RGBA data for every image.

  1. The web worker would decode the image’s colour data (which can be in several different colour forms: RGB, grayscale, CMYK, etc.) from the PDF file into a 24bpp RGB typed array, and the opacity (a.k.a. alpha) data into an 8bpp A array.
  2. The web worker then combined the RGB and A arrays into a new 32bpp RGBA typed array, and transferred this copy to the main thread. (This was a true transfer, not a copy, which is possible because it’s a typed array.)
  3. The main thread then created an ImageData object of the same dimensions as the typed array, and copied the typed array’s contents into it.
  4. The main thread then called putImageData() on the ImageData object. The C++ code within Gecko that implements putImageData() then created a new gfxImageSurface object and copied the data into it.
  5. Finally, the C++ code also created a Cairo surface from the gfxImageSurface.

Copies 4 and 5 were in C++ code and were both very short-lived. Copies 1, 2 and 3 were in JavaScript code and so lived longer, surviving at least until the next garbage collection.

The change was in two parts. The first part involved putting the image data to the canvas in tiny strips, rather than doing the whole image at once. This was a fairly simple change, and it allowed copies 3, 4 and 5 to be reduced to a tiny fraction of their former size (typically 100x or more smaller). Fortunately, this caused no slow-down.

The second part involved decoding the colour and opacity data directly into a 32bpp RGBA array in simple cases (e.g. when no resizing is involved), skipping the creation of the intermediate RGB and A arrays. This was fiddly, but not too difficult.
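
Here's a rough sketch of the strip technique (illustrative only; the strip height and the function name are mine). Instead of allocating an ImageData covering the whole image, a small ImageData covering a few rows is reused and put repeatedly:

// Illustrative sketch: put a full-size RGBA buffer to a canvas in small
// strips, so the intermediate ImageData (and Gecko's internal copies) stay
// tiny. ROWS_PER_STRIP is an arbitrary illustrative value.
var ROWS_PER_STRIP = 16;

function putImageInStrips(ctx, rgbaData, width, height) {
  var strip = ctx.createImageData(width, ROWS_PER_STRIP);
  for (var row = 0; row < height; row += ROWS_PER_STRIP) {
    var rows = Math.min(ROWS_PER_STRIP, height - row);
    var begin = row * width * 4;
    strip.data.set(rgbaData.subarray(begin, begin + rows * width * 4));
    ctx.putImageData(strip, 0, row, 0, 0, width, rows);
  }
}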

If you scan a US letter document at 300 dpi, you get about 8.4 million pixels, which is about 1 MiB of data. (A4 paper is slightly larger.) If you expand this 1bpp data to 32bpp, you get about 32 MiB per page. So if you reduce five copies of this data to one, you avoid about 128 MiB of allocations per page.

Black and white scanned documents

The third improvement also involved images. Avoiding unnecessary RGBA copies seemed like a big win, but when I scrolled through large scanned documents the memory consumption still grew quickly as I scrolled through more pages. I eventually realized that although four of those five copies had been short-lived, one of them was very long-lived. More specifically, once you scroll past a page, its RGBA data is held onto until all pages that are subsequently scrolled past have been decoded. (The memory is eventually freed; it just takes longer than we’d like.) And fixing it is not easy, because it involves page-prioritization code that isn’t easy to change without hurting other aspects of pdf.js’s performance.

However, I was able to optimize the common case of simple (e.g. unmasked, with no resizing) black and white images. Instead of expanding the 1bpp image data to 32bpp RGBA form in the web worker and passing that to the main thread, the code now just passes the 1bpp form directly. (Yep, that’s the same optimization that I used for the image masks.) The main thread can now handle both forms, and for the 1bpp form the expansion to the 32bpp form also only happens in tiny strips.

I used a 226 page scanned document to test this. At about 32 MiB per page, that’s over 7,200 MiB of pixel data when expanded to 32bpp RGBA form. And sure enough, prior to my change, scrolling quickly through the whole document caused Firefox’s physical memory consumption to reach about 7,800 MiB. With the fix applied, this number reduced to about 700 MiB. Furthermore, the time taken to render the final page dropped from about 200 seconds to about 25 seconds. Big wins!

The same optimization could be done for some non-black and white images (though the improvement will be smaller). But all the examples from bug reports were black and white, so that’s all I’ve done for now.


The fourth and final improvement was unrelated to images. It involved the parsing of the PDF files. The parsing code reads files one byte at a time, and constructs lots of JavaScript strings by appending one character at a time. SpiderMonkey’s string implementation has an optimization that handles this kind of string construction efficiently, but the optimization doesn’t kick in until the strings have reached a certain length; on 64-bit platforms, this length is 24 characters. Unfortunately, many of the strings constructed during PDF parsing are shorter than this, so in order to create a string of length 20, for example, we would also create strings of length 1, 2, 3, …, 19.

It’s possible to change the threshold at which the optimization applies, but this would hurt the performance of some other workloads. The easier thing to do was to modify pdf.js itself. My change was to build up strings by appending single-char strings to an array, and then using Array.join to concatenate them together once the token’s end is reached. This works because JavaScript arrays are mutable (unlike strings which are immutable) and Array.join is efficient because it knows exactly how long the final string will be.
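
In essence, the change was from the first pattern below to the second. (This is a simplified illustration; nextChar() and atTokenEnd() stand in for the parser's actual byte-reading code.)

// Before (simplified): each += creates a new intermediate string of length
// 1, 2, 3, ..., because the rope optimization doesn't apply to short strings.
var token = "";
while (!atTokenEnd()) {
  token += nextChar();
}

// After (simplified): collect single-character strings and join them once,
// which allocates the final string exactly once at its full length.
var chars = [];
while (!atTokenEnd()) {
  chars.push(nextChar());
}
var token = chars.join("");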

On a 4,155 page PDF, this change reduced the peak memory consumption during file loading from about 1130 MiB to about 800 MiB.


The fact that I was able to make a number of large improvements in a short time indicates that pdf.js’s memory consumption has not previously been closely looked at. I think the main reason for this is that Firefox currently doesn’t have much in the way of tools for profiling the memory consumption of JavaScript code (though the devtools team is working right now to rectify this). So I will explain the tricks I used to find the places that needed optimization.

Choosing test cases

First I had to choose some test cases. Fortunately, this was easy, because we had numerous bug reports about high memory consumption which included test files. So I just used them.

Debugging print statements, part 1

For each test case, I looked first at about:memory. There were some very large “objects/malloc-heap/elements/non-asm.js” entries, which indicate that lots of memory is being used by JavaScript array elements. And looking at pdf.js code, typed arrays are used heavily, especially Uint8Array. The question is then: which typed arrays are taking up space?

To answer this question, I introduced the following new function.

function newUint8Array(length, context) {
  dump("newUint8Array(" + context + "): " + length + "\n");
  return new Uint8Array(length);
}

I then replaced every instance like this:

var a = new Uint8Array(n);

with something like this:

var a = newUint8Array(n, 1);

I used a different second argument for each instance. With this in place, when the code ran, I got a line printed for every allocation, identifying its length and location. With a small amount of post-processing, it was easy to identify which parts of the code were allocating large typed arrays. (This technique provides cumulative allocation measurements, not live data measurements, because it doesn’t know when these arrays are freed. Nonetheless, it was good enough.) I used this data in the first three optimizations.
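
For example, post-processing along these lines (a hypothetical Node.js script, not part of pdf.js) is enough to total the bytes allocated at each instrumented call site:

// Hypothetical post-processing: sum the allocation sizes per call site from
// dump output lines of the form "newUint8Array(3): 1048576".
var fs = require("fs");

var totals = {};
fs.readFileSync("dump.txt", "utf8").split("\n").forEach(function (line) {
  var m = /^newUint8Array\((\d+)\): (\d+)$/.exec(line);
  if (m) {
    totals[m[1]] = (totals[m[1]] || 0) + Number(m[2]);
  }
});
Object.keys(totals).forEach(function (site) {
  console.log("site " + site + ": " + totals[site] + " bytes");
});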

Debugging print statements, part 2

Another trick involved modifying jemalloc, the heap allocator that Firefox uses. I instrumented jemalloc’s huge_malloc() function, which is responsible for allocations greater than 1 MiB. I printed the sizes of allocations, and at one point I also used gdb to break on every call to huge_malloc(). It was by doing this that I was able to work out that we were making five copies of the RGBA pixel data for each image. In particular, I wouldn’t have known about the C++ copies of that data if I hadn’t done this.

Notable strings

Finally, while looking again at about:memory, I saw some entries like the following, which are found by the “notable strings” detection.

> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=10, copies=6174, "http://sta")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=11, copies=6174, "http://stac")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=12, copies=6174, "http://stack")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=13, copies=6174, "http://stacks")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=14, copies=6174, "http://stacks.")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=15, copies=6174, "http://stacks.m")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=16, copies=6174, "http://stacks.ma")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=17, copies=6174, "http://stacks.mat")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=18, copies=6174, "http://stacks.math")/gc-heap

It doesn’t take much imagination to realize that strings were being built up one character at a time. This looked like the kind of thing that would happen during tokenization, so I found a file called parser.js and looked there. I knew about SpiderMonkey’s string concatenation optimization, so I asked on IRC why it might not be kicking in, and Shu-yu Guo was able to tell me about the length threshold. Once I knew that, switching to Array.join wasn’t difficult.

What about Chrome’s heap profiler?

I’ve heard good things in the past about Chrome/Chromium’s heap profiling tools. And because pdf.js is just HTML and JavaScript, you can run it in other modern browsers. So I tried using Chromium’s tools, but the results were very disappointing.

Remember the 226 page scanned document I mentioned earlier, where over 7,200 MiB of pixel data was created? I loaded that document into Chromium and used the “Take Heap Snapshot” tool, which gave the following snapshot.

Heap Snapshot from Chromium

At the top left, it claims that the heap was just over 50 MiB in size. Near the bottom, it claims that 225 Uint8Array objects had a “shallow” size of 19,608 bytes, and a “retained” size of 26,840 bytes. This seemed bizarre, so I double-checked. Sure enough, the operating system (via top) reported that the relevant chromium-browser process was using over 8 GiB of physical memory at this point.

So why the tiny measurements? I suspect what’s happening is that typed arrays are represented by a small header struct which is allocated on the GC heap, and it points to the (much larger) element data which is allocated on the malloc heap. So if the snapshot is just measuring the GC heap, in this case it’s accurate but not useful. (I’d love to hear if anyone can confirm or refute this hypothesis.) I also tried the “Record Heap Allocations” tool but it gave much the same results.


These optimizations have landed in the master pdf.js repository, and were imported into Firefox 29, which is currently on the Aurora branch, and is on track to be released on April 29.

The optimizations are also on track to be imported into the Firefox OS 1.3 and 1.3T branches. I had hoped to show that some PDFs that were previously unloadable on Firefox OS would now be loadable. Unfortunately, I am unable to load even the simplest PDFs on my Buri (a.k.a. Alcatel OneTouch), because the PDF viewer app appears to consistently run out of gralloc memory just before the first page is displayed. Ben Kelly suggested that Async pan zoom (APZ) might be responsible, but disabling it didn’t help. If anybody knows more about this please contact me.

Finally, I’ve fixed most of the major memory consumption problems with the PDFs that I’m aware of. If you know of other PDFs that still cause pdf.js to consume large amounts of memory, please let me know. Thanks.

A big step towards generational and compacting GC

People frequently ask me for status updates on generational GC, and I usually say I’ll tell them when something notable happens. Well, something notable just happened: exact rooting landed.

What is exact rooting? In order to support generational and/or compacting GC, you need to be able to move GC-allocated things such as objects around. This means you can’t have raw C++ pointers to any objects that might move; instead, you need some kind of indirect pointer that can be updated when necessary.

Unfortunately, both the JS engine and Gecko have a lot of pointers to GC-allocated things. The process of checking and converting them has been the main part of a task called “exact rooting”, and that’s what just finished. This has required an enormous amount of what is essentially very tedious work. Jim Blandy summarized it nicely, as follows.

I’ve never heard of a major project escaping from conservative GC once it had entered that state of sin; nor have I heard of anyone implementing a moving collector after starting with a non-moving collector. So, doing *both* is impressive. I hope it pays off big!

Major kudos to Terrence Cole, Steve Fink, Jon Coppeard, Brian Hackett, and the small army of other helpers who did this. Now that they’ve finished eating this gigantic serving of vegetables, they can move on to dessert, i.e. making the GC generational and compacting.

System-wide memory measurement for Firefox OS

Have you ever wondered exactly how all the physical memory in a Firefox OS device is used?   Wonder no more.  I just landed a system-wide memory reporter which works on any Firefox product running on a Linux system.  This includes desktop Firefox builds on Linux, Firefox for Android, and Firefox OS.

This memory reporter is a bit different to the existing ones, which work entirely within Mozilla processes.  The new reporter provides measurements for the entire system, including every user-space process (Mozilla or non-Mozilla) that is running.  It’s aimed primarily at profiling Firefox OS devices, because we have full control over the code running on those devices, and so it’s there that a system-wide view is most useful.

Here is some example output from a GeeksPhone Keon.

Other Measurements 
397.24 MB (100.0%) -- mem
├──215.41 MB (54.23%) ── free
├──105.72 MB (26.61%) -- processes
│  ├───57.59 MB (14.50%) -- process(/system/b2g/b2g, pid=709)
│  │   ├──42.29 MB (10.65%) -- anonymous
│  │   │  ├──42.25 MB (10.63%) -- outside-brk
│  │   │  │  ├──41.94 MB (10.56%) ── [rw-p] [69]
│  │   │  │  └───0.31 MB (00.08%) ++ (2 tiny)
│  │   │  └───0.05 MB (00.01%) ── brk-heap/[rw-p]
│  │   ├──13.03 MB (03.28%) -- shared-libraries
│  │   │  ├───8.39 MB (02.11%) --
│  │   │  │   ├──6.05 MB (01.52%) ── [r-xp]
│  │   │  │   └──2.34 MB (00.59%) ── [rw-p]
│  │   │  └───4.64 MB (01.17%) ++ (69 tiny)
│  │   └───2.27 MB (00.57%) ++ (2 tiny)
│  ├───21.73 MB (05.47%) -- process(/system/b2g/plugin-container, pid=756)
│  │   ├──12.49 MB (03.14%) -- anonymous
│  │   │  ├──12.48 MB (03.14%) -- outside-brk
│  │   │  │  ├──12.41 MB (03.12%) ── [rw-p] [30]
│  │   │  │  └───0.07 MB (00.02%) ++ (2 tiny)
│  │   │  └───0.02 MB (00.00%) ── brk-heap/[rw-p]
│  │   ├───8.88 MB (02.23%) -- shared-libraries
│  │   │   ├──7.33 MB (01.85%) --
│  │   │   │  ├──4.99 MB (01.26%) ── [r-xp]
│  │   │   │  └──2.34 MB (00.59%) ── [rw-p]
│  │   │   └──1.54 MB (00.39%) ++ (50 tiny)
│  │   └───0.36 MB (00.09%) ++ (2 tiny)
│  ├───14.08 MB (03.54%) -- process(/system/b2g/plugin-container, pid=836)
│  │   ├───7.53 MB (01.89%) -- shared-libraries
│  │   │   ├──6.02 MB (01.52%) ++
│  │   │   └──1.51 MB (00.38%) ++ (47 tiny)
│  │   ├───6.24 MB (01.57%) -- anonymous
│  │   │   ├──6.23 MB (01.57%) -- outside-brk
│  │   │   │  ├──6.23 MB (01.57%) ── [rw-p] [22]
│  │   │   │  └──0.00 MB (00.00%) ── [r--p]
│  │   │   └──0.01 MB (00.00%) ── brk-heap/[rw-p]
│  │   └───0.31 MB (00.08%) ++ (2 tiny)
│  └───12.32 MB (03.10%) ++ (23 tiny)
└───76.11 MB (19.16%) ── other

The data is obtained entirely from the operating system, specifically from /proc/meminfo and the /proc/<pid>/smaps files, which are files provided by the Linux kernel specifically for measuring memory consumption.

I wish that the mem entry at the top was the amount of physical memory available. Unfortunately there is no way to get that on a Linux system, and so it’s instead the MemTotal value from /proc/meminfo, which is “Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code)”.  And if you’re wondering about the exact meaning of the other entries, as usual if you hover the cursor over an entry in about:memory you’ll get a tool-tip explaining what it means.

The measurements given for each process are the PSS (proportional set size) measurements.  These attribute any shared memory equally among all processes that share it, and so PSS is the only measurement that can be sensibly summed across processes (unlike “Size” or “RSS”, for example).
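
As a rough illustration of where these numbers come from (a sketch only; the real reporter is C++ code inside Gecko and also builds the per-mapping tree shown above), a process's total PSS is just the sum of the Pss fields in its smaps file:

// Illustrative Node.js sketch: total a process's PSS by summing the "Pss:"
// fields in /proc/<pid>/smaps. Values in smaps are reported in kB.
var fs = require("fs");

function totalPssMiB(pid) {
  var smaps = fs.readFileSync("/proc/" + pid + "/smaps", "utf8");
  var totalKiB = 0;
  smaps.split("\n").forEach(function (line) {
    var m = /^Pss:\s+(\d+) kB/.exec(line);
    if (m) {
      totalKiB += Number(m[1]);
    }
  });
  return totalKiB / 1024;
}

console.log(totalPssMiB(process.pid).toFixed(2) + " MiB");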

For each process there is a wealth of detail about static code and data.  (The above example only shows a tiny fraction of it, because a number of the sub-trees are collapsed.  If you were viewing it in about:memory, you could expand and collapse sub-trees to your heart’s content.)  Unfortunately, there is little information about anonymous mappings, which constitute much of the non-static memory consumption.  I have some patches that will add an extra level of detail there, distinguishing major regions such as the jemalloc heap, the JS GC heap, and JS JIT code.  For more detail than that, the existing per-process memory reports in about:memory can be consulted.  Unfortunately the new system-wide reporter cannot be sensibly combined with the existing per-process memory reporters because the latter are unaware of implicit sharing between processes.  (And note that the amount of implicit sharing is increased significantly by the new Nuwa process.)

Because this works with our existing memory reporting infrastructure, anyone already using the script with Firefox OS will automatically get these reports along with all the usual ones once they update their source code, and the system-wide reports can be loaded and viewed in about:memory as usual. On Firefox and Firefox for Android, you’ll need to set the memory.system_memory_reporter flag in about:config to enable it.

My hope is that this reporter will supplant most or all of the existing tools that are commonly used to understand system-wide memory consumption on Firefox OS devices, such as ps, top and procrank.  And there are certainly other interesting OS-level measurements available that are not currently obtained. For example, Jed Davis has plans to measure the pmem subsystem.  Please file a bug or email me if you have other suggestions for adding such measurements.

DMD now works on Windows

DMD is our tool for improving Firefox’s memory reporting.  It helps identify where new memory reporters need to be added in order to reduce the “heap-unclassified” value in about:memory.

DMD has always worked well on Linux, and moderately well on Mac (it is crashy for some people).  And it works on Android and B2G.  But it has never worked on Windows.

So I’m happy to report that DMD now does work on Windows, thanks to the excellent efforts of Catalin Iacob.  If you’re on Windows and you’ve been seeing high “heap-unclassified” values, and you’re able to build Firefox yourself, please give DMD a try.

MemShrink progress, final

I was due to write a MemShrink progress report today, but I’ve decided that after almost 2.5 years, my reserves of enthusiasm for these regular reports have been exhausted.  Sorry!

I do still plan to write posts when significant fixes relating to memory consumption are made.  (For example, when generational GC lands, you’ll hear about it here.)  I will also continue to periodically update the MemShrink “big ticket items” list.  And MemShrink meetings will continue, so MemShrink-tagged bugs will still be triaged.  And for those of you who read the weekly Platform meeting notes, I will continue to write MemShrink updates there.  So don’t despair — good things will continue to happen, but they’ll just be marginally less visible.

Libraries should permit custom allocators

Some C and C++ libraries permit the use of custom allocators, which are registered through some kind of external API.  For example, the following libraries used by Firefox provide this facility.

  • FreeType provides this via the FT_MemoryRec_ argument of the FT_New_Library() function.
  • ICU provides this via the u_setMemoryFunctions() function.
  • SQLite provides this via the sqlite3_config() function.

This gives the users of these libraries additional flexibility that can be very helpful.  For example, in Firefox we provide custom allocators that measure the size of all the live allocations done by the library;  these measurements are shown in about:memory.

In contrast, libraries that don’t allow custom allocators are very hard to account for in about:memory.  Such libraries are major contributors to the dreaded “heap-unclassified” value in about:memory.  These include Cairo and the WebRTC libraries.

Now, supporting custom allocators in a library takes some effort.  You have to be careful to always allocate in a fashion that will use the custom allocators if they have been registered.  Direct calls to vanilla allocation/free functions like malloc(), realloc(), and free() must be avoided.  For example, SpiderMonkey allows custom allocators (although Firefox doesn’t need to use that functionality), and I just fixed a handful of cases where it was accidentally using vanilla allocation/free functions.

But, it’s a very useful facility to provide, and I encourage all library writers to consider it.