A story about Brendan Eich

I attended a Mozilla work week a couple of years ago at Mozilla’s Mountain View office. There was a dinner event in San Francisco and, by chance, I ended up in Andreas Gal’s car, along with Brendan Eich and someone else (who, alas, I cannot remember now).

The destination was the California Academy of Sciences, a science museum in San Francisco, which was about a 45 minute drive away. Off we headed. Unfortunately, we headed off without closely checking where our destination was, and we somehow got the Academy of Sciences confused with the Exploratorium, another science museum in San Francisco. When we arrived and found it closed, we had to regroup.

Andreas confidently interrogated his car’s GPS unit and procured a new address that fortunately wasn’t too far away. Fifteen minutes later, we found ourselves in a residential area, outside a building that obviously wasn’t going to be hosting a dinner for several dozen MoCo employees.

Andreas again consulted his GPS unit for a new address. Unfortunately, this one
was on the far side of the city. Undeterred, we crawled through early-evening
traffic in the busiest parts of San Francisco — I’m pretty sure we actually
passed Union Square — to another address. Again, as soon as we laid eyes upon it, it clearly wasn’t the right destination.

It turns out there are several institutions in San Francisco with the words
“Academy” and “Science” or “Sciences” in their names, and we were doing a tour of all the wrong ones. On our fourth roll of the dice, Andreas found what ultimately was the correct address, and we crawled back to our final destination, which turned out — groan — to be not that far from the Exploratorium. We staggered in, two hours after we started, eliciting several comments of “what on earth took you guys so long?”

I remember being frustrated at the time — Andreas and Brendan were locals!
They should have known better. But now…

I’ve worked for Mozilla for over five years, but I visit California infrequently, and I’ve only had a chance to talk with Brendan in person a few times. The only
thing I remember from the conversation during the car trip is that at one point we were talking about the US economy and Brendan made a confident proclamation about the bond market — I can’t even remember what it was — that I wasn’t sure I agreed with, but also wasn’t sure I could explain why I disagreed. It’s funny the details that stick.

This conversation was with Brendan the person — not Brendan the CTO, not
Brendan the inventor of JavaScript, not my boss’s boss’s boss, and not somebody who made a donation. Just Brendan, a person who knew a lot of stuff, had some interesting experiences and some strong opinions, and was good to chat to. It’s a small story, but it’s one I’ll remember.

Comments on this post are open, but be warned that I will delete without hesitation any comments that re-hash the CEO controversy of the past two weeks, or that I find rude or objectionable in any way. If you want to discuss the controversy, or be rude or objectionable, there are many other places on the web that you can do so.

Generational GC has landed

Big news: late last week, generational garbage collection landed. It was backed out at first due to some test failures, but then re-landed and appears to have stuck.

This helps with performance. There are certain workloads where generational GC makes the code run much faster, and Firefox hasn’t been able to keep up with Chrome on these. For example, it has made Firefox slightly faster on the Octane benchmark, and there is apparently quite a bit of headroom for additional improvements.

Interestingly, its effect on memory usage has been small. I was hoping that the early filtering of many short-lived objects would make the tenured heap grow more slowly and thus reduce memory usage, but the addition of other structures (such as the nursery and store buffers) appears to have balanced that out.

The changes to the graphs at AWSY have all been within the noise, with the exception of the “Fresh start” and “Fresh start [+30s]” measurements in the “explicit” graph, both of which ticked up slightly. This isn’t cause for concern, however, because the corresponding “resident” graph hasn’t increased accordingly, and “resident” is the real metric of interest.

“Compacting Generational GC” is the #1 item on the current MemShrink “Big Ticket Items” list. Hopefully the “compacting” part of that, which still remains to be done, will produce some sizeable memory wins.

Nuwa has landed

A big milestone for Firefox OS was reached this week: after several bounces spread over several weeks, Nuwa finally landed and stuck.

Nuwa is a special Firefox OS process from which all other app processes are forked. (The name “Nuwa” comes from the Chinese creation goddess.) It allows lots of unchanging data (such as low-level Gecko things like XPCOM structures) to be shared among app processes, thanks to Linux’s copy-on-write forking semantics. This greatly increases the number of app processes that can be run concurrently, which is why it was the #3 item on the MemShrink “big ticket items” list.

One downside of this increased sharing is that it renders about:memory’s measurements less accurate than before, because about:memory does not know about the sharing, and so will over-report shared memory. Unfortunately, this is very difficult to fix, because about:memory’s reports are generated entirely within Firefox, whereas the sharing information is only available at the OS level. Something to be aware of.

Thanks to Cervantes Yu (Nuwa’s primary author), along with those who helped, including Thinker Li, Fabrice Desré, and Kyle Huey.

A slimmer and faster pdf.js

TL;DR: Firefox’s built-in PDF viewer is on track to gain some drastic improvements in memory consumption and speed when Firefox 29 is released in late April.

Firefox 19 introduced a built-in PDF viewer which allows PDF files to be viewed directly within Firefox. This is made possible by the pdf.js project, which implements a PDF viewer entirely in HTML and JavaScript.

This is a wonderful feature that makes the reading of PDFs on websites much less disruptive. However, pdf.js unfortunately suffers at times from high memory consumption. Enough, in fact, that it is currently the #5 item on the MemShrink project’s “big ticket items” list.

Recently, I made four improvements to pdf.js, each of which reduces its memory consumption greatly on certain kinds of PDF documents.

Image masks

The first improvement involved documents that use image masks, which are bitmaps that augment an image and dictate which pixels of the image should be drawn. Previously, the 1-bit-per-pixel (a.k.a. 1bpp) image mask data was being expanded into 32bpp RGBA form (a typed array) in a web worker, such that every RGB element was 0 and the A element was either 0 or 255. This typed array was then passed to the main thread, which copied the data into an ImageData object and then put that data to a canvas.

The change was simple: instead of expanding the bitmap in the worker, just transfer it as-is to the main thread, and expand its contents directly into the ImageData object. This removes the RGBA typed array entirely.
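To illustrate the idea, here is a minimal sketch of a main-thread expansion routine. It is not the actual pdf.js code — the function name is made up, and it assumes the mask bits are packed contiguously, most significant bit first, with no row padding — but it shows how a transferred 1bpp mask can be expanded directly into an ImageData object, with no intermediate RGBA typed array.

function putImageMask(ctx, maskBytes, width, height) {
  // maskBytes is the 1bpp mask transferred from the worker, one bit per pixel.
  var imgData = ctx.createImageData(width, height);
  var dest = imgData.data;   // Uint8ClampedArray, 4 bytes per pixel, zero-filled
  var destPos = 0;
  for (var i = 0; i < width * height; i++) {
    var bit = (maskBytes[i >> 3] >> (7 - (i & 7))) & 1;
    // The RGB channels stay 0; only the alpha channel needs to be written.
    dest[destPos + 3] = bit ? 255 : 0;
    destPos += 4;
  }
  ctx.putImageData(imgData, 0, 0);
}

On the worker side, listing the mask’s ArrayBuffer in postMessage’s transfer list is enough to move it to the main thread without copying it.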

I tested two documents on my Linux desktop, using a 64-bit trunk build of Firefox. Initially, when loading and then scrolling through the documents, physical memory consumption peaked at about 650 MiB for one document and about 800 MiB for the other. (The measurements varied somewhat from run to run, but were typically within 10 or 20 MiB of those numbers.) After making the improvement, the peak for both documents was about 400 MiB.

Image copies

The second improvement involved documents that use images. This includes scanned documents, which consist purely of one image per page.

Previously, we would make five copies of the 32bpp RGBA data for every image.

  1. The web worker would decode the image’s colour data (which can be in several different colour forms: RGB, grayscale, CMYK, etc.) from the PDF file into a 24bpp RGB typed array, and the opacity (a.k.a. alpha) data into an 8bpp A array.
  2. The web worker then combined the RGB and A arrays into a new 32bpp RGBA typed array, and transferred it to the main thread. (This was a true transfer, not a copy, which is possible because it’s a typed array.)
  3. The main thread then created an ImageData object of the same dimensions as the typed array, and copied the typed array’s contents into it.
  4. The main thread then called putImageData() on the ImageData object. The C++ code within Gecko that implements putImageData() then created a new gfxImageSurface object and copied the data into it.
  5. Finally, the C++ code also created a Cairo surface from the gfxImageSurface.

Copies 4 and 5 were in C++ code and were both very short-lived. Copies 1, 2 and 3 were in JavaScript code and so lived for longer, at least until the next garbage collection occurred.

The change was in two parts. The first part involved putting the image data to the canvas in tiny strips, rather than doing the whole image at once. This was a fairly simple change, and it allowed copies 3, 4 and 5 to be reduced to a tiny fraction of their former size (typically 100x or more smaller). Fortunately, this caused no slow-down.
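As a rough illustration of this strip-based approach, here is a sketch (not the real pdf.js code; the names and the strip height are illustrative) that puts a full-page RGBA buffer to a canvas in small horizontal strips, so that only a strip-sized ImageData object is ever alive at once:

var ROWS_PER_STRIP = 16;   // illustrative value

function putImageInStrips(ctx, rgbaBytes, width, height) {
  // rgbaBytes holds width * height * 4 bytes of 32bpp RGBA data.
  for (var y = 0; y < height; y += ROWS_PER_STRIP) {
    var rows = Math.min(ROWS_PER_STRIP, height - y);
    var strip = ctx.createImageData(width, rows);
    var begin = y * width * 4;
    strip.data.set(rgbaBytes.subarray(begin, begin + rows * width * 4));
    ctx.putImageData(strip, 0, y);   // only this small strip gets copied by Gecko
  }
}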

The second part involved decoding the colour and opacity data directly into a 32bpp RGBA array in simple cases (e.g. when no resizing is involved), skipping the creation of the intermediate RGB and A arrays. This was fiddly, but not too difficult.
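For the second part, the simplest case looks something like the following sketch (again illustrative rather than the actual code): 8-bit grayscale samples with no separate opacity data are written straight into a 32bpp RGBA buffer, skipping the intermediate 24bpp RGB and 8bpp A arrays.

function grayToRGBA(gray) {
  // gray is a Uint8Array with one decoded sample per pixel; assume full opacity.
  var rgba = new Uint8Array(gray.length * 4);
  for (var i = 0, j = 0; i < gray.length; i++, j += 4) {
    var g = gray[i];
    rgba[j] = rgba[j + 1] = rgba[j + 2] = g;   // R, G, B
    rgba[j + 3] = 255;                         // A
  }
  return rgba;
}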

If you scan a US letter document at 300 dpi, you get about 8.4 million pixels, which at 1 bit per pixel is about 1 MiB of data. (A4 paper is slightly larger.) If you expand this 1bpp data to 32bpp, you get about 32 MiB per page. So if you reduce five copies of this data to one, you avoid about 128 MiB of allocations per page.

Black and white scanned documents

The third improvement also involved images. Avoiding unnecessary RGBA copies seemed like a big win, but memory consumption still grew quickly as I scrolled through large scanned documents. I eventually realized that although four of those five copies had been short-lived, one of them was very long-lived. More specifically, once you scroll past a page, its RGBA data is held onto until all pages that are subsequently scrolled past have been decoded. (The memory is eventually freed; it just takes longer than we’d like.) And fixing this is hard, because it involves page-prioritization code that can’t easily be changed without hurting other aspects of pdf.js’s performance.

However, I was able to optimize the common case of simple (e.g. unmasked, with no resizing) black and white images. Instead of expanding the 1bpp image data to 32bpp RGBA form in the web worker and passing that to the main thread, the code now just passes the 1bpp form directly. (Yep, that’s the same optimization that I used for the image masks.) The main thread can now handle both forms, and for the 1bpp form the expansion to the 32bpp form also only happens in tiny strips.

I used a 226 page scanned document to test this. At about 34 MiB per page, that’s over 7,200 MiB of pixel data when expanded to 32bpp RGBA form. And sure enough, prior to my change, scrolling quickly through the whole document caused Firefox’s physical memory consumption to reach about 7,800 MiB. With the fix applied, this number reduced to about 700 MiB. Furthermore, the time taken to render the final page dropped from about 200 seconds to about 25 seconds. Big wins!

The same optimization could be done for some images that aren’t black and white (though the improvement would be smaller). But all the examples from bug reports were black and white, so that’s all I’ve done for now.

Parsing

The fourth and final improvement was unrelated to images. It involved the parsing of the PDF files. The parsing code reads files one byte at a time, and constructs lots of JavaScript strings by appending one character at a time. SpiderMonkey’s string implementation has an optimization that handles this kind of string construction efficiently, but the optimization doesn’t kick in until the strings have reached a certain length; on 64-bit platforms, this length is 24 characters. Unfortunately, many of the strings constructed during PDF parsing are shorter than this, so in order to create a string of length 20, for example, we would also create strings of length 1, 2, 3, …, 19.

It’s possible to change the threshold at which the optimization applies, but this would hurt the performance of some other workloads. The easier thing to do was to modify pdf.js itself. My change was to build up strings by appending single-char strings to an array, and then using Array.join to concatenate them together once the token’s end is reached. This works because JavaScript arrays are mutable (unlike strings which are immutable) and Array.join is efficient because it knows exactly how long the final string will be.
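A hedged sketch of the approach (the names are illustrative, not the actual pdf.js tokenizer, and it assumes one byte maps to one character):

function readToken(bytes, pos, isDelimiter) {
  var chars = [];
  while (pos < bytes.length && !isDelimiter(bytes[pos])) {
    chars.push(String.fromCharCode(bytes[pos++]));
  }
  // One allocation of exactly the right length, instead of a chain of
  // intermediate strings of length 1, 2, 3, ...
  return { token: chars.join(''), nextPos: pos };
}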

On a 4,155 page PDF, this change reduced the peak memory consumption during file loading from about 1130 MiB to about 800 MiB.

Profiling

The fact that I was able to make a number of large improvements in a short time indicates that pdf.js’s memory consumption has not previously been closely looked at. I think the main reason for this is that Firefox currently doesn’t have much in the way of tools for profiling the memory consumption of JavaScript code (though the devtools team is working right now to rectify this). So I will explain the tricks I used to find the places that needed optimization.

Choosing test cases

First I had to choose some test cases. Fortunately, this was easy, because we had numerous bug reports about high memory consumption which included test files. So I just used them.

Debugging print statements, part 1

For each test case, I looked first at about:memory. There were some very large “objects/malloc-heap/elements/non-asm.js” entries, which indicate that lots of memory is being used by JavaScript array elements. And looking at the pdf.js code, I could see that typed arrays are used heavily, especially Uint8Array. The question is then: which typed arrays are taking up space?

To answer this question, I introduced the following new function.

function newUint8Array(length, context) {
  dump("newUint8Array(" + context + "): " + length + "\n");
  return new Uint8Array(length);
}

I then replaced every instance like this:

var a = new Uint8Array(n);

with something like this:

var a = newUint8Array(n, 1);

I used a different second argument for each instance. With this in place, when the code ran, I got a line printed for every allocation, identifying its length and location. With a small amount of post-processing, it was easy to identify which parts of the code were allocating large typed arrays. (This technique provides cumulative allocation measurements, not live data measurements, because it doesn’t know when these arrays are freed. Nonetheless, it was good enough.) I used this data in the first three optimizations.
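The post-processing can be as simple as a small script along the following lines. This is just a sketch, not the script I describe above: it assumes Node.js and that the dump() output was captured to a log file whose name is passed on the command line.

var fs = require('fs');

var totals = {};
var re = /^newUint8Array\((\d+)\): (\d+)$/;
fs.readFileSync(process.argv[2], 'utf8').split('\n').forEach(function (line) {
  var m = re.exec(line);
  if (m) {
    // m[1] is the call-site number, m[2] is the allocation length in bytes.
    totals[m[1]] = (totals[m[1]] || 0) + Number(m[2]);
  }
});
Object.keys(totals).sort(function (a, b) { return totals[b] - totals[a]; })
  .forEach(function (site) {
    console.log('site ' + site + ': ' + totals[site] + ' bytes allocated');
  });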

Debugging print statements, part 2

Another trick involved modifying jemalloc, the heap allocator that Firefox uses. I instrumented jemalloc’s huge_malloc() function, which is responsible for allocations greater than 1 MiB. I printed the sizes of allocations, and at one point I also used gdb to break on every call to huge_malloc(). It was by doing this that I was able to work out that we were making five copies of the RGBA pixel data for each image. In particular, I wouldn’t have known about the C++ copies of that data if I hadn’t done this.

Notable strings

Finally, while looking again at about:memory, I saw some entries like the following, which are found by the “notable strings” detection.

> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=10, copies=6174, "http://sta")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=11, copies=6174, "http://stac")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=12, copies=6174, "http://stack")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=13, copies=6174, "http://stacks")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=14, copies=6174, "http://stacks.")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=15, copies=6174, "http://stacks.m")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=16, copies=6174, "http://stacks.ma")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=17, copies=6174, "http://stacks.mat")/gc-heap
> │    │  │  │  │  │  ├────0.38 MB (00.03%) ── string(length=18, copies=6174, "http://stacks.math")/gc-heap

It doesn’t take much imagination to realize that strings were being built up one character at a time. This looked like the kind of thing that would happen during tokenization, and I found a file called parser.js and looked there. I knew about SpiderMonkey’s string concatenation optimization, so I asked on IRC why it might not be kicking in, and Shu-yu Guo was able to tell me about the length threshold. Once I knew that, switching to use Array.join wasn’t difficult.

What about Chrome’s heap profiler?

I’ve heard good things in the past about Chrome/Chromium’s heap profiling tools. And because pdf.js is just HTML and JavaScript, you can run it in other modern browsers. So I tried using Chromium’s tools, but the results were very disappointing.

Remember the 226 page scanned document I mentioned earlier, where over 7,200 MiB of pixel data was created? I loaded that document into Chromium and used the “Take Heap Snapshot” tool, which gave the following snapshot.

Heap Snapshot from Chromium

At the top left, it claims that the heap was just over 50 MiB in size. Near the bottom, it claims that 225 Uint8Array objects had a “shallow” size of 19,608 bytes, and a “retained” size of 26,840 bytes. This seemed bizarre, so I double-checked. Sure enough, the operating system (via top) reported that the relevant chromium-browser process was using over 8 GiB of physical memory at this point.

So why the tiny measurements? I suspect what’s happening is that typed arrays are represented by a small header struct which is allocated on the GC heap, and it points to the (much larger) element data which is allocated on the malloc heap. So if the snapshot is just measuring the GC heap, in this case it’s accurate but not useful. (I’d love to hear if anyone can confirm or refute this hypothesis.) I also tried the “Record Heap Allocations” tool but it gave much the same results.

Status

These optimizations have landed in the master pdf.js repository, and were imported into Firefox 29, which is currently on the Aurora branch, and is on track to be released on April 29.

The optimizations are also on track to be imported into the Firefox OS 1.3 and 1.3T branches. I had hoped to show that some PDFs that were previously unloadable on Firefox OS would now be loadable. Unfortunately, I am unable to load even the simplest PDFs on my Buri (a.k.a. Alcatel OneTouch), because the PDF viewer app appears to consistently run out of gralloc memory just before the first page is displayed. Ben Kelly suggested that Async pan zoom (APZ) might be responsible, but disabling it didn’t help. If anybody knows more about this please contact me.

Finally, I’ve fixed most of the major memory consumption problems with the PDFs that I’m aware of. If you know of other PDFs that still cause pdf.js to consume large amounts of memory, please let me know. Thanks.

Lightweight branches aren’t always appropriate

Recently, I wrote about how I use multiple (10!) clones of the mozilla-inbound repository, with one Mercurial queue per clone, to work on multiple changes to the Mozilla codebase concurrently.

At times, I’ve felt almost guilty about using such a heavyweight branching mechanism, as opposed to a lightweight (i.e. intra-clone) branching mechanism such as git branches, or Mercurial bookmarks, or multiple Mercurial queues in a single clone (managed via hg qqueue). It seemed clumsy, like I was missing out on a compelling feature of modern version control systems.

But I have now come to understand that each approach is appropriate in particular circumstances. In particular, lightweight branches are not appropriate when code modifications incur a non-trivial build-time cost.

Consider a Mozilla developer who works only on code that doesn’t need compilation, such as JavaScript, HTML, or Python. After modifying code, such a developer incurs zero, or almost zero, build-time cost. For example, they might not have to do anything for their changes to propagate into the built product, or they might merely have to copy the modified file into the build directory.

For this developer, lightweight branches are entirely appropriate, because they can switch between branches with hardly a care in the world.

In contrast, consider a Mozilla developer (such as me!) who works mostly on C++ code within Gecko. After modifying code, this developer incurs a decidedly non-zero build cost — on my machine, just linking libxul takes around ten seconds. So any change to Gecko’s C++ code will require at least this much time, and it’s often substantially more, especially for anyone using a slow machine and/or OS.

For this developer, lightweight branches are not appropriate, because they will have to wait for rebuilding to occur every time they switch. ccache mitigates this problem, but it doesn’t solve it. In particular, the developer may well want to switch away from a branch precisely because they are waiting for a long-running build of it to complete, and lightweight branches certainly don’t allow that, because switching would change the very source files the build is using.

These two distinct cases may be obvious to some people, but they weren’t to me. If nothing else, as someone who mostly works on C++ Mozilla code, I now can feel content with the heavyweight branching approach I use.

A big step towards generational and compacting GC

People frequently ask me for status updates on generational GC, and I usually say I’ll tell them when something notable happens. Well, something notable just happened: exact rooting landed.

What is exact rooting? In order to support generational and/or compacting GC, you need to be able to move GC-allocated things such as objects around. This means you can’t have raw C++ pointers to any objects that might move; instead, you need some kind of indirect pointer that can be updated when necessary.

Unfortunately, both the JS engine and Gecko have a lot of pointers to GC-allocated things. The process of checking and converting them has been the main part of a task called “exact rooting”, and that’s what just finished. This has required an enormous amount of what is essentially very tedious work. Jim Blandy summarized it nicely, as follows.

I’ve never heard of a major project escaping from conservative GC once it had entered that state of sin; nor have I heard of anyone implementing a moving collector after starting with a non-moving collector. So, doing *both* is impressive. I hope it pays off big!

Major kudos to Terrence Cole, Steve Fink, Jon Coppeard, Brian Hackett, and the small army of other helpers who did this. Now that they’ve finished eating this gigantic serving of vegetables, they can move onto dessert, i.e. making the GC generational and compacting.

YEINU: Your Experience Is Not Universal

I’ve lost count of the number of times I’ve heard statements like the following.

Firefox crashes ten times a day for me.  I don’t understand how anyone can use it.

There’s a simple answer: for most people it doesn’t crash ten times a day. But the person making the statement hasn’t realized that what links the first sentence to the second is an assumption — that other people’s experiences are the same. This is despite the fact that browsers are immensely complex, highly configurable, and used in many different ways on an enormous range of inputs.

(BTW, in case it’s unclear: I don’t think it’s ok for Firefox to crash ten times a day for anyone.)

My mental rebuttal to this kind of thinking is YEINU, short for “your experience is not universal”. It’s something of a past-tense dual to the well-known YMMV (“your mileage may vary”).

Although I’ve mostly thought about this in the context of browser development, it’s not hard to see how it relates to many facets of life. In particular, just this morning I was thinking about it in relation to this question on Quora and the general notion of privilege. Out of curiosity, I googled the exact phrase (using quotes), and while I got several hits on software forums (such as this and this, the latter being on a Mozilla forum), three of the five highest-ranked hits were from posts on feminist blogs (#1, #4, and #5). Interesting!

So, next time you’re puzzled by someone’s reaction to something, it might be worth considering if YEINU.

How I work on Mozilla code

Most Mozilla developers have their own particular set-ups and work-flows, and over time develop various scripts, shortcuts, and habits to make their lives easier.  But we rarely talk about them.

In this post I will describe various interesting aspects of how I work on Mozilla code, in the hope that (a) it will give other people useful ideas, and (b) other people will in turn give me useful ideas.

Machines

I have two machines:  a quite new and very fast Linux desktop machine, on which I do 99% of my coding work, and a two-year-old MacBook Pro, on which I do little coding but a lot of browsing and other non-development stuff.  In theory my Linux desktop also has a Windows VM on it, though in practice that hasn’t happened yet!

I use Ubuntu on my Linux machine.  I don’t really enjoy sysadmin-type stuff, so I use the most widely-used, vanilla distribution available.  That way, if something goes wrong there’s a decent chance somebody else will have had the same problem.

Mercurial Repositories

I do most of my work on mozilla-inbound.  I clone that into a “master” repository that I leave untouched, called ws0. Then I have nine local clones of ws0, called ws1..ws9. I don’t have trouble filling the nine local clones because I usually have multiple different coding tasks in flight at any one time. Indeed, this is a necessity when dealing with the various latencies involved with development, such as compilation and local test runs (minutes), try server runs (hours), and reviews (days).

Mercurial

I use Mercurial queues and am quite content with them, though I am looking forward to changeset evolution becoming stable.  I tried git once but didn’t like it much;  the CLI is awful, the speed wasn’t noticeably better than Mercurial (and this was before I upgraded from Mercurial 1.8 to 2.7, which is much faster), and it was inconvenient to have to move patches over to a Mercurial repo before landing.

One problem I had with Mercurial queues was that I would often type hg qref patchname when I meant hg qnew patchname. This can lead to surprising and annoying mix-ups with patches, so I wrote a pre-hook for hg qref — if I give it an argument that isn’t -e or -u, I almost certainly meant hg qnew, so it aborts with a reminder. This has saved me some hassle on numerous occasions.

Some Mercurial extensions that I particularly like are color (colour output), progress (progress bars for slow operations), record (commit part of a change) and bzexport (upload patches to Bugzilla from the command line).

Try Server

Even though I do most of my work with mozilla-inbound, pushing to try from mozilla-inbound isn’t a great idea, because every so often you’ll catch some test breakage caused by someone else’s patch and then you have to work out if it’s your fault or was pre-existing.  So recently I took RyanVM’s advice and wrote a script that transplants the patch queue from a mozilla-inbound repo to a mozilla-central repo, and then pushes to try.  This avoids the test bustage problem, but occasionally the patches don’t apply cleanly.  In that case I just push from the mozilla-inbound repo and deal with the risk of test bustage.

Compiling

I mostly use Clang, though I also sometimes use GCC.  Clang is substantially faster than GCC, and its error messages are much better (though GCC’s have improved recently).  Clang’s generated code is slightly worse (~5–10% slower), but that’s not much of an issue for me while developing.  In fact, faster compilation time is important enough that my most common build configuration has the following line:

ac_add_options --enable-optimize='-O0'  # worse code, but faster builds

Last time I measured (which was before unified builds were introduced) this shaved a couple of minutes off build times, as compared to a vanilla --enable-optimize build.

Mozconfigs

I have a lot of mozconfig files.  I switch frequently between different kinds of builds, so much so that all my custom short-cut commands (e.g. for building and running the browser) have a mandatory argument that specifies the relevant mozconfig.  As a result, the mozconfig names I use are much shorter than the default names.  For example, the following is a selection of some of my more commonly-used mozconfigs for desktop Firefox.

  • d64: debug 64-bit build with clang
  • o64: optimized 64-bit build with clang
  • gd64: debug 64-bit build with GCC
  • go64: optimized 64-bit build with GCC
  • cd64: debug 64-bit build with clang and ccache
  • co64: optimized 64-bit build with clang and ccache
  • o64v: optimized 64-bit build with clang and --enable-valgrind
  • o64dmd: optimized 64-bit build with clang and --enable-dmd

Although I never do 32-bit browser builds, I do sometimes do 32-bit JS shell builds, so the “64” isn’t entirely redundant!

Building

I have a script called mmq that I use to build the browser.  I invoke it like this:

mmq o64

The argument is the mozconfig/objdir name.  This script invokes the build and redirects the output to an errors.err file (for use with quickfix, see below).  Once compilation finishes, the script also does some hacky grepping to reprint the first five compilation errors, if they exist.  I do this to make it easier to find the errors — sometimes they get swamped by the subsequent output.  (My use of Quickfix has made this feature less important than it once was, though it’s still a good thing to have.)

Profiles

I have multiple profiles.

  • default: used for my normal browsing.
  • dev: my standard development profile.
  • dev2: a second development profile, mostly used if I already am using the dev profile in another browser invocation.
  • e10s: a profile with Electrolysis enabled.

Starting Firefox

I have a script called ff that I use to start Firefox.  I invoke it like this:

ff o64 dev

The first argument is the mozconfig, and the second is the profile.  Much of the time, this invokes Firefox in the usual way, e.g.:

o64/dist/bin/firefox -P dev -no-remote

However, this script knows about my mozconfigs and it automatically does more elaborate invocations when necessary, e.g. for DMD (which requires the setting of some environment variables).  I also wrote it so that I can tack on gdb as a third argument and it’ll run under GDB.

Virtual desktop and window layout

I use a 2×2 virtual desktop layout.

In each of the top-left and bottom-left screens I have three xterms — a full-height one on the left side in which I do editing, and two half-height ones on the right side in which I invoke builds, hg commands, and sundry other stuff.

In the top-right screen I have Firefox open for general use.

In the bottom-right screen I have a Chatzilla window open.

Text Editing

I use Vim.  This is largely due to path dependence;  it’s what they taught in my introductory programming classes at university, and I’ve happily used it ever since.  My setup isn’t particularly advanced, and most of it isn’t worth writing about.  However, I have done a few things that I think are worth mentioning.

Tags

Tags are fantastic — they let you jump immediately to the definition of a function/type/macro.  I use ctags and I have an alias for the following command.

ctags -R --langmap=C++:.c.h.cpp.idl --languages=C++ --exclude='*dist\/include*' --exclude='*[od]32/*' --exclude='*[od]64/*'

I have to re-run this command periodically to keep up with changes to the codebase.  Fortunately, it only takes about 5 seconds on my fast SSD.  (My old machine with a mechanical HD took much longer.)  The coverage isn’t perfect but it’s good enough, and the specification of .idl files in the --langmap option was a recent tweak I made that improved coverage quite a bit.

Quickfix

I now use quickfix, which is a special mode to speed up the edit-compile-edit cycle.  The commands I use to build Firefox redirect the output to a special file, and if there are any compile errors, I use Vim’s quickfix command to quickly jump to their locations.  This saves enormous amounts of manual file and line traversal — I can’t recommend it highly enough.

In order to use quickfix you need to tell Vim what the syntax of the compile errors is.  I have the following command in my .vimrc for this.

" Multiple entries (separated by commas):
" - compile (%f:%l:%c) errors for different levels of file nesting
" - linker (%f:%l) errors for different levels of file nesting
set efm=../../../../../%f:%l:%c:\ error:\ %m,../../../../%f:%l:%c:\ error:\ %m,../../../%f:%l:%c:\ error:\ %m,../../%f:%l:%c:\ error:\ %m,../%f:%l:%c:\ error:\ %m,%f:%l:%c:\ error:\ %m,../../../../../%f:%l:\ error:\ %m,../../../../%f:%l:\ error:\ %m,../../../%f:%l:\ error:\ %m,../../%f:%l:\ error:\ %m,../%f:%l:\ error:\ %m,%f:%l:\ error:\ %m

This isn’t pretty, but it works well for Mozilla code.  Then it’s just a matter of doing :cf to load the new errors file (which also takes me to the first error) and then :cn/:cp to move forward and backward through the list.  Occasionally I get an error in a header file that is present in the objdir and the matching fails, and so I have to navigate to that file manually, but this is rare enough that I haven’t bothered trying to fix it properly.

One nice thing about quickfix is that it lets me start fixing errors before compilation has finished!  As soon as I see the first error message I can do :cf.  This means I have to re-do :cf and skip over the already-fixed errors if more errors occur later, but this is still often a win.

If you use Vim, work on Mozilla C++ code, and don’t use quickfix, you should set it up right now.  There are many additional commands and options, but what I’ve written above is enough to get you started, and covers 95% of my usage.  (In case you’re curious, the :cnf, :cpf, :copen and :cclose commands cover the other 5%.)

:grep

I also set up Vim’s :grep command for use with Firefox.  I put the following script in ~/bin/.

#! /bin/sh
pattern=$1;
if [ -z "$pattern" ]; then
    echo "usage: $FUNCNAME <pattern>";
    return 1;
fi;
grep -n -r -s \
    --exclude-dir="*[od]32*" \
    --exclude-dir="*[od]64*" \
    --exclude-dir=".hg*" \
    --exclude-dir=".svn*" \
    --include="*.cpp" \
    --include="*.h" \
    --include="*.c" \
    --include="*.idl" \
    --include="*.html" \
    --include="*.xul" \
    --include="*.js" \
    --include="*.jsm" \
    "$pattern"

It does a recursive grep for a pattern through various kinds of source files, ignoring my objdirs and repository directories.  To use it, I just type “:grep <pattern>” and Vim runs the script and sends the results to quickfix, so I can again use :cn and :cp to navigate through the matches.  (Vim already knows about grep’s output format, so you don’t need to configure anything for that.)

I can also use that script from the command line, which is useful when I want to see all the matches at once, rather than stepping through them one at a time in Vim.

Trailing whitespace detection

These lines in my .vimrc tell Vim to highlight any trailing whitespace in red.

highlight ExtraWhitespace ctermbg=red guibg=red
autocmd BufWinEnter *.{c,cc,cpp,h,py,js,idl} match ExtraWhitespace /\s\+$/
autocmd InsertEnter *.{c,cc,cpp,h,py,js,idl} match ExtraWhitespace /\s\+\%#\@<!$/
autocmd InsertLeave *.{c,cc,cpp,h,py,js,idl} match ExtraWhitespace /\s\+$/
autocmd BufWinLeave *.{c,cc,cpp,h,py,js,idl} call clearmatches()

[Update: I accidentally omitted the final four lines of this when I first published this post.]

The good thing about this is that I now never submit patches with trailing whitespace.  The bad thing is that I can see where other people have left behind trailing whitespace :)

Ctrl-P

Finally, I installed the Ctrl-P plugin, which aims to speed up the opening of files by letting you type in portions of the file’s path.  This is potentially quite useful for Mozilla code, where the directory nesting can get quite deep.  However, Ctrl-P has been less of a win than I hoped, for two reasons.

First, it’s quite slow, even on my very fast desktop with its very fast SSD.  While typing, there are often pauses of hundreds of milliseconds after each keystroke, which is annoying.

Second, I find it too eager to find matches. If you type a sequence of characters, it will match against any file that has those characters present in that order, no matter how many other characters separate them, and it will do so case-insensitively.  This might work well for some people, but if I’m opening a file such as content/base/src/nsContentUtils.cpp, I always end up typing just the filename in full.  By the time I’ve typed “nsContentU” ideally it would be down to the two files in the repository that match that exact string.  But instead I get the following.

> widget/tests/window_composition_text_querycontent.xul
> dom/interfaces/base/nsIQueryContentEventResult.idl
> dom/base/nsQueryContentEventResult.cpp
> dom/base/nsQueryContentEventResult.h
> content/base/public/nsIContentSecurityPolicy.idl
> content/base/public/nsContentCreatorFunctions.h
> dom/interfaces/base/nsIContentURIGrouper.idl
> content/base/public/nsContentPolicyUtils.h
> content/base/src/nsContentUtils.cpp
> content/base/public/nsContentUtils.h

I wish I could say “case-insensitive exact matches, please”, but I don’t think that’s possible.  As a result, I don’t use Ctrl-P that much, though it’s still occasionally useful if I want to open a file for which I know the name but not the path — it’s faster for that than dropping back to a shell and using find.

Conclusion

That’s everything of note that I can think of right now.  Please feel free to steal as many ideas as you wish from this post.

I haven’t given full details for a lot of the things I mentioned above. I’m happy to give more details (e.g. what various scripts look like) if people are interested.

Finally, I encourage other developers to write posts like this one explaining how they work on Mozilla code.  Thanks!

System-wide memory measurement for Firefox OS

Have you ever wondered exactly how all the physical memory in a Firefox OS device is used?   Wonder no more.  I just landed a system-wide memory reporter which works on any Firefox product running on a Linux system.  This includes desktop Firefox builds on Linux, Firefox for Android, and Firefox OS.

This memory reporter is a bit different to the existing ones, which work entirely within Mozilla processes.  The new reporter provides measurements for the entire system, including every user-space process (Mozilla or non-Mozilla) that is running.  It’s aimed primarily at profiling Firefox OS devices, because we have full control over the code running on those devices, and so it’s there that a system-wide view is most useful.

Here is some example output from a GeeksPhone Keon.

System
Other Measurements 
397.24 MB (100.0%) -- mem
├──215.41 MB (54.23%) ── free
├──105.72 MB (26.61%) -- processes
│  ├───57.59 MB (14.50%) -- process(/system/b2g/b2g, pid=709)
│  │   ├──42.29 MB (10.65%) -- anonymous
│  │   │  ├──42.25 MB (10.63%) -- outside-brk
│  │   │  │  ├──41.94 MB (10.56%) ── [rw-p] [69]
│  │   │  │  └───0.31 MB (00.08%) ++ (2 tiny)
│  │   │  └───0.05 MB (00.01%) ── brk-heap/[rw-p]
│  │   ├──13.03 MB (03.28%) -- shared-libraries
│  │   │  ├───8.39 MB (02.11%) -- libxul.so
│  │   │  │   ├──6.05 MB (01.52%) ── [r-xp]
│  │   │  │   └──2.34 MB (00.59%) ── [rw-p]
│  │   │  └───4.64 MB (01.17%) ++ (69 tiny)
│  │   └───2.27 MB (00.57%) ++ (2 tiny)
│  ├───21.73 MB (05.47%) -- process(/system/b2g/plugin-container, pid=756)
│  │   ├──12.49 MB (03.14%) -- anonymous
│  │   │  ├──12.48 MB (03.14%) -- outside-brk
│  │   │  │  ├──12.41 MB (03.12%) ── [rw-p] [30]
│  │   │  │  └───0.07 MB (00.02%) ++ (2 tiny)
│  │   │  └───0.02 MB (00.00%) ── brk-heap/[rw-p]
│  │   ├───8.88 MB (02.23%) -- shared-libraries
│  │   │   ├──7.33 MB (01.85%) -- libxul.so
│  │   │   │  ├──4.99 MB (01.26%) ── [r-xp]
│  │   │   │  └──2.34 MB (00.59%) ── [rw-p]
│  │   │   └──1.54 MB (00.39%) ++ (50 tiny)
│  │   └───0.36 MB (00.09%) ++ (2 tiny)
│  ├───14.08 MB (03.54%) -- process(/system/b2g/plugin-container, pid=836)
│  │   ├───7.53 MB (01.89%) -- shared-libraries
│  │   │   ├──6.02 MB (01.52%) ++ libxul.so
│  │   │   └──1.51 MB (00.38%) ++ (47 tiny)
│  │   ├───6.24 MB (01.57%) -- anonymous
│  │   │   ├──6.23 MB (01.57%) -- outside-brk
│  │   │   │  ├──6.23 MB (01.57%) ── [rw-p] [22]
│  │   │   │  └──0.00 MB (00.00%) ── [r--p]
│  │   │   └──0.01 MB (00.00%) ── brk-heap/[rw-p]
│  │   └───0.31 MB (00.08%) ++ (2 tiny)
│  └───12.32 MB (03.10%) ++ (23 tiny)
└───76.11 MB (19.16%) ── other

The data is obtained entirely from the operating system, specifically from /proc/meminfo and the /proc/<pid>/smaps files, which are files provided by the Linux kernel specifically for measuring memory consumption.
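To give a feel for what that data looks like, here is a small sketch that sums a process’s proportional set size from its smaps file. It assumes Node.js and a pid passed on the command line; the real reporter is C++ code inside Gecko, so this is purely illustrative.

var fs = require('fs');

// Sum the "Pss:" fields (reported in kB) for one process's smaps file.
function totalPssKiB(pid) {
  var smaps = fs.readFileSync('/proc/' + pid + '/smaps', 'utf8');
  var total = 0;
  smaps.split('\n').forEach(function (line) {
    var m = /^Pss:\s+(\d+) kB/.exec(line);
    if (m) {
      total += Number(m[1]);
    }
  });
  return total;
}

console.log('PSS for pid ' + process.argv[2] + ': ' +
            totalPssKiB(process.argv[2]) + ' KiB');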

I wish that the mem entry at the top was the amount of physical memory available. Unfortunately there is no way to get that on a Linux system, and so it’s instead the MemTotal value from /proc/meminfo, which is “Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code)”.  And if you’re wondering about the exact meaning of the other entries, as usual if you hover the cursor over an entry in about:memory you’ll get a tool-tip explaining what it means.

The measurements given for each process are the PSS (proportional set size) measurements.  These attribute any shared memory equally among all processes that share it, and so PSS is the only measurement that can be sensibly summed across processes (unlike “Size” or “RSS”, for example).
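To see why PSS sums sensibly, consider a toy example (the page lists below are made up for illustration, not real smaps data): each mapped page contributes its size divided by the number of processes that map it, so the per-process values add up to the true total.

var PAGE_SIZE = 4096;
var processes = {
  'b2g':        ['libxul-A', 'libxul-B', 'heap-1'],   // two shared pages, one private
  'homescreen': ['libxul-A', 'libxul-B', 'heap-2'],
};

// Count how many processes map each page.
var mappers = {};
Object.keys(processes).forEach(function (name) {
  processes[name].forEach(function (page) {
    mappers[page] = (mappers[page] || 0) + 1;
  });
});

// PSS for a process = sum over its pages of PAGE_SIZE / (number of sharers).
Object.keys(processes).forEach(function (name) {
  var pss = processes[name].reduce(function (sum, page) {
    return sum + PAGE_SIZE / mappers[page];
  }, 0);
  console.log(name + ': PSS = ' + pss + ' bytes');   // 8192 bytes each
});
// The two PSS values sum to 16384 bytes, i.e. 4 distinct pages * 4096 bytes.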

For each process there is a wealth of detail about static code and data.  (The above example only shows a tiny fraction of it, because a number of the sub-trees are collapsed.  If you were viewing it in about:memory, you could expand and collapse sub-trees to your heart’s content.)  Unfortunately, there is little information about anonymous mappings, which constitute much of the non-static memory consumption.  I have some patches that will add an extra level of detail there, distinguishing major regions such as the jemalloc heap, the JS GC heap, and JS JIT code.  For more detail than that, the existing per-process memory reports in about:memory can be consulted.  Unfortunately the new system-wide reporter cannot be sensibly combined with the existing per-process memory reporters because the latter are unaware of implicit sharing between processes.  (And note that the amount of implicit sharing is increased significantly by the new Nuwa process.)

Because this works with our existing memory reporting infrastructure, anyone already using the get_about_memory.py script with Firefox OS will automatically get these reports along with all the usual ones once they update their source code, and the system-wide reports can be loaded and viewed in about:memory as usual. On Firefox and Firefox for Android, you’ll need to set the memory.system_memory_reporter flag in about:config to enable it.

My hope is that this reporter will supplant most or all of the existing tools that are commonly used to understand system-wide memory consumption on Firefox OS devices, such as ps, top and procrank.  And there are certainly other interesting OS-level measurements available that are not currently obtained. For example, Jed Davis has plans to measure the pmem subsystem.  Please file a bug or email me if you have other suggestions for adding such measurements.

DMD now works on Windows

DMD is our tool for improving Firefox’s memory reporting.  It helps identify where new memory reporters need to be added in order to reduce the “heap-unclassified” value in about:memory.

DMD has always worked well on Linux, and moderately well on Mac (it is crashy for some people).  And it works on Android and B2G.  But it has never worked on Windows.

So I’m happy to report that DMD now does work on Windows, thanks to the excellent efforts of Catalin Iacob.  If you’re on Windows and you’ve been seeing high “heap-unclassified” values, and you’re able to build Firefox yourself, please give DMD a try.