Author Archives: Nicholas Nethercote

Poor battery life on the Flame device?

The new Firefox OS reference phone is called the Flame. I have one that I’m using as my day-to-day phone, and as soon as I got it I found that the battery life was terrible — I use my phone very lightly, but the battery was draining in only 24 hours or so.

It turns out this was because a kernel daemon called ksmd (kernel samepage merging daemon) was constantly running and using about 3–7% of CPU. I detected this by running the following command which prints CPU usage stats for all running processes every five seconds.

adb shell top -m 5

ksmd doesn’t seem very useful for a device such as the Flame, and Alexandre Lissy kindly built me a new kernel with it disabled, which improved my battery life by at least 3x. Details are in the relevant bug.

It seems that plenty of other Flame users are not having problems with ksmd, but if your Flame’s battery life is poor it would be worth checking if this is the cause.

Dipping my toes in the Servo waters

I’m very interested in Rust and Servo, and have been following their development closely. I wanted to actually do some coding in Rust, so I decided to start making small contributions to Servo.

At this point I have landed two changes in the tree — one to add very basic memory measurements for Linux, and the other for Mac — and I thought it might be interesting for people to hear about the basics of contributing. So here’s a collection of impressions and thoughts in no particular order.

Getting the code and building Servo was amazingly easy. The instructions actually worked first time on both Ubuntu and Mac! Basically it’s just apt-get install (on Ubuntu) or port install (on Mac), git clone, configure, and make. The configure step takes a while because it downloads an appropriate version of Rust, but that’s fine; I was expecting to have to install the appropriate version of Rust first so I was pleasantly surprised.

Once you build it, Servo is very bare-boned. Here’s a screenshot.

Servo

There is no address bar, or menu bar, or chrome of any kind. You simply choose which page you want to display from the command line when you start Servo. The memory profiling I implemented is enabled by using the -m option, which causes memory measurements to be periodically printed to the console.

Programming in Rust is interesting. I’m not the first person to observe that, compared to C++, it takes longer to get your code past the compiler, but it’s more likely to to work once you do. It reminds me a bit of my years programming in Mercury (imagine Haskell, but strict, and with a Prolog ancestry). Discriminated unions, pattern matching, etc. In particular, you have to provide code to handle all the error cases in place. Good stuff, in my opinion.

One thing I didn’t expect but makes sense in hindsight: Servo has seg faulted for me a few times. Rust is memory-safe, and so shouldn’t crash like this. But Servo relies on numerous libraries written in C and/or C++, and that’s where the crashes originated.

The Rust docs are a mixed bag. Libraries are pretty well-documented, but I haven’t seen a general language guide that really leaves me feeling like I understand a decent fraction of the language. (Most recently I read Rust By Example.) This is meant to be an observation rather than a complaint; I know that it’s a pre-1.0 language, and I’m aware that Steve Klabnik is now being paid by Mozilla to actively improve the docs, and I look forward to those improvements.

The spotty documentation isn’t such a problem, though, because the people in the #rust and #servo IRC channels are fantastic. When I learned Python last year I found that simply typing “how to do X in Python” into Google almost always leads to a Stack Overflow page with a good answer. That’s not the case for Rust, because it’s too new, but the IRC channels are almost as good.

The code is hosted on GitHub, and the project uses a typical pull request model. I’m not a particularly big fan of git — to me it feels like a Swiss Army light-sabre with a thousand buttons on the handle, and I’m confident using about ten of those buttons. And I’m also not a fan of major Mozilla projects being hosted on GitHub… but that’s a discussion for another time. Nonetheless, I’m sufficiently used to the GitHub workflow from working on pdf.js that this side of things has been quite straightforward.

Overall, it’s been quite a pleasant experience, and I look forward to gradually helping build up the memory profiling infrastructure over time.

Measuring memory used by third-party code

Firefox’s memory reporting infrastructure, which underlies about:memory, is great. And when it lacks coverage — causing the “heap-unclassified” number to get large — we can use DMD to identify where the unreported allocations are coming from. Using this information, we can extend existing memory reporters or write new ones to cover the missing heap blocks.

But there is one exception: third-party code. Well… some libraries support custom allocators, which is great, because it lets us provide a counting allocator. And if we have a copy of the third-party code within Firefox, we can even use some pre-processor hacks to forcibly provide custom counting allocators for code that doesn’t support them.

But some heap allocations are done by code we have no control over, like OpenGL drivers. For example, after opening a simple WebGL demo on my Linux box, I have over 50% “heap-unclassified”.

208.11 MB (100.0%) -- explicit
├──107.76 MB (51.78%) ── heap-unclassified

DMD’s output makes it clear that the OpenGL drivers are responsible. The following record is indicative.

Unreported: 1 block in stack trace record 2 of 3,597
 15,486,976 bytes (15,482,896 requested / 4,080 slop)
 6.92% of the heap (20.75% cumulative); 10.56% of unreported (31.67% cumulative)
 Allocated at
 replace_malloc (/home/njn/moz/mi8/co64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1245) 0x7bf895f1
 _swrast_CreateContext (??:?) 0x3c907f03
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd84fa8
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd9fa2c
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd8b996
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3ce1f790
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3ce1f935
 glXGetDriverConfig (??:?) 0x3dce1827
 glXDestroyGLXPixmap (??:?) 0x3dcbc213
 glXCreateNewContext (??:?) 0x3dcbc48a
 mozilla::gl::GLContextGLX::CreateGLContext(mozilla::gfx::SurfaceCaps const&, mozilla::gl::GLContextGLX*, bool, _XDisplay*, unsigned long, __GLXFBConfigRec*, bool, gfxXlibSurface*) (/home/njn/moz/mi8/co64dmd/gfx/gl/../../../gfx/gl/GLContextProviderGLX.cpp:783) 0x753c99f4

The bottom-most frame is for a function (CreateGLContext) within Firefox’s codebase, and then control passes to the OpenGL driver, which eventually does a heap allocation, which ends up in DMD’s replace_malloc function.

The following DMD report is a similar case that shows up on Firefox OS.

Unreported: 1 block in stack trace record 1 of 463
 1,454,080 bytes (1,454,080 requested / 0 slop)
 9.75% of the heap (9.75% cumulative); 21.20% of unreported (21.20% cumulative)
 Allocated at
 replace_calloc /Volumes/firefoxos/B2G/gecko/memory/replace/dmd/DMD.cpp:1264 (0xb6f90744 libdmd.so+0x5744)
 os_calloc (0xb25aba16 libgsl.so+0xda16) (no addr2line)
 rb_alloc_primitive_lists (0xb1646ebc libGLESv2_adreno.so+0x74ebc) (no addr2line)
 rb_context_create (0xb16446c6 libGLESv2_adreno.so+0x726c6) (no addr2line)
 gl2_context_create (0xb16216f6 libGLESv2_adreno.so+0x4f6f6) (no addr2line)
 eglCreateClientApiContext (0xb25d3048 libEGL_adreno.so+0x1a048) (no addr2line)
 qeglDrvAPI_eglCreateContext (0xb25c931c libEGL_adreno.so+0x1031c) (no addr2line)
 eglCreateContext (0xb25bfb58 libEGL_adreno.so+0x6b58) (no addr2line)
 eglCreateContext /Volumes/firefoxos/B2G/frameworks/native/opengl/libs/EGL/eglApi.cpp:527 (0xb423dda2 libEGL.so+0xeda2)
 mozilla::gl::GLLibraryEGL::fCreateContext(void*, void*, void*, int const*) /Volumes/firefoxos/B2G/gecko/gfx/gl/GLLibraryEGL.h:180 (discriminator 3) (0xb4e88f4c libxul.so+0x789f4c)

We can’t traverse these allocations in the usual manner to measure them, because we have no idea about the layout of the relevant data structures. And we can’t provide a custom counting allocator to code outside of Firefox’s codebase.

However, although we pass control to the driver, control eventually comes back to the heap allocator, and that is something that we do have some power to change. So I had an idea to toggle some kind of mode that records all the allocations that occur within a section of code, as the following code snippet demonstrates.

SetHeapBlockTagForThread("webgl-create-new-context");
context = glx.xCreateNewContext(display, cfg, LOCAL_GLX_RGBA_TYPE, glxContext, True);
ClearHeapBlockTagForThread();

The calls on either side of glx.xCreateNewContext tell the allocator that it should tag all allocations done within that call. And later on, the relevant memory reporter can ask the allocator how many of these allocations remain and how big they are. I’ve implemented a draft version of this, and it basically works, as the following about:memory output shows.

216.97 MB (100.0%) -- explicit
├───78.50 MB (36.18%) ── webgl-contexts
├───32.37 MB (14.92%) ── heap-unclassified

The implementation is fairly simple.

  • There’s a global hash table which records which live heap blocks have a tag associated with them. (Most heap blocks don’t have a tag, so this table stays small.)
  • When SetHeapBlockTagForThread is called, the given tag is stored in thread-local storage. When ClearHeapBlockTagForThread is called, the tag is cleared.
  • When an allocation happens, we (quickly) check if there’s a tag set for the current thread and if so, put a (pointer,tag) pair into the table. Otherwise, we do nothing extra.
  • When a deallocation happens, we check if the deallocated block is in the table, and remove it if so.
  • To find all the live heap blocks with a particular tag, we simply iterate over the table looking for tag matches. This can be used by a memory reporter.

Unfortunately, the implementation isn’t suitable for landing in Firefox’s code, for several reasons.

  • It uses Mike Hommey’s replace_malloc infrastructure to wrap the default allocator (jemalloc). This works well — DMD does the same thing — but using it requires doing a special build and then setting some environment variables at start-up. This is ok for an occasional-use tool that’s only used by Firefox developers, but it’s important that about:memory works in vanilla builds without any additional effort.
  • Alternatively, I could modify jemalloc directly, but we’re hoping to one day move away from our old, heavily-modified version of jemalloc and start using an unmodified jemalloc3.
  • It may have a non-trivial performance hit. Although I haven’t measured performance yet — the above points are a bigger show-stopper at the moment — I’m worried about having to do a hash table lookup on every deallocation. Alternative implementations (store a marker in each block, or store tagged blocks in their own zone) are possible but present their own difficulties.
  • It can miss some allocations. When adding a tag for a particular section of code, you don’t want to mark every allocation that occurs while that section executes, because there could be multiple threads running and you don’t want to mark allocations from other threads. So it restricts the marking to a single thread, but if the section creates a new thread itself, any allocations done on that new thread will be missed. This might sound unlikely, but my implementation appears to miss some allocations and this is my best theory as to why.

This issue of OpenGL drivers and other kinds of third-party code has been a long-term shortcoming with about:memory. For the first time I have a partial solution, though it still has major problems. I’d love to hear if anyone has additional ideas on how to make it better.

A browser benchmarking manifesto

This post represents my own opinion, not the official position of Mozilla.

I’ve been working on Firefox for over five years and I’ve been unhappy with the state of browser benchmarking the entire time.

The current problems

The existing benchmark suites are bad

Most of the benchmark suites are terrible. Consider at Hennessy and Patterson’s categorization of benchmarks, from best to worst.

  1. Real applications.
  2. Modified applications (e.g. with I/O removed to make it CPU-bound).
  3. Kernels (key fragments of real applications).
  4. Toy benchmarks (e.g. sieve of Erastosthenes).
  5. Synthetic benchmarks (code created artificially to fit a profile of particular operations, e.g. Dhrystone).

In my opinion, benchmark suites should only contain benchmarks in categories 1 and 2. There are certainly shades of grey in these categories, but I personally have a high bar, and I’m likely to consider most programs whose line count is in the hundreds as closer to a “toy benchmark” than a “real application”.

Very few browser benchmark suites contain benchmarks from category 1 or 2. I think six or seven of the benchmarks in Octane meet this standard. I’m having trouble thinking of any other existing benchmark suite that contains any that reach this standard. (This includes Mozilla’s own Kraken suite.)

Bad benchmark suites hurt the web, because every hour that browser vendors spend optimizing bad benchmarks is an hour they don’t spend improving browsers in ways that actually help users. I’ve seen first-hand how much time this wastes.

Conflicts of interest

Some of the benchmark suites — including the best-known ones — come from browser vendors. The conflict of interest is so obvious I won’t bother discussing it further.

Opaque scoring systems

Some of them have opaque scoring systems. Octane (derived from V8bench) is my least favourite here. It also suffers from much higher levels of variance between runs than the benchmarks with time-based scoring systems.

A better way

Here’s how I think a new browser benchmark suite should be created.  Some of these ideas are taken from the SPEC benchmark suiteswhich have been used for over 20 years to benchmark the performance of C, C++, Fortran and Java programs, among other things.

A collaborative and ongoing process

The suite shouldn’t be owned by a single browser vendor. Ideally, all the major browser vendors (Apple, Google, Microsoft, and Mozilla) would be involved. Involvement of experts from outside those companies would also be desirable.

This is a major political and organizational challenge, but it would avoid bias (both in appearance and reality), and would likely lead to a higher-quality suite, because low-quality benchmarks are likely to be shot down.

There should be a public submissions process, where people submit web
apps/sites for consideration.  And obviously there needs to be a selection
process to go with that.

The benchmark suite should be open source and access should be free of
charge.  It should be hosted on a public site not owned by a browser vendor.

There should also be an official process for modifying the suite: modifying benchmarks, removing benchmark, and adding new benchmarks.  This process
shouldn’t be too fluid.  Perhaps allowing changes once per year would be reasonable. The current trend of continually augmenting old suites should be resisted.

High-quality

All benchmarks should be challenging, and based on real website/apps;  they should all belong to category 1 or 2 on the above scale.

Each benchmark should have two or three inputs of different sizes.  Only the
largest input would be used in official runs, but the smaller inputs
would be useful for browser developers for testing and profiling purposes.

All benchmarks should have side-effects of some kind that depend on most or
all of the computation.  There should be no large chunks of computation whose
results are unused, because such benchmarks are subject to silly results due
to certain optimizations such as dead code elimination.

Categorization

Any categorization of benchmarks should not be done by browser subsystems. E.g. there shouldn’t be groups like “DOM”, “CSS”, “hardware acceleration”, “code loading” because that encourages artificial benchmarks that stress a single
subsystem. Instead, any categories should be based on application domain:
“games”, “productivity”, “multimedia”, “social”, etc. If a grouping like
that ends up stressing some browser subsystems more than others, well that’s
fine, because it means that’s what real websites/apps are doing (assuming the
benchmark selection is balanced).

There is one possible exception to this rule, which is the JavaScript engine. At least some of the benchmarks will be dominated by JS execution time, and because JS engines can be built standalone, the relevant JS engine
teams will inevitably want to separate JS code from everything else so that
the code can be run in JS shells. It may be worthwhile to pre-emptively do that, where appropriate. If that were done, you wouldn’t run these JS-only sub-applications as part of an official suite run, but they would be present in the repository.

Scoring

Scoring should be as straightforward as possible. Times are best because their meaning is more obvious than anything else, *especially* when multiple
times are combined. Opaque points systems are awful.

Using times for everything makes most sense when benchmarks are all
throughput-oriented.  If some benchmarks are latency-oriented (or FPS, or
something else) this becomes trickier.  Even still, a simple system that humans can intuitively understand is best.

Conclusion

It’s time the browser world paid attention to the performance and benchmarking lessons that the CPU and C/C++/Fortran worlds learned years ago. It’s time to grow up.

If you want to learn more, I highly recommend getting your hands on a copy of Hennessy & Patterson’s Computer Architecture and reading the section on measuring performance. This is section 1.5 (“Measuring and Reporting Performance”) in the 3rd edition, but I expect later editions have similar sections.

MemShrink’s 3rd birthday

June 14, 2014 was the third anniversary of the first MemShrink meeting. MemShrink is a mature effort at this point, and many of the problems that motivated its creation have been fixed. Nonetheless, there are still some areas for improvement. So, as I did at this time last year, I’ll take the opportunity to update the “big ticket items” list.

The Old Big Ticket Items List

#5: pdf.js

pdf.js is the PDF viewer that now ships by default in Firefox. I greatly reduced its memory usage in two rounds, first in February, and again in June. The first round of improvements were released in Firefox 29, and the second round is due to be released in Firefox 33, which should be out in mid-October.

pdf.js will still use more memory than native PDF viewers, but the situation has improved enough that it can be removed from this list.

#4: Dev tools

See below.

#3: B2G Nuwa

Cervantes Yu, Thinker Li, and others succeeded in landing Nuwa, an impressive technical achievement that increased the amount of memory sharing between different processes on Firefox OS. This allows many more apps to run at once. I think this is present in Firefox OS 1.3 and later.

#2: Compacting Generational GC

See below.

#1: Better Foreground Tab Image Handling

Timothy Nikkel completely fixed the massive spikes in decoded image data that we used to get on image-heavy pages. This was a fantastic improvement that shipped in Firefox 26.

The New Big Ticket Items List

In a break from tradition, I will present the items in alphabetical order, rather than ranking them. This reflects the fact that I don’t have a clear opinion on the relevant importance of these different items. (I view this as a good thing, because it means that I feel there aren’t any hair-on-fire problems.)

Better regression detection

areweslimyet.com (a.k.a. AWSY) is our best tool for detecting memory usage regressions. It currently tracks Firefox and Firefox for Android, and there are plans to integrate Firefox OS.

Unfortunately, although AWSY has been successful at detecting large regressions, leading to them being fixed, its measurements are noisy enough that detecting smaller regressions is difficult. As a result, the general trend of the graphs has been upward. I do not think this is as bad as it looks at first glance, because Firefox’s behaviour in the worst cases — which matter more than the average case — is much better than it used to be. Nonetheless, improved sensitivity here would be an excellent thing, and Eric Rahm is actively working on this.

Developer tools

Developer-oriented memory profiling tools are under active development by Nick Fitzgerald, Jim Blandy, and others. A lot of the necessary profiling infrastructure is in place, and this seems to be progressing well, though I don’t know when it’s expected to be finished.

By the way, if you haven’t tried Firefox’s dev tools recently, you should! They have improved an incredible amount in the past year or so.

GC Arena Fragmentation

Generational GC landed a few months ago. I had been hoping that this would help reduce GC fragmentation, but the effect was small. It looks like compacting GC is what’s needed to make a big difference here, though some smaller tweaks may help things a little.

Tarako

Tarako is the codename for the “$25 phone” running Firefox OS that will ship in India and other countries later this year. It has a paltry 128 MiB of RAM, so memory usage is critical. Since the hardware first became available to Mozilla employees at the start of the year, Tarako has improved from “almost unusable” to “not bad”. But apps still close due to OOMs fairly frequently, especially when a user does something that involves two apps working together in some way.

There is no single change that will decisively fix this problem for Tarako;  further improvements will require an ongoing grind of work from many engineers. Improvements made for Tarako are also likely to benefit higher-end Firefox OS devices as well.

Windows OOM crashes

The number of out-of-memory (OOM) crashes that occur on Windows is relatively high. A sizeable fraction of these appear to be 32-bit virtual OOM crashes, caused by running out of address space.

Things that will mitigate this include modifications to jemalloc and the JS allocator, as well as Electrolysis. And some additional data and analysis may help us understand the problem better.

The ultimate solution, for users on 64-bit versions of Windows, is 64-bit Firefox builds for Windows, though it’ll be a while before those builds are in good enough shape to ship to regular users.

Summary

Three items from the old list (pdf.js, Nuwa, image handling) have been ticked off.  Two items remain (devtools, GC fragmentation), the latter in altered form, and both are a lot closer to completion than they were. Three new items (regression detection, Tarako, and Windows OOMs) have been added.

Let me know if I’ve omitted anything important!

An even slimmer pdf.js

TL;DR: Firefox’s built-in PDF viewer is on track to benefit from additional large reductions in memory consumption when Firefox 33 is released in mid-October.

Earlier this year I made some major improvements to the memory usage and speed of pdf.js, Firefox’s built-in PDF viewer. At the time I was quite pleased with myself. I had picked all the low-hanging fruit and reached a point of diminishing returns.

Some unexpected measurements

Shortly after that, while on a plane I tried pdf.js on my Mac laptop. I was horrified to find that memory usage was much higher than on the Linux desktop machine on which I had done my optimizations. The first document I measured made Firefox’s RSS (resident set size, a.k.a. physical memory usage) peak at over 1,000 MiB. The corresponding number on my Linux desktop had been 400 MiB! What was going on?

The short answer is “canvases”. pdf.js uses HTML canvas elements to render each PDF page. These can be millions of pixels in size, and each canvas pixel takes 32-bits of RGBA data. But on Linux this data is not stored within the Firefox process. Instead, it gets passed to the X server. Firefox’s about:memory page does report the canvas usage under the “canvas-2d-pixels” and “gfx-surface-xlib” labels, but I hadn’t seen those — during my optimization work I had only measured the RSS of the Firefox process. RSS is usually the best single number to focus on, but in this case it was highly misleading.

In contrast, on Mac the canvas data is stored within the process, which is why Firefox’s RSS is so much higher on Mac. (There’s also the fact that my MacBook has a retina screen, which means the canvases used have approximately four times as many pixels as the canvases used on my Linux machine.)

Fixing the problem

It turns out that pdf.js was intentionally caching an overly generous number of canvases (20) and then unintentionally failing to dispose of them in a timely manner. This could result in hundreds of canvases being held onto unnecessarily if you scrolled quickly through a large document. On my MacBook each canvas is over 20 MiB, so the resultant memory spike could be enormous.

Happily, I was able to fix this behaviour with four tiny patches. pdf.js now caches only 10 canvases, and disposes of any excess ones immediately.

Measurements

I measured the effects of both rounds of optimization on the following three documents on my Mac.

  • TraceMonkey: This is an academic paper about the TraceMonkey JIT compiler. It’s a 14 page Latex document containing several pictures and graphs.
  • Telefónica: This is the 2009 annual report from Telefónica. It’s a 122 page full-colour document with many graphs, tables and pictures.
  • Decontamination: This is an old report about the decontamination of an industrial site in New Jersey. It’s a 226 page scan of a paper document that mostly consists of black and white typewritten text and crude graphs, with only few pages featuring colour.

For each document, I measure the peak RSS while scrolling quickly from the first page to the last by holding down the “fn” and down keys at the same time.  Although this is a stress test, it’s not unrealistic; it’s easy to imagine people quickly scrolling through 10s or even 100s of pages of a PDF to find a particular page.

I did this three times for each document and picked the highest value. The results are shown by the following graph. The “Fx28″ bars show the original peak RSS values; the “Fx29″ bars show the peak RSS values after my first round of optimizations; and the “Fx33″ bars show the peak RSS values after my second round of optimizations.

pdfjs

Comparing Firefox 28 and Firefox 33, the reductions in peak RSS for the three documents are 1.2x, 4.4x, and 7.8x. It makes sense that the relative improvements increase with the length of the documents.

Despite these major improvements, pdf.js still uses substantially more memory than native PDF viewers. So we need to keep chipping away… but it’s also worth recognizing how far we’ve come.

Status

These patches have landed in the master pdf.js repository. They have not yet imported into Firefox’s code, but this will happen at some point during the development cycle for Firefox 33. Firefox 33 is on track to be released in mid-October.

areweslimyet.com data is now exportable and diffable

areweslimyet.com (a.k.a. AWSY) tracks Firefox’s memory usage on a basic workload that opens lots of websites. It’s not a perfect tool — you shouldn’t consider its measurements as a reliable proxy for Firefox’s memory usage in general — but it does help detect regressions.

One thing it doesn’t support is doing diffs between separate runs. Until now, that is! Thanks to work done by Eric Rahm, it’s now possible to download the data for each snapshot done during a run. This file can then be loaded in about:memory. It’s also possible to download the data for two snapshots and diff them in about:memory. Yay! This diff workflow isn’t super slick, as it requires the downloading of two files and then the loading of them in about:memory. But it’s a lot better than manually eyeballing two sets of data in two separate AWSY tabs, which was the best we could do previously. Furthermore, AWSY and about:memory already duplicate some functionality, and this implementation avoids increasing the amount of duplication.

To do the export, select a single run (zooming in on the graph appropriately) and click on the red “[export]” link next to the appropriate snapshot, as seen in the following screenshot.

Screenshot of AWSY showing the export links

Once it has finished generating the data, the “[export]” link changes to “[download]“, and you can click on it again to do the download.

This is a first step towards improving AWSY so that it can detect memory usage regressions with much higher sensitivity than it currently has.

Why do new MacBooks ship with the firewall off by default?

I just got a new MacBook Pro. It’s the fifth one I’ve had since 2005, and as usual the hardware is gorgeous, and migrating from the old laptop was a breeze.

But there’s one thing that boggles my mind about the default system configuration…

Dialog box showing the firewall off by default

…the firewall is off by default. It was off by default on my previous MacBook Pro too. (I have a short file describing the steps I took when migrating to the old laptop, and “Security & Privacy: turn on firewall(!)” was one.)

My wife bought a MacBook Air a couple of months and I just checked and the firewall is disabled on it, too.

I admit I am not a world-class expert on matters of network security. Is this totally insane and negligent, or is there something I’m missing?

AdBlock Plus’s effect on Firefox’s memory usage

[Update: Wladimir Palant has posted a response on the AdBlock Plus blog. Also, a Chrome developer using the handle "Klathmon" has posted numerous good comments in the Reddit discussion of this post, explaining why ad-blockers are inherently CPU- and memory-intensive, and why integrating ad-blocking into a browser wouldn't necessarily help.]

AdBlock Plus (ABP) is the most popular add-on for Firefox. AMO says that it has almost 19 million users, which is almost triple the number of the second most popular add-on. I have happily used it myself for years — whenever I use a browser that doesn’t have an ad blocker installed I’m always horrified by the number of ads there are on the web.

But we recently learned that ABP can greatly increase the amount of memory used by Firefox.

First, there’s a constant overhead just from enabling ABP of something like 60–70 MiB. (This is on 64-bit builds; on 32-bit builds the number is probably a bit smaller.) This appears to be mostly due to additional JavaScript memory usage, though there’s also some due to extra layout memory.

Second, there’s an overhead of about 4 MiB per iframe, which is mostly due to ABP injecting a giant stylesheet into every iframe. Many pages have multiple iframes, so this can add up quickly. For example, if I load TechCrunch and roll over the social buttons on every story (thus triggering the loading of lots of extra JS code), without ABP, Firefox uses about 194 MiB of physical memory. With ABP, that number more than doubles, to 417 MiB. This is despite the fact that ABP prevents some page elements (ads!) from being loaded.

An even more extreme example is this page, which contains over 400 iframes. Without ABP, Firefox uses about 370 MiB. With ABP, that number jumps to 1960 MiB. Unsurprisingly, the page also loads more slowly with ABP enabled.

So, it’s clear that ABP greatly increases Firefox’s memory usage. Now, this isn’t all bad. Many people (including me!) will be happy with this trade-off — they will gladly use extra memory in order to block ads. But if you’re using a low-end machine without much memory, you might have different priorities.

I hope that the ABP authors can work with us to reduce this overhead, though I’m not aware of any clear ideas on how to do so. In the meantime, it’s worth keeping these measurements in mind. In particular, if you hear people complaining about Firefox’s memory usage, one of the first questions to ask is whether they have ABP installed.

[A note about the comments: I have deleted 17 argumentative, repetitive, borderline-spam comments from a single commenter -- after giving him a warning via email -- and I will delete any further comments from him on this post. As a result, I also had to delete three replies to his comments from others, for which I apologize.]

A story about Brendan Eich

I attended a Mozilla work week a couple of years ago at Mozilla’s Mountain View office. There was a dinner event in San Francisco and, by chance, I ended up in Andreas Gal’s car, along with Brendan Eich and someone else (who, alas, I cannot remember now).

The destination was the California Academy of Sciences, a science museum in San Francisco, which was about a 45 minute drive away. Off we headed. Unfortunately, we headed off without closely checking where our destination was, and we somehow got the Academy of Sciences confused with the Exploratorium, another science museum in San Francisco. When we arrived and found it closed, we had to regroup.

Andreas confidently interrogated his car’s GPS unit and procured a new address that fortunately wasn’t too far away. Fifteen minutes later, we found ourselves in a residential area, outside a building that obviously wasn’t going to be hosting a dinner for several dozen MoCo employees.

Andreas again consulted his GPS unit for a new address. Unfortunately, this one
was on the far side of the city. Undeterred, we crawled through early-evening
traffic in the busiest parts of San Francisco — I’m pretty sure we actually
passed Union Square — to another address. Again, as soon as we laid eyes upon it, it clearly wasn’t the right destination.

It turns out there are several institutions in San Francisco with the words
“Academy” and “Science” or “Sciences” in their names, and we were doing a tour of all the wrong ones. On our fourth roll of the dice, Andreas found what ultimately was the correct address, and we crawled back to our final destination, which turned out — groan — to be not that far from the Exploratorium. We staggered in, two hours after we started, eliciting several comments of “what on earth took you guys so long?”

I remember being frustrated at the time — Andreas and Brendan were locals!
They should have known better. But now…

I’ve worked for Mozilla for over five years, but I visit California infrequently, and I’ve only had a chance to talk with Brendan in person a few times. The only
thing I remember from the conversation during the car trip is that at one point we were talking about the US economy and Brendan made a confident proclamation about the bond market — I can’t even remember what it was — that I wasn’t sure I agreed with but I wasn’t sure I could explain why I disagreed. It’s funny the details that stick.

This conversation was with Brendan the person — not Brendan the CTO, not
Brendan the inventor of JavaScript, not my boss’s boss’s boss, and not somebody who made a donation. Just Brendan, a person who knew a lot of stuff, had some interesting experiences and some strong opinions, and was good to chat to. It’s a small story, but it’s one I’ll remember.

Comments on this post are open, but be warned that I will delete without hesitation any comments that re-hash the CEO controversy of the past two weeks, or that I find rude or objectionable in any way. If you want to discuss the controversy, or be rude or objectionable, there are many other places on the web that you can do so.