Category Archives: Firefox

A work-around for Tree Style Tab breakage on Firefox Nightly caused by mozRequestAnimationFrame removal

This post is aimed at Firefox Nightly users who also use the Tree Style Tab extension. Bug 909154 landed last week. It removed support for the prefixed mozRequestionAnimationFrame function, and broke Tree Style Tab. The GitHub repository that hosts Tree Style Tab’s code has been updated, but that has not yet made it into the latest Tree Style Tab build, which has version number 0.15.2015061300a003855.

Fortunately, it’s fairly easy to modify your installed version of Tree Style Tab to fix this problem. (“Fairly easy”, at least, for the technically-minded users who run Firefox Nightly.)

  • Find the Tree Style Tabs .xpi file. On my Linux machine, it’s at ~/.mozilla/firefox/ndbcibpq.default-1416274259667/extensions/treestyletab@piro.sakura.ne.jp.xpi. Your profile name will not be exactly the same. (In general, you can find your profile with these instructions.)
  • That file is a zip file. Edit the modules/lib/animationManager.js file within that file, and change the two occurrences of mozRequestAnimationFrame to requestAnimationFrame. Save the change.

I did the editing in vim, which was easy because vim has the ability to edit zip files in place. If your editor does not support that, it might work if you unzip the code, edit the file directly, and then rezip, but I haven’t tried that myself. Good luck.

Compacting GC

Go read Jon Coppeard’s description of the compacting GC algorithm now used by SpiderMonkey!

Firefox 41 will use less memory when running AdBlock Plus

Last year I wrote about AdBlock Plus’s effect on Firefox’s memory usage. The most important part was the following.

First, there’s a constant overhead just from enabling ABP of something like 60–70 MiB. […] This appears to be mostly due to additional JavaScript memory usage, though there’s also some due to extra layout memory.

Second, there’s an overhead of about 4 MiB per iframe, which is mostly due to ABP injecting a giant stylesheet into every iframe. Many pages have multiple iframes, so this can add up quickly. For example, if I load TechCrunch and roll over the social buttons on every story […], without ABP, Firefox uses about 194 MiB of physical memory. With ABP, that number more than doubles, to 417 MiB.

An even more extreme example is this page, which contains over 400 iframes. Without ABP, Firefox uses about 370 MiB. With ABP, that number jumps to 1960 MiB.

(This description was imprecise; the overhead is actually per document, which includes both top-level documents in a tab and documents in iframes.)

Last week Mozilla developer Cameron McCormack landed patches to fix bug 77999, which was filed more than 14 years ago. These patches enable sharing of CSS-related data — more specifically, they add data structures that share the results of cascading user agent style sheets — and in doing so they entirely fix the second issue, which is the more important of the two.

For example, on the above-mentioned “extreme example” (a.k.a. the Vim Color Scheme Test) memory usage dropped by 3.62 MiB per document. There are 429 documents on that page, which is a total reduction of about 1,550 MiB, reducing memory usage for that page down to about 450 MiB, which is not that much more than when AdBlock Plus is absent. (All these measurements are on a 64-bit build.)

I also did measurements on various other sites and confirmed the consistent saving of ~3.6 MiB per document when AdBlock Plus is enabled. The number of documents varies widely from page to page, so the exact effect depends greatly on workload. (I wanted to test TechCrunch again, but its front page has been significantly changed so it no longer triggers such high memory usage.) For example, for one of my measurements I tried opening the front page and four articles from each of nytimes.com, cnn.com and bbc.co.uk, for a total of 15 tabs. With Cameron’s patches applied Firefox with AdBlock Plus used about 90 MiB less physical memory, which is a reduction of over 10%.

Even when AdBlock Plus is not enabled this change has a moderate benefit. For example, in the Vim Color Scheme Test the memory usage for each document dropped by 0.09 MiB, reducing memory usage by about 40 MiB.

If you want to test this change out yourself, you’ll need a Nightly build of Firefox and a development build of AdBlock Plus. (Older versions of AdBlock Plus don’t work with Nightly due to a recent regression related to JavaScript parsing). In Firefox’s about:memory page you’ll see the reduction in the “style-sets” measurements. You’ll also see a new entry under “layout/rule-processor-cache”, which is the measurement of the newly shared data; it’s usually just a few MiB.

This improvement is on track to make it into Firefox 41, which is scheduled for release on September 22, 2015. For users on other release channels, Firefox 41 Beta is scheduled for release on August 11, and Firefox 41 Developer Edition is scheduled to be released in the next day or two.

Measuring data structure sizes: Firefox (C++) vs. Servo (Rust)

Firefox’s about:memory page presents fine-grained measurements of memory usage. Here’s a short example.

725.84 MB (100.0%) -- explicit
├──504.36 MB (69.49%) -- window-objects
│ ├──115.84 MB (15.96%) -- top(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound, id=2147483655)
│ │ ├───85.30 MB (11.75%) -- active
│ │ │ ├──84.75 MB (11.68%) -- window(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ ├──36.51 MB (05.03%) -- dom
│ │ │ │ │ ├──16.46 MB (02.27%) ── element-nodes
│ │ │ │ │ ├──13.08 MB (01.80%) ── orphan-nodes
│ │ │ │ │ └───6.97 MB (00.96%) ++ (4 tiny)
│ │ │ │ ├──25.17 MB (03.47%) -- js-compartment(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ │ ├──23.29 MB (03.21%) ++ classes
│ │ │ │ │ └───1.87 MB (00.26%) ++ (7 tiny)
│ │ │ │ ├──21.69 MB (02.99%) ++ layout
│ │ │ │ └───1.39 MB (00.19%) ++ (2 tiny)
│ │ │ └───0.55 MB (00.08%) ++ window(https://login.persona.org/communication_iframe)
│ │ └───30.54 MB (04.21%) ++ js-zone(0x7f131ed6e000)

A typical about:memory invocation contains many thousands of measurements. Although they can be hard for non-experts to interpret, they are immensely useful to Firefox developers. For this reason, I’m currently implementing a similar system in Servo, which is a next-generation browser engine that’s implemented in Rust. Although the implementation in Servo is heavily based on the Firefox implementation, Rust has some features that make the Servo implementation a lot nicer than the Firefox implementation, which is written in C++. This blog post is a deep dive that explains how and why.

Measuring data structures in Firefox

A lot of the measurements done for about:memory are of heterogeneous data structures that live on the heap and contain pointers. We want such data structures to be able to measure themselves. Consider the following simple example.

struct CookieDomainTuple
{
  nsCookieKey key;
  nsRefPtr<nsCookie> cookie;
 
  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) const;
};

The things to immediately note about this type are as follows.

  • The details of nsCookieKey and nsCookie don’t matter here.
  • nsRefPtr is a smart pointer type.
  • There is a method, called SizeOfExcludingThis, for measuring the size of a CookieDomainTuple.

That measurement method has the following form.

size_t
CookieDomainTuple::SizeOfExcludingThis(MallocSizeOf aMallocSizeOf) const
{
  size_t amount = 0;
  amount += key.SizeOfExcludingThis(aMallocSizeOf);
  amount += cookie->SizeOfIncludingThis(aMallocSizeOf);
  return amount;
}

Things to note here are as follows.

  • aMallocSizeOf is a pointer to a function that takes a pointer to a heap block and returns the size of that block in bytes. Under the covers it’s implemented with a function like malloc_usable_size. Using a function like this is superior to computing the size analytically, because (a) it’s less error-prone and (b) it measures the actual size of heap blocks, which is often larger than the requested size because heap allocators round up some request sizes. It will also naturally measure any padding between members.
  • The two data members are measured by invocations to size measurement methods that they provide.
  • The first of these is called SizeOfExcludingThis. The “excluding this” here is necessary because key is an nsCookieKey that sits within a CookieDomainTuple. We don’t want to measure the nsCookieKey struct itself, just any additional heap blocks that it has pointers to.
  • The second of these is called SizeOfIncludingThis. The “including this” here is necessary because cookie is just a pointer to an nsCookie struct, which we do want to measure, along with any additional heap blocks it has pointers to.
  • We need to be careful with these calls. If we call SizeOfIncludingThis when we should call SizeOfExcludingThis, we’ll likely get a crash due to calling aMallocSizeOf on a non-heap pointer. And if we call SizeOfExcludingThis when we should call SizeOfIncludingThis, we’ll miss measuring the struct.
  • If this struct had a pointer to a raw heap buffer — e.g. a char* member — it would measure it by calling aMallocSizeOf directly on the pointer.

With that in mind, you can see that this method is itself a SizeOfExcludingThis method, and indeed, it doesn’t measure the memory used by the struct instance itself. A method that did include that memory would look like the following.

size_t
CookieDomainTuple::SizeOfIncludingThis(MallocSizeOf aMallocSizeOf)
{ 
  return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
}

All it does is measure the CookieDomainTuple struct itself — i.e. this — and then call the SizeOfExcludingThis method, which measures all child structures.

There are a few other wrinkles.

  • Often we want to ignore a data member. Perhaps it’s a scalar value, such as an integer. Perhaps it’s a non-owning pointer to something and that thing would be better measured as part of the measurement of another data structure. Perhaps it’s something small that isn’t worth measuring. In these cases we generally use comments in the measurement method to explain why a field isn’t measured, but it’s easy for these comments to fall out-of-date. It’s also easy to forget to update the measurement method when a new data member is added.
  • Every SizeOfIncludingThis method body looks the same: return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
  • Reference-counting complicates things, because you end up with pointers that conceptually own a fraction of another structure.
  • Inheritance complicates things.

(The full documentation goes into more detail.)

Even with all the wrinkles, it all works fairly well. Having said that, there are a lot of SizeOfExcludingThis and SizeOfIncludingThis methods that are boilerplate-y and tedious to write.

Measuring data structures in SERVO

When I started implementing a similar system in Servo, I naturally followed a similar design. But I soon found I was able to improve upon it.

With the same functions defined for lots of types, it was natural to define a Rust trait, like the following.

pub trait HeapSizeOf { 
  fn size_of_including_self(&self) -> usize;
  fn size_of_excluding_self(&self) -> usize; 
}

Having to repeatedly define size_of_including_self when its definition always looks the same is a pain. But heap pointers in Rust are handled via the parameterized Box type, and it’s possible to implement traits for this type. This means we can implement size_of_excluding_this for all Box types — thus removing the need for size_of_including_this — in one fell swoop, as the following code shows.

impl<T: HeapSizeOf> HeapSizeOf for Box<T> {
  fn size_of_excluding_self(&self) -> usize {
    heap_size_of(&**self as *const T as *const c_void) + (**self).size_of_excluding_self()
  }
}

The pointer manipulations are hairy, but basically it says that if T implements the HeapSizeOf trait, then we can measure Box<T> by measuring the T struct itself (via heap_size_of, which is similar to the aMallocSizeOf function in the Firefox example), and then measuring the things hanging off T (via the size_of_excluding_self call). Excellent!

With the including/excluding distinction gone, I renamed size_of_excluding_self as heap_size_of_children, which I thought communicated the same idea more clearly; it seems better for the name to describe what it is measuring rather than what it is not measuring.

But there was still a need for a lot of tedious boilerplate code, as this example shows.

pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

impl HeapSizeOf for DisplayList {
  fn heap_size_of_children(&self) -> usize {
    self.background_and_borders.heap_size_of_children() +
    self.block_backgrounds_and_borders.heap_size_of_children() +
    self.floats.heap_size_of_children() +
    self.content.heap_size_of_children() +
    self.positioned_content.heap_size_of_children() +
    self.outlines.heap_size_of_children() +
    self.children.heap_size_of_children()
  }
}

However, the Rust compiler has the ability to automatically derive implementations for some built-in traits. Even better, the compiler lets you write plug-ins that do arbitrary transformations of the syntax tree, which makes it possible to write a plug-in that does the same for non-built-in traits on request. And the delightful Manish Goregaokar has done exactly that. This allows the example above to be reduced to the following.

#[derive(HeapSizeOf)]
pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

The first line is an annotation that triggers the plug-in to do the obvious thing: generate a heap_size_of_children definition that just calls heap_size_of_children on all the struct fields. Wonderful!

But you may remember that I mentioned that sometimes in Firefox’s C++ code we want to ignore a particular member. This is also true in Servo’s Rust code, so the plug-in supports an ignore_heap_size annotation which can be applied to any field in the struct definition; the plug-in will duly ignore any such field.

If a new field is added which has a type for which HeapSizeOf has not been implemented, the compiler will complain. This means that we can’t add a new field to a struct and forget to measure it. The ignore_heap_size_of annotation also requires a string argument, which (by convention) holds a brief explanation why the member is ignored, as the following example shows.

#[ignore_heap_size_of = "Because it is a non-owning reference."]
pub image: Arc<Image>,

(An aside: the best way to handle Arc is an open question. If one of the references is clearly the owner, it probably makes sense to count the full size for that one reference. Otherwise, it is probably best to divide the size equally among all the references.)

The plug-in also has a known_heap_size_of! macro that lets us easily dictate the heap size of built-in types (such as integral types, whose heap size is zero). This works because Rust allows implementations of custom traits for built-in types. It provides additional uniformity because built-in types don’t need special treatment. The following line says that all the built-in signed integer types have a heap_size_of_children value of zero.

known_heap_size_of!(0, i8, i16, i32, i64, isize);

Finally, if there is a type for which the measurement needs to do something more complicated, we can still implement heap_size_of_children manually.

Conclusion

The Servo implementation is much nicer than the Firefox implementation, in the following ways.

  •  There is no need for an including/excluding split thanks to trait implementations on Box. This avoids boilerplate some code and makes it impossible to accidentally call the wrong method.
  • Struct fields that use built-in types are handled the same way as all others, because Rust trait implementations can be defined for built-in types.
  • Even more boilerplate is avoided thanks to the compiler plug-in that auto-derives HeapSizeOf implementations; it can even ignore fields.
  • For ignored fields, the required string parameter makes it impossible to forget to explain why the field is ignored.

These are possible due to several powerful language and compiler features of Rust that C++ lacks. There may be some C++ features that could improve the Firefox code — and I’d love to hear suggestions — but it’s never going to be as nice as the Rust code.

Fix your damned data races

Nathan Froyd recently wrote about how he has been using ThreadSanitizer to find data races in Firefox, and how a number of Firefox developers — particular in the networking and JS GC teams — have been fixing these.

This is great news. I want to re-emphasise and re-state one of the points from Nathan’s post, which is that data races are a class of bug that almost everybody underestimates. Unless you have, say, authored a specification of the memory model for a systems programming language, your intuition about the potential impact of many data races is probably wrong. And I’m going to give you three links to explain why.

Hans Boehm’s paper How to miscompile programs with “benign” data races explains very clearly that it’s possible to correctly conclude that a data race is benign at the level of machine code, but it’s almost impossible at the level of C or C++ code. And if you try to do the latter by inspecting the code generated by a C or C++ compiler, you are not allowing for the fact that other compilers (including future versions of the compiler you used) can and will generate different code, and so your conclusion is both incomplete and temporary.

Dmitri Vyukov’s blog post Benign data races: what could possibly go wrong? covers similar ground, giving more examples of how compilers can legitimately compile things in surprising ways. For example, at any point the storage used by a local variable can be temporarily used to hold a different variable’s value (think register spilling). If another thread reads this storage in an racy fashion, it could read the value of an unrelated value.

Finally, John Regehr’s blog has many posts that show how C and C++ compilers take advantage of undefined behaviour to do surprising (even shocking) program transformations, and how the extent of these transformations has steadily increased over time. Compilers genuinely are getting smarter, and are increasingly pushing the envelope of what a language will let them get away with. And the behaviour of a C or C++ programs is undefined in the presence of data races. (This is sometimes called “catch-fire” semantics, for reasons you should be able to guess.)

So, in summary: if you write C or C++ code that deals with threads in Firefox — and that probably includes everybody who writes C or C++ code in Firefox — you should have read at least the first two links I’ve given above. If you haven’t, please do so now. If you have read them and they haven’t made you at least slightly terrified, you should read them again. And if TSan identifies a data race in code that you are familiar with, please take it seriously, and fix it. Thank you.

Cumulative heap profiling in Firefox with DMD

DMD is a tool that I originally created to help identify where new memory reporters should be added to Firefox in order to reduce the “heap-unclassified” measurement in about:memory. (The name is actually short for “Dark Matter Detector”, because we sometimes call the “heap-unclassified” measurement “dark matter“.)

Recently, I’ve modified DMD to become a more general heap profiling tool. It now has three distinct modes of operation.

  1. “Dark matter”: this mode gives you DMD’s original behaviour.
  2. “Live”: this mode tracks all the live blocks on the system heap, and lets you take snapshots at particular points in time.
  3. Cumulative“: this mode tracks all the blocks that have ever been allocated on the system heap, and so gives you information about all the allocations done by Firefox during an entire session.

Most memory profilers (including as about:memory) are snapshot-based, and so work much like DMD’s “live” mode. But “cumulative” mode is equally interesting.

In particular, unlike “live” mode, “cumulative” mode tells you about parts of the code that are responsible for allocating many short-lived heap blocks (sometimes called “heap churn”). Such allocations can hurt performance: allocations and deallocations themselves aren’t free, particularly because they require obtaining a global lock; such code often involves unnecessary initialization or copying of heap data; and if these allocations come in a variety of sizes they can cause additional heap fragmentation.

Another nice thing about cumulative heap profiling is that, unlike live heap profiling, you don’t have to decide when to take snapshots. You can just profile an entire workload of interest and get the results at the end.

I’ve used DMD’s cumulative mode to find inefficiencies in SpiderMonkey’s source compression  and code generation, SQLite, NSS, nsTArray, XDR encoding, Gnome start-up, IPC messaging, nsStreamLoader, cycle collection, and safe-browsing. There are “start doing something clever” optimizations and then there are “stop doing something stupid” optimizations, and every one of these fixes has been one of the latter. Each change has avoided cumulative heap allocations ranging from tens to thousands of MiBs.

It’s typically difficult to quantify any speed-ups from these changes, because the workloads are noisy and non-deterministic, but I’m convinced that simple changes to fix these issues are worthwhile. For one, I used cumulative profiling (via a different tool) to drive the major improvements I made to pdf.js earlier this year. Also, Chrome developers have found that “Chrome in general is *very* close to the threshold where heap lock contention causes noticeable UI lag”.

So far I have only profiled a few simple workloads. There are all sorts of things I haven’t tried: text-heavy pages, image-heavy pages, audio and video, WebRTC, WebGL, popular benchmarks… the list goes on. I intend to do more profiling and fix things where I can, but it would be great to have help from domain experts with this stuff. If you want to try out cumulative heap profiling in Firefox, please read the DMD instructions and feel free to ask me for help. In particular, I now have a good feel for which hot allocations are unavoidable and reasonable — plenty of them, of course — and which are avoidable. Let’s get rid of the avoidable ones.

Better documentation for memory profiling and leak detection tools

Until recently, the documentation for all of Mozilla’s memory profiling and leak detection tools had some major problems.

  • It was scattered across MDN, the Mozilla Wiki, and the Mozilla archive site (yes, really).
  • Documentation for several tools was spread across multiple pages.
  • Documentation for some tools was meagre, non-existent, or overly verbose.
  • Some of the documentation was out of date, e.g. describing tools that no longer exist.

A little while back I fixed these problems.

  • The documentation for these tools is now all on MDN. If you look at the MDN Performance page in the “Memory profiling and leak detection tools” section, you’ll see a brief description of each tool that explains the circumstances in which it is useful, and a link to the relevant documentation.
  • The full list of documented tools includes: about:memory, DMD, areweslimyet.com, BloatView, Refcount tracing and balancing, GC and CC logs, Valgrind, LeakSanitizer, Apple tools, TraceMalloc, Leak Gauge, and LogAlloc.
  • As well as consolidating all the pages in one place, I also improved some of the pages (with the help of people like Andrew McCreight). In particular, about:memory now has reasonably detailed documentation, something it has lacked until now.

Please take a look, and if you see any problems let me know. Or, if you’re feeling confident just fix things yourself! Thanks.

mfbt/SegmentedVector.h

I just landed a new container type called mozilla::SegmentedVector in MFBT. It’s similar to nsTArray and mozilla::Vector, but the the element storage is broken into segments rather than being contiguous. This makes it less flexible than those types — currently you can only append elements and iterate over all elements.

Hoever, in cases where those operations suffice, you should strongly consider using it. It uses multiple moderate-sized storage chunks instead of a single large storage chunk, which greatly reduces the likelihood of OOM crashes, especially on Win32 where large chunks of address space can be difficult to find. (See bug 1096624 for a good example; use of a large nsTArray was triggering ~1% of all OOM crashes.) It also avoids the need for repeatedly allocating new buffers and copying existing data into them as it grows.

The declaration is as follows.

template<typename T,
         size_t IdealSegmentSize,
         typename AllocPolicy = MallocAllocPolicy>
class SegmentedVector
  • T is the element type.
  • IdealSegmentSize is the size of each segment, in bytes. It should be a power-of-two (to avoid slop), not too small (so you can fit a reasonable number of elements in each chunk, which keeps the per-segmente book-keeping overhead low) and not too big (so virtual OOM crashes are unlikely). A value like 4,096 or 8,192 or 16,384 is likely to be good.
  • AllocPolicy is the allocation policy. A SegmentedVector can be made infallible by using InfallibleAllocPolicy from mozalloc.h.

If you want to use SegmentedVector but it doesn’t support the operations you need, please talk to me. While it will never be as flexible as a contiguous vector type, there’s definitely scope for adding new operations.

Quantifying the effects of Firefox’s Tracking Protection

A number of people at Mozilla are working on a wonderful privacy initiative called Polaris. This will include activities such as Mozilla hosting its own high-capacity Tor middle relays.

But the part of Polaris I’m most interested in is Tracking Protection, which is a Firefox feature that will make it trivial for users to avoid many forms of online tracking. This not only gives users better privacy; experiments have shown it also speeds up the loading of the median page by 20%! That’s an incredible combination.

An experiment

I decided to evaluate the effectiveness of Tracking Protection. To do this, I used Lightbeam, a Firefox extension designed specifically to record third-party tracking. On November 2nd, I used a trunk build of the mozilla-inbound repository and did the following steps.

  • Start Firefox with a new profile.
  • Install Lightbeam from addons.mozilla.org.
  • Visit the following sites, but don’t interact with them at all:
    1. google.com
    2. techcrunch.com
    3. dictionary.com (which redirected to dictionary.reference.com)
    4. nytimes.com
    5. cnn.com
  • Open Lightbeam in a tab, and go to the “List” view.

I then repeated these steps, but before visiting the sites I added the following step.

  • Open about:config and toggle privacy.trackingprotection.enabled to
    “true”.

Results with Tracking Protection turned off

The sites I visited directly are marked as “Visited”. All the third-party sites are marked as “Third Party”.

Connected with 86 sites

Type            Website                Sites Connected
----            -------                ---------------
Visited         google.com              3
Third Party     gstatic.com             5
Visited         techcrunch.com         25
Third Party     aolcdn.com              1
Third Party     wp.com                  1
Third Party     gravatar.com            1
Third Party     wordpress.com           1
Third Party     twitter.com             4
Third Party     google-analytics.com    3
Third Party     scorecardresearch.com   6
Third Party     aol.com                 1
Third Party     questionmarket.com      1
Third Party     grvcdn.com              1
Third Party     korrelate.net           1
Third Party     livefyre.com            1
Third Party     gravity.com             1
Third Party     facebook.net            1
Third Party     adsonar.com             1
Third Party     facebook.com            4
Third Party     atwola.com              4
Third Party     adtech.de               1
Third Party     goviral-content.com     7
Third Party     amgdgt.com              1
Third Party     srvntrk.com             2
Third Party     voicefive.com           1
Third Party     bluekai.com             1
Third Party     truste.com              2
Third Party     advertising.com         2
Third Party     youtube.com             1
Third Party     ytimg.com               1
Third Party     5min.com                1
Third Party     tacoda.net              1
Third Party     adadvisor.net           2
Third Party     dictionary.com          1
Visited         reference.com          32
Third Party     sfdict.com              1
Third Party     amazon-adsystem.com     1
Third Party     thesaurus.com           1
Third Party     quantserve.com          1
Third Party     googletagservices.com   1
Third Party     googleadservices.com    1
Third Party     googlesyndication.com   3
Third Party     imrworldwide.com        3
Third Party     doubleclick.net         5
Third Party     legolas-media.com       1
Third Party     googleusercontent.com   1
Third Party     exponential.com         1
Third Party     twimg.com               1
Third Party     tribalfusion.com        2
Third Party     technoratimedia.com     2
Third Party     chango.com              1
Third Party     adsrvr.org              1
Third Party     exelator.com            1
Third Party     adnxs.com               1
Third Party     securepaths.com         1
Third Party     casalemedia.com         2
Third Party     pubmatic.com            1
Third Party     contextweb.com          1
Third Party     yahoo.com               1
Third Party     openx.net               1
Third Party     rubiconproject.com      2
Third Party     adtechus.com            1
Third Party     load.s3.amazonaws.com   1
Third Party     fonts.googleapis.com    2
Visited         nytimes.com            21
Third Party     nyt.com                 2
Third Party     typekit.net             1
Third Party     newrelic.com            1
Third Party     moatads.com             2
Third Party     krxd.net                2
Third Party     dynamicyield.com        2
Third Party     bizographics.com        1
Third Party     rfihub.com              1
Third Party     ru4.com                 1
Third Party     chartbeat.com           1
Third Party     ixiaa.com               1
Third Party     revsci.net              1
Third Party     chartbeat.net           2
Third Party     agkn.com                1
Visited         cnn.com                14
Third Party     turner.com              1
Third Party     optimizely.com          1
Third Party     ugdturner.com           1
Third Party     akamaihd.net            1
Third Party     visualrevenue.com       1
Third Party     batpmturner.com         1

Results with Tracking Protection turned on

Connected with 33 sites

Visited         google.com              3
Third Party     google.com.au           0
Third Party     gstatic.com             1
Visited         techcrunch.com         12
Third Party     aolcdn.com              1
Third Party     wp.com                  1
Third Party     wordpress.com           1
Third Party     gravatar.com            1
Third Party     twitter.com             4
Third Party     grvcdn.com              1
Third Party     korrelate.net           1
Third Party     livefyre.com            1
Third Party     gravity.com             1
Third Party     facebook.net            1
Third Party     aol.com                 1
Third Party     facebook.com            3
Third Party     dictionary.com          1
Visited         reference.com           5
Third Party     sfdict.com              1
Third Party     thesaurus.com           1
Third Party     googleusercontent.com   1
Third Party     twimg.com               1
Visited         nytimes.com             3
Third Party     nyt.com                 2
Third Party     typekit.net             1
Third Party     dynamicyield.com        2
Visited         cnn.com                 7
Third Party     turner.com              1
Third Party     optimizely.com          1
Third Party     ugdturner.com           1
Third Party     akamaihd.net            1
Third Party     visualrevenue.com       1
Third Party     truste.com              1

 Discussion

86 site connections were reduced to 33. No wonder it’s a performance improvement as well as a privacy improvement. The only effect I could see on content was that some ads on some of the sites weren’t shown; all the primary site content was still present.

google.com was the only site that didn’t trigger Tracking Protection (i.e. the shield icon didn’t appear in the address bar).

The results are quite variable. When I repeated the experiment the number of third-party sites without Tracking Protection was sometimes as low as 55, and with Tracking Protection it was sometimes as low as 21. I’m not entirely sure what causes the variation.

If you want to try this experiment yourself, note that Lightbeam was broken by a recent change. If you are using mozilla-inbound, revision db8ff9116376 is the one immediate preceding the breakage. Hopefully this will be fixed soon. I also found Lightbeam’s graph view to be unreliable. And note that the privacy.trackingprotection.enabled preference was recently renamed browser.polaris.enabled. [Update: that is not quite right; Monica Chew has clarified the preferences situation in the comments below.]

Finally, Tracking Protection is under active development, and I’m not sure which version of Firefox it will ship in. In the meantime, if you want to try it out, get a copy of Nightly and follow these instructions.

Please grow your buffers exponentially

If you record every heap allocation and re-allocation done by Firefox you find some interesting things. In particular, you find some sub-optimal buffer growth strategies that cause a lot of heap churn.

Think about a data structure that involves a contiguous, growable buffer, such as a string or a vector. If you append to it and it doesn’t have enough space for the appended elements, you need to allocate a new buffer, copy the old contents to the new buffer, and then free the old buffer. realloc() is usually used for this, because it does these three steps for you.

The crucial question: when you have to grow a buffer, how much do you grow it? One obvious answer is “just enough for the new elements”. That might seem  space-efficient at first glance, but if you have to repeatedly grow the buffer it can quickly turn bad.

Consider a simple but not outrageous example. Imagine you have a buffer that starts out 1 byte long and you add single bytes to it until it is 1 MiB long. If you use the “just-enough” strategy you’ll cumulatively allocate this much memory:

1 + 2 + 3 + … + 1,048,575 + 1,048,576 = 549,756,338,176 bytes

Ouch. O(n2) behaviour really hurts when n gets big enough. Of course the peak memory usage won’t be nearly this high, but all those reallocations and copying will be slow.

In practice it won’t be this bad because heap allocators round up requests, e.g. if you ask for 123 bytes you’ll likely get something larger like 128 bytes. The allocator used by Firefox (an old, extensively-modified version of jemalloc) rounds up all requests between 4 KiB and 1 MiB to the nearest multiple of 4 KiB. So you’ll actually allocate approximately this much memory:

4,096 + 8,192 + 12,288 + … + 1,044,480 + 1,048,576 = 134,742,016 bytes

(This ignores the sub-4 KiB allocations, which in total are negligible.) Much better. And if you’re lucky the OS’s virtual memory system will do some magic with page tables to make the copying cheap. But still, it’s a lot of churn.

A strategy that is usually better is exponential growth. Doubling the buffer each time is the simplest strategy:

4,096 + 8,192 + 16,384 + 32,768 + 65,536 + 131,072 + 262,144 + 524,288 + 1,048,576 = 2,093,056 bytes

That’s more like it; the cumulative size is just under twice the final size, and the series is short enough now to write it out in full, which is nice — calls to malloc() and realloc() aren’t that cheap because they typically require acquiring a lock. I particularly like the doubling strategy because it’s simple and it also avoids wasting usable space due to slop.

Recently I’ve converted “just enough” growth strategies to exponential growth strategies in XDRBuffer and nsTArray, and I also found a case in SQLite that Richard Hipp has fixed. These pieces of code now match numerous places that already used exponential growth: pldhash, JS::HashTable, mozilla::Vector, JSString, nsString, and pdf.js.

Pleasingly, the nsTArray conversion had a clear positive effect. Not only did the exponential growth strategy reduce the amount of heap churn and the number of realloc() calls, it also reduced heap fragmentation: the “heap-overhead” part of the purple measurement on AWSY (a.k.a. “RSS: After TP5, tabs closed [+30s, forced GC]”) dropped by 4.3 MiB! This makes sense if you think about it: an allocator can fulfil power-of-two requests like 64 KiB, 128 KiB, and 256 KiB with less waste than it can awkward requests like 244 KiB, 248 KiB, 252 KiB, etc.

So, if you know of some more code in Firefox that uses a non-exponential growth strategy for a buffer, please fix it, or let me know so I can look at it. Thank you.