Categories
DMD Firefox Memory allocation MemShrink Performance

Cumulative heap profiling in Firefox with DMD

DMD is a tool that I originally created to help identify where new memory reporters should be added to Firefox in order to reduce the “heap-unclassified” measurement in about:memory. (The name is actually short for “Dark Matter Detector”, because we sometimes call the “heap-unclassified” measurement “dark matter“.)

Recently, I’ve modified DMD to become a more general heap profiling tool. It now has three distinct modes of operation.

  1. “Dark matter”: this mode gives you DMD’s original behaviour.
  2. “Live”: this mode tracks all the live blocks on the system heap, and lets you take snapshots at particular points in time.
  3. Cumulative“: this mode tracks all the blocks that have ever been allocated on the system heap, and so gives you information about all the allocations done by Firefox during an entire session.

Most memory profilers (including as about:memory) are snapshot-based, and so work much like DMD’s “live” mode. But “cumulative” mode is equally interesting.

In particular, unlike “live” mode, “cumulative” mode tells you about parts of the code that are responsible for allocating many short-lived heap blocks (sometimes called “heap churn”). Such allocations can hurt performance: allocations and deallocations themselves aren’t free, particularly because they require obtaining a global lock; such code often involves unnecessary initialization or copying of heap data; and if these allocations come in a variety of sizes they can cause additional heap fragmentation.

Another nice thing about cumulative heap profiling is that, unlike live heap profiling, you don’t have to decide when to take snapshots. You can just profile an entire workload of interest and get the results at the end.

I’ve used DMD’s cumulative mode to find inefficiencies in SpiderMonkey’s source compressionΒ  and code generation, SQLite, NSS, nsTArray, XDR encoding, Gnome start-up, IPC messaging, nsStreamLoader, cycle collection, and safe-browsing. There are “start doing something clever” optimizations and then there are “stop doing something stupid” optimizations, and every one of these fixes has been one of the latter. Each change has avoided cumulative heap allocations ranging from tens to thousands of MiBs.

It’s typically difficult to quantify any speed-ups from these changes, because the workloads are noisy and non-deterministic, but I’m convinced that simple changes to fix these issues are worthwhile. For one, I used cumulative profiling (via a different tool) to drive the major improvements I made to pdf.js earlier this year. Also, Chrome developers have found that “Chrome in general is *very* close to the threshold where heap lock contention causes noticeable UI lag”.

So far I have only profiled a few simple workloads. There are all sorts of things I haven’t tried: text-heavy pages, image-heavy pages, audio and video, WebRTC, WebGL, popular benchmarks… the list goes on. I intend to do more profiling and fix things where I can, but it would be great to have help from domain experts with this stuff. If you want to try out cumulative heap profiling in Firefox, please read the DMD instructions and feel free to ask me for help. In particular, I now have a good feel for which hot allocations are unavoidable and reasonable — plenty of them, of course — and which are avoidable. Let’s get rid of the avoidable ones.

Categories
about:memory Firefox Memory allocation Memory consumption MemShrink Performance

Better documentation for memory profiling and leak detection tools

Until recently, the documentation for all of Mozilla’s memory profiling and leak detection tools had some major problems.

  • It was scattered across MDN, the Mozilla Wiki, and the Mozilla archive site (yes, really).
  • Documentation for several tools was spread across multiple pages.
  • Documentation for some tools was meagre, non-existent, or overly verbose.
  • Some of the documentation was out of date, e.g. describing tools that no longer exist.

A little while back I fixed these problems.

  • The documentation for these tools is now all on MDN. If you look at the MDN Performance page in the “Memory profiling and leak detection tools” section, you’ll see a brief description of each tool that explains the circumstances in which it is useful, and a link to the relevant documentation.
  • The full list of documented tools includes: about:memory, DMD, areweslimyet.com, BloatView, Refcount tracing and balancing, GC and CC logs, Valgrind, LeakSanitizer, Apple tools, TraceMalloc, Leak Gauge, and LogAlloc.
  • As well as consolidating all the pages in one place, I also improved some of the pages (with the help of people like Andrew McCreight). In particular, about:memory now has reasonably detailed documentation, something it has lacked until now.

Please take a look, and if you see any problems let me know. Or, if you’re feeling confident just fix things yourself! Thanks.

Categories
Data structures Firefox Memory consumption

mfbt/SegmentedVector.h

I just landed a new container type called mozilla::SegmentedVector in MFBT. It’s similar to nsTArray and mozilla::Vector, but the the element storage is broken into segments rather than being contiguous. This makes it less flexible than those types — currently you can only append elements and iterate over all elements.

Hoever, in cases where those operations suffice, you should strongly consider using it. It uses multiple moderate-sized storage chunks instead of a single large storage chunk, which greatly reduces the likelihood of OOM crashes, especially on Win32 where large chunks of address space can be difficult to find. (See bug 1096624 for a good example; use of a large nsTArray was triggering ~1% of all OOM crashes.) It also avoids the need for repeatedly allocating new buffers and copying existing data into them as it grows.

The declaration is as follows.

template<typename T,
         size_t IdealSegmentSize,
         typename AllocPolicy = MallocAllocPolicy>
class SegmentedVector
  • T is the element type.
  • IdealSegmentSize is the size of each segment, in bytes. It should be a power-of-two (to avoid slop), not too small (so you can fit a reasonable number of elements in each chunk, which keeps the per-segmente book-keeping overhead low) and not too big (so virtual OOM crashes are unlikely). A value like 4,096 or 8,192 or 16,384 is likely to be good.
  • AllocPolicy is the allocation policy. A SegmentedVector can be made infallible by using InfallibleAllocPolicy from mozalloc.h.

If you want to use SegmentedVector but it doesn’t support the operations you need, please talk to me. While it will never be as flexible as a contiguous vector type, there’s definitely scope for adding new operations.

Categories
Humans Work habits

Two suggestions for the Portland work week

Mozilla is having a company-wide work week in Portland next week. It’s extremely rare to have this many Mozilla employees in the same place at the same time, and I have two suggestions.

  • Write down a list of people that you want to meet. This will probably contain people you’ve interacted with online but not in person. And send an email to everybody on that list saying “I would like to meet you in person in Portland next week”. I’ve done this at previous work weeks and it has worked well. (And I did it again earlier today.)
  • During the week, don’t do much work you could do at home. This includes most solo coding tasks. If you’re tempted to do such work, stand up and try to find someone to talk to (or listen to) who you couldn’t normally talk to easily. (This is a rule of thumb; if a zero-day security exploit is discovered in your code on Tuesday morning, yes, you should fix it.) Failing that, gee, you might as well do something that you can only do in Portland.

That’s it. Have a great week!

Categories
Firefox Performance Tracking Protection

Quantifying the effects of Firefox’s Tracking Protection

A number of people at Mozilla are working on a wonderful privacy initiative called Polaris. This will include activities such as Mozilla hosting its own high-capacity Tor middle relays.

But the part of Polaris I’m most interested in is Tracking Protection, which is a Firefox feature that will make it trivial for users to avoid many forms of online tracking. This not only gives users better privacy; experiments have shown it also speeds up the loading of the median page by 20%! That’s an incredible combination.

An experiment

I decided to evaluate the effectiveness of Tracking Protection. To do this, I used Lightbeam, a Firefox extension designed specifically to record third-party tracking. On November 2nd, I used a trunk build of the mozilla-inbound repository and did the following steps.

  • Start Firefox with a new profile.
  • Install Lightbeam from addons.mozilla.org.
  • Visit the following sites, but don’t interact with them at all:
    1. google.com
    2. techcrunch.com
    3. dictionary.com (which redirected to dictionary.reference.com)
    4. nytimes.com
    5. cnn.com
  • Open Lightbeam in a tab, and go to the “List” view.

I then repeated these steps, but before visiting the sites I added the following step.

  • Open about:config and toggle privacy.trackingprotection.enabled to
    “true”.

Results with Tracking Protection turned off

The sites I visited directly are marked as “Visited”. All the third-party sites are marked as “Third Party”.

Connected with 86 sites

Type            Website                Sites Connected
----            -------                ---------------
Visited         google.com              3
Third Party     gstatic.com             5
Visited         techcrunch.com         25
Third Party     aolcdn.com              1
Third Party     wp.com                  1
Third Party     gravatar.com            1
Third Party     wordpress.com           1
Third Party     twitter.com             4
Third Party     google-analytics.com    3
Third Party     scorecardresearch.com   6
Third Party     aol.com                 1
Third Party     questionmarket.com      1
Third Party     grvcdn.com              1
Third Party     korrelate.net           1
Third Party     livefyre.com            1
Third Party     gravity.com             1
Third Party     facebook.net            1
Third Party     adsonar.com             1
Third Party     facebook.com            4
Third Party     atwola.com              4
Third Party     adtech.de               1
Third Party     goviral-content.com     7
Third Party     amgdgt.com              1
Third Party     srvntrk.com             2
Third Party     voicefive.com           1
Third Party     bluekai.com             1
Third Party     truste.com              2
Third Party     advertising.com         2
Third Party     youtube.com             1
Third Party     ytimg.com               1
Third Party     5min.com                1
Third Party     tacoda.net              1
Third Party     adadvisor.net           2
Third Party     dictionary.com          1
Visited         reference.com          32
Third Party     sfdict.com              1
Third Party     amazon-adsystem.com     1
Third Party     thesaurus.com           1
Third Party     quantserve.com          1
Third Party     googletagservices.com   1
Third Party     googleadservices.com    1
Third Party     googlesyndication.com   3
Third Party     imrworldwide.com        3
Third Party     doubleclick.net         5
Third Party     legolas-media.com       1
Third Party     googleusercontent.com   1
Third Party     exponential.com         1
Third Party     twimg.com               1
Third Party     tribalfusion.com        2
Third Party     technoratimedia.com     2
Third Party     chango.com              1
Third Party     adsrvr.org              1
Third Party     exelator.com            1
Third Party     adnxs.com               1
Third Party     securepaths.com         1
Third Party     casalemedia.com         2
Third Party     pubmatic.com            1
Third Party     contextweb.com          1
Third Party     yahoo.com               1
Third Party     openx.net               1
Third Party     rubiconproject.com      2
Third Party     adtechus.com            1
Third Party     load.s3.amazonaws.com   1
Third Party     fonts.googleapis.com    2
Visited         nytimes.com            21
Third Party     nyt.com                 2
Third Party     typekit.net             1
Third Party     newrelic.com            1
Third Party     moatads.com             2
Third Party     krxd.net                2
Third Party     dynamicyield.com        2
Third Party     bizographics.com        1
Third Party     rfihub.com              1
Third Party     ru4.com                 1
Third Party     chartbeat.com           1
Third Party     ixiaa.com               1
Third Party     revsci.net              1
Third Party     chartbeat.net           2
Third Party     agkn.com                1
Visited         cnn.com                14
Third Party     turner.com              1
Third Party     optimizely.com          1
Third Party     ugdturner.com           1
Third Party     akamaihd.net            1
Third Party     visualrevenue.com       1
Third Party     batpmturner.com         1

Results with Tracking Protection turned on

Connected with 33 sites

Visited         google.com              3
Third Party     google.com.au           0
Third Party     gstatic.com             1
Visited         techcrunch.com         12
Third Party     aolcdn.com              1
Third Party     wp.com                  1
Third Party     wordpress.com           1
Third Party     gravatar.com            1
Third Party     twitter.com             4
Third Party     grvcdn.com              1
Third Party     korrelate.net           1
Third Party     livefyre.com            1
Third Party     gravity.com             1
Third Party     facebook.net            1
Third Party     aol.com                 1
Third Party     facebook.com            3
Third Party     dictionary.com          1
Visited         reference.com           5
Third Party     sfdict.com              1
Third Party     thesaurus.com           1
Third Party     googleusercontent.com   1
Third Party     twimg.com               1
Visited         nytimes.com             3
Third Party     nyt.com                 2
Third Party     typekit.net             1
Third Party     dynamicyield.com        2
Visited         cnn.com                 7
Third Party     turner.com              1
Third Party     optimizely.com          1
Third Party     ugdturner.com           1
Third Party     akamaihd.net            1
Third Party     visualrevenue.com       1
Third Party     truste.com              1

Β Discussion

86 site connections were reduced to 33. No wonder it’s a performance improvement as well as a privacy improvement. The only effect I could see on content was that some ads on some of the sites weren’t shown; all the primary site content was still present.

google.com was the only site that didn’t trigger Tracking Protection (i.e. the shield icon didn’t appear in the address bar).

The results are quite variable. When I repeated the experiment the number of third-party sites without Tracking Protection was sometimes as low as 55, and with Tracking Protection it was sometimes as low as 21. I’m not entirely sure what causes the variation.

If you want to try this experiment yourself, note that Lightbeam was broken by a recent change. If you are using mozilla-inbound, revision db8ff9116376 is the one immediate preceding the breakage. Hopefully this will be fixed soon. I also found Lightbeam’s graph view to be unreliable. And note that the privacy.trackingprotection.enabled preference was recently renamed browser.polaris.enabled. [Update: that is not quite right; Monica Chew has clarified the preferences situation in the comments below.]

Finally, Tracking Protection is under active development, and I’m not sure which version of Firefox it will ship in. In the meantime, if you want to try it out, get a copy of Nightly and follow these instructions.

Categories
DMD Firefox Memory allocation Memory consumption MemShrink

Please grow your buffers exponentially

If you record every heap allocation and re-allocation done by Firefox you find some interesting things. In particular, you find some sub-optimal buffer growth strategies that cause a lot of heap churn.

Think about a data structure that involves a contiguous, growable buffer, such as a string or a vector. If you append to it and it doesn’t have enough space for the appended elements, you need to allocate a new buffer, copy the old contents to the new buffer, and then free the old buffer. realloc() is usually used for this, because it does these three steps for you.

The crucial question: when you have to grow a buffer, how much do you grow it? One obvious answer is “just enough for the new elements”. That might seemΒ  space-efficient at first glance, but if you have to repeatedly grow the buffer it can quickly turn bad.

Consider a simple but not outrageous example. Imagine you have a buffer that starts out 1 byte long and you add single bytes to it until it is 1 MiB long. If you use the “just-enough” strategy you’ll cumulatively allocate this much memory:

1 + 2 + 3 + … + 1,048,575 + 1,048,576 = 549,756,338,176 bytes

Ouch. O(n2) behaviour really hurts when n gets big enough. Of course the peak memory usage won’t be nearly this high, but all those reallocations and copying will be slow.

In practice it won’t be this bad because heap allocators round up requests, e.g. if you ask for 123 bytes you’ll likely get something larger like 128 bytes. The allocator used by Firefox (an old, extensively-modified version of jemalloc) rounds up all requests between 4 KiB and 1 MiB to the nearest multiple of 4 KiB. So you’ll actually allocate approximately this much memory:

4,096 + 8,192 + 12,288 + … + 1,044,480 + 1,048,576 = 134,742,016 bytes

(This ignores the sub-4 KiB allocations, which in total are negligible.) Much better. And if you’re lucky the OS’s virtual memory system will do some magic with page tables to make the copying cheap. But still, it’s a lot of churn.

A strategy that is usually better is exponential growth. Doubling the buffer each time is the simplest strategy:

4,096 + 8,192 + 16,384 + 32,768 + 65,536 + 131,072 + 262,144 + 524,288 + 1,048,576 = 2,093,056 bytes

That’s more like it; the cumulative size is just under twice the final size, and the series is short enough now to write it out in full, which is nice — calls to malloc() and realloc() aren’t that cheap because they typically require acquiring a lock. I particularly like the doubling strategy because it’s simple and it also avoids wasting usable space due to slop.

Recently I’ve converted “just enough” growth strategies to exponential growth strategies in XDRBuffer and nsTArray, and I also found a case in SQLite that Richard Hipp has fixed. These pieces of code now match numerous places that already used exponential growth: pldhash, JS::HashTable, mozilla::Vector, JSString, nsString, and pdf.js.

Pleasingly, the nsTArray conversion had a clear positive effect. Not only did the exponential growth strategy reduce the amount of heap churn and the number of realloc() calls, it also reduced heap fragmentation: the “heap-overhead” part of the purple measurement on AWSY (a.k.a. “RSS: After TP5, tabs closed [+30s, forced GC]”) dropped by 4.3 MiB! This makes sense if you think about it: an allocator can fulfil power-of-two requests like 64 KiB, 128 KiB, and 256 KiB with less waste than it can awkward requests like 244 KiB, 248 KiB, 252 KiB, etc.

So, if you know of some more code in Firefox that uses a non-exponential growth strategy for a buffer, please fix it, or let me know so I can look at it. Thank you.

Categories
Australia B2G

Firefox OS phones on sale in Australia

Firefox OS phones are now on sale in Australia! You can buy a ZTE Open C with Firefox OS 1.3 installed for $99 (AUD) at JB Hi-Fi. (For non-Australian readers: JB Hi-Fi is probably the biggest electronics and home entertainment retailer in Australia.)

Australia’s not the ideal market for the current versions of Firefox OS, being aΒ  country where a large fraction of people already use high-end phones. But it’s nice that they’re easily available πŸ™‚

Categories
Firefox WebRTC

You should use WebRTC for your 1-on-1 video meetings

Did you know that Firefox 33 (currently in Beta) lets you make a Skype-like video call directly from one running Firefox instance to another without requiring an account with a central service (such as Skype or Vidyo)?

This feature is built on top of Firefox’s WebRTC support, and it’s kind of amazing.

It’s pretty easy to use: just click on the toolbar button that looks like a phone handset or a speech bubble (which one you see depends which version of Firefox you have) and you’ll be given a URL with a call.mozilla.com domain name. [Update: depending on which beta version you have, you might need to set the loop.enabled preference in about:config, and possibly customize your toolbar to make the handset/bubble icon visible.] Send that URL to somebody else — via email, or IRC, or some other means — and when they visit that URL in Firefox 33 (or later) it will initiate a video call with you.

I’ve started using it for 1-on-1 meetings with other Mozilla employees and it works well. It’s nice to finally have an open source implementation of video calling. Give it a try!

Categories
Uncategorized

This is not the security blog

Planet Mozilla’s been a little mixed up for the past few days, claiming that I was the author of posts on the Mozilla Security Blog. The good news is that this problems appears to have been fixed, thanks to Mike Hoye.

However, it’s likely that very few people saw the post I made a few days ago about the new per-class measurements in about:memory. So please take a look if you’re interested in that sort of thing. Thanks.

Categories
about:memory Firefox MemShrink

Per-class JS object and shape measurements in Firefox’s about:memory

A few days ago I landed support for per-class reporting of JavaScript objects and shapes in about:memory. (Shapes are auxiliary, engine-internal data structures that are used to facilitate object property accesses. They can use large amounts of memory.)

Prior to this patch, the JavaScript objects and shapes within a single compartment (which corresponds to a JavaScript window or global object) would be covered by measurements in a small number of fixed categories.

10,179,152 B (02.59%) -- objects
β”œβ”€β”€β”€6,749,600 B (01.72%) -- gc-heap
β”‚   β”œβ”€β”€3,512,640 B (00.89%) ── dense-array
β”‚   β”œβ”€β”€2,965,184 B (00.75%) ── ordinary
β”‚   └────271,776 B (00.07%) ── function
β”œβ”€β”€β”€3,429,552 B (00.87%) -- malloc-heap
β”‚   β”œβ”€β”€2,377,600 B (00.61%) ── slots
β”‚   └──1,051,952 B (00.27%) ── elements/non-asm.js
└───────────0 B (00.00%) ── non-heap/code/asm.js
474,144 B (00.12%) -- shapes
β”œβ”€β”€316,832 B (00.08%) -- gc-heap
β”‚  β”œβ”€β”€167,320 B (00.04%) -- tree
β”‚  β”‚  β”œβ”€β”€152,400 B (00.04%) ── global-parented
β”‚  β”‚  └───14,920 B (00.00%) ── non-global-parented
β”‚  β”œβ”€β”€125,352 B (00.03%) ── base
β”‚  └───24,160 B (00.01%) ── dict
└──157,312 B (00.04%) -- malloc-heap
   β”œβ”€β”€β”€99,328 B (00.03%) ── compartment-tables
   β”œβ”€β”€β”€35,040 B (00.01%) ── tree-tables
   β”œβ”€β”€β”€12,704 B (00.00%) ── dict-tables
   └───10,240 B (00.00%) ── tree-shape-kids

These measurements are only interesting to those who understand the guts of the JavaScript engine.

In contrast, objects and shapes are now grouped by their class. Per-class measurements relate back to the JavaScript code in a more obvious way, making these measurements useful to a wider range of people.

10,515,296 B (02.69%) -- classes
β”œβ”€β”€β”€4,566,840 B (01.17%) ++ class(Array)
β”œβ”€β”€β”€3,618,464 B (00.93%) ++ class(Object)
β”œβ”€β”€β”€1,755,232 B (00.45%) ++ class(HTMLDivElement)
β”œβ”€β”€β”€β”€β”€333,624 B (00.09%) ++ class(Function)
β”œβ”€β”€β”€β”€β”€165,624 B (00.04%) ++ class(<non-notable classes>)
β”œβ”€β”€β”€β”€β”€β”€38,736 B (00.01%) ++ class(Window)
└──────36,776 B (00.01%) ++ class(CSS2PropertiesPrototype)

(The <non-notable classes> entry aggregates all classes that are smaller than a certain threshold. This prevents any long tail of classes from bloating about:memory too much.)

Expanding the sub-tree for the Object class, we see that the fixed categories are still present, for those who are interested in them.

3,618,464 B (00.93%) -- class(Object)
β”œβ”€β”€3,540,672 B (00.91%) -- objects
β”‚  β”œβ”€β”€2,349,632 B (00.60%) -- malloc-heap
β”‚  β”‚  β”œβ”€β”€2,348,480 B (00.60%) ── slots
β”‚  β”‚  └──────1,152 B (00.00%) ── elements/non-asm.js
β”‚  └──1,191,040 B (00.30%) ── gc-heap
└─────77,792 B (00.02%) -- shapes
      β”œβ”€β”€57,376 B (00.01%) -- gc-heap
      β”‚  β”œβ”€β”€47,120 B (00.01%) ── tree
      β”‚  β”œβ”€β”€β”€5,360 B (00.00%) ── dict
      β”‚  └───4,896 B (00.00%) ── base
      └──20,416 B (00.01%) -- malloc-heap
         β”œβ”€β”€11,552 B (00.00%) ── tree-tables
         β”œβ”€β”€β”€6,912 B (00.00%) ── tree-kids
         └───1,952 B (00.00%) ── dict-tables

Although the per-class measurements often aren’t surprising — Object and Array objects and shapes often dominate — sometimes they are. Consider the following examples.

  • The above example has 1.7 MiB of HTMLDivElement objects and shapes, which indicates that the compartment contains many div elements.
  • If you have lots of memory used by Function objects and shapes, it suggests that the code is creating excessive numbers of closures.
  • Just this morning a visitor to the #memshrink IRC channel was wondering why they had 11 MiB of XPC_WN_NoMods_NoCall_Proto_JSClass objects and shapes in one compartment. (This is a question I currently don’t have a good answer for.)

Historically, the data-dependent measurements in about:memory — e.g. those done on a per-tab, or per-compartment, or per-image, or per-script basis — have been more useful and interesting than the ones in fixed categories, because they map obviously to browser and code artifacts. For example, per-tab measurements let you know if a particular web page is using excessive memory, and per-compartment measurements revealed the existence of zombie compartments, a kind of bad memory leak that used to be common in Firefox and itsΒ add-ons.

I’m hoping that these per-class measurements will prove similarly useful. Keep an eye on them, and please let me know and/or file bugs if you see any surprising cases.

A final note: Mozilla’s devtools team is currently making great progress on a JavaScript memory profiler, which will give finer-grained measurements of JavaScript memory usage in web content. Although there will be some overlap between that tool and these new measurements in about:memory, it will useful to have both tools, because each one will be appropriate in different circumstances.