Categories: Fennec, Firefox, Memory consumption, MemShrink

MemShrink progress, weeks 13–18

I’ve been on vacation, so this report covers (incompletely) the past six weeks’ worth of MemShrink improvements.

Big Things

Paul Biggar and Mike Hommey enabled jemalloc on Mac OS X 10.6.  This will hopefully reduce fragmentation on that platform, and it brings the Mac in line with Windows, Linux and Android.  Now we just need it for 10.5 and 10.7.

Oleg Romashin found a way to drop some Thebes layers in inactive tabs in Fennec.  I won’t pretend to understand the details of this bug, but if I’ve understood correctly it can save 12MB or more per background tab.

Jeff Muizelaar turned on decode-on-draw.  This means that if you open a new page in a background tab, none of its images will be decoded until you switch to that tab.  Previously any images would be decoded and then discarded after 10 to 20 seconds (unless you switched to the tab before the time-out occurred).  This change can save a lot of memory (and CPU time) for users browsing image-heavy sites.

Gian-Carlo Pascutto optimized the safe browsing database.  This hopefully has fixed our url-classifier bloat problems.  (I plan to verify this soon.)

Chris Leary and Jonathan “Duke” Leto made regexp compilation lazy.  This potentially saves tens or even hundreds of MBs of memory in some cases, both by avoiding the compilation of regexps that are never used and by allowing compiled regexps to be GC’d more quickly.  The patch may have caused some performance regressions; it’s unclear from the bug exactly where those stand.
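
To illustrate the general idea, here is a minimal sketch of lazy compilation.  It is not SpiderMonkey’s actual code or class names; it just uses std::regex to show the pattern of keeping only the source string until a regexp is first executed:

    #include <memory>
    #include <regex>
    #include <string>

    // A regexp object that keeps only its source string until the first
    // match is requested; the compiled form can also be discarded and
    // rebuilt later, which is what lets it be reclaimed more aggressively.
    class LazyRegExp {
      std::string source_;
      std::unique_ptr<std::regex> compiled_;   // stays null until first use
     public:
      explicit LazyRegExp(std::string source) : source_(std::move(source)) {}
      bool Test(const std::string& input) {
        if (!compiled_)                        // compile on demand
          compiled_.reset(new std::regex(source_));
        return std::regex_search(input, *compiled_);
      }
      void DiscardCompiled() { compiled_.reset(); }  // e.g. under memory pressure
    };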

Justin Lebar converted some uses of nsTArray to nsAutoTArray (and also here).  These avoided some calls to malloc (in the latter case, around 3% of all malloc calls!) and may help reduce heap fragmentation a little.  Robert O’Callahan did a separate, similar nsAutoTArray change here.  Justin also avoided another 1% of total malloc calls in JSAutoEnterCompartment.
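
The win comes from inline storage: nsAutoTArray<T, N> reserves space for N elements inside the object itself, so small arrays never touch the heap.  Here is a generic sketch of that pattern, not Gecko’s actual implementation, and restricted to POD element types for brevity:

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    template <typename T, size_t N>
    class AutoArraySketch {
      T inline_[N];            // inline storage, lives inside the owning object
      T* elems_ = inline_;
      size_t len_ = 0;
      size_t cap_ = N;
     public:
      void Append(const T& v) {
        if (len_ == cap_) {    // only spill to the heap once the inline
          cap_ *= 2;           // buffer overflows (the uncommon case)
          T* heap = static_cast<T*>(malloc(cap_ * sizeof(T)));
          memcpy(heap, elems_, len_ * sizeof(T));
          if (elems_ != inline_) free(elems_);
          elems_ = heap;
        }
        elems_[len_++] = v;
      }
      ~AutoArraySketch() { if (elems_ != inline_) free(elems_); }
    };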

Chris Leary rewrote JSArena, which avoided some wasted memory and replaced some hoary old C code with sleek modern C++.

Smaller Things

A new contributor named Jiten (a.k.a. deLta30) fixed about:memory’s GC and CC buttons so they trigger collections in both the content and chrome process in Fennec.  Previously only the chrome process was affected.  (I’m not sure how this change will be affected by the decision to switch to the native chrome process in Fennec.)  Great work, Jiten!

I avoided some wasted space in the JS code generator, and some more in the parser.  Justin Lebar did something similar in nsTArray_base.

Jonathan Kew added a memory reporter for textruns and associated data.  Justin Lebar added the “history-links-hashtable” memory reporter.

Justin Lebar fixed some bogus “canvas-2d-pixel-bytes” values in about:memory.

Brian Bondy fixed a leak in Windows drag and drop code.

Tim Taubert wrote about finding leaks in browser-chrome mochitests.

Bug Counts

The current bug counts are as follows.  The differences are against the MemShrink week 12 counts.

  • P1: 37 (-3, +11)
  • P2: 108 (-9, +37)
  • P3: 53 (-1, +14)
  • Unprioritized: 6 (-21, +5)

They’re still going up.  The good news is that my gut feeling is that not many of these bugs are problems reported by users (and those that are, are often caused by add-ons).  Most of the new reports are ideas for improvements from developers.

Categories: about:memory, Firefox, Memory consumption, MemShrink

MemShrink progress, week 12

about:memory improvements

Lots of changes were made to about:memory this week.

Justin Lebar landed a patch that provides detailed information about RSS, vsize and swap memory usage on Linux and Android.  (The patch was backed out due to a minor leak but I expect Justin will fix that and re-land it soon.)  This will help us understand memory usage that is not covered by the “Explicit Allocations” tree in about:memory, such as memory used for static code and data, and it should be particularly useful on Android.  The contents of the new trees are hidden by default;  you have to click on the tree heading to expand each one.

Kyle Huey split up about:memory’s layout measurements on a per-PresShell basis.  This makes it easy to see how much layout memory is being used by each web page.

Something I failed to mention last week was that with the landing of type inference, each JavaScript compartment has five new measurements: “object-main”, “script-main”, “tables”, “analysis-temporary”, and “object-empty-shapes”.

I converted some of the JavaScript memory reporters to use moz_malloc_usable_size to measure actual allocation sizes instead of requested allocation sizes.  This accounts for slop bytes caused by the heap allocator rounding up.  This is quite important — slop bytes can easily account for over 10% of the heap, and if we don’t account for them we’ll never get about:memory’s “heap-unclassified” number down.  Therefore I’ll be doing more of this in the future.  And it would be great if people writing new memory reporters can do the same thing!
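
For anyone writing a reporter, the change amounts to measuring blocks with the allocator’s own notion of their size.  Here is a minimal standalone sketch using plain malloc_usable_size, which glibc and jemalloc provide (as I understand it, moz_malloc_usable_size wraps the per-platform equivalents of this):

    #include <malloc.h>   // malloc_usable_size (Linux/jemalloc)
    #include <cstdio>
    #include <cstdlib>

    int main() {
      size_t requested = 1025;                 // what the caller asked for
      void* p = malloc(requested);
      size_t usable = malloc_usable_size(p);   // what the heap actually handed out
      // A reporter that records 'requested' misses the slop; recording
      // 'usable' attributes those bytes to this consumer instead of
      // leaving them in "heap-unclassified".
      printf("requested %zu, usable %zu, slop %zu\n",
             requested, usable, usable - requested);
      free(p);
      return 0;
    }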

Finally, on the topic of the “heap-unclassified” number:  people often complain about it, so I’m happy to say that it is on a clear downward path.  Indeed, thanks to DMD, at the time of writing we have 16 open bugs to add new memory reporters for things that consume significant amounts of memory, and 13 of these are assigned.  I’m hoping that in a month or two the “heap-unclassified” number on development builds will typically be 10–15% rather than the 30–35% it usually is now.

Other things

I changed the growth strategy used for one of JaegerMonkey’s buffers to avoid large amounts of memory wasted due to slop bytes.  These buffers are short-lived so the fix doesn’t make a great difference to total memory consumption, but it does reduce the number of allocations and heap churn.
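
I won’t reproduce the actual JaegerMonkey change here, but the general idea is to pick growth sizes that the allocator can hand back exactly.  A hypothetical sketch of that kind of growth policy:

    #include <cstddef>

    // Grow in power-of-two steps so each request lands exactly on a
    // jemalloc size class and produces zero slop.
    size_t NextBufferCapacity(size_t needed) {
      size_t capacity = 4096;                 // assumed minimum chunk size
      while (capacity < needed) capacity *= 2;
      return capacity;
    }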

Marco Bonardo wrote about his recent changes to the handling of the places database.

Dietrich Ayala wrote about an experimental add-on that unloads tabs that haven’t been viewed in a while, which is an interesting idea.  I suspect the exact approach used in the add-on won’t be viable in the long run, but we can certainly benefit from doing a better job of discarding regenerable info that hasn’t been used in a while, particularly on mobile.

Bug counts

This week’s bug counts are as follows:

  • P1: 29 (-2, +2)
  • P2: 80 (-4, +8)
  • P3: 40 (-2, +4)
  • Unprioritized: 22 (-12, +12)

Just like last week, Marco Castelluccio tagged quite a lot of old bugs with “[MemShrink]”.  We had 45 unprioritized bugs at the start of this week’s meeting, and we got through more than 20 of them.

Some comments on last week’s post got me thinking about how to make it easier for more people to help with MemShrink.  For those who don’t have much coding experience, probably the best bet is to look at the list of unconfirmed bugs — these are problems reported by users where the particular problem hasn’t been identified.  Often they need additional effort to determine whether they are reproducible, whether they’re due to add-ons, etc.  For example, in bug 676872 a user was seeing very high memory usage, and it’s clear that it was caused by one or more of the 41(!) add-ons he had enabled.  Ideally that bug’s reporter would disable them selectively to narrow that down, but anyone could do likewise with some effort.

For those who do have coding experience, it would be worth looking at the list of bugs that have a “mentor” annotation.  For example, bug 472209 is about adding some graphing capability to about:memory.  Jezreel Ng made some excellent progress on this during his internship;  it just needs someone to take over and finish it up.

Finally, for those who like a challenge or have some experience with Firefox’s code, the full list of unassigned bugs might be of interest.  There are currently 86 such bugs!  More than I’d like.

(BTW, I’ve added links for the three bug lists above to the MemShrink wiki page.)

On hiatus

I will be on vacation for the next five weeks and the MemShrink progress report will be on hiatus during that time.  But MemShrink meetings will continue (except there won’t be one next week due to the Mozilla all-hands meeting).  I look forward to writing a bumper progress report on October 19, where I’ll be able to summarize everything that happened while I was away!

Categories: about:memory, Firefox, JägerMonkey, Memory consumption, MemShrink, SQLite

MemShrink progress, week 11

This week was quiet in terms of patches landed.

  • Marco Bonardo changed the way the places.sqlite database is handled. I’m reluctant to describe the change in much detail because I’ll probably get something wrong, and Marco told me he’s planning to write a blog post about it soon.  So I’ll just quote from the bug: “Globally on my system (8GBs) I’ve often seen places.sqlite cache going over 100MB, with the patch I plan to force a maximum of 60MB (remember this will vary based on hardware specs), that is a >40% improvement. We may further reduce in future but better being on the safe side for now.”  This was a MemShrink:P1 bug.
  • New contributor Sander van Veen knocked off another bug (with help from his friend Bas Weelinck) when he added more detail to the “mjit-code” entries in about:memory.  This makes it clear how much of JaegerMonkey’s code memory usage is for normal methods vs. memory for compiled regular expressions.
  • I rearranged nsCSSCompressedDataBlock to avoid some unnecessary padding on 64-bit platforms.  This can save a megabyte or two if you have several CSS-heavy (e.g. Gmail) tabs open.  It makes no difference on 32-bit platforms.  (A generic illustration of this kind of struct reordering follows this list.)
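
Here is the kind of reordering involved, shown with a made-up struct rather than the real nsCSSCompressedDataBlock layout (the member names are hypothetical):

    #include <cstdint>

    // On typical 64-bit ABIs, pointers need 8-byte alignment, so poor member
    // ordering forces the compiler to insert padding.
    struct Padded {
      uint32_t mCount;   // 4 bytes + 4 bytes of padding before the pointer
      void*    mData;    // 8 bytes
      uint32_t mFlags;   // 4 bytes + 4 bytes of tail padding
    };                   // sizeof == 24

    struct Packed {
      void*    mData;    // 8 bytes
      uint32_t mCount;   // 4 bytes
      uint32_t mFlags;   // 4 bytes, no padding required
    };                   // sizeof == 16; same members, one-third smaller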

But it was a very busy week in terms of bug activity.  Let’s look at the numbers.

  • P1: 29 (-2, +2)
  • P2: 76 (-10, +20)
  • P3: 38 (-1, +2)
  • Unprioritized: 22 (-5, +23)

Several things happened here.

  • Marco Castelluccio looked through old bugs and found a lot (30 or more) that were related to memory usage and tagged them with “MemShrink”.
  • Nine new bugs were filed to reduce about:memory’s “heap-unclassified” number by adding memory reporters;  many of these were thanks to Boris Zbarsky’s insights into the output produced by DMD.
  • I closed out a number of bugs that were incomplete, stale, or finished;  this included some of those newly marked by Marco, and some that were already tagged with “MemShrink”.
  • I tagged five leaks that were found with the cppcheck static analysis tool.

We spent the entire MemShrink meeting today triaging unprioritized bugs and we got through 23 of them.  Most of the remaining unprioritized bugs are the older ones tagged by Marco and the cppcheck ones (which I tagged after the meeting).

It’s clear that the rate of problem/improvement identification is outstripping the rate of fixes.  We have a standing agenda item in MemShrink meetings to go through Steve Fink’s ideas list, but we haven’t touched it in the past two meetings because we’ve spent the entire time on triage.  And when we do go through that list, it will only result in more bugs being filed.  I’m hoping that this glut of MemShrink-tagged bugs is temporary and the new bug rate will slow again in the coming weeks.

In the meantime, if you want to help, please look through the lists of open bugs, or contact me if you aren’t sure where to start, and I’ll do my best to find something you can work on.  Thanks!

Categories: Firefox, Memory consumption

“Browser X is using Y MB of memory with Z tabs open” is a meaningless observation

I read an exchange today about the memory usage of Firefox 7 (which is currently in beta).

It uses 600 MB of RAM with 10 tabs open (different websites).

lol wut? I’m not getting the same results as you. I have 12 tabs open here… RAM usage rarely exceeded 300mb for me. That’s already much better than Chrome, which seems to be using 190mb for just 2 tabs.

Memory consumption is around 300 MB for me too.

I’ve seen variants of this exchange countless times.  Unfortunately, observations such as “browser X is using Y MB of memory with Z tabs open” are meaningless.  To understand why, consider this alternative, hypothetical exchange:

I ran 3 different programs on my computer, they took 55 seconds to finish.

lol wut?  I just ran 8 programs and they only took 20 seconds to finish!

You’d never see an exchange like that, because it’s obvious that a comparison of the run-time of multiple programs is meaningless if you don’t specify what those programs are.  Workload matters. And yet, people make comparisons of browser memory usage like this all the time.

Here’s a simple example to show why this isn’t meaningful.  I just did some memory measurements with a development build of Firefox 9, using a moderately-used profile with no add-ons installed.  First I started it and opened 10 instances of www.google.com.au, one per tab, and the resident set size (RSS) was 139MB.  Then I re-started it and opened 10 instances of gmail.com, and the RSS climbed to 651MB.

This is hardly surprising.  gmail.com is a fully-fledged email client.  www.google.com.au is not much more than a search box.

Why do people so often make these meaningless “Y MB of memory with Z tabs open” comments?  My theory is that it’s because most people have no idea how complicated web browsers and web pages are.  Mild experience with HTML authoring is endemic — heaps of people have thrown together a basic web page, and it’s just a document with some structure and styling, right?  So they think the difference between the website they wrote for their scout troop in 1997 (the one with the “under construction” animated GIF) and the front page of TechCrunch is merely a difference of degree, not kind.

They’re wrong.  For one, HTML and related technologies are hugely more powerful and complicated now than they were a few years ago.  Also, their scout troop website probably didn’t contain multiple megabytes of JavaScript code tracking its visitors’ every move. A web browser is not a document viewer, it is a full-blown programming environment with some very sophisticated text and graphical capabilities.  A web page is not a document but a program.  Therefore, the memory (and CPU) usage of different web pages varies dramatically.

Here’s a check-list of information that you should include if you want an observation about browser memory usage to be meaningful.

  • What sites do you have open?  The more specific, the better.  E.g. “Gmail and a few nytimes.com pages” is ok, but listing the exact URLs is better.
  • How did the browser get in this state?  Did you just start it, or have you been using it and visiting other sites for hours?
  • What memory metric are you using, and how did you measure it?  “Explicit” from Firefox’s about:memory?  “Private bytes” from the Windows Task Manager?  “RSS” from “top” on Mac or Linux?  (The full output of Firefox’s about:memory is hugely useful, because it includes dozens of highly relevant measurements.)
  • In Firefox, do you have any add-ons installed?  That can (and often does) make a big difference.  (And that’s a topic for another day.)

If you include all that, it’s much more likely that somebody else will be able to reproduce your measurements, which is what makes an observation meaningful.

On a related note, I wrote previously about Firefox 7’s memory improvements, saying “Firefox 7 uses less memory than Firefox 6 (and 5 and 4): often 20% to 30% less, and sometimes as much as 50% less.”  The presence of the words “often” and “sometimes” in that sentence was deliberate.  We’ve seen those numbers in our testing, but once again, workload matters.  I really hope the numbers we’ve seen match what normal users see.  In fact, I thought that people would independently test these claims shortly after I wrote them.  But, to my knowledge, that hasn’t happened so far, even though those claimed improvements have been reported far and wide.  (I’ve even seen numerous headlines that say “Firefox 7 to use 50% less memory”, alas.)  Maybe once Firefox 7 is officially released people will make independent measurements.

(A final note:  suspicious readers may think that I’m trying to obliquely absolve Firefox from any responsibility for its memory usage by blaming web pages instead.  I’m not.  Firefox is doing much better now, but there’s still plenty of ways it can be improved;  please contact me if you want to help.)

Categories: Firefox, Memory consumption, MemShrink

MemShrink progress, week 10

A quieter week this week.  Well, plenty of work was done but not yet completed, and I mostly write only about changes that have been finished.

Now for the bug counts.  (Canned MemShrink bug searches are available here.)

  • P1: 29 (-5, +4)
  • P2: 66 (-6, +8)
  • P3: 37 (-1, +2)
  • Unprioritized: 4 (-1, +4)

There was lots of P1 movement, which is good:  a couple were fixed, some have had enough progress made on them that they could be downgraded to P2, and some new problems/opportunities were identified.

 

Categories: about:memory, Firefox, Garbage Collection, Memory allocation, Memory consumption, MemShrink, Tracemonkey, Valgrind

MemShrink progress, week 9

Firefox 8 graduated to the Aurora channel this week, and the development period for what will become Firefox 9 began.  Lots of MemShrink activity happened this week, and I think all the changes listed below will make it into Firefox 8.

Avoiding Wasted Memory

I have blogged previously about memory wasted by “clownshoes” bugs.   Ed Morley found a webpage that resulted in 700MB of memory being wasted by the PLArena clownshoes bug.  Basically, on platforms where jemalloc is used (Windows, Linux), half the memory allocated by nsPresArena (which is built on top of PLArena) was wasted.  (On Mac the waste was 11%, because the Mac allocator rounds up less aggressively than jemalloc).

Fixing this problem properly for all PLArenas takes time because it requires changes to NSPR, so I made a spot-fix for the nsPresArena case.  This is a particularly big win on very large pages, but it saves around 3MB even on Gmail. This spot-fix has been granted beta approval and so will, barring disaster, make it into Firefox 7.

A Firefox Nightly user did some measurements with different browsers on the problematic page:

  • Firefox 8.0a1 before patch: 2.0 GB
  • Firefox 8.0a1 after patch: 1.3 GB
  • Latest Chrome canary build and dev (15.0.849.0): 1.1GB
  • Webkit2Process of Safari 5.1: 1.05 GB
  • Internet Explorer 9.0.2: 838 MB
  • Latest Opera Next 12.00: 727 MB

So this fix gets Firefox within spitting distance of other browsers, which is good!

In other developments related to avoiding wasted memory:

  • Luke Wagner discovered that, on typical websites, most JSScripts are byte-compiled but never run.  A JSScript roughly corresponds to a JavaScript function.  In hindsight, it’s not such a surprising result — Firefox byte-compiles all loaded JavaScript code, and you can imagine lots of websites use libraries like jQuery but only use a small fraction of the functions in the library.  Making byte-compilation lazy could potentially save MBs of memory per compartment.  But that will require some non-trivial reworking of the JS engine, and so is unlikely to happen in the short-term.
  • Kyle Huey avoided a small amount (~100KB per browser process) of waste due to rounding up in XPT arenas.

Improving about:memory

I made some progress on a Valgrind tool to help identify the memory that is currently reported only by the “heap-unclassified” bucket in about:memory.  It’s called “DMD”, short for “Dark Matter Detector”.  It’s in early stages and I still need to teach it about most of Firefox’s memory reporters, but it’s already spitting out useful data, which led to me and Ehsan Akhgari landing memory reporters for the JS atom table and the Hunspell spell checker.  We also now have some insight (here and here) about memory usage for very large pages.

Mounir Lamouri turned on the memory reporter for the DOM that he’s been working on for some time.  This shows up under “dom” in about:memory.  There are still some cases that require handling;  you can follow the progress of these here.

Andrew McCreight replaced about:memory’s buttons so you can force a cycle collection without also forcing a garbage collection, which may be useful in hunting down certain problems.

Finally, Sander van Veen added the existing “js-compartments-user” and “js-compartments-system” to the statistics collected by telemetry (his first landed patch!), and I did likewise for the “storage/sqlite” reporter.  I also added a new “tjit-data/trace-monitor” memory reporter that accounts for some of the memory used by TraceMonkey.

Miscellaneous

Igor Bukanov tweaked the handling of empty chunks by the JavaScript garbage collector.  That sounds boring until you see the results on Gregor Wagner’s 150-tab stress test: resident memory usage dropped 9.5% with all 150 tabs open, and dropped by 27% after all those tabs were closed.

Brian Hackett fixed a memory leak in type inference, which gets it one step closer to being ready to land.

Christian Höltje fixed a leak in his “It’s All Text” add-on that was causing zombie compartments.  This fix will be in version 1.6.0, which is currently awaiting AMO approval, but can be obtained here in the meantime.  This fix and last week’s fix of a memory leak in LastPass are very encouraging — per-compartment reporters in about:memory have, for the first time, given add-on developers a reasonable tool for identifying memory leaks.  I hope we can continue to improve the situation here.  Several people have asked me for documentation on how to avoid memory leaks in add-ons.  I’m not the person to write that guide (I’m not a Gecko expert and I know almost nothing about add-ons) but hopefully someone else can step up to the plate.

Bug counts

Here’s the change in MemShrink bug counts.

  • P1: 30 (-0, +1)
  • P2: 64 (-4, +6)
  • P3: 36 (-5, +0)
  • Unprioritized: 1 (-2, +1)

Good progress on P3 bugs, but they’re the least important ones.  Other than that, new bugs are still being reported faster than they’re being fixed.  If you want to help but don’t know where to start, feel free to email me or ping me on IRC and I’ll do my best to help get you involved.

 

Categories: about:memory, Firefox, Memory consumption, MemShrink

MemShrink progress, week 8

A hodge-podge of things happened this week.

The MemShrink bug counts changed as follows.

  • P1: 29 (-2, +4)
  • P2: 62 (-5, +14)
  • P3: 41 (-0, +6)
  • Unprioritized: 2 (-10, +1)

We actually got through almost all the unprioritized bugs in today’s MemShrink meeting, which was good, but the counts are still going up.  Fortunately, most of the new bugs are ideas for improvement that are reported by developers.  My gut feeling (which I get from reading a lot of memory-related bug reports) is that the number of reports from users about high memory usage is much lower than it was a few months ago.

Categories: Firefox, Memory consumption, MemShrink

Firefox 7 is lean and fast

tl;dr

Firefox 7 uses less memory than Firefox 6 (and 5 and 4): often 20% to 30% less, and sometimes as much as 50% less. In particular, Firefox 7’s memory usage will stay steady if you leave it running overnight, and it will free up more memory when you close many tabs.

This means that Firefox 7 is faster (sometimes drastically so) and less likely to crash, particularly if you have many websites open at once and/or keep Firefox running for a long time between restarts.

Background

Firefox has a reputation for being a memory hog, and the efficiency with which it uses memory has varied over the years. For example, Firefox 2 was quite bad, but Firefox 3, 3.5 and 3.6 were substantially better. But Firefox 4 regressed again, partly due to a large number of new features (not all of which were maximally efficient in their first iteration), and partly due to some over-aggressive tuning of heuristics relating to JavaScript garbage collection and image decoding.

As a result, Mozilla engineers started an effort called MemShrink, the aim of which is to improve Firefox’s speed and stability by reducing its memory usage.  A great deal of progress has been made in only 7 weeks, and thanks to Firefox’s new rapid release cycle, each improvement made will make its way into a final release in only 12–18 weeks. (These improvements are available earlier to users on the Aurora and Beta channels.) Firefox 7 is the first release to benefit from MemShrink’s successes, and the benefits are significant.

Quantifying the improvements

Measuring memory usage is difficult: there are no standard benchmarks, there are several different metrics you can use, and memory usage varies enormously depending on what the browser is doing. Someone who usually has only a handful of tabs open will have an entirely different experience from someone who usually has hundreds of tabs open. (This latter case is not uncommon, by the way, even though the idea of anyone having that many tabs open triggers astonishment and disbelief in many people. E.g. see the comment threads here and here.)

Endurance tests

Dave Hunt and others have been using the MozMill add-on to perform “endurance tests”, where they open and close large numbers of websites and track memory usage in great detail.  Dave recently performed an endurance test comparison of development versions of Firefox 6, 7, and 8, repeatedly opening and closing pages from 100 widely used websites in 30 tabs.  The following graphs show the average and peak “resident” memory usage for each browser version over five runs of the tests.  (“Resident” memory usage is the amount of physical RAM that is being used by Firefox, and is thus arguably the best measure of real machine resources being used.)

[Graphs: average and peak resident memory usage during the endurance tests]

Obviously the measurements varied significantly between runs. If we do a pair-wise comparison of runs, we see the following relative reductions in memory usage:

  • Minimum resident: 1.1% to 23.5% (median 6.6%)
  • Maximum resident: -3.5% to 17.9% (median 9.6%)
  • Average resident: 4.4% to 27.3% (median 20.0%)

The following two graphs show how memory usage varied over time during Run 1 for each version. Firefox 6’s graph is first, Firefox 7’s graph is second. (Note: compare only the purple “resident” lines; the meaning of the green “explicit” line changed between the versions and so the two green lines cannot be sensibly compared.)

[Graph: memory usage of a single run of Firefox 6]

[Graph: memory usage of a single run of Firefox 7]

Firefox 7 is clearly much better; its graph is both lower and has less variation.

MemBench

Gregor Wagner has a memory stress test called MemBench. It opens 150 websites in succession, one per tab, with a 1.5 second gap between each site. The sites are mostly drawn from Alexa’s Top sites list. I ran this test on 64-bit builds of Firefox 6 and 7 on my Ubuntu Linux machine, which has 16GB of RAM. Each time, I let the stress test complete and then opened about:memory to get measurements for the peak resident usage. Then I hit the “Minimize memory usage” button in about:memory several times until the numbers stabilized again, and then re-measured the resident usage. (Hitting this button is not something normal users do, but it’s useful for testing purposes because it causes Firefox to immediately free up memory that would be eventually freed when garbage collection runs.)

For Firefox 6, the peak resident usage was 2,028 MB and the final resident usage was 669 MB. For Firefox 7, the peak usage was 1,851 MB (an 8.7% reduction) and the final usage was 321 MB (a 52.0% reduction). This latter number clearly shows that fragmentation is a much smaller problem in Firefox 7.

(On a related note, Gregor recently measured cutting-edge development versions of Firefox and Google Chrome on MemBench. The results may be surprising to many people.)

Bookmarks

Nathan Kirsch from Legit Reviews performed a simple test comparing Firefox 5 against Firefox 7. He clicked “Open all in Tabs” on a bookmark folder containing 117 bookmarks — causing each bookmark to be opened in a separate tab.  Once they all finished loading, he used the Windows Task Manager to measure the “private working set” (which is not the same as “resident”, but will correlate strongly with it).  Firefox 7 used half a GB less memory than Firefox 5 — a 39.7% reduction.

Conclusion

Obviously, these tests are synthetic and do not match exactly how users actually use Firefox. (Improved benchmarking is one thing we’re working on as part of MemShrink, but we’ve got a long way to go.)  Nonetheless, the basic operations (opening and closing web pages in tabs) are the same, and we expect the improvements in real usage will mirror improvements in the tests.

This means that users should see Firefox 7 using less memory than earlier versions — often 20% to 30% less, and sometimes as much as 50% less — though the improvements will depend on the exact workload. Indeed, we have had lots of feedback from early users that Firefox 7 feels faster, is more responsive, has fewer pauses, and is generally more pleasant to use than Firefox 4, 5 and 6.

The reduced memory usage should also result in fewer crashes and aborts on Windows, where Firefox is built as a 32-bit application and so is typically restricted to only 2GB of virtual memory.

Mozilla’s MemShrink efforts are continuing. The endurance test results above show that development versions of Firefox 8 already have even better memory usage, and I expect we’ll continue to make further improvements as time goes on. We also have plans to improve our testing infrastructure which should help prevent future regressions in memory usage.

Categories: about:memory, Firefox, Memory allocation, Memory consumption

Clownshoes available in sizes 2^10+1 and up!

I’ve been working a lot on about:memory lately.  It’s a really useful tool, but one frustrating thing about it is that a decent chunk of our memory usage is not covered by any of the existing memory reporters, and so is falling into the “heap-unclassified” bucket.

This bucket typically accounts for 30–45% of about:memory’s “explicit” measurement.  We’ve discussed this “dark matter” several times in MemShrink meetings.  There’s lots of work underway to add new reporters to uninstrumented parts of Gecko, but I’ve seen cases where it looks like a decent chunk of the missing memory is due to the JS engine, which is already seemingly thoroughly instrumented.

This week I realized that some of the dark matter could be due to “slop” from jemalloc, the browser’s heap allocator.  What is slop?  When you ask a heap allocator for a certain number of bytes, it’ll often give you back more than you asked for, rounding the size up.  This wastes some memory, but there are good reasons for it — it makes the heap allocator much faster and simpler, and helps avoid fragmentation when the memory is freed.

The following comment from jemalloc.c shows jemalloc’s size classes.  Any request that isn’t for one of the listed sizes is rounded up to the next size in the list.

*   |=====================================|
*   | Category | Subcategory    |    Size |
*   |=====================================|
*   | Small    | Tiny           |       2 |
*   |          |                |       4 |
*   |          |                |       8 |
*   |          |----------------+---------|
*   |          | Quantum-spaced |      16 |
*   |          |                |      32 |
*   |          |                |      48 |
*   |          |                |     ... |
*   |          |                |     480 |
*   |          |                |     496 |
*   |          |                |     512 |
*   |          |----------------+---------|
*   |          | Sub-page       |    1 kB |
*   |          |                |    2 kB |
*   |=====================================|
*   | Large                     |    4 kB |
*   |                           |    8 kB |
*   |                           |   12 kB |
*   |                           |     ... |
*   |                           | 1012 kB |
*   |                           | 1016 kB |
*   |                           | 1020 kB |
*   |=====================================|
*   | Huge                      |    1 MB |
*   |                           |    2 MB |
*   |                           |    3 MB |
*   |                           |     ... |
*   |=====================================|

In extreme cases, jemalloc will return almost double what you asked for.  For example, if you ask for 1,025 bytes, it’ll give you 2,048.  A lot of the time you have to just live with slop;  if you need to heap-allocate an object that’s 680 bytes, jemalloc will give you 1,024 bytes, and you just have to accept the 344 bytes of waste.  But if you have some flexibility in your request size, it’s a good idea to pick a size that’s a power-of-two, because that always gives you zero slop.  (Remember this, it’s important later on.)
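
To make the rounding concrete, here is a rough model of the size classes listed above.  It is my approximation, not the real jemalloc code, but it is enough to predict the slop for the requests discussed in this post:

    #include <cstddef>
    #include <cstdio>

    static size_t RoundUpToSizeClass(size_t n) {
      if (n <= 8) {                         // "Tiny": 2, 4, 8
        size_t c = 2;
        while (c < n) c *= 2;
        return c;
      }
      if (n <= 512)                         // "Quantum-spaced": multiples of 16
        return (n + 15) & ~size_t(15);
      if (n <= 2048)                        // "Sub-page": 1 kB, 2 kB
        return n <= 1024 ? 1024 : 2048;
      if (n <= 1020 * 1024)                 // "Large": multiples of 4 kB
        return (n + 4095) & ~size_t(4095);
      const size_t MB = 1024 * 1024;        // "Huge": multiples of 1 MB
      return (n + MB - 1) & ~(MB - 1);
    }

    int main() {
      // The example requests from this post: 24 -> 32, 680 -> 1024,
      // 1025 -> 2048, 4095 -> 4096, 1048578 -> 2097152.
      const size_t requests[] = { 24, 680, 1025, 4095, 1048578 };
      for (size_t r : requests) {
        size_t a = RoundUpToSizeClass(r);
        printf("request %8zu -> allocated %8zu (slop %7zu)\n", r, a, a - r);
      }
      return 0;
    }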

So I instrumented jemalloc to print out an entry for every heap allocation, showing the requested amount, the actual allocated amount, and the resulting slop.  I then started up Firefox, opened Gmail, then shut down Firefox.  Next, I ran the resulting log through a wonderful little concordance-like script I have called “counts” which analyzes a file and tells you how many times each distinct line occurs.  Here’s the top ten lines of the output:

909062 numbers:
( 1 ) 205920 (22.7%, 22.7%): small:     24 ->     32 (     8 )
( 2 )  66162 ( 7.3%, 29.9%): small:     72 ->     80 (     8 )
( 3 )  61772 ( 6.8%, 36.7%): small:     40 ->     48 (     8 )
( 4 )  54386 ( 6.0%, 42.7%): small:   1056 ->   2048 (   992 )
( 5 )  48501 ( 5.3%, 48.0%): small:     18 ->     32 (    14 )
( 6 )  47668 ( 5.2%, 53.3%): small:     15 ->     16 (     1 )
( 7 )  24938 ( 2.7%, 56.0%): large:   4095 ->   4096 (     1 )
( 8 )  24278 ( 2.7%, 58.7%): small:     56 ->     64 (     8 )
( 9 )  13064 ( 1.4%, 60.1%): small:    104 ->    112 (     8 )
(10 )  12852 ( 1.4%, 61.6%): small:    136 ->    144 (     8 )

There were 909,062 lines, which means there were 909,062 allocations.  (I didn’t print information about frees.)  The most common line was “small: 24 -> 32 ( 8 )” which occurred 205,920 times, accounting for 22.7% of all the lines in the file.  The “small” refers to jemalloc’s class size categories (see above), and every allocation request of 24 bytes was rounded up to 32 bytes, resulting in 8 bytes of slop.
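
For the curious, a tally like that is easy to reproduce.  Here is a minimal sketch (not my actual “counts” script) that reads lines from stdin and prints each distinct line with its frequency, most common first, in roughly the format shown above:

    #include <algorithm>
    #include <cstdio>
    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
      std::map<std::string, size_t> tally;
      std::string line;
      size_t total = 0;
      while (std::getline(std::cin, line)) { ++tally[line]; ++total; }

      // Sort distinct lines by descending frequency.
      std::vector<std::pair<std::string, size_t>> rows(tally.begin(), tally.end());
      std::sort(rows.begin(), rows.end(),
                [](const std::pair<std::string, size_t>& a,
                   const std::pair<std::string, size_t>& b) { return a.second > b.second; });

      printf("%zu numbers:\n", total);
      double cumulative = 0;
      for (size_t i = 0; i < rows.size(); i++) {
        double pct = total ? 100.0 * rows[i].second / total : 0.0;
        cumulative += pct;
        printf("(%2zu) %8zu (%4.1f%%, %5.1f%%): %s\n",
               i + 1, rows[i].second, pct, cumulative, rows[i].first.c_str());
      }
      return 0;
    }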

Looking through the list, I saw a case where 1,048,578 (2^20+2) bytes had been requested, and jemalloc had returned 2,097,152 (2^21) bytes.  That’s a huge (1MB) waste, and also smelled very fishy.  And on further inspection there were a lot of smaller but still suspicious cases where a number slightly larger than a power-of-two was being rounded up to the next power-of-two:  1,032 to 2048;  1,056 to 2,048;  2,087 to 4,096;  4,135 to 8,192, etc.

I investigated a number of these by adding some code to jemalloc to detect these suspicious request sizes, and then using Valgrind to print out a stack trace every time one occurred.  What I found was four distinct places in the codebase where the code in question had flexibility in the amount of memory it allocated, and so had tried to ask for a power-of-two, but had botched the request and thus ended up asking for slightly more than a power-of-two!

  • In js/src/vm/String.cpp, when concatenating two strings, the capacity of the resultant string is rounded up to the next power-of-two (unless the string is really big, in which case the rounding is less aggressive).  This is a reasonable thing to do, as it means that if a further concatenation occurs, there’s a good chance there’ll be space for the extra chars to be added in-line, avoiding an extra allocation and copy.  But the code first did its power-of-two round-up, and then added sizeof(jschar) (which is two bytes) to the size, to allow space for the terminating NULL char.  This code was responsible for the 1,048,578 byte request I mentioned earlier.  Luke Wagner fixed this bug with a minimal change earlier this week.  I was unable to easily measure the overall cost of this, but avoiding 1MB of slop with some frequency can only be a good thing.
  • nsprpub/lib/ds/plarena.c implements an arena allocator.  Each arena pool is given a size for each arena when it’s created, and the size passed in is almost always a power-of-two such as 1,024 or 4,096.  Which is good, except that the allocator then has to add a few more bytes onto the request because each arena has a header that holds book-keeping information.  When I start Firefox and load Gmail on my Linux64 box, I see that approximately 3MB of space is wasted because of this bug.  The fix is simple: reduce the arena payload by the size of the header (see the sketch after this list);  my patch is awaiting review.  Sadly enough, this problem was identified 3.5 years ago but not fixed.
  • js/src/jsarena.cpp is almost identical to nsprpub/lib/ds/plarena.c and the story is the same:  it has the same problem; it was first identified 3.5 years ago but not fixed; and my patch is awaiting review.  This didn’t make much difference in practice because this problem had been separately identified for the particular JSArenaPool that allocates the most memory, and worked around using an ill-documented hack.
  • db/sqlite3/src/sqlite3.c is the best one of all.  SQLite is very careful about measuring its memory usage and provides an API to access those measurements.  But in order to measure its own memory usage accurately, it adds 8 bytes to every allocation request in order to store the requested size at the start of the resulting heap block.  In doing so, it converts a lot of 1,024 byte requests into 1,032 byte requests.  Ironically enough, the slop caused by storing the requested size rendered the stored size inaccurate.  I determined that as a result, SQLite is using roughly 1.15–1.20x more memory in Firefox than it thinks it is.  So, go look at the “storage/sqlite” number in about:memory and mentally adjust it accordingly.   (On my Mac laptop which has an old profile and 11 tabs open it’s currently 70MB, so there’s probably an additional 10–14MB of slop).  SQLite is an upstream product, and the authors are working on a fix.
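
To make the second and third items concrete, here is the simplified sketch promised above.  The struct and function names are hypothetical, not the real plarena.c/jsarena.cpp code:

    #include <cstddef>
    #include <cstdlib>

    struct ArenaHeader {      // per-arena book-keeping
      ArenaHeader* next;
      char* avail;
      char* limit;
    };

    // BUG: the caller picks a power-of-two arena size (e.g. 4096), the
    // allocator adds the header on top, and the heap request becomes
    // 4096 + sizeof(ArenaHeader), which jemalloc rounds up to 8192.
    void* AllocateArenaBuggy(size_t arenaSize) {
      return malloc(sizeof(ArenaHeader) + arenaSize);
    }

    // FIX: keep the total request at exactly the power-of-two and carve the
    // header out of it, leaving arenaSize - sizeof(ArenaHeader) bytes of
    // payload.  (The SQLite item above has the same shape: 8 book-keeping
    // bytes added to otherwise nicely-sized requests.)
    void* AllocateArenaFixed(size_t arenaSize) {
      return malloc(arenaSize);
    }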

Once all these fixes have landed, there will be two benefits.  First, Firefox’s memory usage will go down, which is good for everyone.  Second, the proportion of dark matter in about:memory will also drop, which makes our memory profiling more accurate and useful.

After that, converting existing reporters to use malloc_usable_size will account for much of the legitimate slop, further reducing the amount of dark matter. And then I have a master plan for identifying whatever’s left.

[Image: clownshoes]

So what’s up with this post’s title?  Andreas Gal uses the term “clownshoes” to refer to code that is laughably bad.  When I found and filed the first of these bugs, in a fit of whimsy I put “[clownshoes]” in its whiteboard.  Only later did I realize how suitable this was: a clown’s shoes and these botched power-of-two-sized allocations both have lots of empty space in them, and thus make their owner less nimble than they could be.  Perfect.

Categories: about:memory, Firefox, Memory consumption, MemShrink

MemShrink progress, week 7

Lots of good stuff this week.

Blog posts

There were four great blog posts this week relating to memory usage in Firefox.

Web workers

Ben Turner landed a complete reworking of web workers.  This fixed three MemShrink bugs, and probably a number of others.  Ben also added memory reporters so that web workers show up in about:memory.

Add-ons

I always cringe when someone files a bug report complaining about Firefox’s memory usage and they have many (e.g. 10, 20 or more) add-ons installed.  The chances of all of those add-ons behaving themselves are pretty low, unfortunately.  For example, one user found that the CyberSearch 2.0.8 add-on causes the places SQLite database to grow to over 300MB.  Another user found that one or more add-ons caused that same database to grow to 1.2GB(!);  the add-on(s) responsible has not yet been identified.

This kind of thing is terrible for perceptions of Firefox.  Maybe limiting the size of places.sqlite will help?

In better add-on-related news, Jan Honza Odvarko fixed a bad memory leak in Firebug, one that was causing zombie compartments on pages where Firebug hadn’t even been enabled.  Now Firebug only causes zombie compartments for pages where Firebug has been enabled, but Steve Fink is making progress there, as he described in his blog post that I linked to above.

Two other add-ons known to cause memory problems are Lastpass and It’s All Text.

Miscellaneous

Andrew McCreight is making great progress on his static analysis to find cycle collection leaks.  This week he found a problem in nsDOMEventTargetWrapperCache, which Olli Pettay then fixed.

If you ask a heap allocator like jemalloc for a certain number of bytes, it’ll often round that request size up to a number such as a power-of-two.  For example, if you ask for 99 bytes it’ll give you 112 or 128.  I found that this is contributing to the high “heap-unclassified” number in about:memory.  I also stumbled across two cases where Firefox itself deliberately requests a power-of-two-sized block of memory (with the aim of avoiding round-up in the heap allocator) but botches the calculation such that it asks for a block slightly bigger than a power-of-two, which jemalloc then rounds up again.  Luke Wagner fixed the first of these, and I have patches awaiting review to fix the second.

Finally, I should mention bug 658738 and its children.  Dão Gottwald and others have been working tirelessly to fix leaked windows in browser-chrome tests.  This has involved both (a) fixing problems with the tests, and (b) fixing problems with the browser.  Eleven child bugs have already been fixed, and there are four still open.

Bug counts

Here’s the change in MemShrink bug counts.

  • P1: 27 (+1)
  • P2: 53 (+4)
  • P3: 35 (+2)
  • P4: 11 (+5)

Increases all across the board.  Once again I think this reflects the increasing amount of work being done… but I keep saying that.  Therefore, beginning next week, I’ll show how many bugs in each category were fixed and how many were added, e.g. “P1: 27 (-2, +3)”.  That’ll give a better idea of progress.  (I can’t start it until next week because I’ve only today started recording enough info about the open bugs to get those numbers.  And before you suggest that I use a time-based Bugzilla search instead, the problem with that is that I write this progress report at a slightly different time each week, so if I just search during the last week I may double-count or fail to count some bugs.)