Category Archives: Firefox

MemShrink progress, week 89–90

Images

The big news this week is that Timothy Nikkel finished bug 689623, which causes us to (for the most part) only decode images that are visible or nearly visible.  This goes a long way towards fixing our problems with decoded images, the #1 item on the MemShrink big ticket items list.  Fixing this bug also fixed bugs 661304 and 682230 in passing.

To give you an idea of what changed, consider an image-heavy page like this one.  Prior to Timothy’s change, shortly after loading that page on my machine Firefox’s memory consumption would reach 3 GiB, with 2.7 GiB of that being decoded (uncompressed) images.  If I switched to another tab then 2.45 GiB of those decoded images would be immediately discarded, then after 10–20 seconds the remaining 250 MiB would be.  If I switched back, all the images would immediately be decoded again and memory consumption would climb back to 3 GiB.

After Timothy’s change, Firefox’s memory consumption still peaks around 2.8 GiB just after loading the page, but then quickly drops back to 600 MiB as many of those decoded images are discarded.  After that, even when scrolling quickly through the page I couldn’t get memory consumption to exceed 700 MiB.

This change will make such pages much smoother on desktop machines that don’t have lots of RAM, because there won’t be any paging, which slows things down drastically.   Andrew McCreight wrote a comment that I can’t find now, but basically he said that on one of his machines, that page was extremely slow and painful to scroll through before the change, and after the change it was smooth and pleasant.

The peak that occurs when an image-heavy page first loads is still undesirable.  Bug 542158 is open for fixing that, and it will build on the infrastructure that Timothy has developed.  When that’s fixed, it’ll also greatly reduce the likelihood that an image-heavy page will cause an out-of-memory (OOM) crash, which is great for both desktop (esp. 32-bit Windows) and mobile/B2G.

Miscellaneous

Nicolas Pierron fixed a leak in IonMonkey that was causing runaway memory consumption for Shumway demos on Android.  It has been backported to the Aurora and Beta channels.

The size of the property cache has been greatly reduced.  This is a data structure the JS engine uses to speed up property accesses.  It used to be important, but is now barely necessary thanks to type inference and the JITs.  This reduced memory consumption by 75 KiB per JSRuntime on 32-bit, and by 150 KiB per JSRuntime on 64-bit.  There is one JSRuntime for each process, plus one for each worker, so for B2G it saves around 525 KiB at start-up.

We declared victory in the bug tracking the memory consumption of not-yet-loaded tabs.  When the bug was opened it was over 1 MiB per tab (64-bit builds);  it’s now around 200 KiB.  Furthermore, if someone has many (e.g. hundreds of) not-yet-restored tabs they’re probably on a high-end machine, so the motivation for further reducing the memory consumption is low.  There is a bug open about some scripts run for each of these tabs, however;  avoiding these scripts may make start-up faster for heavy tab users.

Marco Bonardo reduced the maximum size of the cache used by each SQLite connection (excluding places.sqlite) from 4 MiB to 2 MiB.  Update: Marco corrected me in the comments — there are four places.sqlite connections, but only one of them (the one for the awesome bar) is excluded from the new 2 MiB limit.

Somebody fixed a leak in the Roboform add-on.  The fix is present in v7.8.7.5.

Olli Pettay improved our memory reporting by adding extra information to some system compartments that previously were undistinguished.  Our experience has been that as soon as somebody adds a new memory measurement, it finds a problem, and this was no exception, leading to the filing of bug 844661.

Jonathan Kew has been busy.

I wrote a script that can take a diff of two JSON memory report dumps.  This will be very useful when comparing the memory consumption before and after doing something, for example.  If you didn’t know that JSON memory report dumps existed, you’re probably not alone — it’s an undocumented feature that’s currently only available on Linux and Android, and is triggered by sending signal 34 to the browser(!)  These dumps can be loaded by about:memory (see the buttons at the bottom) but there’s currently no easy way to trigger them, which is why today I filed a bug to make it easier.
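As a rough illustration of the idea (this is not the actual script), a diff of two dumps could be sketched in Python as follows.  The sketch assumes each dump is a JSON object with a "reports" list whose entries carry "process", "path" and "amount" fields;  that layout and all the function names are assumptions made for the example.

```python
import json
from collections import defaultdict


def load_amounts(dump):
    """Sum the amounts in one parsed dump, keyed by (process, path).

    Assumes `dump` is a dict with a "reports" list whose entries have
    "process", "path" and "amount" fields (an assumed layout).
    """
    totals = defaultdict(int)
    for report in dump["reports"]:
        totals[(report["process"], report["path"])] += report["amount"]
    return dict(totals)


def diff_dumps(before, after):
    """Return {key: after - before} for every key whose amount changed."""
    old, new = load_amounts(before), load_amounts(after)
    deltas = {}
    for key in old.keys() | new.keys():
        delta = new.get(key, 0) - old.get(key, 0)
        if delta != 0:
            deltas[key] = delta
    return deltas


def diff_files(before_path, after_path):
    """Load two dump files from disk and diff them."""
    with open(before_path) as f1, open(after_path) as f2:
        return diff_dumps(json.load(f1), json.load(f2))
```

Entries that appear in only one dump are treated as having an amount of zero in the other, so newly appearing and disappearing measurements show up in the diff too.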

I removed the pool used for recycling nsNodeInfo objects, which was causing them to never be released back to the OS.  Never-shrinking structures like this can become a problem in longer-running browser sessions — for example, after running MemBench, which opens and closes 150 tabs, this pool was unnecessarily hoarding 4 MiB.

Interesting Open Bugs

In my MemShrink reports I mostly write about bugs that have been fixed.  While this is satisfying and demonstrates progress, it might also give the impression that the MemShrink team has everything under control, when really we could use some help.

(In fact, the term “the MemShrink team” is something of a misnomer since there isn’t any such entity, officially.  “The people who regularly attend MemShrink meetings” would be a more accurate term.  But I’ll use “MemShrink team” as a short-hand for that.)

The MemShrink team has expertise in some areas of Mozilla code, such as DOM, cycle collector, JS engine (partial), Fennec, B2G, and memory profiling, and we tend to make good progress in those areas — we can fix bugs in those areas, and we generally pay attention to how these areas affect memory consumption.

But we lack expertise in areas like graphics, image-handling, layout, text rendering, storage, Jetpack/add-ons, and front-end.  Graphics is a particular problem, because graphics issues, esp. those involving drivers, can cause huge memory consumption blow-ups.  Bug 837542 is an example from the MemShrink:P1 list where gradients are somehow causing memory consumption to spike by 10s or even 100s of MiBs just by opening and closing a few tabs!  We triage bugs like that as well as we can, but often we’re just guessing, and we’re mostly helpless to address such problems.

Therefore, moving forwards I’m going to start mentioning interesting open bugs that don’t have someone working on them.

One example is bug 846173, which I filed when I noticed that fully loading the front page of TechCrunch takes over 100 MiB!  And it’s mostly because of the many Facebook “like” buttons, Google “+1″ buttons, and Twitter “tweet this” buttons — see the about:memory output for the full gory details.  It’s obvious that most of these window objects are basically identical, except for the URL.  Could we do something clever to avoid all this duplication?  Justin Lebar wondered about a copy-on-write scheme.  Are there other ways we could improve this case?

Another example is bug 842003, which is looking for an owner.  Some basic leak-checking infrastructure in the IPC code could potentially detect some big problems for B2G.

In bug 842979 we’re seeing 10s or 100s of MiBs of orphan DOM nodes in long-running Gmail sessions.  It’s not clear if this is a bug in Gmail, or caused by add-ons, or something else.  Please comment in the bug if you have any additional data.

Another one:  DMD, which is one of our most important tools, is pretty much useless on Fennec because it can’t get stack traces.  If anyone who knows about such things would like to investigate, that would be very helpful.  (In fact, I got a crash in the stack tracing code when I most recently tried to run DMD on Mac, which I haven’t had time to investigate.)

Bug Counts

Here are the current bug counts.

  • P1: 12 (-6/+1)
  • P2: 128 (-7/+7)
  • P3: 122 (-0/+5)
  • Unprioritized: 6 (-2/+6)

The closed P1s were 661304, 689623, 837187, 841976, 841993, 842756.  Three of them related to an IPC leak fix that I mentioned in my last report.

Improving “Reset Firefox”

I recently wrote about the wonderful “Reset Firefox” feature, which in one fell swoop can fix all sorts of bad behaviours.  What follows is a number of complaints that commenters had, and some ideas about how to address them.

“The name is misleading”

It’s true.  It sounds like it will erase all of your data and customizations.  The SUMO page says the following.

The Reset Firefox feature can fix many issues by restoring Firefox to its factory default state while saving your essential information.

“Factory default state” and “while saving your essential information” are two phrases that sit together awkwardly. But it may be too late to change the name now.

“But I have lots of configuration tweaks”

And they’ll all be lost.  You can look in about:support to see “important modified preferences”.  Or you can look at the “user set” entries in about:config.  Or you can look at both, and wonder why the two lists are different, and what the definition of “important” is.

It would be very helpful if you could see who or what was responsible for each of those changes.  Was it the user, via the Preferences dialog?  The user via about:config?  An add-on?  Something auto-updated by core Firefox code?  (And why do I always have dozens of “user set” preferences relating to printing, even on my Linux machine that I never print from?)

Would an API tweak help?  We could change the “setPref()” function (or whatever it’s called) to take an additional, optional argument that indicates who set it.  It wouldn’t fix the problem immediately but would give a path forward.
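To make the idea concrete, here is a minimal Python sketch of a preference store whose setter takes an optional source.  It is a toy model, not Firefox’s real preferences API:  the class, method and preference names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PrefEntry:
    value: Any
    source: Optional[str]  # who set it, e.g. "about:config" or "addon:foo"


class PrefStore:
    """Toy preference store (hypothetical; not Firefox's real API)."""

    def __init__(self) -> None:
        self._prefs: Dict[str, PrefEntry] = {}

    def set_pref(self, name: str, value: Any, source: Optional[str] = None) -> None:
        # `source` is optional so that existing callers keep working;
        # prefs set without it are recorded as having an unknown origin.
        self._prefs[name] = PrefEntry(value, source)

    def get_pref(self, name: str) -> Any:
        return self._prefs[name].value

    def user_set_report(self) -> Dict[str, str]:
        """Map each pref name to its recorded source, or "unknown"."""
        return {name: entry.source or "unknown"
                for name, entry in self._prefs.items()}
```

Because the extra argument is optional, callers could be converted one at a time, and a report like `user_set_report()` would become more informative as they are.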

Update: I found bug 834034, which is about preserving a user’s custom spell-check dictionary.

“But I have hundreds of tabs open”

And they’ll all be lost, too.  Apparently the Session Manager extension solves this problem.  But that’s not obvious, and still a pain to manage.  Maybe we could mention it in the documentation?

Update: Alexander Limi pointed me in the direction of bug 833943, which is about fixing this.

“But I have dozens of add-ons installed”

And one of them might be causing your problems.  You can get the list of installed add-ons from about:addons, so this one isn’t too hard to fix up manually.

“I can’t see the ‘Reset Firefox’ button” / “I want to reset a non-default profile”

The “Reset Firefox” button only shows up in the default profile.  (And sometimes not even there — I don’t get it on my default profile on my dev machine, where I have half a dozen profiles present.)

There’s a bug open about this.  I could be wrong, but it doesn’t sound that hard to fix.

“How do I undo it if it breaks something?”

The SUMO page addresses this.

After the reset is finished, your old Firefox profile information will be placed on your desktop in a folder named “Old Firefox Data.” If the reset didn’t fix your problem you can restore some of the information not saved by copying files to the new profile that was created. See Copying files between profile folders for instructions.

The instructions on copying files between profile folders are clear, except for the fact that even an expert user will have little idea which files within a profile are important.  It’s an intimidating process.

“Why isn’t this automated?”

Well, because it loses some of your configuration.  You don’t want Firefox discarding all your extensions every six months.

“Why is this even needed in the first place?”

Ah.  An interesting question.  What exactly are the sources of the problems that Reset Firefox fixes?

  • Problematic extensions.  This is probably a common cause of problems, especially for people who have toolbars they don’t want and things like that.  And I’ll give a special mention to GreaseMonkey users who have badly written scripts that do horrible things.
  • Bad preference settings.  In my experience, the most likely settings to cause problems are those involving hardware acceleration (e.g. bug 711656). Are there any other classes of preferences that could have big effects?  I guess if the JavaScript JITs were turned off that would slow things down drastically.  I don’t know how people end up with a bad setting, whether it’s by fiddling in about:config or extensions or something else.  (There was a dev-platform thread about this last year.)
  • Corrupt(?) state.  This one’s vague, but I suspect it’s a factor.  Maybe some of the SQLite files?  Or perhaps the state isn’t corrupt, but just fragmented in some fashion?  For example, in an oldish profile on my Mac my places.sqlite file size is 73 MiB, and urlclassifier3.sqlite is 42 MiB, which seems like a lot.
  • Do we know of any other causes?

Update: in bug 754933 Michael Verdi lists the following preferences as “troublesome”:

  • home page
  • all search engines (location bar, search bar, right-click)
  • Application settings
  • History settings
  • Password settings
  • Proxy settings
  • Firefox update settings
  • SSL settings
  • All toolbars and controls

I don’t subscribe to the hardline “Reset Firefox shouldn’t even be necessary” viewpoint — software is hard.  But I would like to understand the root causes better, in order to understand if at least some of them could be reduced or prevented.

It’s not all bad

I hope this post doesn’t sound too negative.  I understand that Reset Firefox’s goal was to provide something really simple that’ll fix many problems, and it does that admirably.

It’s just a shame that some power users who genuinely want to improve Firefox’s behaviour are unable to try it because it discards too much data.  I wonder if having additional options (in an obscure corner of the UI) the way safe mode does, such as “preserve extensions”, “preserve preferences” and “preserve open tabs”, would help those users.  Then those users could use a more incremental approach:  discard some data, see if that helps, discard some more, etc.

Or maybe that’s not worthwhile;  the people who comment on my blog aren’t exactly typical Firefox users!  And it could complicate things for the SUMO folks — they’d have to ask what options someone used when they initiated a Reset Firefox.  I’m not sure.  I’d just love to extend the goodness of Reset Firefox as far and wide as possible, and it seems like a few small tweaks might help.

“Reset Firefox” is wonderful

TL;DR: Please recommend the “Reset Firefox” feature to anyone who complains about odd, persistent performance problems and/or high memory consumption.

In Ars Technica’s article about the Firefox 19 release, carbon fibre made a comment complaining about Firefox’s performance.

Firefox still doesn’t fix memory issues and smoothness, at least on my experience. It’s constant random jerks and extreme mem usage, until restarted. I use chrome but I have been wanting to switch FF for awhile, but I can’t, as long as they fix these issues, long standing issues since ver. 4.

There were four notable responses.

korg250 (my emphasis):

They are losing market share by not fixing the memory issues. Even on my 8GB SSD i5 system I have to restart it when it reaches over 1 GB because it starts to slow down A LOT. And it get over 1 GB very fast for someone who constantly open/close tabs like me (even in safe mode).

I was an avid FF user until I saw myself installing an add-on to restart FF when it reaches X MB of memory usage. Today I use Opera for most of my browsing but I some sites does not work properly with Opera and some extensions only exist on FF.

Shame.

abhi_beckert:

Do you have extensions installed? Maybe one of them is causing it.

crislevin:

every benchmark I saw, firefox uses less memory than everybody else, I really don’t get why people complain about it?

nnethercote:

You might have problems with your profile. Visit about:support, and click on the “Reset Firefox” button in the top right corner. It’ll generated a new profile that preserves much of your browsing history, though it will disable any add-ons you have. (It’ll give you details about exactly what it preserves and discards before doing so, so you can cancel it if you change your mind.) This step fixes a lot of performance problems like the ones you describe.

To summarize the responses:

  • Me too!  It sucks.
  • Are extensions to blame?  (A reasonable question, though less likely since Firefox 15 fixed most extension-related leaks, and one that can be difficult to answer conclusively.)
  • Really?
  • Is a busted profile to blame?  Try “Reset Firefox” to determine if so.

carbon fibre didn’t respond, but korg250 (the “Me too!” commenter) took up the commenting baton…

korg250 (again):

In my case it is not about memory usage, but how slow it becomes when it reaches 1 GB of usage.

I will try the about:support tip listed before.

grumpy2:

Out of curiosity, which version of Firefox are you basing this on?

Because the thing about these “I used Firefox until X got too much for me, and I switched” kind of imply something quite significant, which is “I am no longer using Firefox, and can therefore not speak for the possible improvements that have happened since I switched”.

For the last couple of versions, Firefox has shown extremely good memory usage in my experience. And yes, they were absolutely horrible at this in earlier versions, and I’ve been *that* close to switching a number of times because of it. I’ve ranted a lot about how their memory usage was just so broken I doubted they’d ever be able to fix it. But in my experience, they have.

(And yes, this is 120% subjective, and I am certainly not arguing that if your experience differs then you are wrong. But I have noticed a vast improvement in recent versions. As I recall, the first of these fixes started trickling in around the FF13 timeframe, but I could be wrong on that.)

otomo1001:

I think you’ve some other computer issues. Here is my system, 2.4gig firefox process. Still running fine and dandy no slowdown, this is 18 mind you.

http://imgur.com/gNmZK84

korg250 (again):

I am always in the latest version. Was using 18.0.2 until today. Now I am on the 19 after resetting via about:support (still watching how it goes).

Maybe my installation had something wrong and even after the updates the memory issue remained – I don’t know. But I made the “switch” over the last week, when I compared speed between FF and Opera.

korg250 (again):

The reset via about:support worked for me. My FF is running smoothly for over 24 hours. Thanks guys!

Excellent!

Many people have used Firefox for years, and so the chance of mysterious profile problems occurring is quite high.  “Reset Firefox” cleans that dirty slate.  As the support page says, it will “easily fix most problems”.  It’s a simple, one-shot process with a high effectiveness rate that can be explained in a single sentence, which makes it perfect for mentioning in an online forum.  (Indeed, I should have linked to the support page in my comment, and then I could have written less explanation.)

Ideally it would also be tried by anyone who tries switching back to Firefox from another browser.  In fact, if Firefox is started up for the first time in a while (3 months? 6 months?) it would be great if it offered to automatically do this.

MemShrink progress, week 87–88

SpiderMonkey

Till Schneidereit implemented sharing of bytecode and related script data.  This can result in significant savings, especially if you have multiple tabs open from the same site.  For example, with 10 articles open from theage.com.au I saw an 11.6 MiB saving.  This was a MemShrink:P1, and its fix completes half of the “Better Script Handling” item from the MemShrink big ticket items list.

Terrence Cole fixed a bad regression that was causing multi-GiB memory spikes when using certain regular expressions.  He landed this fix on the Nightly, Aurora and Beta channels.

I reduced the overhead of small compartments a little.  This reduced the size of an unloaded tab from 222 KiB to 192 KiB on 64-bit platforms.

Jason Orendorff removed some source notes that are no longer required now that SpiderMonkey’s decompiler has been removed.  This slightly reduces the size of the “script-data” entries in about:memory.

I modified js::Vector so that it doesn’t waste space when its elements have a size that is not a power of two.
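To see why a non-power-of-two element size can waste space, consider the following Python sketch.  It assumes an allocator that rounds every request up to the next power of two (a simplification) and a vector that grows by doubling.  The function names are invented for the example, and this is not the actual js::Vector code.

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def naive_growth(old_capacity: int, elem_size: int):
    """Double the element count; return (new capacity, bytes wasted)."""
    capacity = old_capacity * 2
    requested = capacity * elem_size
    allocated = round_up_pow2(requested)  # assumed allocator behaviour
    return capacity, allocated - requested


def slack_free_growth(old_capacity: int, elem_size: int):
    """Round the byte size up first, then use every whole element that fits."""
    allocated = round_up_pow2(old_capacity * 2 * elem_size)
    capacity = allocated // elem_size
    return capacity, allocated - capacity * elem_size
```

Under these assumptions, growing a vector of 24-byte elements from a capacity of 8 leaves 128 of the 512 allocated bytes unused with the first scheme, while the second scheme fits 21 elements into the same allocation and leaves only 8 bytes unused.  When the element size is a power of two, both schemes give the same result.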

B2G

Chris Jones fixed a bad leak in the IPC code.  This is a big deal because B2G testing has found several cases of steadily growing memory consumption that lead to the devices becoming unusable, and there’s a good chance that this leak was responsible for some of them.  More testing is needed to determine exactly how many problems this has fixed.

It’s not surprising that IPC code is causing B2G problems, because that is code that desktop Firefox uses only lightly (for out-of-process plug-ins), whereas B2G uses it very heavily.  There’s a bug open for detecting such leaks, which is currently looking for an owner.  It should be fairly easy to implement, and could potentially lead to big fixes.

Miscellaneous

Timothy Nikkel reduced the memory consumed by display:none elements.  This was a tricky bug that was landed and then backed out multiple times due to subtle test failures.  And while this bug has benefits of its own, it is most notable for the fact that it was blocking bug 689623, which is the key bug that will help us start to fix Firefox’s problems with image-heavy pages.  (And 689623 itself is close to landing now, with 13 r+’d patches.)

Matthew Gregan fixed a bug, present only in Firefox 17 ESR, that was causing HTML5 videos to consume multiple GiBs of memory.

Alexandre Poirot fixed another leak in the Add-on SDK.

Jonathan Kew reduced the amount of memory consumed by textruns when Facebook Messenger is enabled.

David Keeler fixed a leak relating to IndexedDatabaseManager.

Bug Counts

Here are the current bug counts.

  • P1: 17 (-7/+8)
  • P2: 128 (-10/+19)
  • P3: 117 (-2/+15)
  • Unprioritized: 2 (-22/+2)

Lots of movement there.  The -7 P1s is mostly due to a number of bugs being downgraded;  these were bugs that seemed important previously but now seem less important.  (For those who are interested, the bug numbers were: 679942, 763252, 764220, 770612, 819839, 829417, 833518.)

MemShrink progress, week 85–86

Lots of news today.

Fixed Regressions

I wrote last time about a couple of bad regressions that AWSY identified.

The ongoing DOM bindings work will hopefully fully fix the second regression before the end of this development cycle (February 18).

AWSY

John Schoenick made three big improvements to AWSY.

  • It now measures every push to mozilla-inbound.  Previously it measured mozilla-central once per day.  This will make it easier and faster to identify patches responsible for regressions.
  • It’s now possible to trigger an AWSY run for any try build.  Unfortunately John hasn’t yet written instructions on how to do this;  I hope he will soon…
  • AWSY now measures Fennec as well.  Kartikaya Gupta created the benchmark that is used for this.  He also fixed a 4 MB regression that it identified.

Leaks Fixed

Benoit Jacob fixed a CC leak that he found with his refgraph tool.

Johnny Stenback fixed a leak involving SVG that he found with DMD.  This was a very slow leak that Johnny had seen repeatedly, which manifested as slowly increasing “heap-unclassified” values in about:memory over days or even weeks.  It’s a really nice case because it shows that DMD can be used on long-running sessions with minimal performance impact.

Justin Lebar fixed a B2G leak relating to forms.js.

Randell Jesup fixed a leak relating to WebRTC.

Andrew McCreight fixed a leak relating to HTMLButtonElement.

Erik Vold fixed a leak in the Restartless Restart add-on.

Miscellaneous

Brian Hackett optimized the representation of JS objects that feature both array (indexed) elements and named properties.  Previously, if an object had both elements and named properties, it would use a sparse representation that was very memory-inefficient if many array elements were present.  This performance fault had been known for a long time, and it caused bad memory blow-ups every once in a while, so it’s great to have it fixed.

As a follow-up, Brian also made it possible for objects that use the sparse representation to change back to the dense array representation if enough array elements are subsequently added.  This should also avoid some occasional blow-ups that occur when arrays get filled in in complex ways.
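The dense/sparse distinction can be illustrated with a toy Python model.  This is only a sketch of the general idea:  the class, its density threshold and its switching rules are made up for the example and are not SpiderMonkey’s actual heuristics.

```python
HOLE = object()  # marks a missing element in the dense layout


class ToyElements:
    """Toy model of indexed-element storage; the threshold is made up."""

    MIN_DENSITY = 0.5  # assumed: go dense when at least half the slots are used

    def __init__(self):
        self.dense = []     # list used while elements are packed closely
        self.sparse = None  # dict of index -> value when elements are scattered

    def is_dense(self):
        return self.sparse is None

    def set(self, index, value):
        if self.is_dense():
            if index < len(self.dense):
                self.dense[index] = value
                return
            if index == len(self.dense):
                self.dense.append(value)
                return
            # Writing past the end: fall back to the sparse layout.
            self.sparse = {i: v for i, v in enumerate(self.dense)
                           if v is not HOLE}
            self.dense = None
        self.sparse[index] = value
        self._maybe_densify()

    def _maybe_densify(self):
        # Switch back to the dense layout once enough slots are filled.
        length = max(self.sparse) + 1
        if len(self.sparse) >= length * self.MIN_DENSITY:
            dense = [HOLE] * length
            for i, v in self.sparse.items():
                dense[i] = v
            self.dense, self.sparse = dense, None

    def get(self, index):
        if self.is_dense():
            if index < len(self.dense) and self.dense[index] is not HOLE:
                return self.dense[index]
            return None
        return self.sparse.get(index)
```

In this model, writing index 100 into a one-element array makes it sparse, and filling in enough of the lower indices afterwards makes it dense again.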

Gregory Szorc reduced the memory consumption of the new Firefox Health Report feature, from ~3 MB to ~1–1.5 MB: here and here and here and here. On a related note, Bill McCloskey is making good progress with reducing compartment overhead, which should be a sizeable win once it lands.

Gregory also reduced the memory consumption of Firefox Sync:  here and here.

Jonathan Kew reduced the amount of memory used by textruns when Facebook Messenger is enabled.

The Add-on SDK is now present in mozilla-central, which is a big step towards getting all add-ons that use it to always use the latest version.  This is nice because it will mean that when memory leaks in the SDK are fixed (and there have been many) all add-ons that use it will automatically get the benefit, without having to be repacked.

Generational GC

Generational garbage collection is an ongoing JS engine project that should reap big wins once it’s completed.  I don’t normally write about things that haven’t been finished, but this is a big project and I’ve had various people asking about it recently, so here’s an update.

Generational GC is one of the JS team’s two major goals for the near-term.  (The other is benchmark and/or game performance, I can’t remember which.)  You can see from the plan that there are eight people working on it (though not all of them are working on it all the time).

Brian Hackett implemented a static analysis that can determine which functions in the JS engine can trigger garbage collection.  On top of that, he then implemented a static analysis that can identify rooting hazards and unnecessary roots.  This may sound esoteric, but it has massively reduced the amount of work required to complete exact rooting, which is the key prerequisite for generational GC.  To give you an idea:  Terrence Cole estimated that it reduced the number of distinct code locations that need to be looked at and possibly modified from ~10,000 to ~200!  Great stuff.

Another good step was taken when I removed support for E4X from the JS engine.  E4X is an old JavaScript language extension that never gained wide support and was only implemented in Firefox.  The code implementing it was complicated, and an ongoing source of many bugs and security flaws.  The removal cut almost 13,000 lines of code and over 16,000 lines of tests.  It’s been destined for the chopping block for a long time, and its presence has been blocking generational GC, so all the JS team members are glad to see it go.

Bug Counts

Here are the current bug counts.

  • P1: 16 (-5/+0)
  • P2: 119 (-6/+0)
  • P3: 104 (-0/+0)
  • Unprioritized: 22 (-0/+18)

Three of the P1 “fixes” weren’t actual fixes, but cases where a bug was WONTFIXed, or downgraded.  The unprioritized number is high because we skipped this week’s MemShrink meeting due to the DOM work week in London, which occupied three of our regular contributors.

MemShrink progress, week 83–84

Fixed

Justin Lebar made it so that memory reports could be collected on production B2G phones (i.e. not just developer phones with root access).  This was a MemShrink:P1, because getting these reports is crucial.

Gregor Wagner tuned the GC heuristics used by workers.  This is important for B2G, which uses workers extensively.

Andrew McCreight fixed a leak involving audio contexts.

I added a memory reporter for event targets, which includes XHRs.  This can measure multiple MiB of memory when running Gmail.

I added a memory reporter for data held by the JS engine’s regexp JIT compiler.  It usually measures insignificant amounts.

I fixed an inaccuracy in the “resident” memory report tree, which is visible in about:memory when running on Linux, which was caused by a change in recent kernels.

AWSY

The recent results on AWSY have been ugly.  There were two bad regressions in December, as the following graph makes clear.

areweslimyet.com, december 2012

John Schoenick did some work to improve AWSY to make regression hunting easier, and as a result we finally know which changes caused these regressions.

  • A refactoring of images code caused the bigger regression, on December 18.  Seth Fowler is looking into this.
  • Two changes relating to the new DOM bindings caused the smaller regression on December 11/12.  This is largely because many more JS getter/setter functions are present.  It’s not clear yet how to win back this memory, though it should be possible to turn these changes off in the short-term.

These regressions have made it to the Aurora branch, which means there is some urgency now to either fix them or back out/disable them soon.  We don’t want them to reach Beta.

Bug Counts

Here are the current bug counts.

  • P1: 21 (-1/+7)
  • P2: 125 (-2/+11)
  • P3: 104 (-1/+3)
  • Unprioritized: 4 (-17/+3)

The changes are larger than usual because we had a big backlog of untriaged bugs to go through, due to the six week break since the last MemShrink meeting.

MemShrink progress, week 79–82

I skipped the last MemShrink report due to Christmas, so we have four weeks’ worth of bug fixes today.

Leaks

Joe Walker fixed a bad leak found by Jesse Ruderman:  if you closed a browser window with the developer toolbar open it would leak “everything”.  This was a MemShrink P1 bug.

Anton Kovalyov fixed a leak involving scratchpad.  This bug was also found by Jesse Ruderman.

Randell Jesup fixed some WebRTC leaks.

John Schoenick fixed a leak involving plugins.

Josh Aas fixed a leak in some networking code.

DMD

I wrote a more detailed blog post about DMD.  Here is the take-away message.

about:memory is MemShrink’s not-so-secret weapon when it comes to understanding Firefox’s memory consumption… and DMD is how we make about:memory better.

Lots of under-the-hood improvements have been made to DMD since I wrote that.  Users on Mac, Linux and B2G who aren’t afraid of doing their own builds should try it out.  Also, Ehsan Akhgari got it to build on Windows, though it’s not yet clear how well it works on that platform.  If anyone wants to try it out, please let me know how it goes.

Memory Reporters

Ben Turner made the workers memory reporter able to handle workers that use ctypes.  This was important, especially on B2G, because each process can have one, two, or more such workers (these are for Firefox chrome stuff, not web content), and we weren’t measuring them at all even though they can take multiple MiB each.

I fixed the orphan DOM node memory reporter.  The introduction of WebIDL had changed the layout of some paired JS/DOM objects, and such objects weren’t being reported.  (DMD discovered the unreported memory, and Boris Zbarsky helped me interpret what it meant.)  I see this accounting for multiple MiB of orphan nodes when using Gmail.

I added a memory reporter for the event listener manager’s hash table.  It starts off small, but I’ve seen it go as high as 1.5 MiB after lots of browsing.  (DMD helped me identify this too.)

Kartikaya Gupta added a memory reporter for graphics textures on Android.

I added a memory reporter for any ctypes data that is hanging off JS objects.  I added it because one DMD profile on B2G showed non-trivial amounts of ctypes data, but that seems to have been a fluke and it rarely shows much memory now.  Oh well.

Miscellaneous

Andrew McCreight improved CC shutdown logging, which will make it easier to identify shutdown leaks.

Rail Aliiev and Kartikaya Gupta enabled a new NDK for Fennec builds on releng machines.  This might result in smaller binary sizes, which saves memory.

Bug Counts

Here are the current bug counts.

  • P1: 15 (-2/+0)
  • P2: 116 (-10/+0)
  • P3: 102 (-4/+0)
  • Unprioritized: 18 (-0/+18)

The number of unprioritized bugs is high because we didn’t have a MemShrink meeting this week.  This was because Justin Lebar and Kyle Huey are in Berlin for the B2G work week.  We’ll have our next meeting two weeks from today.

DMD

I recently landed a new version of DMD on mozilla-central.  DMD is a tool that helps us understand and thus reduce Firefox’s memory consumption.  But in order to understand DMD, you first have to understand about:memory.

about:memory

The MemShrink project started about 18 months ago, and it has been very successful in reducing Firefox’s memory consumption.  about:memory is MemShrink’s not-so-secret weapon when it comes to understanding Firefox’s memory consumption.

about:memory screenshot

about:memory has some wonderful characteristics:  it provides literally thousands of measurements;  it’s available in ordinary release builds;  and it’s trivial to run (just type “about:memory” in the address bar).  This means that non-expert users can easily provide developers with detailed measurements if they are having problems.

However, about:memory has two shortcomings.  First, it simply visualizes the data provided by the memory reporter infrastructure.  The coverage provided by this infrastructure is good, but there are still some gaps.  These gaps manifest primarily in the “heap-unclassified” number in about:memory, which represents all the heap allocations that the memory reporters didn’t cover.  Unfortunately, about:memory can provide zero insight into what is within “heap-unclassified”.
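To make the “heap-unclassified” number concrete, here is a minimal Python sketch of the arithmetic.  The function name, the reporter names and the byte counts are all made up for illustration;  this is not how about:memory is actually implemented, just the subtraction it boils down to.

```python
def heap_unclassified(heap_allocated, reported):
    """Return (bytes, percentage) of the heap that no reporter covered.

    heap_allocated: total bytes the allocator says are in use.
    reported: dict mapping a (hypothetical) reporter name to the bytes it measured.
    """
    covered = sum(reported.values())
    unclassified = heap_allocated - covered
    return unclassified, 100.0 * unclassified / heap_allocated


MiB = 1024 * 1024

# Made-up measurements from three hypothetical reporters.
reported = {"js": 40 * MiB, "dom": 25 * MiB, "images": 15 * MiB}

bytes_left, pct = heap_unclassified(100 * MiB, reported)
print(f"heap-unclassified: {bytes_left // MiB} MiB ({pct:.1f}%)")
```

Note that if a reporter counts a block twice, `covered` is too high and the result is too low, which is why double-counting is hard to spot from this number alone.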

Second, it’s hard to verify.  There’s no obvious way to tell if the numbers it gives are accurate (with a few exceptions;  e.g. negative numbers are obviously wrong).  We have established good practices for writing memory reporters (measure sizes, don’t compute them;  traverse data structures rather than maintaining size counters) that prevent most errors, but it’s still quite easy to accidentally count a block of memory twice.

This is where DMD comes in.  It helps with both the heap-unclassified and double-counting problems.

DMD version 1

I wrote the first version of DMD over a year ago.  “DMD” is short for “dark matter detector”, because “heap-unclassified” memory is sometimes jokingly called “dark matter”.

DMD works by intercepting all calls to malloc/free/etc.  This lets DMD track extra information about every heap block, such as where it was allocated.  Furthermore, DMD has hooks into the memory reporters (via functions created with the NS_MEMORY_REPORTER_MALLOC_SIZEOF_FUN macro, for those who are interested) so it knows when any heap block is measured by a memory reporter.
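The bookkeeping described above can be sketched in a few lines of Python.  This is a toy model under stated assumptions, not DMD’s real code (which is native and hooks the real allocator):  the `TrackingHeap` class, its fake addresses and its `report` method are all hypothetical names invented for this example.

```python
import traceback


class TrackingHeap:
    """Toy stand-in for a malloc/free wrapper that remembers every live block."""

    def __init__(self):
        self._next_addr = 0x1000   # fake addresses; a real tool sees real pointers
        self.live = {}             # address -> {"size", "stack", "reports"}

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += max(size, 1)
        # Remember where the block was allocated, dropping this wrapper's own frame.
        stack = traceback.extract_stack()[:-1]
        self.live[addr] = {"size": size, "stack": stack, "reports": 0}
        return addr

    def free(self, addr):
        # A freed block no longer needs tracking.
        self.live.pop(addr, None)

    def report(self, addr):
        """Hook called when a memory reporter measures a block; returns its size."""
        block = self.live[addr]
        block["reports"] += 1
        return block["size"]


heap = TrackingHeap()
a = heap.malloc(64)
b = heap.malloc(128)
heap.report(a)   # a reporter measured block a once
heap.free(b)     # block b is gone, so it is no longer tracked
```

After this runs, only block `a` is live, and its report count is 1.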

With that in place, DMD knows how many times each heap block has been reported.  So, after running the memory reporters, we can put each block into one of the following three groups.

  • Blocks that haven’t been reported indicate gaps in existing memory reporters.  They contribute to “heap-unclassified”.
  • Blocks that have been reported once are good.
  • Blocks that have been reported twice or more indicate defects in existing memory reporters.  They cause some measurements in about:memory to be too high, and “heap-unclassified” to be too low.

DMD presents its results in a sorted fashion that lists the unreported and twice-reported cases that are more important first.
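The grouping and sorting can be sketched as follows.  This is only an illustration in Python:  the function names, the tuple layout and the allocation-site strings are assumptions made for the example, and DMD’s real output is organized around stack traces rather than simple labels.

```python
from collections import defaultdict


def classify(blocks):
    """Group blocks by how many times the memory reporters measured them.

    blocks: iterable of (allocation_site, size_in_bytes, report_count).
    Returns three dicts mapping allocation site -> total bytes.
    """
    groups = {
        "unreported": defaultdict(int),
        "once": defaultdict(int),
        "twice_or_more": defaultdict(int),
    }
    for site, size, reports in blocks:
        if reports == 0:
            key = "unreported"
        elif reports == 1:
            key = "once"
        else:
            key = "twice_or_more"
        groups[key][site] += size
    return groups


def biggest_first(group):
    """Sort allocation sites so the ones holding the most bytes come first."""
    return sorted(group.items(), key=lambda item: item[1], reverse=True)


# Made-up blocks: (hypothetical allocation site, size, report count).
blocks = [
    ("hash_table_grow", 4096, 0),
    ("string_buffer_new", 512, 0),
    ("hash_table_grow", 8192, 0),
    ("dom_node_create", 256, 1),
    ("style_cache_add", 1024, 2),
]

groups = classify(blocks)
for site, total in biggest_first(groups["unreported"]):
    print(f"unreported: {total} bytes from {site}")
```

The first unreported entry is the best candidate for a new memory reporter, and anything in the twice-or-more group points at a reporter that needs fixing.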

This first version of DMD was implemented as a Valgrind tool, and what’s more, it required a special, patched build of Valgrind to run.  This meant it was difficult to set up, ran very slowly, and could only be used on Linux and Mac.  Despite these shortcomings, it has proved itself extremely useful, helping us get “heap-unclassified” down greatly (on my Linux desktop machine it’s typically between 8 and 12%) and identifying several cases of double-counting.

DMD Version 2

Some time after I wrote the first version of DMD, I realized that all it needed to work was the ability to intercept malloc/free/etc. and to obtain stack traces.  Valgrind was overkill for these purposes — it’s capable of supporting much more invasive tools — and, furthermore, there were existing tools (e.g. trace-malloc) in the Mozilla codebase that did exactly these things.  In other words, it would be possible to build a new version of DMD that integrated directly into the browser.

Work on this new version stalled for a while, because the old one was working well enough.  But then B2G’s Operation Slim Fast started, and it quickly became obvious that “heap-unclassified” on B2G was typically much higher than it was on desktop.  This is primarily because B2G has lots of small processes, and so various unmeasured things that weren’t a big deal on desktop became much more significant on B2G.  And DMD version 1 doesn’t work on B2G devices.

So that provided the impetus to complete the new version.  Mike Hommey finished his new replace-malloc infrastructure recently, which helped greatly, and the new version of DMD landed on mozilla-central last week.  The old version will soon be retired.

DMD now works on Linux, Android, Mac and B2G, and is just shy of working on Windows (where Ehsan Akhgari has been making gradual progress).  It’s also much faster, partly because it avoids the overhead of Valgrind, and partly because it now uses sampling.  The sampling causes blocks smaller than a certain size (4 KiB by default, though you can change that) to be sampled, while blocks larger than that are measured accurately;  this sacrifices a moderate amount of accuracy for a large amount of speed.
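To illustrate the trade-off, here is a minimal Python sketch of one way such sampling could work.  It is an assumption for illustration, not DMD’s actual algorithm:  the `SamplingCounter` class and its fields are hypothetical, and treating a block of exactly the threshold size as “large” is a guess.  Small requests are added to a running byte counter, and each time the counter crosses the threshold a single sample standing in for one threshold’s worth of bytes is recorded.

```python
class SamplingCounter:
    """Record large blocks exactly and small blocks only by sampling."""

    def __init__(self, sample_below=4096):
        self.sample_below = sample_below
        self._pending = 0     # bytes of small requests not yet represented by a sample
        self.recorded = []    # list of (recorded_size, is_sampled)

    def on_malloc(self, size):
        if size >= self.sample_below:
            # Large block: record its true size (the expensive path).
            self.recorded.append((size, False))
            return
        # Small block: just bump a counter (the cheap path).
        self._pending += size
        if self._pending >= self.sample_below:
            self._pending -= self.sample_below
            # One sample represents sample_below bytes of small blocks.
            self.recorded.append((self.sample_below, True))

    def estimated_bytes(self):
        return sum(size for size, _ in self.recorded)


counter = SamplingCounter(sample_below=4096)
for _ in range(100):
    counter.on_malloc(100)    # 10,000 bytes of small blocks in total
counter.on_malloc(8192)       # one large block, recorded exactly

print(len(counter.recorded), counter.estimated_bytes())
```

In this run the 101 allocations produce only three records, and the estimate of 16,384 bytes undershoots the true 18,192 bytes because the bytes still pending have not yet been sampled.  That is the accuracy being traded for speed.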

Try it

about:memory is wonderful, and DMD is how we make about:memory better.  There are now detailed instructions on how to build DMD, run it, and interpret its output.  It’s quite easy now, requiring only minor changes to the usual build and run steps, and several people have already done it successfully.  Please try it out, and help us further improve about:memory.

MemShrink progress, week 77–78

DMD

The big news this week is the landing of the native version of DMD.  The old version of DMD is a Valgrind-based tool that has been instrumental in reducing the size of about:memory’s “heap-unclassified”.  (For example, with trunk builds on my Linux desktop machine it’s now frequently less than 10%.)

However, the old version of DMD wasn’t easy to run.  For example, it required patching Valgrind’s source code and re-building it, among other things.  As a result, only a handful of people ever ran it.  Furthermore, because it’s a Valgrind tool it is very slow and will never run on Windows.

In contrast, the new version is much easier to use.  It just requires a slight configuration change at build time and a slight change in how the browser is invoked.

It’s also much faster, especially if you use the sampling mode which trades a small amount of precision for a great deal of speed.  Crucially, this means it is usable on B2G.  Also, it should be possible for people to use it for long-running sessions, which can be helpful for identifying slow leaks.

And while the new version doesn’t currently run on Windows, it should be fairly straightforward to get it working.  If anyone is interested in helping with this, please contact me, or take a look at the open bug.

I will write a more detailed post about the new version some time in the next few days, once I have updated the documentation and finalized a few more tweaks that should improve usability.  In the meantime, Justin Lebar is already using it in earnest to better understand B2G’s memory consumption (e.g. see here, here, here, here, here, here, here).

Thanks to Mike Hommey for his excellent replace-malloc infrastructure, on which the new version is built, and to Justin for lots of help, especially with Android and B2G details.

B2G

The option that allows B2G to merge system compartments, previously implemented and landed by Kyle Huey, was enabled by default.  This is a big deal, because it’s the single biggest memory consumption improvement B2G has seen and is likely to see before version 1 is released.

Gabriele Svelto reduced the number of unused dirty pages kept around by jemalloc on B2G.  This can save ~4 MiB of memory across all processes, at a potential cost of making some allocations a little slower.  On a device as memory-constrained as the B2G phones, this is a good trade-off.

Social API

Felipe Gomes reduced the memory consumption of the social API when multiple browser windows are open.  This change has been backported to the beta channel, so it will be present in Firefox 18, in time for the bigger publicity push for this feature.  (For those of you that don’t like Facebook, please note that if you don’t enable the social API it will not affect Firefox’s performance in any way.)

Miscellaneous

I added a MEMORY_VSIZE telemetry reporter, which measures virtual memory consumption.  This might help us understand how many Windows users would benefit from 64-bit builds.

Bug counts

Here are the current bug counts.

  • P1: 17 (-3/+3)
  • P2: 126 (-2/+14)
  • P3: 106 (-4/+7)
  • Unprioritized: 0 (-2/+0)

MemShrink progress, week 75–76

B2G continues to be a major focus of work relating to memory consumption.  Even the non-B2G-specific improvements made this fortnight were mostly identified and made to help B2G.

B2G-specific

Gabriele Svelto caused dirty freed pages held onto by jemalloc to be purged when a B2G process experiences memory pressure, such as when it goes into the background.  This typically reduces a process’ size by about 2 MiB, so it’s a big deal.  This was a MemShrink P1 bug.

Benoit Jacob prevented the creation of redundant OpenGL contexts on B2G.  This saves 750 KiB in the main B2G process, and 750 KiB in every child that uses WebGL.  Benoit was able to do this based on data from a B2G-specific heap profiler written by Justin Lebar.

I made a large IPC message buffer shrink once all messages from it have been processed.  This saves either 120 KiB or 248 KiB per child process.

James Lal greatly reduced memory consumption of the ical parser during sync.  This change also made it much faster.

Justin Lebar and I added a memory reporter for Freetype, which uses about 2 MiB of memory in the main B2G process.  (This reporter also works on Fennec.)

Andrea Marchesini added a memory reporter for B2G’s gralloc memory.

Miscellaneous

Bill McCloskey made the JavaScript engine clean up more data on memory pressure events.

I increased the size of the chunks used by XPT info’s arena allocator, which saves about 80 KiB per process on 32-bit, and a bit more on 64-bit.

Randell Jesup fixed a PeerConnection leak in WebRTC code.

I added more detail to the JavaScript type inference memory reporters.

Bug counts

Here are the current bug counts.

  • P1: 17 (-4/+1)
  • P2: 114 (-5/+9)
  • P3: 103 (-4/+5)
  • Unprioritized: 2 (-3/+1)