Another leak fixed, part 2

I recently wrote about bug 654106, a memory leak that has been fixed.  In bug 653817 the reporter made some measurements that show this leak was quite a bad one.  The reporter measured “Uss” using procrank on an Android device.  This page says:

“Uss is the set of pages that are unique to a process. This is the amount of memory that would be freed if the application was terminated right now.”

Comment 19 and comment 24 have the numbers before and after the leak was fixed.  The reporter opened Firefox (with a single tab containing about:memory), measured the memory usage, then opened 8 popular sites, re-measured several times, then closed them all (except about:memory), re-measured, then re-opened them, and so on through several cycles.  The following table shows the key measurements from the first cycle.

                                Before         After
Start-up                         47,972 KiB     48,700 KiB
Open 8 tabs, wait 90 minutes    251,844 KiB    240,064 KiB
Close 8 tabs                    226,328 KiB    108,908 KiB

These measurements have some noise, so don’t read too much into the minor differences.  The important difference is the last row;  the Uss after closing the 8 content tabs was 2.1x smaller after fixing the leak!

So, this is a great leak to have fixed.  But I have several concerns remaining.

  • It’s worrying that such a bad leak was able to get into Firefox 4.0 and remain undetected for this long.  My understanding is that we have various kinds of automatic leak detection tools, but I don’t know much about them, why they might not have detected this, and whether they could be improved.
  • The Uss after closing the 8 tabs is 2.2x higher than at start-up.  That seems high.  One thing I’ve been trying to understand lately is what kind of memory usage can legitimately remain when a lot of tabs have been closed and there’s only one left.  Obviously there’s a bunch of chrome stuff, but when I look at detailed profiles it’s hard for me to tell what things fall into that category and what doesn’t.  (One thought I had was that it might be worth doing some profiling on Mac, because it’s possible on Mac to close all browser windows without closing the browser itself.  Would all this chrome memory still remain in use in this case?)
  • Each time the bug reporter re-did the open/close cycle, the Uss after closing the tabs crept higher.  In the post-fix run, it was 108,908 KiB the first time through, but the next three times through the figure was 121,552 KiB, 123,692 KiB, 127,588 KiB.  That smells like another leak (or more than one).
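
The ratios and the cycle-to-cycle creep quoted above can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table and the last bullet (using the table's first-cycle value), and the variable names are mine.

```python
# Uss figures in KiB, copied from the table in this post.
before = {"startup": 47_972, "open": 251_844, "closed": 226_328}
after = {"startup": 48_700, "open": 240_064, "closed": 108_908}

# Post-fix Uss after closing the 8 tabs, one value per open/close cycle.
closed_per_cycle = [108_908, 121_552, 123_692, 127_588]

# How much smaller the closed-tab Uss is once the leak is fixed.
improvement = before["closed"] / after["closed"]

# How much memory remains after closing tabs, relative to start-up (post-fix).
retained = after["closed"] / after["startup"]

# Growth in closed-tab Uss from one cycle to the next.
growth = [b - a for a, b in zip(closed_per_cycle, closed_per_cycle[1:])]

print(f"closed-tab Uss, before vs. after the fix: {improvement:.1f}x")
print(f"closed-tab Uss vs. start-up (post-fix):   {retained:.1f}x")
print(f"growth per cycle (KiB): {growth}")
```

The growth list comes out as 12,644 KiB, 2,140 KiB and 3,896 KiB, so the figure rises on every cycle, though by varying amounts.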

I read a lot of browser-related threads on tech websites.  They almost always descend into slanging matches where people explain why browser A is awesome and browser B sucks.  My perception from these threads is that memory leaks (be they real or perceived) are one of the things people complain about most with Firefox.  This is usually based on a measurement similar to the one described above — the person browses for a while, closes all their tabs except one, and memory usage is still high.  I’d love to hear any ideas people have about how to improve things on this front.

15 Responses to Another leak fixed, part 2

  1. I’m actually a bit confused by those numbers.

    The leak that was fixed leaked one word per innerHTML set. The numbers above are from a 32-bit build, so the 118MB difference observed is on the order of 30 million innerHTML sets. Divided by 8 tabs and 90 minutes, that’s an average of about 700 innerHTML sets per second per tab for the entire duration of the test. I have a _really_ hard time believing that.
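
    That estimate can be reproduced roughly as follows. This is a sketch, assuming a 4-byte word on a 32-bit build and taking the leaked amount as the difference between the two “Close 8 tabs” figures in the post’s table (117,420 KiB, close to the 118MB quoted); the variable names are illustrative.

```python
WORD_BYTES = 4           # assumption: one word is 4 bytes on a 32-bit build
TABS = 8
DURATION_S = 90 * 60     # the 90-minute wait, in seconds

# Difference in Uss after closing the tabs, before vs. after the fix (KiB).
leaked_kib = 226_328 - 108_908

# One word leaked per innerHTML set, so words leaked == number of sets.
innerhtml_sets = leaked_kib * 1024 // WORD_BYTES

sets_per_tab_per_second = innerhtml_sets / TABS / DURATION_S

print(f"innerHTML sets: {innerhtml_sets:,}")
print(f"per tab per second: {sets_per_tab_per_second:.0f}")
```

    This gives roughly 30 million sets in total and just under 700 per second per tab.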

  2. Thanks for your efforts in this and the about:memory revamp.

    I’m hopeful that regular posts of this nature on Planet will keep the outstanding issues (for example, how the leak you describe above managed to go unnoticed by the automated tests!) at the forefront of people’s minds – and help stop the “Firefox 3.x fixed all the memory leaks, there isn’t a problem any more” myth I’ve seen on Bugzilla on more than one occasion, from spreading any further.

    Regarding your experience with tech sites and comments about Firefox – this matches entirely what I’ve seen. Even now, memory leaks still inevitably end up being discussed whenever Firefox is compared to another browser.

    As further proof, the leaks category ranks higher than crashes, in metric’s upgrade opt-out survey:

    Keep the posts coming! 🙂

  3. All the automated leak testing we do, as far as I know, looks only for shutdown leaks (i.e. memory not freed after shutting down the entire browser). Those tests are blind to leaks where the memory will be freed at shutdown but won’t be freed or reused earlier than that. But those are precisely the leaks that users will notice.

  4. Another usual metric from users is long-standing Firefox vs. freshly started Firefox with the same (saved) session. And there’s usually a big difference.

  5. I am happy to see this work being done. I am really passionate about Firefox but feel that the team lacks focus on the things that really matter to users.

    These things include UI refinements, bug fixes, inconsistent behaviour and memory management, responsiveness of interface and speed in benchmarks.

    So I would like to thank you for this work.

    I would really like to see a tick-tock approach from Mozilla with the new rapid release cycle.

    4 -> 4.5 -> 5

    The 4.5 should be mostly bug fixes and refinements. This gives you an easy win. The 5 should be new things. 6 weeks is very little time to bake new features. 12 weeks, on the other hand, is much better.

    This way at least you have something to write about when you hit a major release. What is in Firefox 5 that makes it 5? Who is not fooled into thinking that 5 is everything 4 should have been, but they ran out of time?

    I would also like to see all Firefox 4 users silently updated to Firefox 5.

  6. I’d like to get to the state of being DEBUG_CC-clean — i.e., no DEBUG_CC warnings during some tests (eventually I’d hope during our test suites), which would detect a bunch of categories of leaks (although only ones of certain large object graphs).

    This is roughly equivalent to the state of being where the Leak Monitor extension doesn’t pop up any warnings — a state we haven’t been in since the cycle collector landed (for Firefox 3).

    (It’s possible that both of these warnings need some adjustment for the recent changes to disconnect cycle collection and garbage collection timing — though they also might not.)

    Once we get to that state, we can add automated tests that we stay in it (perhaps starting on some very simple tests), which would at least check that whatever we’re testing doesn’t leak windows or documents for the lifetime of the browser. Then we can build to stronger tests over time.

    • Nicholas Nethercote

      dbaron: you mean we have automatic leak tests that are currently failing?(!) Oh, please tell me more about DEBUG_CC, or point me at some documentation!

  7. > I read a lot of browser-related threads on tech websites. They almost always descend into slanging matches where people explain why browser A is awesome and browser B sucks. My perception from these threads is that memory leaks (be they real or perceived) are one of the things people complain about most with Firefox.

    Yeah, I read probably the same threads, and I see the same thing.

    Often the complaints are very general, so it’s hard to assess them, but sometimes users report reproducible problems and detailed steps (for example, a user reported bug 654028 to me on Slashdot).

    > This is usually based on a measurement similar to the one described above — the person browses for a while, closes all their tabs except one, and memory usage is still high.

    As Zack said, I think almost all our leak testing is for shutdown leaks. Jesse’s fuzzing did find some leak-until-shutdown bugs, but that’s all I am aware of in that area.

    We could definitely benefit from some automatic tests about closing tabs and seeing that memory usage decreases (either fuzzing or otherwise). I’m not sure the current leak test mechanisms would be directly applicable, though. But even something like measuring RSS might be useful as a start.

  8. Julian Seward

    Is it feasible / useful / already-done / impossible / pointless to run
    Mochitests and fix every single leak we can find that’s reported in
    it, that is to do with our code (as opposed to system libraries), no
    matter how insignificant? I have no feel for how assiduous we are
    about fixing all detectable leaks.

  9. Well, they’re automatable tests, but we never actually automated them because we never got them into a passing state.

    There’s some documentation here:

    I’d note additionally that the default things it looks for are any windows and documents that survive past the window being closed. It then prints explanations of why they survive.

    In some cases, the problems it prints information about are cases where things take more than one GC cycle to clean up. However, taking more than one GC cycle to clean up is a sign that, given some additional edges, we could end up with a permanent leak.

  10. While I am very happy that this HUGE bug is getting fixed, I am very sad to read this:
    “It’s worrying that such a bad leak was able to get into Firefox 4.0 and remain undetected for this long.”

    No, bad leaks surviving this long is not the worrying thing at all.

    The most worrying thing is that there is no official communication channel from users to devs. Users have been shouting, barking, yelling for such a long time and yet you didn’t hear us. We could only post on Mozillazine, which isn’t even related to Mozilla, though most users thought it was the official channel.

    I am glad these sorts of fixes are getting in; at least it will do some good PR for Mozilla. Hopefully Firefox 5 will be what Firefox 4 was supposed to be.

    • Nicholas Nethercote

      Ed: I sympathize with your statements, but hasn’t Bugzilla always been the main user-to-dev communication channel for this kind of problem?

  11. As a user: The dev channel has been there, but I myself failed to create something reportable.

    I know that my standard usage (20++- tabs) crashed FF4 after a few hours, preceded by a period of high memory consumption and slow response times (probably mostly garbage collections), and I guessed that it happens on AJAX-rich, dynamic web pages, but I had nothing more concrete to say. So I reported that beta 12 was useless for me because of these problems, as (still) is 4.01, and, frankly, got ignored. I understand why: it could have been any of the plugins I use or the JavaScript on any of the open pages.

    Anyway, I am really looking forward to checking a new build, hoping that my problems are gone.

    • Nicholas Nethercote

      Frank: Firefox 5 will have several leaks fixed; hopefully that’ll improve things for you. It won’t be perfect, but hopefully we’re now heading in the right direction.

      Disabling add-ons can really help, though it depends on the add-ons you use.

      Firefox 5 will have a much improved about:memory page that provides lots of useful information, and also buttons to trigger garbage collection. If you still have problems, reporting them via Bugzilla would be great; please include the output of about:memory and the sites you opened. Thanks!

  12. @ Nick

    I just want to clarify that I am not blaming you or anything. At least you care to reply to my comment (which doesn’t normally happen with Mozilla).

    Regarding your question: no, Bugzilla is not a Dev to User platform. It is not, it was not, it will not be, and it never should be. The reason is that Bugzilla doesn’t allow non-specific problems to be posted. When 10 to 15 users all state that their memory usage went up from build x to build something, there has got to be some truth in it, depending on whether those are long-time users. However, we have neither the technical ability nor the time to pinpoint where the problem is. All I can tell you is that 10 tabs with no add-ons on build or version x use 30% less memory than version / build y.

    By the time it went final, many more users had started to complain in different forums or channels. Some will likely do some comparisons. And the press started to pick it up; with the internet, you are generating bad press at the speed of light.

    And the problem is not a structure or communication channel problem. It is the culture in Mozilla, how things are done and handled, that is more concerning.