Categories
Firefox Memory consumption

Leak reports mini-triage, May 30, 2011

I just created bug 660577 which consolidated five bug reports, all of which were complaining about Firefox 4 having high memory usage and/or OOM aborts on image-heavy pages.  This is a clear regression from Firefox 3.6, and appears to have two likely causes:

  • The introduction of infallible new/new[] means Firefox 4 sometimes aborts where Firefox 3.6 would try to recover.  (Kyle Huey already opened bug 660580 to fix this.  Thanks, Kyle!)
  • image.mem.min_discard_timeout_ms was increased from 10,000 (10 seconds) to 120,000 (120 seconds).  This means that Firefox holds on to some image data (I don’t understand the exact details) for longer.

Input from people who know the details of this stuff would be most welcome!  Thanks.

Categories
Bugzilla Firefox Memory consumption MemShrink

The new leak tracking bugs are live

Yesterday I proposed a new way of tracking leak reports.  It’s now up and running.  Two old tracking bugs have been decommissioned: bug 632234 (which was already resolved) and bug 640452.  Five new bugs have been created:

  • Bug 659855 – (mlk-fx4-beta) [meta] Leaks and quasi-leaks reported against Firefox 4 betas
  • Bug 659856 – (mlk-fx4) [meta] Leaks and quasi-leaks reported against Firefox 4
  • Bug 659857 – (mlk-fx5) [meta] Leaks and quasi-leaks reported against Firefox 5
  • Bug 659858 – (mlk-fx6) [meta] Leaks and quasi-leaks reported against Firefox 6
  • Bug 659860 – (mlk-fx7) [meta] Leaks and quasi-leaks reported against Firefox 7

Please CC yourself if you are interested.  Apologies for any bugspam you received as a result of these changes.  Hopefully this new tracking system will work well.

Categories
Bugzilla Firefox Memory consumption MemShrink

A new way of tracking leak reports

We get lots of leak reports from users.  There is a spectrum of quality.

  • Some are hopelessly vague and will never lead to anything useful. (“After browsing for several hours, Firefox is using 100s of MBs of memory.  This is unacceptable;  please fix.”)  Bug 643177 is an example.
  • Some are very precise.  This makes them easy to reproduce, likely to be fixed quickly, and easy to re-confirm if other leaks are fixed in the interim.  (“I managed to reduce the problem down to the attached 10 line HTML file, it causes my machine to run out of memory within 10 seconds of loading.”) Bug 654106 is a good example.
  • Most are somewhere between these two extremes.

Because many of the reports aren’t great, it can be hard to tell if the problem is still present some time later. A single leak may be reported N times, then fixed, and N-1 reports stay open.  In short, leak reports get stale.  (This is true of many bug reports, but I think leak reports are more prone to staleness than most.)

How bugs are currently tracked

There is a keyword, ‘mlk’, which is added to almost all leak reports.  There are over 600 open bugs with that keyword, going back over 10 years.  So it’s not much use.

In the lead-up to Firefox 4, I used bug 632234 (which I’ll henceforth call “mlk-fx4-old”) to track potentially blocking leaks.  It worked well.

After that, I created bug 640452 (which I’ll henceforth call “mlk-fx5+”), with which I’ve been tracking leaks in the lead-up to Firefox 5 and later versions.  I carried over unresolved bugs from mlk-fx4-old.  mlk-fx5+ is starting to fill up and feel stale.  Basically, I can see it suffering the same problems as the ‘mlk’ keyword before too long.

So I’m thinking about changing how these are tracked.  The basic idea is to keep using the ‘mlk’ keyword for all leak reports, and then have one leak-tracking bug for each version of Firefox, so it’s clear which version each report applies to.

Steps needed to start this

Add the ‘mlk’ keyword to all mlk-fx4-old and mlk-fx5+ bugs that lack it.

Open new tracking bugs: mlk-fx4-beta, mlk-fx4, mlk-fx5, mlk-fx6.  (The Firefox 4 beta period was long enough, and there were enough leak reports filed against beta versions, that separating mlk-fx4-beta and mlk-fx4 seems worthwhile.)  Make each mlk-fxN depend on mlk-fx(N-1).

For all the existing bugs tracked by mlk-fx4-old and mlk-fx5+, add them to the appropriate new tracking bug.  With one exception: for hopelessly vague ones, just mark them as duplicates of mlk-fxN, with an explanatory message (“we’re not ignoring leaks, look at all these ones we’re tracking!  but your report doesn’t tell us anything we don’t already know, sorry”).

Close mlk-fx5+.

Steps needed to maintain this in the future

When Firefox version N’s cycle starts, open mlk-fxN, and mark it as depending on mlk-fx(N-1).

For each new leak report, mark it as blocking mlk-fxN, for the appropriate N.  Also add the ‘mlk’ keyword.

If someone confirms in a comment that a problem reported in version N is still present in version N+1, mark that bug as also blocking mlk-fx(N+1).

Properties of this system

You can still search for all leak reports, based on the ‘mlk’ keyword.

You can immediately tell roughly how stale a report is likely to be, based on which mlk-fxN tracking bug it blocks.  This is more reliable than the bug number or file date;  for example, we are still getting reports against Firefox 4 even though Firefox 5 (which has fixed a number of leaks) is in beta and Firefox 6 just went to Aurora.  This immediately gives a starting priority for all leak reports:  more recent ones have higher priority because they’re more likely to still be unfixed.

Hopelessly vague reports are resolved immediately by duplicating, so they don’t clog things up.

Tracking bugs shouldn’t get too big and unwieldy, because each Firefox version has a limited lifespan.

Reports against version N still block mlk-fx(N+1), but via one level of indirection.  Reports against version N still block mlk-fx(N+2), but via two levels of indirection, etc.  So the full chain of dependencies is maintained.
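As a rough illustration of that chain, here is a minimal Python sketch.  It assumes each mlk-fxN tracking bug blocks only mlk-fx(N+1); the function names are hypothetical and nothing here talks to Bugzilla.

```python
# Hypothetical model of the tracking-bug chain; the names mirror the
# mlk-fxN aliases in the text, but this is not a Bugzilla client.

def tracking_chain(first, last):
    """Map each tracking bug to the next tracking bug that it blocks."""
    names = [f"mlk-fx{n}" for n in range(first, last + 1)]
    return dict(zip(names, names[1:]))

def levels_of_indirection(blocks, start, target):
    """Count tracking-bug hops from `start` to `target` (None if unreachable)."""
    hops, current = 0, start
    while current != target:
        if current not in blocks:
            return None
        current = blocks[current]
        hops += 1
    return hops

blocks = tracking_chain(4, 7)
print(levels_of_indirection(blocks, "mlk-fx4", "mlk-fx5"))  # 1
print(levels_of_indirection(blocks, "mlk-fx4", "mlk-fx6"))  # 2
```

A report that blocks mlk-fx4 therefore reaches mlk-fx5 in one hop and mlk-fx6 in two, while an older tracking bug is never reachable from a newer one.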

We could periodically go through older bugs (eg. 3 releases ago) and ask people to re-confirm, and close out ones that get no response.  But we wouldn’t have to do that.

Am I crazy?

Is this bureaucratic overkill?  I don’t think so.  It’ll take some work, but I’m happy to do that.  It’ll only take an hour or two to set up, and then it won’t be much harder to maintain than what I’m currently doing with the mlk-fx5+ bug.  (I also have plans for writing instructions to help users file better leak reports.)  And it’ll allow us to proceed much more usefully with the lists of leak reports that we have.

But I’m interested to hear if you disagree, or have any ideas for improving it.  Thanks!

Categories
Firefox Memory consumption

Firefox 5 has fewer leaks than Firefox 4

There’s been some confirmation from users that Firefox 5 has fewer leaks than Firefox 4.

From bug 640923:

Well, FF5b2 is really doing way better in my environment. This is great! I’ll abandon FF4 now for me, it never was “stable” in my world.

From bug 657232:

I’ve just upgraded from Firefox 3.6.17 to Beta 5 and it at first appeared to be a lot more stable and responsive than Firefox 4.

Nice to hear!

“Fewer memory leaks” (or “reduced memory usage” if we want to be less blunt) should definitely be on the Firefox 5 feature list when it comes out.

Categories
about:memory Firefox Memory consumption MemShrink

Leak reports triage, May 24, 2011

I’ve been tracking recent memory leak reports in bug 640452. I’m doing this because I think memory leaks hurt Firefox, in terms of losing users to other browsers, as much as any other single cause. (I suspect pathological performance slow-downs due to old and busted profiles hurt almost as much, but that’s a topic for another day.)

There are 61 bugs tracked by bug 640452, 21 of which have been resolved.  Any and all help with the 40 remaining would be most welcome. For each bug I’ve put in square brackets a summary of the action I think it needs.

  • [NEEDS ANALYSIS]:  needs someone to attempt to reproduce, try to work out if the problem is real and still occurring.
  • [NEEDS WORK]: problem is known to be real, needs someone to actually fix it.
  • [PENDING EVANGELISM]: problem with a website is causing a leak, needs someone to check the site has been fixed.
  • [CLOSE?]: bug report is unlikely to go anywhere useful.  Closing it (with a gentle explanation) is probably the best thing to do.
  • [GGC]: needs generational GC to be fixed properly.

Here are the bugs.

  • 497808: This is a leak in gmail, caused by a bug in gmail: when an email editing widget is dismissed, some stuff that should be unlinked from a global object isn’t.  Google Chrome also leaks, but a smaller amount;  it’s unclear why. The bug is assigned to Peterv and is still open pending confirmation that it’s been fixed in gmail. [PENDING EVANGELISM]
  • 573688: Valgrind detects several basic leaks in SQLite.  Assigned to Sayre, no progress yet. [NEEDS WORK]
  • 616850: Huge heaps encountered when browsing www.pixiv.net, leading to incredibly slow cycle collections (3 minutes or more!)  Little progress. [NEEDS ANALYSIS]
  • 617569: Large heaps encountered for some pages using web workers.  Looks like it’s not an actual leak.  Assigned to Gal, he says a generational GC would help enormously, so probably nothing will happen with this bug until that is implemented (which is planned).  I marked it as depending on bug 619558. [GGC]
  • 624186: Using arguments.callee.caller from a JSM can trigger an xpcom leak.  The bug has a nice small test that demonstrates the problem.  Unassigned. [NEEDS WORK]
  • 631536: A bad string leak, seemingly on Windows only, with lots of discussion and a small test case.  Assigned to Honza Bambas.  Was a Firefox 4.0 blocker that was changed to a softblocker at the last minute without any explanation.  Seems close to being fixed. [NEEDS WORK]
  • 632012: Firefox 4 with browser.sessionstore.max_concurrent_tabs=0 uses a lot more memory restoring a session with 100s of tabs than Firefox 3.6 with BarTab.  Unassigned.  Unclear if this is a valid comparison.  [CLOSE?]
  • 634156: Identifies some places where the code could avoid creating sandboxes.  Assigned to Mrbkap, he said (only four days ago) he has a patch in progress.  Seems like it’s not actually a leak, so I changed it to block bug 640457 (mslim-fx5+). [NEEDS WORK]
  • 634449: A classic not-very-useful leak report.  One user reported high and increasing memory usage with vague steps to reproduce.  Two other users piled on with more vague complaints.  The original reporter didn’t respond to requests for more measurements with a later version.  I’m really tempted to close bugs like this, they’ll never lead anywhere.  Unassigned. [CLOSE?]
  • 634895: Vague report of memory usage increasing after awakening a machine after hibernation, with one “me too” report.  Unassigned.  Unlikely to lead to any useful changes.  [CLOSE?]
  • 635121: A leak in Facebook Chat, apparently it’s Facebook’s fault and occurs in other browsers too.  (Unfortunately, leaks like that hurt us disproportionately because we don’t have process separation.)  Assigned to Rob Arnold, marked as a Tech Evangelism bug.  Unclear if the Facebook code has been fixed, or if Facebook has even been contacted. [PENDING EVANGELISM]
  • 635620: Very vague report.  Unlikely to go anywhere.  Unassigned. [CLOSE?]
  • 636077: Report of increasing memory usage, with good test case.  Lots of discussion, but unclear outcomes.  Again, generational GC could help.  MozMill endurance tests showed the memory increase flattening out eventually.  Might be worth re-measuring now.  Assigned to Gal.  I marked it as depending on the generational GC bug (bug 619558). [NEEDS ANALYSIS, GGC]
  • 636220: Memory usage remains high after closing Google Docs tabs.  Assigned to Gal.  Needs more attempts to reproduce. [NEEDS ANALYSIS]
  • 637449: Looks like a clear WebGL leak.  Might be a duplicate of, or related to, bug 651695. Unassigned, but Bjacob looked into it a bit. [NEEDS ANALYSIS]
  • 637782: Memory usage increases on image-heavy sites like http://www.pixdaus.com/ or http://boston.com/bigpicture/ or http://www.theatlantic.com/infocus/.  Lots of discussion but not much progress.  Unclear if the memory is being released eventually.  Needs more analysis.  Unassigned.  [NEEDS ANALYSIS]
  • 638238: Report of memory increasing greatly while Firefox is minimized.  Might be related to RSS Ticker?  I would recommend giving up on this one except the reporter is extremely helpful (he’s participated in multiple bugs and I’ve chatted to him on IRC) and so progress might still be made with some effort.  Unassigned. [NEEDS ANALYSIS]
  • 639186: AdBlock Plus and NoScript together causing a leak on a specific page.  Lots of discussion but it petered out.  Unassigned.  [NEEDS ANALYSIS]
  • 639515: GreaseMonkey causes a big memory spike when entering private browsing.  Some discussion that went nowhere.  Unassigned.  [NEEDS ANALYSIS]
  • 639780: Report of steadily increasing memory usage leading to OOMs.  Steps to reproduce are vague, but the reporter is very helpful and collected lots of data.  Unassigned.
  • 640923: Vague reports of increasing memory usage, lots of people have piled on.  One useful lead:  RSS feeds might be causing problems on Windows 7?  The user named SineSwiper (who has alternated between being abusive and collecting useful data) thinks so.  Unassigned. [NEEDS ANALYSIS]
  • 642472: High memory usage on a mapping site.  Very detailed steps to reproduce;  one other user couldn’t reproduce.  Unassigned.  [NEEDS ANALYSIS]
  • 643177: Vague report.  Unassigned.  [CLOSE?]
  • 643940: Ehsan found leaks in the HTML5 parser with the OS X ‘leaks’ tool.  Unassigned.  [NEEDS ANALYSIS]
  • 644073: Ehsan found a shader leak with the OS X ‘leaks’ tool.  Unassigned. [NEEDS WORK]
  • 644457: High memory usage with gawker websites and add-ons (maybe NoScript?)  See comment 25. Unassigned.  [CLOSE?]
  • 644876: Leak with AdBlock Plus and PageSpeed add-ons on mapcrunch.com.  Unassigned.  [NEEDS ANALYSIS]
  • 645633: High memory usage with somewhat detailed steps to reproduce.  Reporter is helpful and has collected various pieces of data.  [NEEDS ANALYSIS]
  • 646575: Creating sandboxes causes leaks.  Good test case.  Unassigned.  [NEEDS WORK]
  • 650350: Problem with image element being held onto when image data has been released.  Bz said he would look at it.  Unassigned. [NEEDS ANALYSIS]
  • 650649: With only about:blank loaded, memory usage ticks up slightly.  Some discussion;  it may be due to the Urlclassifier downloading things.  If that’s true, it makes diagnosing leaks difficult. [NEEDS ANALYSIS]
  • 651695: Huge WebGL leak in the CubicVR demo.  Unassigned.  [NEEDS WORK]
  • 653817: Memory increase after opening and closing tabs.  A lot of discussion has happened, it’s unclear if the memory usage is due to legitimate things or if it’s an actual leak.  Assigned to me.  [NEEDS ANALYSIS]
  • 653970: High memory usage on an image-heavy site.  Comment 5 has a JS snippet that supposedly causes OOM crashes very quickly.  Unassigned.  [NEEDS ANALYSIS]
  • 654028: High memory usage on Slashdot.  Seems to be because Slashdot runs heaps of JavaScript when you type a comment.  Lots of discussion, seems to be due to bad GC heuristics and/or lack of generational GC?  Unclear if there’s an actual leak, or just delayed GC.  Unassigned.  [NEEDS ANALYSIS]
  • 654820: Leak in JaegerMonkey’s regular expression code generator caught by assertions.  Assigned to cdleary.  [NEEDS WORK]
  • 655227: Timers using small intervals (100ms or less) are never garbage collected(!)  Unassigned.  [NEEDS ANALYSIS]
  • 656120: Bug to do GC periodically when the browser is idle.  Assigned to Gwagner, has a patch.  [NEEDS WORK]
  • 657658: test_prompt.html leaks.  Unassigned.  [NEEDS WORK]

You can see that most bugs are marked as “[NEEDS ANALYSIS]”.  The size of the problem and the amount of developer attention it is receiving are not in proportion.

One thing I want to do is write a wiki page explaining how to submit a useful leak report, in an attempt to avoid the vague reports that never go anywhere.  But the improved about:memory is a big part of that, and it won’t land until Firefox 6.  I’m wondering if the about:memory changes should be backported to Firefox 5 in an attempt to improve our leak reports ASAP.

Another thing I’m wondering about is being more aggressive about closing old leak reports.  We have 624 open bugs that have the “mlk” keyword (including some recent ones that aren’t in the list above).  The oldest of these is bug 39323 which was filed in May 2000.  Surely most of these aren’t relevant any more?  It’s good to have a mechanism for tracking leaks (be it a keyword or a tracking bug) but if most such bugs are never closed, the mechanism ends up being useless.  I’d love to hear ideas about this.

Finally, I’d like to hear if people think this blog post is useful;  I’m happy to make it an ongoing series if so, though regular MemShrink meetings would be more effective.

Categories
about:memory Firefox Memory consumption

A better about:memory: stage 1.75

I just landed bug 657327, which makes about:memory simpler and more useful.  To understand the change, let’s look at what about:memory looked like before the change landed.

Old about:memory screenshot

The first thing to look at is the “mapped” entry at the top of the “Mapped Memory” tree.  It was meant to measure the total memory (both private and shared) mapped by the process.  But there were a couple of problems with it:

  • On Windows, it only measured the private bytes.  There’s no easy way I know of to measure the shared bytes as well.  This could lead to negative numbers in the output (bug 655642).
  • On Mac, the number includes an enormous amount of shared memory mapped.  If you have a Mac, run ‘top’ and look at the VSIZE column.  Almost every process has a value of 2GB or greater.  So the “mapped” value is really high, which looks bad, even though it’s not Firefox’s fault.
  • Even on Linux, where the amount of shared memory is smaller and so the “mapped” number is reasonable, it’s still not that useful, because it includes memory mappings like code and data segments that aren’t that interesting.

So, in summary, the very first number shown on about:memory was (a) incorrect on Windows, (b) misleadingly inflated on Mac, and (c) not much use on Linux.

The other thing to notice about the old about:memory is that there are two trees, “Mapped Memory” and “Used Heap Memory”.  For the purposes of this discussion, memory usage can be split into four groups.

  1. Explicitly allocated heap memory.  This is heap memory requested by Firefox through the heap allocation functions like malloc, calloc, realloc, and C++’s ‘operator new’.
  2. Implicitly allocated heap memory.  This is heap memory that has been freed by Firefox through the heap deallocation functions like free and C++’s ‘operator delete’, but which the heap allocator (e.g. jemalloc) has not yet handed back to the OS, for whatever reason.
  3. Explicitly allocated mapped memory.  This is memory requested by Firefox through OS-level allocation functions like mmap (on Linux and Mac), VirtualAlloc (on Windows), and vm_allocate (on Mac).
  4. Implicitly allocated mapped memory.  This is memory allocated by the OS that hasn’t been explicitly requested by Firefox.  It includes code and data segments (which are created when the executable and shared libraries are loaded) and thread stacks (which are created when threads are created).

In the old about:memory, 1 is shown in the “Used Heap Memory” tree, and 2, 3 and 4 are shown in the “Mapped Memory” tree.  But it’s 1 and 3 that we’re most interested in, because that’s memory that has been explicitly requested (and not yet freed) by Firefox.  That’s where most of the dynamic variation in memory usage occurs, and that’s where memory leaks occur.
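To make the grouping concrete, here is a small Python sketch.  The class, its field names and the byte counts are all made up for illustration (they are not real about:memory reporter names or measurements);  it only shows how the four groups fold into the old two trees versus the combination of 1 and 3.

```python
from dataclasses import dataclass

@dataclass
class MemorySnapshot:
    """Sizes for the four groups above, in MiB (illustrative values only)."""
    heap_explicit: int    # 1. live malloc/calloc/realloc/new allocations
    heap_implicit: int    # 2. freed, but not yet handed back to the OS
    mapped_explicit: int  # 3. mmap/VirtualAlloc/vm_allocate requests
    mapped_implicit: int  # 4. code and data segments, thread stacks

    def old_used_heap(self):
        # Old "Used Heap Memory" tree: group 1 only.
        return self.heap_explicit

    def old_mapped(self):
        # Old "Mapped Memory" tree: groups 2, 3 and 4.
        return self.heap_implicit + self.mapped_explicit + self.mapped_implicit

    def explicit(self):
        # Groups 1 and 3: memory explicitly requested and not yet freed.
        return self.heap_explicit + self.mapped_explicit

snap = MemorySnapshot(heap_explicit=100, heap_implicit=20,
                      mapped_explicit=30, mapped_implicit=200)
print(snap.old_used_heap(), snap.old_mapped(), snap.explicit())  # 100 250 130
```

With these invented numbers the old “Mapped Memory” total is dominated by group 4, which is why the combination of 1 and 3 is the more useful number to watch.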

The new about:memory reflects this better.

New about:memory screenshot

It has a single tree which only includes explicit allocations, and which does not distinguish between heap-level allocations (e.g. malloc) and OS-level allocations (e.g. mmap);  this shortens the output and reduces the amount of nesting in the tree.  Implicit allocations (2 and 4 above) are still covered, but only in the less-prominent “Other Measurements” list (under “vsize” and “heap-unused”).  And the “explicit” entry, the very first one, is now the single most interesting number on the page.  (Thanks to Jesse Ruderman for suggesting that I merge the two trees and flatten the resulting tree.)

One disadvantage of the new form is that some explicit OS-level allocations may not be accounted for.  (The full heap is always accounted for, thankfully.)  I’m in the process of adding more memory reporters for significant OS-level allocations (e.g. bug 546477).  Fortunately there don’t seem to be many.

Categories
Firefox Work habits

Working on the browser sucks

I’ve joked before that I hoped to never work on anything in Firefox that requires me to build the browser regularly.  That probably sounds weird unless you are a member of the JavaScript team, because it’s possible to do close to 100% of SpiderMonkey development just building the JS shell.

Working on the JS shell is great.  On my 2.5 year old Linux box it takes maybe 1 minute to build from scratch, rebuilds are almost instantaneous, the regression tests take a few minutes, shell start-up is instantaneous, you rarely have to do try server runs, tools like GDB and Valgrind are easy to run, and landing patches on the TraceMonkey repo is low-stress because breakage doesn’t affect too many people.

In comparison, working on the browser sucks.  Builds from scratch take 25 minutes, zero-change rebuilds take 1.5 minutes, single-change rebuilds take 3 or 4 minutes, the linking stage grinds my machine to a halt, cold starts take up to 20 seconds or more (warm starts are much better), the test suites are gargantuan, every change requires a try server run, tools like GDB and Valgrind require jumping through hoops (--disable-jemalloc, anyone?), and landing patches on mozilla-central is stressful.

Thanks to my recent about:memory work, I’ve had to experience this pain first-hand.  It’s awful.  Debugging experiments that would take 20 seconds in the shell take 5 minutes in the browser. I avoid ‘hg up’ as much as possible due to the slow rebuilds it usually entails.  How do all you non-JS people deal with it?  Maybe you just get used to it… but I figure there have to be some tips and tricks I’m not aware of.

(Nb: Why are rebuilds so slow?  Because configure is invoked every time?  Imprecise dependencies in makefiles?  Too much use of recursive make?  Bug 629668? Why do Fennec rebuilds seem to be slower than Firefox rebuilds?)

Kyle Huey told me how you can get away with rebuilding only parts of the browser.  Eg. if I only modify code under xpcom/, I can just do make -C <build>/xpcom && make -C <build>/toolkit/library and this reduces the rebuild time a bit.  The down-side is that when I screw it up, eg. by forgetting to rebuild a directory that I changed, it creates Frankenbuilds, and realizing what I’ve done can end up taking a lot more time than I saved.

Another trick I worked out:  implement things in JavaScript.  Seriously!  While doing my about:memory revamp I originally had a big chunk of the work done on the C++ side.  I quickly realized that if I did most of it on the JavaScript side I could see the effect of most changes by simply copying aboutMemory.js from my source dir to my build dir and then reloading the page.  Much better than re-building and re-starting.

What else can I do?  Getting a faster machine is the obvious option, I guess.  More cores would help, though linking would still be a bottleneck.  Do SSDs make a big difference?

Also, there’s talk of using more project branches and introducing mozilla-staging.  That would avoid the stress of landing on mozilla-central, but that’s really the smallest part of what I’m complaining about.

Any and all suggestions are welcome!  Please, I’m begging you.

Categories
about:memory Firefox Memory consumption

A better about:memory: stage 1.5

I just landed a bunch of changes to about:memory (bug 648490, bug 653630, bug 654041, bug 655638, bug 655583).  Mostly they just fix some minor problems;  if you’ve seen negative numbers in about:memory since the revamp hopefully you won’t any more!  (Please tell me or file a bug if you do.)

But there’s one cool new feature:

GC buttons

At the bottom of about:memory there are now three buttons.  Here are the “title” attributes for each one, which show up as tool-tips if you hover your mouse over them, and explain what they do.

  • GC: Do a global garbage collection.
  • GC + CC: Do a global garbage collection followed by a cycle collection. (It currently is not possible to do a cycle collection on its own, see bug 625302.)
  • Minimize memory usage: Send three “heap-minimize” notifications in a row.  Each notification triggers a global garbage collection followed by a cycle collection, and causes the process to reduce memory usage in other ways, e.g. by flushing various caches.

As far as I know this is the first time users have been able to trigger GC and CC easily in a vanilla browser.  It’ll be particularly useful when analyzing memory usage, e.g. trying to determine if there’s a leak.  Often in that case you want to trigger a GC and/or CC to make sure that the memory stats aren’t currently inflated by dead objects, and it’s now really easy to do so.

On a related note: it’s important that the memory reporters used to generate about:memory be correct, and that the memory be categorized the right way.  (Otherwise you can end up with nonsensical output like the negative numbers I mentioned earlier.)  For example, I just discovered that the JavaScript heap can be allocated on the heap or directly via mmap/VirtualAlloc, depending on whether MOZ_MEMORY is defined or not (see bug 656520).  On Mac, MOZ_MEMORY is not defined (because it currently doesn’t use jemalloc) and so the GC heap was incorrectly being categorized under “Used Heap Memory” instead of under “Mapped Memory”.

I’ve checked all the reporters as best as I can.  I’m pretty confident now that all the JS and storage (SQLite) reporters are correctly categorized as “heap” or “mapped”.  I’ve looked at the others and I think they’re right, but I’m not totally certain.  More specifically, the reporters in the following screenshot are currently categorized as “heap” (i.e. allocated with one of: malloc, calloc, realloc, posix_memalign, operator new, operator new[]).  If anyone knows that to be false, I’d love to hear about it.  In particular, I’m worried about image-related memory that might be stored in video RAM;  I already adjusted two reporters (gfx-2d-surfacecache and gfx-2d-surfacevram) for this reason.

about:memory screenshot
Thanks!

Categories
Firefox Programming Software Engineering

Duplicated abstraction layers in Firefox

Just about every operating system provides a mechanism for directly allocating and deallocating memory at the page level (ie. not malloc/free or new/delete).  The functions to do this vary from OS to OS:

  • Windows: VirtualAlloc/VirtualFree.
  • Posix (e.g. Mac and Linux): mmap/munmap.
  • Mac also has: vm_allocate/vm_deallocate.

So it’s very natural to add an abstraction layer: your own functions (let’s call them Map and Unmap) that use conditional compilation to choose the appropriate OS-specific call.

An abstraction layer like this appears in lots of software projects.  Firefox happens to incorporate code from a lot of other projects, and so what happens is you end up with lots of duplicate abstraction layers.  For example, in the JS engine alone we have five Map/Unmap abstraction layers.

  1. In js/src/jsgcchunk.cpp, used to allocate chunks for the GC heap.
  2. In js/src/vm/Stack.cpp, used to allocate some stack space.
  3. In js/src/nanojit/avmplus.cpp, used to allocate space for code generated by the trace JIT.
  4. In js/src/assembler/jit/ExecutableAllocator*.cpp, used to allocate space for code generated by the method JIT.
  5. In js/src/ctypes/libffi/src/dlmalloc.c, used to allocate chunks of memory that are handed out in pieces by the heap allocator defined in that file.

The duplication of 3, 4 and 5 is understandable:  they all involve large chunks of code that were imported from other projects.  (Furthermore, you can see that ctypes/ has its own heap allocator, thus duplicating jemalloc’s functionality.)  The duplication between 1 and 2 is less forgivable;  neither of those cases was imported and so they should share an abstraction layer.

How many other Map/Unmap abstraction layers are there in the rest of Firefox?  The JS engine may be more guilty of this than other parts of the code.  Is there a sane way to avoid this duplication in a world where we import code from other projects?

Categories
Firefox Memory consumption

Another leak fixed, part 2

I recently wrote about bug 654106, a memory leak that has been fixed.  In bug 653817 the reporter made some measurements that show this leak was quite a bad one.  The reporter measured “Uss” using procrank on an Android device.  This page says:

“Uss is the set of pages that are unique to a process. This is the amount of memory that would be freed if the application was terminated right now.”

Comment 19 and comment 24 have the numbers before and after the leak was fixed.  The reporter opened Firefox (with a single tab containing about:memory), measured the memory usage, then opened 8 popular sites, re-measured several times, then closed them all (except about:memory), re-measured, then re-opened them, and so on through several cycles.  The following table shows the key measurements from the first cycle.

                               Before        After
Start-up                       47,972 KiB    48,700 KiB
Open 8 tabs, wait 90 minutes   251,844 KiB   240,064 KiB
Close 8 tabs                   226,328 KiB   108,908 KiB

These measurements have some noise, so don’t read too much into the minor differences.  The important difference is the last row;  the Uss after closing the 8 content tabs was 2.1x smaller after fixing the leak!

So, this is a great leak to have fixed.  But I have several concerns remaining.

  • It’s worrying that such a bad leak was able to get into Firefox 4.0 and remain undetected for this long.  My understanding is that we have various kinds of automatic leak detection tools, but I don’t know much about them, why they might not have detected this, and whether they could be improved.
  • The Uss after closing the 8 tabs is 2.2x higher than at start-up.  That seems high.  One thing I’ve been trying to understand lately is what kind of memory usage can legitimately remain when a lot of tabs have been closed and there’s only one left.  Obviously there’s a bunch of chrome stuff, but when I look at detailed profiles it’s hard for me to tell which things fall into that category and which don’t.  (One thought I had was that it might be worth doing some profiling on Mac, because it’s possible on Mac to close all browser windows without closing the browser itself.  Would all this chrome memory still remain in use in this case?)
  • Each time the bug reporter re-did the open/close cycle, the Uss after closing the tabs crept higher.  In the post-fix run, it was 108,908 KiB the first time through, but the next three times through the figure was 121,552 KiB, 123,692 KiB, 127,588 KiB.  That smells like another leak (or more than one).
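For anyone who wants to check the arithmetic, here is a short Python sketch.  The figures are copied from the table and the list above (the first-cycle “Close 8 tabs” value is the one in the table);  the variable and function names are just for illustration.

```python
# Uss figures in KiB, copied from the table and the list above.
close_before_fix = 226_328   # "Close 8 tabs", before the fix
close_after_fix = 108_908    # "Close 8 tabs", after the fix
startup_after_fix = 48_700   # "Start-up", after the fix
later_cycles = [121_552, 123_692, 127_588]  # post-fix, cycles 2 to 4

def ratio(a, b):
    """Return a / b rounded to one decimal place."""
    return round(a / b, 1)

print(ratio(close_before_fix, close_after_fix))   # 2.1
print(ratio(close_after_fix, startup_after_fix))  # 2.2

# Cycle-to-cycle growth of the post-close Uss in the post-fix run.
readings = [close_after_fix] + later_cycles
growth = [later - earlier for earlier, later in zip(readings, readings[1:])]
print(growth)  # [12644, 2140, 3896]
```

The growth never goes negative across the four cycles, which is what makes the creep look like a leak rather than noise.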

I read a lot of browser-related threads on tech websites.  They almost always descend into slanging matches where people explain why browser A is awesome and browser B sucks.  My perception from these threads is that memory leaks (be they real or perceived) are one of the things people complain about most with Firefox.  This is usually based on a measurement similar to the one described above — the person browses for a while, closes all their tabs except one, and memory usage is still high.  I’d love to hear any ideas people have about how to improve things on this front.