
MemShrink progress, week 89–90

Images

The big news this week is that Timothy Nikkel finished bug 689623, which causes us to (for the most part) only decode images that are visible or nearly visible. This goes a long way towards fixing our problems with decoded images, which happens to be the #1 item on the MemShrink big ticket items list. Fixing this bug also fixed bugs 661304 and 682230 in passing.

To give you an idea of what changed, consider an image-heavy page like this one.  Prior to Timothy’s change, shortly after loading that page on my machine Firefox’s memory consumption would reach 3 GiB, with 2.7 GiB of that being decoded (uncompressed) images.  If I switched to another tab then 2.45 GiB of those decoded images would be immediately discarded, then after 10–20 seconds the remaining 250 MiB would be.  If I switched back, all the images would immediately be decoded again and memory consumption would climb back to 3 GiB.

After Timothy’s change, Firefox’s memory consumption still peaks around 2.8 GiB just after loading the page, but then quickly drops back to 600 MiB as many of those decoded images are discarded.  After that, even when scrolling quickly through the page I couldn’t get memory consumption to exceed 700 MiB.

This change will make such pages much smoother on desktop machines that don’t have lots of RAM, because there won’t be any paging, which slows things down drastically. Andrew McCreight wrote a comment (which I can’t find now) saying that on one of his machines that page was extremely slow and painful to scroll through before the change, and smooth and pleasant afterwards.

The peak that occurs when an image-heavy page first loads is still undesirable; bug 542158 is open for fixing that, and the fix will build on the infrastructure that Timothy has developed. Once that’s done, it will also greatly reduce the likelihood that an image-heavy page will cause an out-of-memory (OOM) crash, which is great for both desktop (esp. 32-bit Windows) and mobile/B2G.

Miscellaneous

Nicolas Pierron fixed a leak in IonMonkey that was causing runaway memory consumption for Shumway demos on Android.  It has been backported to the Aurora and Beta channels.

I greatly reduced the size of the property cache. This is a data structure the JS engine uses to speed up property accesses. It used to be important, but is now barely necessary thanks to type inference and the JITs. This change reduced memory consumption by 75 KiB per JSRuntime on 32-bit, and by 150 KiB per JSRuntime on 64-bit. There is one JSRuntime for each process, plus one for each worker, so for B2G it saves around 525 KiB at start-up.
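
As a rough sketch of where the 525 KiB figure comes from (back-of-the-envelope arithmetic only; the assumption that B2G processes are 32-bit, and the implied count of about seven JSRuntimes at start-up, are inferences from the numbers above rather than measurements):

    # Back-of-the-envelope arithmetic; the runtime count is inferred, not measured.
    saving_per_runtime_kib = 75       # 32-bit saving per JSRuntime
    runtimes_at_startup = 7           # assumed: process runtimes plus workers

    print(saving_per_runtime_kib * runtimes_at_startup, "KiB")   # 525 KiB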

We declared victory in the bug tracking the memory consumption of not-yet-loaded tabs. When the bug was opened it was over 1 MiB per tab (on 64-bit builds); it’s now around 200 KiB. Furthermore, if someone has many (e.g. hundreds) of not-yet-restored tabs they’re probably on a high-end machine, so the motivation for further reducing this memory consumption is low. There is a bug open about some scripts that run for each of these tabs, however; avoiding these scripts may make start-up faster for heavy tab users.

Marco Bonardo reduced the maximum size of the cache used by each SQLite connection (excluding places.sqlite) from 4 MiB to 2 MiB.  Update: Marco corrected me in the comments — there are four places.sqlite connections, but only one of them (the one for the awesome bar) is excluded from the new 2 MiB limit.

Somebody fixed a leak in the Roboform add-on.  The fix is present in v7.8.7.5.

Olli Pettay improved our memory reporting by adding extra information to some system compartments that previously were undistinguished. Our experience has been that as soon as somebody adds a new memory measurement, it finds a problem, and this was no exception, leading to the filing of bug 844661.

Jonathan Kew has been busy.

I wrote a script that can take the diff of two JSON memory report dumps. This will be very useful when comparing memory consumption before and after doing something, for example. If you didn’t know that JSON memory report dumps existed, you’re probably not alone — they’re an undocumented feature that’s currently only available on Linux and Android, triggered by sending signal 34 to the browser(!) These dumps can be loaded by about:memory (see the buttons at the bottom) but there’s currently no easy way to trigger them, which is why today I filed a bug to make it easier.
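
For the curious, here is a minimal sketch of the kind of diff such a script performs. This is an illustrative stand-in, not the actual script: it assumes a simplified dump format in which each dump is a JSON object with a top-level "reports" list whose entries carry "process", "path" and "amount" (bytes) fields, which may not match the real format exactly.

    import json
    import sys
    from collections import defaultdict

    def load_amounts(filename):
        """Map (process, path) -> total bytes for one memory report dump."""
        amounts = defaultdict(int)
        with open(filename) as f:
            dump = json.load(f)
        for report in dump["reports"]:
            amounts[(report["process"], report["path"])] += report["amount"]
        return amounts

    def diff_dumps(old_file, new_file):
        """Print the per-path deltas between two dumps, in MiB."""
        old, new = load_amounts(old_file), load_amounts(new_file)
        for key in sorted(set(old) | set(new)):
            delta = new.get(key, 0) - old.get(key, 0)
            if delta:
                process, path = key
                print(f"{delta / (1024 * 1024):+10.2f} MiB  {process}  {path}")

    if __name__ == "__main__":
        diff_dumps(sys.argv[1], sys.argv[2])

Usage would be something like "python3 memdiff.py before.json after.json" (the script and file names here are hypothetical), after triggering two dumps with, say, kill -34 <firefox-pid> on Linux.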

I removed the pool used for recycling nsNodeInfo objects, which was causing them to never be released back to the OS.  Never-shrinking structures like this can become a problem in longer-running browser sessions — for example, after running MemBench, which opens and closes 150 tabs, this pool was unnecessarily hoarding 4 MiB.
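
To illustrate the general pattern (this is a conceptual sketch, not the actual nsNodeInfo code, which is C++ inside Gecko), a recycling pool keeps released objects on a free list for later reuse. The free list grows to the high-water mark of concurrent usage and never shrinks, so the memory from a burst of activity such as MemBench’s 150 tabs is held for the rest of the session.

    # Conceptual sketch of a never-shrinking recycling pool; not Gecko code.
    class RecyclingPool:
        def __init__(self, factory):
            self._factory = factory
            self._free = []          # recycled objects; never given back

        def acquire(self):
            # Reuse a recycled object if available, otherwise allocate a new one.
            return self._free.pop() if self._free else self._factory()

        def release(self, obj):
            # The free list grows to the high-water mark of usage and stays there.
            self._free.append(obj)

Removing the pool (or periodically trimming the free list) lets that memory be returned to the allocator once the burst is over.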

Interesting Open Bugs

In my MemShrink reports I mostly write about bugs that have been fixed.  While this is satisfying and demonstrates progress, it might also give the impression that the MemShrink team has everything under control, when really we could use some help.

(In fact, the term “the MemShrink team” is something of a misnomer since there isn’t any such entity, officially.  “The people who regularly attend MemShrink meetings” would be a more accurate term.  But I’ll use “MemShrink team” as a short-hand for that.)

The MemShrink team has expertise in some areas of Mozilla code, such as DOM, cycle collector, JS engine (partial), Fennec, B2G, and memory profiling, and we tend to make good progress in those areas — we can fix bugs in those areas, and we generally pay attention to how these areas affect memory consumption.

But we lack expertise in areas like graphics, image-handling, layout, text rendering, storage, Jetpack/add-ons, and front-end.  Graphics is a particular problem, because graphics issues, esp. those involving drivers, can cause huge memory consumption blow-ups.  Bug 837542 is an example from the MemShrink:P1 list where gradients are somehow causing memory consumption to spike by 10s or even 100s of MiBs just by opening and closing a few tabs!  We triage bugs like that as well as we can, but often we’re just guessing, and we’re mostly helpless to address such problems.

Therefore, moving forwards I’m going to start mentioning interesting open bugs that don’t have someone working on them.

One example is bug 846173, which I filed when I noticed that fully loading the front page of TechCrunch takes over 100 MiB!  And it’s mostly because of the many Facebook “like” buttons, Google “+1” buttons, and Twitter “tweet this” buttons — see the about:memory output for the full gory details.  It’s obvious that most of these window objects are basically identical, except for the URL.  Could we do something clever to avoid all this duplication?  Justin Lebar wondered about a copy-on-write scheme.  Are there other ways we could improve this case?

Another example is bug 842003, which is looking for an owner.  Some basic leak-checking infrastructure in the IPC code could potentially detect some big problems for B2G.

In bug 842979 we’re seeing 10s or 100s of MiBs of orphan DOM nodes in long-running Gmail sessions.  It’s not clear if this is a bug in Gmail, or caused by add-ons, or something else.  Please comment in the bug if you have any additional data.

Another one:  DMD, which is one of our most important tools, is pretty much useless on Fennec because it can’t get stack traces.  If anyone who knows about such things would like to investigate, that would be very helpful.  (In fact, I got a crash in the stack tracing code when I most recently tried to run DMD on Mac, which I haven’t had time to investigate.)

Bug Counts

Here are the current bug counts.

  • P1: 12 (-6/+1)
  • P2: 128 (-7/+7)
  • P3: 122 (-0/+5)
  • Unprioritized: 6 (-2/+6)

The closed P1s were 661304, 689623, 837187, 841976, 841993, and 842756. Three of them were related to an IPC leak fix that I mentioned in my last report.

30 replies on “MemShrink progress, week 89–90”

Just a clarification: the SQLite connections reduction also includes places.sqlite; what is excluded is the single connection that drives the Awesomebar. Places.sqlite has 4 connections for concurrency/threading reasons, and 3 of those were affected by the change. The Awesomebar connection is instead capped at 6 MiB; due to its current architecture we can’t go below this limit right now.

Thanks for the correction! I’ve updated the post text accordingly.

“…Furthermore, if someone has many (e.g. hundreds) of not-yet-restored tabs they’re probably on a high-end machine, so the motivation for further reducing the memory consumption is low.” I’m not sure this is the right assumption.

I generally have 80–120 tabs open. It’s because of testing a few things, or read-it-later, or “following links”…

I think it’s not a very unusual case. I have seen many people do this.

I have a MacBook Pro with 4 GB RAM and a standard HDD.

My Firefox instance is always 1 GB plus, even if I have fewer than 50 tabs. Mostly I end up killing Firefox (as closing takes an eternity)… I would be happy to help the Mozilla team trace the issue; please let me know how I can. After almost a year of tracking MemShrink posts, I still don’t see any improvement in my day-to-day browsing experience. I don’t really remember when I last used the close button in Firefox (I always kill it).

It’s a difficult task to remember which tabs to close… I would suggest something like Chrome, where content is re-rendered when switching back to a tab. I think something similar is implemented for Firefox on Android.

There have actually been a number of big improvements recently to shutdown speed, coming in the next version or two, so hopefully you’ll see some improvement there!

Tons of tabs are common for power users, but among the Firefox population as a whole, they are very uncommon. Most people only ever have something like 6 tabs open.

Let’s say you have 100 unloaded tabs. They’ll take up about 20 MiB. That 20 MiB is a tiny fraction of your 1 GiB+ usage. That’s why it’s not a priority. It used to be 100+ MiB, which was a significantly larger fraction.

Congrats to all on this latest batch of hard work. Timothy’s handling of image-heavy pages sounds very impressive, especially since sites like Google Images should be much smoother now.

However, the news about the MemShrink skills shortage is very concerning. Is there something Mozilla management can do about this? It seems strange that an organization about to start selling low-spec, 256 MB hardware lacks a tool that identifies unexplained memory usage on the same platform, which appears to be the case with the DMD problem on Fennec.

There’s been a lot of work on improving the memory usage of FirefoxOS. DMD works there (and has identified a lot of things to fix!), just not on Fennec (which is Android). I think it is mostly a matter of the fact that people who know a lot about the low level details on mobile systems are focused on FirefoxOS right now.

excuse me: *interesting
Also, do you guys plan to dive into the memory usage of pdf.js? I recall reports of 600–800 MiB memory usage for displaying relatively small PDF files, and that doesn’t sound very good, especially now that pdf.js is the default for opening PDF files.

There are a number of issues, despite how cool it would be. AreWeFastYet doesn’t actually run anything in a browser, it uses JS shell builds, which are easy to run from a scripted command line. Getting any kind of test working across multiple browsers would be a lot of work. I would guess that they’d each require their own automation framework. Not to mention that other browsers don’t have an about:memory that can produce output in a JSON format, as far as I’m aware. Additionally, comparing memory usage across browsers is likely to be a trickier affair than time to complete a benchmark.

AWSY’s test (i.e. the opening and closing of tabs) is driven by a Firefox-only extension (the Mozmill endurance tests).

If there is actually just a bunch of people who regularly demonstrate their personal interest in MemShrink, but no actual team, then “the MemShrink team” seems misleading to me, as it implies a level of organisation – of institution – that doesn’t exist. How about “the MemShrinkers”? 🙂

That congressoamericano sample page you linked to leads to a 100% reproducible crash on Firefox 20 on Windows on a fast machine with 16GB RAM. Is that the expected behavior? The crash reports say “[@ EMPTY: no crashing thread identified; corrupt dump ]”. I’m assuming this means it’s an OOM crash.

It’s definitely not expected behaviour! A crash never is.

I don’t know what the crash report message means, but I wouldn’t assume it’s an OOM crash. It shouldn’t be, on a machine with 16 GiB of RAM.

I switched to a blank profile and visited the page and it did not crash, but it did not display correctly at all. When the page is scrolled, random rectangular portions are not repainted; sometimes the entire viewport is black. Further investigation reveals that it was running right up against the 4 GB virtual address limit of a 32 bit process with only about 100 MB to spare. Having 16 GB of ram isn’t really going to help if the process only has 4 GB of VA. With the blank profile it didn’t crash, but many things stopped working — for example going into the Options menu, many icons were missing and replaced with black rectangles. So it seems that sometimes Firefox soldiers along in the face of out-of-memory conditions and sometimes it crashes, and the presence of my enabled addons in my normal profile put it over the edge into crash territory.

Oh, right. I temporarily forgot about the 4 GiB virtual limit on Windows. So this is exactly what I was talking about in this paragraph:

“The peak that occurs when an image-heavy page first loads is still undesirable, and bug 542158 is open for fixing that, and it will build on the infrastructure that Timothy has developed. When that’s fixed, it’ll also greatly reduce the likelihood that an image-heavy page will cause an out-of-memory (OOM) crash, which is great for both desktop (esp. 32-bit Windows) and mobile/B2G.”

I’m not surprised that things get flaky when memory gets tight like that. It’s not a situation that gets much testing, unfortunately.

It’s not a situation that gets much testing, unfortunately.

I’m sad to hear that, given that probably 9 out of 10 Firefox users are Windows users, and that Mozilla has expressed such staunch distaste for ever releasing an official 64-bit Windows build. Even the mere idea that a couple of power users might be using the 64-bit Windows nightly builds that they found secreted away in some hidden corner of the FTP site seemed to be so upsetting and intolerable that it required a plan to end the availability of such builds. (At least until the screams of users became too deafening.)

@anon

Let’s look at what Mozilla has been up to for the last year and a half:
-Created Firefox for Android phone
-Created Firefox for Android tablet
-Recreated Firefox for Android phone without XUL
-Created FirefoxOS (& FOSS Sim, Marketplace, OMTC, webapps)
-Created Firefox for Metro
-IonMonkey

These are some major achievements (and all more significant than win64).

However, I would not be surprised if there was renewed energy for win64 in the not too distant future, once things settle down a little bit.

I just came across the same Add-On, from the ghacks article linked below. It is working on Aurora at the moment. I just booted a new Add-On that I had installed seconds before installing about:addon-memory. No waiting around for days or weeks to determine why my memory usage has increased! The Add-On added 10 MB of memory just to display a status bar set of shortcut links (with icons) to all about: pages. Handy idea but not at that sort of memory cost.

Great news that this is being built into Firefox! Such a tool has been a long time coming but is better late than never!

Short question: is a 66.79% heap-committed-unused-ratio normal after a longish session, having closed most tabs?

I usually also see big disparities where the resident/private memory (which is roughly what the OS reports) is a lot higher than the explicit count (and the vsize is off the charts):
0.00 MB ── canvas-2d-pixel-bytes
254.11 MB ── explicit
0.06 MB ── gfx-d2d-surfacecache
4.73 MB ── gfx-d2d-surfacevram
7.42 MB ── gfx-d2d-vram-drawtarget
0.81 MB ── gfx-d2d-vram-sourcesurface
0.97 MB ── gfx-surface-image
0 ── ghost-windows
152.77 MB ── heap-allocated
254.88 MB ── heap-committed
102.07 MB ── heap-committed-unused
66.79% ── heap-committed-unused-ratio
2.54 MB ── heap-dirty
427.19 MB ── heap-unused
0.03 MB ── images-content-used-uncompressed
95.00 MB ── js-gc-heap
0 ── low-commit-space-events
471.57 MB ── private
486.89 MB ── resident
0.00 MB ── shmem-allocated
0.00 MB ── shmem-mapped
15.11 MB ── storage-sqlite
1,200.66 MB ── vsize

Not a huge problem in practice but it seems a bit off.

66.79% isn’t unusual for that scenario.

And yeah, resident typically outgrows explicit after closing lots of tabs. It’s largely due to fragmentation. Also, there are some never-shrinking structures, which can be bad. I’ve been looking at some profiles of this case recently, and https://bugzilla.mozilla.org/show_bug.cgi?id=847210 was one example of a never-shrinking structure that I fixed.
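
For what it’s worth, judging from the figures quoted above, the ratio appears to be unused committed heap expressed relative to heap-allocated rather than to heap-committed. That definition is inferred from the numbers, not quoted from the documentation, so treat this as a sketch of the arithmetic:

    # Inferred from the about:memory figures quoted in the comment above.
    heap_allocated = 152.77          # MiB
    heap_committed_unused = 102.07   # MiB

    ratio = heap_committed_unused / heap_allocated
    print(f"{ratio:.2%}")            # ~66.81%; the reported 66.79% differs only by rounding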

Sorry, I didn’t see that; forgive me for being slow. That’s great news though. Are you hoping that the functionality will land on the Nightly channel before April 2nd?

As many people said before, the work you are doing is very much appreciated!

I’m asking out of curiosity: when looking at areweslimyet.com, I get the impression that memory consumption has been slowly climbing since about one year ago. Why is this? Is it because of new features added to the browser? Is it because many of the recent MemShrink fixes are not measured by the benchmark on AWSY?
