about:memory B2G Memory consumption MemShrink

Nuwa has landed

A big milestone for Firefox OS was reached this week: after several bounces spread over several weeks, Nuwa finally landed and stuck.

Nuwa is a special Firefox OS process from which all other app processes are forked. (The name “Nuwa” comes from the Chinese creation goddess.) It allows lots of unchanging data (such as low-level Gecko things like XPCOM structures) to be shared among app processes, thanks to Linux’s copy-on-write forking semantics. This greatly increases the number of app processes that can be run concurrently, which is why it was the #3 item on the MemShrink “big ticket items” list.

One downside of this increased sharing is that it renders about:memory’s measurements less accurate than before, because about:memory does not know about the sharing, and so will over-report shared memory. Unfortunately, this is very difficult to fix, because about:memory’s reports are generated entirely within Firefox, whereas the sharing information is only available at the OS level. Something to be aware of.

Thanks to Cervantes Yu (Nuwa’s primary author), along with those who helped, including Thinker Li, Fabrice Desré, and Kyle Huey.

19 replies on “Nuwa has landed”

I was wondering, would it be possible to do an about:memory report for the Nuwa process and then simply subtract that memory from the other processes? It’s a bit of a hack, but..

Alas, no. When a new process is first forked from the Nuwa process, this would work. However, the new process will then modify some of the pages that it shares with the Nuwa process, and those pages will be duplicated. So you end up with a mix of shared and duplicated pages, and tracking these is difficult.
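The copy-on-write behaviour described above can be sketched with a short Python script. This is an illustration only — Nuwa does this in native code via `fork(2)` — but the kernel mechanics are the same: the child shares the parent’s pages until it writes to one, at which point the kernel duplicates just that page for the child.

```python
import os

# A buffer the child initially shares with the parent via copy-on-write.
data = bytearray(b"x" * 4096)

pid = os.fork()
if pid == 0:
    # Child: this write triggers copy-on-write, so the kernel duplicates
    # the affected page for the child; the parent's copy is untouched.
    data[:5] = b"child"
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # Parent still sees the original contents of the shared page.
    print(bytes(data[:5]))  # b'xxxxx'
```

After the child’s write, the parent and child hold a mix of shared and private pages, which is exactly why subtracting the Nuwa process’s report from each app’s report doesn’t work.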

Piggybacking on this post since the old one is locked for comments. When you announced that exact rooting had landed a few weeks ago, it was described as a base that GGC could be built on. However, a GGC performance line has been shown on the benchmark graphs since June of 2013. Do you know if they’re jumping the gun with the graph label, or is this another case of the standalone JS engine being well ahead of the browser as a whole?

Work on the GGC part has actually been going on for a while. (For quite some time, the exact rooting was close enough to being finished that this was possible.) You can build a trunk Firefox with --enable-gcgenerational if you want to try it.

GGC sort of works, it just isn’t polished enough to be turned on by default, due to various cases where it crashes etc.

If I understand correctly, the preallocated process reduces startup time by spinning up a new Gecko process before it is needed, but that new process is created like any other, and so it isn’t sharing any memory with any other Gecko processes.

There was some weird interaction between the two, and maybe Nuwa makes spawning a new Gecko process fast enough that the preallocated process isn’t needed any more, but I don’t know how that actually ended up.

There was a single preallocated process at any time. When a new app was launched, that preallocated process was converted into the process for the app, and a new preallocated process was created. This helped reduce start-up latency, but had no effect on memory consumption.

In contrast, there is a single, eternal Nuwa process, and every launched app is forked from it. This saves memory due to the sharing. I think it also helps with start-up speed in a similar fashion to the preallocated process, because the Nuwa process has already done some of the initialization.

Actually, we still have the preallocated process to reduce app startup time, but it is now forked from the Nuwa process. There are still a few refinements to come to minimize copy-on-write overhead, but we are in pretty good shape now.

From the description, it sounds like the only chance of that even being possible would be on Linux and only after multiprocess lands.

Can it be switched on and off, so it could be switched off for people doing memory work? (Presumably a leaky app leaks just as much whether it’s created by Nuwa or the old way.) Or would such a switch increase complexity and code to maintain?

You probably want to report the sum of Pss in ‘/proc/&lt;pid&gt;/smaps’. For each region, that number indicates the size of the region divided by the number of processes that have it mapped.

This might be a bit quirky, since it might rise and fall for a given process without that process doing anything, but you can at least give a good estimate of how memory is distributed between processes.

You could also sum all regions whose ‘Rss’ matches their ‘Pss’ as a way to discover if a region is shared or not.
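The suggestion above can be sketched in a few lines of Python. The `Pss:` and `Rss:` field names are the ones the real Linux `/proc/<pid>/smaps` uses, but the parsing here is deliberately simplified and the sample input is invented for illustration:

```python
def total_pss_kb(smaps_text):
    """Sum the Pss: lines (in kB) from /proc/<pid>/smaps-style text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total

# Hypothetical smaps excerpt: one shared library mapping, one private heap.
sample = """\
7f0000000000-7f0000001000 r--p 00000000 08:01 1234 /lib/libexample.so
Rss:                  64 kB
Pss:                  16 kB
7f0000001000-7f0000002000 rw-p 00000000 00:00 0 [heap]
Rss:                  32 kB
Pss:                  32 kB
"""

print(total_pss_kb(sample))  # 48
# On a live Linux system: total_pss_kb(open("/proc/self/smaps").read())
```

In the sample, the library region’s Rss (64 kB) exceeds its Pss (16 kB), marking it as shared by four processes, while the heap’s Rss equals its Pss, marking it as private — the Rss/Pss comparison mentioned above.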

The hard part is combining the page-level data from smaps with the much-finer-grained data from memory reports. E.g. you’d have to read smaps, build up a data structure indicating which pages are shared and which aren’t, and then for every single heap block and mapping from mmap that gets reported, look up this structure to work out what fraction of it should be considered owned. And there are a *lot* of such blocks and mappings, and the reporting of them is spread all throughout the codebase. It’s possible, but would be a very large and invasive change, and would slow memory reporting down.
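A sketch of the bookkeeping that lookup would involve, using a hypothetical helper with 4 KiB pages and a precomputed set of shared page numbers (none of this exists in the codebase; it only illustrates the per-block fraction computation described above):

```python
PAGE_SIZE = 4096

def owned_bytes(block_start, block_size, shared_pages, n_sharers):
    """Attribute a heap block's bytes to one process: pages in
    `shared_pages` are split evenly among `n_sharers` processes;
    all other pages are fully owned by this process."""
    owned = 0.0
    first = block_start // PAGE_SIZE
    last = (block_start + block_size - 1) // PAGE_SIZE
    for page in range(first, last + 1):
        # Bytes of this block that fall within this page.
        lo = max(block_start, page * PAGE_SIZE)
        hi = min(block_start + block_size, (page + 1) * PAGE_SIZE)
        span = hi - lo
        owned += span / n_sharers if page in shared_pages else span
    return owned

# An 8192-byte block whose first page is shared by 4 processes:
print(owned_bytes(0, 8192, {0}, 4))  # 5120.0  (4096/4 + 4096)
```

Doing this lookup for every reported heap block and mapping is what would make the change so large and slow, since blocks routinely straddle page boundaries and the sharing status must be rechecked on every report.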
