Since the release of Firefox 4 we’ve been working again to bring multi-process content support to Firefox. I thought it would be good to write a post laying out some of the reasons why we’re doing this. Even though they might be obvious to many people, it’s good to actually put them on paper so that we have a clear understanding of why we’re doing something. It helps us determine what to prioritize, and it helps us measure when we’re ready to ship.
There are several areas listed below – performance, multi-core support, memory behaviour, etc. For each of these areas there’s still a lot of work to do outside the scope of the multi-process work. What this means is that every release of Firefox will get faster, more stable and more responsive with many pages open even if it doesn’t include support for multi-process. But we know that to get over some hurdles we’re going to need to invest in a multi-process model. That’s what this post is about. Multi-process is not a panacea for any of these topics, but it does give us a leg up on some of the more systemic problems.
Performance
In the case of Electrolysis we’re not talking about the kind of performance that’s usually referenced in the press or measured by benchmarks. What we’re really talking about with multi-process performance is responsiveness:
- How long does it take for a mouse click to be recognized?
- When you resize the window does it feel smooth?
- Does the browser mysteriously pause from time to time?
- Are animations smooth, without pauses?
These are all examples of the measurements that matter when building a responsive browser. At a basic level we’re talking about making sure that the main UI of the browser is never away from the main loop for more than fifty milliseconds. We’ve made great strides here, and Firefox 5 is a great browser from a responsiveness standpoint. But we know that if we want to separate chrome and content concerns, we’re going to have to go multi-process.
This is due to two reasons:
- The cost of garbage collection goes up as the heap size of your process goes up.
- Garbage collection in content causes pauses in the main UI.
Sometimes content gets large. Big web applications like Gmail, Facebook and Twitter (yes, Twitter is actually a pretty big web app) allocate a lot of memory and trigger garbage collection often. When they do, for the reasons stated above, they still block the chrome. Compartments mitigate much of the pain here, but even short pauses add up, and the user can feel them. We’d like to make sure that garbage collection for pages doesn’t affect the main UI at all.
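To make the first of those reasons concrete, here’s a toy sketch (emphatically not SpiderMonkey’s collector, just a C++ illustration of a stop-the-world mark phase). Because marking has to visit every live object, the pause grows roughly linearly with the size of the heap:

```cpp
// Not real GC code: a sketch of why stop-the-world marking scales
// with heap size. The collector must visit every live object.
#include <chrono>
#include <cstdio>
#include <vector>

struct Object {
    bool marked = false;
    int payload[8] = {0};  // stand-in for real object data
};

int main() {
    for (size_t n : {1u << 18, 1u << 20, 1u << 22}) {
        std::vector<Object> heap(n);  // pretend this is the live heap

        auto start = std::chrono::steady_clock::now();
        size_t visited = 0;
        for (auto& obj : heap) {      // the "mark" phase touches everything
            obj.marked = true;
            ++visited;
        }
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count();

        std::printf("%zu objects marked in %lld ms\n", visited, (long long)ms);
    }
}
```

In a single-process browser that whole pause lands on the same thread that services the UI; in a multi-process browser it lands in a content process and the chrome keeps running.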
You can start to see a preview of tools for measuring responsiveness in one of Ted’s posts. Our investment in tools is happening alongside the multi-process work so that we can measure whether we’re making progress on overall browser responsiveness. Those tools still need to be “productized”, but since responsiveness is our primary metric and the purpose of multi-process Firefox, we need to measure so we know we’re actually making forward progress.
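The heart of such a tool can be small. Here’s a hedged sketch of the underlying idea (mine, not the code from Ted’s post): time every task run by a toy event loop and flag anything that keeps the UI away from the loop longer than the fifty-millisecond budget mentioned above.

```cpp
// A toy event loop that reports any task blocking it for > 50 ms.
// An illustration of the measurement idea, not Gecko code.
#include <chrono>
#include <cstdio>
#include <functional>
#include <queue>

using Task = std::function<void()>;
using Clock = std::chrono::steady_clock;

constexpr auto kBudget = std::chrono::milliseconds(50);

void RunLoop(std::queue<Task>& tasks) {
    while (!tasks.empty()) {
        Task task = std::move(tasks.front());
        tasks.pop();

        auto start = Clock::now();
        task();  // chrome or content work runs here
        auto elapsed = Clock::now() - start;

        if (elapsed > kBudget) {
            std::printf("UI unresponsive for %lld ms\n",
                        (long long)std::chrono::duration_cast<
                            std::chrono::milliseconds>(elapsed).count());
        }
    }
}

int main() {
    std::queue<Task> tasks;
    tasks.push([] { /* quick task: well under budget */ });
    tasks.push([] {
        // Simulate a long content-side pause, e.g. a big GC.
        auto until = Clock::now() + std::chrono::milliseconds(120);
        while (Clock::now() < until) { /* spin */ }
    });
    RunLoop(tasks);  // prints one report, for the slow task
}
```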
Support for multi-core machines
Here’s something many people don’t realize: Firefox is still largely single-threaded. That doesn’t mean we don’t use threads throughout the browser. The networking stack, image decoding, much of our I/O, video and audio decoding and all kinds of other things are threaded and off the main loop of the browser. But the content itself is required to be single-threaded.
Computing is quickly moving to a multi-core model. Processor clock speeds aren’t increasing the way they used to, largely due to the constraints imposed by power and heat as well as the move to mobile. At this point basically everyone has a multi-core processor in their desktop or laptop, and multi-core processors are starting to show up in mobile devices as well.
So one of the easiest ways to take advantage of multiple cores is to give each page’s DOM its own schedulable unit of execution, and the easiest way to do that is to have a few content processes that the operating system can assign to their own CPUs.
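As a minimal, hedged sketch of that idea (plain POSIX, nothing like Gecko’s actual process and IPC machinery, and with made-up page names), the parent “chrome” process below forks one “content” process per page and lets the kernel’s scheduler spread them across cores:

```cpp
// One process per page: the OS scheduler, not the browser, decides
// which core each content process runs on.
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    const char* pages[] = {"mail.example", "social.example", "news.example"};

    for (const char* page : pages) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: this "content process" can run script, layout and
            // GC for its page on whichever core the kernel picks,
            // without ever blocking the parent "chrome" process.
            std::printf("content process %d handling %s\n",
                        (int)getpid(), page);
            _exit(0);
        }
    }

    while (wait(nullptr) > 0) {}  // chrome reaps its content processes
    return 0;
}
```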
We’re also investing in longer-horizon projects to examine what a multi-threaded DOM and layout engine might look like, but those are far enough away, and risky enough, that we know there will be lots of value in a multi-process browser for quite a while.
Predictable memory behaviour
Although we’ve made vast improvements to memory handling since the release of Firefox 4, we’re still faced with the fundamental problem of memory fragmentation. Because we’re built on C and C++, the objects in our graph are often not relocatable, so free space gets trapped in holes between long-lived allocations. Over the long term, heap allocation grows and memory appears to “leak.” This isn’t a problem specific to Firefox. Just about every long-running process with even mildly complex allocation patterns suffers from it.
You can see this in the difference between what the system’s memory-reporting tools show and what the internal allocator reports as allocated. Some of that “missing memory” is held in reserve, but often enough it’s holes left behind by fragmentation. We do make some larger allocations in anonymous memory maps, but most small allocations still happen in pools carved out of the heap.
Physical pages of memory are allocated at the operating system layer and handed to user processes as virtual pages. The most reliable way to return those pages to the operating system is to exit the process. That’s a pretty coarse granularity for recycling memory, but for very long-running browser sessions it’s the only way to get predictable memory behaviour. This is why content processes offer a better model for returning memory to the operating system over time.
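Here’s a hedged demonstration of the problem (exact behaviour depends on your allocator and operating system). After freeing every other small block, half the heap is nominally free, yet almost no whole page can be returned, because each page still holds live blocks:

```cpp
// Fragmentation sketch: half the heap is freed, but the survivors pin
// nearly every page, so little memory goes back to the OS.
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    constexpr size_t kCount = 1 << 20;        // ~64 MB of 64-byte blocks
    std::vector<void*> blocks(kCount);
    for (auto& b : blocks) b = std::malloc(64);

    for (size_t i = 0; i < kCount; i += 2) {  // free every other block
        std::free(blocks[i]);
        blocks[i] = nullptr;
    }

    // ~32 MB is now "free", yet resident memory typically stays near
    // its peak. When a short-lived content process exits instead, the
    // OS reclaims every one of its pages unconditionally.
    std::puts("inspect RSS now (e.g. top, or /proc/self/status on Linux)");
    std::getchar();                           // pause so you can look

    for (void* b : blocks) std::free(b);      // free(nullptr) is a no-op
    return 0;
}
```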
Crash protection
We introduced protection from crashes in plugins with the release of Firefox 3.6.4. We implemented it because of the reliability problems that plugins – Flash in particular – were suffering from. Crashes in Flash were hurting overall browser stability and reflecting poorly on Firefox’s perceived reliability.
Although the number of crashes caused by content is relatively small – on the order of 1-2 crashes per 100 users per day – crashes that can be contained to the content processes are easier to identify, easier to diagnose and don’t take down the entire browser.
There’s another nice benefit to having content processes: when there’s a crash, it’s much easier to tell which site caused it. In a single-process model you can guess based on all of the sites a person has open, but it could be any of them, and you have to look at a large sample of data and correlate sites to crash signatures to see patterns. With a process hosting a single tab (or a small group of tabs), the set of candidate sites is small enough that the offending one can be identified much more easily.
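As a hedged sketch of why attribution gets easier (plain POSIX again, with a made-up siteForProcess table rather than anything from our actual crash-reporting pipeline): the parent knows exactly which site each content process is hosting, so when one dies it can name the culprit directly.

```cpp
// Chrome tracks which site each content process hosts, so a crash maps
// straight back to a site, and chrome itself survives the crash.
#include <csignal>
#include <cstdio>
#include <map>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    std::map<pid_t, std::string> siteForProcess;  // hypothetical bookkeeping

    pid_t pid = fork();
    if (pid == 0) {
        raise(SIGSEGV);  // simulate a crash in the content process
        _exit(0);        // never reached
    }
    siteForProcess[pid] = "https://example.com/";

    int status = 0;
    pid_t dead = waitpid(pid, &status, 0);
    if (dead > 0 && WIFSIGNALED(status)) {
        std::printf("content process for %s crashed (signal %d); "
                    "chrome keeps running\n",
                    siteForProcess[dead].c_str(), WTERMSIG(status));
    }
    return 0;
}
```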
Sandboxing for security
The last goal we have for adding support for multiple content processes to Firefox is security. Some operating systems now have the ability to put a process into a “low rights mode” where it can’t access most system resources. This means that even if there is a security problem in a content process, the amount of damage that process can do is limited to what the sandbox allows.
This system is imperfect, of course. The ability to talk to the more privileged chrome process can still be turned into an exploit with raised permissions, and it doesn’t protect one web site from another malicious web site. But it is a positive step forward, and is well worth the investment.
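To make “low rights mode” concrete, here’s a hedged, Linux-only sketch using seccomp’s strict mode (a real content-process sandbox would be far more elaborate). Strict mode limits the process to read, write, _exit and sigreturn, so even a fully compromised process can do little beyond talking over the file descriptors it already holds:

```cpp
// Linux seccomp strict mode: after the prctl() call, only read, write,
// _exit and sigreturn are allowed; anything else kills the process.
#include <cstdio>
#include <fcntl.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <unistd.h>

int main() {
    std::puts("entering sandbox");
    std::fflush(stdout);

    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
        std::perror("prctl");
        return 1;
    }

    // Allowed: writing to descriptors we already own. In a real
    // browser, this would be the IPC channel back to the chrome process.
    const char msg[] = "still able to talk over existing descriptors\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);

    // Not allowed: opening files. The kernel kills the process here,
    // which is exactly the containment the sandbox is for.
    open("/etc/passwd", O_RDONLY);
    return 0;  // never reached
}
```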