sccache, Mozilla’s distributed compiler cache, now written in Rust
November 21st, 2016
We build a lot of code at Mozilla. Every time someone pushes changes to the code that makes up Firefox we build the application on multiple platforms in a variety of build configurations. This means that we’re constantly looking for ways to make the build faster: both to get results from our builds and tests sooner and to use less machine time, so that we can use fewer machines for builds and save money.
A few years ago my colleague Mike Hommey did some work to see if we could deploy a shared compiler cache. We had been using ccache for many of our builds, but since we use ephemeral build machines in AWS and have a large pool of them, it doesn’t help as much as it does on a developer’s local machine. If you’re interested in the details, I’d recommend you go read his series of blog posts: Shared compilation cache experiment, Shared compilation cache experiment, part 2, Testing shared cache on try, and Analyzing shared cache on try. The short version is that the project (which he named sccache) was extremely successful and improved our build times in automation quite a bit. Another nice win was that he added support for Microsoft Visual C++, which ccache does not support, so we were finally able to use a compiler cache on our Windows builds.
This year we started a concerted effort to drive build times down even more, and we’ve made some great headway. Some of the ideas for improvement we came up with would involve changes to sccache. I started looking at making changes to the existing Python sccache codebase and got a bit frustrated. This is not to say that Mike wrote bad code; he does fantastic work! But by the nature of its design sccache does a lot of concurrent work, and Python just does not excel at that kind of workload. When I talked with Mike he mentioned that he had originally planned to write sccache in Rust, but at the time Rust had not yet had its 1.0 release and the ecosystem just wasn’t ready for the work he needed to do. I had spent several months learning Rust after attending an “introduction to Rust” training session, and I thought it’d be a good time to revisit that choice. (I went back and looked at some meeting notes, and in late April I wrote the bullet point “Got distracted and started rewriting sccache in Rust”.)
As with all good software rewrites, reality turned it into a much longer project than anticipated. (In fairness to myself, I did set it aside for a few months to spend time on another project.) After seven months of part-time work the project has finally reached the point where I’m ready to put it into production, replacing the existing Python tool. I did a series of builds on our Try server to compare the performance of the existing sccache and the new version, mostly to make sure that I wasn’t going to cause regressions in build time. I was pleasantly surprised to find that the Rust version gave us a noticeable improvement in build times! I hadn’t done any explicit optimization work, but some of the improvement is likely due to process startup overhead being much lower for a Rust binary than for a Python script. It lowered the time we spent running our configure script by about 40% on our Linux builds and 20% on our OS X builds, which makes sense: configure invokes the compiler quite a few times, and when ccache or sccache is enabled every one of those invocations goes through the tool.
My next steps are to tackle the improvements that were initially discussed. One is making sccache usable for local developers; since Windows developers can’t currently use ccache, this should help quite a bit there. We also want to make it possible for developers to use sccache and get cache hits from the builds that our automation has already done. I’d also like to spend some time polishing the tool a bit so that it’s usable to a wider audience outside of Mozilla. It solves real problems that I’m sure other organizations face as well, and it’d be great for others to benefit from our work. Plus, it’s pretty nice to have an excuse to work in Rust. 🙂 You can find the code for the rewritten sccache on GitHub.
Overall I’ve really enjoyed the experience of working in Rust on this project. Compared to working on the Python version of the tool, it was nice to have static typing catch my mistakes at compile time. I’ve really grown to love Rust as a language; I miss things like the match expression when I’m working in other languages now! There were certainly some growing pains: I hit a few cases where the crates.io ecosystem just didn’t have something I expected, or the Rust standard library was missing a feature I needed, but those were not common occurrences. I would definitely reach for Rust again for a project like this!
Gamepad API Shipping in Firefox 29
April 29th, 2014
Firefox 29 ships today, and with it ships an implementation of the Gamepad API. This is the culmination of a few years’ worth of work for me so it’s very exciting to see it ship!
If you missed it, I wrote an article on the Gamepad API over at hacks.mozilla.org. It has information on how to use it as a developer as well as some code samples. Go check it out if you’re interested in using this API!
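If you just want a quick taste of what the API looks like before reading that article, here’s a minimal sketch (written as TypeScript for clarity; gamepadconnected, navigator.getGamepads(), buttons, and axes are standard Gamepad API names, but the poll loop itself is just for illustration):
// Fired when a gamepad is connected (browsers may defer this
// until a button is pressed, for privacy reasons).
window.addEventListener("gamepadconnected", (e: GamepadEvent) => {
  console.log(`Gamepad connected: ${e.gamepad.id}`);
});
// Gamepad state is polled rather than delivered via events,
// so sample it on each animation frame.
function poll(): void {
  for (const pad of navigator.getGamepads()) {
    if (!pad) continue; // Empty slots are null.
    if (pad.buttons[0].pressed) {
      console.log(`${pad.id}: button 0 is down`);
    }
    // Analog stick positions are in pad.axes, each in the range [-1, 1].
  }
  requestAnimationFrame(poll);
}
requestAnimationFrame(poll);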
If you’re not a developer, but you have a gamepad lying around, I have a couple of games you can play:
- Combat is an homage to Atari 2600 Combat that I wrote. It gets pretty fun if you have multiple gamepads lying around and you can play against your friends. (There are also keyboard controls if you don’t have a gamepad, and a computer-controlled tank to play against if you don’t have friends.)
- Boxes Wot Shoot is a really slick game written by Scott Graham, the co-editor of the Gamepad API spec. It requires a gamepad with dual analog sticks to play, but it’s super fun.
These games work great in the release versions of both Firefox and Chrome, which is pretty awesome to see. Even more awesome is that Microsoft seems to be working on an implementation, listing the Gamepad API as “in progress” on their IE progress tracker. Perhaps you’ll be able to play these web games on your Xbox One in the future. 🙂
A quick note on gamepad support: it varies from platform to platform. Most USB gamepads will work on any platform. The notable counter-examples are the DualShock 3 (the PlayStation 3 controller) and the Xbox One controller. You can get third-party drivers to make them work, but it’s a little shaky. Your best bet is likely to be the Xbox 360 controller (unless you’re on a Mac), the DualShock 4 (PlayStation 4 controller), or a cheap off-brand USB controller (they can be had for $15 in many places).
Happy gaming!
Gregory Szorc is now the Build Config module owner
March 7th, 2013
Effective immediately, Gregory Szorc (gps) is now the Build Config module owner. Greg joins a storied list of module owners who touched the build system too much and got stuck owning it until they could find someone else to pawn it off on.
Greg has been leading the way technically in build system work for a while now, so in my mind this is more of a formality than anything else. You have undoubtedly already interacted with the great work he’s been doing: mach, the fantastic new front-end driver for the build system, or moz.build, our Makefile replacement for build system data. I expect that this will continue, so making him module owner was an easy decision for me.
I will continue to serve as a peer of the Build Config module, but Greg will now have final say on all questions regarding build system work.
Prettier Mercurial output
August 20th, 2012
I don’t use git on a daily basis, but I’m a fan of how its output is both colorful and run through a pager (like less) when necessary. As it turns out, Mercurial (which I do use daily) ships with all the functionality you need to replicate this behavior; it’s just not enabled by default. You need to enable the color and pager extensions (which ship with Mercurial), so just a few lines in your ~/.hgrc will get you there:
[extensions]
color =
pager =
[pager]
pager = LESS=FRSXQ less
quiet = True
The pager configuration makes less handle the colored output properly and behave like cat if the output fits on the screen: the LESS flags tell less to quit immediately if the output fits on one screen (F), pass raw color escape sequences through (R), chop long lines instead of wrapping them (S), skip clearing the screen on exit (X), and suppress the terminal bell (Q).
I frequently find myself using hg export to view the contents of patches in my mq, so I also added an alias that I could use instead to get nice colored and pagered diffs:
[alias]
show = log --patch --verbose --rev
[pager]
attend = diff,status,log,qdiff,blame,annotate,pdiff,glog,show
The attend setting is necessary because by default the pager only runs for a whitelist of commands, so you need to add the new show alias to that list. Then you can simply run hg show tip to view the contents of your topmost mq patch.
I’ve only tested this configuration on Ubuntu 12.04 with Mercurial 2.0.2, so I’d be interested to hear if it works elsewhere.
Baby #2
February 29th, 2012
Just a quick note for those of you who haven’t already seen it on Twitter or Facebook. On Thursday, February 23rd at 2:01 AM my wife and I welcomed our second child into the world. He’s a healthy baby boy: Michael Thomas Mielczarek. We currently have our hands full sorting out how to raise both a toddler and a newborn, so I’ll be mostly offline for another week still.
Firefox Mobile on ARMv6 processors
February 17th, 2012
Most smartphones use ARM processors. Much like how most PCs use x86 processors, for various reasons ARM has become the CPU of choice for mobile devices. Similar to x86, there are different versions of ARM processors that support different features. One of the biggest differences is which instruction set is supported. Instructions are the smallest units of what a processor can do, and an instruction set is the particular set of instructions that a processor knows how to run. For Intel, the instruction set changed when they went from the 386 to the 486 to the Pentium and so on. For ARM, the instruction sets are numbered, with the most current one in use being ARMv7 (and ARMv8 in development). Confusingly, ARM’s processors themselves have similar naming: the ARM11 is the generation that supports the ARMv6 instruction set, and ARM Cortex is the generation that supports the ARMv7 instruction set. All high-end smartphones currently shipping use processors that support the ARMv7 instruction set. The Apple iPhone 4S, the Samsung Galaxy S2 and Galaxy Nexus, and others all come with similar processors. They’re all similarly fast as smartphone processors go, and ARMv7 contains lots of features that allow programs to run very quickly.
How is this relevant to Firefox Mobile? Currently the builds we’re producing only run on processors that support ARMv7. This is partially because we’ve been working on performance for quite a while, and it’s much harder to get acceptable performance on a slower processor, so targeting only faster processors makes sense. (This is the same reason that Chrome for Android only runs on Android phones running the latest version of Android.) It’s also partially because all modern JavaScript engines ship with a JIT, a highly specialized piece of code that needs to know intimate details about the type of processor it’s running on. We used to produce ARMv6 builds alongside our ARMv7 builds, but we saw lots of ARMv6-specific crashes in our crash reporting system, and we didn’t have the resources to tackle them all. Additionally, we were focused on making Firefox Mobile run well on ARMv7 processors, so making it run well on ARMv6 seemed like a stretch at the time.
Coming back to the present, we’ve got a revitalized mobile team working on a revamped Firefox Mobile that’s much faster than previous versions, so the performance target seems much more within reach. We also had people attending MozCamps and other Mozilla events across the globe last year. Dietrich visited Nairobi for some Mozilla Kenya events and found that the most widely used Android phones in Kenya are all ARMv6 devices. In addition, there are lots of Android phones being sold in China that are ARMv6. Even in the USA there are some low-end Android devices being released that are still ARMv6, like the LG Optimus Hub, which shipped in October of 2011. As of that date roughly 58% of the Android install base consisted of ARMv6 phones. That’s a huge segment of the market that we’re not supporting.
Because of this, during the Firefox Mobile revamp Doug roped me in and asked if I would look into getting our ARMv6 builds back up and running. I started working on it figuring it wouldn’t be too bad, since we used to produce these builds. As it turns out, I was wrong: we had managed to break things in quite a few ways since we disabled them. A few of the breakages were simple fixes in our build configuration (although one of those took Mike Hommey and me a solid week of debugging to track down), but I also ran into a few problems with our custom linker. Firefox Mobile ships with a replacement for the system dynamic linker on Android. It’s pretty complicated, but it’s the reason that Firefox only takes up about 15MB, whereas Chrome for Android takes up nearly 50MB after installation. Being a complicated piece of code, it contained some hard-to-diagnose bugs. Thankfully, with some input from Jacob Bramley from ARM we were able to track down the remaining problem and get builds working again.
With all the setbacks and other issues it’s not unreasonable to ask why we’re doing this. Clearly this isn’t the end of the process by any means. We still have to get automated builds back up and running on our build farm. We will undoubtedly have to shake out more ARMv6-specific bugs in our JavaScript engine and elsewhere. We’ll almost assuredly have to do some work to make performance acceptable. It’s a lot of work and it will take time, but this seems like the right thing to do given the number of users we can reach. You can follow along in Bugzilla if you’re interested in this work.
Blogging catch-up
February 9th, 2012
Apparently I haven’t written anything in this blog for over seven months! In my defense, I’ve been really busy both at Mozilla and in my personal life. I have quite a few things that I’ve been working on, so I’m going to try to catch up and post about a few of them in the next couple of days. Expect to see posts about the Gamepad API, ARMv6 support for Firefox Mobile, and the build hackery that I did for our upcoming WebRTC support.
xpcshell manifests, phase 2
June 29th, 2011
Recently we implemented a manifest format for xpcshell unit tests (with Joel Maher doing the lion’s share of the work). After that work landed, we realized there were some things missing from our initial design, so we set out to revamp a few things to make it easier to write useful manifests.
We decided to make the manifests support boolean expressions, similar to what reftest manifests allow, except with a restricted grammar and not “all of JavaScript”. To make this useful we had to offer a set of values to test against, so Jeff Hammel buckled down and wrote mozinfo, a Python module that had been under discussion for a long time. I wrote a few bits to hook this all up between the build system and the xpcshell test harness, and it all landed on mozilla-central this morning.
I’ll update the MDN documentation later today, but for a preview, a sample manifest entry might look like:
[test_foo.js]
skip-if = os == 'win' || os == 'linux'
If you look in your object directory in a build containing these patches (or in the xpcshell directory of a test package from a tinderbox build), you’ll find a mozinfo.json, which is where most of the values you can use in these expressions come from. For example, the mozinfo.json for my 64-bit Linux build looks like:
{"os": "linux", "toolkit": "gtk2", "crashreporter": false, "debug": false, "bits": 64, "processor": "x86_64"}
You can currently annotate tests with skip-if, run-if, and fail-if conditions. skip-if indicates that a test should not be run if the condition evaluates to true, run-if indicates that a test should only be run if the condition evaluates to true, and fail-if indicates that the test is known to fail if the condition is true. Tests marked fail-if will produce TEST-KNOWN-FAIL output if they fail, and TEST-UNEXPECTED-PASS (which is treated as a failure) if they pass.
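For example, a hypothetical entry (the test name and conditions here are made up purely to illustrate the syntax) could run a test only on Windows and mark it as known to fail in debug builds:
[test_bar.js]
run-if = os == 'win'
fail-if = debug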
Hopefully this work will enable developers to more easily work with xpcshell tests. We’d appreciate any feedback you have on these changes!
Measuring UI Responsiveness
June 27th, 2011
One of the goals for the Firefox team is to ensure that the user interface remains responsive to input at all times. Clearly a responsive interface is incredibly important to making the browser a useful application, but how do we measure “responsiveness”?
Dietrich has done some work on this, writing an add-on that measures the time that various UI actions take. This covers the direct case, where a user initiates an action and expects a response in a reasonable amount of time. Clearly we want to make sure that individual actions don’t take an extraordinary amount of time.
I took the opposite tack, with an eye on being able to detect when the application was not responsive to user input regardless of what actions the user was taking. Building on some work by Chris Jones and Alon Zakai, I wrote some code that instruments the main thread event loop to find out how long it takes to respond to events, which ought to be a reasonable proxy for measuring responsiveness. When the instrumentation detects that the event loop takes too long to respond (more than 50 milliseconds, currently) it writes a data point to a log giving the current timestamp and the amount of time the event loop was not responsive.
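The core idea is simple enough to sketch in a few lines: schedule a timer at a fixed interval and record how late it actually fires. Here’s an illustrative TypeScript version of the technique (not the actual Gecko instrumentation; the interval and threshold values are just for demonstration):
// Post a recurring timer and measure how late it runs;
// lateness beyond a threshold means the event loop was busy.
const INTERVAL_MS = 10;   // how often we expect the timer to fire
const THRESHOLD_MS = 50;  // lag beyond this counts as "unresponsive"

let expected = Date.now() + INTERVAL_MS;
setInterval(() => {
  const now = Date.now();
  const lag = now - expected;
  if (lag > THRESHOLD_MS) {
    // Something kept the event loop from servicing timers on schedule.
    console.log(`${new Date(now).toISOString()}: unresponsive for ~${lag}ms`);
  }
  expected = now + INTERVAL_MS;
}, INTERVAL_MS);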
When I implemented this I had my eye on Talos integration, where we could run the browser through some automated UI tests with this instrumentation enabled, then correlate “UI actions” with “unresponsive periods” and ensure that the browser did not become unresponsive during those actions. Talos integration has since been deferred to a longer-term goal; the more immediate goal is to find the UI actions that are the worst offenders for unresponsiveness. To that end we’ve filed some other bugs about correlating this unresponsiveness data with JavaScript execution and with C++ execution. If you’ve got any ideas please feel free to contribute to those bugs!
If you’d like to try out the responsiveness instrumentation I implemented, it landed on mozilla-central a while ago, and there’s some reasonably complete documentation in the source code. There are implementations for Windows, Linux/GTK2 and OS X currently. (And a patch for an Android implementation in a bug.)
Why it’s hard to ship non-crashy software
June 14th, 2011
I was just looking at some data produced from our crash reporting system, and I continue to be amazed at the amount of third-party code that gets loaded into Firefox on Windows. That data file contains a list of all unique binary files (EXE or DLL) that were listed in Windows crash reports in a single day. A quick look at it shows:
$ cut -f1 -d, 20110613-modulelist.txt | sort -u | wc -l
10385
There are over 10,000 unique filenames in a single day’s worth of crash reports. That sure seems like a lot! Now, certainly, a lot of these modules look like they’ve been randomly named (0eYZf0QFDSGEAbTRWD3F.dll, for example), which probably indicates that they’re some kind of virus, so those are likely to inflate the number. There’s a bug on file asking that we collect MD5 hashes of every DLL in our crash reports so that we could more easily detect malware/virus DLLs that use these tactics, as well as integrate with lists of known malware and viruses from antivirus vendors.
In the past, we have had problems with plugins and extensions causing crashes for many Firefox users. We have ways of mitigating those through blacklisting. We can also blacklist specific DLLs from loading in the Firefox process, which is not used as often because it’s harder to get right and provides little feedback to users about what’s been disabled. However, given the sheer number of possible things that can be loaded in our process, it’s unlikely that we’ll ever be able to block all software that causes crashes for users. This is unfortunate, because any one of these pieces of software can cause a crash in Firefox, and all the user sees is “Firefox crashed”. I suppose we now know how Microsoft feels when users blame Windows for crashes caused by faulty drivers.