Skip to content

compiler-enforced locked accesses


If you’ve done any amount of threaded programming, you’ve probably run across code that looked like:

  // Only accessed with the mutex held.
  uint32_t mFlags;
  bool mConnected;
  nsTArray<int32_t> mData;

  // Only called with the mutex held.
  void DoSomething();

Perhaps you’ve even gotten to debug code which inadvertently violated the locking requirements of the members.

Several months ago, I reviewed a patch by David Keeler that addressed the second half of the above example. Methods that had locking requirements looked like:

  void DoSomething(MutexAutoLock& aProofOfLock);

which ensures (at compile time!) that you can’t call the function without locking the mutex first. I thought this was a nice technique, said as much in my review, and have been looking for places to apply it ever since.

The explicitness and the requirement to constantly pass MutexAutoLock& variables around is a feature, not a bug. Doing so encourages you to limit the amount of code that needs to be executed with locks held, keeping the concurrent parts of the code and the synchronized parts of the code clearly delimited. In this respect, this technique is similar to the IO monad in Haskell. I don’t know whether the extra verbosity would make the code more difficult to read or not, especially if the techniques for guarding members suggested below were applied as well.

This coding style also came in handy a week or so ago investigating overly high CPU usage when playing YouTube videos. Our event queue for nsIRunnables did its own locking internally, and exposed its internal reentrant monitor “for power users”. This led to code like:

  ReentrantMonitorAutoEnter mon(mEvents.GetReentrantMonitor());

where the PutEvent call would do an extra round of locking (“just to be sure”), which was wasted work. Data structures like this doing their own locking internally typically isn’t a great idea, so part of the work in the above bug was to separate the locking requirements of the queue from who actually needs to do the locking. Or, in other words, we can have the class that owns the event queue do the locking, and have the event queue’s methods enforce the locking programmatically:

  MonitorAutoLock mon(mMonitor);
  mEvents.PutEvent(event, mon);

Now there’s no wasted work in PutEvent, because it already knows the appropriate locking has been done. The ability to use non-reentrant monitors—which are more efficient—was a nice bonus resulting from the separation of concerns here.

This technique can also help solve the first half of the problem we presented at the beginning of this post: ensuring members are only accessed with locks held.

template<typename T>
class Guarded;

class Guarded<uint32_t>
  Guarded() : mValue(0) {}

  uint32_t Value(MutexAutoLock& aProofOfLock)
    return mValue;

  void Assign(MutexAutoLock& aProofOfLock, uint32_t aNewValue)
    mValue = aNewValue;

  // Since accesses should only be done under the lock, and copying
  // and moving would therefore require locks, we require the user
  // to ensure those constraints are met with explicit calls to the
  // above rather than the compiler sneaking unlocked accesses in.
  Guarded(const Guarded&) = delete;

  uint32_t mValue;

The above class isn’t production quality code; it’s intended to sketch out how explicit locking requirements might work. A more robust version might require a Mutex& reference to be passed to the constructor, and member functions assert that the MutexAutoLock& parameters actually lock the specified mutex. Specializing for each of the integer types would also get tiresome, so we’d need to do a better job there. Handling types with methods could be done with something like the following, I think:

template<typename T>
class GuardedAggregate
  GuardedAggregate() : mValue() {}

  // The core idea here is that the user would write:
  // GuardedAggregrate<nsTArray> mArray;
  // and then accesses would be done via:
  // mArray.Value(lock).Method(...);
  // This means that we don't have to proxy every single one of
  // the aggregate's methods, but the locking requirements are
  // still explicit.
  class Proxy
    Proxy(MutexAutoLock& aProofOfLock, T* aValue) : mValue(aValue)

    T* operator->()
      return mValue;

    T* mValue;

  Proxy Value(MutexAutoLock& aProofOfLock)
    return Proxy(aProofOfLock, &mValue);

  T mValue;

This can also be though of as a compiler-independent, but less flexible version of clang’s Thread Safety Analysis. Folks have been asking about bringing the annotations that analysis requires into Gecko; I wonder if it might work just as well to apply this technique more liberally throughout the codebase.

standardizing things my way


I was reading The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age and ran across a passage that resonated:

Everybody, of course, supports standardization—in theory. But human beings (particularly, but not exclusively, famous Harvard professors practicing at famous Boston hospitals) want things to be standardized their way. The difficulty that doctors face in accepting a workplace that is not custom-designed around their personal preferences is captured in this old joke about the physician staffs of hospitals: What do you call a 99-1 vote of the medical staff? A tie.

Examples abound: coding styles, version control systems, code review systems…

explicit is better than implicit: c++ implicitly defined member functions


In the tradition of The Zen of Python, I’ve been thinking about pushing for explicit declarations of otherwise implicitly-defined member functions in C++, both in code that I write and in code that I review:

// Instances of this class should not be copied.
MyClass(const MyClass&) = delete;
MyClass& operator=(const MyClass&) = delete;

// We are OK with the default semantics.
OtherClass(const OtherClass&) = default;
OtherClass& operator=(const OtherClass&) = default;
OtherClass(OtherClass&&) = default;
OtherClass& operator=(OtherClass&&) = default;

[Background: C++ specifies several member functions that the compiler will implicitly define for you in any class: the default constructor, the copy/move constructor(s), and the copy/move assignment operator(s). I say “implicitly define”, as though that always happens, but there are a number of constraints on when the compiler will do this. For the purposes of the discussion below, I’ll ignore the default constructor bit and focus on the copy/move constructor and assignment operator. (I will also happily ignore all the different variants thereof that can occur, e.g. when the compiler defines MyClass(MyClass&) for you.) I think the arguments apply equally well to the default constructor case, but most classes I deal with tend to either declare their own default constructor or have several user-defined constructors anyway, which prohibit the compiler from implicitly declaring the default constructor.]

I think the argument for = delete is more obvious and less controversial, so I’ll start there.  = delete‘ing functions you don’t want used is part of the API contract of the class.  Functions that shouldn’t be used shouldn’t be exposed to the user, and = delete ensures that the compiler won’t implicitly define part of your API surface (and users thereby unknowingly violate API guarantees).  The copy constructor/assignment operator are the obvious candidates for = delete, but using = delete for the move constructor/assignment operator makes sense in some cases (e.g. RAII classes). Using = delete gives you pleasant compiler error messages, and it’s clearer than:

  MyClass(const MyClass&);
  MyClass& operator=(const MyClass&);

If you’re lucky, there might be a comment to the effect of // Deliberately not defined.  I know which code I’d prefer to read. (Using = delete also ensures you don’t accidentally use the not-defined members inside the class itself, then spend a while waiting for the linker errors to tell you about your screw-up.)

= default appears to be a little harder to argue for.  “Experienced” programmers always know which functions are provided by the compiler, right?

Understanding whether the compiler implicitly defines something requires looking at the entire class definition (including superclasses) and running a non-trivial decision algorithm. I sure don’t want each reader of the code to do that for two or four different member functions (times superclasses, too), all of which are reasonably important in understanding how a class is intended to be used.

Explicitly declaring what you intend can also avoid performance pitfalls. In reading through the C++ specification to understand when things were implicitly declared, I discovered that the same functions can also be implicitly deleted, including this great note: “When the move constructor is not implicitly declared or explicitly supplied, expressions that otherwise would have invoked the move constructor may instead invoke a copy constructor.” So, if the move constructor was implicitly declared at some point, but then was implicitly deleted through some change, expressions that were previously efficient due to moving would become somewhat less so due to copying. Isn’t C++ great?

Being explicit also avoids the possibility of meaning to define something, but getting tripped up by the finer points of the language:

template<typename T>
class MyClass
  // This does not define a copy constructor for MyClass<T>.
  template<typename U>
  MyClass(const MyClass<U>& aOther) : ... { ... }

Comments could serve to notify the reader that we’re OK with the default definition, but if I could choose between encoding something in a place solely intended for humans, or a place both humans and the compiler will understand, I know which one I’d pick.

white space as unused advertising space


I picked up Matthew Crawford’s The World Outside Your Head this weekend. The introduction, subtitled “Attention as a Cultural Problem”, opens with these words:

The idea of writing this book gained strength one day when I swiped my bank card to pay for groceries. I watched the screen intently, waiting for it to prompt me to do the next step. During the following seconds it became clear that some genius had realized that a person in this situation is a captive audience. During those intervals between swiping my card, confirming the amount, and entering my PIN, I was shown advertisements. The intervals themselves, which I had previously assumed were a mere artifact of the communication technology, now seemed to be something more deliberately calibrated. These haltings now served somebody’s interest.

I have had a similar experience: the gas station down the road from me has begun playing loud “news media” clips on the digital display of the gas pump while your car is being refueled, cleverly exploiting the driver as a captive audience. Despite this gas station being somewhat closer to my house and cheaper than the alternatives, I have not been back since I discovered this practice.

Crawford continues, describing how a recent airline trip bombarded him with advertisements in “unused” (“unmonetized”?) spaces: on the fold-down tray table in his airplane seat, the moving handrail on the escalator in the airport, on the key card (!) for his hotel room. The logic of filling up unused space reaches even to airport security:

But in the last few years, I have found I have to be careful at the far end of [going through airport security], because the bottoms of the gray trays that you place your items in for X-ray screening are now papered with advertisements, and their visual clutter makes it very easy to miss a pinky-sized flash memory stick against a picture of fanned-out L’Oréal lipstick colors…

Somehow L’Oréal has the Transportation Security Administration on its side. Who made the decision to pimp out the security trays with these advertisements? The answer, of course, is that Nobody decided on behalf of the public. Someone made a suggestion, and Nobody responded in the only way that seemed reasonable: here is an “inefficient” use of space that could instead be used to “inform” the public of “opportunities.” Justifications of this flavor are so much a part of the taken-for-granted field of public discourse that they may override our immediate experience and render it unintelligible to us. Our annoyance dissipates into vague impotence because we have no public language in which to articulate it, and we search instead for a diagnosis of ourselves: Why am I so angry? It may be time to adjust the meds.

Reading the introduction seemed especially pertinent to me in light of last week’s announcement about Suggested Tiles. The snippets in about:home featuring Mozilla properties or efforts, or even co-opting tiles on about:newtab for similar purposes feels qualitatively different than using those same tiles for advertisements from third parties bound only to Mozilla through the exchange of money. I have at least consented to the former, I think, by downloading Firefox and participating in that ecosystem, similar to how Chrome might ask you to sign into your Google account when the browser opens. The same logic is decidedly not true in the advertising case.

People’s attention is a scarce resource. We should be treating it as such in Firefox, not by “informing” them of “opportunities” from third parties unrelated to Mozilla’s mission, even if we act with the utmost concern for their privacy. Like the gas station near my house, people are not going to come to Firefox because it shows them advertisements in “inefficiently” used space. At best, they will tolerate the advertisements (maybe even taking steps to turn them off); at worst, they’ll use a different web browser and they won’t come back.

recent tsan improvements for firefox


One of my goals for Q2 was to make Firefox usable with “modern” (version 3.6+) clang. For reasons previously unknown, Firefox would only work with relatively old clang (version ~3.3); indeed, our wiki page for TSan recommends checking out specific SVN versions!

I’m happy to say that goal has been met.  If you like, you can read (some of) the gory details in bug 1153244.  Checking out a development version of Clang and building that should enable you to test Firefox (from mozilla-central, since that’s where the aforementioned bug has landed thus far) with TSan.  Upgrading through several major releases has its benefits: TSan appears to be somewhat more stable—tests that would previously timeout reliably now run without complaint.  I’m not sure that it’s quite reliable enough to run in automation yet, but running TSan tests in automation is definitely more feasible than it was last quarter.

With an eye towards doing that, and also with an eye towards improving individual developer experience, I’ve recently landed several patches that teach our test harnesses about TSan-specific things.  For instance, when we’re running mochitests, we only care about races in the browser itself; we don’t care about races while running any of the supporting tools (particularly xpcshell, which shares a lot of code with the browser through libxul).  So TSan should only be enabled for the actual browser executable.  We already do similar things for our ASan builds.

I also discovered that my local builds weren’t using the llvm-symbolizer tool to produce backtraces in TSan’s reports, but instead defaulting to using addr2lineaddr2line is a fine tool, but llvm-symbolizer understands debug information better (for instance, it shows you synthesized frames for inlined functions, which is a huge win) and llvm-symbolizer is somewhat faster than addr2line. In my case, it was a combination of llvm-symbolizer not being in my PATH, and not informing TSan about the location of llvm-symbolizer ($OBJDIR/dist/bin/, usually). The former was addressed by setting LLVM_SYMBOLIZER in my mozconfig (not necessary if llvm-symbolizer is on your PATH), and the latter was addressed by adding a bit of code to our test scripts.

Now, to update the wiki with better instructions!

on development speedbumps


Last week, I ran into a small speedbump with my development process. I normally write my patches and commit them to git with initial lines that look like:

fix IPDL thinko for never-inline method declarations; r=bent

Then, when I use git-bz to push my patches to bugzilla, it autofills the r? flag for the patch to :bent, according to the r= bit specified in the commit message. Quite convenient.

Except last week, something broke when trying to use that feature. Bugzilla would complain that :bent was an invalid user. “OK, fine,” I initially thought, “maybe :bent is unavailable.” (People often change their Bugzilla name to not include :NAME when they won’t be reachable for a period of time.) Inspection showed this to not be the case. Although the initial evaluation was that git-bz was old and crusty (it screen-scrapes Bugzilla, rather than using Bugzilla’s REST API), it turned out that Bugzilla was indeed at fault.

OK, but what to do until Bugzilla actually gets updated with the fix? Options:

  1. Cope with a little extra work: use git-bz as normal, but manually set the r? flags from Bugzilla’s web UI.
  2. Paralysis: any extra work is unacceptable, so do nothing (at least in terms of writing patches).

I admit option 2 is a bit silly, but that’s exactly what the situation felt like: there had been this unexpected obstacle in my normally smooth development process and it was simply Not Worth going back and doing things the old way.

I then realized that this situation, in reverse, is exactly the “barriers to entry” that people like Gregory Szorc talk about: every extra step that you have to do in Firefox development (or development generally) is one more place for somebody to run into trouble and decide contributing to your project isn’t worth the hassle. Make the extra steps go away! (And then don’t bring them back via bugs. *wink*)

tsan bug finding update


At the beginning of Q1, I set a goal to investigate races with Thread Sanitizer and to fix the “top” 10 races discovered with the tool.  Ten races seemed like a conservative number; we didn’t know how many races there were, their impact, or how difficult fixing them would be.  We also weren’t sure how much participation we could count on from area experts, and it might have turned out that I would spend a significant amount of time figuring out what was going on with the races and fixing them myself.

I’m happy to report that according to Bugzilla, nearly 30 of the races reported this quarter have been fixed.  Folks in the networking stack, JavaScript GC, JavaScript JIT, and ImageLib have been super-responsive in addressing problems.  There are even a few bugs in the afore-linked query that have been fixed by folks on the WebRTC team running TSan or similar thread-safety tools (Clang’s Thread Safety Analysis, to be specific) to detect bugs themselves, which is heartening to see.  And it’s also worth mentioning that at least one of the unfixed DOM media bugs detected by TSan has seen some significant threadsafety refactoring work in its dependent bugs.  All this attention to data races has been very encouraging.

I plan on continuing the TSan runs in Q2 along with getting TSan-on-Firefox working with more-or-less current versions of Clang.  Having to download specific Subversion revisions of the tools or precompiled Clang 3.3 (!) binaries to make TSan work is discouraging, and that could obviously work much better.

measuring power usage with power gadget and joulemeter


In the continuing evaluation of how Firefox’s energy usage might be measured and improved, I looked at two programs, Microsoft Research’s Joulemeter and Intel’s Power Gadget.

As you might expect, Joulemeter only works on Windows. Joulemeter is advertised as “a software tool that estimates the power consumption of your computer.” Estimates for the power usage of individual components (CPU/monitor/disk/”base”) are provided while you’re running the tool. (No, I’m not sure what “base” is, either. Perhaps things like wifi?) A calibration step is required for trying to measure anything. I’m not entirely sure what the calibration step does, but since you’re required to be running on battery, I presume that it somehow obtains statistics about how your battery drains, and then apportions power drain between the individual components. Desktop computers can use a WattsUp power meter in lieu of running off battery. Statistics about individual apps are also obtainable, though only power draw based on CPU usage is measured (estimated). CSV logfiles can be captured for later analysis, taking samples every second or so.

Power Gadget is cross-platform, and despite some dire comments on the download page above, I’ve had no trouble running it on Windows 7 and OS X Yosemite (though I do have older CPUs in both of those machines). It works by sampling energy counters maintained by the processor itself to estimate energy usage. As a side benefit, it also keeps track of the frequency and temperature of your CPU. While the default mode of operation is to draw pretty graphs detailing this information in a window, Power Gadget can also log detailed statistics to a CSV file of your choice, taking samples every ~100ms. The CSV file also logs the power consumption of all “packages” (i.e. CPU sockets) on your system.

I like Power Gadget more than Joulemeter: Power Gadget is cross-platform, captures more detailed statistics, and seems a little more straightforward in explaining how power usage is measured.

Roberto Vitillo and Joel Maher wrote a tool called energia that compares energy usage between different browsers on pre-selected sets of pages; Power Gadget is one of the tools that can be used for gathering energy statistics. I think this sort of tool is the primary use case of Power Gadget in diagnosing power problems: it helps you see whether you might be using too much power, but it doesn’t provide insight into why you’re using that power. Taking logs along with running a sampling-based stack profiler and then attempting to correlate the two might assist in providing insight, but it’s not obvious to me that stacks of where you’re spending CPU time are necessarily correlated with power usage. One might have turned on discrete graphics in a laptop, or high-resolution timers on Windows, for instance, but that wouldn’t necessarily be reflected in a CPU profile. Perhaps sampling something different (if that’s possible) would correlate better.

finding races in Firefox with ThreadSanitizer


We use a fair number of automated tools for memory errors (AddressSanitizer/Leak Sanitizer for use-after-free and buffer overflows; custom leak checking on refcounted objects; Valgrind tests and Julian Seward’s mochitests on Valgrind periodic testing), but we do very little in terms of checking for data races between threads.  As more and more components of the browser use threads in earnest (parts of the JavaScript engine, including the GC; graphics; networking; Web features like indexedDB, workers, and WebRTC; I have probably left out some others), preventing and/or fixing data races become more and more important as a way of ensuring both correctness and stability. One of my goals this quarter is running mochitests and reftests with ThreadSanitizer (TSan), reporting any races that it finds, and either fixing some of them myself or convincing other people to fix them.

What is a data race? Informally, data races occur when two different threads operate on the same location in memory without any synchronization between the threads. So if you do:

*ptr = 1;

in one thread and:

if (*ptr == 1) {

in another thread (without locks or similar), that’s a data race. It’s not guaranteed that the second thread will see the value written by the first thread, and if the code was written with that assumption, things can (and usually do) work as expected, but can also go badly wrong. When things do go badly wrong, it can be extremely frustrating to find the actual problem, because the problems don’t typically show up at the point where the data race happened. Of course, since the bug is dependent on caches and timing issues, the problem doesn’t always reproduce on a developer’s machine, either. You can see one of Chromium’s experiences with data races producing a common crash, and it took them nearly three months to find a fix. TSan told them exactly where to look. Google has written blog posts about their experiences using TSan with Chromium and more generally with other software they develop, including Go and their own internal tools.

When faced with evidence of a data race, people sometimes say, “well, int/long/pointer accesses are atomic, so it’s OK.” This is decidedly not true, at least not in the sense that writes to memory locations of such types are immediately visible to all threads. People sometimes try using volatile to fix the problem; that doesn’t work either (volatile says nothing about concurrent accesses between threads, or visibility of operations to other threads). Even if you think your data race is benign–that it can’t cause problems–you might be surprised at what can go wrong with benign data races. Hans Boehm, who led the effort to define the C++ memory model, has written a paper describing why there is no such thing as a benign data race at the C/C++ language level. Data races are always real bugs and are explicitly undefined behavior according to the C++ standard.

I’m not the first person to try using TSan on Firefox inside Mozilla; Christian Holler started filing bugs about races detected by TSan over a year ago.  So far, I’ve filed about 30 bugs from running “interesting” groups of mochitests: mostly mochitests under dom/ and layout/. That doesn’t sound like that many, and there are a couple reasons for that. One is that the same races tend to get reported over and over again. I’ve applied some local fixes to silence some of the really obnoxious races, but you still have to sort through the same races to find the interesting ones. (I should probably spend quality time setting up suppression files, but I’ve avoided doing that thus far for the fear of inadvertently silencing other races that I haven’t seen/reported before.) Test runs are also slow (about 10x slower than a non-TSan run), and it’s fairly common for the browser to simply hang during testing. I’ve been able to fiddle around with the mochitest harness to increase timeouts or the number of timeouts permitted, but that makes tests go even slower, as tests that do timeout take longer and longer to do so.

Fortunately, people have been responsive to the bugs I’ve filed; Honza Bambas, Valentin Gosu, and Patrick McManus on the networking side of things; Terrence Cole, Jon Coppeard, Andrew McCreight, and Shu-yu Guo on the JavaScript side of things; and Milan Sreckovic on the graphics side of things have all been fixing bugs and/or helping analyze what’s going wrong as they’ve been filed.

If you want to try out TSan locally, there’s an excellent introduction to using TSan on MDN.  I strongly recommend following the instructions for building your own, local clang; my experiences have shown that the released versions of clang don’t always work (at least with Firefox) when trying to use TSan. If you find new bugs in your favorite piece of code, please file a bug, CC me, and make it block bug 929478.

multiple return values in C++


I’d like to think that I know a fair amount about C++, but I keep discovering new things on a weekly or daily basis.  One of my recent sources of new information is the presentations from CppCon 2014.  And the most recent presentation I’ve looked at is Herb Sutter’s Back to the Basics: Essentials of Modern C++ Style.

In the presentation, Herb mentions a feature of tuple that enables returning multiple values from a function.  Of course, one can already return a pair<T1, T2> of values, but accessing the fields of a pair is suboptimal and not very readable:

pair<...> p = f(...);
if (p.second) {
  // do something with p.first

The designers of tuple must have listened, because of the function std::tie, which lets you destructure a tuple:

typename container::iterator position;
bool already_existed;
std::tie(position, already_existed) = mMap.insert(...);

It’s not quite as convenient as destructuring multiple values in other languages, since you need to declare the variables prior to std::tie‘ing them, but at least you can assign them sensible names. And since pair implicitly converts to tuple, you can use tie with functions in the standard library that return pairs, like the insertion functions of associative containers.

Sadly, we’re somewhat limited in our ability to use shiny new concepts from the standard library because of our C++ standard library situation on Android (we use stlport there, and it doesn’t feature useful things like <tuple>, <function>, or <thread_local>. We could, of course, polyfill some of these (and other) headers, and indeed we’ve done some of that in MFBT already. But using our own implementations limits our ability to share code with other projects, and it also takes up time to produce the polyfills and make them appropriately high quality. I’ve seen several people complain about this, and I think it’s something I’d like to fix in the next several months.