A thread-checking toolkit for Firefox

Back in January I blogged about using Helgrind to check for threading errors in Firefox’s JS engine.  That effort was the first step towards a bigger goal, namely to find and remove all unintended data races in the browser proper.

I have been wanting to get to a point where our C++ developers can routinely use Helgrind to check for threading bugs in code, both new and old, in the same way that Valgrind’s Memcheck tool is now widely used to check for memory errors.  For the reasons discussed in my January posting, race checking is more difficult than memory checking.  Now, though, I believe we’re approaching the point where routine Helgrinding is feasible.

I’d like to introduce what amounts to a kit for thread-checking Firefox.  The main resource for this is the MDC page “Debugging Mozilla with Helgrind”.  Here’s a summary.

There are three parts to the kit:

  • A markup patch for the Mozilla code base.  This describes to Helgrind the effect of some synchronisation events it doesn’t understand and stops it complaining about some harmless races in the JS engine.
  • A suppression file that hides error reports in system libraries.
  • A development version of Helgrind.  This contains a bunch of correctness, diagnostic and scalability improvements.  A stock Valgrind installation won’t work.

With this framework in place, I completed a first run of the Mochitests under Helgrind.  It took 32 CPU hours, and around 15 bugs have been filed.  Some of them are now fixed, and others have been declared harmless.  But that’s just a beginning: many more uninvestigated reports are lurking in the Mochitest output.

Have a look at the MDC page for more details, including directions on how to get started.  And, of course, if you want help with any of this, please feel free to contact me.

1 response

  1. Andrew Sutherland wrote on :

    Amazing and awesome stuff!

    Is the log output for the mochitest runs available somewhere? I’d be very interested to see what type of output it produces. While I am unlikely to directly fix the failures, I am interested in being able to surface the results of such analyses from automated runs in ways easier for developers to look into / etc, and the logs would give me a good idea of how tractable/useful such a thing would be and the effort required and utility gained from linking with other views of the code/etc. (See ArbPL: http://www.visophyte.org/blog/tag/arbpl/)