Mar 15

tsan bug finding update

At the beginning of Q1, I set a goal to investigate races with Thread Sanitizer and to fix the “top” 10 races discovered with the tool.  Ten races seemed like a conservative number; we didn’t know how many races there were, their impact, or how difficult fixing them would be.  We also weren’t sure how much participation we could count on from area experts, and it might have turned out that I would spend a significant amount of time figuring out what was going on with the races and fixing them myself.

I’m happy to report that according to Bugzilla, nearly 30 of the races reported this quarter have been fixed.  Folks in the networking stack, JavaScript GC, JavaScript JIT, and ImageLib have been super-responsive in addressing problems.  There are even a few bugs in the afore-linked query that have been fixed by folks on the WebRTC team running TSan or similar thread-safety tools (Clang’s Thread Safety Analysis, to be specific) to detect bugs themselves, which is heartening to see.  And it’s also worth mentioning that at least one of the unfixed DOM media bugs detected by TSan has seen some significant threadsafety refactoring work in its dependent bugs.  All this attention to data races has been very encouraging.

I plan on continuing the TSan runs in Q2 along with getting TSan-on-Firefox working with more-or-less current versions of Clang.  Having to download specific Subversion revisions of the tools or precompiled Clang 3.3 (!) binaries to make TSan work is discouraging, and that could obviously work much better.