May 12

Give DXR a try

At Mozilla we have a long history of using MXR for looking up and discussing source code. Unfortunately, MXR is an unlovable mess of Perl built on glimpse, a text indexing engine that is crappy in terms of both performance and license. It is dead because nobody wants to work on it.

DXR is a semantically-aware successor to MXR. Semantic information is extracted from LLVM during compilation. This makes it possible to do searches like derived:nsIFile. DXR uses a modern full-text search engine for text searches, so it should be much faster than MXR. There is a test instance at dxr.mozilla.org; please give it a try. The homepage lists sample searches you can do.

DXR is written in Python. It uses an SQLite database + FTS index as a backend. Useful semantic information is extracted from the source via a Clang LLVM plugin. Check out the source code on GitHub.
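
As a rough illustration of how a semantic query such as derived:nsIFile could be answered from an SQLite backend, here is a minimal sketch. The `inherits` table, its columns, the function names and the sample rows are all assumptions made for this example; this is not DXR's actual schema or code. Only the class name nsIFile comes from the post.

```python
import sqlite3


def build_index(edges):
    """Create an in-memory database of (base, derived) inheritance pairs.

    The table layout is hypothetical, not DXR's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inherits (base TEXT NOT NULL, derived TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO inherits (base, derived) VALUES (?, ?)", edges
    )
    return conn


def derived_from(conn, base):
    """Return every class that directly or indirectly derives from `base`."""
    rows = conn.execute(
        """
        WITH RECURSIVE subclasses(name) AS (
            SELECT derived FROM inherits WHERE base = ?
            UNION
            SELECT i.derived
            FROM inherits AS i
            JOIN subclasses AS s ON i.base = s.name
        )
        SELECT name FROM subclasses ORDER BY name
        """,
        (base,),
    ).fetchall()
    return [name for (name,) in rows]


# Illustrative rows; every name except nsIFile is made up.
EDGES = [
    ("DemoBase", "nsIFile"),
    ("nsIFile", "DemoLocalFile"),
    ("DemoLocalFile", "DemoTempFile"),
]

if __name__ == "__main__":
    print(derived_from(build_index(EDGES), "nsIFile"))
```

A plain text index cannot answer this kind of question, because the relationship between a base class and its subclasses is only known once the compiler has parsed the code.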

DXR should be getting close to feature parity with MXR. Give it a try and let me know of any bugs/missing features you encounter (or submit a patch!). I realize that people have gotten used to various MXR quirks and that it can be stressful to switch to a new code indexer while trying to get stuff done, but MXR IS DEAD. We need to move on. Mozilla is complex, and finding relevant code can take quite a while, especially for new contributors. Using a smarter indexer should save time, reduce frustration, and free up a few developer-years to make Firefox better.

We have lots of ideas for DXR, but first we need to ensure it is a suitable replacement for MXR. Take DXR for a spin!

Aug 11

Effective Static Analysis

Static analysis can be a very fun pastime. One gets to sift through giant codebases looking for interesting clues. What could be more fun? A couple of things qualify: a) static analysis accompanied by cool rewrites and b) static analysis accompanied by cool visualizations.

Cool Rewrite

Michael Wu’s boolcheck tool is awesome. He wrote it to check that “typedef int” bools are really being used as booleans and aren’t perversely carrying integer values. The process of writing the tool is cool. As Michael discovers bugs/disagreements stemming from setting “typedef bool PRBool”, he just adds another pattern to the tool and never has to worry about that pattern again. I hope to see someone apply boolcheck to the Linux kernel, GTK projects or anything else with int booleans. Some projects don’t have the luxury of switching to real bools, so they can continue using a static checker to make up for it.

Pretty Code

I’ve blogged about DXR many times. As of this week, clang-based DXR is on par with the old Dehydra-based one. Callgraph, inheritance, etc. queries now work. Joshua did an outstanding job gutting and rewriting the DXR backend this summer and is now going back to school. I’m extremely impressed with his work. I didn’t think it was possible to get as far as he did.

We are looking for more help with DXR. Please deploy it on your pet project, contribute plugins for various languages, simplify deployment, etc.

Additionally, now that the backend is in fairly decent shape, we are looking for someone to help us turn DXR into the slickest code browsing tool ever (we have some ideas written down). I’d like interactive graphs, various code visualizations, integration with Bugzilla, etc. This needs a JSON-query frontend and a few other bits & pieces to be implemented.

Interns Wanted

We would love to hire more static analysis interns. Are you a student who dreams about making large codebases easy to grasp? Do you want to spend a few months making Control Flow Graphs behave? If that sounds like your calling, leave a comment or send me an email.

Jun 10

Galois talk

I was invited to present a Galois tech talk on Mozilla static analysis. It was really cool to give a talk locally to such an expert audience. I was surprised to discover a vibrant Programming Languages + Analysis community in Portland.

Edward Z. Yang did an excellent write-up on the talk.


Robert O’Callahan mentioned Dehydra in his PLDI talk.

Dehydra/Treehydra in GCC 4.5

There are a few fixes that are about to land. I’m hoping that by the end of the week GCC 4.5 support will be production-quality. Sorry that it’s taken so long, but I’ve been busy focusing on startup. Ehren has picked up the slack, so we should be able to produce a fairly polished Dehydra 1.0 by the end of the summer.

Feb 10

Static Analysis Articles

A really good ACM article about static analysis from Coverity’s perspective has been making the rounds in Mozilla. What struck me most was the following paragraph:

At the most basic level, errors found with little analysis are often better than errors found with deeper tricks. A good error is probable, a true error, easy to diagnose; best is difficult to misdiagnose. As the number of analysis steps increases, so, too, does the chance of analysis mistake, user confusion, or the perceived improbability of event sequence. No analysis equals no mistake.

My personal view has been that “dumb” analyses are the most effective ones in terms of mistakes spotted vs time wasted writing/landing the analysis. It is interesting to see that sophisticated analyses are difficult to deploy even for Coverity.

In other news, LCA 2010 was my favourite conference so far. I met a number of awesome developers there. Mozilla’s static analysis work finally got mentioned in LWN!

Jan 10

State of Static Analysis At Mozilla

Mozilla has static analyses built into the buildsystem that can be turned on with the --with-static-checking= flag. The analyses live in the xpcom/analyses directory. The testcases (aka documentation) are in xpcom/tests/static-checker. Analyses are implemented in either Dehydra or Treehydra and run within a patched GCC 4.3.

The currently landed checks are:

  • final.js: Java-like “final” keyword for C++
  • flow.js: Ensure code in a function flows through a particular label
  • must-override.js: Force derived classes to override certain methods
  • override.js: Ensure methods exist in base class
  • outparams.js: Ensure outparameters and return error codes are in sync
  • stack.js: Mark classes as stack-only
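
To give a feel for what checks like final.js and stack.js enforce, here is a toy model. The real analyses are Dehydra/Treehydra scripts running inside GCC; the `ClassDecl` record, the attribute strings and both function names below are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ClassDecl:
    """Hypothetical, simplified view of a C++ class as a checker might see it."""
    name: str
    bases: tuple = ()
    attributes: frozenset = frozenset()


def check_final(decls):
    """Report every class that derives from a class annotated as final."""
    by_name = {d.name: d for d in decls}
    errors = []
    for decl in decls:
        for base in decl.bases:
            base_decl = by_name.get(base)
            if base_decl is not None and "final" in base_decl.attributes:
                errors.append(f"{decl.name} derives from final class {base}")
    return errors


def check_stack_only(decls, heap_allocations):
    """Report heap allocations of classes annotated as stack-only.

    heap_allocations is a list of (class name, location) pairs, standing in
    for the `new` expressions a compiler plugin would visit.
    """
    stack_only = {d.name for d in decls if "stack" in d.attributes}
    return [
        f"{location}: {cls} is stack-only but allocated on the heap"
        for cls, location in heap_allocations
        if cls in stack_only
    ]


# Made-up declarations and allocation sites for demonstration.
DECLS = [
    ClassDecl("Base", attributes=frozenset({"final"})),
    ClassDecl("Child", bases=("Base",)),
    ClassDecl("Guard", attributes=frozenset({"stack"})),
]
ALLOCATIONS = [("Guard", "a.cpp:10"), ("Child", "a.cpp:12")]

if __name__ == "__main__":
    for message in check_final(DECLS) + check_stack_only(DECLS, ALLOCATIONS):
        print(message)
```

Both checks are deliberately simple: they only look at direct annotations and do not propagate them through inheritance or member fields.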

A whole lot more analyses in various states of completion can be tracked in the static analysis bug.

Asynchronous discussion happens on the mailing list. The #static IRC channel is the place for interactive discussion.

Near-term Plans For Plugins

GCC 4.5 has an official plugin framework enabled by default. I will try to switch to GCC 4.5 as soon as it is out. Currently 4.5 is still changing too often for me to bother fixing Treehydra (Dehydra usually works). As soon as 4.5 is out I will revise the installation instructions to use distribution GCC and JavaScript packages to avoid the current mess (draft can be found here). Sometime after that I’ll switch Mozilla static analysis to GCC 4.5 and drop 4.3 support.

Hopefully, this will make it easier for other open source projects to adopt the hydras.

Plans for Analyses

I’m a big believer in application-specific static analyses, but I would like to see some heavy-duty open source analyzers built on top of GCC.

Some of the not-so-Mozilla-specific analyses should be bundled together to make them easy to try out on other projects.

Hopefully 2010 will be the year that open source static analysis catches on.


I posted my slides from yesterday.

Jan 10

Some developers manually grope around in the dark

The cool thing about static analysis is that you can ask painful-for-humans questions about your codebase AND have them answered.
Here are two that got answered by Ehren:

Where do function bodies continue after return statements (i.e. obviously dead/broken code)? Bug 535646.
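
The same question can be asked of any language with an accessible syntax tree. The sketch below is only an analogy: it uses Python’s own `ast` module on Python source and has nothing to do with the GCC-based analysis from the bug. The function name and the sample source are made up for illustration.

```python
import ast


def find_code_after_return(source):
    """Return line numbers of statements that directly follow a `return`.

    Only the first unreachable statement after each return is reported.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Statement blocks live in these fields on functions, ifs, loops, etc.
        for field_name in ("body", "orelse", "finalbody"):
            block = getattr(node, field_name, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda or conditional-expression body
            for previous, statement in zip(block, block[1:]):
                if isinstance(previous, ast.Return):
                    findings.append(statement.lineno)
    return sorted(findings)


SAMPLE = '''\
def f(x):
    if x:
        return 1
        x += 1
    return 0

def g():
    return 2
    print("never runs")
'''

if __name__ == "__main__":
    print(find_code_after_return(SAMPLE))
```

This is a purely syntactic check with no control-flow reasoning.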

How many functions in Mozilla could/should be marked static? Bug 536427.
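
One simplified way to frame the second question: a function is a candidate for static if no other translation unit refers to it. The sketch below assumes the definitions and references per file have already been extracted; the data layout, the function name and the sample files are hypothetical, and the real analysis has to handle much more (headers, function pointers, exported symbols).

```python
def static_candidates(units):
    """List (file, function) pairs that no other file references.

    `units` maps a file name to a dict with two sets of function names:
    "defines" (functions defined there) and "calls" (functions referenced).
    """
    candidates = []
    for file_name, unit in units.items():
        for func in unit["defines"]:
            used_elsewhere = any(
                func in other["calls"]
                for other_name, other in units.items()
                if other_name != file_name
            )
            if not used_elsewhere:
                candidates.append((file_name, func))
    return sorted(candidates)


# Made-up input for demonstration.
UNITS = {
    "a.cpp": {"defines": {"Helper", "Init"}, "calls": {"Helper"}},
    "b.cpp": {"defines": {"Run"}, "calls": {"Init"}},
}

if __name__ == "__main__":
    for file_name, func in static_candidates(UNITS):
        print(f"{file_name}: {func} could be static")
```

Note that `Run` shows up as a candidate even though it might be an entry point, which is exactly the kind of false positive a human still has to review.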