Dehydra/Treehydra Static Analysis Thoughts

I was pleased to see Mozilla static analysis mentioned on LWN. Yes indeed, the mailing list has been pretty dead (most of our communication happens on irc.mozilla.org #static). I completely failed to build a community around my static analysis tools. Perhaps more people will try Dehydra now that it’s getting into Debian. The hydras are still alive; evidence can be seen in the Mercurial commit log. Development has slowed because the hydras are now considered to be feature-complete and my primary focus is now elsewhere in Mozilla.

As to why open source static analysis has failed to take off, I have a few theories. I think the main problem is that static analysis requires a compiler/correctness/type-system-nerd/large-scale-development-nerd type personality. That’s a pretty rare intersection of hobbies to begin with. One also has to hate the stone age that the C/C++ ecosystem is stuck in, but not move on to shiny new Haskell/OCaml/whatever communities.

Have I failed at igniting the static analysis revolution?

  1. My primary goal was to provide a way to analyze Mozilla source code to speed up our development + refactoring efforts.
  2. My secondary goal was to make sure that whatever work I do, nobody else has to suffer through the unbelievably sucky infrastructure cruft I had to work through.
  3. Lastly, I did put in some effort at promoting open source static analysis (by giving talks at conferences, etc) since working in an active community is more fun.

Mozilla side:

I’m happy to report that I achieved a culture shift at Mozilla. Instead of people saying “oh god, I can’t find all instances of ___ issue in 3 million lines of C++ code”, it’s pretty common to hear “let’s solve this through static analysis”. Dehydra was designed to take the bitchwork (boilerplate of compiler integration, etc.) out of static analysis so one can focus on the analysis part. New Dehydra users within Mozilla seem to confirm that.
Instead of pondering whether certain tool-assisted refactorings are feasible, we plan to embark on some now (it turned out we were understaffed to keep up with tool output and overburdened by API compatibility before; more on this in a future blog post).

No More Static Analysis Bitchwork:

The worst aspect of dealing with C++ is parsing it. The second worst aspect is dealing with the preprocessor. With respect to parsing C++ we went from weirdo custom frontends (i.e. Elsa, EDG, etc.) and “GCC will never allow plugins, don’t waste your time” to GCC adopting a plugin architecture that suited my static analysis needs. I also implemented source-location transformation tracking (-K) in mcpp, so nobody has to suffer through undoing braindamage inflicted by the C preprocessor again.
I hear at least a couple of people benefited from the mcpp work and I take partial credit for every new analysis GCC plugin. I suspect I saved a few person-months for somebody :)

Btw, I think Chris Lattner’s from-scratch effort on Clang is way awesomer than anything I could ever accomplish.

Conferences & Stuff:

I admit complete and utter failure in this regard. Most open source people have low regard for static analysis. Linus seems to take a million-monkeys-with-typewriters approach (à la the open source eyeballs approach to security) to ensuring kernel code quality (which is a reasonable approach when you have mobs of contributors). Most other projects do not have the resources to spare on unproven tech such as static analysis.

To make matters worse, at first people thought JavaScript was a toy language worth only cut’n’pasting from recipes online. Then just as JavaScript was getting more popular, the SpiderMonkey embedding got buggier and made for some unpleasant first experiences with the Hydras.

Conclusion

There isn’t much to show for my work outside of Mozilla; that’s fine since my primary goal was Mozilla :) The Hydras aren’t dead; they are in maintenance mode.

I’m glad to see the Python-as-GCC-plugin approach; it seems to fill the same niche as Treehydra. I regret not starting out with Python (I think it’s slightly better than JavaScript for this task), and I hope David Malcolm succeeds in attracting wider interest.

PS. I’m super-excited about the new DXR work. DXR is something that makes my daily life easier. DXR is by far the smartest code-indexing system out there, and it’s bound to transform my life as a developer far more than any static analysis ever could :)

2 comments

  1. Well, static analysis is becoming more mainstream these days. GCC has had compiler plugins since 4.5; Clang also has plugins (not so easy to use, though), not to mention its static analysis tool. Now, even MSVC has a way to implement static analysis plugins. Java too has annotation parsing built in (since 5/6).

    Now, what would be really nice is if everyone could agree on standard AST representations for C and C++, so that someone can write a static analyzer that works with all major compilers…

  2. I’m building up a tool using Dehydra and DXR.
    Hopefully I will be using it to replace Doxygen.

    I find myself constantly doing the same stuff over and over again with my code structures, and the SQLite database would let me automate it.

    I have a script to associate code comments with the DXR database.

    Working on a Python model and a template GUI engine for the SQLite code structures now.

    Modified the DXR database a little bit.

    A lot of tools do what Dehydra does, but they miss out because they aren’t part of the GCC compile.

    One of the libraries that I manage is littered with #ifdefs. Doxygen would be unable to handle that without some work. GCC-XML is from 2004 and should be updated to support the plugin system.