30
Jun 11

Dehyra/Treehydra Static Analysis Thoughts

I was pleased to see Mozilla static analysis mentioned on lwn. Yes indeed, the mailing list has been pretty dead (most of our communication happens on irc.mozilla.org #static). I completely failed to build a community around my static analysis tools. Perhaps more people will try Dehydra now that it’s getting into Debian. The hydras are still alive, evidence can be seen in the mercurial commit log. Development has slowed because the hydras are now considered to be feature-complete and my primary focus is elsewhere in Mozilla now.

As to why open source static analysis has failed to take off, I have a few theories. I think the main problem is that static analysis requires a compiler/correctness/type-system-nerd/large-scale-development-nerd type personality. That’s a pretty rare intersection of hobbies to begin with. One also has to hate the stone age that C/C++ ecosystem we are in, but not move on to shiny new Haskell/Ocaml/whatever communities.

Have I failed at igniting the static analysis revolution?

  1. My goal primary goal was: provide a way to analyze Mozilla source code to speed up our development + refactoring efforts.
  2. My secondary goal was to make sure that whatever work I do, nobody else has to suffer through the unbelievably sucky infrastructure cruft I had to work through.
  3. Lastly, I did put in some effort at promoting open source static analysis (by giving talks at conferences, etc) since working in an active community is more fun.

Mozilla side:

I’m happy to report that I achieved a culture shift at Mozilla. Instead of people saying “oh god, I can’t find all instances of ___ issue in 3million lines of C++ code”, it’s pretty common to hear “lets solve this through static analysis”. Dehydra was designed to take the bitchwork (boilerplate of compiler integration, etc) out of static analysis so one can focus on the analysis part. New Dehydra users within Mozilla seem to confirm that.
Instead of pondering whether certain tool-assisted refactorings are feasible, we plan to embark on some now (turned out we were understaffed to keep up with tool output and overburdened by api compatibility before; more on this in a future blog post).

No More Static Analysis Bitchwork:

The worst aspect of dealing with C++ is parsing it. The second worst aspect is dealing with the preprocessor. With respect to parsing C++ we went from weirdo-custom-frontends(ie Elsa, EDG, etc) and “GCC will never allow plugins, don’t waste your time” to GCC adopting a plugin architecture that suited my static analysis needs. I also implemented source-location transformation tracking(-K) in mcpp, so nobody has to suffer through undoing braindamage inflicted by the C proprocessor again.
I hear at least a couple of people benefited from MCPP work and I take partial credit for every new analysis GCC plugin. I suspect I saved a few person-months for somebody :)

Btw, I think Chris Lattner’s from-scratch effort on Clang is way awesomer than anything I could ever accomplish.

Conferences & Stuff:

I admit complete and utter failure in this regard. Most open source people have low regard for static analysis. Linus seems to take a million-monkeys-with-type-writers approach (ala the open source eyeballs approach to security) to ensuring kernel code quality (which is a reasonable approach when you have mobs of contributors). Most other projects do not have the resources to spare on unproven tech such as static analysis.

To make matters worse, at first people thought JavaScript was a toy language worth only cut’n’pasting from recipes online. Then just as JavaScript was getting more popular, SpiderMonkey embedding got buggier and made for some unpleasant first experiences with the Hydras.

Conclusion

There isn’t much to show for my work outside of Mozilla; that’s fine since my primary goal was Mozilla :) The Hydras aren’t dead, they are in maintenance mode.

I’m glad to see python-as-gcc-plugin approach, it seems to fill the same niche as Treehydra. I regret not starting out with Python (I think it’s slightly better than JavaScript for this task), I hope David Malcolm succeeds in attracting wider interest.

PS. I’m super-excited about the new DXR work. DXR is something that makes my daily life easier. DXR is by far the smartest code-indexing system out there, it’s bound to transform my life as a developer far more than any static analysis ever could :)


09
Jun 10

Galois talk

I was invited to present a Galois tech talk on Mozilla static analysis. It was really cool to give a talk locally to such an expert audience. I was surprised to discover a vibrant Programming Languages + Analysis community in Portland.

Edward Z. Yang did an excellent write-up on the talk.

PLDi

Robert O’Callahan mentioned Dehydra in his PLDI talk.

Dehydra/Treehydra in GCC 4.5

There a few fixes that are about to land. I’m hoping that by the end of the week GCC 4.5 support will be production-quality. Sorry that it’s taken so long, but I’ve been busy focusing on startup. Ehren has picked up the slack, we should be able to produce a fairly polished Dehydra 1.0 by the end of the summer.


04
Jan 10

Some developers manually grope around in the dark

Cool thing about static analysis is that you can ask painful-for-humans questions about your codebase AND have them answered.
Here are two that got answered by Ehren:

Where do function bodies continue after return statements (ie obviously dead/broken code)? Bug 535646.

How many functions in Mozilla could/should be marked static? Bug 536427.

Awesome!


20
Nov 09

Dehydra Testsuite Passes on GCC 4.5

I spent couple of days fixing the remaining test-suite failures on GCC 4.5 trunk for Dehydra. Since the last time I looked into this, GCC went from crashing all over the place to only crashing if I did something bad. It was nice to discover that as a result of switching to 4.5 Dehydra users will get saner .isExplicit behavior and more precise location info.

Treehydra will take more work due to me misunderstanding GTY annotations.

By the way, I am really grateful for all of the people who contributed GCC 4.5 fixes so far. You guys have been a big help in getting Dehydra testsuite to 100% on 4.5. Looks like I will meet my goals to finish De+Treehydra by the end of the year in time for GCC 4.5 release and my “Introducing Dehydra to the Developer World”-type talk at LinuxConf.au.nz 2010.

Startup
I reduced my focus on startup speed at the moment to catch up on Dehydra. I plan to work on reducing xpconnect overhead during startup next, ie more of this bug.


06
Nov 09

FSOSS & Dehydra Update

Last week I was in Canada to present at FSOSS with David Humphrey on awesome Mozilla Tools: Dehydra, DXR, Pork, etc. I think we managed to convey the message regarding what a sad affair that current developer development tools are.

General-Purpose Dehydra Scripts

Dehydra grew out of Mozilla’s constant need to figure out what is going on in the source code. As a result most of our scripts are very Mozilla API-specific. This makes harder for people outside of Mozilla to learn Dehydra. There is no library of Dehydra code that one can just plugin to start analyzing their codebase. Instead one has to sit down, figure out what Dehydra is capable of and then see if any of the problems facing the developer can be solved this way. If anyone wants to contribute such a library, let me know.

In the meantime, more general-purpose analyses are surfacing.

Shadowed Members

My favourite script so far is the member-shadowing checker. I ran into a member-shadowing warning that is unique to Sun’s C++ compiler. It was triggered by some code that I just landed on the tree. I fixed the warning, but within a few days a coworker ran into a bug caused by that member shadowing(due to having an unlucky revision of the code). The following example shows how simple it was to implement the warning in GCC/Dehydra.

See bug 522776 for the complete story on adding the member shadowing check to Mozilla.

Printf
Another general purpose analysis was done outside of Mozilla by Philip Taylor for his game. His script checks wide printf format strings (which are overlooked by gcc).
Independently, Benjamin wrote a printf checker for Mozilla printf-like code, see bug 493996.

Custom Sections in Object Files
We have long speculated about how nice it would be if Dehydra could emit info into object files that could then be yanked out of the resulting binary (by say, valgrind). bug 523435 will soon make that a reality.

Update:
Two photos from FSOSS.


18
Sep 09

Enforcing Inheritance Rules

While writing C++ sometimes one wishes that one could squeeze a little more out of the type system. In this particular case, Zack Weinberg (layout-refactorer extraordinaire), wanted to make sure that certain methods always get overridden in derived classes. Unfortunately, in that particular design, those methods were not pure-virtual. At this point most C++ hackers would cry a little and move on without any compiler assistance.

Instead of crying, Zack added a NS_MUST_OVERRIDE attribute to methods along with a matching Dehydra script. See the source code and the bug for how simple it can be to extend C++ with a useful new check.

Nothing makes me happier than seeing developers land big code changes and accompany them with compiler checks instead of relying on programming folklore to maintain important invariants.


03
Aug 09

This Month In Static Analysis

Lately I have been focusing on optimizing Fennec startup on a delightfully inadequate platform: Windows Mobile. More on fascinating startup, performance problems and solutions later. As a result I have been doing relatively little static analysis stuff.

The main reason for taking a break is that I feel that I went from having no way to do any analysis to having production-quality tools for analysis and rewriting.  I finally have a chance to move on from developing tools to using them in everyday development. The main puzzle piece that needs completion is GCC 4.5 support in Dehydra. We are feature-complete on 4.5, just need to stabilize once the trunk stabilizes.

Drowning In Pork

A number of other people did some cool stuff in the meantime. First and foremost: Joshua Cranmer has ventured into the land of Pork and is publishing a guide to doing refactoring tools on this blog (part 1, part 2, part 3). This is cool, because until now, there were no Pork docs and nothing I write could ever match Joshua’s documenting talents.  Thanks a bunch, Joshua.

I have also received my first-ever bugfix patches to Elsa. Previously, I’ve received miscellaneous build fixes, etc, but these are the first patches that involved somebody pounding their head against the wall until they figured out why things were crashing or not accepting valid C++ code.

Introducing Dan Witte

Dan is the new static analysis go-to person. So far he facilitated an explosion of static analysis ideas (they are tracked in bug 430328). A lot of these can be expressed as <10line Dehydra analyses, so they are excellent introductory projects. If you are dying to start analyzing code, but don’t know where to begin, look in that bug. Dan has written an interesting analysis to do with finding accidental temporaries due to C++’s “wonderful” implicit conversions/etc (expect to see a blog post on that). He is also working on the holy grail of Mozilla static analysis: a full callgraph. It’s a little embarrassing that we don’t have that yet, but it’s hard and once we do have it, a whole new world of analyses will be possible.

Speaking of Callgraphs…

So while various Mozillians were pondering how awesome it would be to do inter-function analysis, an intern has beat us to writing the first useful inter-function analysis! Sully had a problem, after a tiny bit of  Dehydra coaching, he solved his problem in the amount of time it took me to eat my lunch. Brilliant! See his blog post for details. My conclusion: either Dehydra is pretty easy to use and/or we get mad genius interns :).


29
Jun 09

DXR: The Most Impressive Code Navigation Tool Ever

Are you are a developer who has been frustrated by the pathetic state of the art in code search engines and code browsing experience on the web? Have you been longing for being able to view code with more aid than coloured words?

David Humphrey just released his DXR bombshell. The “basic” concept behind DXR is extraction of the rich semantic information gathered by tools like GCC, Spidermonkey and xpidl. This data is then coherently linked together into a pretty UI in order to provide cleverness during code browsing sessions.

DXR will be happy to answer seemingly trivial queries:

  • List implementations of interfaces in C++ (and soon JS).
  • Provide relevant search results by searching semantic data first. No, grep is no longer state of the art for searching code.
  • Switch between definition and declaration.
  • Walk up/down class hierarchies.
  • Lookup typedefs, types, etc.

I’ve been wanting to see this sort of tool built on top information exposed by Dehydra since I got it working. Words can not express how pumped I am about DXR and the magic powers that we will be granting it.


25
Jun 09

Dehydra & Pork Sources Moved

I moved dehydra to a more official location, please update your scripts and hg settings.
New dehydra url:

http://hg.mozilla.org/rewriting-and-analysis/dehydra/

Pork got reshuffled during the move, it’s now 2 repositories. oink is dead. It now depends on current versions of flex (as opposed to flex-old) and features a cleaned up buildsystem.

New way to checkout pork:

hg clone http://hg.mozilla.org/rewriting-and-analysis/pork
hg clone http://hg.mozilla.org/rewriting-and-analysis/elsa pork/elsa


11
May 09

Dehydra Updates

How well are you packing your structs?
Arpad asked that question with an awesome Dehydra script and came up with an interesting list.

GCC 4.3.3 Is supported
GCC 4.3.3(4.3.[210] worked) broke C++ compatibility in the headers used by Dehydra.
Zach pointed out that passing -fpermissive to g++ solves the problem. Sorry to all the people who had issues building the hydras with GCC 4.3.3, that’s fixed now.

As I mentioned before, we are skipping GCC 4.4 support in Dehydra and aiming for supporting unpatched GCC 4.5. I wish that the small GCC patches were as quick to land as that big one I landed a couple of weeks ago :(.