26
Sep 08

Living Dead Code

The coolest part about my job is that I get to work on tasks that are cool, but typically are forever laid to rest in the would-be-cool-if-we-could-but-we-don’t-have-time-or-resources category. However as project code sizes increases, the usefulness/coolness ratio of static analysis grows and moves from would-be-cool, to nice-to-have to this-is-the-only-way. Now, I’m not sure where Mozilla lies on this scale, but I do know for sure that the giant codebase is an analysis treasure trove.

Dead Code Motivation

I have been talking about dead code detection in Mozilla for ages. As software changes, some pieces of code unintentially get left behind, but often there is no way to tell. It results in unnessary maintanance burden and increased footprint. In roc‘s case randomly spotting dead methods may also involve a little IRC griping at me about not having any tools for it, even rudimentary ones. Between that and blizzard’s cheering over reduced code size, I had no choice, but to give it a try once I had enough outparamdelling being reviewed.

Dead Code Results

First of, the approach described below seems to work well: see bug with initial results. Now I get a few thousand reported methods to inspect and refine the results.

Dead Code Approach

I’ve been pondering dead code detection since I’ve started working on this stuff. There are lots of ways to do it, but I finally settled on the dead-simplest one: method-level granularity. Ingredients are:

  • Dehydra/Treehydra extraction JavaScript: Dehydra makes it easy to enumerate methods and class hierarchies. Treehydra makes it possible to extract every mention of a method from the code(except for pointers to virtual functions that were casted…in that case we are left with vtable index and a type that the pointer was cast to). Additionally, Treehydra counts the number of AST nodes visited in every function body.
  • Shell script to aggregate the result of processing Mozilla source. Gotta admit, perl’s hashtables give GNU sort -u a run for its money.
  • Ocaml program to do the super-dumb algorithm. Classes with the same names, are assumed to be the same class. Method overloads are assumed to be a single method. All methods are assumed to be virtual. Every ClassType::FUNCTION_CALL call walks down to all children & up through all the parents and marks the derived methods as called. Then all derivatives of scriptable XPIDL are filtered out and the uncalled methods are printed out. In order to find most exciting functions first, methods in the results are sorted by their AST count =D.
    Language Nerd Trivia: Why OCaml – because ADTs are no fun in other languages. Why is my OCaml so bad – because I lost the hang out if due to not writing anything it for the past 2 years.

I’m sure most people reading this will go: “Wait a minute, this is unsound if you don’t see the virtual function pointers?”, to which I reply: “Once I nuke the method and mozilla compiles successfully, it’s sound”. In reality someone could make a puny GCC patch to preserve more data in virtual function pointer assignments.

Since this is all in early stages there a bunch of things I don’t deal with: method overloads, constructors, destructors and overloaded operators….and templates. All function bodies are scanned, but some function names are a pain to deal with or they aren’t straightforward function calls in GCC so they don’t participate in the dead/alive contest.

Where To Go From Here

Well, once we run out of dead code detected by the primitive caveman approach above, we’ll have to investigate less conservative approaches possibly involving abstract interpretation and callgraphs.

How To Get Involved in Screwing With Software Cost Models By Contributing Negative Line Counts

This project has a lot of places to help out:

  • work through the dead method list, filing bugs accordingly and deleting any related code
  • Extend the machinery to work on all non-static function
  • Try this on your favourite large codebase.
  • Write the virtual function pointer annotation patch :)

19
Sep 08

Outparamdelling this way comes

Recently I dusted off outparamdel to see if I can get some refactorings landed. About a year ago, with the great QueryInterface outparamdelling experiment we ended up with a smaller binary footprint and tiny performance gain, but that never landed due to us not wanting to break compatability yet. Ever since, outparamdel has been patiently bitrotting within Pork waiting for the day it’s allowed to break APIs.

Recently jst listed some private API candidates for deCOMtamination and I have been letting outparamdel loose on them for the past week. So far only one such change has been committed, the rest are sitting in jst’s review queue. It’s exciting, as it’s the first ever application of outparamdel that didn’t get canned. For the whole list see the most recent bugs blocking my analysis metabug.

Cool part about these patches is that after various outparamdel special-casing and manual cleanup the line count is reduced by 10-30% while maintaining existing functionality!

So far I haven’t touched much outside of content/, so if you know of any APIs that could have the outparam rotated into the return value, file some rewriting bugs against me.

Rewriting with Mercurial

Mercurial is now my favourite python program ever. It makes rewrites so easy. Here is my typical workflow:


# Write an outparamdel input file specifying functions to rewrite..run outparamdel with pork-barrel to generate patch
hg qimport -f autopatch.diff && hg qpush && hg qref #make the patch nice and readable
hg qnew manual.diff #create a manual cleanup patch for cosmetic touchups and rewrites screwed up by macros#of course to submit patch for review, those two patches need to be combined
#do manual stuff
hg diff -r 19277 #produce a combined diff without loosing ability to edit each applied patch individually! For example, can regenerate the autopatch.diff with different outparamdel parameters or a bugfixed outparamdel. 19277 is a revision that has to be looked up with hg log, unfortunately there isn't a relative syntax to do hg diff -r "tip - 2"

This is a huge improvement over the ad-hoc workflow prior to hg switch when I was prototyping my tools. CVS + quilt don’t hold a candle to a modern revision control system.

Prcheck

I also have been doing another round of prbool corrections. Somehow I didn’t notice that the system stopped working due to the CVS->hg switch. Once again when reviving my nightly checker scripts, it was a pleasure to substitute all of the CVS hacks with mercurial commands.

I am waiting for the few remaining largeish (ie 3-4 bugs per file) patches to land before I can start submitting whackamole bugs with a single prbool correction per module.

It also seems that cairo and every other pre-C99 project have the same set of issues as Mozilla with their typedefed-int boolean types. Perhaps prcheck isn’t mozilla-specific at all.


08
Sep 08

MUST_FLOW_THROUGH(“label”)

Some time ago, Igor mentioned that there is code in SpiderMonkey that pleads to the programmer that from a certain point in a function code must flow through a label(ie a finalizer block). Treehydra made it to possible to turn that weak plea into an error message when static checking is enabled. See the bug for more details. My favourite static analyses are all about turning informal “gurantees” into angry compiler complaints.

This is my first static analysis that landed in the mozilla-central tree. It’s also the simplest one and may be a decent starting point for solving similar problems. I’d be cool to see this particular feature utilized outside of SpiderMonkey. Unlike human-powered code-inspection, it excels at finding accidental early returns covered up by macros.


03
Sep 08

Error Presentation

Certain other cool open source projects are doing cool static analysis work. In this case, here is an analysis of one of my favourite operating systems projects, DragonFly BSD.

I’m blown away by the clean UI. The error filter and the interleaving of static analysis results in the source code are drool-inducing. This is powered by the clang checker. Clang’s checker doesn’t yet do C++, doesn’t do application-specific checks and has a lot of false positives, but it’s an exciting preview of things to come.

Oh and I hope that DXR will have similar analysis awesomeness. In the future, I hope to see static analysis become almost as common as unit-testing.


02
Sep 08

Converging Elsa Strains

One of the purposes of this blog is to inform people that while the original Elsa author is no longer actively developing it, Elsa is being used in production at Mozilla and is actively maintained within Pork.

Recently two previously unknown to me Elsa forks have come to my attention via comments on my blog. Both of these are extrimely cool and something we have been wanting:

  • ellcc C (and soon C++) compiler via Elsa + LLVM. I’ve heard of attempts to get this to work before, but this looks like it is much further along than similar efforts.
  • Alex Telia’s souped up elsa with parser error recovery and an integrated C preprocessor among other awesomeness. See this comment for more details. Some of these tools are built on this Elsa fork.

Both of these projects are interested in converging on a single codebase. It sounds like Alex’s work will be ready for merging soon.

I love open source.

I’m Back

Some might’ve noticed that I disappeared off the net for two weeks. I have a good excuse: I was getting married.