Jul 07

Superity Complex & Static Analysis

It is always frustrating to see a compiler complain about something trivial. It is especially annoying since most of the trivial complaints are trivial to fix automatically (eg. superfluous semicolon).

I think this is a bigger problem in the static analysis industry. Vendors/researchers ship their tools with a superiority complex built-in. Most of the error messages produced by error checking tools can be paraphrased as “Gee look, I found some trivial to fix bugs in your code, but I ain’t gonna do nothing about them! Neeener! Go worker-human!”

My policy is to make my tools more polite than that. Starting from prcheck, all of my tools will point out simple errors by suggesting patches (when possible). It is impossible to produce a correct patch every time, but I am not worried about that since developers are quite good at disregarding stupid suggestions.

Automatic Whining

Now I have a few extra scripts that lay the foundation for regular code inspection via static analysis. PRBool checks are my first step. Here is a sample email:

Continue reading →

Jul 07

CPP-Aware C++ Rewriting Can Be Fun

Recently MCPP started working well enough to process all of Mozilla with the special macro-expansion undoing markup. Below is the most exciting patch line I have produced so far. A few months ago, I didn’t think was possible to rewrite macro parameters with elsa.

- ENCODE_SIMPLE_ARRAY(PRBool, Bool, (PRUint16) values[i]);
+ ENCODE_SIMPLE_ARRAY(PRBool, Bool, 0 != ((PRUint16) values[i]));

To produce this:

  1. Elsa consumed the expanded .ii file
  2. Boolean checker found the problematic subexpression
  3. Original source position was calculated from the expansion log
  4. Then prcheck figured out that only the macro parameter needs to be rewritten.
  5. Original source was read in and enclosed within 0 != ()

Integrating Static Checks – PRBool check

This bug contains the rest of the rewriting results. Prcheck checks every assignment to a prbool and outputs a patch if the expression may evaluate to anything other than 0/1. A message is produced if a patch can not be generated because the expression is within a macro.

The goal is to have prcheck run on every commit(or every day) and send out an email or another form of a complaint when a new prbool violation is introduced. Since the current prbool misuses aren’t going to be fixed overnight, there should be a db kept somewhere with the old misuses to prevent the new ones from being introduced and old ones from being annoying. I’m thinking of associating misuse counts with a filename, so when the number goes up an error gets mailed out. Anyone get a better idea?

Latest Recipe to Running Oink Tools

  1. If you want to rewrite sources either wait for the next mcpp release or use my tarball.
  2. Install gcc 3.4.x. If using mcpp, configure mcpp with –enable-replace-system so it replaces the gcc preprocessor
  3. Checkout and build my oink version.
  4. Build mozilla with -save-temps gcc option. If using mcpp set environment variables CC=”gcc -Wp,-K -Wp,-W0 -save-temps ” and CXX=”g++ -Wp,-K -Wp,-W0 -save-temps ” and do make -f client.mk build as usual. The extra gcc parameters tell it to save intermediate files to pass -K and -W0 to enable macro expansion annotation and silence warnings.
  5. Run tools individual files
  6. Produce a global patch with ./pork-barrel 4 /path/to/prcheck /tmp/input.txt where input.txt is a list of absolute filenames of all of the .i and .ii files produced by the build process

Jul 07

Pondering Pre/Post Conditions to Enforce Software Correctness

That other software company has a some brilliant people working on verifying their source code. Their approach is hardcore and seems quite good. Their recipe to robust C# (applying this to C is research-in-progress, C++ isn’t yet considered) code is:

  • sprinkle pre/post-conditions for functions. Reduce annotation labour by additional rule inference
  • dataflow analysis to verify type-system stuff like not-null annotations
  • Abstract interpretation + model checker to enforce pre/post-conditions
  • BoogiePL intermediate language. Anything can be accomplished by an awesomely named intermediate language!

Resulting code runs fast has a lot fewer bugs than code without all this fancy machinery. Unfortunately, the benefits can only be reaped if a whole program gets annotated. Otherwise, the model checker gets frustrated. Essentially the language being checked is turned into another much safer (more boogie!) language. I’m looking forward to the day (that will never come) when all new C code is written this way.

If adding these to source code makes you rewrite the whole program, wouldn’t it be easier to invent another low-level language that comes with safety by default?

I am guessing that the dehydra method of checking for a relatively small number of application specific bugs is the most practical approach available and it will get us 60% towards Boogieing. Now that elsa has support for preprocessor macros it should be possible to specify non-formal pre/post-conditions to check for common errors. It’s merely a complicated question of defining what should be checked for.

I don’t have the motivation to write/buy/adopt/integrate a formal tool like a model checker, but it seems that it’s possible to write application specific checks in JavaScript that can run circles around what most model checkers can do.

That other company isn’t the only vendor working on this stuff. There brilliant people taking a somewhat different (and open source) approach.

Jul 07

Dehydra, prcheck, squash – in mercurial

New Repository

Since I do not yet have write access to oink svn, I have been doing all of my development in ad-hoc repositories within the svn checkout. This made it rather hard to collaborate with others. I finally got sick of the situation (and stumbled upon hgsvn) and converted all 11 svn repositories to mercurial. To my surprise, mercurial even let me merge my repositories while preserving history (hg has yet to fail me!).

oink uses svn-externals to aggregate the repositories into a single checkout. hg doesn’t have anything similar, so to checkout all 11 repositories use a script:

checkout.sh http://hg.mozilla.org

Released Differences from Oink Mainline

  • New oink tool – prcheck: ensures that bool-like integer typedefs behave like bools
  • New oink tool – dehydra: source query tool with queries specified in JavaScript
  • New oink tool – squash: source refactoring tool. This is now deprecated since most of the code in it dealt with working around elsa limitations to do with macro expansion & lack of precise locations. The patching engine used in squash lives on to provide a simple refactoring API for use in other tools (like prcheck).
  • Minor grammar changes to parse more of Mozilla
  • Compilation fixes for OSX
  • Elsa fixes to parse OSX headers
  • make -j support for elsa
  • end-of-ast-node location support for elkhound & elsa
  • preprocessor expansion markup support for elsa

Coming Soon

  • Amazing new version of MCPP capable of preprocessing mozilla while outputting refactoring-friendly annotations.
  • Web front-end for squash which will likely be refactored to be tool-agnostic.
  • Front-end to run patch-producing tools in parallel for multi-core machines

Near Future

  • squash will be split up into a library with each major feature ripped out into a standalone tool. Two tools coming soon:outparam rewriter & class member renamer.
  • RAD for static analysis: oink tool templates to make it trivial to write custom new tools with minimal amount of boilerplate

Some time in the Future

  • Collaboration with the author of Olmar to provide an OCaml API for Elsa. If everything goes as expected it will be possible to write analyses that are more powerful and more concise than DeHydra ones except they will perform at C/C++ speeds. Plus it should be possible to perform them from a native interactive OCaml toplevel. Most of this work already exists in bits and pieces. It’s a matter of adding some AST transformations, fixing a few issues and tying it all together.
  • MapReduce inspired front-end: generic framework for executing transformations/analyses in-parallel and Mozilla-wide without blowing the 32bit address space (as it typical when static analysis tools meet Mozilla).