Lossy AST Traversals for Good Code Health

Two weeks ago I finally listened to Graydon and spent some quality time playing with UNO and reading the paper on it. UNO provides a simple DSL language to traverse interesting parts of the C abstract syntax tree. It simplifies the AST down to the bare minimum of variable and function call info and iterates user-defined scripts over that.

Since I was looking for a way to get away from C++ for code analysis I was very excited and decided to implement the ideas in UNO in my own tool which goes by a temporarily name of “dos”. There limitations in UNO: it has no hope of parsing C++ without switching parsers and the DSL is rather limited. Thus, dos is built on the incredible Elsa/Oink C/C++ parser and uses SpiderMonkey to provide JavaScript as a scripting language.

Status

  • dos builds a Control Flow Graph from the Elsa AST. It isn’t finished and while for/while loops, break/continue/return and if statements are supported (as opposed to the trinary operator or goto), but the CFG isn’t quite right yet. However, it’s good enough for proof of concept purposes.
  • Dos does not do DFS traversal when iterating the script like UNO, but instead iterates through statements sequentially because I did not feel like writing my own traversal of the massive Elsa AST.
  • I mostly implemented an UNO-compatibility JS lib in system.js which should allow for easy ports of UNO analyses like malloc.js. It’s missing the negation conditions from UNO, but those should be quick to implement.

Instructions

  1. Download my oink tarball with the squash & dos additions.
  2. Install ossp spidermonkey distribution:
    cd dos_and_squash-0.0/js-1.6.20060820 ; configure –prefix=/usr (or use /usr/include) && make install
  3. Build the oink suite:
    cd ../oink ; ./configure && make
  4. I ported a simple UNO analysis which checks that malloc() is always paired with a corresponding free() statement. To run it:
    ./dos -dos-javascript malloc.js simple.cpp
    Now modify line 12 of simple.cpp by putting ; after the while statement. The error will be detected:
    func:test
    Error at simple.cpp:13: malloc without free

Writing Scripts

Take a look at simple.js for a barebone analysis. There should be at least 2 functions in every script uno_check(vars, state) and path_end(state). uno_check() is called for every statement in a control flow path. vars is an array of interesting variables and function calls in the current AST statement and state is where the script should store all state info. state gets copied on CFG block transitions with an optionally user-provided clone() function (see system.js for an example). path_end() is called when the end of the CFG path is reached.

./dos -dos-javascript simple.js simple.cpp # this runs the user script

UNO-compatibility

UNO and the corresponding paper on it is currently the best documentation for dos. Thus I also assign vars and state to the global object to achieve UNO-like scripting feel. See malloc.js and system.js for details.

Goals

  • My foremost goal is to find a better name for dos. I’m open to suggestions that are both humorous and somehow related to source analysis.
  • Easy: finish UNO compatibility functions. I’ll do that as I port more scripts, but I wont mind if anyone does it for me.
  • Improve the CFG, do some value analysis to reduce the graph and provide more info to the scripts
  • Easy: Add types to the variable info, should get dos closer to doing the GC safety-analysis.
  • Reuse the JS parser to build a JS CFG to allow the same kind of analyses for JavaScript. This would be great for verifying API usage in extensions. For bonus points one can allow the user to tie the JS and C/C++ analysis together for cross-language analyses.
  • UNO has a fairly complicated global variable analyses that is not very scriptable and thus boring. I would be interested in scripting intra-procedural analyses to figure out call-graphs, do some sort of behavior inference (ie figure out indirect malloc calls, certain other heap modifying behaviors and propagate them as attributes to function calls), etc.

Update: New name for dos -> DeHydra

The tool is now named DeHydra since it is supposed identify bugs in the multi-headed Control Flow Graph monster.

Comments are closed.