28
Feb 07

WebSquash

Looking for developers to test the web frontend for squash
I got the web frontend to squash working. Right now I’m looking for people to test it on my test server before I open it to the wild web. It ended up in a further frontend script explosion, but all of the pieces seem to make sense. As it stands right now there are 5 pieces:

  1. JavaScript client-side provides progress notification
  2. A PHP frontend to communicate with the stateful server
  3. Python server that handles command queuing, progress reporting and error handling
  4. Python library to build a list of possible candidates for squashing, produce the necessary .i files and an invocation command from squash
  5. Squash: the friendly neighborhood class member renamer

Passion of CPP: Macros are Considered Painful

In the process of testing the web frontend I updated the Mozilla sourcecode only to notice that Elsa can no longer parse files for tasks that worked before. At first I got a little discouraged thinking that I’ll have to teach Elkhound about yet another obscure C++ feature that wasn’t handled correctly before. However, turned out that in one case I was feeding squash a file that didn’t even compile and in the other 2 cases CPP was messing with my head.

The first case was the magic of CPP leading to unintentional code duplication and squash confusion: PR_MAX(GetPresContext()->PointsToAppUnits(0.5f), onePixel)
gets expanded and parsed as
GetPresContext()->PointsToAppUnits(0.5f) ? GetPresContext()->PointsToAppUnits(0.5f) : onePixel

I ended up putting in a special case teaching squash to not get upset if it can only find one of the two instances of class member to replace when PR_MAX is involved.

The second case was exciting. In my innocent perception of CPP wonder I thought that running g++ on a .cpp or a .i file produced from the said .cpp would result in pretty similar behavior. Not so.

PR_LOG(gLog, PR_LOG_DEBUG,
("xul: %.5d. %s %s=%s",
-1, // XXX pass in line number
NS_ConvertUTF16toUTF8(extraWhiteSpace).get(),
NS_ConvertUTF16toUTF8(qnameC).get(),
NS_ConvertUTF16toUTF8(valueC).get()));

yields

do { if (((gLog)->level >= (PR_LOG_DEBUG))) { PR_LogPrint ("xul: %.5d. %s %s=%s", -1, // XXX pass in line number NS_ConvertUTF16toUTF8(extraWhiteSpace).get(), NS_ConvertUTF16toUTF8(qnameC).get(), NS_ConvertUTF16toUTF8(valueC).get()); } } while (0);

Here the // comment ends up being promoted to being inside a line due to PR_LOG contracting and the resulting line won’t parse since half of it is commented out.

This kind of CPP mischief leads me to believe that something has got to give. If we are to embrace automated tools to aid in verification and development either CPP use has to be reduced considerably or Elsa needs to get a builtin preprocessor. I suspect the solution to this will involve a mixture of the two approaches.


07
Feb 07

Dotting Pretty Graphs

It is difficult to follow error messages from control-flow analyzing tools. After struggling to visualize what I was debugging, I added a a native JavaScript function to graph the CFG and display the current path through it.

Now debugging control flow errors is as easy as looking at a (sometimes giant) picture of this function. The red represents the current flow and gray indicates flows that the script deemed correct.

Recipe

In this example I am checking that control flows through a particular point in the code (from line 1128 into line 1200). Since dos only deals with variables I added variables that it can match to the AST.

I added DFLOW_START to line 1127, DFLOW_STOP to line 1200 and produced my .i files with make CC=”gcc -DFLOW_START=’{int __flow_start;}’ -DFLOW_STOP=’{int __flow_start=1;}’” jsarray.i

Then I ran dos with the tiny analysis script: ./dos -dos-javascript ensure_out.js -o-lang GNU_C ~/work/ff-build/js/src/jsarray.i

Future Work
This should allow function-local dead code detection. Once dos is mature enough for function-local CFG traversals it will be interesting(but challenging) to try to expand this to detect dead functions or classes in Mozilla.


01
Feb 07

Lossy AST Traversals for Good Code Health

Two weeks ago I finally listened to Graydon and spent some quality time playing with UNO and reading the paper on it. UNO provides a simple DSL language to traverse interesting parts of the C abstract syntax tree. It simplifies the AST down to the bare minimum of variable and function call info and iterates user-defined scripts over that.

Since I was looking for a way to get away from C++ for code analysis I was very excited and decided to implement the ideas in UNO in my own tool which goes by a temporarily name of “dos”. There limitations in UNO: it has no hope of parsing C++ without switching parsers and the DSL is rather limited. Thus, dos is built on the incredible Elsa/Oink C/C++ parser and uses SpiderMonkey to provide JavaScript as a scripting language.

Status

  • dos builds a Control Flow Graph from the Elsa AST. It isn’t finished and while for/while loops, break/continue/return and if statements are supported (as opposed to the trinary operator or goto), but the CFG isn’t quite right yet. However, it’s good enough for proof of concept purposes.
  • Dos does not do DFS traversal when iterating the script like UNO, but instead iterates through statements sequentially because I did not feel like writing my own traversal of the massive Elsa AST.
  • I mostly implemented an UNO-compatibility JS lib in system.js which should allow for easy ports of UNO analyses like malloc.js. It’s missing the negation conditions from UNO, but those should be quick to implement.

Instructions

  1. Download my oink tarball with the squash & dos additions.
  2. Install ossp spidermonkey distribution:
    cd dos_and_squash-0.0/js-1.6.20060820 ; configure –prefix=/usr (or use /usr/include) && make install
  3. Build the oink suite:
    cd ../oink ; ./configure && make
  4. I ported a simple UNO analysis which checks that malloc() is always paired with a corresponding free() statement. To run it:
    ./dos -dos-javascript malloc.js simple.cpp
    Now modify line 12 of simple.cpp by putting ; after the while statement. The error will be detected:
    func:test
    Error at simple.cpp:13: malloc without free

Writing Scripts

Take a look at simple.js for a barebone analysis. There should be at least 2 functions in every script uno_check(vars, state) and path_end(state). uno_check() is called for every statement in a control flow path. vars is an array of interesting variables and function calls in the current AST statement and state is where the script should store all state info. state gets copied on CFG block transitions with an optionally user-provided clone() function (see system.js for an example). path_end() is called when the end of the CFG path is reached.

./dos -dos-javascript simple.js simple.cpp # this runs the user script

UNO-compatibility

UNO and the corresponding paper on it is currently the best documentation for dos. Thus I also assign vars and state to the global object to achieve UNO-like scripting feel. See malloc.js and system.js for details.

Goals

  • My foremost goal is to find a better name for dos. I’m open to suggestions that are both humorous and somehow related to source analysis.
  • Easy: finish UNO compatibility functions. I’ll do that as I port more scripts, but I wont mind if anyone does it for me.
  • Improve the CFG, do some value analysis to reduce the graph and provide more info to the scripts
  • Easy: Add types to the variable info, should get dos closer to doing the GC safety-analysis.
  • Reuse the JS parser to build a JS CFG to allow the same kind of analyses for JavaScript. This would be great for verifying API usage in extensions. For bonus points one can allow the user to tie the JS and C/C++ analysis together for cross-language analyses.
  • UNO has a fairly complicated global variable analyses that is not very scriptable and thus boring. I would be interested in scripting intra-procedural analyses to figure out call-graphs, do some sort of behavior inference (ie figure out indirect malloc calls, certain other heap modifying behaviors and propagate them as attributes to function calls), etc.

Update: New name for dos -> DeHydra

The tool is now named DeHydra since it is supposed identify bugs in the multi-headed Control Flow Graph monster.