13
Jul 07

Dehydra, prcheck, squash – in mercurial

New Repository

Since I do not yet have write access to oink svn, I have been doing all of my development in ad-hoc repositories within the svn checkout. This made it rather hard to collaborate with others. I finally got sick of the situation (and stumbled upon hgsvn) and converted all 11 svn repositories to mercurial. To my surprise, mercurial even let me merge my repositories while preserving history (hg has yet to fail me!).

oink uses svn-externals to aggregate the repositories into a single checkout. hg doesn’t have anything similar, so to checkout all 11 repositories use a script:

checkout.sh http://hg.mozilla.org

Released Differences from Oink Mainline

  • New oink tool – prcheck: ensures that bool-like integer typedefs behave like bools
  • New oink tool – dehydra: source query tool with queries specified in JavaScript
  • New oink tool – squash: source refactoring tool. This is now deprecated since most of the code in it dealt with working around elsa limitations to do with macro expansion & lack of precise locations. The patching engine used in squash lives on to provide a simple refactoring API for use in other tools (like prcheck).
  • Minor grammar changes to parse more of Mozilla
  • Compilation fixes for OSX
  • Elsa fixes to parse OSX headers
  • make -j support for elsa
  • end-of-ast-node location support for elkhound & elsa
  • preprocessor expansion markup support for elsa

Coming Soon

  • Amazing new version of MCPP capable of preprocessing mozilla while outputting refactoring-friendly annotations.
  • Web front-end for squash which will likely be refactored to be tool-agnostic.
  • Front-end to run patch-producing tools in parallel for multi-core machines

Near Future

  • squash will be split up into a library with each major feature ripped out into a standalone tool. Two tools coming soon:outparam rewriter & class member renamer.
  • RAD for static analysis: oink tool templates to make it trivial to write custom new tools with minimal amount of boilerplate

Some time in the Future

  • Collaboration with the author of Olmar to provide an OCaml API for Elsa. If everything goes as expected it will be possible to write analyses that are more powerful and more concise than DeHydra ones except they will perform at C/C++ speeds. Plus it should be possible to perform them from a native interactive OCaml toplevel. Most of this work already exists in bits and pieces. It’s a matter of adding some AST transformations, fixing a few issues and tying it all together.
  • MapReduce inspired front-end: generic framework for executing transformations/analyses in-parallel and Mozilla-wide without blowing the 32bit address space (as it typical when static analysis tools meet Mozilla).

12
Jun 07

Undoing CPP Expansion in 3 simple steps. Say “Hello” to easier C++ rewriting.

This is incredibly exciting: I believe that I finally solved the messy and mind-numbingly boring CPP/C++ integration problem! Having code displaced or generated due to CPP-expansion should no longer be a fatal problem for Squash. I believe macro-expansion is (or was) the single biggest problem between me and large-scale automated refactoring of the Mozilla codebase.

What’s even more exciting is that I think my solution is both incredibly simple to implement and more general than prior work. Most other tools combine the CPP expansion & C parsing into a single step and then integrate (or should I say violently shove?) CPP constructs into the AST. This results in complete lack of separation between preprocessing and program analysis. For example, due to this tight coupling existing solutions were useless to me because the fancy CPP logic could not be separated from the C parser. I would also have a hard time submitting a more convoluted C++ parser upstream to the Elsa maintainer.

Design

There are three parts to my solution:

  1. Critical component. A CPP expansion undo-log injected during CPP-expansion by a modified C preprocessor (upcoming version of MCPP). The statements are wrapped in C comments such that the preprocessed result can be parsed by any C/C++/etc parser or compiler. Implementation-wise this is the hardest part since MCPP(as most other C proprocessors) was never designed it keep track of macro expansion info.
  2. A small modification to the Elsa lexer to parse the undo-log and set it aside in a separate data structure.
  3. Tricky. A function that utilizes the cpp undo-log to map the preprocessed source locations to the unpreprocessed ones. This is a a ridiculously simple solution to a tricky design problem of how to efficiently advertise the fact that every AST node has at least 2 different source positions (pre expansion, post expansion & a stack of positions resulting from expanding nested macros).

The MCPP maintainer is almost done with 1. I have a prototype implementation of 2 & 3 weighing in at less than 500lines. Now that the design phase is complete, the amount of changes to Elsa is trivial, so I should be done with those real soon now.

Looking Ahead

Now I need to modify Elsa to retain more precise source locations. This includes adding end-of-ast-node-location and adding positions to nodes(such as expressions) that don’t even have a start position at the moment. This combined with cpp-undo-log enhanced precise positions should allow for code rewrites to retain as much original source code as possible. This reduces the amount of ugly machine-generated code and results in better correctness (existing code is likely to work).

CPP Undo-log Example

The undo-log took a couple of tries to get right. Now macro-parameters have a notion of scope and sensible names. The following example features macro-induced column displacement and macro-expansion causing line shrinkage.

#define NULL 0L
#define FOO(a, b) a + b
int i = NULL; int j;
int k = FOO(
FOO(NULL , 1),
2);

Preprocessed version

# 1 "testcase4.c"
/*mNULL 1:8-1:15*/
/*mFOO 2:8-2:23*/

int i = /*<NULL 3:8-3:12*/0L/*>*/;
# 3 “testcase4.c”
int j;
int k = /*<FOO 4:8-6:3*//*!FOO#0-0 5:0-5:13*//*!FOO#0-1 6:1-6:2*//*<FOO#0-0*//*<FOO*//*!FOO#1-0*//*!FOO#1-1*//*<FOO#1-0*//*<NULL*/0L/*>*//*>*/ + /*<FOO#1-1*/1/*>*//*>*//*>*/ + /*<FOO#0-1*/2/*>*//*>*/;

Conclusion

It took a lot to arrive at such a simple solution. I expect that all of my work is likely to end up upstream in BSD-licensed projects: MCPP & and Elsa/Oink. I sincerely hope that other people will be able to build on it for their CPP-infested analysis needs and avoid the unbearable mind-numbing discomfort associated with making CPP play along.


11
May 07

CPP Strikes Back

I have gotten used to dodging CPP-expansion issues by fudging column & line information until the position info in squash mostly matches the source positions in the original source code. That sufficed for rewriting declarations, but I have finally hit a brick wall. Continue reading →


10
May 07

Nicely rewriting outparams

Automatic code rewriting business can be a little depressing sometimes. I tend to run into funny issues caused by CPP, oink limitations or just unpleasant-to-rewrite parts of C++. After banging my head against the wall due to all these issues I finally arrived at a workable approach for the easy part of the outparam rewrite.

Continue reading →


07
May 07

Status Update: Outparam work

Squash Outparams

The following took me a few days to achieve.

./squash -sq-rewrite-outparams out2.txt -sq-implementation nsBidiPresUtils -sq-no-squash -o-lang GNU_Cplusplus ~/work/ff-build/dom/src/base/nsFocusController.i

where out2.txt contains instructions on which functions to modify

nsFocusController::GetFocusedElement,0=mCurrentElement,

produces

--- /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h
+++ /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h
@@ -72,1 +72,1 @@
- NS_IMETHOD GetFocusedElement(nsIDOMElement** aResult);
+ nsIDOMElement* GetFocusedElement();

This still doesn’t add the already_AddRefed or other important attributes, but that should be easy. The result looks simple, but getting squash from working with a testcase to an actual source file was a little on the painful side.

After my experience with renaming I have realized that squash should avoid the C++ pretty printer for now. Thus the result is produced in a verbose AST-sensitive regexp-like way. However figuring out where things start and end is incredibly painful due to the presence of the preprocessor.

My plan is to get squash rewriting some basic Mozilla code the painful way and then I use what I learned to integrate mcpp along with the much coveted end-of-ast-node info into elsa.

JavaScript is an AST’s Best Friend

Continue reading →


01
May 07

Status Report

Automated Analyses and Rewrites

Dehydra and Squash are now mature enough to assist with mundane tasks like renames and various kinds of tedious code inspection. If you ever suspect that part of the Mozilla hacking you are doing could be done by a tool, contact me to see if I have a suitable tool for you.

Also, these tools are in no way limited to working with Mozilla source code. I would be happy to see people use them for other projects too.

Short-term Plans

For the next week or two I plan to focus on out-parameter rewriting and the Mozilla-wide C++ callgraph.

Mozilla-wide Callgraph
This is proving to be a little painful. Things work for basic test-cases, but I am running into scalability issues with Mozilla (as expected). My current approach of serializing everything into a giant JSON graph blows the 32bit address space after a few hundred files. Even doing a Mozilla-wide inheritance graph causes out of memory errors, but that runs almost to competition. The best solution to this will be to break up the graph into as many smaller JSON files as possible and only load ones that are absolutely required into memory.

The callgraph will be a useful starting point for many other useful analyses (dead code one is going to be lots of fun) and it’s a good test of dehydra’s scalability, but I have suspended work on it for a few days to focus on more productive tasks.

Out-parameter Rewriting

Due to XPCOM, Mozilla getters typically return an error code and a value via an out parameter. This requires checking the error code and likely propagating it at the callsite.

For many places in the code there are performance and aesthetical reasons to stop using error codes. Brendan talks discusses some reasons here. This would be cool stuff, but switching to exceptions isn’t going to happen right away. However, I can already start working on my tools to assist with simpler cases (like nsBidiPresUtils::GetBidiEngine?). I’m focusing on getters that return NS_OK/(some error) and a value and rewriting them to return NULL on error and non-NULL on success. This could be ready in time for Firefox 3. Once I’m done with the tool, I’ll just need someone to help me figure which functions are ok to simplify like that.

I suspended work on out-param rewriting some time ago. It was proving to be too complicated to do within squash. Now that I can use dehydra to verify the control flow graph, things are a lot simpler. Current plan is to have the dehydra script produce a list of candidates for out-param surgery and have squash consume that list and produce the appropriate patches. Currently, the script works for some very simple cases and I am working on the squash side.

Smaller Tasks

  • Sayrer’s uninitialized member analysis: added more complete constructor support to dehyra, wrote a sample script to get sayrer started. Fixed dehydra’s 64bit support. Bug 378763
  • Made some squash-generated patches for bz, helped me find a bug in squash. Bug 378780
  • Pushing squash upstream into oink. This is time consuming because it is a combination of legal and many minor technical issues. Dehydra will follow later.

02
Apr 07

Automated Code Refactoring

Squash

If you are working on any C++ refactoring, especially if it involves function calls, spans multiple files or feels like you need a compiler in your head to help you, drop me a note to see if squash can help. Squash provides a great deal of control over the refactoring process because it is not tied to a particular IDE and can be customized to accommodate for special cases.

On Friday, two squash-produced patches landed:

  1. A 212K patch to rename nsIFrame::GetPresContext to PresContext. It took a couple of minutes to produce a patch for mac & linux, and then some manual labour to complete it so it builds on Windows too. Unfortunately, Microsoft C++ is not yet supported by Oink. Windows-specific code will require magnitudes more of human labour until such support is contributed.
  2. A much simpler patch to calls to remove uses of the deprecated ::Recycle(). This took a few minutes once I added support for renaming global functions to squash.

Dehydra

C++ support in dehydra is coming along splendidly. I started working on cross-function analysis support. Currently my goal is to allow the user to build callgraphs of Mozilla. The first application of that is going to be dead code detection.

In the meantime, contact me if you are looking for patterns in the code that grep wont help with : control flow-sensitive code, type & syntax-aware matching, API misuse, etc. Dehydra can probably help.


28
Feb 07

WebSquash

Looking for developers to test the web frontend for squash
I got the web frontend to squash working. Right now I’m looking for people to test it on my test server before I open it to the wild web. It ended up in a further frontend script explosion, but all of the pieces seem to make sense. As it stands right now there are 5 pieces:

  1. JavaScript client-side provides progress notification
  2. A PHP frontend to communicate with the stateful server
  3. Python server that handles command queuing, progress reporting and error handling
  4. Python library to build a list of possible candidates for squashing, produce the necessary .i files and an invocation command from squash
  5. Squash: the friendly neighborhood class member renamer

Passion of CPP: Macros are Considered Painful

In the process of testing the web frontend I updated the Mozilla sourcecode only to notice that Elsa can no longer parse files for tasks that worked before. At first I got a little discouraged thinking that I’ll have to teach Elkhound about yet another obscure C++ feature that wasn’t handled correctly before. However, turned out that in one case I was feeding squash a file that didn’t even compile and in the other 2 cases CPP was messing with my head.

The first case was the magic of CPP leading to unintentional code duplication and squash confusion: PR_MAX(GetPresContext()->PointsToAppUnits(0.5f), onePixel)
gets expanded and parsed as
GetPresContext()->PointsToAppUnits(0.5f) ? GetPresContext()->PointsToAppUnits(0.5f) : onePixel

I ended up putting in a special case teaching squash to not get upset if it can only find one of the two instances of class member to replace when PR_MAX is involved.

The second case was exciting. In my innocent perception of CPP wonder I thought that running g++ on a .cpp or a .i file produced from the said .cpp would result in pretty similar behavior. Not so.

PR_LOG(gLog, PR_LOG_DEBUG,
("xul: %.5d. %s %s=%s",
-1, // XXX pass in line number
NS_ConvertUTF16toUTF8(extraWhiteSpace).get(),
NS_ConvertUTF16toUTF8(qnameC).get(),
NS_ConvertUTF16toUTF8(valueC).get()));

yields

do { if (((gLog)->level >= (PR_LOG_DEBUG))) { PR_LogPrint ("xul: %.5d. %s %s=%s", -1, // XXX pass in line number NS_ConvertUTF16toUTF8(extraWhiteSpace).get(), NS_ConvertUTF16toUTF8(qnameC).get(), NS_ConvertUTF16toUTF8(valueC).get()); } } while (0);

Here the // comment ends up being promoted to being inside a line due to PR_LOG contracting and the resulting line won’t parse since half of it is commented out.

This kind of CPP mischief leads me to believe that something has got to give. If we are to embrace automated tools to aid in verification and development either CPP use has to be reduced considerably or Elsa needs to get a builtin preprocessor. I suspect the solution to this will involve a mixture of the two approaches.


24
Jan 07

Will Rename Class Members for Food

Squash may now be ready as a class member renaming tool for early adopters. I would like people to use me as a frontend to squash. Email me your requests for renames and I will reply with giant patches. This way squash can be immediately useful. Plus I can fix bugs in squash and figure out actual usecase while I get the frontend set up.Progress

Squash can now produce a good looking 92K patch for renaming nsIFrame::GetPresContext. This means that squash can now correctly traverse 167 files and produce a patch that affects 103 of them. I am going to work on the web frontend next.

Some issues below.

Continue reading →


11
Jan 07

Squash Progress and Plans

Out-param Rewriting Work

Since the last post I worked on rewriting functions that use out-parameters to use return values instead. I got as far as rewriting method definitions and simple call sites, but decided to hold off further work until the rest of squash is more complete.

Squash Development Roadmap
Robert O’Callahan helped me devise a near term roadmap. I am going to focus getting squash to be production quality for member renames and to produce commit-quality patches. An example query would be to rename sIFrame::GetPresContext to nsIFrame::PresContext. This involves a couple of big details:

  • Produce aesthetically pleasing code via text substitution instead of oink pretty printing. The advantage of this is that the original coding style, comments and indentation will all be preserved. This involves reparsing the resulting code to verify correctness (doubles-memory usage & processing time).
  • To produce a complete patch squash needs to process all of the relevant source code. This increases memory usage and processing time linearly. I’ll use grep to narrow down candidates for processing and in the future will use a AST database of mozilla to figure out exactly what needs changing.
  • It is useful to be able to process all interesting source code in one invocation but just processing the layout/generic directory sequentially uses over 2GB of RAM (Elsa’s AST does not support deallocation) and takes 3 minutes on a quad Opteron. So in order to reduce RAM usage and be a trendy multi-core developer I’ll fork() a process for every file and use that for both parallelism and memory cleanup purposes.
  • Develop a web frontend that maintains an up-to-date mozilla source tree and has squash setup on it where one would be able to enter their rename operation and have patch emailed back to them. Rob even had a cool idea to have the user enter a bugzilla id and have the patch automatically attached to that. This will be useful so I don’t have to work so hard on packaging squash and users will get instant gratification. Plus people without quad Opterons will be able to test squash too :)

All that is Milestone 1. After that I’ll work on infrastructure like AST-node-location info, cleaning up pretty printing and defining the exact goal for the next milestone.

Current Status

Over the past 3 days I refactored squash to be able to do renames without having to go through class squashing, etc. I added the ability to rename class members and now it can produce ugly patches for that.

The current workflow to rename nsIFrame::GetPresContext to nsIFrame::PresContext is:

  1. Identify possible targets
    find ~/work/ff-build -name \*.o |xargs grep nsIFrame > /tmp/output.sh
  2. My sed is rusty so I used regexps in Kate to convert resulting lines into something like
    make -C ./layout/generic/ nsSpacerFrame.i
    make -C ./layout/generic/ nsFrameSetFrame.i
    make -C ./layout/generic/ nsBlockFrame.i
  3. Run the script to produce the needed .i files
    . /tmp/output.sh
  4. Grand-finale:
    find ~/work/ff-build/ -name \*.i |time xargs ./squash -o-lang GNU_Cplusplus -sq-implementation nsIFrame -sq-no-squash -sq-rename-member GetPresContext PresContext > nsiframe.diff
    Note that find outputs absolutely filenames which is essensial for squash to resolve relative include files.

The setup and squashing itself is a bit laborious and RAM/CPU intensive and is the reason for a web frontend. I am going to be ecstatic once this all works.