24
Jan 07

Will Rename Class Members for Food

Squash may now be ready for early adopters as a tool for renaming class members. I would like people to use me as a frontend to squash: email me your rename requests and I will reply with giant patches. This way squash can be useful immediately, and I can fix bugs in squash and learn about actual use cases while I get the web frontend set up.

Progress

Squash can now produce a good-looking 92K patch for renaming nsIFrame::GetPresContext. This means that squash can now correctly traverse 167 files and produce a patch that affects 103 of them. I am going to work on the web frontend next.

Some issues below.



11
Jan 07

Squash Progress and Plans

Out-param Rewriting Work

Since the last post I worked on rewriting functions that use out-parameters to use return values instead. I got as far as rewriting method definitions and simple call sites, but decided to hold off further work until the rest of squash is more complete.
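To make the out-parameter rewrite concrete, here is a hand-written before/after sketch. It is not squash output, and the names `Style`, `GetDefaultStyle` and `DefaultStyle` are hypothetical; the "before" function only loosely mirrors the Mozilla `nsresult GetFoo(nsIFoo** aResult)` pattern, using `bool` instead of `nsresult` to stay self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical value type used only for this illustration.
struct Style {
    std::string name;
};

// Before: the result is written through an out-parameter and the return
// value only signals success.
bool GetDefaultStyle(Style* aResult) {
    if (!aResult) return false;  // caller passed no storage
    aResult->name = "default";
    return true;
}

// After: the value is returned directly. This is only safe because the
// function has no real failure case besides a missing out-parameter.
Style DefaultStyle() {
    return Style{"default"};
}

// Call site before the rewrite: a temporary plus an explicit check.
std::string NameViaOutParam() {
    Style s;
    if (!GetDefaultStyle(&s)) return "";
    return s.name;
}

// Call site after the rewrite: a single expression.
std::string NameViaReturnValue() {
    return DefaultStyle().name;
}
```

Both rewrites (the definition and the call site) have to happen together, which is why the rewrite only applies "where possible": a method whose return code carries real error information cannot simply drop it.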

Squash Development Roadmap
Robert O’Callahan helped me devise a near-term roadmap. I am going to focus on getting squash to production quality for member renames and on producing commit-quality patches. An example query would be to rename nsIFrame::GetPresContext to nsIFrame::PresContext. This involves a few big pieces of work:

  • Produce aesthetically pleasing code via text substitution instead of Oink pretty printing. The advantage is that the original coding style, comments and indentation are all preserved. The resulting code has to be reparsed to verify correctness, which doubles memory usage and processing time.
  • To produce a complete patch, squash needs to process all of the relevant source code, so memory usage and processing time grow linearly with the amount of code processed. I’ll use grep to narrow down the candidates for processing, and in the future I will use an AST database of Mozilla to figure out exactly what needs changing.
  • It is useful to be able to process all interesting source code in one invocation, but processing just the layout/generic directory sequentially uses over 2GB of RAM (Elsa’s AST does not support deallocation) and takes 3 minutes on a quad Opteron. So, to reduce RAM usage and be a trendy multi-core developer, I’ll fork() a process for every file and use that for both parallelism and memory cleanup.
  • Develop a web frontend that maintains an up-to-date Mozilla source tree with squash set up on it, where one could enter a rename operation and have the patch emailed back. Rob even had a cool idea to have the user enter a Bugzilla bug ID and have the patch automatically attached to that bug. This way I won’t have to work so hard on packaging squash, and users will get instant gratification. Plus, people without quad Opterons will be able to test squash too :)
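The fork()-per-file plan can be sketched as follows, assuming POSIX `fork()`/`waitpid()`. `RunPerFile` and its `work` callback are hypothetical names standing in for "run squash on one .i file"; the point is that each child's never-freed AST is reclaimed by the OS when the child exits, while up to `maxJobs` children use the extra cores.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` on every file in its own child process, with at most `maxJobs`
// children alive at once. Memory a child allocates (for example an AST that
// is never freed) goes back to the OS when that child exits.
// Returns how many files failed (non-zero exit, crash, or failed fork()).
int RunPerFile(const std::vector<std::string>& files,
               const std::function<int(const std::string&)>& work,
               int maxJobs) {
    assert(maxJobs > 0);
    int running = 0;
    int failures = 0;

    // Waits for any one child and records its result. Returns false if
    // there was no child left to wait for.
    auto reapOne = [&]() -> bool {
        int status = 0;
        pid_t pid;
        do {
            pid = waitpid(-1, &status, 0);
        } while (pid < 0 && errno == EINTR);
        if (pid < 0) return false;
        --running;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) ++failures;
        return true;
    };

    for (const std::string& file : files) {
        // Throttle: wait for a free slot before starting another child.
        if (running == maxJobs && !reapOne()) running = 0;
        pid_t pid = fork();
        if (pid == 0) {
            // Child: handle one file. _exit() skips atexit handlers and does
            // not flush stdio buffers, so `work` must flush its own output.
            _exit(work(file));
        } else if (pid < 0) {
            ++failures;  // could not start a child for this file
        } else {
            ++running;
        }
    }
    // Wait for the remaining children.
    while (running > 0 && reapOne()) {
    }
    return failures;
}
```

One design consequence: squash currently writes its patch to stdout, so parallel children would interleave their output. Having each child write its own per-file diff and concatenating them afterwards is one way around that (an assumption, not something described above).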

All of that is Milestone 1. After that I’ll work on infrastructure such as AST node location info, cleaning up pretty printing and defining the exact goal for the next milestone.

Current Status

Over the past 3 days I refactored squash so that it can do renames without going through class squashing, etc. I added the ability to rename class members, and it can now produce ugly patches for that.

The current workflow to rename nsIFrame::GetPresContext to nsIFrame::PresContext is:

  1. Identify possible targets
    find ~/work/ff-build -name \*.o |xargs grep nsIFrame > /tmp/output.sh
  2. My sed is rusty, so I used regexps in Kate to convert the resulting lines into something like:
    make -C ./layout/generic/ nsSpacerFrame.i
    make -C ./layout/generic/ nsFrameSetFrame.i
    make -C ./layout/generic/ nsBlockFrame.i
  3. Run the script to produce the needed .i files
    . /tmp/output.sh
  4. Grand finale:
    find ~/work/ff-build/ -name \*.i |time xargs ./squash -o-lang GNU_Cplusplus -sq-implementation nsIFrame -sq-no-squash -sq-rename-member GetPresContext PresContext > nsiframe.diff
    Note that find outputs absolute filenames, which is essential for squash to resolve relative include files.
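Step 2 (the Kate regexps) could be scripted. The sketch below assumes the candidate list is produced with `grep -l`, which prints only the names of matching files, so each input line is just an object file path. The functions `MakeCommandFor`, `EmitMakeScript` and `MakeScriptFor` are hypothetical helpers, not part of squash.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Maps an object file path such as "./layout/generic/nsBlockFrame.o" to the
// make command that builds its preprocessed twin:
// "make -C ./layout/generic/ nsBlockFrame.i". Returns "" for non-.o paths.
std::string MakeCommandFor(const std::string& objPath) {
    const std::string ext = ".o";
    if (objPath.size() <= ext.size() ||
        objPath.compare(objPath.size() - ext.size(), ext.size(), ext) != 0) {
        return "";
    }
    const std::string::size_type slash = objPath.rfind('/');
    const std::string dir =
        slash == std::string::npos ? "./" : objPath.substr(0, slash + 1);
    std::string base =
        slash == std::string::npos ? objPath : objPath.substr(slash + 1);
    base.replace(base.size() - ext.size(), ext.size(), ".i");
    return "make -C " + dir + " " + base;
}

// Reads one path per line (e.g. the output of `grep -l`) and writes one
// make command per .o file, skipping anything else.
void EmitMakeScript(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        const std::string cmd = MakeCommandFor(line);
        if (!cmd.empty()) out << cmd << '\n';
    }
}

// Convenience wrapper for a whole path list held in memory.
std::string MakeScriptFor(const std::string& pathList) {
    std::istringstream in(pathList);
    std::ostringstream out;
    EmitMakeScript(in, out);
    return out.str();
}
```

With a one-line `main` that calls `EmitMakeScript(std::cin, std::cout)` compiled into a hypothetical `mkscript` binary, steps 1 and 2 would collapse into `find ~/work/ff-build -name \*.o | xargs grep -l nsIFrame | ./mkscript > /tmp/output.sh`.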

The setup and the squashing itself are a bit laborious and RAM/CPU intensive, which is the reason for a web frontend. I am going to be ecstatic once this all works.


03
Jan 07

Squashed and Compiled

Squash Milestone Reached
Squash can now produce a patch that squashes my testcase class nsCSSLoaderImpl into the nsICSSLoader interface such that the resulting code compiles, links and runs!

Gory Details
Patching function bodies turned out to be easier than expected. Since the last post, I’ve added the ability to rewrite variable declarations, casts and static method calls. This was enough to get nsCSSLoader.cpp compiling.

I also ran into an issue where some methods need to remain virtual such that they can be referenced from other modules. I added a -sq-virtual flag to specify method names which need to stay virtual.
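The following hand-written illustration shows the general shape of "squashing" an implementation class into its interface. It is not squash's actual output, and `ILoader`, `LoaderImpl` and `Loader` are hypothetical stand-ins for nsICSSLoader and CSSLoaderImpl. `Count()` becomes an ordinary method, while `LoadSheet()` stays virtual, mimicking a method passed to -sq-virtual because it must remain reachable from other modules.

```cpp
#include <cassert>
#include <string>

// Before squashing: an abstract interface with a single implementation.
namespace before {
struct ILoader {
    virtual ~ILoader() {}
    virtual std::string LoadSheet(const std::string& url) = 0;
    virtual int Count() const = 0;
};

class LoaderImpl : public ILoader {
public:
    std::string LoadSheet(const std::string& url) override {
        ++mCount;
        return "sheet:" + url;
    }
    int Count() const override { return mCount; }
private:
    int mCount = 0;
};
}  // namespace before

// After squashing: the implementation is folded into the former interface.
// LoadSheet() keeps its vtable slot (as with -sq-virtual); Count() does not.
namespace after {
class Loader {
public:
    virtual ~Loader() {}
    virtual std::string LoadSheet(const std::string& url) {
        ++mCount;
        return "sheet:" + url;
    }
    int Count() const { return mCount; }
private:
    int mCount = 0;
};
}  // namespace after

// Call site before: a declaration of the implementation type plus a cast.
int UseBefore() {
    before::LoaderImpl impl;
    before::ILoader* loader = static_cast<before::ILoader*>(&impl);
    loader->LoadSheet("a.css");
    return loader->Count();
}

// Call site after: the declaration and cast collapse into one type.
int UseAfter() {
    after::Loader loader;
    loader.LoadSheet("a.css");
    return loader.Count();
}
```

The call sites show why rewriting variable declarations and casts was needed to get the squashed code compiling: every mention of the implementation type has to be redirected to the merged class.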

I discovered that the implementation class can be used from other source files, so squash can now work on multiple files. Unfortunately, this made me run into another Elsa misfeature: memory allocation. Elsa data structures do not attempt to clean up in their destructors, so once an AST is produced it remains in memory for the rest of the run. This is an issue because merely parsing all of the .i files in layout/style/ takes over 600M of memory, even though squash is strictly sequential and processes a single file at a time. Hopefully, converting Elsa to use auto_ptr is feasible and I won’t have to resort to funny fork() tricks to reclaim memory.
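Here is a toy sketch of what owning-pointer cleanup looks like. It uses `std::unique_ptr` rather than `auto_ptr`: `auto_ptr` was deprecated in C++11 and removed in C++17, and it cannot safely be stored in standard containers, whereas `unique_ptr` can. The `Node` type and `gLiveNodes` counter are hypothetical and do not reflect Elsa's actual AST classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Counts live nodes so the example can show that the whole tree is freed.
static int gLiveNodes = 0;

// A toy AST node that owns its children. Destroying the root runs the
// destructors recursively and releases every node in the tree.
struct Node {
    std::string kind;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string k) : kind(std::move(k)) { ++gLiveNodes; }
    ~Node() { --gLiveNodes; }

    // Creates a child owned by this node and returns a non-owning pointer.
    Node* Add(std::string k) {
        children.push_back(std::make_unique<Node>(std::move(k)));
        return children.back().get();
    }
};

// Builds a small tree for one "file" and drops it on return, the way a
// sequential tool would like to discard each file's AST.
int ParseAndDiscard() {
    auto root = std::make_unique<Node>("TranslationUnit");
    Node* fn = root->Add("Function");
    fn->Add("Declaration");
    fn->Add("Return");
    return gLiveNodes;  // counted while the tree still exists
}  // root is destroyed here, freeing all four nodes
```

One caveat of this design: destruction is recursive, so a very deep tree could exhaust the stack, which a real conversion would have to keep in mind.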

ML vs C++ for Compilers: Rant
I wonder why people insist on using C++ for symbolic manipulation instead of an ML like O’Caml, and then either give up or, more frequently, reinvent features such as the ML type system, list processing, garbage collection or pattern matching. Isn’t it more productive to avoid segfaults and slow compilation times and to get a tenfold reduction in code size?

Squash Usage
First I grep the build directory for usage of CSSLoaderImpl. This is imprecise and will eventually be handled by squash itself, but first either the memory deallocation issue has to be addressed or an index of the whole source tree needs to be built.

find -name \*.o | xargs grep CSSLoaderImpl

This returned nsCSSLoader.o and nsLayoutStatics.o. Next, .i files are produced by running make in their respective directories:

make nsCSSLoader.i
make nsLayoutStatics.i

For convenience I gather the .i files in a moz directory and run squash.

./squash -o-lang GNU_Cplusplus -sq-exclude-include string/nsTString.h -sq-include nsString.h -sq-include nsCOMArray.h -sq-virtual LoadSheetSync -sq-virtual LoadSheet -sq-implementation CSSLoaderImpl moz/nsCSSLoader.i moz/nsLayoutStatics.i > cssloader.patch

It turns out pretty printing C++ is hard, and Oink/Elsa’s pretty printer still needs a lot of work. By producing patches, squash rewrites only the code that needs changing. This avoids pretty printer bugs and maximally preserves comments and the original code structure. The wackiness of the pretty-printed code is apparent in cssloader.patch, especially in the function bodies.
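The core of patching by text substitution can be sketched as a list of (offset, length, replacement) edits applied to the original text. `Edit` and `ApplyEdits` are hypothetical names. In a real tool the offsets would come from AST node locations after semantic analysis; the test below finds them with plain string search only to exercise the function, which is exactly the imprecision squash avoids. The reparse-to-verify step is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One textual replacement: swap `length` bytes starting at `offset`
// in the original source for `replacement`.
struct Edit {
    std::size_t offset;
    std::size_t length;
    std::string replacement;
};

// Applies non-overlapping edits to `source`. Edits are applied from the end
// of the text backwards so earlier offsets stay valid; everything outside
// the edited ranges (comments, indentation, macros) is copied through as-is.
std::string ApplyEdits(std::string source, std::vector<Edit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Edit& a, const Edit& b) { return a.offset > b.offset; });
    for (std::size_t i = 0; i < edits.size(); ++i) {
        const Edit& e = edits[i];
        assert(e.offset + e.length <= source.size());
        // Sorted descending: this edit must end before the previous one starts.
        assert(i == 0 || e.offset + e.length <= edits[i - 1].offset);
        source.replace(e.offset, e.length, e.replacement);
    }
    return source;
}
```

The rewritten text can then be compared with the original using `diff -u` to produce the patch itself.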

Future Work
I am happy to see that patching is viable even without precise source coordinates or preprocessor support in Elsa. My near term goals for squash are:

  • Push squash upstream
  • Add the ability to translate out-parameters to return values where possible
  • Get a list of candidates for DeCOMtamination and improve squash enough to process all of them
  • Work on a source code indexer. It would be useful to squash as a semantic grep database and could also be used to improve lxr.
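A semantic grep database could start as small as a map from fully qualified names to use sites. `SymbolIndex` and `Location` are hypothetical; the assumption is that entries would be filled in from the AST after name lookup, which is what would let it replace the imprecise grep step used to pick candidate files.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A place where a symbol is referenced.
struct Location {
    std::string file;
    unsigned line;
};

// Minimal symbol index: fully qualified name -> every recorded use.
class SymbolIndex {
public:
    void AddUse(const std::string& qualifiedName, const Location& loc) {
        mUses[qualifiedName].push_back(loc);
    }

    // Returns the files that use the symbol, sorted and each listed once.
    std::vector<std::string> FilesUsing(const std::string& qualifiedName) const {
        std::set<std::string> files;
        auto it = mUses.find(qualifiedName);
        if (it != mUses.end()) {
            for (const Location& loc : it->second) files.insert(loc.file);
        }
        return std::vector<std::string>(files.begin(), files.end());
    }

private:
    std::map<std::string, std::vector<Location>> mUses;
};
```

Querying `FilesUsing("nsIFrame::GetPresContext")` would then give exactly the files a rename has to touch, instead of every object file that merely mentions nsIFrame.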

In the longer term, I would also like to see some Elsa changes:

  • Figure out a memory de-allocation strategy
  • Resolve the pain caused by Elsa having its own “string” class, which makes using the STL an exercise in namespace verbosity. If Elsa were to switch to C++ strings, the de-allocation job above would be simplified too.
  • Elsa is lossy when it comes to C++ extensions, so the Elsa AST may need to be extended a little
  • It would be nice to improve Elsa memory consumption further. This would be hard.
  • It would be great to make Elsa’s C++ const-correct