03
Jan 07

Squashed and Compiled

Squash Milestone Reached
Squash can now produce a patch that squashes my testcase class nsCSSLoaderImpl into the nsICSSLoader interface such that the resulting code compiles, links and runs!

Gory Details
Patching function bodies turned out easier than expected. Since the last post, I’ve added the ability to rewrite variable declarations, casts and static method calls. This was enough to get nsCSSLoader.cpp compiling.

I also ran into an issue where some methods need to remain virtual such that they can be referenced from other modules. I added a -sq-virtual flag to specify method names which need to stay virtual.

I discovered that the implementation class can be used from other source files so now squash can work on multiple files. Unfortunately, this made me run into another Elsa misfeature: memory allocation. Elsa data structures do not attempt to clean up in their destructors. Once an AST is produced, it will remain in memory for the duration of execution. This is an issue, because merely parsing all of the .i files in layout/style/ takes over 600M of memory even though squash is strictly sequential and processes a single file at a time. Hopefully, converting Elsa to use auto_ptr is feasible and I wont have to resort to funny fork() tricks to reclaim memory.

ML vs C++ for Compilers: Rant
I wonder why people insist on using C++ for symbolic manipulations instead of an *ml like O’Caml and either give up or, more frequently, reinvent features such as the ml type system, list processing, garbage collection or pattern matching. Isn’t it more productive to not have to deal with segfaults, slow compilation times and have a tenfold reduction in code size?

Squash Usage
First I grep the build directory for usage of CSSLoaderImpl. This imprecise and will eventually be handled by squash itself, but first the memory deallocation issue has to be addressed or an index of the whole sourcetree needs to be built.

find -name \*.o | xargs grep CSSLoaderImpl

This returned nsCSSLoader.o and nsLayoutStatics.o . Now .i files are produced by running make in their respective directories

make nsCSSLoader.i
make nsLayoutStatics.i

For convenience I gather the .i files in a moz directory and run squash.

./squash -o-lang GNU_Cplusplus -sq-exclude-include string/nsTString.h -sq-include nsString.h -sq-include nsCOMArray.h -sq-virtual LoadSheetSync -sq-virtual LoadSheet -sq-implementation CSSLoaderImpl moz/nsCSSLoader.i moz/nsLayoutStatics.i > cssloader.patch

Turns out pretty printing C++ is hard and Oink/Elsa’s pretty printer still needs a lot of work. By producing patches and only rewriting part of the code squash rewrites only the code that needs changing. This avoids pretty printer bugs, and maximally preserves comments and the original code structure. The wackyness of the pretty printed code is apparent in the cssloader.patch, especially in the function bodies.

Future Work
I am happy to see that patching is viable even without precise source coordinates or preprocessor support in Elsa. My near term goals for squash are:

  • Push squash upstream
  • Add the ability to translate out-parameters to return values where possible
  • Get a list of candidates for DeCOMtamination and improve squash enough to process all of them
  • Work on a source code indexer. This would be useful to both squash as a semantic grep database and could be used to improve lxr.

In the longer term, I would also like to see some Elsa changes:

  • Figure out a memory de-allocation strategy
  • Resolve the pain that is caused by Elsa having own “string” class which makes using STL an exercise in namespace verbosity. If Elsa were to switch to C++ strings, the above de-allocation job would be simplified too.
  • Elsa is lossy when it comes to C++ extensions: may need to extend the Elsa AST a little
  • It would be nice to improve Elsa memory consumption further. This would be hard.
  • It would be great to make Elsa’s C++ const-correct

05
Dec 06

Static Analysis

Introduction

My name is Taras Glek. I have been tasked with working on static analysis tools to automate deCOMtamination, verify code, etc.

Progress

C++ is a hard language to parse and no static analysis can be done before Mozilla source code be represented as an abstract syntax tree. My first step was to get the Mozilla trunk parsing with Elsa. With a small fix, Elsa now parses all files needed to build Firefox with -DNS_DISABLE_LITERAL_TEMPLATE (Elsa’s template support is incomplete).

Now that code parses we can move to the next step of writing tools that can analyze the Mozilla AST. These would prove certain parts of code correct or do source-to-source transformations such as the planned patch tool. More applications of static analysis here.
DeCOMtamination of the Gecko core is a time consuming task and is the main candidate for static analysis. I started formalizing the broad tasks collectively referred to as DeCOMtamination into an algorithm that could be implemented. I am prototyping the analysis in O’Caml using the Olmar binding to Elsa. O’Caml is a great language to this task because it has a lot of features for writing compilers and it allows me to load and traverse parts of the AST in an interactive console which greatly speeds up development. I am currently able to process class declarations, graph the class hierarchy into a giant pseudo-UML class diagram and am working on an ability to move class members up from the implementation class into the interface class while adding forward declarations, etc.

Additionally I am involved in garbage collection related analysis that prevents unsafe uses of the garbage collection API and improving LXR precision and functionality via feeding it data from Elsa/Olmar.

Update: This mini-presentation should clarify some of the reasons why we are doing deCOMtamination.