21
Dec 06

Oink, Meet Squash

Progress

I spent some time proving to myself that it is possible to automate DeCOMtamination. The result is squash – a tool that aims to accomplish the first step of DeCOMtamination. My near term goals are to be able to squash together a real life XPCOM interface and implementation such that:

  1. The resulting code compiles
  2. Firefox runs correctly
  3. Simple functions using out parameters are converted to use return values instead

Currently I am approximately halfway through with first requirement. In its current form squash lives as a patch to the oink suite of tools.

Usage

  • Pick a simple XPCOM implementation class to squash. I like CSSLoaderImpl.
  • Produce a .i file because Oink/Elsa do not have an integrated preprocessor yet. make CXX="gcc -E" nsCSSLoader.o; mv nsCSSLoader.o nsCSSLoader.i
  • Squash it: ./squash -o-lang GNU_Cplusplus -sq-implementation CSSLoaderImpl nsCSSLoader.i -sq-exclude-include string/nsTString.h -sq-include "nsString.h" > /tmp/cssloader.patch > cssloader.patch
  • Apply the patch: patch -p0 < cssloader.patch
  • Run make, deduce the problem with the patch and start over

Current functionality at the header level:

  • Class member merging
    • virtual interface members are replaced with non-virtual ones from implementation
    • Class members and their parameters are renamed: eg s/CSSLoaderImpl/nsICSSLoader/
  • Additional members from the implementation are added to the interface
  • Extra class/struct definitions used by the implementation members are moved up into the header as needed
  • Header and forward declaration inference
    • The resulting class will usually not compile because more headers are needed. Squash computes the set difference of class definitions as used by the implementation and the interface. Part of the list ends up as forwad declarations, the other part is translated futher into #include statements.
    • The above is trickly algorithm to fully implement fully, so in the meantime there are also manual overrides with -sq-include and -sq-exclude-include

At the source level squash renames the class part of function definitions along with their arguments.

Results

Sample patch output.

Oink patch.

Challenges and Limitations

I am really grateful for Elsa’s ability to parse C++ as used in Mozilla. I think this opens a lot of doors to automation, optimizations and new features that would be too laborious to even consider implementing otherwise. However Elsa still has some maturing to do. We loose typedef information, pretty-printing is still spotty and looks too different from parsed C++, but the biggest challenge is the lack of an “end-of-ASTNode” position information. All these will be addressed in the future, but in the meantime I have to make unfortunate choices of either getting distracted and working on Elsa/Oink internals or do work-around and continue with writing tools.
My short term goal is to finish step 1 and to get squash into the oink SVN.


05
Dec 06

Static Analysis

Introduction

My name is Taras Glek. I have been tasked with working on static analysis tools to automate deCOMtamination, verify code, etc.

Progress

C++ is a hard language to parse and no static analysis can be done before Mozilla source code be represented as an abstract syntax tree. My first step was to get the Mozilla trunk parsing with Elsa. With a small fix, Elsa now parses all files needed to build Firefox with -DNS_DISABLE_LITERAL_TEMPLATE (Elsa’s template support is incomplete).

Now that code parses we can move to the next step of writing tools that can analyze the Mozilla AST. These would prove certain parts of code correct or do source-to-source transformations such as the planned patch tool. More applications of static analysis here.
DeCOMtamination of the Gecko core is a time consuming task and is the main candidate for static analysis. I started formalizing the broad tasks collectively referred to as DeCOMtamination into an algorithm that could be implemented. I am prototyping the analysis in O’Caml using the Olmar binding to Elsa. O’Caml is a great language to this task because it has a lot of features for writing compilers and it allows me to load and traverse parts of the AST in an interactive console which greatly speeds up development. I am currently able to process class declarations, graph the class hierarchy into a giant pseudo-UML class diagram and am working on an ability to move class members up from the implementation class into the interface class while adding forward declarations, etc.

Additionally I am involved in garbage collection related analysis that prevents unsafe uses of the garbage collection API and improving LXR precision and functionality via feeding it data from Elsa/Olmar.

Update: This mini-presentation should clarify some of the reasons why we are doing deCOMtamination.