Squashed and Compiled

Squash Milestone Reached
Squash can now produce a patch that squashes my testcase class nsCSSLoaderImpl into the nsICSSLoader interface such that the resulting code compiles, links and runs!

Gory Details
Patching function bodies turned out easier than expected. Since the last post, I’ve added the ability to rewrite variable declarations, casts and static method calls. This was enough to get nsCSSLoader.cpp compiling.

I also ran into an issue where some methods need to remain virtual so that they can be referenced from other modules. I added a -sq-virtual flag to specify the names of methods that need to stay virtual.
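
To make the rewrites above concrete, here is a rough before/after sketch. ILoader, LoaderImpl, Create, LoadSheet and PendingCount are made-up stand-ins for illustration, not the real nsICSSLoader and nsCSSLoaderImpl declarations, and the "after" half is only a guess at the shape of the result, not actual squash output. It shows a variable declaration, a cast and a static method call being retargeted at the interface, with one method left virtual the way a -sq-virtual entry would request.

```cpp
#include <cassert>
#include <string>

namespace before {

// Abstract interface: every call goes through the vtable.
class ILoader {
public:
  virtual ~ILoader() {}
  virtual int LoadSheet(const std::string& url) = 0;
  virtual int PendingCount() const = 0;
};

// The single implementation of the interface.
class LoaderImpl : public ILoader {
public:
  static LoaderImpl* Create() { return new LoaderImpl(); }
  virtual int LoadSheet(const std::string& url) {
    assert(!url.empty());
    return ++mPending;
  }
  virtual int PendingCount() const { return mPending; }
private:
  LoaderImpl() : mPending(0) {}
  int mPending;
};

int Use() {
  LoaderImpl* impl = LoaderImpl::Create();                  // declaration + static call
  ILoader* iface = impl;
  iface->LoadSheet("a.css");
  int n = static_cast<LoaderImpl*>(iface)->PendingCount();  // cast to the implementation
  delete iface;
  return n;
}

}  // namespace before

namespace after {

// Implementation folded into the interface class.
class ILoader {
public:
  static ILoader* Create() { return new ILoader(); }
  virtual ~ILoader() {}
  // Left virtual, as if it had been named with -sq-virtual.
  virtual int LoadSheet(const std::string& url) {
    assert(!url.empty());
    return ++mPending;
  }
  // No longer virtual: a plain member call.
  int PendingCount() const { return mPending; }
private:
  ILoader() : mPending(0) {}
  int mPending;
};

int Use() {
  ILoader* impl = ILoader::Create();   // declaration and static call retargeted
  ILoader* iface = impl;
  iface->LoadSheet("a.css");
  int n = iface->PendingCount();       // the cast is no longer needed
  delete iface;
  return n;
}

}  // namespace after
```

Both versions of Use() behave the same; the difference is that the second has one class instead of two and only one remaining virtual method. A plausible reason for keeping a method virtual is that a virtual call is dispatched through the vtable and so does not need the method's symbol at link time, but that reading is an assumption here.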

I discovered that the implementation class can be used from other source files, so squash can now work on multiple files. Unfortunately, this made me run into another Elsa misfeature: memory allocation. Elsa data structures do not attempt to clean up in their destructors. Once an AST is produced, it remains in memory for the duration of execution. This is an issue because merely parsing all of the .i files in layout/style/ takes over 600 MB of memory, even though squash is strictly sequential and processes a single file at a time. Hopefully, converting Elsa to use auto_ptr is feasible and I won’t have to resort to funny fork() tricks to reclaim memory.
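
The ownership idea can be sketched as follows. This is a minimal illustration under assumptions, not Elsa code: Node, build and process_file are hypothetical names, and std::unique_ptr is swapped in for auto_ptr because auto_ptr cannot be stored safely in standard containers and was removed in C++17. The point is only that when each node owns its children, dropping the root frees the whole tree before the next file is parsed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical AST node; Elsa's real node classes look nothing like this.
struct Node {
  static int live;                          // nodes currently allocated
  std::vector<std::unique_ptr<Node>> kids;  // each node owns its children
  Node() { ++live; }
  ~Node() { --live; }                       // kids are destroyed automatically
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;
};
int Node::live = 0;

// Build a complete tree with the given depth and fanout.
std::unique_ptr<Node> build(int depth, int fanout) {
  assert(depth >= 0);
  std::unique_ptr<Node> n(new Node());
  if (depth > 0) {
    for (int i = 0; i < fanout; ++i) {
      n->kids.push_back(build(depth - 1, fanout));
    }
  }
  return n;
}

// Stand-in for handling one file: build its tree, report how many nodes
// were alive at that point, and let the tree be freed on return.
int process_file(int depth, int fanout) {
  std::unique_ptr<Node> root = build(depth, fanout);
  return Node::live;
}
```

After each process_file call returns, Node::live is back to zero, so peak memory is bounded by the largest single file rather than the sum of all files. Whether Elsa's data structures can be converted this way is exactly the open question above.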

ML vs C++ for Compilers: Rant
I wonder why people insist on using C++ for symbolic manipulations instead of an *ml like O’Caml, and then either give up or, more frequently, reinvent features such as the ml type system, list processing, garbage collection or pattern matching. Isn’t it more productive to not have to deal with segfaults and slow compilation times, and to get a tenfold reduction in code size?

Squash Usage
First I grep the build directory for usage of CSSLoaderImpl. This is imprecise and will eventually be handled by squash itself, but first either the memory deallocation issue has to be addressed or an index of the whole source tree needs to be built.

find -name \*.o | xargs grep CSSLoaderImpl

This returned nsCSSLoader.o and nsLayoutStatics.o. Now the .i files are produced by running make in their respective directories:

make nsCSSLoader.i
make nsLayoutStatics.i

For convenience I gather the .i files in a moz directory and run squash.

./squash -o-lang GNU_Cplusplus -sq-exclude-include string/nsTString.h -sq-include nsString.h -sq-include nsCOMArray.h -sq-virtual LoadSheetSync -sq-virtual LoadSheet -sq-implementation CSSLoaderImpl moz/nsCSSLoader.i moz/nsLayoutStatics.i > cssloader.patch

Turns out pretty printing C++ is hard and Oink/Elsa’s pretty printer still needs a lot of work. By producing patches, squash rewrites only the code that needs changing and leaves the rest of the file untouched. This avoids pretty printer bugs and preserves as many comments and as much of the original code structure as possible. The wackiness of the pretty printed code is apparent in cssloader.patch, especially in the function bodies.

Future Work
I am happy to see that patching is viable even without precise source coordinates or preprocessor support in Elsa. My near term goals for squash are:

  • Push squash upstream
  • Add the ability to translate out-parameters to return values where possible
  • Get a list of candidates for DeCOMtamination and improve squash enough to process all of them
  • Work on a source code indexer. This would be useful to squash as a semantic grep database and could also be used to improve lxr.
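
The out-parameter item in the list above can be illustrated with a small before/after sketch. Sheet, GetSheet and Caller are hypothetical names, not Gecko APIs, and squash does not do this yet; the sketch only shows the kind of transformation meant.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for illustration.
struct Sheet {
  std::string url;
};

namespace before {

// COM style: a status code is returned and the result comes back
// through an out-parameter.
int GetSheet(const std::string& url, Sheet** aResult) {
  assert(aResult);
  *aResult = new Sheet{url};
  return 0;  // always succeeds
}

std::string Caller() {
  Sheet* s = nullptr;
  int rv = GetSheet("a.css", &s);
  if (rv != 0) return "";
  std::string result = s->url;
  delete s;
  return result;
}

}  // namespace before

namespace after {

// The status code carried no information, so the result can simply
// be returned.
Sheet* GetSheet(const std::string& url) {
  return new Sheet{url};
}

std::string Caller() {
  Sheet* s = GetSheet("a.css");
  std::string result = s->url;
  delete s;
  return result;
}

}  // namespace after
```

The "where possible" qualifier matters: this rewrite is only straightforward when the method can never fail, or when failure can be folded into the return value (for example a null pointer). Methods that return several distinct error codes would need a different treatment.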

In the longer term, I would also like to see some Elsa changes:

  • Figure out a memory de-allocation strategy
  • Resolve the pain caused by Elsa having its own “string” class, which makes using STL an exercise in namespace verbosity. If Elsa were to switch to C++ strings, the above de-allocation job would be simplified too.
  • Elsa is lossy when it comes to C++ extensions: may need to extend the Elsa AST a little
  • It would be nice to improve Elsa memory consumption further. This would be hard.
  • It would be great to make Elsa’s C++ const-correct

6 comments

  1. In the presentation you posted in your “Static Analysis” post on Dec 5th, there are a lot of reasons why you would want to *not* use xpcom. But I would guess that there are also some reasons why you *would* want to use xpcom. I guess you would use xpcom if you expected the runtime to be changed out from under you and wanted your app to continue running with no recompilation. Now it apparently has been decided that at least in some cases this separation isn’t needed. But is there some sort of published policy or list of objects that should or should not use xpcom?

    Wouldn’t it be possible to implement all classes using C++ features (i.e. with exceptions, namespaces and without xpcom refcounting, but perhaps with GC) and just have some auto-generated wrappers that would add the backwards compatible xpcom facade (e.g. by inheriting from both the xpcom interface and the implementation, catching the exceptions and returning error codes instead)? That way you wouldn’t need to deCom the object and all usages in one monolithic patch, but the object could be deComed first and the implementations later as needed. Also in this way every class could be deComed, leaving no risk of overdeComification.

    But since I don’t quite know the justification or use case for xpcom in mozilla, I find this decomification quite scary.

  2. Anders, XPCOM’s best use-case is for APIs exposed by dynamically linked libraries that evolve independently. Its overhead is negligible given low relative inter-DLL call counts.

    In Gecko, XPCOM or a primordial form of it was overused inside Gecko, where there are (or should be) no DLL boundaries, for private APIs that are not usable by other, independently evolving code. The overhead here in code space and runtime is significant (we will quantify it as we go), and deCOMtamination consists of getting rid of this bogus COM-like glue.

    Sure, it’s possible to implement code using canonical C++ forms and synthesize APIs “on the outside” — that’s an explicit goal of Mozilla 2 (see my blog). But we don’t want to do it by hand, since we have some suitable public APIs already, and too many unsuitable private COMtaminated ones, plus hundreds of thousands of lines of internal code that calls the latter private APIs.

    Hence Taras’s fine work based on the Oink framework.

    (Taras, are you filing Elsa bugs as you go? Let me know.)

    /be

  3. In the interest of not reinventing wheels and maximizing leverage, could GNU indent do a decent job cleaning up patched files, even with the pretty-printer’s flaws? From what I hear there will be p-p fixes for Oink, but perhaps we don’t need to reinvent indent? It has lots of options.

    /be

  4. I’m sorry, I did not mean to imply a criticism of Taras’s work. My skepticism was based solely on my ignorance of the Mozilla code base. I was curious as to what had changed, since some bright people, I assume, chose to use xpcom. But it sounds as if the clarity of hindsight has revealed unnecessary use in a specific module.

    I didn’t suggest that the wrappers would be created by hand. I just wondered if it was feasible to extend Taras’s work to the entire code base for added speed, fewer bugs and less verbose code, while wrappers maintain backwards compatibility with the outside world. I assumed, while speaking of things I have no knowledge of, that the wrappers could be generated from the idl.

  5. Quick comment about the ML rant: I would *love* to be able to write actual code for the Mozilla platform in OCaml or Haskell. I don’t mean XPCom-related stuff, for which I consider Python and JS2 will be more appropriate.

    What I do mean is the actual implementation of JS or my upcoming project related to static analysis of JS code. Do you think there’s any chance this might happen in any discernible future?

  6. Hi,
    Can anyone tell me how I can modify the AST in Elsa and pretty print the modified AST?

    I have been trying to use Elsa for a project in which I have to modify some statements that use global variables and then pretty print/regenerate the source code.

    But Elsa always prints from the original source, even if I change the AST at runtime.
    The author’s email is bouncing from the main site.

    Thanks.