Rewriting Tools for Mozilla 2: Moving Forward as Planned

In the Beginning There Was a Void

Approximately a year ago, Brendan discussed with me the crazy possibility of rewriting most of the Mozilla code automatically to modernize the codebase. The benefits were huge: Gecko would use the C++ standard library to improve code readability and reduce code size, XPCOM would be ripped out of the core to improve performance and decrease footprint, etc.

It seemed like a good idea, but in reality no other giant C++ project had attempted this before, so we were not sure how realistic it was. I spent a year in a lonely corner of Mozilla trying to materialize the idea.

Brendan & Graydon pointed me to elsa, the C++ parser that could supposedly parse Mozilla. However, it turned out that it was only able to parse an old version of Mozilla and rejected the new source. One of the elsa maintainers even tried to convince us that it was not designed for source-to-source transformations and wouldn’t work that way.

After I patched up elsa and started devising ways to use it for source rewriting, I ran into more pain. After a few false starts, I realized that C++ in Mozilla is actually a mix of CPP and C++, and one cannot rewrite C++ without dealing with the mess that is macro expansion. MCPP was pointed out to me as a good starting point for hacking on a preprocessor, so I designed an inline log for macro expansion. To my surprise, the maintainer of MCPP, Kiyoshi MATSUI, volunteered to implement the spec and thus saved me from a world of pain. (For which I am eternally grateful, as I can’t imagine a more depressing pastime than working on the root of all evil: the C preprocessor.)
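
To see why the preprocessor hurts, consider a contrived sketch (a hypothetical macro, not actual Mozilla code). A rewriter that operates on preprocessed C++ only ever sees the expanded text, so a patch it produces will not match the unexpanded source on disk; the inline expansion log is what lets the tool map its edits back to the macro invocation the developer actually wrote:

    // Contrived sketch; DECL_MEMBER is hypothetical, not actual Mozilla code.
    struct nsISupports { };

    #define DECL_MEMBER(T, name) T* name

    struct Widget {
      DECL_MEMBER(nsISupports, mListener);  // expands to: nsISupports* mListener;
    };

    // A rewrite that wants to change "nsISupports* mListener" into a smart
    // pointer sees only the expansion; the file on disk contains
    // DECL_MEMBER(...), so the edit has to be expressed in terms of the
    // macro invocation, which is what the expansion log makes possible.

    int main() { Widget w; (void)w; return 0; }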

In parallel with Kiyoshi’s work I modified elkhound & elsa to make the C++ parser a lot more suitable for source transformations. I learned about LR & GLR parsing and confirmed my suspicion that I don’t want to write parser generators for a living.

Happy Conclusion

All this work finally got us what we discussed last September: a framework for doing lots of boring code rewrites.

The first big Moz2 task is switching from reference counting to garbage collection. Today, garburator produced a gigantic patch for a subset of the content/ module, and all of the affected files compiled. Hopefully next week I’ll have a multi-megabyte patch for the whole of Mozilla that compiles and possibly runs.
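
To give a feel for the flavor of the rewrite, here is a hand-written sketch (with toy stand-in types, not actual garburator output): the refcounting smart pointer on a member becomes a raw pointer, and ownership of the object moves to the garbage collector:

    // Hand-written sketch; toy stand-ins, not actual garburator output.
    struct nsIContent {
      void AddRef() {}
      void Release() {}
    };

    template <typename T>
    struct nsCOMPtr {               // toy model of the refcounting smart pointer
      T* mPtr = nullptr;
      ~nsCOMPtr() { if (mPtr) mPtr->Release(); }
    };

    struct NodeBefore {
      nsCOMPtr<nsIContent> mChild;  // refcounted: Release() runs in the dtor
    };

    struct NodeAfter {
      nsIContent* mChild = nullptr; // GC-owned: no refcount traffic at all
    };

    int main() { NodeBefore b; NodeAfter a; (void)b; (void)a; return 0; }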

13 comments

  1. What’s the difference between CPP and C++? I thought they were the same thing.

  2. I guess by “CPP” he might mean “C PreProcessor”, not “C Plus Plus”. So that’d be between preprocess(or) and parse(r)…

  3. Jesse: CPP == C Pre-Processor.

    I remember reading, and hacking on a bit (as part of a DEC-2060 port of Unix tools), the original John Reiser CPP. It was assembly-style C with lots of raw pointers into a giant buffer, sliding things around during macro expansion as overflow threatened, all coded and indented in an even-uglier-than-usual style for the time; Unix code outside of the kernel and dmr’s C compiler was not always pretty. My eyes bled for a week ;-).

    /be

  4. How will this be QA’d for regressions, crashes, etc.? Seems like it would be rather difficult to test all those parts of the code base in one patch.

  5. Robert: it’ll be QA’d by mozilla2 devs and (I hope this is coming on line soon) the usual (and growing over time) testing infrastructure, cloned from the CVS trunk: mochitest, reftest, the latest and greatest leak tests including sayrer’s brute-force leak detector, etc.

    Then on to building the Mozilla 2 effort to include more and more of the community as 1.9 / Firefox 3 wraps up. After a few alphas, we’ll have shaken out any issues.

    But I do not expect lurking badness in a generated patch, if the analysis that generated the patch is sound and valid. Developing that analysis is the real QA process here.

    So Taras has been focusing on tools in order to get Mozilla 2 to a smaller, faster, easier-to-hack codebase, and he and Benjamin are making builds that can be tested. Help welcome.

    /be

  6. I’ve been poking around Elkhound and Elsa trying to learn how to write a parser using it. It’s fun, and occasionally “fun as in having a root canal”.

    I’m working on a small Lua parser mostly to get my feet wet, but the ultimate goal is JavaScript. I’m no “l33t k0d3r” by far, but I’m a fast learner, heh. I have this thing where I want to build a successor to MXR/Bonsai, with multiple-VCS support (pluggable, essentially), syntax highlighting and semantic parsing.

    One thing I noticed was that trying to clone either the Elkhound or the Elsa repository from hg.m.o did not work. I had to download using one of the zip links to get all the files. Something seems weird in that regard. (“hg pull -r tip” after cloning basically says “you already have the tip” even though I don’t have all the files visible on hgweb.)

  7. Hi

    I normally follow what happens at Mozilla closely, but I do not usually leave comments. I just wanted to say one thing: what you are doing, if properly managed, has great potential for Mozilla.

    You are a Hero. Period.

    Regards
    Vijay

  8. You should do a post showing how what you are doing is more beneficial than some Python + logic + regex search and replace :-)

    I’d be *really* interested in the types of jobs you will automate.

    Extremely interesting project though.

    monk.e.boy

  9. Why do you have rel nofollow if you have spam detection?

    I get no link love for my comment?

    bah.

  10. @monk.e.boy: The benefit over textual, regex-based replacement is easy: since the tool *understands* what it is doing, it can handle the zillions of /small/ variations that, with a regex, each require you to add yet another special case. And if some case really is complex enough to break it, you get a syntax error instead of the tool happily outputting garbage or, worse, creating code that compiles but crashes on execution.
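
    For example, here is a contrived sketch (made-up names) of renaming getFoo to GetFoo, which is trivial for a parser but a minefield for a regex:

        #include <cstdio>

        // Contrived sketch with made-up names: rename getFoo -> GetFoo.
        struct Obj { int getFoo() { return 1; } };
        #define GF getFoo

        int main() {
          Obj obj;
          Obj* p = &obj;
          obj.getFoo();                    // the easy case a regex handles
          obj . getFoo ();                 // legal whitespace variation
          obj.getFoo(/* a comment */);     // comment inside the call
          p->getFoo();                     // call through a pointer
          obj.GF();                        // hidden behind a macro
          std::puts("call getFoo() here"); // a string a regex would mangle
          return 0;
        }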

  11. Hm, I believe the OpenOffice.org codebase also contains a lot of legacy code which is there because C++ and its libraries didn’t offer everything they offer now. They’d probably be interested in such a framework.

  12. @Arthur: I think the *entire world* will be interested in porting old C++ libraries to the new improved ones.

    I think this tool is the start of a new era; never before have we had such a wealth of good-quality, open source code to re-use. A tool that helps de-cruft this code? God – it’s priceless.

    monk.e.boy

  13. We have a C++ parser that was designed from the beginning to be a C++ source-to-source transformation tool. See http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html.

    This has been used to transform legacy C++ avionics code to change the underlying OS from a proprietary RTOS to Real Time CORBA. There’s a paper at http://www.semanticdesigns.com/Company/Publications/ that describes this experience.