29
Jan 09

Semantic Rewriting of Code with Pork – A bitter recap

LWN published an article about a tool that does refactoring of C code. Guess what, it’s yet another tool on top of a crappy C-parser that will never grok C well or even hope to support C++. To my great disappointment the author was not aware of my work on Pork. Clearly I have failed in letting people know that complex C and C++ can be refactored with (somewhat raw, but powerful) open source tools.

In addition to Dehydra (which is even mentioned in the first comment, yay!), I also maintain Pork – a fork of oink that is well suited to large-scale refactoring of real-world C/C++ code.

So far pork has been used for “minor” things like renaming classes&functions, rotating outparameters and correcting prbool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function(ie generating a 3+MB patch) in Mozilla to use garbage collection instead of reference-counting.

So to summarize:

  • Refactoring C is hard, but C++ is much harder
  • For refactoring C++ there is no better toolchain to start with than Pork
  • Pork shares no code with Dehydra.
  • Pork is built on the Elsa parser which makes it well-suited for rewriting large amounts of code. Dehydra’s isn’t suitable for rewriting code due to GCC providing a very lossy AST and incomplete location information.
  • Pork is not as convenient for analysis needs as Dehydra

For any questions regarding Pork feel free to post on the mailing list or ping me on IRC.

Language Wars

I find it depressing that the comments to the LWN article ended up being about language wars rather than the refactoring topic. Pork is written in C++ which is much more widely known than OCaml. However, I seriously doubt it’s easier for anyone to hack on advanced compiler frontend pieces in a language as ill-suited for the task as C++.


19
Sep 08

Outparamdelling this way comes

Recently I dusted off outparamdel to see if I can get some refactorings landed. About a year ago, with the great QueryInterface outparamdelling experiment we ended up with a smaller binary footprint and tiny performance gain, but that never landed due to us not wanting to break compatability yet. Ever since, outparamdel has been patiently bitrotting within Pork waiting for the day it’s allowed to break APIs.

Recently jst listed some private API candidates for deCOMtamination and I have been letting outparamdel loose on them for the past week. So far only one such change has been committed, the rest are sitting in jst’s review queue. It’s exciting, as it’s the first ever application of outparamdel that didn’t get canned. For the whole list see the most recent bugs blocking my analysis metabug.

Cool part about these patches is that after various outparamdel special-casing and manual cleanup the line count is reduced by 10-30% while maintaining existing functionality!

So far I haven’t touched much outside of content/, so if you know of any APIs that could have the outparam rotated into the return value, file some rewriting bugs against me.

Rewriting with Mercurial

Mercurial is now my favourite python program ever. It makes rewrites so easy. Here is my typical workflow:


# Write an outparamdel input file specifying functions to rewrite..run outparamdel with pork-barrel to generate patch
hg qimport -f autopatch.diff && hg qpush && hg qref #make the patch nice and readable
hg qnew manual.diff #create a manual cleanup patch for cosmetic touchups and rewrites screwed up by macros#of course to submit patch for review, those two patches need to be combined
#do manual stuff
hg diff -r 19277 #produce a combined diff without loosing ability to edit each applied patch individually! For example, can regenerate the autopatch.diff with different outparamdel parameters or a bugfixed outparamdel. 19277 is a revision that has to be looked up with hg log, unfortunately there isn't a relative syntax to do hg diff -r "tip - 2"

This is a huge improvement over the ad-hoc workflow prior to hg switch when I was prototyping my tools. CVS + quilt don’t hold a candle to a modern revision control system.

Prcheck

I also have been doing another round of prbool corrections. Somehow I didn’t notice that the system stopped working due to the CVS->hg switch. Once again when reviving my nightly checker scripts, it was a pleasure to substitute all of the CVS hacks with mercurial commands.

I am waiting for the few remaining largeish (ie 3-4 bugs per file) patches to land before I can start submitting whackamole bugs with a single prbool correction per module.

It also seems that cairo and every other pre-C99 project have the same set of issues as Mozilla with their typedefed-int boolean types. Perhaps prcheck isn’t mozilla-specific at all.


19
Dec 07

Exceptions

Often there are two ways to write code. One way is to design an API and have code patterns adhere to how the API is supposed to be used. Another way is to rely on language features to accomplish the same thing. Typically API-pattern approaches are chosen because compilers are too immature or just don’t provide the necessary features. Sometimes compilers do catch up and the possibility of utilizing newer language features appears.

In the case of the exception rewrite the task is to rewrite code from a pattern-based (compiler in your head) approach to a more strict C++ construct-based exception paradigm. Unfortunately APIs don’t enforce their usage as much as a compiler (in part because we don’t have app-spefic compiler plugins) so transforming that into a strict compiler-friendly form automatically isn’t always realistic (example). See my previous post for more examples.

Having done more work on the exception conversion I believe that it is possible to switch Mozilla to exceptions to the point of getting it to compile. Unfortunately, I don’t think that it’s possible to do this in the Mozilla2 timeframe due to the large amount of manual labour required.

Due to various use cases that don’t fit the exception model there is a need for an nsresult-lint tool to detect funny (see above) nsresult patterns so code can be manually fixed to enable thrower to transform code correctly.

I expect conversion to exceptions to consist of the following large steps:

XPCOMGC -> nsresult-lint -> thrower automatic conversion -> nsexception-lint -> outparamdel

  1. XPCOMGC needs to land first to simplify memory management. Otherwise there will be a lot more nsCOMPtr<>s already_AddRefed<>s and friends.
  2. nsresult-lint would flag code for clean up to assist with the multitude of special cases in the code preventing it from transformation
  3. thrower needs to do some reasonably sophisticated static analysis (sensitive to control flow) to ensure that code is rewritten correctly. The analysis step isn’t ridiculously hard, but it is considerably more complex than what is done in existing tools.
  4. nsexception-lint tool will flag exception-unsafe code. I expect this to highlight a fair amount of code that needs to be converted to RAII. It will take more manual labour to fix flagged code here than in than step2.
  5. Once exceptions are used the return value is freed up for outparamdel to utilize. This will be a nice optimization and code clean up.

I think the best bet with exceptions would be to start working on them during the moz2 development cycle to have them land early in post-moz2.

Or as an alternative we could try to do just the OOM exceptions which are less frequent which would look like:

XPCOMGC -> thrower automatic conversion(OOM cases are easier) -> nsexception-lint

In this case the only significant piece of work is nsexception-lint which would be needed later for a full-blown exception rewrite. It wouldn’t be so bad to convert code to RAII even before that is required for the full exception rewrite.

For now I’m going let thrower rest in the pork hg repository while I try to make a static checker plugin for gcc.
Continue reading →


05
Dec 07

Exceptional Circumstances

My previous post on outparam rewriting described the wealth of functions that can be rewritten. Unfortunately, most functions in Mozilla are declared in XPIDL interfaces.

I have been convinced that my plan to rewrite xpidlgen to avoid outparameters wont be possible because most XPIDLinterfaces can be implemented by JavaScript in a few different ways. That is problematic because in addition to return values, JavaScript can also have an exception thrown at any point and have that converted to an nsresult error code by XPConnect. That means that the getters implemented in JavaScript are not in the set of functions that only return NS_OK+outparam/someerror. I wouldn’t be at all disappointed if someone proved me wrong here.

nsexception

There is one other way to rid the code of outparameters (including getter_AddRefs and friends). Time to face my greatest reluctance: rewriting Mozilla to use exceptions. Brendan has been talking about it for a long time, but I have been skeptical until now, mostly due to the complexity of rewriting that much code. However, I have more confidence in rewriting huge amounts of code now since the XPCOMGC rewrite which touched most functions in Mozilla without too much trouble (in relative terms).

Motivation

There are some obvious benefits to be gained from switching to exceptions other than a reduction in code-size (and footprint?) and having code that looks more like common C++.

We would like to modify tamarin to use C++ exceptions such that an exception thrown from JavaScript would unroll the mixed C++/JS stack. This would simplify and enable significant optimizations for XPConnect.

I am dreaming of JITed marshaling code for C++->JS calls and having a low level FFI interface(ie being able to call most C/C++ methods directly) on the JavaScript side such that tracing JIT could automatically optimize common XPConnect calls. This an exciting area and there are lots of details to be worked out, so I’d love to see some feedback (or better yet proof of concept code!) on this.

The Plan – Rewriting

I have started implementing thrower (I am not a great namer), a tool for converting various code patterns involving nsresult into something that uses an nsexception wrapper.

Since this rewrite requires a lot more scanning for code patterns I added an elsa feature to allow pattern matching on AST nodes in C++ (also using exceptions). Since there are lot of patterns to transform, for documentation I will be writing many minimal testcases documenting(and testing) exactly what gets rewritten. Any interested parties are welcome to contribute Mozilla error handling patterns as testcases.

Verifying the Result

Just like in the XPCOMGC rewrite, code will have to be scanned to verify that it fits in the “new world order”. Unlike XPCOMGC, there are additional flow-sensitive issues to scan for to ensure that the code is thread-safe. The scans are at a lower level than dehydra currently works at, so it’s a perfect opportunity to either extend dehydra or write the new tool.

It would be especially cool to implement the code analysis tool as a gcc plugin.  Sean Callanan’s “Extending GCC with Modular GIMPLE Optimizations” paper in the GCC summit proceedings should be an excellent starting point.

This is an exciting experiment. I look forward to reducing speculation on the risks/benefits of switching the codebase to use exceptions with some concrete data.


28
Nov 07

Volume of Refactoring Ahead

In the previous post, I described the simple rewriting case that I am working on at the moment. Someone was quick to point out that the approach wouldn’t work for all methods (XPIDL Arrays were the example). Indeed, anything more complicated than simple getters can’t be rewritten to “Succeeded/Failed” pattern without switching to C++ exceptions. However, in the codebase the size of Mozilla there are several megabytes worth outparam getters to be rewritten.

Currently my wimpy little outparams.js script identifies 100 methods that can be rewritten by outparamdel without any manual intervention. However doing a search for ::Get with a ** parameter yields over 2700 candidates, of which most look like they can be rewritten. Reason for the laughable detection rate is that the detection script currently refuses to flag methods that are defined in XPIDL interface or are implemented by more than one class. Soon the script will make heavier use of the class hierarchy and we will probably change XPIDL to support more efficient getters.

Complete Class Hierarchy

I finally managed to convince Dehydra to serialize the Mozilla class hierarchy into JSON files without running out of virtual memory. This will generate lots of input for refactoring and analysis tools. All kinds of interesting stats can be produced with simple scripts. Generating the index is relatively straightforward. It would be awesome if someone could figure out how to expose this data as a web app. Since there is so much being loaded incrementally, I don’t see how one can keep things simple but use an asynchronous API.

In the coming months, I am looking forward to extending this to be a complete callgraph to find dead code and other fun data.


26
Nov 07

Mozilla 2: Outparamdel

There will be a lot of under-the-hood code changes in Mozilla 2. Our goal is to end up with a simpler, safer and faster codebase.

This is my perspective on the work ahead with respect to outparamdel.

Outparamdel

In the presence of a garbage collector we will be getting rid of stack nsCOMPtr<> usage (using raw pointers instead), but the getter_Addrefs() will still be needed to pass references to heap-allocated nsCOMPtr so they can be assigned to. Eliminating as many outparams as possible will eliminate much getter_Addrefs() footprint/perf/code bloat.

Within the next week I hope to attempt to rewrite all of the auto-detectable outparamdel candidates.

Determining What To Rewrite

outparams.js is a dehydra script for flagging code that can be rewritten to use outparameters. It turned out a bit trickier than I initially expected. Here are the checks required

  1. Check that a function only ever returns 1 failure error code (NS_OK and something else). This enough if a function is not virtual.
  2. Ensure that either a) This is the only implementation of a particular virtual function or b) All other implementations of this function satisfy 1
    Currently I only do a).
  3. Also check that all overloads of this function have the same outparameter type. This is required since C++ (thankfully) doesn’t not allow function overloading by varying the return type.
  4. Checks 1-3 ensure that the function can be rewritten, however one also needs to determine if the return type should be wrapped in getter_Addrefs<>. This can not be deterministically done from looking at the getter. So one has to scan the code for usage of the function to see if the outparam is ever passed getter_Addrefs.
  5. Check that none of the callers are within macros to minimize non-automatic rewriting.

Checks 2 and 3 require the complete class hierachy of Mozilla so I finally made a few more dehydra scripts to produce that. This was on my TODO list for a while and should make a few other interesting analyses possible (my favourite one is finding interfaces which only have 1 implementation to get rid of excessive virtual functions).

Checks 4 and 5 were easiest to implement as warnings in outparamdel.

One should keep in mind transitivity. Once the first outparamdel candidates are rewritten, some of their callers should become flagged for rewriting in the same manner.

Rewriting Code

Here is an example of how code will be simplified.

given a function:

nsresult getFoo(nsIFoo **out);

And usage like:

nsCOMPtr<nsIFoo> bla;
nsresult rv = getFoo(getter_Addrefs(bla));

if ((NS_SUCCEEDED(rv) && bla) {
...
} else {
return rv;
}
nsCOMPtr<nsIFoo> bla2;
return getFoo(getter_Addrefs(bla2));

The function definition will become:

nsIFoo* getFoo();

Before this can be done, several issues come up

  1. It is not clear if the original getFoo() is allowed to always override the value passed to it. This is hard to determine automatically, so we make the assumption that in the general case it is ok.
  2. It isn’t obvious if getFoo() is returns null in the outparam to indicate some non-error condition. This is rare so the parameter shall be annotated. Currently, the plan is to annotate with a NULLABLE_OUTPARAM() macro which would be detectable by dehydra and serve as documentation for the function behavior.

nsCOMPtr<nsIFoo> bla = getFoo(); //after XPCOMGC rewrite the left side will become nsIFoo* bla
if (bla) { // could even merge the above declaration into the if condition
...
} else {
return NS_ERROR;// or NULL if this is another outparamdel candidate
}
nsCOMPtr<nsIFoo> bla2 = getFoo();
return bla2 ? NS_OK : NS_ERROR_SOMETHING; // I'm not sure if outparamdel should use an explicit ternary operator or an inline function to convert new style errors into nsresult

Currently the code is being rewritten in a much uglier way. So this cleaner version will likely be implemented as an optimization pass (probably with a new tool outparamdel-opt?). There several tricks here:

  1. Connect the declaration with initialization of bla and bla2
  2. Detect the error check and replace it with “bla”(or !bla for NS_FAILED).Then realize that bla && bla contains a redundant statement and take it out.
  3. Do something similar to return statements.

Result

This should result in prettier code that compiles quicker and to a smaller, more efficient binary. It will also be more GC-friendly.

C++ Exceptions

Right now outparamdel does rewrites that are useful even if C++ exceptions will not be introduced. There are further code reduction gains possible if above error checks were converted into C++ exceptions, but I am not clear on performance characteristics of exceptions. We would also need to change tamarin exceptions to match C++ ones before any experimentation can be done.


10
Sep 07

Automatic DeCOMtamination: Roadmap For Automated Refactorings

This is an update on the ongoing deCOMtamination work from the automated rewriting perspective. I think it’s pretty exciting that Mozilla is the first large-scale C++ project to attempt automated large scale source code cleanups and optimizations. I think the tools are finally getting mature enough for the job.

The downside is that there isn’t a published roadmap of what we are planning to achieve with deCOMtamination as we are still in the planning stages. The upside is that there is still time for any interested parties to think up the next great improvement on how things are done, checkout the refactoring toolchain and either extend an existing tool or implement a new one.

Below is a list of things that I plan to have working in the near future. The idea is to try to implement various optimizations that would be impractical(or impossible?) to do manually and see if they yield the expected performance and code quality benefits.

Step 1: Outparam Elimination

QueryInterface() and other ok/fail methods have a redundant nsresult value which can be eliminated without changing any logic in the code. The QueryInterface() rewrite is my first serious tree-wide refactoring attempt. QueryInterface() is probably the most well known and frequently-used method within Mozilla. It also one of the most CPP-encumbered methods in the tree, so it made for a good test of my CPP-aware elsa work.

The getting code to compile phase is over. Currently Benjamin is working on getting the modified code to run which requires XPConnect changes, debugging the manually-rewritten macros and verifying that the generated patch is correct.

The next step will be to eliminate local nsresult variables in the callees when they are used to store & check return values of QueryInteface(). This is basically a make-resulting-source-look-prettier optimization.

I hope to measure a speed-up and slight footprint decrease with the new QueryInterface call.

Step 2: Try Mozilla Without Reference Counting?

In my mind the most exciting part of Moz2 is Tamarin. Few things are cooler than an elegant JIT VM.

Tamarin comes with a modern garbage collector.

The goal of this rewrite is to aid Jason and Benjamin with switching XPCOM from reference counting to garbage collection. This might end up in gigantic patches to rid the stack of nsCOMPtr objects. This might be hard as it will be affecting a lot of code and might reveal more shortcomings in my version of elsa and MCPP. Details are in the wiki.

Step 3: Try C++ Exceptions

This is the most ambitious rewrite I know of for Mozilla 2. It’s similar to step 1 in that the goal is to eliminate outparameters and nsresult error codes, except in this case it would happen for all functions. Brendan mentioned this is in his blog.

The idea is that exceptions in Mozilla happen in exceptional circumstances, thus most of the time the return value will be NS_OK and the outparameter will have something valid in it. So we should rewrite all methods that return nsresult to return the outparameter value through the return value and have the errors thrown as exceptions.

In the simplest case this would involve rewriting return statements into throw statements and rewrite callers to use try/catch. Instead of

rv = bla(&outparam)…if(rv == foo) .. else if(rv == boo) return rv

the code would be

try{ outparam=bla() } catch(foo) {…} catch(boo) {throw boo}

Note this is still inefficient since the code would be manually unrolling the stack instead of letting the exceptions do that. So the next iteration of the rewrite would get rid of the

catch(boo) {throw boo}

code to streamline the execution path. Ideally this would provide a significant reduction of footprint (due to getting rid of the error propagation code) and provide a speed boost.

However, there are a lot of issues that need to be solved. How to ensure that the C++ code is exception-safe (everything has destructors to do appropriate cleanup)? How to deal with the case of the stack being a mix of platform C, C++, JavaScript, Python, etc? Most runtimes are not aware of C++ exceptions.

Infrastructure Work

Unfortunately, not all of the automated refactoring work is about exciting rewrites. Elsa is still tied to the stone-age gcc 3.4 as it can’t yet process C++ headers from the newer gcc releases due to template complexity.

There is also work that needs to be done to get OSX supported as well as Linux by elsa & mcpp. I think very little work remains there.

Another big issue is getting elsa/mcpp to work on Windows. This may involve teaching elsa about the Microsoft windows flavour of C++ or getting Mozilla to reliably build with mingw and merely teaching elsa about mingw’s flavour of windows C++.

There is also an issue of maturity. Mozilla is probably the biggest codebase to make use of elsa and mcpp, so there are teething issues to solve. Having said that, the current version of MCPP in svn should be able to compile Mozilla. Elsa can process all of Mozilla with a small patch to two files attached in the QueryInterface() bug.

Overall I’m doing less and less infrastructure work as time goes by, hopefully the tools will mostly just work from now on.


30
Aug 07

Pork: The Brave New World

It doesn’t look like my oink patches are going to reach upstream anytime soon. In fact, in the past year no patches have landed in the oink tree. I think this is unfortunate, but I have my own repository at http://hg.mozilla.org and will continue working on my fork. So allow me to introduce Pork, the Oink fork. So if anyone has any exciting source location, pretty printing or other elsa improvements, I hope to eventually see those land in pork.

Note, pork is just an informal name for the fork, I am not currently planning to do any source renaming or repository renaming.

Pork: Now in a VMware flavour

bsmedberg pointed out that oink seems a bit intimidating to setup and it would be nice if it came in virtual machine. So I have oink/mcpp/etc packaged up in a virtual machine that can build Mozilla with gcc3.4, run dehydra analyses and produce automated patches. If you want to play with that, post a comment and I’ll reply with a download link.

A couple of people asked me about some simple code-scans to see for problems and optimization opportunities. Next week I am going to post few simple dehydra recipes on how to look for patterns in the code. It’s dead simple with JavaScript and some minimal shell knowledge to aggregate the results. The VM should provide a good way to get started.

Outparams: QueryInterface Rewrite Milestone Reached

I finally got Mozilla to compile with the outparam-less qi call. The outparamdel part of that was surprisingly easy and straightforward. I   also learned all kinds of fun details about multiple inheritance and fun ways to construct functions out of macro segments. The next step is to get the code to run. I’ll also have to look into a few minor MCPP issues that the rewrite uncovered. See QI bug for the gory details.


06
Aug 07

Outparams: Take 2

Will Rid Code of Outparams!

I resumed my outparam rewriting work last week. Having fixed the CPP induced architectural limitation that I ran into, it was quite straight-forward to factor out squash’s rewriting code into a new tool. Unlike squash, outparamdel (creatively named new tool), can rewrite code precisely and reliably. I still don’t have end-of-ast-node information in every Elsa AST member, but I think I have added position info to enough AST nodes to be able to do most of the Mozilla 2 rewrites.

Continue reading →