29
May 07

CPP Integration Progress

Existing Work

All of existing work is based on basic C parsers so it can’t be directly applied to C++.

I found out that someone did a PhD on refactoring C code resulting in the CRefactory(down atm) project. Looks like reversing the C preprocessor was by far the hardest task to address. Languages consisting of stacked lexical syntaxes (like OCaml & camlp4) or preprocessorless ones like Java don’t even have this problem.

A way to tackle the problem is to violently shove the some of the preprocessor markup into the C AST. Unfortunately this is an incredibly hard task because CPP is purely lexical and works at the token level, whereas the C compilers works with higher level AST syntax trees. Combining the two can result in an ambiguous grammar which is useless for refactoring.

CRefactory integrates CPP into the AST where possible, it even handles some CPP conditionals. The author’s argument is that every CPP conditional represents a separate configuration and processing one configuration at a time will result in a combinatorial explosion of configurations to process thus conditionals must be integrated into the AST.

However, CPP-in-AST solution is error prone and has issues scaling to large projects. Besides I think processing every configuration within the AST is still potentially a combinatorial explosion, the major benefit being that one can eliminate unfeasible conditionals if they cause syntax errors. This conditional elimination would be incredibly slow for C++. I also don’t believe that this would be enough to solve Mozilla’s conditionals since most of the troublesome macros are platform specific and would have dependencies on the system headers. Having said that, I appreciate seeing people prove that with enough effort even seemingly impossible tasks can be accomplished.

A paper on ASTEC presents another solution which involves translating [lexical] CPP constructs into a CPP-like AST-based language. This works great in theory, but the translation process is only semi-automatic and requires a lot of hand-holding. I have mixed feeling about this approach. A simpler intermediate language is an excellent thing but as soon as it becomes “CPP sux, so I invented this better language: use it” the world gains yet another troublesome programming language.

My Approach

Continue reading →


17
May 07

Undoing CPP – Deep in Enemy Territory

Positive

Since the last blog post, I have been investigating on how to teach elsa about the C preprocessor. Overall, I am really happy that MCPP exists and is actively maintained. Preprocessed source code may not be invertible into the original form, but at least it can be undone enough for refactoring purposes.

Looks like I should be able to preprocess the source code, and while at it record macro expansion information and then have elsa fed with a somewhat xml-escue data structure.

Continue reading →


11
May 07

CPP Strikes Back

I have gotten used to dodging CPP-expansion issues by fudging column & line information until the position info in squash mostly matches the source positions in the original source code. That sufficed for rewriting declarations, but I have finally hit a brick wall. Continue reading →


10
May 07

Nicely rewriting outparams

Automatic code rewriting business can be a little depressing sometimes. I tend to run into funny issues caused by CPP, oink limitations or just unpleasant-to-rewrite parts of C++. After banging my head against the wall due to all these issues I finally arrived at a workable approach for the easy part of the outparam rewrite.

Continue reading →


07
May 07

Status Update: Outparam work

Squash Outparams

The following took me a few days to achieve.

./squash -sq-rewrite-outparams out2.txt -sq-implementation nsBidiPresUtils -sq-no-squash -o-lang GNU_Cplusplus ~/work/ff-build/dom/src/base/nsFocusController.i

where out2.txt contains instructions on which functions to modify

nsFocusController::GetFocusedElement,0=mCurrentElement,

produces

--- /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h
+++ /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h
@@ -72,1 +72,1 @@
- NS_IMETHOD GetFocusedElement(nsIDOMElement** aResult);
+ nsIDOMElement* GetFocusedElement();

This still doesn’t add the already_AddRefed or other important attributes, but that should be easy. The result looks simple, but getting squash from working with a testcase to an actual source file was a little on the painful side.

After my experience with renaming I have realized that squash should avoid the C++ pretty printer for now. Thus the result is produced in a verbose AST-sensitive regexp-like way. However figuring out where things start and end is incredibly painful due to the presence of the preprocessor.

My plan is to get squash rewriting some basic Mozilla code the painful way and then I use what I learned to integrate mcpp along with the much coveted end-of-ast-node info into elsa.

JavaScript is an AST’s Best Friend

Continue reading →


01
May 07

Status Report

Automated Analyses and Rewrites

Dehydra and Squash are now mature enough to assist with mundane tasks like renames and various kinds of tedious code inspection. If you ever suspect that part of the Mozilla hacking you are doing could be done by a tool, contact me to see if I have a suitable tool for you.

Also, these tools are in no way limited to working with Mozilla source code. I would be happy to see people use them for other projects too.

Short-term Plans

For the next week or two I plan to focus on out-parameter rewriting and the Mozilla-wide C++ callgraph.

Mozilla-wide Callgraph
This is proving to be a little painful. Things work for basic test-cases, but I am running into scalability issues with Mozilla (as expected). My current approach of serializing everything into a giant JSON graph blows the 32bit address space after a few hundred files. Even doing a Mozilla-wide inheritance graph causes out of memory errors, but that runs almost to competition. The best solution to this will be to break up the graph into as many smaller JSON files as possible and only load ones that are absolutely required into memory.

The callgraph will be a useful starting point for many other useful analyses (dead code one is going to be lots of fun) and it’s a good test of dehydra’s scalability, but I have suspended work on it for a few days to focus on more productive tasks.

Out-parameter Rewriting

Due to XPCOM, Mozilla getters typically return an error code and a value via an out parameter. This requires checking the error code and likely propagating it at the callsite.

For many places in the code there are performance and aesthetical reasons to stop using error codes. Brendan talks discusses some reasons here. This would be cool stuff, but switching to exceptions isn’t going to happen right away. However, I can already start working on my tools to assist with simpler cases (like nsBidiPresUtils::GetBidiEngine?). I’m focusing on getters that return NS_OK/(some error) and a value and rewriting them to return NULL on error and non-NULL on success. This could be ready in time for Firefox 3. Once I’m done with the tool, I’ll just need someone to help me figure which functions are ok to simplify like that.

I suspended work on out-param rewriting some time ago. It was proving to be too complicated to do within squash. Now that I can use dehydra to verify the control flow graph, things are a lot simpler. Current plan is to have the dehydra script produce a list of candidates for out-param surgery and have squash consume that list and produce the appropriate patches. Currently, the script works for some very simple cases and I am working on the squash side.

Smaller Tasks

  • Sayrer’s uninitialized member analysis: added more complete constructor support to dehyra, wrote a sample script to get sayrer started. Fixed dehydra’s 64bit support. Bug 378763
  • Made some squash-generated patches for bz, helped me find a bug in squash. Bug 378780
  • Pushing squash upstream into oink. This is time consuming because it is a combination of legal and many minor technical issues. Dehydra will follow later.