CPP Strikes Back

I have gotten used to dodging CPP-expansion issues by fudging column and line information until the positions squash sees mostly match the positions in the original source code. That sufficed for rewriting declarations, but I have finally hit a brick wall.

CPP Fun

I got as far with call-site outparam rewriting as this patch. It demonstrates an interesting flaw.
@@ -8297,1 +8297,1 @@
- GetInsertionPoint(parentFrame, nsnull, &insertionPoint, &multiple);
+ insertionPoint = GetInsertionPoint(parentFrame, &insertionPoint, &multiple);
@@ -8346,1 +8346,1 @@
- GetInsertionPoint(parentFrame, child, &insertionPoint);
+ insertionPoint = GetInsertionPoint(parentFrame, child);

Due to macro expansion, nsnull contracts to 0, so in the .i file &insertionPoint sits at a column that falls right in the middle of nsnull in the .cpp file. When squash trims the parameter along with its surrounding commas, it ends up removing the wrong parameter.
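
To make the column drift concrete, here is a minimal Python sketch of the first call from the patch. It is an illustration only, not squash or Elsa code: the `expand` function is a crude stand-in for CPP, and `MACROS`, `expand` and `arg_at` are hypothetical names invented for this example. The only fact taken from the post is that nsnull expands to 0.

```python
import re

# Assumption for illustration: a one-entry macro table (nsnull -> 0).
MACROS = {"nsnull": "0"}


def expand(line):
    """Crude stand-in for CPP: replace whole-word macro names."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, MACROS)) + r")\b")
    return pattern.sub(lambda m: MACROS[m.group(1)], line)


def arg_at(call, col):
    """Return the top-level call argument whose text spans column `col`."""
    start = call.index("(") + 1
    depth = 0
    for pos in range(start, len(call)):
        ch = call[pos]
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch in ",)" and depth == 0:
            if start <= col < pos:
                return call[start:pos].strip()
            start = pos + 1
    return None


cpp_line = "GetInsertionPoint(parentFrame, nsnull, &insertionPoint, &multiple);"
i_line = expand(cpp_line)

# Column of the outparam as seen in the preprocessed (.i) text.
col = i_line.index("&insertionPoint")

print(arg_at(i_line, col))    # argument at that column in the .i line
print(arg_at(cpp_line, col))  # argument at the same column in the .cpp line

# A cheap sanity check: does the .cpp text at that column match the token?
print(cpp_line.startswith("&insertionPoint", col))
```

The same column selects &insertionPoint in the expanded line but nsnull in the original line, which is exactly the parameter the patch removed by mistake. The final check shows that this kind of mismatch can be detected by comparing the text at the reported position against the expected token.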

Elsa Limitation

I have previously mentioned the lack of end-of-AST-node position information in Elsa. It also lacks start-of-AST-node information for most expressions. This makes selectively rewriting source code rather difficult.
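
As a rough illustration of why a missing end position hurts, the Python sketch below guesses where an argument ends given only its start offset, by scanning to the next top-level comma or closing bracket. This is a hypothetical workaround written for this example (the name `guess_arg_end` is invented), not something Elsa or squash is known to do, and it is deliberately naive.

```python
def guess_arg_end(src, start):
    """Guess the end offset of the argument beginning at `start`.

    Scans forward, tracking bracket depth, and stops at the first comma
    or closing bracket that is not nested. It knows nothing about string
    literals, comments or template angle brackets.
    """
    depth = 0
    for pos in range(start, len(src)):
        ch = src[pos]
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            if depth == 0:
                return pos
            depth -= 1
        elif ch == "," and depth == 0:
            return pos
    return len(src)


# Works when the argument only nests brackets.
call = "GetInsertionPoint(parentFrame, Foo(a, b), &insertionPoint);"
start = call.index("Foo")
print(call[start:guess_arg_end(call, start)])

# Breaks as soon as a comma hides inside a string literal.
tricky = 'f("a,b", x)'
start = tricky.index('"')
print(tricky[start:guess_arg_end(tricky, start)])
```

The second case cuts the string literal in half. Text scanning can only approximate what the parser already knows, which is why real start and end positions on AST nodes matter for rewriting.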

Plan

Instead of fighting an uphill fudging battle against CPP, I am going to have to suspend outparam rewriting yet again to work on better position information and on integrating a preprocessor into Elsa. This is unfortunate because I was looking forward to finally doing something more sophisticated than renames. Now my Elsa fork is going to grow even bigger before I get commit access.

3 comments

  1. Jason Orendorff

    *delurk*

    So, are you planning to write an Elkhound-based CPP for Elsa? The word “integrate” makes it sound like you’re going to integrate an existing CPP, and I don’t understand how that will solve your problem.

    I don’t mean to encourage you to do the “easy” thing, but it’s pretty easy to auto-detect this bug when it happens, right? Just compare the .i line you *think* you’re modifying against the .cpp line you’re actually modifying. So… How often does this come up? How often would it come up if you special-cased nsnull? Can’t your tool easily detect this kind of bug and spit out error markers which a human can then review?

    I don’t mean to pester you, just very interested in this problem.

  2. Jason Orendorff

    Just thinking out loud here:

    On the other hand, reading preprocessed code and then going back and blindly patching the source seems error-prone from the start.

    I feel like I’ve seen this problem before. When I hack XML, I want to use a high-level API, but I also want round-tripping. The high-level libraries always seem to throw away the low-level information only needed for round-tripping, like whitespace around attributes. (Some parsers go further, irreversibly expanding entity references, throwing away comments, etc. Your CPP problem is similar to the problem of handling entity references in XML.)

    What you really need is a library that keeps the (low-level) raw source code and can fix it up for whatever (high-level) AST changes you make.

  3. Detecting that CPP is moving code about is easy. However, marking that for human correction isn’t really viable. I can’t force people to write 0 instead of NULL or nsnull in their code.
    http://mcpp.sourceforge.net/ is what I’ll be integrating into Elsa. I know it is possible because it has been done before, but that work never got integrated into upstream Elsa.

    Regarding the second comment, squash aims to be that bridge between the AST and the source code. It’s tricky, but if I succeed, it could make C++ a lot easier to refactor.