GCC + SpiderMonkey = GCC Dehydra


GCC Dehydra is starting to work. I encourage people to try it out for their code scanning needs. The main missing feature is control-flow-sensitive traversal, which means that function bodies are currently traversed in a sequential fashion. Control-flow sensitivity is the most complicated part of Dehydra, but most of the time this feature is not needed.
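
To make the distinction concrete, here is a toy sketch of the two traversal styles. It is not Dehydra's API or GCC's representation: the statement encoding and the function names `sequential` and `paths` are made up for illustration, and loops are left out entirely.

```python
# Toy model (hypothetical, not Dehydra's API): a function body is a list of
# statements. A plain statement is a string; a branch is a tuple
# ("if", then_body, else_body).

def sequential(body):
    """Visit every statement once, in source order, ignoring branching."""
    visited = []
    for stmt in body:
        if isinstance(stmt, tuple):
            _, then_body, else_body = stmt
            visited += sequential(then_body) + sequential(else_body)
        else:
            visited.append(stmt)
    return visited

def paths(body):
    """Enumerate each execution path separately (control-flow-sensitive)."""
    result = [[]]
    for stmt in body:
        if isinstance(stmt, tuple):
            _, then_body, else_body = stmt
            branches = paths(then_body) + paths(else_body)
            result = [p + b for p in result for b in branches]
        else:
            result = [p + [stmt] for p in result]
    return result

body = ["p = alloc()", ("if", ["free(p)"], ["log()"]), "use(p)"]

flat = sequential(body)
all_paths = paths(body)
```

In this example the sequential view sees `free(p)` followed by `use(p)` and cannot tell that they sit on different paths only when the branch goes one way, while the path view keeps the two executions apart. Many checks only need to know that a construct appears somewhere in the function, which is why the sequential view is often enough.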

So far I got Benjamin’s stack-nsCOMPtr finding script to do stuff, which indicates that most of the features are working.

My vision is to switch to the GCC backend for all of our code analysis needs since it is well tested, fairly feature complete, and works with new versions of GCC (by definition).

Not everything is perfect in GCC land. There are some frustrating typedef issues to solve.

Source Re-factoring

Elsa still holds its own when it comes to refactoring code because it has a much cleaner lexer/parser and rarely opts to “optimize away” original AST structure. We should stick with Elsa’s arcane requirement of having to preprocess files with gcc <= 3.4 until either GCC becomes viable as a platform for refactoring or clang matures.

GCC is not suitable for refactoring work because:

  1. It starts simplifying the AST too early.
  2. The parser is handwritten and would therefore be hard to modify to maintain end-of-AST-node location info.
  3. GCC reuses many AST nodes, which means their locations point at the declaration rather than the usage point.
  4. The handwritten nature of GCC makes any of the above improvements time-consuming to implement, and the political issues are something I’d rather not deal with.
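
Point 3 is easiest to see with a toy model. The classes below (`Loc`, `Decl`, `Ref`) are hypothetical and are not GCC's tree structures; the sketch only assumes that a use of a variable is represented either by the declaration node itself or by a separate wrapper node that carries its own location.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    line: int
    col: int

@dataclass
class Decl:
    """A variable declaration with the location where it was declared."""
    name: str
    loc: Loc

@dataclass
class Ref:
    """A wrapper node for one use of a declaration, with its own location."""
    decl: Decl
    loc: Loc

count = Decl("count", Loc(3, 5))

# Shared-node style: every use in the tree is the declaration node itself,
# so asking a use for its location yields the declaration site.
shared_uses = [count, count]
shared_lines = [use.loc.line for use in shared_uses]

# Wrapper style: each use is its own node, so the usage point survives.
wrapped_uses = [Ref(count, Loc(10, 9)), Ref(count, Loc(14, 2))]
wrapped_lines = [use.loc.line for use in wrapped_uses]
```

A refactoring tool has to rewrite text at the usage point, so with the shared-node style there is nothing to tell it where on lines 10 and 14 the name actually appears.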

Most of these wouldn’t have been an issue if GCC were written in ML :)

What’s Next?

Time to start using GCC Dehydra to enforce GC-safety and to do lots of fun exception-rewrite preparation work.

Stay tuned for more exciting developments on regaining control over source code, both here and on Dave Mandelin’s blog.

1 comment

  1. I’m really happy to hear that this is working. I think this is an important direction for GCC to go, and GCC needs someone to drive it a bit…

    As to your points: for #1, I agree completely and I already know how to fix this for C.

    For #2, I’m not so sure. I saw some problems here with the column number work, but it isn’t clear that this would be any better with a generated parser. Can you get end-of-node information by looking at all the sub-nodes? I suppose spelling of numbers, and parens, get in the way here. More stuff to do :-)

    #3 is definitely a problem, and kind of a pain internally too. I’m sure it is fixable but I’d imagine it is a lot of work.

    On politics, unfortunately you are right :(. There’s some hope we can get forward motion here, though.