25 Jan 08

Dehydra progress

GCC Dehydra is evolving much faster than the Elsa version did and it is easier to use. Once I implemented virtual methods correctly, Joshua was able to do his thing in no time at all. All it takes is a custom GCC (I’d love to see it packaged) and specifying plugin parameters in CXXFLAGS.

Dehydra has some new tricks now like a tree representation of types (instead of a string) with full typedef support. Lisp remnants in GCC are getting a new life as JavaScript objects.
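
To illustrate the difference, here is a minimal sketch of what a tree-shaped type with typedef support buys you over a flat string. The field names (`name`, `typedef`) are assumptions for illustration, not necessarily the actual Dehydra API:

```javascript
// Mock of a tree-shaped type object: a typedef node points at its
// underlying type, so the whole chain is preserved instead of being
// flattened into a single string like "unsigned int".
const PRUint32 = {
  name: "PRUint32",
  typedef: { name: "unsigned int" } // underlying type
};

// Follow the typedef chain down to the fundamental type.
function resolveTypedefs(type) {
  while (type.typedef)
    type = type.typedef;
  return type;
}

console.log(resolveTypedefs(PRUint32).name); // "unsigned int"
```

With a string representation, the analysis would see only one of the two names; with the tree, a script can choose either the typedef name or the underlying type, whichever the check needs.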

I’m currently working on exposing the full GCC tree structure in JavaScript, so one can do any analysis in pure JS. Dynamically typed GCC tree nodes are great for that. I’m starting with the middle-end GIMPLE representation, so in theory one will be able to analyze anything GCC can compile (Java, C++, C, ObjC, ObjC++, FORTRAN?). Eventually this will be expanded to support frontend-specific tree nodes, to be able to look at code closer to the way it was written. Oh, and I expect people will be able to script large parts of C++ -> JavaScript rewrites with Dehydra.
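
Dynamically typed nodes mean one generic walker can cover every node kind. Here is a sketch of the idea with an invented GIMPLE-ish node shape (the `code`/`stmts` fields are stand-ins, not real GCC tree codes):

```javascript
// Mock function body, shaped loosely like a statement list of GIMPLE
// tuples. Real GCC node layouts differ; this only shows the principle.
const body = {
  code: "STATEMENT_LIST",
  stmts: [
    { code: "GIMPLE_ASSIGN", lhs: "x", rhs: { code: "INTEGER_CST", value: 1 } },
    { code: "GIMPLE_CALL", callee: "foo", args: [] }
  ]
};

// Because nodes carry their kind at runtime, a single generic walker
// works: recurse into any property that looks like a child node or a
// list of child nodes.
function walk(node, visit) {
  visit(node);
  for (const key of Object.keys(node)) {
    const v = node[key];
    if (v && typeof v === "object") {
      if (Array.isArray(v)) v.forEach(c => walk(c, visit));
      else if (v.code) walk(v, visit);
    }
  }
}

const codes = [];
walk(body, n => codes.push(n.code));
console.log(codes.join(","));
// "STATEMENT_LIST,GIMPLE_ASSIGN,INTEGER_CST,GIMPLE_CALL"
```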

In theory, one could make tree node conversion two way which would enable writing optimization passes in JS, but that would be silly.

What’s the point?

I want to be able to do exception-safety analysis in pure JS. I want to enable unit checking (through typedefs and inline conversion functions) in pure JS.
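
The typedef-based unit checking could look roughly like this. Everything below is a mock standing in for the real compiler-driven hooks; the type-object fields and the check itself are assumptions used to show the shape of the idea:

```javascript
// Treat the outermost typedef name as a unit tag: "meters" and
// "seconds" are both typedefs of double, but should not mix.
function unitOf(type) {
  return type && type.typedef ? type.name : null;
}

// Flag assignments whose two sides carry different unit tags.
function checkAssign(assign, report) {
  const lhs = unitOf(assign.lhsType);
  const rhs = unitOf(assign.rhsType);
  if (lhs && rhs && lhs !== rhs)
    report(`unit mismatch: assigning ${rhs} to ${lhs}`);
}

// Mock input, standing in for "meters m = someSeconds;"
const meters = { name: "meters", typedef: { name: "double" } };
const seconds = { name: "seconds", typedef: { name: "double" } };
const errors = [];
checkAssign({ lhsType: meters, rhsType: seconds }, msg => errors.push(msg));
console.log(errors[0]); // "unit mismatch: assigning seconds to meters"
```

This is exactly why the typedef chain has to survive into the analysis: flattened to strings, both sides would just read "double" and the mismatch would be invisible.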

Additionally, Dehydra should be awesome for generating bindings. For example, I’ll be able to use Dehydra to import GCC’s autogenerated enums to get string names for nodes.
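
The enum-import idea reduces to generating a value-to-name table from the enumerators found in the AST. The enum values below are a made-up stand-in for GCC’s autogenerated tree-code enums:

```javascript
// Mock enumerators, as if pulled out of the AST by a Dehydra script.
const treeCodeEnum = [
  { name: "ERROR_MARK", value: 0 },
  { name: "IDENTIFIER_NODE", value: 1 },
  { name: "TREE_LIST", value: 2 }
];

// Emit a reverse map so a numeric node code can be printed by name.
function enumToNames(enumerators) {
  const names = {};
  for (const e of enumerators)
    names[e.value] = e.name;
  return names;
}

const names = enumToNames(treeCodeEnum);
console.log(names[2]); // "TREE_LIST"
```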

Also, it will become easy to extract callgraphs and various other stats out of the code once they are accessible in JS. Eventually we’ll be switching Dehydra to Tamarin to do all of the above really, really fast.
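
Callgraph extraction, for instance, becomes a few lines of JS once per-function call sites are visible. The per-function call lists here are mocks; in practice they would come from traversing function bodies:

```javascript
// Mock per-function call lists, standing in for traversal output.
const functions = [
  { name: "main", calls: ["init", "run"] },
  { name: "run",  calls: ["step", "step"] },
  { name: "init", calls: [] },
  { name: "step", calls: [] }
];

// Build caller -> deduplicated-callee edges.
function callgraph(fns) {
  const edges = {};
  for (const f of fns)
    edges[f.name] = [...new Set(f.calls)];
  return edges;
}

const graph = callgraph(functions);
console.log(graph.main.join(",")); // "init,run"
console.log(graph.run.join(","));  // "step"
```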

GCC Plugins

While I am messing with the GCC AST, Dave is working on utilizing GCC’s control flow graphs with a separate plugin. Eventually we’ll merge our work, but for now it’s nice to not step on each other’s toes while adding features to the compiler. Given how easy life is with plugins, I am amazed that people chose to go uphill both ways and not collaborate on a plugin interface for their crazy GCC extensions. Yes, I’m looking at you: mygcc and gccxml.

Aren’t there IDEs interested in making use of GCC internals too or is everybody interested in maintaining yet another crappy C parser like Linux’s Sparse tool?

I’m looking forward to exploring the many ways we can reuse what’s in the compiler to empower developers for Mozilla 2.


17 Jan 08

GCC + SpiderMonkey = GCC Dehydra

Analysis

GCC Dehydra is starting to work. I encourage people to try it out for their code scanning needs. The main missing feature is control-flow-sensitive traversal, which means that currently function bodies are traversed in a sequential fashion. It is the most complicated part of Dehydra, but most of the time this feature is not needed.

So far I got Benjamin’s stack-nsCOMPtr-finding script to do stuff, which indicates that most of the features are working.
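
A script like that boils down to scanning each function’s locals for the offending type. Here is a rough sketch driven by a mock instead of the real Dehydra hooks; the callback name `process_function` and all the node fields below are assumptions, not Benjamin’s actual script:

```javascript
// Hypothetical analysis callback: flag stack-allocated nsCOMPtr locals.
function process_function(decl, locals, report) {
  for (const v of locals) {
    if (v.type.name === "nsCOMPtr" && v.isStack)
      report(`${decl.name}: stack-allocated nsCOMPtr '${v.name}'`);
  }
}

// Mock invocation, standing in for the compiler driving the callback.
const findings = [];
process_function(
  { name: "Foo::Bar" },
  [{ name: "ptr", isStack: true, type: { name: "nsCOMPtr" } }],
  msg => findings.push(msg)
);
console.log(findings[0]); // "Foo::Bar: stack-allocated nsCOMPtr 'ptr'"
```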

My vision is to switch to the GCC backend for all of our code analysis needs, since it is well tested, fairly feature-complete, and works with new versions of GCC (by definition).

Not everything is perfect in GCC land. There are some frustrating typedef issues to solve.

Source Re-factoring

Elsa still holds its own when it comes to refactoring code because it has a much cleaner lexer/parser and rarely opts to “optimize away” original AST structure. We should stick with Elsa’s arcane requirement of having to preprocess files with gcc <= 3.4 until either GCC becomes viable as a platform for refactoring or clang matures.

GCC is not suitable for refactoring work because:

  1. It starts simplifying the AST too early.
  2. The parser is handwritten and would therefore be hard to modify to maintain end-of-AST-node location info.
  3. GCC reuses many AST nodes, which means their locations point at the declaration rather than the usage point.
  4. The handwritten nature of GCC makes any of the above improvements time-consuming to implement, and the political issues are something I’d rather not deal with.

Most of these wouldn’t have been an issue if GCC were written in ML :)

What’s Next?

Time to start using GCC Dehydra to enforce GC-safety and lots of fun exception-rewrite preparation work.

Stay tuned for more exciting developments regarding regaining control over source code here and on Dave Mandelin’s blog.


08 Jan 08

Dehydra as a GCC plugin

Thanks to the 2-fold increase in manpower working on pork, we finally have an opportunity to work on the nice-to-have things.

Progress

Recently I have been working on a GCC plugin to do Mozilla-specific analyses with GCC.

Unfortunately, I didn’t notice that GCC had a plugin branch, so I reinvented the wheel there. Fortunately, that part was rather easy, and it turned out that the plugin branch isn’t very useful to work with as it is in SVN: it doesn’t link GCC with -rdynamic, nor does it install the hooks I need in the C++ frontend. Overall, the plugin shim is relatively trivial and it will be pretty easy to merge with other similar efforts.

My first and only plugin is a C reimplementation of Dehydra. GCC sources are currently fairly hostile to C++, so I elected not to make my head spin by mixing in C++ in addition to C and JavaScript. I think the C Dehydra has reached the hello-world state; to take it for a spin, see the wiki page.

GCC Thoughts

Integrating with GCC is pretty awesome. So far I regret not jumping in earlier. I was reluctant to do so because everyone I’ve talked to (other than Tom Tromey) claimed that GCC is ridiculously complicated and impossible to do stuff with. In fact, academic people are so scared of GCC that they tend to opt for commercial frontends that have ridiculous licensing terms and make it impossible to release their work to the general public.

GCC internals are pretty crazy, since everything is done with macros and the AST is dynamically typed, so it’s fairly painful to figure out seemingly simple things like “what AST nodes does this AST node contain?”. Additionally, GCC loves rewriting AST nodes in place as the compilation progresses, which sucks when one wants to analyze the AST while it looks as close as possible to the source. The GCC parser also sucks to work with, as it is implemented as a C code hodge-podge (a technical term which applies to much code in GCC). Luckily, I am mainly concerned with poking at data that’s already in GCC.

The upside is that GCC is a well-tested production compiler that most source compiles with. Integrating with GCC means that the AST is correct (Elsa is only a frontend, so there is no way of knowing if its AST has mistakes in it). Integration also means that the user doesn’t have to worry about making preprocessed files and maintaining obsolete versions of GCC or old GCC headers. Unlike Elsa, GCC already has useful features like typedef tracking, and doesn’t implement location tracking with a stupid programming trick. Additionally, I hope to reuse computations from middle-end GCC passes to build my control flow graph, do value numbering, and other useful but tricky-to-implement stuff.

GCC isn’t scary at all; it’s just another way of implementing a compiler. Some people elect to have more pain in life by reinventing ML in C++ instead of using ML for compiler writing; others get their pain dosage from working on a C compiler originally generated from LISP sources.

Lastly, I’d like to thank patient gcc hackers in #gcc without whom I wouldn’t stand a chance in figuring out how to get this far.