30
Apr 09

GCC Rant + Progress

I feel strange working on GCC-specific stuff and then discussing it on planet mozilla as mozilla work. However, without GCC, Dehydra and Treehydra would not be half as awesome, or even feasible. The power of open source is that it allows us to leverage the entire open source ecosystem to achieve specific goals. When open source projects combine their efforts, not even the biggest software companies can compete, because cross-project goals would otherwise be incredibly expensive and unpleasant to pursue.

Occasionally, it is very frustrating to see people treat open source software as immutable and independent black boxes. In my personal experience, the browser and the compiler are viewed as finished products and therefore it is OK to bitch and complain about them. That’s frustrating because the same users could be channeling that energy in a more positive way by reporting bugs, contributing code/documentation, etc.

Sometimes these rants result in rather comical conclusions: Ingo’s rant is priceless. My perspective on this:

  • what have Linux kernel devs done to help GCC help them?
  • <flame>Sparse is a dead end. Writing compiler code in C is silly, writing analysis code in C is sillier (and frustrating and limiting). Taking a crappy parser and bolting a crappy compiler backend onto it will result in a bigger pile of crap :) Given how smart kernel devs are, they sure like wasting their time on crappy solutions in crappy languages.</flame>
  • Wouldn’t it be cool if instead of complaining these talented people wrote a GCC plugin to do what they want?

GCC Plugin Progress

I finally landed the massively boring and annoying GTY patch. I can barely believe that the patch went in so smoothly without excess complaining from GCC devs. From GCC’s perspective it’s merely a cosmetic cleanup that affects a large number of headers. For us it enables Treehydra to be generated via Dehydra with little manual effort. It basically makes Treehydra possible without patching GCC. I have another 3-4 patches that need to land before trunk GCC can run the hydras out of the box. Those are mainly localized bugfixes and cleanups, so I fully expect them to go in and for GCC 4.5 to rock my world.
Once GCC 4.5 ships, analyzing code will be a trivial matter of apt-getting (or equivalent) the hydras and specifying the analysis flags on the GCC command line!


24
Jun 08

Status Report: Nearterm plans for Pork, Dehydra

Pork

I have been planning to release Pork 1.0 for a while now. The tools work great, even if all the love is going to the GCC-based toolchain. However, after hearing grumpy comments from a certain coworker about the ugliness of the oink build system, it dawned on me that it’s rather mean to release such a mess and call it 1.0.

So I think I’ll release Pork 0.9 in the current state, so I can focus on near term GCC toolchain work. Pork in the current form means oink stack + my refactoring tools + changes to elsa and other libs to support C/C++ refactoring needs.

This will be followed up by Pork 1.0. 1.0 will involve changes to the build system to get rid of oink (we only use the oink build system and rarely use the oink API). To put this another way: I don’t expect any functionality changes between 0.9 and 1.0 other than an improved build system to make it easier to get started with writing new tools.

Pork – Future

I am pretty happy with Pork as it is. I think we’ve taken Elsa as far as it’ll let us go. The only realistic improvement on the Pork side may be to have Dehydra generate a JS binding to Elsa’s extensive AST to make rewriting stuff easier. However, I’m not sure that it’s worth the effort, or that a C++ AST will reflect into JavaScript as well as GCC GIMPLE does.

Preprocessing

On the other hand, something needs to be done about the main ingredient that makes Pork tick: MCPP. MCPP does a lovely job of annotating what the C preprocessor is doing, but configuring GCC to use a foreign preprocessor is a giant hassle and making sure it works correctly is troublesome. At the GCC summit, Tom gave me an idea on how similar functionality can be added to GCC directly by extending the include backtrace with macro expansions. Not only would such integration simplify Pork setup and increase Pork’s operating speed, but it is also a clean way to expose preprocessor constructs to the AST presented in De/Treehydra. It should allow for more preprocessor awareness directly in the analysis stage of refactoring instead of only in the final rewriting stage, as is currently done. As a side effect, GCC would gain better error messages too.

So while this isn’t going to affect Pork directly, it will simplify the lives of Pork users while opening new analysis frontiers. Even though I hate working on preprocessor stuff, I think this work will need to happen sometime in the near future.

The Hydras

Dehydra 0.9 has been out for a while. I had planned to release 1.0 soon after, unless major flaws were discovered in the API. The situation changed at the GCC summit. The fact that FSF reversed their stance on GCC plugins means that we should be concentrating on getting the plugin stuff reviewed.

So in the near term I’m forward porting the plugin stuff to trunk GCC, then I’ll generalize the plugin API to suit at least one other GCC plugin user that we met with at the summit. The downside is that I don’t want to release Dehydra 1.0 and immediately break the plugin API. The upside is that the new API should be more general and more minimalistic and will likely be close to what will eventually become an official plugin API.

Summary: In my mind Dehydra and Pork are 1.0 quality, but I want to future-proof them a little bit before calling them 1.0.


19
Jun 08

GCC Summit

Our presentation on Treehydra and Dehydra GCC plugins was received well at the summit.

The big news is that FSF is working on license changes to allow GPL-only GCC plugins. I’m looking forward to having our work be compatible with future GCC without any patching.

In a few minutes we’ll be having a meeting with users of other plugin frameworks to have an initial discussion on a common API. I’m working on forward porting my patches, so they can start getting reviewed ahead of license changes.


09
Jun 08

Dehydra 0.9: It’s alive!

I am finally happy enough with the Dehydra API and functionality to release 0.9. Dehydra is basically feature complete; the main reason I’m not calling it 1.0 is in case there are outstanding API bugs.

I believe Dehydra is the first useful open source static analysis tool. I hope to see projects outside of Mozilla benefitting from it too.

I would love to see someone package this up for various Linux distributions. You can grab the release here.

Note, this release also features a preview release of Treehydra. Most of the development lately has been focused on improving Treehydra and building analyses on top of it.


27
May 08

Treehydra goes Push and Pop

After writing a ton of docs and working through other Dehydra 0.9 blockers, I decided to cool off by doing some actual analyses. Before I get to that, I’d like to say that the last big task is to set up a buildbot for Dehydra on Linux/OSX. Thanks to yet another awesome contribution from Vlad, that’s mostly done.

So I got working on GC-safety static analysis. Originally we tried to define a complete spec before writing a single line of code. That turned out to be a bad idea and resulted in a spec full of bugs. This time we are defining the analysis incrementally and, as a surprise reward, it already caught a bug.

Pushing and Popping Our Way

SpiderMonkey has a lot of complex code applying Push/Pop-like operations on variables in a function-local manner. Examples of functions that this analysis would look at are: JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT and JS_LOCK/JS_UNLOCK. See bug for more. Essentially, this will help with “code must flow through here” comments on “out:” goto labels that inhabit the SpiderMonkey source.

This is an example of control-flow-sensitive analysis. It is impossible without the compiler-like view of the code that Treehydra provides. It also helps to have a scalable algorithm to iterate the CFG. Luckily, David Mandelin wrote such a beast by implementing ESP for his outparam analysis. David factored out the ESP analysis and made it available for reuse. See esp_lock.js in the test suite for an example of how to write control-flow-sensitive analyses. locks_valid*.cc and locks_bad*.cc illustrate the code patterns that can be scanned for.
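
To give a rough feel for what "control-flow-sensitive" means here, below is a toy sketch in Python. It is not ESP, not Treehydra, and not how esp_lock.js is written: the CFG representation, the check_push_pop function and the block names are all made up for illustration. It just walks a hand-written control flow graph with a worklist, tracks how many pushes are outstanding on each path, and complains when a path leaves the function without popping, which is the kind of mistake an early return that skips the “out:” label would cause.

```python
from collections import deque

def check_push_pop(cfg, entry, exits, max_depth=8):
    """Toy push/pop balance check over a hand-written CFG (illustrative only).

    cfg maps a block name to (operations, successor block names), where each
    operation is the string "push" or "pop". exits is the set of blocks that
    leave the function. Returns a sorted list of problem descriptions.
    """
    # For every block, the set of push depths that can reach its start.
    depths_in = {entry: {0}}
    worklist = deque([entry])
    problems = set()

    while worklist:
        block = worklist.popleft()
        ops, succs = cfg[block]

        # Run the block's operations once for each depth that can reach it.
        depths_out = set()
        for depth in depths_in[block]:
            for op in ops:
                if op == "push":
                    # Clamp the depth so loops that keep pushing still terminate.
                    depth = min(depth + 1, max_depth)
                elif op == "pop":
                    if depth == 0:
                        problems.add(f"{block}: pop without matching push")
                    else:
                        depth -= 1
            depths_out.add(depth)

        # Every path that leaves the function must have popped everything.
        if block in exits:
            for depth in depths_out:
                if depth != 0:
                    problems.add(f"{block}: exits with {depth} unmatched push(es)")

        # Propagate new depths to successors; revisit them only if they changed.
        for succ in succs:
            known = depths_in.setdefault(succ, set())
            if not depths_out <= known:
                known |= depths_out
                worklist.append(succ)

    return sorted(problems)

# All paths flow through "out", which does the pop.
good = {
    "entry": (["push"], ["work"]),
    "work": ([], ["out"]),
    "out": (["pop"], []),
}

# One path returns early and never reaches "out".
bad = {
    "entry": (["push"], ["work", "early_return"]),
    "work": ([], ["out"]),
    "early_return": ([], []),
    "out": (["pop"], []),
}

print(check_push_pop(good, "entry", {"out"}))
print(check_push_pop(bad, "entry", {"out", "early_return"}))
```

The first call should report nothing and the second should flag the early_return block. A real analysis has to deal with which variable is being pushed, conditions on branches and much larger graphs, which is where a scalable algorithm like ESP earns its keep.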

So if you know of any further push/pop patterns in the rest of Moz that can be checked in this manner, leave a comment.

PS. This is yet another account of Treehydra rocking the static analysis world. Exposing the slightly scary but awesome GCC guts via JavaScript allows one to perform precise static analyses in a civilized manner. What could be more fun?