Aug 09

This Month In Static Analysis

Lately I have been focusing on optimizing Fennec startup on a delightfully inadequate platform: Windows Mobile. More on the fascinating startup performance problems and their solutions later. As a result, I have been doing relatively little static analysis work.

The main reason for taking a break is that I feel I have gone from having no way to do any analysis to having production-quality tools for analysis and rewriting. I finally have a chance to move on from developing tools to using them in everyday development. The main piece of the puzzle that still needs completion is GCC 4.5 support in Dehydra. We are feature-complete on 4.5; we just need to stabilize once the trunk stabilizes.

Drowning In Pork

A number of other people did some cool stuff in the meantime. First and foremost: Joshua Cranmer has ventured into the land of Pork and is publishing a guide to writing refactoring tools on his blog (part 1, part 2, part 3). This is cool, because until now there were no Pork docs, and nothing I write could ever match Joshua’s documenting talents. Thanks a bunch, Joshua.

I have also received my first-ever bugfix patches to Elsa. Previously, I had received miscellaneous build fixes, etc., but these are the first patches that involved somebody pounding their head against the wall until they figured out why things were crashing or rejecting valid C++ code.

Introducing Dan Witte

Dan is the new static analysis go-to person. So far he has facilitated an explosion of static analysis ideas (they are tracked in bug 430328). A lot of these can be expressed as Dehydra analyses of under 10 lines, so they are excellent introductory projects. If you are dying to start analyzing code but don’t know where to begin, look in that bug. Dan has written an interesting analysis for finding accidental temporaries caused by C++’s “wonderful” implicit conversions/etc. (expect to see a blog post on that). He is also working on the holy grail of Mozilla static analysis: a full callgraph. It’s a little embarrassing that we don’t have that yet, but it’s hard, and once we do have it, a whole new world of analyses will be possible.

Speaking of Callgraphs…

So while various Mozillians were pondering how awesome it would be to do inter-function analysis, an intern beat us to writing the first useful one! Sully had a problem; after a tiny bit of Dehydra coaching, he solved it in the time it took me to eat my lunch. Brilliant! See his blog post for details. My conclusion: either Dehydra is pretty easy to use and/or we get mad genius interns :).

Jun 09

Dehydra & Pork Sources Moved

I moved Dehydra to a more official location; please update your scripts and hg settings.
New Dehydra URL:

Pork got reshuffled during the move: it’s now 2 repositories, and oink is dead. It now depends on current versions of flex (as opposed to flex-old) and features a cleaned-up build system.

New way to check out Pork:

hg clone http://hg.mozilla.org/rewriting-and-analysis/pork
hg clone http://hg.mozilla.org/rewriting-and-analysis/elsa pork/elsa

Jan 09

Semantic Rewriting of Code with Pork – A bitter recap

LWN published an article about a tool that refactors C code. Guess what: it’s yet another tool built on top of a crappy C parser that will never grok C well, let alone hope to support C++. To my great disappointment, the author was not aware of my work on Pork. Clearly I have failed to let people know that complex C and C++ can be refactored with (somewhat raw, but powerful) open source tools.

In addition to Dehydra (which is even mentioned in the first comment, yay!), I also maintain Pork – a fork of oink that is well suited to large-scale refactoring of real-world C/C++ code.

So far Pork has been used for “minor” things like renaming classes and functions, rotating out-parameters and correcting PRBool bugs. Additionally, Pork proved itself in an experiment that involved rewriting almost every function in Mozilla (i.e., generating a 3+ MB patch) to use garbage collection instead of reference counting.

So to summarize:

  • Refactoring C is hard, but C++ is much harder
  • For refactoring C++ there is no better toolchain to start with than Pork
  • Pork shares no code with Dehydra.
  • Pork is built on the Elsa parser, which makes it well-suited for rewriting large amounts of code. Dehydra isn’t suitable for rewriting code because GCC provides a very lossy AST and incomplete location information.
  • Pork is not as convenient for analysis needs as Dehydra

For any questions regarding Pork feel free to post on the mailing list or ping me on IRC.

Language Wars

I find it depressing that the comments on the LWN article ended up being about language wars rather than refactoring. Pork is written in C++, which is much more widely known than OCaml. However, I seriously doubt it’s easier for anyone to hack on advanced compiler frontend pieces in a language as ill-suited to the task as C++.

Sep 08

Converging Elsa Strains

One of the purposes of this blog is to inform people that while the original Elsa author is no longer actively developing it, Elsa is being used in production at Mozilla and is actively maintained within Pork.

Recently two Elsa forks that were previously unknown to me came to my attention via comments on my blog. Both of them are extremely cool and something we have been wanting:

  • ellcc, a C (and soon C++) compiler built on Elsa + LLVM. I’ve heard of attempts to get this to work before, but this one looks much further along than similar efforts.
  • Alex Telia’s souped-up Elsa with parser error recovery and an integrated C preprocessor, among other awesomeness. See this comment for more details. Some of these tools are built on this Elsa fork.

Both of these projects are interested in converging on a single codebase. It sounds like Alex’s work will be ready for merging soon.

I love open source.

I’m Back

Some might’ve noticed that I disappeared off the net for two weeks. I have a good excuse: I was getting married.

Aug 08

Meanwhile in a parallel universe

Someone else is developing their own app-specific rewrite tools. In this case, app-specific refers to automating the porting of code from gtk2 to gtk3. The approach is similar in that patches are produced, but it doesn’t look like a patch-aggregating tool has been written yet. Instead of the Elsa/MCPP magic sauce, clang is being used, so this is limited to C at the moment.

KDE folks are behind in the automated code-rewriting arms race; perhaps the trolls should try some Pork to accelerate the KDE3->4 transition 🙂

All kidding aside, it is awesome to see that less-manual-labour-through-compiler-assisted-refactoring approach is gaining mindshare.

Jul 08

Pork, MCPP, Oink and Elsa…What’s going on?

It seems that there is some confusion as to what pork is and how it’s related to oink and elsa. So here is my view of it.

Pork is my set of tools that use Elsa to rewrite source code (mainly Mozilla code). We use Pork solely for rewriting, as it is not as well suited to convenient, hardcore analysis as the GCC-based tools are.

MCPP is the secret sauce C preprocessor that makes C++ rewriting with Elsa possible by annotating preprocessed files with information to undo the lexical braindamage resulting from macro expansion.

Elsa is an awesome C++ parser. Awesome in that it can preserve more information about parsed code than any other C/C++ parser, and it is easy to extend.

We maintain our own version of Elsa within pork.

I think our version of Elsa is the most up to date and the most compatible with newer C++ features and with the headers used by newer GCC releases. We encourage other projects with C++ parsing/rewriting needs to collaborate with us. We will be parsing code with Elsa for a few years to come, and it’s a lot of work for a single entity to maintain a C++ parser. I think Elsa is a much better backend to build refactoring support on than any other C++ parsing project out there right now.

The Messy Details

Now let’s move on to the more confusing parts: oink, oink-stack, and the oink mailing list.

oink consists of some static analysis tools and was meant to be the central place where all Elsa and Elsa-related development would happen. When people refer to oink, they usually mean the oink-stack, which is a Subversion meta-repository that pulls in a dozen subrepositories (smbase, elkhound, elsa, oink (where the static analysis tools live), etc.).

So when I started working on refactoring tools, I was told that I should aim to have my tools added to oink, but there were some legal hassles to work out in the meantime, so I cloned the oink-stack and developed my tools with minimal changes to it. This included various Elsa extensions, bugfixes, etc.

However, what little momentum oink had has fizzled out due to various personality conflicts and various academics losing interest. The code has been bitrotting for as long as I’ve been working at Mozilla.

So the end result of oink is that we have pork which is a superset of oink. I’m not even sure if I mention the name pork anywhere in the sources. So pork at the moment means “Taras’ continuation and extension of oink”. I am using the oink mailing list for any discussion on changes to Elsa/etc in hopes that at least some of the genius lurkers there will regain their interest in elsa.

Where do We Go From Here?

Onward! Due to the original author’s vision of what C++ is, and the state of C++ at the time Elsa was conceived, current Pork code causes people to have many WTF moments (followed by banging their heads against the keyboard) when they first start using it.

The short version of my plan is:

  • Allow one to do “using namespace std” when using Elsa
  • Restructure the Pork repositories so that there are only 3 of them (elsa, elkhound, pork) rather than 11
  • Get rid of the oink repository (those tools do not work for us)
  • Make Pork consist of just my tools (with a sane build system) rather than being mixed into unmaintained oink stuff
  • Make Pork compile with new compilers (GCC 4.3 and recent MSVC++)
  • Keep track of this in a bug
  • Clean up various miscellaneous things

Some of you might ask, “But Taras, why now? Why not just keep doing what you’ve been doing?” I was doing what I was doing because I had the overwhelming goal of devising a way to automate static analysis and refactoring of Mozilla on my shoulders, and I wasn’t convinced that it was feasible. I had to learn to split my time between tool development and actually using the tools. Naturally, I cut corners on tool development 🙂

Since then, slowly but surely, various awesome hackers have started doing rewrites and analyses themselves, freeing me up to focus more on development. To make matters sweeter, various hackers have started submitting bug reports, fixes and ports for my tools. This gives me more time to focus on the big picture.

Finally, I believe that automation of the sort we are doing at Mozilla is something that has been missing from open source development practices, and it will catch on once people realize what they’ve been missing. Reducing those WTF moments will help people think positively.


Jun 08

Pork 0.9 in the wild

Those who would like to play with Pork, but are allergic to pulling sources from version control can now download an actual pork release. Now someone needs to hook this into a GUI to provide easy Eclipse-style refactoring for C++.