29
Jan 09

Semantic Rewriting of Code with Pork – A bitter recap

LWN published an article about a tool that does refactoring of C code. Guess what, it’s yet another tool on top of a crappy C-parser that will never grok C well or even hope to support C++. To my great disappointment the author was not aware of my work on Pork. Clearly I have failed in letting people know that complex C and C++ can be refactored with (somewhat raw, but powerful) open source tools.

In addition to Dehydra (which is even mentioned in the first comment, yay!), I also maintain Pork – a fork of oink that is well suited to large-scale refactoring of real-world C/C++ code.

So far pork has been used for “minor” things like renaming classes&functions, rotating outparameters and correcting prbool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function(ie generating a 3+MB patch) in Mozilla to use garbage collection instead of reference-counting.

So to summarize:

  • Refactoring C is hard, but C++ is much harder
  • For refactoring C++ there is no better toolchain to start with than Pork
  • Pork shares no code with Dehydra.
  • Pork is built on the Elsa parser which makes it well-suited for rewriting large amounts of code. Dehydra’s isn’t suitable for rewriting code due to GCC providing a very lossy AST and incomplete location information.
  • Pork is not as convenient for analysis needs as Dehydra

For any questions regarding Pork feel free to post on the mailing list or ping me on IRC.

Language Wars

I find it depressing that the comments to the LWN article ended up being about language wars rather than the refactoring topic. Pork is written in C++ which is much more widely known than OCaml. However, I seriously doubt it’s easier for anyone to hack on advanced compiler frontend pieces in a language as ill-suited for the task as C++.


07
Nov 08

Enabling prcheck email notifications

I’ve been running my prbool checker with nightly notifications for about a year now. There are some false positivies, but mostly it picks up real bugs. The checker pulls the blame source to try to guess who is responsible for the error, but so far it has only been emailing me.

I’m going to make it emailed the hg blamed person starting this weekend, so if you commit something and get an email from me complaining about it don’t take it too personally. Hopefully this will keep the amount of new detectable prbool bugs at 0. If you do get an email and decide you don’t want to get these in the future, please complain and we’ll try to come up with a better notification solution.


19
Sep 08

Outparamdelling this way comes

Recently I dusted off outparamdel to see if I can get some refactorings landed. About a year ago, with the great QueryInterface outparamdelling experiment we ended up with a smaller binary footprint and tiny performance gain, but that never landed due to us not wanting to break compatability yet. Ever since, outparamdel has been patiently bitrotting within Pork waiting for the day it’s allowed to break APIs.

Recently jst listed some private API candidates for deCOMtamination and I have been letting outparamdel loose on them for the past week. So far only one such change has been committed, the rest are sitting in jst’s review queue. It’s exciting, as it’s the first ever application of outparamdel that didn’t get canned. For the whole list see the most recent bugs blocking my analysis metabug.

Cool part about these patches is that after various outparamdel special-casing and manual cleanup the line count is reduced by 10-30% while maintaining existing functionality!

So far I haven’t touched much outside of content/, so if you know of any APIs that could have the outparam rotated into the return value, file some rewriting bugs against me.

Rewriting with Mercurial

Mercurial is now my favourite python program ever. It makes rewrites so easy. Here is my typical workflow:


# Write an outparamdel input file specifying functions to rewrite..run outparamdel with pork-barrel to generate patch
hg qimport -f autopatch.diff && hg qpush && hg qref #make the patch nice and readable
hg qnew manual.diff #create a manual cleanup patch for cosmetic touchups and rewrites screwed up by macros#of course to submit patch for review, those two patches need to be combined
#do manual stuff
hg diff -r 19277 #produce a combined diff without loosing ability to edit each applied patch individually! For example, can regenerate the autopatch.diff with different outparamdel parameters or a bugfixed outparamdel. 19277 is a revision that has to be looked up with hg log, unfortunately there isn't a relative syntax to do hg diff -r "tip - 2"

This is a huge improvement over the ad-hoc workflow prior to hg switch when I was prototyping my tools. CVS + quilt don’t hold a candle to a modern revision control system.

Prcheck

I also have been doing another round of prbool corrections. Somehow I didn’t notice that the system stopped working due to the CVS->hg switch. Once again when reviving my nightly checker scripts, it was a pleasure to substitute all of the CVS hacks with mercurial commands.

I am waiting for the few remaining largeish (ie 3-4 bugs per file) patches to land before I can start submitting whackamole bugs with a single prbool correction per module.

It also seems that cairo and every other pre-C99 project have the same set of issues as Mozilla with their typedefed-int boolean types. Perhaps prcheck isn’t mozilla-specific at all.


04
Oct 07

Multiple Degrees of Correctness

Prcheck

The trouble with prcheck and the automated prbool validation is that one can’t attach the giant patch it produces to bugzilla and expect it to get committed. So I am spending this week combing through prcheck outputĀ  and patch-bombing bugzilla with per-module patches.

I find going through the errors manually to be a lot of fun than I expected. I am finding types of errors that I was not considering when I was writing prcheck. For example, I expected the biggest gains to come at runtime from making all prbool values 0/1, but it seems that most of the cool errors are due to PRBool & PRInt32 resolving to the same type. That results in code mayhem ranging from wrong method overloads being called to method signatures claiming to return PRBool where method bodies act like the function returns nsresults.

The prbool check is just an incredibly minor restriction of the C++ system, yet it resulted in hundreds of errors(almost all of which are typos). In my mind this reinforces the importance of static typing (which C++ doesn’t do enough of).

The main lesson I learned today is that code doesn’t have to be correct in order to work correctly.

Mobile

While on vacation in Ukraine I finally got to try out GPRS Internet through my cellphone. Sure EDGE is slow, but the convenience of having internet everywhere I go while traveling is unparalleled. It’s just too bad that I had to go to a developing country to be able to afford mobile internet. In Canada I would’ve paid over $750 for the $10 worth of Internet in Ukraine. So I am very excited that governments are starting to regulate mobile pricing. Looks like EU is first. I hope the local cellular oligopoly gets a kick to the head soon.


27
Jul 07

Superity Complex & Static Analysis

It is always frustrating to see a compiler complain about something trivial. It is especially annoying since most of the trivial complaints are trivial to fix automatically (eg. superfluous semicolon).

I think this is a bigger problem in the static analysis industry. Vendors/researchers ship their tools with a superiority complex built-in. Most of the error messages produced by error checking tools can be paraphrased as “Gee look, I found some trivial to fix bugs in your code, but I ain’t gonna do nothing about them! Neeener! Go worker-human!”

My policy is to make my tools more polite than that. Starting from prcheck, all of my tools will point out simple errors by suggesting patches (when possible). It is impossible to produce a correct patch every time, but I am not worried about that since developers are quite good at disregarding stupid suggestions.

Automatic Whining

Now I have a few extra scripts that lay the foundation for regular code inspection via static analysis. PRBool checks are my first step. Here is a sample email:

Continue reading →


26
Jun 07

Status Report: Recent Work

New Tool: Prcheck – PRBool’s best friend

Mozilla has a number boolean types and most of them are a form of an int. People expect them to behave like a bool, but since they can be assigned more than 1 value for true, this assumption can lead to bugs. Prcheck will mandate that prbools can only be assigned 1 or 0. Typically static checkers output errors, but prcheck outputs errors & fixes for them in diff format so they can be fixed automatically. See this bug for more info. I think the tool is almost complete. Hopefully, once the MCPP issues described below are addressed, I’ll have one giant patch to eliminate this problem.

MCPP Teething Troubles

Even in the open-source world there are some problems with vendor monoculture. For example we a single vendor providing the C++ compiler and one C preprocessor that is widely used. Even though MCPP is a portable preprocessor that can plug into multiple compilers, it still chokes on some GCCisms. Things are slowly changing and new open source compilers are on the horizon: I can’t wait.

Additionally, since Mozilla is one of the largest projects to be preprocessed with MCPP, we found some scalability bugs in MCPP. The MCPP maintainer addressed those.

Elsa Backend Source Location Work – Works Well?

I am amazed, but hack & slash approach to adding end-of-ast-node info to elkhound still seems to work correctly! It mostly required consulting wikipedia on LR & GLR parsing algorithms to understand how to modify the data structures used in elkhound. I love the combination of Wikipedia and pretty & easy to modify source code.

The preprocessor undo-log appears to function exactly as intended too. That combined with the end-of-ast-node info means that I finally have the ability to easily cut’n’paste code to move it around programmatically. It basically makes the C/C++ pretty printer in oink obsolete for my purposes. This is cool because now I can output prettier source code AND rely on less oink C++ code to do so (ie no need to call the pretty printer[s]), so more refactoring tools could be written in higher level languages such as OCaml or JavaScript in the future.

Publishing My Oink Mods

I have been sending people tarballs on request, due to not being able to integrate my changes into the upstream svn repository. I tried a few svn2hg tools, but they didn’t work very well. hgsvn is good enough to work for OpenJDK, so it might work for me. I’ll try to publish an hg repository of my work within the next week or two.

I’m sorry for not doing this earlier, but it’s extremely time consuming to switch gears from coding oink stuff to trying to package it, especially due to all of the politics involved. Hopefully once the repository is easier to work with I’ll have more people sharing the oink worldload with me. Now that dust settled and the major missing pieces are either implemented or decided upon, a lot of exciting possibilities opened up.

libgcc

Another day, another piece of preprocessor trivia. Turned out there was an alternative to MCPP that I could have used: gcc’s libcpp. It is common knowledge that gcc uses an integrated preprocessor. It is not so well known that the preprocessor is factored out into what appears to be a mostly standalone library inside of gcc called libgcc. Google barely knows about it and there are no docs or other webpages pointing to it, so i missed it in my search.

This could be a useful project, take libcpp turn it back into a standalone preprocessor and add the cpp undo log comments to it. The only downside of libgcc is that it is GPL which would normally be a pain for BSD-licensed projects, but by turning it back into a standalone tool there is no linking to to worry about.

So if anyone finds implementing the macro-undo log with libgcc interesting, please feel free to do so :)

Random Rant on Parallels vs VMWare Fusion vs BootCamp vs 64bit Linux

Continue reading →