Semantic Rewriting of Code with Pork – A bitter recap

LWN published an article about a tool that does refactoring of C code. Guess what, it’s yet another tool on top of a crappy C-parser that will never grok C well or even hope to support C++. To my great disappointment the author was not aware of my work on Pork. Clearly I have failed in letting people know that complex C and C++ can be refactored with (somewhat raw, but powerful) open source tools.

In addition to Dehydra (which is even mentioned in the first comment, yay!), I also maintain Pork – a fork of oink that is well suited to large-scale refactoring of real-world C/C++ code.

So far Pork has been used for “minor” things like renaming classes and functions, rotating outparameters and correcting PRBool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function (i.e. generating a 3+MB patch) in Mozilla to use garbage collection instead of reference counting.

So to summarize:

  • Refactoring C is hard, but C++ is much harder
  • For refactoring C++ there is no better toolchain to start with than Pork
  • Pork shares no code with Dehydra.
  • Pork is built on the Elsa parser, which makes it well suited for rewriting large amounts of code. Dehydra isn’t suitable for rewriting code because GCC provides a very lossy AST and incomplete location information.
  • Pork is not as convenient for analysis needs as Dehydra
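To illustrate why exact location information matters for rewriting, here is a minimal sketch of location-based patching. It is not Pork's actual API: the Edit struct and the applyEdits and renameDemo functions are hypothetical names, and a plain text search stands in for the offsets a real tool would get from the parser. The point is only that edits keyed on precise offsets leave everything else in the file (comments, formatting, macros) untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical edit record: replace `length` bytes starting at `offset`.
struct Edit {
    std::size_t offset;
    std::size_t length;
    std::string replacement;
};

// Apply non-overlapping edits to `source`, working from the end of the
// text backwards so that earlier offsets stay valid.
std::string applyEdits(std::string source, std::vector<Edit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Edit& a, const Edit& b) { return a.offset > b.offset; });
    std::size_t limit = source.size();  // start of the previously applied edit
    for (const Edit& e : edits) {
        if (e.offset + e.length > limit)
            throw std::invalid_argument("edit out of range or overlapping");
        source.replace(e.offset, e.length, e.replacement);
        limit = e.offset;
    }
    return source;
}

// Rename every occurrence of getFoo to foo. The text search below is only a
// stand-in for a parser that reports the exact location of each identifier.
std::string renameDemo() {
    const std::string before = "int getFoo(); /* keep me */ int x = getFoo();";
    const std::string oldName = "getFoo";
    std::vector<Edit> edits;
    for (std::size_t pos = before.find(oldName); pos != std::string::npos;
         pos = before.find(oldName, pos + oldName.size())) {
        edits.push_back({pos, oldName.size(), "foo"});
    }
    return applyEdits(before, edits);
}
```

Because the edits are applied to the original text rather than to a pretty-printed AST, the resulting diff contains only the renamed identifiers, which is what keeps a multi-megabyte patch reviewable.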

For any questions regarding Pork feel free to post on the mailing list or ping me on IRC.

Language Wars

I find it depressing that the comments to the LWN article ended up being about language wars rather than the refactoring topic. Pork is written in C++ which is much more widely known than OCaml. However, I seriously doubt it’s easier for anyone to hack on advanced compiler frontend pieces in a language as ill-suited for the task as C++.

14 comments

  1. Bitter indeed! If you ever want to better your competitor, you’ll have to learn to respect their work. As you state, they chose a more suitable language for their tool (your choice is ‘ill-suited for the task’), yet according to you they will never grow beyond ‘yet another tool on top of a crappy C-parser’.

    I can imagine you being peeved because their tool got publicity and yours didn’t. The way forward though is to up your work, not down theirs.

  2. Taras, your work is not that ignored.

    About the fact that Pork is written in C++: it doesn’t impact the refactoring scripts themselves, so it’s just a problem for the developers of Pork; it doesn’t make it less effective for users.

    BTW, how hard would it be to make it possible to use some ML language for the scripts?

  3. Could you elaborate on “crappy C parser that will never grok C”? While I agree on the C++ part (but we are working on it), I disagree on the C part. Our parser handles C+CPP and most gcc extensions quite well. We can parse as-is more than 99% of the Linux kernel. I have a paper at CC’09 on this parser:

    http://aryx.cs.uiuc.edu/~pad/papers/yacfe-cc09.pdf

    Concerning Pork, we knew that Mozilla developers were building refactoring and static analysis tools around Elsa. But the goal of Coccinelle is quite different: it is to make it easy to write program transformations with a domain-specific language, SmPL.
    It’s not just for renaming functions …

    While I agree that the article should have mentioned Pork/Dehydra/oink/…, it would also be good if there was a central place that explains what each of these Elsa forks does, how to install them, and a few examples of code showing how to use those tools.

  4. yoann padioleau

    Why did you erase my comment?

  5. I didn’t delete the comment as I was asleep :). It looks like it failed the captcha. I have approved it now.

  6. I’m sorry, perhaps I was too harsh on the C part. My main frustration is that C is much easier to do this to than C++, but the fact that you claim you can parse most of C is not reassuring.

    Another issue is the fact that you and other academics insist on writing parsers from scratch. In my experience 99% parsing of Mozilla is insufficient; it needs to be 100%, as that is more likely to result in a patch that compiles. Writing a C++ parser from scratch is a huge undertaking, and based on the experience of others I think you will eventually be forced to give up (because full C++ parsing requires type elaboration, which requires a couple of years of effort).

    Also, we aren’t just renaming functions. If you look at the links in the article, we can change function signatures and function bodies in significant ways (with in-depth analysis of existing code in prcheck and garburator, i.e. more than anything that Eclipse etc. could ever hope for). In theory Pork supports any change one can dream up, as opposed to being limited to what the DSL was designed to do (obviously there are big upsides and downsides to that).

    I’m also an OCaml guy myself, but I picked Elsa etc. (and ended up maintaining it) because it was (and still is) the only way to make complex code changes to C++ code. So I’m sorry if I came across as attacking your particular tool; I was more upset that in general people keep thinking they can solve the C++ refactoring problem by writing yet-another-parser (i.e. by building upon their C work).

    I don’t approve of the ml-mocking C++ style used in Elsa, nor do I approve of the stupid build system, but I do believe Pork (Elsa+MCPP+some utility code) is the best possible toolchain to build a C++ refactoring tool on at this point in time (in another 2-5 years Clang might take that place). So it’s frustrating that researchers working in the area would rather give up on C++ refactoring instead of building upon what already works to take it further.

    If you have questions you’d like to see answered on the Pork page, I would really appreciate a list of questions and I’ll make an FAQ out of them.

  7. yoann padioleau

    > but the fact that you claim you can parse most of C is not reassuring.

    OK, maybe I was not clear. We can parse all of the C language with no problem. The problem is not C but cpp. We want to parse the C files as-is, and so to have the ifdefs, includes, macro definitions, etc. in the AST, and this is what is difficult. If you compile the kernel with gcc you will see that it compiles only 51% of the C files, and only a portion of those C files (because of ifdefs). We, on the contrary, want to analyze all the source, not just the part you compile on your architecture. Maybe this is not a problem for Mozilla, but for the Linux kernel it is.
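The conditional-compilation point can be shown with a small sketch. The macro EXAMPLE_ARCH_ARM and the function initDevice are made up for illustration and do not come from the kernel; the example is written in C++ but the preprocessor behaves the same way for C.

```cpp
#include <cassert>
#include <string>

// EXAMPLE_ARCH_ARM is a hypothetical configuration macro. It is not defined
// here, so the preprocessor discards the first branch before the parser
// ever sees it.
#ifdef EXAMPLE_ARCH_ARM
std::string initDevice() { return "arm-specific init"; }
#else
std::string initDevice() { return "generic init"; }
#endif
```

A tool that works on preprocessed output would see and rewrite only the generic definition in this build, leaving the ARM-specific one unchanged; a tool that parses the file as-is has to represent both branches.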

    > you and other academics insist on writing parsers from scratch

    We didn’t really insist on that. At the beginning we wanted to reuse CIL, but it turned out not to be well suited to what we wanted to do, so we had to start from scratch. In the same way, I guess McPeak was not satisfied with CIL either, so he started Elsa.
    I looked at Elsa, but it was coded in C++ and could not handle cpp the way we needed at that time (it was 3 years ago).
    And apart from Necula I don’t know many academics who wrote a parser from scratch. I know Torvalds did one with sparse, but Torvalds is not an academic.

    > In my experience 99% parsing of Mozilla is insufficient, it needs to be 100%

    Well I don’t have the same experience with Linux. If you can automate 99% of the work then people are already happy.

    > thinking they can solve the C++ refactoring problem by writing yet-another-parser

    We never said this in the article or on our webpage or in our papers on coccinelle.

    > Pork (Elsa+MCPP+some utility code) is the best possible toolchain to build a C++ refactoring tool on at this point in time

    Pork, or EDG, yes probably. I agree.
    But we didn’t want to make a refactoring tool (for C++), we wanted to make an easy-to-use program transformation DSL (for C).

    > So it’s frustrating that researchers working in the area would rather give up on C++ refactoring instead of building upon what already works to take it further.

    We didn’t give up on C++ refactoring, we just had another research problem to solve: the need for a DSL for program transformation for the evolution of device drivers in Linux.

    > If you have questions you’d like to see answered on the Pork page,

    Just the one I asked you on LWN: how to specify with Pork the simple transformation about alloc and adding the if checking code (that must use the same variable), and what command line to apply it on the linux kernel.

  8. Would it be right to assume that pork is supposed to eventually facilitate code transformation/re-engineering scenarios such as those laid out at http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?action=browse&id=BoostSpiritCXXParser&revision=48 ?

  9. Mike,
    yes, that’s what Pork does for Mozilla.

  10. Hi,
    thanks for clarifying this. Well, if that’s really the case, this project definitely needs more exposure. There’s a whole multitude of related efforts, and there have been numerous attempts to come up with a C++ parser to do such code analysis/transformation tasks.

    In fact, most major IDEs also come up with their own home-brewed C++ parser (check out eclipse and kdevelop in particular: http://bugs.kde.org/show_bug.cgi?id=66683)

    There are so many tools that come up with half-working solutions, it would be truly awesome to have one self-contained piece of code, be it a standalone tool or library, which could be leveraged by all interested efforts.

    Hopefully, this would help pork grow and mature, so that everybody can contribute modifications.

    Also, even after having checked out the docs here (and basically everything else I could find), it’s hard to get a good grasp of the current status of the project, meaning I guess that it might help to spend one or two rainy weekends on improving the docs and documenting the state of affairs.

    In fact, if there was a way to run pork via some simple web frontend CGI where users could submit a custom snippet of code and pick a transformation (rename, expand, move to scope…), it might additionally help to improve awareness of the power of this project.

    I really feel that it is a pity that this project is so underdocumented. However, I also realize that the toolchain may really have to be purpose-agnostic, so that whatever tools are built on top of it, pork doesn’t cause any unnecessary grief or extreme dependencies.

    Maybe one needs to provide a wrapper on top of it, or even turn it into a “service” (daemon) that a user may connect to and submit a snippet of code and a transformation; the service then responds with a diff and closes the connection. I think such things may actually help improve awareness. In addition, providing a service wrapper on top of pork might actually help some people who’d like to make use of pork functionality without wanting to depend on the binaries locally. So, maybe this might really be an interesting idea?

    Interested programmers could use such a thing in their perl/python scripts. Depending on the feedback, this may be extended, if there’s demand.

    In any case, if there’s a way to set up a “refactoring playground” somewhere, this would probably be interesting for most people.

    At the moment, I’d imagine that many more people are interested in such a facility, than those who are interested in downloading, configuring, compiling and installing the toolchain.
    So, some simple way to try it out might be a good idea.

    Even if it’s just a web form with a text area where they can type some code (or optionally upload a file), and then run the transformation and be presented with the diff.

    Actually, there are probably many open source projects that could (and probably would like to) benefit from something like pork, but usage needs to be as intuitive and straightforward as possible. Making pork available via some sort of demo service might prove to be a good way for fellow open source projects to become familiar with it.

    regards,

    Mike

  11. I agree, a robust C++ front end is a bitch.
    Especially if you want to handle multiple dialects,
    such as ANSI, GCC, Microsoft, …

    I’m surprised that all this discussion hasn’t raised the topic of the DMS Software Reengineering Toolkit, and its dialect-agile C++ front end and general program transformation capabilities.

    DMS has been applied to C++ Avionics software to carry out massive program restructuring tasks.

    Here’s a reference to a paper describing the tool
    and the approach.

    Case study: Re-engineering C++ component models via automatic program transformation
    RL Akers, ID Baxter, M Mehlich, BJ Ellis, KR … – Information and Software Technology, 2007 – Elsevier

    The website contains a fair amount of information about DMS itself:

    http://www.semanticdesigns.com

  12. yoann padioleau

    Yes, my paper at CC’09 that I mentioned earlier in this
    thread talks about DMS in its related work.

  13. Mihai Vasilian

    I am glad to see this really great work.
    I am interested in this kind of subject.
    I am looking for a decent refactoring solution that I will adapt for emacs. Or maybe I should already start writing one myself, and later give it to others too. I think that everybody says to reuse the existing work, but very few succeed. Like you with Elkhound and Elsa, which is great.
    Hm... and meanwhile I started to write some containers. With every day I get closer to my true project.
    Mihai.

  14. I just had a look after reading your comment on LWN. Maybe you should spend some time documenting it, then it might see more use. For example, if I add a new parameter to a method, how would I go about changing all existing callers to, say, pass a NULL for that parameter?