Old Dijkstra Essays Considered

Sometime when I was an undergraduate, I came across a news article announcing that all the EWDs has been published online.  After figuring out what EWD meant (EWD are Dijkstra’s initials; he used “EWD” followed by a number to index his essays), I started reading a few and got hooked.  For having an ivory-tower sort of reputation (stemming from opinions on formal verification and how computer science should be taught), it was surprising how personable, humble and clear his writing is.  Many of the essays are handwritten and even his handwriting style is delightful (someone even turned it into a font).

For fun, I recently decided to re-read some of the outstanding EWDs to see how they sounded to me now; for the last few years I’ve been hacking on the Mozilla JS engine (SpiderMonkey) so presumably my perspective on software development has changed.  To my surprise, instead of finding the essays milder than remembered (as happens with movies that used to be terrifying (Critters) or incredibly awesome (Dino-Riders)), I found some new things relevant to what is happening now in SpiderMonkey.  That’s what I’d like to share in this post.

One of, if not the most, famous EWD is “Notes on Structured Programming” (NOSP).  This essay is the expansion of the famous “Go To Statement Considered Harmful” CACM article that started the “X considered harmful” and, more generally, “X considered Y” memes.  Primarily, NOSP makes a well-reasoned argument for why it is better to use structured control flow constructs (like if/then/else and loops) than goto.  Great, we all get that — so much so that goto is now often rejected a priori despite having legitimate (albeit uncommon) applications (cf. Knuth’s “Structured programming with go to statements“).

However, NOSP’s agenda is broader than just browbeating us into not using goto.  Dijkstra frames the goto issue with some general notes on making programs simpler and in the process states two principles of program simplicity that I really like.  He doesn’t announce them as principles — they’re just sentences buried in the middle of paragraphs — but they seem like principles to me… so I’ll call them principles.  Often, judging the simplicity of a patch or piece of code feels highly subjective and more a matter of taste than anything.  However, I was delighted to see how many simplifications (considering recent SpiderMonkey patches I’ve written or seen flying by) could be viewed as following from these two principles.

Let’s start with the first principle, taken from the following quote:

In vague terms we may state the desirability that the structure of the program text reflects the structure of the computation.  Or, in other terms, ‘What can we do to shorten the conceptual gap between the static program text (spread out in “text space”) and the corresponding computations (evolving in time)?’

For the specific case of goto, Dijkstra proceeds to spell out in great detail how structured control flow admits an unequivocally shorter conceptual gap.

In SpiderMonkey, many recent simplifications have simply been removing old uses of goto (e.g., check out js_GC before a heroic set of patches by Jason Orendorff, as well as obj_eval and js_Invoke before several incremental rewritings).  I should note that switching SpiderMonkey to C++ a few years ago has been invaluable in control flow refactoring since with C++ comes RAII which has been used in practically every big function cleanup since.

But there is a lot more to this “shortening the syntax/computation gap” than control flow level refactoring.  It seems to me that trying to encode more invariants and program structure in static types falls under this category since static typing entails statements of the form “for all executions, variables of this type must have some property”.  One recent banner example is Chris Leary’s recent refactoring of JSAtomList et al into something far more typeful.  As demonstrated in his patch, C++ types often ends up simulating discriminated unions found in higher-level language (or, more generally, ADTs).  Another example of this is the type representing a JavaScript value.

Because of the, to put it kindly, dependently typed nature of many central SpiderMonkey data structures, types must often be bolted on, like an exoskeleton, to an underlying representation that is being treated like a bag of bits.  Here, C++’s unsafe casts, inlining, and templates are critical for avoiding performance penalties for using abstraction.  The data structure that holds the JS call stack is one example.  Another example is the recently-refactored set of types iteratively concocted by Gregor Wagner, Igor Bukanov, and Bill McCloskey to abstract GC data structures.

Another interesting non-standard application of types can be seen in the recent refactoring of strings.  Here, the C++ class hierarchy captures the logical hierarchy of string invariants.  Unlike an ordinary C++ class hierarchy, the string hierarchy contains no virtual functions and all instances of the hierarchy are necessarily the same size.  JSObject is incrementally growing a similar hierarchy (which I hope continues!).  In both cases, the C++ type system is being used (and sometimes abused) to provide the desired connection between static types and dynamic properties of strings/objects.

Finally, in the worst case, no language or type system will help you simplify the mapping from syntax to dynamic computation; you just need to suck it up and completely change tack.  An admirable example of this last year was Andreas Gal’s transformation of jsiter.  It is a beautiful thing when a patch, in total, removes 500 lines and SunSpider gets 4% faster.

The second principle contained in NOSP that I’d like to point out comes from the following quote:

Eventually, one of our aims is to make such well-structured programs that the intellectual effort (measured in some loose sense) needed to understand them is proportional to program length (measured in some equally loose sense).  In particular, we have to guard against an exploding appeal to enumerative reasoning, […]

To give a bit of context, “enumerative reasoning” is defined in the essay to mean the mode of reasoning where you are forced to consider every possible case and make sure the desired property holds for each of them individually.  Dijkstra contrasts this to the more desirable “abstractive” and “inductive” reasoning and explains how building up structured control flow allows their use instead of enumerative reasoning.

One example of this in SpiderMonkey is typified by pretty much any work that Jeff Walden does.  In his patches (and, as the style spreads, others’ too), each step of the implementation is made to correspond closely to the individual steps of the corresponding ECMAScript sections, complete with comments labeling the steps.  This reduces enumerative reasoning by allowing one to judge the spec-conformance of an individual statement instead of an entire function.

Another way enumerative reasoning is reduced is by simply having less code.  This of course is Reusability, the Holy Grail of Software Engineering (apparently, a solved problem).  One of the biggest examples of this I can think of is the security wrapper rewrite that was part of the mammoth compartmentalization effort.  Allegedly, thousands of lines of mind-bending security-critical code were removed, replaced by a small set of composable policy templates.  And if that wasn’t enough, harmony:proxies got to ride along at a bargain rate!

That last class of enumerative-reasoning-reducing changes that come to mind are those that cut down on the number of VM-wide concepts — be they states, requirements, corner cases, gotchas, possibilities, etc — that must be kept in mind to effectively work on the SpiderMonkey.  Examples from recent memory include the removal of: JSScope, slow natives, watchpoint hacks, JSObjectMap, GC-from-malloc, concurrent JSRuntime access, local rooting scopes, newborn roots, display optimizations, multi-threaded objects and titles, multi-threaded strings, dormant stack frames, __count__, __parent__, non-identifier atoms, heap-doubles, null-is-an-object, the no-int-valued-doubles-in-values rule… and those are just things that appeared on my radar; a simple bugzilla search for “remove” shows a lot more.

Altogether, I think this demonstrates that these two simplification principles cover a lot of real-world patches.  It’s neat to find them nestled in a 40 year old essay written when goto still ruled the land.

On a side note, I think continual simplification is vital to maintaining a healthy, long-lived codebase.  Thus, I see the size of the — far from complete– set of simplifications listed above to be a very good sign for an already rather long-lived codebase.

I’ll conclude with the summary of a section in NOSP entitled “On Our Inability To Do Much”:

Summarizing: as a slow-witted human being I have a very small head and I had better learn to live with it and to respect my limitations and give them full credit, rather than to try to ignore them, for the latter vain effort will be punished by failure.

12 Responses to Old Dijkstra Essays Considered

  1. Nicholas Nethercote

    Wow, nice article. Just finding and embedding all those links must have taken *ages*.

    • Yeah, it took the better part of an evening :) It was kinda fun, though, both to see old discussions, knowing how things eventually worked out, and realize how far things have come.

  2. I just started a Twitter account a few weeks ago that publishes, in order, one EWD every day. Think you might enjoy it !
    http://twitter.com/#!/EWDijkstra

  3. At least one of those simplification links was not my doing; I believe I asked jandem to apply that trick to String.prototype.split when we fixed our handling of passing undefined to it. But I’m happy to be deemed a trend-setter.

    Your “dependently typed” phrasing is an amazingly obscure way of putting it. I think I see what you were driving at, but even with knowledge of what you were talking about I had to squint a little to make sense of it. :-)

    Your “solved problem” link enlightens me! When will we port SpiderMonkey to VB6 to improve our code-sharing abilities so?

  4. Interresting enough, he never meant ‘goto statement considered harmful’ to be the main subject of his essay. That statement was used as the title by the editor when it was published.

    You can see it on YouTube: http://www.youtube.com/user/smpusr , but I don’t know which video. To bad the recording of “Edsger Dijkstra’s Turing Award Speech” on YouTube has been recorded over in part 4 and more than the first half of part 5.

    In videos like “Discipline in Thought” where he is interviewed you can also see he talks like he writes. :-)

    Althought it is in Dutch you might still notice how he thinks before he starts a sentence.

  5. The “C hangover” is mostly cured, what with all the templates, RAII, and other “good parts” of C++ being well-used. But there is still too much C-looking code from the dark ages when we didn’t have C++ interop, back at Netscape in the 90s. Keep going! You are absolutely right about living code requiring constant simplification.

    /be

  6. Also, old code implementing built-in functions that does not resemble ES3 or later spec pseudocode, in the absence of “post-mature optimization”, often in fact predates ES3 — i.e., I wrote it in 1996 or 1997. Cleaning such code up to match the spec as much as is reasonable (as Jeff says, don’t try to match the spec and use “O” as a variable name), especially to fix real edge-case bugs, is welcome work.

    /be

  7. I love your wordpress web template, wherever would you get a hold of it from?

    • The theme is called Coraline-dherman 1.0 by Dave Herman. It was already installed for blog.mozilla.org, so I don’t know how to go about finding it, maybe you can ask Dave :)

  8. Hey Luke! I was just curious to see what would pop up if I searched for “Luke Wagner Mozilla”, and here you are! Nice blog. :)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>