Category Archives: Programming

Premature Optimisation

I loved this sentence from Olin Shivers’ description of some Scheme history:

I fashionably decried premature optimisation in college without really understanding it until I once committed an act of premature opt so horrific that I can now tell when it is going to rain by the twinges I get in the residual scar tissue. Now I understand premature optimisation.

I’d love to know exactly what the premature optimisation was.

I also read Olin’s Dissertation Advice about fifty times in 2004.  Great stuff.

Libraries should permit custom allocators

Some C and C++ libraries permit the use of custom allocators, which are registered through some kind of external API.  For example, the following libraries used by Firefox provide this facility.

  • FreeType provides this via the FT_MemoryRec_ argument of the FT_New_Library() function.
  • ICU provides this via the u_setMemoryFunctions() function.
  • SQLite provides this via the sqlite3_config() function.

This gives the users of these libraries additional flexibility that can be very helpful.  For example, in Firefox we provide custom allocators that measure the size of all the live allocations done by the library;  these measurements are shown in about:memory.

In contrast, libraries that don’t allow custom allocators are very hard to account for in about:memory.  Such libraries are major contributors to the dreaded “heap-unclassified” value in about:memory.  These include Cairo and the WebRTC libraries.

Now, supporting custom allocators in a library takes some effort.  You have to be careful to always allocate in a fashion that will use the custom allocators if they have been registered.  Direct calls to vanilla allocation/free functions like malloc(), realloc(), and free() must be avoided.  For example, SpiderMonkey allows custom allocators (although Firefox doesn’t need to use that functionality), and I just fixed a handful of cases where it was accidentally using vanilla allocation/free functions.
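To make the pattern concrete, here is a minimal sketch of how a library can expose such hooks.  The names (mylib_*, MyLibAllocFuncs) are invented for illustration and don’t correspond to any of the libraries above;  the point is simply that every internal allocation must go through the registered functions.

#include <cstdlib>

// Invented types/names for illustration only.
struct MyLibAllocFuncs {
    void* (*malloc_)(std::size_t size);
    void* (*realloc_)(void* ptr, std::size_t size);
    void  (*free_)(void* ptr);
};

// Defaults to the vanilla allocator.
static MyLibAllocFuncs gAllocFuncs = { std::malloc, std::realloc, std::free };

// The external registration API.
void mylib_set_alloc_funcs(const MyLibAllocFuncs* funcs)
{
    gAllocFuncs = *funcs;
}

// Every allocation inside the library must use these wrappers;  a stray
// direct call to malloc()/realloc()/free() would bypass the custom hooks.
void* mylib_malloc(std::size_t size)        { return gAllocFuncs.malloc_(size); }
void* mylib_realloc(void* p, std::size_t n) { return gAllocFuncs.realloc_(p, n); }
void  mylib_free(void* p)                   { gAllocFuncs.free_(p); }

A measuring allocator of the kind Firefox registers can then be written as a pair of hook functions that track the number of live bytes before delegating to the real allocator.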

But, it’s a very useful facility to provide, and I encourage all library writers to consider it.

Duplicated abstraction layers in Firefox

Just about every operating system provides a mechanism for directly allocating and deallocating memory at the page level (i.e. not malloc/free or new/delete).  The functions to do this vary from OS to OS:

  • Windows: VirtualAlloc/VirtualFree.
  • Posix (e.g. Mac and Linux): mmap/munmap.
  • Mac also has: vm_allocate/vm_deallocate.

So it’s very natural to add an abstraction layer: your own functions (let’s call them Map and Unmap) that use conditional compilation to choose the appropriate OS-specific call.
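Here’s a rough sketch of what such a layer typically looks like (error handling, alignment and permission options omitted;  each real layer differs in the details):

#include <stddef.h>

#ifdef _WIN32

#include <windows.h>

void* Map(size_t nbytes)
{
    return VirtualAlloc(NULL, nbytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}

void Unmap(void* p, size_t nbytes)
{
    (void)nbytes;   // Windows releases the whole reservation given the base address.
    VirtualFree(p, 0, MEM_RELEASE);
}

#else   // Posix (e.g. Mac and Linux)

#include <sys/mman.h>

void* Map(size_t nbytes)
{
    void* p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

void Unmap(void* p, size_t nbytes)
{
    munmap(p, nbytes);
}

#endif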

An abstraction layer like this appears in lots of software projects.  Firefox happens to incorporate code from a lot of other projects, and so it ends up with lots of duplicated abstraction layers.  For example, in the JS engine alone we have five Map/Unmap abstraction layers.

  1. In js/src/jsgcchunk.cpp, used to allocate chunks for the GC heap.
  2. In js/src/vm/Stack.cpp, used to allocate some stack space.
  3. In js/src/nanojit/avmplus.cpp, used to allocate space for code generated by the trace JIT.
  4. In js/src/assembler/jit/ExecutableAllocator*.cpp, used to allocate space for code generated by the method JIT.
  5. In js/src/ctypes/libffi/src/dlmalloc.c, used to allocate chunks of memory that are handed out in pieces by the heap allocator defined in that file.

The duplication among 3, 4 and 5 is understandable — they all involve large chunks of code that were imported from other projects.  (Furthermore, you can see that ctypes/ has its own heap allocator, thus duplicating jemalloc’s functionality.)  The duplication between 1 and 2 is less forgivable;  neither of those cases was imported, so they should share an abstraction layer.

How many other Map/Unmap abstraction layers are there in the rest of Firefox?  The JS engine may be more guilty of this than other parts of the code.  Is there a sane way to avoid this duplication in a world where we import code from other projects?

Using Valgrind to get stack traces

Sometimes I want to do some printf-style debugging where I print not only some values, but also the stack trace each time a particular code point is hit. GNU provides a backtrace() function that supposedly does this, but I tried it and got hopeless results, little more than code addresses.

Fortunately, you can do this pretty easily with Valgrind.  First, add this line somewhere in your source code:

  #include <valgrind/valgrind.h>

Then, at the point where you want to print the stack trace, add this:

  VALGRIND_PRINTF_BACKTRACE("foo");

You can of course print something other than “foo”.  In fact, VALGRIND_PRINTF_BACKTRACE is a variadic printf-style function, so you can do stuff like this:

  VALGRIND_PRINTF_BACKTRACE("%s: %d\n", str, i);

You then have to run the program under Valgrind as usual, except you probably should use --tool=none because that’ll run the quickest.
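Putting the pieces together, a complete toy example looks something like this (assuming Valgrind’s headers are installed;  the macro is a no-op when the program isn’t running under Valgrind):

#include <stdio.h>
#include <valgrind/valgrind.h>

void f(int i)
{
    // Prints the message and then the stack trace leading to this point.
    VALGRIND_PRINTF_BACKTRACE("f called with i = %d\n", i);
}

int main()
{
    for (int i = 0; i < 3; i++)
        f(i);
    return 0;
}

Compile as usual and run it with something like “valgrind --tool=none ./a.out”;  each call to f() then produces a backtrace on Valgrind’s output.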

This is a trick I find occasionally invaluable.

Another Go at language design

The other day I attended a very interesting talk at the University of Melbourne given by Rob Pike.  The title was “Another Go at language design” and it was all about Google’s new programming language, Go.  It was a high level overview of the language with an emphasis on why certain design decisions were made.

Here is a random selection of the things that struck me as most interesting.

  • Pike lamented the state of modern industrial languages, by which he meant C++ and Java.  He started with a quote from Dick Gabriel “Old programs read like quiet conversations between a well-spoken research worker and a well-studied mechanical colleague, not as a debate with a compiler.”  In particular, one of the aims of Go is to show programmers whose only exposure to static typing is through C++ and Java that statically typed languages can have the simpler, more concise feel (fewer declarations!) of dynamically typed languages like JavaScript and Python.
  • Compile times of Go programs are small.  This is because the compiler doesn’t need to chase transitive dependencies between packages (their name for modules).  For example, if you have a file A.go which depends on B.go which depends on C.go, you initially have to compile C.go, then B.go, then A.go.  But everything A.go needs from C.go is re-exported in the compiled B.o file, so when A.go is recompiled the compiler only has to read B.o, not C.o (if I understood the details correctly).  The broader point is that the language avoids C++’s problem where bazillions of header files have to be read for every module.  He said with a completely straight face that their goal was a 1,000,000x speed-up over C++ for the compilation time of large programs, though they’d probably be satisfied with 100,000x.  Impressive!  Imagine if Firefox compiled in less than a second.
  • Identifiers that start with an upper-case letter are public, and identifiers that start with a lower-case letter are private.  The other language I’m familiar with that has a similar distinction is Haskell, where identifiers that start with an upper-case letter are used for types, and identifiers that start with a lower-case letter are used for values.  The nice thing about Go’s approach is that it gives you strictly more information:  you can determine from an identifier’s use point whether it’s public or private, which saves you from having to find its declaration.
  • There is no automatic conversion between numeric types.  This is for simplicity;  Pike said (probably exaggerating) that 1/3 of the C standard deals with this topic.  But the handling of numeric constants avoids many cases where explicit conversions are needed, because numeric constants are platonic in the sense that they don’t have a particular type until they are assigned to a variable.
  • They have a reformatting program, gofmt, that rewrites Go code into an “approved” layout.  This ensures that all Go code looks consistent, and avoids fights over style.  (Robert O’Callahan would approve!)  Interestingly, gofmt is separate from the compiler, and the idea is that you run it once you have finished a change.  At Google, when code is committed into a repository, gofmt is run and if the layout doesn’t match, the code is rejected.  I asked why they made it a separate program, rather than having stronger syntax/layout checking in the compiler.  He said that (a) they hadn’t thought of putting it in the compiler, (b) it seemed like it would be less painful to just run it occasionally rather than having to worry about it every time you compile, and (c) it would slow down the compiler.

It was a good talk, Pike is an engaging speaker.  I’m glad he made the trip to Melbourne.

A win for code hygiene

I’ve written recently about clean-ups I’ve been making in Nanojit — simplifying code, refactoring, adding more internal sanity checking.  I’m a firm believer that these kinds of changes are worth doing, that “if it ain’t broke, don’t fix it” is not always true.  But there’s always a voice in the back of my head wondering “am I just polishing this code for little gain?  Should I be working on something else instead?”  Yesterday I received confirmation that this work has been worthwhile.  But before I can explain why, I need to give some background.

Nanojit serves as the back-end of TraceMonkey.  It converts LIR (a low-level intermediate representation) generated by TraceMonkey’s front-end into native code.  So Nanojit is a critical part of TraceMonkey.  And TraceMonkey is a critical part of Firefox.

However, code generators (and register allocators) are tricky beasts — easy to get wrong and hard to debug.  (A long time ago I heard someone, I can’t remember who, say “code generators and garbage collectors should crash as early and noisily as possible”.)  So with a piece of code this important and inherently difficult, it’s a good idea to arm yourself with some weapons that give you confidence that it is correct.

Weapon 1: clean IR semantics

One way to do this is to give your IR clean semantics.  LIR’s semantics are mostly clean, but there are a few nasty cases.  One of the worst is the ‘ov’ instruction.  It performs an overflow check on an arithmetic (add, mul, sub, neg) operation.

JavaScript numbers are doubles by default, but TraceMonkey sometimes demotes them to integers for speed.  TraceMonkey has to check for integer overflow in these cases;  if an integer does overflow TraceMonkey has to step back and use doubles for the computation.

Here’s some example LIR that uses ‘ov’:

 ld6 = ld sp[-8]
 add1 = add ld6, 1
 ov1 = ov add1
 xt4: xt ov1 -> pc=0x883ed8 imacpc=0x0 sp+0 rp+0 (GuardID=218)

The first instruction loads a value from the stack.  The second instruction adds 1 to that value.  The third instruction performs an overflow check and puts the result in ‘ov1’.  The fourth instruction is a guard;  if ‘ov1’ is true it exits the code block.

So why is ‘ov’ semantically nasty?  First consider ‘add’, which is a semantically cleaner instruction.  Its result (in this case ‘add1’) is a function of its inputs (‘ld6’ and 1).  In comparison, ‘ov’ does not have this property.  The ‘ov’ doesn’t really take ‘add1’ as an input:  you can’t tell by looking at ‘add1’ whether the addition overflowed.  In fact you can’t really understand what is happening here at the LIR level;  you have to know what happens in the native code that is generated from this LIR.  It turns out that the native code for the ‘add’ also sets some condition codes, and the native code generated for the ‘ov’/‘xt’ pair inspects those condition codes.  For this to work, the ‘add’ has to immediately precede the ‘ov’;  if it does not, any intermediate native code that is generated could overwrite the condition codes, in which case the behaviour of the guard becomes completely unpredictable.  This is obviously bad.

The real problem is that the addition has two outputs:  the result, and the overflow status indicator (which maps to condition codes in the hardware).  But LIR has no explicit way to represent that second output.  So the second output is implicit, and an extra constraint must be imposed on the LIR instead, i.e. ‘ov’ must immediately follow an arithmetic operation.  Because this constraint is so arbitrary it’s easy to forget and easy to break.

In May 2009 Julian Seward opened a bug to improve LIR’s semantics, giving five suggestions.  It generated a lot of discussion and Julian drafted a patch to replace ‘ov’ with some cleaner opcodes, but there was enough disagreement that it never went anywhere.  But I didn’t forget about it, and ‘ov’ has annoyed me enough times recently that I decided to resurrect Julian’s idea, this time in a new bug to escape the baggage of the old one.  The idea is simple: if ‘ov’ always has to follow an arithmetic operation, then why not create new opcodes that fuse the two parts together?  These new opcodes are ‘addxov’, ‘subxov’ and ‘mulxov’.  With my patch the code from above now looks like this:

 ld6 = ld sp[-8]
 add1 = addxov ld6, 1 -> pc=0x883ed8 imacpc=0x0 sp+0 rp+0 (GuardID=218)

The ‘addxov’ adds ‘ld6’ and 1, puts the result in ‘add1’, and exits if there was an overflow.  The generated native code is unchanged, but the implicit output of the addition is now hidden within a single LIR instruction, and it’s impossible for overflow checks in the native code to become separated from the potentially-overflowing arithmetic operation.

Weapon 2: strong IR sanity checking

Compilers are usually built in a pipeline fashion, where you have multiple passes over the code representation, and each pass does a task such as optimisation, register allocation or code generation.  It’s also a really good idea to have a pass that does a thorough sanity check of the code representation.

In November 2008 Jim Blandy filed a bug suggesting such a pass for Nanojit, one that performs type-checking of the LIR.  The bug sat untouched for over six months until some extra discussion occurred and then (once again) Julian wrote a patch.  It found at least one case where TraceMonkey was generating bad LIR code, but again the bug stalled, this time because we spent three months merging Mozilla’s and Adobe’s copies of Nanojit.  I resurrected the bug again recently and added some “structure checking” for things like the ‘ov’ constraint, and it landed in the TraceMonkey repository on January 21st.  Happily, my version of the checker was simpler than Julian’s;  this was possible because LIR had had some other semantic clean-ups (e.g. properly distinguishing 64-bit integers from 64-bit floats).  My patch replaced a very basic type-checker (called “SanityFilter”) that had been in TraceMonkey, and immediately found two bugs, one of which involved ‘ov’.
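To give a flavour of what the structure checking involves, here’s a greatly simplified sketch of a check for the ‘ov’ constraint.  The instruction representation is invented for illustration;  the real checker is a pass in Nanojit’s writer pipeline and checks much more than this.

#include <cassert>
#include <cstddef>
#include <vector>

// Invented, minimal instruction representation -- for illustration only.
enum Opcode { OP_LD, OP_ADD, OP_SUB, OP_MUL, OP_OV, OP_XT };

struct Ins {
    Opcode     op;
    const Ins* arg0;    // first operand, if any
};

static bool isArith(Opcode op)
{
    return op == OP_ADD || op == OP_SUB || op == OP_MUL;
}

// Check that every 'ov' takes an arithmetic instruction as its operand and
// is located immediately after it -- the fragile constraint described above.
void checkOvConstraints(const std::vector<Ins>& code)
{
    for (std::size_t i = 0; i < code.size(); i++) {
        const Ins& ins = code[i];
        if (ins.op != OP_OV)
            continue;
        assert(ins.arg0 && isArith(ins.arg0->op) &&
               "ov's operand must be an arithmetic instruction");
        assert(i > 0 && &code[i - 1] == ins.arg0 &&
               "ov must immediately follow its arithmetic operand");
    }
}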

Synergy

It’s not often that a bug report involving your code makes you smile.  Yesterday Gary Kwong hit an assertion failure in Nanojit when fuzz-testing:

    LIR structure error (end of writer pipeline):
    in instruction with opcode: ov
    argument 1 has opcode: add
    it should be: located immediately prior, but isn't
  One way to debug this:  change the failing NanoAssertMsgf(0, ...) call to a
  printf(...) call and rerun with verbose output.  If you're lucky, this error
  message will appear before the block containing the erroneous instruction.

In other words, there’s an ‘ov’ following an ‘add’, but there are one or more other LIR instructions between them.  This means that the code generated will be bogus, as I explained above.  This bug may have been in TraceMonkey for a long time, but it wasn’t detected until strong internal sanity checking was added.  The good news is that the ‘ov’-removal patch (which hasn’t landed as it’s still awaiting review) will fix this bug.

Lessons learned

This story tied together a number of related ideas for me.  In particular:

  • Clean IR semantics are worth the effort.  Hacks may save time initially but they will come back to bite you.
  • Strong IR sanity checking is worth the effort.  (And cleaner semantics allow for stronger checking.)
  • Listen to Julian, especially when he talks about correctness.  (And read his code!  VEX/pub/libvex_ir.h in the Valgrind source code describes Valgrind’s IR, which is a wonderfully clean IR, particularly in the way it separates statements (which have side-effects) from expressions (which don’t).)
  • Don’t put too many ideas into one bug report.  Too much discussion can kill a bug, or at least put it in a coma.
  • Follow-through is important.  A patch can’t do much good unless it lands.
  • Fuzz testing is awesome (but we already knew that).

My Contribution to Firefox 3.6

Firefox 3.6 is the first Firefox release that has my code in it.  Much of it was written over six months ago, so it’s nice that it’s finally seeing the light of day.  It’s also fun to know that my code is running on the machines of my friends and family;  that’s new for me.

One friend asked what my specific contribution to the release was.  I said “in some cases it might be a little faster and a little less likely to crash”.  At the December all-hands, when people asked me what I’d been working on, I said “filing the sharp edges off Nanojit”.  Nanojit is the JIT compiler back-end used by TraceMonkey;  it translates a low-level intermediate representation called LIR into native code.  It has some clever features and does a pretty good job in many respects, but a year ago a lot of its code was, well, awful.  It’s now much better:  simpler, faster, with fewer ways to go wrong, more internal sanity checking, and better testing (including some fuzzing).  And many of these improvements made it into 3.6.

I have a few more fixes to make, but Nanojit improvements are winding down, and my attention will soon be focused elsewhere within TraceMonkey.  I thought that I would be spending some time this year improving the quality of Nanojit’s code generation, but I’ve found that the quality is already pretty good.  This perhaps isn’t so surprising because LIR is so low-level that it’s pretty close to machine code.  However, while determining that I found that the LIR generated by the TraceMonkey front-end is often awful.  It looks like there is some fat to trim there!

Safer refactoring in Nanojit

Recently I’ve been making a lot of refactoring changes to Nanojit that I think of as “code hygiene” — changes that clean up the code, but don’t change its behaviour at all.  (Example 1, example 2.)  When making changes like these I like to do the changes carefully and gradually to make sure that I don’t affect behaviour unintentionally.  In Nanojit’s case, this means that the generated code shouldn’t change.  But it’s always possible to make mistakes, and sometimes a particular change is disruptive enough that you can’t evolve the code from one state to another without breaking it temporarily.

In such cases, if the regression tests pass it tells me that the new code works, but it doesn’t tell me if I’ve managed to avoid changing the behaviour as I intended.  So I’ve started doing comparisons of the native code generated by Nanojit for parts of SunSpider using TraceMonkey’s verbose output, which is controlled with the TMFLAGS environment variable.  For such changes, the generated code should be identical to an unchanged version.  The way I organise my workspaces is such that I always have an unchanged repository available, which makes such comparisons easy.

Except, the generated code is never quite identical because it contains addresses and the slightest change in the executable can affect these.  So I run the verbose output through this script:

perl -p -e 's/[0-9a-fA-F]{4,}/....../g'

It just replaces any run of four or more hex digits with “......”, which is enough to filter out these address differences.

I tried this technique on all of SunSpider, but it turns out it doesn’t work reliably, because crypto-aes.js is non-deterministic — it contains a call to Date.getTime() which introduces enough non-determinism that TraceMonkey sometimes traces different code.  A couple of the V8 benchmarks have similar non-deterministic behaviours.  Non-determinism is a rather undesirable feature of a benchmark suite so I filed a SunSpider bug which is yet to be acted on.

Fortunately, experience has shown me that I don’t need to do code diffs on all of SunSpider to be confident that I haven’t changed Nanojit’s behaviour — doing code diffs on two or three of the bigger benchmarks suffices, as enough code is generated that even small behavioural differences are quickly found.

In fact, this code diff technique is useful enough that I’ve started using it sometimes even when I do want to change behaviour — I split such changes into two parts, one which doesn’t change the behaviour and one which does.  For example, when I added an opcode to distinguish between 64-bit integer loads and 64-bit float loads I landed a non-behaviour-changing patch first, then did a follow-up change that improved x86-64 code generation.  This turned out to be a good idea because I introduced a regression with the follow-up change, and the fact that it was separated from the non-behavioural changes made it easy to determine what had gone wrong, because the regressing patch was much smaller than a combined patch would have been.

I also recently wrote a script to automate the diff process, which makes it that much easier to perform, which in turn makes it that much more likely that I’ll actually do it.

In summary, it’s always worth thinking about new ways to test your code, and when you find a good one, it’s worth making it as easy as possible to run those tests.

“No-else-after-return” considered harmful

(Update below)

At the time of writing this, the Mozilla Coding Style guidelines have this recommendation under “General C/C++ Practices”:

Don’t put an else right after a return. Delete the else, it’s unnecessary and increases indentation level.

I can appreciate this in some circumstances, such as when checking for an error case early in a long function:

void f()
{
    if (errorcase())
        return;

    a();
    b();
    c();
    d();
}

But as a blanket guideline I think it’s misguided because it promotes an impure style of programming.  And by “impure” here I mean the opposite of “pure” in this sense, i.e. “free from side-effects and other such error-prone junk”.  Let’s consider a simple example:  the max() function.  Here’s a way to write it that is quite sensible, in my opinion:

int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

But this violates the no-else-after-return guideline.  The Mozilla way to write this is like so:

int max(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

(I’m not exaggerating, see this comment and this comment from Mozilla’s VP of Engineering and CTO respectively, about an example very similar to this.)

Urk.  That’s horrible.  Any time you’re using an ‘if’ without a corresponding ‘else’, you’re relying on an impure feature — you must be doing something with a side-effect in the ‘then’ branch, or some kind of impure control flow.  Mozilla’s coding guidelines insist that we write max(), a pure function, in an impure way.  Also, it’s just harder to read because the two return statements have different indentation levels despite the fact that they are closely related.

In this case there’s a way around the problem:

int max(int a, int b)
{
    return (a > b ? a : b);
}

Even though C’s if-then-else expression syntax is awful, it’s a better way to do it.

In general, statements suck, expressions rule.  Even in a language like C/C++ you can do something approaching functional programming by avoiding statements where possible.  In a tiny example like max() the difference may seem trivial, but in bigger examples the difference becomes more important.  Consider the following code from Nanojit’s register allocator which picks a register by looking at the free registers, the registers that are allowed to be used, and possibly the preferred registers.  You don’t have to understand it closely, but it helps to know that RegisterMask is just a bitfield with one bit per register.

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask prefer = SavedRegs & allow;
    RegisterMask free = regs.free & allow;

    RegisterMask set = prefer;
    if (set == 0) set = allow;

    if (free)
    {
        // at least one is free
        set &= free;

        // ok we have at least 1 free register so let's try to pick
        // the best one given the profile of the instruction
        if (!set)
        {
            // desired register class is not free so pick first of any class
            set = free;
        }
        NanoAssert((set & allow) != 0);
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }

    // ... more code here ...
}

I find this code incredibly hard to understand.  There are three ‘ifs’ that lack ‘elses’, so it relies heavily on state updates to do its work.  In fact, in order to understand it, I ended up rewriting it in a more pure style, by using a sequence of small refactorings:

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask allowedAndFree = allow & regs.free;

    if (allowedAndFree)
    {
        // At least one usable register is free -- no need to steal.  
        // Pick a preferred one if possible.
        RegisterMask preferredAndFree = allowedAndFree & SavedRegs;
        RegisterMask set = ( preferredAndFree ? preferredAndFree : allowedAndFree );
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }
    // ... more code here ...
}

There’s now only a single ‘if’ lacking an ‘else’, and it turns out even that can be removed by putting the “… more code here …” part into an else branch, and then the ‘return r;’ can be moved to the end of the function.
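Concretely, the fully else-ified version would look roughly like this (the elided part is unchanged, so this is a sketch rather than the exact code):

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask allowedAndFree = allow & regs.free;
    Register r;

    if (allowedAndFree)
    {
        // At least one usable register is free -- no need to steal.
        // Pick a preferred one if possible.
        RegisterMask preferredAndFree = allowedAndFree & SavedRegs;
        RegisterMask set = ( preferredAndFree ? preferredAndFree : allowedAndFree );
        r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
    }
    else
    {
        // ... more code here (which must also set 'r') ...
    }
    return r;
}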

The code is now comprehensible — it picks a register that is free and allowed, and preferred as well if possible.

Part of the art of programming involves knowing when to avoid using a language feature.  Everybody knows that ‘goto’ is occasionally highly useful, but should be avoided whenever something less dangerous (a for-loop, a while-loop, a ‘break’ statement) can be used.  And everybody knows it because everybody is taught it.  But programmers aren’t taught to avoid unnecessary side-effects… at least, I wasn’t, certainly not via examples such as those I’ve given above.  I’ve only come to realise these things as I’ve switched several times between using C/C++ and some pure functional languages over the years, and come to appreciate how purity helps make programs more readable and less error-prone.

To come full circle, I’d like to see the no-else-after-return guideline removed from the Mozilla style guide, or at the very least, be qualified in a way that allows the cases I’ve argued for.  I’d also love to see a guideline (with examples) added such as “prefer expressions to statements”.

Update:

(The arguments had the wrong names in the initial post, I’ve fixed them now.)

All three versions of max() are pure, because it’s possible to combine impure pieces of code (such as an else-less if) into a pure piece of code.  So I’ll augment my recommendation: pure code is preferable, and furthermore, given that C/C++ makes it so easy to write impure code, it’s preferable that pure code be obviously pure. I argue that the 3rd version of max() with the ?: operator is the most obviously pure, because a ?: is pure if its operands are pure.  In comparison, it’s harder to tell if an if-then-else is pure (although this case is still pretty easy thanks to the matching return statements).  And an if-without-else almost never results in pure code;  the max() example above is a rare exception.

I also omitted a fourth formulation of max:

int max(int a, int b)
{
    int m;
    if (a > b)
        m = a;
    else
        m = b;
    return m;
}

For this very simple case, such a formulation is overkill.  But if the ‘then’ and ‘else’ cases are more complicated it makes more sense.  (Furthermore, much of this discussion matters most in more complex cases, where these kinds of small decisions can make a bigger difference.)  This version also doesn’t violate the no-else-after-return rule.  So perhaps that rule isn’t as bad as I thought, if only because it’s easier to work around than I thought… but I still think rewriting the 1st version of max() into the 2nd version is a bad idea.

What I currently hate most about C++

Everyone knows that global variables are bad and should be avoided wherever possible.  Why?  Because each global variable is, in effect, an implicit argument to every function that can see the global variable.  The same thing is true of any non-local state.

And the presence of non-local state means that you can’t reason locally about your code.  That makes your code more complex, and complex code is likely to have more defects.

And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.

Non-local state within classes

First of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need.  In other words, every class field is an implicit argument to every class method.  This can work well for, let’s say, a “Date” class, because the number of fields is small, and most class methods will access most fields.

But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C.  For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit’s low-level intermediate representation (called “LIR”) to assembly code.  If you exclude members that are only included when debugging is enabled, there are 18 data fields and 102 methods.  And some of those 18 data fields are pointers to objects that are themselves complex.

Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly().  It then gets overwritten — but with the same value! — via an argument passed into the method assemble().  It is accessed directly in only 7 of those 102 methods:

  • assemble(): which increments _thisfrag->compileNbr
  • gen(), printActivationState(), asmspilli(): which use _thisfrag->lirbuf->names, but only when verbose output is asked-for
  • assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag->lirbuf are read

And that’s just one example, which I chose because I’d been thinking about this problem and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely.  I’m sure a similar story will hold for a lot of the fields in this class.

Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable?  Almost certainly not, you’d pass it only to the functions that need it;  actually you’d probably only pass parts of _thisfrag around.  But C++ encourages you to make everything a class, and stick everything a class ever needs in as a data field, creating lots of non-local state that complicates everything.
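To make the contrast concrete, here’s a rough sketch of the two styles.  The type names and signatures are approximations for illustration, not the real Nanojit interfaces.

// Approximate types, for illustration only.
struct LirNameMap;
struct LirBuffer { LirNameMap* names; /* ... */ };
struct Fragment  { LirBuffer*  lirbuf; /* ... */ };

// The style the class encourages: _thisfrag is an implicit argument that any
// of the ~100 methods could read or write.
class Assembler {
    Fragment* _thisfrag;            // one of the 18 data fields
    // ... 17 more fields ...
public:
    void printActivationState();    // silently reads _thisfrag->lirbuf->names
    // ... 100+ other methods ...
};

// The style a C module nudges you towards: pass in only the part that's
// needed, so the dependency is visible at every call site.
void printActivationState(const LirNameMap* names);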

(An aside:  Assembler probably also isn’t a very good basis for a class because it’s a *process*.  I figure that if you’d write something as a struct in C, then it makes for a good class in C++.  But I need to think about that some more.)

Non-local state beyond classes

But it gets even worse.  Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class.  But get/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:

    private:
        AssmError   _err;

    public:
        void        setError(AssmError e) { _err = e; }
        AssmError   error() { return _err; }

Oh great, I feel much safer now.

It would be better to just make _err public and avoid the get/set obfuscation;  at least then it would be obvious how exposed _err is.  It also saves you from having to check the definitions of error() and setError().

Even worse, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler.  I’ve tried twice to simplify this, by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that it’s set on some control paths but not others.  This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.

The end result

Currently Nanojit has a number of large, complex classes, and many of them link to other large complex classes.  At many points in the code there is a bewildering amount of accessible non-local state.  (And I haven’t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.)  The complexity caused by this is a tax on development that we are all paying daily.

A better way

Before joining Mozilla, I spent three years programming in a functional language called Mercury.  Mercury entirely lacks global variables (except for some very restricted cases which are rarely used).  This means that you have to pass more data around as arguments than you do in C++.  But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does.  This is an *enormous* help, and one that’s easy to underestimate if you haven’t experienced it.

Obviously we’re not going to rewrite Firefox in a functional language any time soon.  And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use — the language doesn’t actively encourage you to put non-local state everywhere and let you feel good about it.  Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.

So next time you are thinking about adding a field to a class, ask yourself: is it really necessary?  Could it be passed in as an argument instead, or something else?  Can you make your life easier by avoiding some non-local state?