Categories
Correctness Cplusplus Firefox

Limits of reliability

Julian Seward asked me an interesting question a while ago:  “what are the factors that limit Firefox’s reliability?”  (You can use “crash rate” as a reasonable definition of “reliability”.)

He suggested two things:

  1. Firefox depends on external code, such as plug-ins.
  2. Many crashes are hard to reproduce and so don’t get fixed.

For the first, Electrolysis (a.k.a. process separation) is on track to pretty much make it a non-problem.  It’s already in place for Flash, and will eventually be for other plug-ins.  So that’s good.

For the second, I see two main sub-factors.

  1. Firefox is implemented in C++ which is prone to memory-related bugs and data races, both of which can make crash reproduction difficult.  Using a safer language like Rust would make many (all?) of these bugs impossible.  Unfortunately, Rust isn’t production-ready, and rewriting even parts of the browser is a huge undertaking.  So we better get started ASAP 🙂
  2. Second, Firefox has some nasty low-level code like the garbage collector;  bugs in it be very difficult to reproduce.  I don’t see an obvious way to improve this other than the usual:  testing, code review, using simple algorithms, etc.
Categories
Cplusplus Firefox

The dangers of -fno-exceptions

When Firefox is built with GCC, the -fno-exceptions option is used, which means that exception-handling is disabled.  I’ve been told that this is because the performance of code that uses exceptions is unacceptable.

Sounds simple, until you realize that libraries such as libstdc++.so are not built with this option.  This means, for example, that the vanilla operator new will throw an exception if it fails, because it’s in libstdc++.so, but Firefox code cannot catch the exception, because -fno-exceptions is specified.  (If you write a try-block, GCC will give you an error.)

This has important consequences:  if you compile your application with -fno-exceptions, you cannot use any standard library functions that might throw exceptionsSpiderMonkey’s C++ coding standard is succinct, perhaps overly so: “No exceptions, so std is hard to use.”

Another fine example of the “so you think you’ll be able to use a subset of C++, eh?” fallacy.  See bug 624878 for a specific manifestation of this problem.  I wonder if there are others case like that in Firefox.

Categories
C Correctness Cplusplus Programming

“No-else-after-return” considered harmful

(Update below)

At the time of writing this, the Mozilla Coding Style guidelines have this recommendation under “General C/C++ Practices”:

Don’t put an else right after a return. Delete the else, it’s unnecessary and increases indentation level.

I can appreciate this in some circumstances, such as when checking for an error case early in a long function:

void f()
{
    if (errorcase())
        return;

    a();
    b();
    c();
    d();
}

But as a blanket guideline I think it’s misguided because it promotes an impure style of programming.  And by “impure” here I mean the opposite of “pure” in this sense, i.e. “free from side-effects and other such error-prone junk”.  Let’s consider a simple example:  the max() function.  Here’s a way to write it that is quite sensible, in my opinion:

int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

But this violates the no-else-after-return guideline.  The Mozilla way to write this is like so:

int max(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

(I’m not exaggerating, see this comment and this comment from Mozilla’s VP of Engineering and CTO respectively, about an example very similar to this.)

Urk.  That’s horrible.  Any time you’re using an ‘if’ without a corresponding ‘else’, you’re relying on an impure feature — you must be doing something with a side-effect in the ‘then’ branch, or some kind of impure control flow.  Mozilla’s coding guidelines insist that we write max(), a pure function, in an impure way.  Also, it’s just harder to read because the two return statements have different indentation levels despite the fact that they are closely related.

In this case there’s a way around the problem:

int max(int a, int b)
{
    return (a > b ? a : b);
}

Even though C’s if-then-else expression syntax is awful, it’s a better way to do it.

In general, statements suck, expressions rule.  Even in a language like C/C++ you can do something approaching functional programming by avoiding statements where possible.  In a tiny example like max() the difference may seem trivial, but in bigger examples the difference becomes more important.  Consider the following code from Nanojit’s register allocator which picks a register by looking at the free registers, the registers that are allowed to be used, and possibly the preferred registers.  You don’t have to understand it closely, but it helps to know that RegisterMask is just a bitfield with one bit per register.

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask prefer = SavedRegs & allow;
    RegisterMask free = regs.free & allow;

    RegisterMask set = prefer;
    if (set == 0) set = allow;

    if (free)
    {
        // at least one is free
        set &= free;

        // ok we have at least 1 free register so let's try to pick
        // the best one given the profile of the instruction
        if (!set)                                                             
        {
            // desired register class is not free so pick first of any class
            set = free;
        }
        NanoAssert((set & allow) != 0);
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }

    // ... more code here ...
}

I find this code incredibly hard to understand.  There are three ‘ifs’ that lack ‘elses’, so it relies heavily on state updates to do its work.  In fact, in order to understand it, I ended up rewriting it in a more pure style, by using a sequence of small refactorings:

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask allowedAndFree = allow & regs.free;

    if (allowedAndFree)
    {
        // At least one usable register is free -- no need to steal.  
        // Pick a preferred one if possible.
        RegisterMask preferredAndFree = allowedAndFree & SavedRegs;
        RegisterMask set = ( preferredAndFree ? preferredAndFree : allowedAndFree );
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }
    // ... more code here ...
}

There’s now only a single ‘if’ lacking an ‘else’, and it turns out even that can be removed by putting the “… more code here …” part into an else branch, and then the ‘return r;’ can be moved to the end of the function.

The code is now comprehensible — it picks a register that is free and allowed, and preferred as well if possible.

Part of the art of programming involves knowing when to avoid using a language feature.  Everybody knows that ‘goto’ is occasionally highly useful, but should be avoided whenever something less dangerous (a for-loop, a while-loop, a ‘break’ statement) can be used.  And everybody knows it because everybody is taught it.  But programmers aren’t taught to avoid unnecessary side-effects… at least, I wasn’t, certainly not via examples such as those I’ve given above.  I’ve only come to realise these things as I’ve switched several times between using C/C++ and some pure functional languages over the years, and come to appreciate how purity helps make programs more readable and less error-prone.

To come full circle, I’d like to see the no-else-after-return guideline removed from the Mozilla style guide, or at the very least, be qualified in a way that allows the cases I’ve argued for.  I’d also love to see a guideline (with examples) added such as “prefer expressions to statements”.

Update:

(The arguments had the wrong names in the initial post, I’ve fixed them now.)

All three versions of max() are pure, because it’s possible to combine impure pieces of code (such as an else-less if) into a pure piece of code.  So I’ll augment my recommendation: pure code is preferable, and furthermore, given that C/C++ makes it so easy to write impure code, it’s preferable that pure code be obviously pure. I argue that the 3rd version of max() with the ?: operator is the most obviously pure, because a ?: is pure if its operands are pure.  In comparison, it’s harder to tell if an if-then-else is pure (although this case is still pretty easy thanks to the matching return statements).  And an if-without-else almost never results in pure code;  the max() example above is a rare exception.

I also omitted a fourth formulation of max:

int max(int a, int b)
{
    int m;
    if (a > b)
        m = a;
    else
        m = b;
    return m;
}

For this very simple case, such a formulation is overkill.  But if the ‘then’ and ‘else’ cases are more complicated it makes more sense.  (Furthermore, much of this discussion should be thought of in more complex cases, as these kinds of small decision can make a bigger difference then.)  This version also doesn’t violate the no-else-after-return rule.  So perhaps that rule isn’t as bad as I thought, if only because it’s easier to work around than I thought… but I still think rewriting the 1st version of max() into the 2nd version is a bad idea.

Categories
C Correctness Cplusplus Programming

What I currently hate most about C++

Everyone knows that global variables are bad and should be avoided wherever possible.  Why?  Because each global variable is, in effect, an implicit argument to every function that can see the global variable.  The same thing is true of any non-local state.

And the presence of non-local state means that you can’t reason locally about your code.  That makes your code more complex, and complex code is likely to have more defects.

And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.

Non-local state within classes

First, of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need to.  In other words, every class field is an implicit argument to every class method.  This can work well for, let’s say, a “Date” class, because the number of fields is small, and most class methods will access most fields.

But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C.  For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit’s low-level intermediate representation (called “LIR”) to assembly code.  If you exclude members that are only included when debugging is enabled, there are 18 data fields and 102 methods.  And some of those 18 data fields are pointers to objects that are themselves complex.

Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly().  It then gets overwritten — but with the same value! — via an argument passed into the method assemble().  It is accessed directly in only 7 of those 103 methods:

  • assemble(): which increments _thisfrag->compileNbr
  • gen(), printActivationState(), asmspilli(): which use _thisfrag->lirbuf->names, but only when verbose output is asked-for
  • assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag->lirbuf are read

And that’s just one example, which I chose because I’d been thinking about this problem and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely.  I’m sure a similar story will hold for a lot of the fields in this class.

Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable?  Almost certainly not, you’d pass it only to the functions that need it;  actually you’d probably only pass parts of _thisfrag around.  But C++ encourages you to make everything a class, and stick everything a class ever needs in as a data field, creating lots of non-local state that complicates everything.

(An aside:  Assembler probably also isn’t a very good basis for a class because it’s a *process*.  I figure that if you’d write something as a struct in C, then it makes for a good class in C++.  But I need to think about that some more.)

Non-local state beyond classes

But it gets even worse.  Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class.  But get/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:

    private:
        AssmError   _err;

    public:
        void        setError(AssmError e) { _err = e; }
        AssmError   error() { return _err; }

Oh great, I feel much safer now.

It would be better to just make _err public and avoid the get/set obfuscation;  at least then it would be obvious how exposed _err is.  It also saves you from having to check the definitions of error() and setError().

Even better, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler.  I’ve tried twice to simplify this, by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that’s it’s set on some control paths but not others.  This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.

The end result

Currently Nanojit has a number of large, complex classes, and many of them link to other large complex classes.  At many points in the code there is a bewildering amount of accessible non-local state.  (And I haven’t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.)  The complexity caused by this is a tax on development that we are all paying daily.

A better way

Before joining Mozilla, I spent three years programming in a functional language called Mercury.  Mercury entirely lacks global variables (except for some very restricted cases which are rarely used).  This means that you have to pass more data around as arguments than you do in C++.  But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does.  This is an *enormous* help, and one that’s easy to underestimate if you haven’t experienced it.

Obviously we’re not going to rewrite Firefox in a functional language any time soon.  And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use — the language doesn’t actively encourage you to put non-local state everywhere and let you feel good about it.  Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.

So next time you are thinking about adding a field to a class, ask yourself: is it really necessary?  Could it be passed in as an argument instead, or something else?  Can you make your life easier by avoiding some non-local state?