Everyone knows that global variables are bad and should be avoided wherever possible. Why? Because each global variable is, in effect, an implicit argument to every function that can see the global variable. The same thing is true of any non-local state.
And the presence of non-local state means that you can’t reason locally about your code. That makes your code more complex, and complex code is likely to have more defects.
And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.
Non-local state within classes
First, of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need to. In other words, every class field is an implicit argument to every class method. This can work well for, let’s say, a “Date” class, because the number of fields is small, and most class methods will access most fields.
But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C. For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit’s low-level intermediate representation (called “LIR”) to assembly code. If you exclude members that are only included when debugging is enabled, there are 18 data fields and 102 methods. And some of those 18 data fields are pointers to objects that are themselves complex.
Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly(). It then gets overwritten — but with the same value! — via an argument passed into the method assemble(). It is accessed directly in only 7 of those 103 methods:
- assemble(): which increments _thisfrag->compileNbr
- gen(), printActivationState(), asmspilli(): which use _thisfrag->lirbuf->names, but only when verbose output is asked-for
- assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag->lirbuf are read
And that’s just one example, which I chose because I’d been thinking about this problem and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely. I’m sure a similar story will hold for a lot of the fields in this class.
Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable? Almost certainly not, you’d pass it only to the functions that need it; actually you’d probably only pass parts of _thisfrag around. But C++ encourages you to make everything a class, and stick everything a class ever needs in as a data field, creating lots of non-local state that complicates everything.
(An aside: Assembler probably also isn’t a very good basis for a class because it’s a *process*. I figure that if you’d write something as a struct in C, then it makes for a good class in C++. But I need to think about that some more.)
Non-local state beyond classes
But it gets even worse. Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class. But get/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:
private:
AssmError _err;
public:
void setError(AssmError e) { _err = e; }
AssmError error() { return _err; }
Oh great, I feel much safer now.
It would be better to just make _err public and avoid the get/set obfuscation; at least then it would be obvious how exposed _err is. It also saves you from having to check the definitions of error() and setError().
Even better, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler. I’ve tried twice to simplify this, by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that’s it’s set on some control paths but not others. This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.
The end result
Currently Nanojit has a number of large, complex classes, and many of them link to other large complex classes. At many points in the code there is a bewildering amount of accessible non-local state. (And I haven’t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.) The complexity caused by this is a tax on development that we are all paying daily.
A better way
Before joining Mozilla, I spent three years programming in a functional language called Mercury. Mercury entirely lacks global variables (except for some very restricted cases which are rarely used). This means that you have to pass more data around as arguments than you do in C++. But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does. This is an *enormous* help, and one that’s easy to underestimate if you haven’t experienced it.
Obviously we’re not going to rewrite Firefox in a functional language any time soon. And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use — the language doesn’t actively encourage you to put non-local state everywhere and let you feel good about it. Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.
So next time you are thinking about adding a field to a class, ask yourself: is it really necessary? Could it be passed in as an argument instead, or something else? Can you make your life easier by avoiding some non-local state?