Categories
C Correctness Cplusplus Programming

What I currently hate most about C++

Everyone knows that global variables are bad and should be avoided wherever possible.  Why?  Because each global variable is, in effect, an implicit argument to every function that can see the global variable.  The same thing is true of any non-local state.

And the presence of non-local state means that you can’t reason locally about your code.  That makes your code more complex, and complex code is likely to have more defects.

And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.

Non-local state within classes

First, of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need to.  In other words, every class field is an implicit argument to every class method.  This can work well for, let’s say, a “Date” class, because the number of fields is small, and most class methods will access most fields.

But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C.  For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit’s low-level intermediate representation (called “LIR”) to assembly code.  If you exclude members that are only included when debugging is enabled, there are 18 data fields and 102 methods.  And some of those 18 data fields are pointers to objects that are themselves complex.

Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly().  It then gets overwritten — but with the same value! — via an argument passed into the method assemble().  It is accessed directly in only 7 of those 103 methods:

  • assemble(): which increments _thisfrag->compileNbr
  • gen(), printActivationState(), asmspilli(): which use _thisfrag->lirbuf->names, but only when verbose output is asked-for
  • assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag->lirbuf are read

And that’s just one example, which I chose because I’d been thinking about this problem and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely.  I’m sure a similar story will hold for a lot of the fields in this class.

Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable?  Almost certainly not, you’d pass it only to the functions that need it;  actually you’d probably only pass parts of _thisfrag around.  But C++ encourages you to make everything a class, and stick everything a class ever needs in as a data field, creating lots of non-local state that complicates everything.

(An aside:  Assembler probably also isn’t a very good basis for a class because it’s a *process*.  I figure that if you’d write something as a struct in C, then it makes for a good class in C++.  But I need to think about that some more.)

Non-local state beyond classes

But it gets even worse.  Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class.  But get/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:

    private:
        AssmError   _err;

    public:
        void        setError(AssmError e) { _err = e; }
        AssmError   error() { return _err; }

Oh great, I feel much safer now.

It would be better to just make _err public and avoid the get/set obfuscation;  at least then it would be obvious how exposed _err is.  It also saves you from having to check the definitions of error() and setError().

Even better, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler.  I’ve tried twice to simplify this, by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that’s it’s set on some control paths but not others.  This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.

The end result

Currently Nanojit has a number of large, complex classes, and many of them link to other large complex classes.  At many points in the code there is a bewildering amount of accessible non-local state.  (And I haven’t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.)  The complexity caused by this is a tax on development that we are all paying daily.

A better way

Before joining Mozilla, I spent three years programming in a functional language called Mercury.  Mercury entirely lacks global variables (except for some very restricted cases which are rarely used).  This means that you have to pass more data around as arguments than you do in C++.  But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does.  This is an *enormous* help, and one that’s easy to underestimate if you haven’t experienced it.

Obviously we’re not going to rewrite Firefox in a functional language any time soon.  And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use — the language doesn’t actively encourage you to put non-local state everywhere and let you feel good about it.  Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.

So next time you are thinking about adding a field to a class, ask yourself: is it really necessary?  Could it be passed in as an argument instead, or something else?  Can you make your life easier by avoiding some non-local state?

25 replies on “What I currently hate most about C++”

“C++ encourages you to make everything a class” – so why isn’t __thisfrag a class itself, with things that (are allowed to) happen encapsulated as methods?

And get/set allow you, and more importantly the compiler, know whether you meant to alter a member or not. Ideally, there’d be more than a few const declarations in there, to define the contract more closely. (And yes, a set method that simply sets the member variable may well be less than optimal. But it’s a function, so you can make whatever checks you like to ensure the new value is valid.)

But yes, C++ doesn’t force you to do OO properly. Scott Meyers’ Effective C++ is very good if you want to increase your C++-fu.

Another advantage to get/set methods rather than exposing the member variable directly is that it allows you to, at a later date, change (or add!) the validation enforced on set without changing the interface of the class and hense all the other code which uses it.

Now, you may write perfect code and never need this, but…

Mark, _thisfrag is an object of class Fragment (I didn’t make this clear). It probably would be better to have an assemble() method within class Fragment, and no Assembler class… but when you already have a rat’s nest of classes set up it’s difficult to change.

Hi

Its a bit off topic, but i just wonder why the mozilla hackers did implement their own intermediate representation and jit compiler.

There is the LLVM project (http://www.llvm.org/) which implements a great IR and JIT infrastructure.

Shouldnt it be possible to implement the TraceMonkey engine on top of LLVM?

Wondering
M. Brotz

The great thing about C++ is that it lets you use it how YOU decide to, and not by the how someone else THINKS you should. It’s extremely flexible.

You appear to be more concerned about HOW you put togethor a program, not the final result. Enforcing no global variables results in slower code. Period.

As one in the game industry where speed is of always the utmost importance, it becomes more important how the program runs, not how it was written.

Instance variables are not global state. You do not have implicit 18 arguments in your case: you have a single implicit argument, “this”. And in case of C++ you also have lookup rules that allow you to omit “this->” before variable names. But definitely no global state (static/class variables are global state though).

This is clearly visible in eg. Python where you have explicit “self” argument, but again – this is just a matter of implicit/explicit and lookup rules/syntactic sugar, not global/local state.

You could just write smaller classes. A class with “18 data fields and 102 methods” is quite a monster in any language, and will have many other issues making it difficult to comprehend.

I think one of the main problems with Assembler is how it handles the platform-specific nonsense, which is where the majority of those methods come from.
Using the preprocessor and make for brittle compile-time polymorphism to support multiple instruction sets is probably not right. The asm_ prefix hints that they are all similar methods, and they should probably all be in a separate class from the others (perhaps as static methods, then you could pick platform with a template parameter on Assembler) You’re right, though.. that code is really entangled and it’d be a serious task to sort it out into a nice design at this point.

First step would probably be to group some of those members into classes, and migrate methods there. Then, you could pass those classes around instead of relying on the class as a huge scope of mutable state.

> Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class.

Nope. Good C++ practice encourages you to make all member variables private. Get/set member functions are usually a bad idea, since that means client code has pretty much direct access to member variables — rendering the “private” specification useless.

When I call std::vector::size(), I do not care whether there’s a member variable that stores the size, or if it’s calculated on the fly, or whatever. I only care about the number of elements “stored” in the vector, not how internal management is done.

If client code needs to know about a class’s member variables, you’re doing it wrong.

(Of course, the exception is the “bunch-of-data”-type classes, where everything is public, and where there usually isn’t any member functions.)

On that point, the one big flaw of C++ is that the private members are visible in the class definition (in the .h file).

> there are 18 data fields and 102 methods

Uh? 18 member variables and 102 member functions in a class? Isn’t that pretty much what’s called “spaghetti code”?

If you put all that in one big class, well, obviously, it doesn’t feel organized. But it’s not C++’s fault, it’s yours. OTOH, the C++ standard committee is at fault on one point: std::string is the perfect example of what not to do. Ugh.

I’m not sure you have any understanding of what Object Oriented programming and design are. I suggest you read some books about those, then you won’t have bloated classes with lots of non-local variables.

You seem to have a fundamental misunderstanding about how C++ is to be used.

You understand C++ code in terms of objects that DO things.
You do not ever look at class member variables.

Change your way of thinking.
As long as you were coding in C, it was ok to think in terms of data structures.
In C++ you are expected to think in terms of operations.

Get that part clear and you wont have a problem.
And please don’t provide get/set methods.
People who provide get/set methods are just displaying their ignorance of the basic OO programming paradigm.

It’s worth noting that I didn’t write the code I was discussing; I started working on an existing code base and have been fighting these problems. Kyle seems to get this.

And I’m amused that some of the commenters are espousing the virtues of get/set methods, and some are saying they are terrible and should be avoided at all costs.

Not really, we’re both saying roughly the same thing.

The other commenter said “accessing member variables is bad. Don’t do it. If you MUST, then provide get/set”.

I’m saying “accessing member variables is BAD. Don’t do it. EVER.”.

🙂

It sounds like this code base is not properly componentized.

C++ has myriad problems (large number of complex language features which interact in complex ways) but I don’t think it can be blamed for what sounds like a design level problem

If you have classes larger than 300-1500 lines of code, then you should spend time analyzing alternate decompositions

OK. You’ve started to work on existing code base. So what do you think about idea to improve current design of the code?
Sorry, but 108 methods in 1 class smells really bad. Are you sure that you can’t move part of them into Assembler components (composition in UML) or subclass Assembler to make it more clear?

This argument is flawed. First of all is this not a critique of C++ but nearly all OO imperative languages, you mentioned it yourself – so change the title: “What i hate most about oo-programming”. This is not a peculiarity of C++ but an industry current practice (using classes and objects like you described them).

As someone already mentioned, C++ doesn’t force you to do anything. You can really program referentially transparent. Do it all with static members and pass all arguments as you like them, no pointers, only copy by value and you are rid of all the stuff. Of course its not the way its meant to be.

By the way one can view classes as solution to the massive argument passing: aggregate common data that is worked on by a set of methods and pass them altogether – just getting rid of passing a million arguments 500 times – which sucks by the way too.

C++ does not force/encourage you to make anything a member, you didn’t seem to mention that. It is simple aggregation, you want a data type composed of this and that and you choose yourself.

I think the problem isn’t with C++ or oo languages but with software engineering how to structure and divide programs into modules/classes.

Also, you argue with non-local state and reasoning about that. But within a large software product you build up layers to reach higher levels and it becomes obvious that the state needs to be nonlocal and local reasoning will not suffice.

Lets take some procedure like: loadMachine() { moveRobotTo( pos ); openGate(); moveIn(); startCalibration(); moveOut(); startProcessing() }

Now tell me how you keep state local with robot and machine modules composed of thousands of ministates of sensors and actuators. Of course there is no total view of the affected data because of the high layer you are working.

“First, of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need to.”

You say that if you were writing C, you’d pass this state as function arguments. Then why don’t you in C++? It’s not only perfectly valid, but actually highly encouraged to put local state into arguments. Class variables are for the class state.
What gives you the idea that C++ forces, or even encourages, non-local state? Just because the possibility of misuse exists, doesn’t mean it’s encouraged. Nothing in the C++ standard says you have to do it this way, and that’s the only authorative source on what C++ does. If you go beyond the language, no book on C++ best practices I’ve ever seen encourages making local state non-local by putting it into classes. (I’ve seen some misguided OOP guides do this, but that’s not a C++ problem.)

“Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly(). It then gets overwritten — but with the same value! — via an argument passed into the method assemble().”

That sounds like a class design problem to me. But class design is mostly language-independent. The problem would be exactly the same in Java, Python, Smalltalk, or any other object-oriented language you care to name. In fact, it would even be exactly the same in Mercury, OCaml, Haskell or Lisp if you decide, for whatever reason, to bundle up all that state into a single record type (or your language of choice’s equivalent) and only pass that around instead of lots of free variables – suddenly you exposed state to functions that shouldn’t have seen it, and thus you’ve increased coupling and decreased maintainability. But the reason is still insufficient separation of state, not any particular language or any particular design method.

“Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class.”

That’s just nonsense. No respectable guide on C++ will ever encourage direct access to state (be it in raw member variables or via get/set) where it isn’t necessary. (Unlike Java, where the Beans specification comes very close to doing that.)

“It would be better to just make _err public and avoid the get/set obfuscation; at least then it would be obvious how exposed _err is.”

Also wrong, for reasons mentioned by other commenters.

So in summary, you’re frustrated about the poorly designed classes in Nanojit, but for some reason you chose to formulate your blog post in a way that blames C++ for this. But C++ is just a general-purpose language that supports object-oriented programming. There is no way you can blame it for poor class design.

Hi,

Could you please tell me if I can access the rendering engine(Gecko) of Firefox. My main task is to see the dimensions of a HTML element from a Java/Perl program.

Thanks.

Hmm, sounds like you’re ready for Haskell and Ocaml 😉

Another good language sharing the same space as C++ is D: http://www.digitalmars.com/d/
It’s like a vastly improved, safer C++ but it also lets you get down and dirty if you want to. I would love it if all C++ programmers took a good hard look at D.

One word – “refactoring”.

I haven’t seen the code and how it came to be in what appears to be a real mess. Maybe the class can’t be refactored into a dozen classes because of performance limitations? Maybe the code really has to do so much in one class (and has accumulated so many tweaks and hacks that it’s impossible to de-entangle it without it breaking). But from the sound of it, it just sounds like badly written object oriented code (C with classes, instead of idiomatic C++).

In any case – complaining about state in C++ classes is not really about C++, but about OOP in general. Virtually all OOP languages (from Smalltalk to Java) do it like this.

Coming from a functional world where you don’t have state that is the result of code with side effects (and are just used to passing everything you need as parameters), it’s understandable to blame a particular language, but in this case, it really looks like it’s just bad code (you can write more “functional” code using objects too).

I really recommend reading Martin Fowler’s book on refactoring.

Sebastian, Kim,

You’re right that this problem is OO-specific, not C++ specific. (I mentioned that briefly at the start.)

And yes, it may just be that nanojit needs refactoring, and that C++ is not to blame. But if I’d written “what I currently hate about Nanojit” I wouldn’t have got so many comments 🙂

But I suspect this problem crops up in a lot of OO programs. TraceMonkey has some large awful classes too, and I bet the rest of Firefox does as well. It’s easy to blame bad programmers or bad OO design, but I suspect that OO programs tend in this direction if the programmer’s aren’t very disciplined. And if a language or programming paradigm requires you to be really disciplined to avoid pitfalls, then it arguably deserves some blame when those pitfalls occur.

I am looking at ways to refactor the code, but it’s going to be a long, slow process.

Object orientated programming is hard. It’s major failing is that at first glance it looks easy and straight forward which makes n00bs jump straight in and make a big mess of it.

Functional programming has the great feature of scaring the shit out of n00bs who avoid it and don’t even know where to start. The advantage of this is that the n00bs have to learn before they get a chance to make a mess of it.

Comments are closed.