{"id":130,"date":"2009-06-19T15:37:54","date_gmt":"2009-06-19T04:37:54","guid":{"rendered":"http:\/\/blog.mozilla.org\/nnethercote\/?p=130"},"modified":"2009-06-19T15:37:54","modified_gmt":"2009-06-19T04:37:54","slug":"what-i-currently-hate-most-about-c","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nnethercote\/2009\/06\/19\/what-i-currently-hate-most-about-c\/","title":{"rendered":"What I currently hate most about C++"},"content":{"rendered":"<p>Everyone knows that global variables are bad and should be avoided wherever possible.\u00a0 Why?\u00a0 Because each global variable is, in effect, an implicit argument to every function that can see the global variable.\u00a0 The same thing is true of any non-local state.<\/p>\n<p>And the presence of non-local state means that you can&#8217;t reason locally about your code.\u00a0 That makes your code more complex, and complex code is likely to have more defects.<\/p>\n<p>And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.<\/p>\n<h3>Non-local state within classes<\/h3>\n<p>First of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don&#8217;t need.\u00a0 In other words, every class field is an implicit argument to every class method.\u00a0 This can work well for, let&#8217;s say, a &#8220;Date&#8221; class, because the number of fields is small, and most class methods will access most fields.<\/p>\n<p>But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C.\u00a0 For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit&#8217;s low-level intermediate representation (called &#8220;LIR&#8221;) to assembly code.\u00a0 If you exclude members that are only included when debugging is 
enabled, there are 18 data fields and 102 methods.\u00a0 And some of those 18 data fields are pointers to objects that are themselves complex.<\/p>\n<p>Let&#8217;s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly().\u00a0 It then gets overwritten &#8212; but with the same value! &#8212; via an argument passed into the method assemble().\u00a0 It is accessed directly in only 7 of those 102 methods:<\/p>\n<ul>\n<li>assemble(): which increments _thisfrag-&gt;compileNbr<\/li>\n<li>gen(), printActivationState(), asmspilli(): which use _thisfrag-&gt;lirbuf-&gt;names, but only when verbose output is asked for<\/li>\n<li>assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag-&gt;lirbuf are read<\/li>\n<\/ul>\n<p>And that&#8217;s just one example, which I chose because I&#8217;d been thinking about this problem, and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely.\u00a0 I&#8217;m sure a similar story will hold for a lot of the fields in this class.<\/p>\n<p>Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable?\u00a0 Almost certainly not; you&#8217;d pass it only to the functions that need it.\u00a0 Actually, you&#8217;d probably only pass parts of _thisfrag around.\u00a0 But C++ encourages you to make everything a class, and to store everything a class ever needs as a data field, creating lots of non-local state that complicates everything.<\/p>\n<p>(An aside:\u00a0 Assembler probably also isn&#8217;t a very good basis for a class because it&#8217;s a *process*.\u00a0 I figure that if you&#8217;d write something as a struct in C, then it makes for a good class in C++.\u00a0 But I need to think about that some more.)<\/p>\n<h3>Non-local state beyond classes<\/h3>\n<p>But it gets even 
worse.\u00a0 Good C++ practice encourages everyone to create private fields and use public get\/set methods to access class data fields from outside the class.\u00a0 But get\/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:<\/p>\n<pre>\u00a0\u00a0\u00a0 private:\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AssmError\u00a0\u00a0 _err;\r\n\r\n\u00a0\u00a0\u00a0 public:\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 void\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 setError(AssmError e) { _err = e; }\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AssmError\u00a0\u00a0 error() { return _err; }<\/pre>\n<p>Oh great, I feel much safer now.<\/p>\n<p>It would be better to just make _err public and avoid the get\/set obfuscation;\u00a0 at least then it would be obvious how exposed _err is.\u00a0 That would also save you from having to check the definitions of error() and setError().<\/p>\n<p>Even better, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler.\u00a0 I&#8217;ve tried twice to simplify this by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that it&#8217;s set on some control paths but not others.\u00a0 This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.<\/p>\n<h3>The end result<\/h3>\n<p>Currently Nanojit has a number of large, complex classes, and many of them link to other large, complex classes.\u00a0 At many points in the code there is a bewildering amount of accessible non-local state.\u00a0 (And I haven&#8217;t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.)\u00a0 The complexity caused by this is a tax on development that we are all paying 
daily.<\/p>\n<h3>A better way<\/h3>\n<p>Before joining Mozilla, I spent three years programming in a functional language called <a href=\"http:\/\/www.cs.mu.oz.au\/research\/mercury\/\">Mercury<\/a>.\u00a0 Mercury entirely lacks global variables (except for some very restricted cases which are rarely used).\u00a0 This means that you have to pass more data around as arguments than you do in C++.\u00a0 But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does.\u00a0 This is an *enormous* help, and one that&#8217;s easy to underestimate if you haven&#8217;t experienced it.<\/p>\n<p>Obviously we&#8217;re not going to rewrite Firefox in a functional language any time soon.\u00a0 And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use &#8212; the language doesn&#8217;t actively encourage you to put non-local state everywhere and let you feel good about it.\u00a0 Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.<\/p>\n<p>So next time you are thinking about adding a field to a class, ask yourself: is it really necessary?\u00a0 Could it be passed in as an argument instead, or handled some other way?\u00a0 Can you make your life easier by avoiding some non-local state?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Everyone knows that global variables are bad and should be avoided wherever possible.\u00a0 Why?\u00a0 Because each global variable is, in effect, an implicit argument to every function that can see the global variable.\u00a0 The same thing is true of any non-local state. 
And the presence of non-local state means that you can&#8217;t reason locally about [&hellip;]<\/p>\n","protected":false},"author":139,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[540,528,616,617],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/130"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/comments?post=130"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/130\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/media?parent=130"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/categories?post=130"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/tags?post=130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}