{"id":499,"date":"2019-10-28T13:11:38","date_gmt":"2019-10-28T17:11:38","guid":{"rendered":"http:\/\/blog.mozilla.org\/nfroyd\/?p=499"},"modified":"2019-10-28T13:11:38","modified_gmt":"2019-10-28T17:11:38","slug":"evaluating-bazel-for-building-firefox-part-1","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nfroyd\/2019\/10\/28\/evaluating-bazel-for-building-firefox-part-1\/","title":{"rendered":"evaluating bazel for building firefox, part 1"},"content":{"rendered":"<p>After the <a href=\"https:\/\/wiki.mozilla.org\/All_Hands\/Whistler2019\">Whistler All-Hands<\/a> this past summer, I started seriously looking at whether Firefox should switch to using <a href=\"https:\/\/bazel.build\/\">Bazel<\/a> for its build system.<\/p>\n<p>The motivation behind switching build systems was twofold.\u00a0 The first motivation was that build times are one of the most visible developer-facing aspects of the build system and everybody appreciates faster builds.\u00a0 What&#8217;s less obvious, but equally important, is that making builds faster improves automation: less time waiting for try builds, more flexibility to adjust infrastructure spending, and less turnaround time with automated reviews on patches submitted for review.\u00a0 The second motivation was that our build system is used by exactly one project (ok, <a href=\"https:\/\/developer.thunderbird.net\/\">two projects<\/a>), so there&#8217;s a lot of onboarding cost both in terms of developers who use the build system and in terms of developers who need to develop the build system.\u00a0 If we could switch to something more off-the-shelf, we could improve the onboarding experience and benefit from work that other parties do with our chosen build system.<\/p>\n<p>You may have several candidates that we should have evaluated instead.\u00a0 We did look at other candidates (although perhaps none so deeply as Bazel), and all of them have various issues that make them unsuitable for a switch.\u00a0 The 
reasons for rejecting other possibilities fall into two broad categories: not enough platform support (read: Windows support) and unlikely to deliver on making builds faster and\/or improving the onboarding\/development experience.\u00a0 I&#8217;ll cover the projects we looked at in a separate post.<\/p>\n<p>With that in mind, why Bazel?<\/p>\n<p>Bazel advertises itself with the tagline &#8220;{Fast, Correct} &#8211; Choose two&#8221;.\u00a0 What&#8217;s sitting behind that tagline is that when building software via, say, <a href=\"https:\/\/www.gnu.org\/software\/make\/\">Make<\/a>, it&#8217;s very easy to write <code>Makefile<\/code>s in such a way that builds are fast, but occasionally (or not-so-occasionally) fail because somebody forgot to specify &#8220;to build thing X, you need to have built thing Y&#8221;.\u00a0 The build doesn&#8217;t usually fail because thing Y usually happens to be built before thing X: maybe the scheduling algorithm for parallel execution in <code>make<\/code> chooses to build Y first 99.9% of the time, and 99% of those times, building Y finishes prior to even starting to build X.<\/p>\n<p>The typical solution is to become more conservative in how you build things such that you can be sure that Y is always built before X&#8230;but typically by making the dependency implicit by, say, ordering the build commands Just So, and not by actually making the dependency explicit to <code>make<\/code> itself.\u00a0 Maybe specifying the explicit dependency is rather difficult, or maybe somebody just wants to make things work.\u00a0 After several rounds of these kinds of fixes, you wind up with <code>Makefile<\/code>s that are (probably) correct, but probably not as fast as they could be, because you&#8217;ve likely serialized build steps that could have been executed in parallel.\u00a0 And untangling such systems to the point that you can properly parallelize things without regressing correctness can be&#8230;challenging.<\/p>\n<p>(I&#8217;ve used 
<code>make<\/code> in the above example because it&#8217;s a lowest-common-denominator piece of software and because having a concrete example makes differentiating between &#8220;the software that runs the build&#8221; and &#8220;the specification of the build&#8221; easier.\u00a0 Saying &#8220;the build system&#8221; can refer to either one and sometimes it&#8217;s not clear from context which is in view.\u00a0 But you should not assume that the problems described above are necessarily specific to <code>make<\/code>; the problems can happen no matter what software you rely on.)<\/p>\n<p>Bazel advertises a way out of the quagmire of probably-correct specifications for building your software.\u00a0 It does this&#8212;at least so far as I understand things, and I&#8217;m sure <a href=\"https:\/\/xkcd.com\/386\/\">the Internet will come to correct me<\/a> if I&#8217;m wrong&#8212;by asking you to explicitly specify dependencies up front.\u00a0 Build commands can then be checked for correctness by executing the commands in a &#8220;sandbox&#8221; containing only those files specified as dependencies: if you forgot to specify something that was actually needed, the build will fail because the file(s) in question aren&#8217;t present.<\/p>\n<p>Having a complete picture of the dependency graph enables faster builds in three different ways.\u00a0 The first is that you can maximally parallelize work across the build.\u00a0 The second is that Bazel comes with built-in facilities for <a href=\"https:\/\/docs.bazel.build\/versions\/1.1.0\/remote-execution.html\">farming out build tasks to remote machines<\/a>.\u00a0 Note that all build tasks can be distributed, not just C\/C++\/Rust compilation as via <a href=\"https:\/\/github.com\/mozilla\/sccache\/\">sccache<\/a>.\u00a0 So even if you don&#8217;t have a particularly powerful development machine, you can still pretend that you have a large multi-core system at your disposal.\u00a0 The third is that Bazel also comes with 
built-in facilities for <a href=\"https:\/\/docs.bazel.build\/versions\/1.1.0\/remote-caching.html\">aggressive caching of build artifacts<\/a>.\u00a0 Again, like remote execution, this caching applies across all build tasks, not just C\/C++\/Rust compilation.\u00a0 In Firefox development terms, this is Firefox artifact builds done &#8220;correctly&#8221;: given appropriate setup, your local build would simply download whatever was appropriate for the changes in your current local tree and rebuild the rest.<\/p>\n<p>Having a complete picture of the dependency graph enables a number of other nifty features.\u00a0 Bazel comes with <a href=\"https:\/\/docs.bazel.build\/versions\/1.1.0\/query-how-to.html\">a query language for the dependency graph<\/a>, enabling you to ask questions like &#8220;what jobs need to run given that these files changed?&#8221;\u00a0 This sort of query would be valuable for determining what jobs to run in automation; we have a half-hearted (and hand-updated) version of this in things like <code>files-changed<\/code> in Taskcluster job specifications.\u00a0 But things like &#8220;run <code>$OS<\/code> tests for <code>$OS<\/code>-only changes&#8221; or &#8220;run just the mochitest chunk that contains the changed mochitest&#8221; become easy.<\/p>\n<p>It&#8217;s worth noting here that we could indeed work towards having the entire build graph available all at once in the current Firefox build system.\u00a0 And we have remote execution and caching abilities via sccache, even more so now that sccache-dist is being deployed in Mozilla offices.\u00a0 We think we have a reasonable idea of what it would take to work towards Bazel-esque capabilities with our current system; the question at hand is how a switch to Bazel compares to that and whether a switch would be more worthwhile for the health of the Firefox build system over the long term.\u00a0 Future posts are going to explore that question in more 
detail.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After the Whistler All-Hands this past summer, I started seriously looking at whether Firefox should switch to using Bazel for its build system. The motivation behind switching build systems was twofold.\u00a0 The first motivation was that build times are one of the most visible developer-facing aspects of the build system and everybody appreciates faster builds.\u00a0 [&hellip;]<\/p>\n","protected":false},"author":320,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[72457,153,5],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/499"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/users\/320"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/comments?post=499"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/499\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/media?parent=499"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/categories?post=499"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/tags?post=499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}