{"id":504,"date":"2013-02-15T08:26:39","date_gmt":"2013-02-15T15:26:39","guid":{"rendered":"http:\/\/blog.mozilla.org\/axel\/?p=504"},"modified":"2013-02-15T08:26:39","modified_gmt":"2013-02-15T15:26:39","slug":"risk-management-for-releases-at-scale","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/axel\/2013\/02\/15\/risk-management-for-releases-at-scale\/","title":{"rendered":"Risk management for releases at scale"},"content":{"rendered":"<p>Let me share some recent revelations I had. It all started with the infamous Berlin airport. Not the nice one in Tegel, but the <abbr title=\"Berlin-Brandenburg International\">BBI<\/abbr> disaster. The one we thought we&#8217;d open last year, and now we don&#8217;t know in which year.<\/p>\n<p>Part of the news coverage here in Germany was all about how they didn&#8217;t do any risk analysis and are doomed, and how that other project, the one for the Olympics in London, did do risk analysis and came in under budget and ahead of schedule.<\/p>\n<p>So what&#8217;s good for the Olympics can&#8217;t be bad for Firefox, and I started <a href=\"http:\/\/pike.github.com\/release-scale\/docs\/math.html\" title=\"annotated source code used later\">figuring out the math<\/a> behind our risk of shipping Firefox on a given date with loads of localizations. <em>How likely is it that we&#8217;ll make it?<\/em><\/p>\n<p>Interestingly enough, the same algorithm can also be applied to a set of features that are scheduled for a particular Firefox release. Locales, features, blockers, product managers, developers: all the same thing :-). Any bucket of N things trying to make a single deadline has similar risks, and the same cure. So bear with me. I&#8217;ll sprinkle graphs as we go to illustrate.
They&#8217;ll <a href=\"http:\/\/pike.github.com\/release-scale\/\">link to a site<\/a> that I&#8217;ve set up for playing with the numbers, which reproduces the graphs shown here.<\/p>\n<p>The setup is like this: every single item (a localization, for example) has a risk of missing the deadline, and I&#8217;m assuming the same risk across the board. I have N such items, and I&#8217;m interested in how likely it is that all of them make it. Then I evaluate the impact of different numbers of freeze cycles. If you&#8217;re like me and don&#8217;t believe any statistics unless they&#8217;re done by throwing dice, check out the <a href=\"http:\/\/pike.github.com\/release-scale\/dices\/\">dice demo<\/a>.<\/p>\n<p>Anyway, let&#8217;s start with a 20% risk per locale (that is, an 80% chance that each locale makes it), no freeze, and up to 100 locales.<\/p>\n<p><a href=\"http:\/\/pike.github.com\/release-scale\/?freezes=0&amp;locales=1,3,20,100&amp;likely=80\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.20.07.png\" alt=\"80%, no freeze, up to 100\" width=\"796\" height=\"391\" class=\"aligncenter size-full wp-image-505\" srcset=\"https:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.20.07.png 796w, https:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.20.07-300x147.png 300w\" sizes=\"(max-width: 796px) 100vw, 796px\" \/><\/a><\/p>\n<p>Ouch. We&#8217;re crossing 50-50 at 3 items already, and anything at scale has a pretty flat zero chance. Why&#8217;s that? What we&#8217;re seeing is exponential decay, with 80% as the base and the number of items as the exponent: 0.8<sup>3<\/sup> is 51.2%, and 0.8<sup>20<\/sup> is only about 1%. This is <strong>revelation one<\/strong> I had this week.<\/p>\n<p>How can we fix this? If only our teams failed less often? Feel free to play with the numbers, for example by raising the success rate from 80% to 90%. Better, but the system at large still doesn&#8217;t scale: 0.9<sup>20<\/sup> is only about 12%.
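<\/p>\n<p>The no-freeze numbers are easy to reproduce. Here is a minimal Python sketch. Like the setup described earlier, it assumes that every item succeeds independently with the same probability p. The name <code>ship_probability<\/code> is hypothetical and chosen for illustration; it is not code from the release-scale site.<\/p>\n<pre>
```python
# Hypothetical helper, assuming independent items that all share the same
# success probability p, and no freeze cycles at all.
def ship_probability(p, n):
    # All n items must succeed: p * p * ... * p, i.e. p to the power n.
    return p ** n


if __name__ == '__main__':
    # One row per success rate, for the item counts used in the graph.
    for p in (0.8, 0.9):
        row = ', '.join(f'{n}: {ship_probability(p, n):.1%}' for n in (1, 3, 20, 100))
        print(f'p={p}', row)
```
<\/pre>\n<p>Raising p from 0.8 to 0.9 only changes the base of the power. The chance that everything ships still decays exponentially with n.<\/p>\n<p>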
To fight an exponential risk, we need a cure that&#8217;s exponential.<\/p>\n<p>Turns out freezes are just that, and that&#8217;s <strong>revelation two<\/strong> I had this week. Let&#8217;s add five additional frozen development cycles.<\/p>\n<p><a href=\"http:\/\/pike.github.com\/release-scale\/?freezes=5&amp;locales=1,3,20,100&amp;likely=80\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.55.08.png\" alt=\"80%, 5 freezes, up to 100\" width=\"776\" height=\"374\" class=\"aligncenter size-full wp-image-507\" srcset=\"https:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.55.08.png 776w, https:\/\/blog.mozilla.org\/axel\/files\/2013\/02\/Bildschirmfoto-2013-02-13-um-15.55.08-300x144.png 300w\" sizes=\"(max-width: 776px) 100vw, 776px\" \/><\/a><\/p>\n<p>Oh hai. At small scales, even a single frozen cycle kills most of the risk. Three features without a freeze have a 50-50 chance, but with just one freeze cycle we&#8217;re already at 88%, which beats the 80% success rate of each individual feature. At large scales like the ones we have in l10n, two freezes keep the decline roughly linear, and three freezes are pretty solid. If I&#8217;m less confident and go down to 70% per locale, 4 or 5 cycles make for a winning strategy. In other words, for a base risk of 20-30%, 4-5 freeze cycles let a localized release scale.<\/p>\n<p>It&#8217;s actually intuitive that freezes are (kinda) exponentially good. The math is a tad more complicated, but in simplified terms: if your per-item success rate is 70%, then in the next cycle you only have to solve your problem for the 30% of items that failed, and in the second cycle for just 9% (30% of 30%). Thus, you&#8217;re fighting scale with scale.
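<\/p>\n<p>The freeze arithmetic can be sketched in a few lines of Python. This is a simplified model based on my own assumption, not necessarily the one behind the linked math page. In it, every cycle gives each still-missing item another independent try with the same success rate p. An item therefore only misses the release if it fails in all freezes + 1 cycles, and all n items make it with probability (1 - (1 - p)<sup>freezes + 1<\/sup>)<sup>n<\/sup>. The function names are hypothetical.<\/p>\n<pre>
```python
# Simplified model (an assumption, not the release-scale site's code):
# in every cycle, each still-missing item independently succeeds with
# probability p; 'freezes' extra frozen cycles follow the first one.

def still_missing(p, cycles):
    # Fraction of items that failed in every one of the given cycles.
    return (1 - p) ** cycles


def ship_probability(p, n, freezes):
    # An item ships unless it fails in all freezes + 1 cycles,
    # and all n independent items have to ship.
    per_item = 1 - still_missing(p, freezes + 1)
    return per_item ** n


if __name__ == '__main__':
    # Work left at a 70% success rate after one, two and three cycles.
    print([round(still_missing(0.7, c), 4) for c in (1, 2, 3)])
    # Three features at 80%, without a freeze and with one freeze.
    print(round(ship_probability(0.8, 3, 0), 3), round(ship_probability(0.8, 3, 1), 3))
    # 100 locales at 80% with 0 to 5 freezes.
    for f in range(6):
        print(f, f'{ship_probability(0.8, 100, f):.1%}')
```
<\/pre>\n<p>Under this model, three items at p = 0.8 go from 51.2% without a freeze to 0.96<sup>3<\/sup>, about 88%, with a single freeze. That matches the 88% quoted above.<\/p>\n<p>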
You can see this in action in the <a href=\"http:\/\/pike.github.com\/release-scale\/dices\/\">dice demo<\/a>, which plays this through each time you &#8220;throw&#8221; the dice.<\/p>\n<p>Now on to my <strong>third revelation<\/strong> from looking at this data: features and blockers are just like localizations. When we moved to the rapid release cycle, starting with Firefox 5, we made two rules:<\/p>\n<ul>\n<li>Feature freeze and string freeze both happen on migration day from central to aurora<\/li>\n<li>Features that miss the freeze take the next train<\/li>\n<\/ul>\n<p>That worked fine for a while, but since then, mozilla has grown as an organization. We&#8217;ve also built up dependencies inside our organization that make us want particular features in particular releases. That&#8217;s actually a good situation to be in: it&#8217;s good that people care, and it&#8217;s good that we&#8217;re working on things that have organizational context.<\/p>\n<p>But this changed the risks in our release cycle. We started off with a single risk of exponential scale after the migration date: l10n. Today, we have features going into the cycle, plus the localizations of those features. At this point, having feature freeze and string freeze be the same thing becomes a risk for the release cycle at large. We should think about how to separate the two, so that we can mitigate each risk effectively and ship awesome, localized software.<\/p>\n<p>I learned quite a bit from looking at our risks, and I hope I was able to share some of that.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let me share some recent revelations I had. It all started with the infamous Berlin airport. Not the nice one in Tegel, but the BBI disaster. The one we thought we&#8217;d open last year, and now we don&#8217;t know in which year.
Part of the news coverage here in Germany was all about how they didn&#8217;t do [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,5],"tags":[23779,23778],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts\/504"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/comments?post=504"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts\/504\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/media?parent=504"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/categories?post=504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/tags?post=504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}