{"id":183,"date":"2011-05-17T18:31:59","date_gmt":"2011-05-18T01:31:59","guid":{"rendered":"http:\/\/blog.mozilla.org\/sfink\/?p=183"},"modified":"2021-06-13T14:08:56","modified_gmt":"2021-06-13T21:08:56","slug":"mozilla-central-automated-landing-proposal","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/sfink\/2011\/05\/17\/mozilla-central-automated-landing-proposal\/","title":{"rendered":"mozilla-central automated landing proposal"},"content":{"rendered":"<p>This was originally a post to the monster thread &#8220;Data and commit rules&#8221; on dev-planning, which descended from the even bigger thread &#8220;Proposing a tree rule change for mozilla-central&#8221;. But it&#8217;s really an independent proposal, implementable with or without the changes discussed in those threads. It is most like Ehsan&#8217;s <a href=\"http:\/\/ehsanakhgari.org\/blog\/2011-04-28\/assisted-landing-patches-mozilla-central\" target=\"_blank\" rel=\"noopener\">automated landing proposal<\/a> but takes a somewhat different approach.<\/p>\n<ul>\n<li>Create a mozilla-pending tree. All pushes are queued up here. Each gets  its own build, but no build starts until the preceding push&#8217;s build is  complete and successful (the tests don&#8217;t need to succeed, nor even  start.) Or maybe mostly complete, if we have some slow builds.<\/li>\n<li>Pushers have to watch their own results, though anyone can star on their behalf.<\/li>\n<li>Any failures are sent to the pusher, via firebot on IRC, email,  instant messaging, registered mail, carrier pigeon, trained rat, and  psychic medium (in extreme circumstances.)<\/li>\n<li>When starring, you have to explicitly say whether the result is  known-intermittent, questionable, or other. (Other means the push was  bad.)<\/li>\n<li>When any push &#8220;finishes&#8221; &#8212; all expected results have been seen &#8212; then  it is eligible to proceed. 
Meaning, if all results are green or starred known-intermittent, its patches are automatically pushed to mozilla-central.<\/li>\n<li>Any questionable result is automatically retried once, but no matter what the outcome of the new job is, all results still have to be starred as known-intermittent for the push to go to mozilla-central.<\/li>\n<li>Any bad results (build failures or results starred as failing) cause the push to be automatically backed out and all jobs for later pushes canceled. The push is evicted from the queue, all later pushes are requeued, and the process restarts at the top.<\/li>\n<li>When all results are in, a completion notification is sent to the pusher with the number of remaining unmarked failures.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/blog.mozilla.org\/sfink\/files\/2011\/05\/autoproposal.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-medium wp-image-188\" title=\"Flow diagram\" src=\"http:\/\/blog.mozilla.org\/sfink\/files\/2011\/05\/autoproposal-300x225.png\" alt=\"\" width=\"400\" height=\"300\" srcset=\"https:\/\/blog.mozilla.org\/sfink\/files\/2011\/05\/autoproposal-300x225.png 300w, https:\/\/blog.mozilla.org\/sfink\/files\/2011\/05\/autoproposal.png 800w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a>Silly 20-minute Gimped-up example:<\/p>\n<ol>\n<li>Good1 and Good2 are queued up, followed by a bad push Bad1.<\/li>\n<li>The builds trickle in. Good1 and Good2 both have a pair of intermittent oranges.<\/li>\n<li>The pusher, or someone, stars the intermittent oranges and Good1 and Good2 are pushed to mozilla-central.<\/li>\n<li>The oranges on Bad1 turn out to be real. They are starred as failures, and the push is rolled back.<\/li>\n<li>All builds for Good3 and Good4 are discarded. (Notice how they have fewer results in the 3rd line?)<\/li>\n<li>Good3 gets an unknown orange. 
The test is retriggered.<\/li>\n<li>Bad1 gets fixed and pushed back onto the queue.<\/li>\n<li>Good3&#8217;s orange turns out to be intermittent, so it is starred. That is the trigger for landing it on mozilla-central (assuming all jobs are done).<\/li>\n<\/ol>\n<p>To deal with needs-clobber, you can set that as a flag on a push when queueing it up. (Possibly on your second try, when you discover that it needs it.)<\/p>\n<p>mozilla-central doesn&#8217;t actually need to do builds, since it only gets exact tree versions that have already passed through a full cycle.<\/p>\n<p>On a perf regression, you have to queue up a backout through the same mechanism, and your life kinda sucks for a while, and you&#8217;ll probably have to be very friendly with the Try server.<\/p>\n<p>Project branch merges go through the same pipeline. I&#8217;d be tempted to allow them to jump the queue.<\/p>\n<p>You would normally pull from mozilla-pending only to queue up landings. For development, you&#8217;d pull mozilla-central.<\/p>\n<p>Alternatively, mozilla-central could pull directly from the relevant changeset on mozilla-pending, meaning it would get all of the backouts in its history. But then you could use mozilla-pending directly. (You&#8217;d be at the mercy of pending failures, which would cause you to rebase on top of the resulting backouts. But that&#8217;s not substantially different from the alternative, where you have perf regression-triggered backouts and other people&#8217;s changes to contend with.) Upon further reflection, I think I like this better than making mozilla-central&#8217;s history artificially clean.<\/p>\n<p>The major danger I see here is that the queue can grow arbitrarily. But you have a collective incentive for everyone in the queue to scrutinize the failures up at the front of the queue, so the length should be self-limiting even if people aren&#8217;t watching their own pushes very well. 
(Which gets harder to do in this model, since you never know when your turn will come up, and you&#8217;re guaranteed to have to wait a <em><strong>whole<\/strong><\/em> build cycle.)<\/p>\n<p>You&#8217;d probably also want a way to step out of the queue when you discover a problem yourself.<\/p>\n<p>Did I just recreate Ehsan&#8217;s long-term proposal? No. For one, this one doesn&#8217;t depend on fixing the intermittent orange problem first, though it does gain from it. (More good pushes go through without waiting on human intervention.)<\/p>\n<p>For another, Ehsan&#8217;s proposal is sort of like a separate channel into mozilla-central, using the Try server and automated merges to detect bitrot. This proposal relies on being the only path to mozilla-central, so there&#8217;s no opportunity for bitrot.<\/p>\n<p>What&#8217;s the justification for this? Well, if you play fast and loose with assumptions, it&#8217;s the optimal algorithm for landing a collection of unproven changes. If all changes are good, you trivially get almost the best pipelining of tests (the best would be spawning builds immediately). With a bad change, you have to assume that all results after that point are useless, so you have no new information to use to decide between the remaining changes. There are faster algorithms that would try appending pushes in parallel, but they get more complicated and burn way more infrastructural resources. 
(Having two mozilla-pendings that merge into one mozilla-mergedpending before feeding into mozilla-central might be vaguely reasonable, but that&#8217;s already more than my brain can encompass and would probably make perf regressions suck too hard&#8230;)<\/p>\n<p>Side question: how many non-intermittent failures happen on Windows PGO builds that would not happen on (faster) Windows non-PGO builds?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This was originally a post to the monster thread &#8220;Data and commit rules&#8221; on dev-planning, which descended from the even bigger thread &#8220;Proposing a tree rule change for mozilla-central&#8221;. But it&#8217;s really an independent proposal, implementable with or without the changes discussed in those threads. It is most like Ehsan&#8217;s automated landing proposal but takes [&hellip;]<\/p>\n","protected":false},"author":206,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[228,666,137],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/posts\/183"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/users\/206"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/comments?post=183"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/posts\/183\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/media?parent=183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/v2\/categories?post=183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/sfink\/wp-json\/wp\/
v2\/tags?post=183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}