What happened? Last week (9 May 2012 23:24 PDT until the following morning at 07:20), BrowserID had an outage affecting 50% of login requests. Embarrassingly, we didn’t find out until a user filed bug 753728 with us. Root Cause: To put it bluntly, human error. Specifically, the accidental draining of all nodes of a load [...]
Before ‘The Sync Platform Meltdown, Explained’ was posted, I gave a “brown bag” presentation at Mozilla HQ explaining what happened. Here is the recorded stream, if you’d prefer that to reading the long post about it: -jv
TL;DR: Recently, the Firefox Sync platform suffered serious performance degradation. We’d like to explain what happened, as well as the steps taken to deal with the issue from the perspective of Mozilla Services Operations (“Ops”). The Sync platform melted down after the release of Firefox 7… …to put it bluntly. This was the worst outage [...]
TL;DR: Paste your most recent about:sync-log into mozservices.pastebin.mozilla.org and send a link to @mozservices for us to check out. Sometimes when there’s a problem with Firefox Sync, Mozilla Services Operations (@mozservices) needs some info to figure out the root cause. We strive to make sure Sync is an always-available service, but as with any [...]
Summary: Since the Python Sync deployment, we’ve been receiving user reports of issues with clearing Sync data in Account Portal. After a couple of weeks of investigation, we determined the root cause this morning (bug 670975) and will be pushing a hotfix at 3pm PDT today (bug 670993). The Problem: While reviewing logs a few days [...]