Wednesday’s Firefox release was the first since AMO 3.0 (Remora) launched in late March. Traffic to Mozilla websites is expected to increase following a release, but usually only to about 1.5 times normal. On Thursday, traffic to our San Jose facility reached three times the normal level, breaking 600 Mb/s. (The historical graph below is averaged down and doesn’t show a peak that high, but Justin will be giving more details from IT’s point of view soon.)
When Firefox is updated, a separate update check is performed for each installed add-on, which sends AMO quite a bit of traffic. On Thursday, however, much of the traffic AMO saw did not come from update checks; it came from real people searching, downloading, and browsing the site.
We had more than 2,000 user accounts created on Thursday alone, over three times as many as on a normal day. One out of every 17 people who saw the What’s New page after updating Firefox clicked on the “Firefox Add-ons” link. Both sessions and pageviews on addons.mozilla.org tripled on Thursday, not including update and blocklist pings. The number of add-on downloads more than tripled compared with two days earlier.
While all of this increased activity is great news for us, it wasn’t so great for our app cluster. We had a number of issues throughout the day, ranging from memcache hitting connection limits and refusing connections, to database server issues, to app servers dying in a domino effect. A huge thanks to IT for keeping the issues under control the whole day, especially mrz, who was on call and did not get to sleep.
On Thursday we were able to make a number of changes and quickly push them to production: directing search to the shadow/read-only database server, adding to the list of areas of the site we can disable if necessary, and adding more memcache servers. We’ll be evaluating what else we can do to prepare in case this happens again.
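For readers curious what two of those mitigations look like in practice, here is a minimal sketch in Python. It is not AMO’s actual code; every name in it (`SiteConfig`, `pick_database`, `is_enabled`, the database labels, the area names) is hypothetical and only illustrates the general idea of sending read-only search queries to a shadow database and keeping a list of site areas that can be switched off under load.

```python
from dataclasses import dataclass, field


@dataclass
class SiteConfig:
    # Hypothetical labels; the real server names and settings are not
    # described in this post.
    primary_db: str = "primary"
    shadow_db: str = "shadow-readonly"
    # Areas of the site that operators have switched off.
    disabled_areas: set = field(default_factory=set)


def pick_database(config: SiteConfig, area: str, writes: bool) -> str:
    """Choose a database for a request.

    Anything that writes must go to the primary. Read-only search
    requests are sent to the shadow server to take load off the primary.
    """
    if writes:
        return config.primary_db
    if area == "search":
        return config.shadow_db
    return config.primary_db


def is_enabled(config: SiteConfig, area: str) -> bool:
    """Return True unless the area has been disabled by operators."""
    return area not in config.disabled_areas


config = SiteConfig()
search_db = pick_database(config, "search", writes=False)
review_db = pick_database(config, "reviews", writes=True)

# Under heavy load, an operator could disable a non-essential area
# (the area name here is made up).
config.disabled_areas.add("recommendations")
recommendations_on = is_enabled(config, "recommendations")
search_on = is_enabled(config, "search")
```

The point of keeping both decisions in configuration is that they can be changed quickly during an incident without touching the code for each page.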