Mozilla Network Outage Report – 08/09/2009, 1:30pm PDT – 9:30pm PDT

mrz

At around 1:30pm PDT on Sunday afternoon, Mozilla’s primary data center in San Jose went offline.

CoreSite, Mozilla’s San Jose data center provider, reported the failure of a CRAC (computer room air conditioning) unit on the 16th floor. This unit provided the majority of the cooling for Mozilla’s cage. Without it, the ambient air temperature quickly rose to about 120° F.

To prevent thermal damage, the servers automatically shut themselves down.

Mozilla IT was onsite by 2:50pm PDT. By 7pm CoreSite had brought the ambient air temperature under 90° F. Mozilla IT had the majority of the infrastructure back online by 8:30pm PDT and declared everything “all clear” by 9:30pm PDT.

We apologize for any inconvenience this may have caused. We are working with CoreSite to better understand the points of failure and how they will prevent a recurrence.

2 responses

  1. Daniel Einspanjer wrote:

    My father-in-law had a few words of sage advice from way back in the halcyon years of big hardware. He said that in their machine room, they ran three CRAC units on a rotation of one week on, two weeks off. That way, even if one unit failed, the remaining two could alternate while the failed unit was repaired, and there was very little chance of an outage that could cause a massive failure like this.
    Seems like maybe this would be a useful requirement to look for in your future needs since this is the second major HVAC failure in less than a year. Is CoreSite the same company as CRG?

  2. Pingback from Add-on Statistics Issues « Mozilla Add-ons Blog:

    [...] a result of the outage at Mozilla’s primary data center, add-on statistics available to developers have not been [...]