Mozilla Outage Report – mozilla.org DNSSEC – 09/16/2010

For several hours this morning, mozilla.org failed to resolve for users whose DNS resolvers perform DNSSEC validation.

This appears to have affected only early DNSSEC adopters, not the wider Internet.

Background:
As part of Thursday (September 16th) night’s scheduled maintenance, we had planned to upgrade Mozilla’s nameservers and enable DNSSEC for mozilla.org.

Despite weeks of internal testing, we botched the deployment. The steps should have been, in this order:

1. Roll out the signed zones
2. Update the DS records (published in the .org zone via our registrar)
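Step 2 is only safe once step 1 is finished on every authoritative nameserver. Below is a minimal sketch of a pre-flight check that enforces that ordering before anything goes to the registrar. The function name, the nameserver names and the TTL-based waiting period are our own assumptions for illustration; they are not Mozilla's actual tooling or procedure.

```python
import time


def ready_for_ds(nameservers: dict[str, bool], signed_at: float,
                 max_ttl: int, now: float) -> tuple[bool, str]:
    """Hypothetical pre-flight check before asking the registrar to publish DS.

    nameservers: authoritative server name -> True if it already serves the
                 signed zone (DNSKEY and RRSIG records).
    signed_at:   epoch seconds when the last server began serving the signed zone.
    max_ttl:     largest TTL in the zone, used here as an assumed, conservative
                 propagation margin.
    now:         current epoch seconds.
    """
    # Step 1 must be finished everywhere: resolvers that query a single
    # unsigned server will see a DS with no matching DNSKEY.
    unsigned = sorted(ns for ns, signed in nameservers.items() if not signed)
    if unsigned:
        return False, "step 1 incomplete, still unsigned: " + ", ".join(unsigned)
    # Give caches time to drop answers fetched before the zone was signed.
    remaining = signed_at + max_ttl - now
    if remaining > 0:
        return False, f"wait {int(remaining)}s before submitting DS"
    return True, "step 2: safe to submit DS records to the registrar"


if __name__ == "__main__":
    # Hypothetical server names; one has not picked up the signed zone yet.
    servers = {"ns1.example.net": True, "ns2.example.net": False}
    now = time.time()
    # prints (False, 'step 1 incomplete, still unsigned: ns2.example.net')
    print(ready_for_ds(servers, signed_at=now, max_ttl=3600, now=now))
```

The check deliberately blocks on the slowest nameserver rather than on a majority, since a validating resolver may query any of them.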

Unfortunately, we did these in reverse order and got the timing wrong. Mozilla’s registrar pushed out our DS records before our scheduled rollout of the signed mozilla.org zone, which forced us into an earlier deployment.

We estimate that this outage lasted about 4 hours. By about 2:45am Pacific time, we had completed upgrading Mozilla’s nameservers and enabling DNSSEC.

Because DNSSEC isn’t widely deployed, and validation is even less widely enabled on the client side, we suspect only a small percentage of users noticed the outage. This is, of course, still unacceptable and was entirely avoidable, and it should serve as a cautionary tale for anyone planning to deploy DNSSEC.

We apologize for any inconvenience this may have caused.

A little technical background for those interested:

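The .org nameservers were serving a DS record for mozilla.org, but mozilla.org’s own nameservers had no matching DNSKEY (since we hadn’t yet pushed out the signed zones). A validating resolver that finds a DS record in the parent zone expects the child zone to be signed; when no DNSKEY matches that DS, it treats the answers as bogus and returns SERVFAIL instead of an address. This is why validating resolvers failed.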
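For readers who want to see the mechanics, here is a simplified model of the check a validating resolver performs at the .org / mozilla.org delegation. It follows the DS digest construction from RFC 4034 (digest type 2: SHA-256 over the canonical owner name followed by the DNSKEY RDATA), but it is only a sketch: real DS records also carry a key tag, algorithm and digest type, and real resolvers verify RRSIGs too. The key material is a dummy placeholder, not Mozilla’s key, and the function names are our own.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        if label:
            wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha256(owner: str, rdata: bytes) -> bytes:
    """DS digest type 2 (SHA-256) over the owner name followed by the DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).digest()


def validate_delegation(owner: str, parent_ds: list, child_dnskeys: list) -> str:
    """Simplified validating-resolver check at a zone cut.

    parent_ds:     DS digests the (signed) parent zone serves for `owner`.
    child_dnskeys: DNSKEY RDATA returned by the child's own nameservers.
    """
    if not parent_ds:
        # The signed parent proves there is no DS: the child is treated as
        # unsigned ("insecure") and its answers are used as-is.
        return "insecure"
    for rdata in child_dnskeys:
        if ds_digest_sha256(owner, rdata) in parent_ds:
            return "secure"  # simplified: signatures are not checked here
    # A DS exists but no DNSKEY matches it: the chain of trust is broken,
    # the answer is "bogus", and the resolver returns SERVFAIL.
    return "bogus"


if __name__ == "__main__":
    owner = "mozilla.org"
    # Dummy KSK (flags 257, protocol 3, algorithm 8 = RSA/SHA-256), illustration only.
    ksk = dnskey_rdata(257, 3, 8, b"\x03\x01\x00\x01" + bytes(64))
    ds = [ds_digest_sha256(owner, ksk)]
    print(validate_delegation(owner, ds, []))     # DS published before signing: bogus
    print(validate_delegation(owner, ds, [ksk]))  # signed zone live, DS published: secure
    print(validate_delegation(owner, [], [ksk]))  # signed zone live, no DS yet: insecure
```

The third case is why the correct order is harmless: a signed zone without a DS in the parent simply resolves as unsigned, while a DS without a signed zone breaks resolution outright.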