Emergency Mozilla Downtime – 11/12/2008, 6pm – 11pm PST (0200 – 0700 11/13/2008 UTC)

mrz

(Apologies for the short notice. We generally want to stick to our normal advertised maintenance windows, but sometimes technology works against us.)

We will have an emergency maintenance window tonight from 6:00pm to 11:00pm PST.

The following work will take place:

  • 6:00pm PST (0200 UTC): EqualLogic storage array firmware upgrade. See bug 464458 for details. We’ll be locking the tree to all check-ins at 5:00pm PST tonight to start the maintenance window and will unlock it once the machines have come back up and gone green for a cycle.

Last night one of our network-attached storage devices experienced an error and didn’t fail over properly. As a result, Tinderbox wasn’t getting proper data for updates and we couldn’t tell what was happening with our builds, so we had to close the tree. While we were able to restore that functionality, there is a high chance that any of the VMs attached to this storage device will experience further problems.

To fix this, we’ll need to pause all of our VM systems (most notably build machines and unit test machines, but also litmus and some others) for a few hours. The affected systems are:

fxdbug-win32-tbox
fx-win32-1.9-slave2
moz2-linux-slave1
patrocles
fx-linux-1.9-slave09
try-master
try-unit-linux-01
xr-linux-tbox
egg
fxdbug-linux-tbox
fx-linux-1.9-slave2
moz2-linux64-slave01
l10n-linux-tbox
prometheus-vm
tb-linux-tbox
try1-win32-slave
try-unit-linux-02
bm-l10n-win2k3-01
dm-bugs-test-app01
dm-graphs-stage01
dm-litmus02
dm-mailman01
fx-win32-1.9-slave08
fx-win32-1.9-slave09
moz2-linux-slave04
moz2-win32-slave04
pm-app01
sand
sm-summit01
moz2-linux-slave09
moz2-linux-slave10
moz2-linux-slave11
moz2-linux-slave12
moz2-win32-slave09
moz2-win32-slave10
moz2-win32-slave11
moz2-win32-slave12
moz2-win32-slave13
test-linslave
test-mgmt
moz2-linux-experimental1
moz2-linux-slave01
moz2-linux-slave13
moz2-linux-slave14
moz2-linux-slave15
moz2-linux-slave16
moz2-win32-slave14
moz2-win32-slave15
moz2-win32-slave16
moz2-win32-slave17
moz2-win32-slave18
try-unit-win32-01
test-winslave
try-linux-slave03
try-win32-slave03
bm-buildgraph01
bm-symbolfetch01
dm-ausstage01
dm-chat01
geodns01.sj
dm-graphs01
dm-webtools01
im-bes01
mrapp-intranet01
mrapp-stage01
pm-ns03
production-1.9-master
qm-evtest01
tm-amo01-webdev01

After discussing this with the sheriff, product leads and release engineering team, we’ve decided it would be best to do this as soon as possible. We’ll be locking the tree to all check-ins at 5:00pm PST tonight to start the maintenance window, and we’ll unlock it once the machines have come back up and gone green for a cycle. Vlad will try to shepherd through the remaining patches for beta 2 blockers that are in the check-in queue before 5:00pm; at that point we’ll stop all work and pick it up again tomorrow morning.

If you have any objections, let us know as soon as possible.