Mozilla DB News, Fri 26 October: puppetizing backups

Sheeri

This week was a great catch-up week. There is only one conference I am doing for the rest of the year, CodeConnexx, though submissions for conferences in the first quarter of 2013 are happening, too. We have some great candidates we are interviewing for our open Database Administrator position.

The database team has gotten a lot of great stuff done this week (I know, I say that all the time, but it’s true!):

  • Audited and got rid of legacy MyISAM tables in our Addons database.
  • Decommissioned an old MySQL cluster that has not been in use for a while. It is the database that used to back the predecessor to Mozillians.
  • Moved half of our backups in one data center to another machine in the same data center, because they were on a NetApp that is having problems with overload. Our Storage Team is talking to NetApp, but for now we alleviated some of the problems by moving the backups to another head. We also started the process of getting hardware allocated so our backups aren’t using NFS.
  • We took the move as an opportunity to puppetize the backup servers. Now all the backup scripts and backup instances are puppetized, with just a few more challenging items remaining: config files and startup scripts for each backup instance.
  • We have enabled a Nagios check for pt-config-diff so that we will be alerted (by e-mail only) when a running configuration on MySQL is different from the configuration file.
  • Fixed automatic slow query log copying from Addons to a machine our web developers use.
  • The entire IT team is working on documenting our Nagios checks – specifically, what to do for each Nagios check so our on-call folks can handle more problems before we have to be called in. We have documented 6 checks so far.
  • Fixed a fascinating problem in which ulimits we put in place were not being read by Percona when we upgraded from MySQL 5.1 to Percona 5.1. (I have to blog about this, actually, with all the details.)
  • We upgraded the kernel for 3 different Addons database servers due to a crashing bug.
  • Finished work on one of our multi-use staging clusters – upgrading and converting to innodb_file_per_table.
  • Reduced the innodb_buffer_pool_size on one of our multi-use staging clusters to reduce swapping and the CPU load that comes with it.
  • Loaded in missing data due to a failed cron job on our Crash Stats cluster.
  • Deleted some spam comments in Bugzilla.
  • Created two new read/write accounts for the development database cluster for Mozillians.
  • Moved our soon-to-be-deprecated cacti database off an SSH jump host, which means the jump host no longer has a MySQL installation on it.
  • Ran a query to figure out how many Bugzilla e-mail addresses have + or - in them as a percentage of total e-mail addresses.
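
For the curious, the arithmetic behind that last item is simple. The query we actually ran is not shown here; the following is only a minimal Python sketch of the same calculation, assuming the addresses have already been pulled out into a list. The function name and the sample addresses are made up for illustration.

```python
def percent_with_plus_or_minus(addresses):
    """Return the percentage of addresses that contain '+' or '-' anywhere."""
    total = len(addresses)
    if total == 0:
        # Avoid dividing by zero when there are no addresses at all.
        return 0.0
    matching = sum(1 for addr in addresses if "+" in addr or "-" in addr)
    return 100.0 * matching / total


# Hypothetical sample data, not real Bugzilla accounts.
sample = [
    "alice@example.com",
    "bob+bugs@example.com",
    "carol-ann@example.com",
    "dave@example.org",
]

print(f"{percent_with_plus_or_minus(sample):.1f}%")  # prints "50.0%"
```

Note that this counts a character anywhere in the address, so a hyphen in the domain name also matches. Restricting the check to the part before the @ would be a small change if that is what you want to measure.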

Next week is Halloween! Are you ready?