Mozilla’s self-assessment is pretty good – it asks two questions: what your accomplishments were, and what your challenges were. This is pretty much what most self-assessments boil down to. Corey Shields is an excellent boss, and Matthew Zeier is a great director – specifically because they are tuned into what my needs are. This self-assessment is not for them; they already know what my accomplishments and challenges are.
Rather, the self-assessment is for higher-ups and People (what Mozilla calls HR). For the People/HR audience, the questions of “what are your accomplishments and challenges?” are really asking:
Why should we give you that raise your boss wants to give you? (a.k.a. How are you kicking ass and taking names?)
What do you need to do your job better, that we can give you?
Since I do blog posts about what I do on the database team (or at least, I have since mid-February), it was not difficult to pare that down into accomplishments. However, given the audience, my goal for the first question was to be able to explain that, at Mozilla, databases are in everything – public and private faces of Mozilla, from blogs to wikis to our e-mail system to how we submit vacation and sick time.
My goal for the second question was to express the need for more staff.
That all being said, here is the self-assessment I submitted:
Describe each of your key accomplishments for the past year. How have these key accomplishments impacted the Project?
These key accomplishments have led to greater stability and availability of the MySQL databases at Mozilla. There are 24 MySQL database clusters, with each cluster having at least 2 database machines, which serve hundreds of gigabytes of data for dozens of website properties that everyone in the Mozilla community depends on – from public sites like all the Mozilla blogs, addons.mozilla.org, support.mozilla.org and mozillians.org, to corporate tools like Zimbra e-mail, submitting PTO and all the Mozilla wikis, to group-specific tools like etherpad, pastebin and the systems inventory database.
Coordinated and performed the MySQL moves to the scl3 data center, including ensuring key systems were backed up, ensuring all systems could fail over if needed, and making sure documentation was up-to-date. This involved configuring new backup servers. Also helped out in the data center itself, assembling disks and racking hardware.
Started to verify data integrity in our database clusters.
Worked on making our database configurations more consistent by using puppet (configuration management) for configuration changes. Expanded an existing module to be more comprehensive, managing key variables such as the default storage engine, default character set, log expiration, read-only mode, and using a data file per table (innodb_file_per_table) – having the wrong values for these variables has been a pain point, so managing them through puppet eliminates that pain point.
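To give a sense of what that covers: the variables above map to a handful of lines in my.cnf that puppet can template out. The specific values below are illustrative assumptions on my part, not our actual settings:

```ini
# Sketch of a puppet-templated my.cnf fragment (values are examples only)
[mysqld]
default-storage-engine = InnoDB   # consistent default engine
character-set-server   = utf8     # consistent default character set
expire_logs_days       = 7        # binary log expiration
read_only              = 1        # set on slaves only
innodb_file_per_table  = 1        # one data file per InnoDB table
```

Getting any one of these wrong on a single server (say, a slave that isn’t read-only, or a server still writing everything to the shared ibdata file) is exactly the kind of inconsistency that configuration management catches.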
Added monitoring checks to our alerting systems to prevent problems or catch problems as they occur.
Created new databases and authentications for the DevOps team.
Migrated the buildbot database to a new server, splitting it off from the shared server it used to be on. This means the build tree is more reliable, because fewer services interfere with the proper working of the database.
Started to upgrade out-of-date database software (MySQL to 5.5, or to 5.1 as a stepping stone to 5.5).
Tested and selected a load balancer replacement for the MySQL databases, which can also scale MySQL in the future as we need it to.
Restored databases when necessary, whether asked for by developers or when building new slaves.
Worked with Shyam and Corey on the GTFTS project, which has resulted in higher productivity and morale.
Fixed critical data integrity issues when they occurred (most visibly with Bugzilla, but it occurred on other systems from time to time).
Started to learn a bit about the Postgres database, which is behind socorro (crash-stats).
Provided a few data reports where required, such as non-vouched non-employee mozillians.
Regularly blogged about database activities (since 17 Feb 2012), increasing the profile of the database team and its successes.
Presented at many conferences as a DBA for Mozilla, sharing expertise with other database administrators and system administrators. This includes showing and promoting B2G on a device at the Percona Live: MySQL Conference & Expo.
Where were your challenge areas this year? What areas of development do you identify as a result of these challenge areas?
The biggest challenge is that the database team is vastly understaffed, which makes it difficult to get to all the tasks that need to be done. While it is wonderful that everyone at Mozilla understands our time is limited, and that they’re not always the highest priority, there is a lot I would like to be able to do that I just cannot.
Many of our database machines are out-of-date, and I would like to upgrade all the servers to the most recent version of MySQL (which has been out for 18 months at this point). We are 1-2 major revisions out of date on almost all our database clusters.
Currently our developers get no support for writing queries. I’d like to get the day-to-day administration taken care of as much as possible (via configuration management, for example) to be able to support the developers in writing good queries. This will also help eliminate problems caused by suboptimal queries before they occur. For software that is shared with the world, such as Bugzilla, this will result in more positive user experiences with our software.
I’d also like to be able to support developers by being able to compile and send reports about slow queries.
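As a sketch of what such a report could look like, here is a minimal aggregator over MySQL’s slow query log format. This is my own illustrative sketch, not an existing tool of ours – in practice something like pt-query-digest normalizes queries far more thoroughly:

```python
import re
from collections import defaultdict

# Hypothetical sketch: aggregate a MySQL slow query log into a per-query
# report of (count, total query time). Assumes the standard text format
# of the slow log; multi-line queries are not handled.

QUERY_TIME_RE = re.compile(r"# Query_time: ([\d.]+)")

def summarize_slow_log(lines):
    """Return {normalized_query: (count, total_time)} from slow-log lines."""
    stats = defaultdict(lambda: [0, 0.0])
    current_time = None
    for line in lines:
        m = QUERY_TIME_RE.match(line)
        if m:
            current_time = float(m.group(1))
        elif line.startswith("#") or line.startswith("SET timestamp"):
            continue  # skip other comment headers and timestamp statements
        elif current_time is not None and line.strip():
            # Crude normalization: collapse numeric literals so similar
            # queries group together in the report.
            normalized = re.sub(r"\d+", "N", line.strip())
            stats[normalized][0] += 1
            stats[normalized][1] += current_time
            current_time = None
    return {q: (count, total) for q, (count, total) in stats.items()}
```

Even something this simple would let a developer see at a glance which query shape is eating the most total time.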
I’d like to increase the graphing we do for MySQL, so we can better identify problems before they happen (cacti templates)
Configuration consistency is still a problem, and I’d like to depend more on configuration control for more aspects of MySQL, including the setup and scripts of new backup servers (we have 3 now, so we don’t have new ones very frequently).
We do not regularly verify that the data on the slaves (including backup slaves) matches that of the master. I’d like to get the time to put a system in place to do that on all our database clusters (starting with key clusters first).
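The usual approach here – in the spirit of tools like pt-table-checksum – is to checksum rows in primary-key chunks on each server and compare the digests, so only divergent chunks need a closer look. A minimal sketch, with plain lists of row tuples standing in for actual query results:

```python
import hashlib

# Illustrative sketch only: chunked row checksums for master/slave
# comparison. Real verification would run checksum queries on each
# server over primary-key ranges rather than pulling rows client-side.

def chunk_checksums(rows, chunk_size=1000):
    """Return a list of hex digests, one per chunk of rows."""
    digests = []
    for start in range(0, len(rows), chunk_size):
        h = hashlib.sha1()
        for row in rows[start:start + chunk_size]:
            h.update(repr(row).encode("utf-8"))
        digests.append(h.hexdigest())
    return digests

def find_divergent_chunks(master_rows, slave_rows, chunk_size=1000):
    """Return indices of chunks whose checksums differ between servers."""
    master = chunk_checksums(master_rows, chunk_size)
    slave = chunk_checksums(slave_rows, chunk_size)
    length = max(len(master), len(slave))
    return [i for i in range(length)
            if i >= len(master) or i >= len(slave) or master[i] != slave[i]]
```

Chunking matters because it bounds how much data has to be re-examined when a mismatch is found, and it keeps each checksum pass from locking a whole table.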
There are many alerts I get that I do not have time to investigate – for example, long-running queries that are killed regularly. These should be looked at (that’s why we get alerts), but they are not the highest priority, so they often get overlooked until they are on fire. I’d like to have the time to make sure there are no outstanding alerts (from cron to infra-dbnotices) – either by paring down alerts that are noise, or by fixing the problems behind alerts that actually need fixing.
There are many legacy databases and users on several systems that I would like to archive and remove. Unused users are a security concern and unused databases take up disk space and make backups take longer and use more disk space.
I’d like to create more checklists for database work, to spread knowledge among the DBA team and the systems team, in case I’m not available.
I think I have met my goals: informing People/HR how important my job is, that I am doing it well, and that we need more staff. When doing these types of assessments, it helps me to figure out what I want to say to my audience, to get what I need to do a better job, and to explain the importance of what I do.