Here at Mozilla, we have grown to thousands of servers in a short period of time (and even more individual instances when you count virtual machines). Like most other organizations, we have to rely on tools that help the sysadmins keep their sanity at such scale. We have picked Puppet as our tool of choice, and are still in the process of migrating older systems into our centralized management while making sure that future servers are built out “in puppet” first before they reach production. For those of you unfamiliar with Puppet and similar management tools (like CFEngine or Chef), the idea is simple: you define the way a server should be configured, and Puppet makes sure that it is always set up that way. For instance, in not-so-simple terms, you tell it that Apache should be installed with certain vhost configurations and that it needs to be restarted after those configurations are put into place. Puppet makes that happen. Multiply that action by hundreds of webservers and Puppet shows its value.
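To make the Apache example a bit more concrete, here is a minimal sketch of what such a manifest might look like. It is not taken from our actual modules: the package and service name `httpd` (Red Hat-style naming), the vhost file path, and the `apache` module that serves the file are all assumptions made for illustration.

```puppet
# Illustrative sketch only; names and paths are hypothetical.

# Make sure Apache is installed (assumes the package is called 'httpd').
package { 'httpd':
  ensure => installed,
}

# Put a vhost configuration in place. The source path assumes a
# hypothetical 'apache' module that ships this file.
file { '/etc/httpd/conf.d/example.conf':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  source  => 'puppet:///modules/apache/vhosts/example.conf',
  require => Package['httpd'],   # install the package before writing the file
  notify  => Service['httpd'],   # restart Apache whenever this file changes
}

# Keep Apache running and enabled at boot.
service { 'httpd':
  ensure => running,
  enable => true,
}
```

The `notify` relationship is what covers the "needs to be restarted" part: Puppet only restarts the service when the vhost file actually changes, so repeated runs on a server that is already correct do nothing.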
Last week, most of the IT team spent a few days locked in the Holodeck conference room for training offered up by Puppet Labs. Between IT and Release Engineering we filled the room with 25 people.
So what now? We still have the daunting task of integrating our existing infrastructure and old servers into Puppet. This kind of work is easy to put on the back burner while other projects and bugs are at hand, but it pays off later. Some day those servers will need to be retired, will outgrow their current hardware capacity, or will need to be moved to a new data center. Being able to define a server's state with Puppet makes all of these tasks a lot easier. More work up front means less work in the long run.
The training brought unfamiliar admins up to speed with Puppet and gave us all new ideas for refactoring some of our current modules and manifests. We’ll be working hard throughout the coming months to make those changes. The end goal is simple: everything in our infrastructure will be managed by Puppet. Email, webservers, collaboration tools, development resources. If it is on our network, it needs to be centrally managed. We are making great progress toward this goal, and it will pay off well into the future of Mozilla’s IT infrastructure.
That’s all for this week. Next week: we resolve a bug!