Last week, Baron wrote a great post entitled “What’s the benefit of the cloud?” The post was short and made the point that “the benefit of the cloud” is “less ops, more dev.” But Baron is coming from the point of view of a developer, and from the point of view of an ops person, there is not necessarily “less ops”.
Some commenters made points along the lines of, “you can just rent rack space in some datacenter for that.” And I agree. There are some ops benefits that Amazon adds, such as easier monitoring and backups, but for the most part, there is not *less* work from an operations standpoint when you are in a cloud environment – my time doing remote DBA work at Pythian and PalominoDB certainly taught me that!
There are still operating systems to install, maintain and upgrade. There are still compatibility issues, and software and configurations still have to be upgraded and maintained. There are managed services like Amazon’s RDS (which provides MySQL as a service), and the benefit there is not having to worry about configuration or upgrades. There are Amazon machine images (AMIs) that folks share, so that the initial operating system installation requires little knowledge.
The cloud is really useful if you need a machine up and running very quickly. I totally understand that developers want to use the cloud instead of waiting for a machine to arrive, and even in IT it is useful to have another machine for a while. For example, if you want to test MySQL 5.6 but do not have a spare machine, you can spin up an instance in the cloud.
From a production or staging perspective though, there is still a LOT of work to be done to architect a system. The ops benefit of the cloud is NOT “less ops”. The ops benefit of the cloud is actually thanks to how Amazon built its cloud – it was built as a cloud computing platform. The Amazon cloud was built to provide extra CPU cycles (the “Elastic Compute Cloud”, or EC2). In the days before persistent storage with the “Elastic Block Store” (EBS), many developers and system administrators lost time when an instance would reboot and all their work was gone – not just the development work, but their setup work – the operating system users, any software packages they had installed, etc. Running any important system in the cloud, these days, means having some kind of installation and configuration management in place, so that if an instance reboots or if another instance is needed, the rebooted/new instance can be brought back to a working state as quickly as possible.
Some folks get their environment set up how they want it, and take a snapshot that can be used as an Amazon machine image (AMI). This works great, until you need to update any software or make any changes in configuration, whether it’s operating system configuration (like adding a user), or software configuration (like a my.cnf file).
The benefit of the cloud from an ops side is that it forces us to do what we should be doing anyway – running installation and configuration management. At Mozilla, we have a plan to move some services to “the cloud”, but we are already using kickstart for installation management and Puppet for configuration management, so we are already set with those benefits.
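To make “configuration management” a little more concrete, here is a minimal Python sketch of the core idea: describe the state you want, and only change the machine when it differs from that. This is not how Puppet is implemented, just an illustration. The function name `ensure_file` and the my.cnf contents are made up for the example.

```python
import os
import tempfile


def ensure_file(path: str, desired: str) -> bool:
    """Make the file at `path` contain exactly `desired`.

    Returns True if the file had to be changed, False if it was already correct.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == desired:
                return False  # already in the desired state, nothing to do
    except FileNotFoundError:
        pass  # file is missing, so it has to be created

    # Write to a temporary file in the same directory, then rename it into
    # place, so nothing ever reads a half-written config file.
    # Note: mkstemp creates the file readable by its owner only; a real tool
    # would also manage ownership and permissions.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(desired)
    os.replace(tmp_path, path)
    return True


# Hypothetical desired state for a MySQL config file (illustrative values only).
DESIRED_MY_CNF = "[mysqld]\nmax_connections = 500\n"
```

Running `ensure_file("/etc/my.cnf", DESIRED_MY_CNF)` on a brand-new instance creates the file; running it again on an instance that is already configured changes nothing. That repeatability is what a snapshot alone does not give you once the configuration needs to change.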