The Cloud Responsibility Crisis

If you develop and deploy software the way I do, you've fallen in love with cloud hosting. It's just so easy. Need to test something in a safe sandbox? Spin up a new instance. Need to scale? Spin up a new instance, or maybe resize an existing instance. If you're dealing with actual hardware, all of those can be thorny, but with cloud hosting, it's magically simple.

If there's one thing I learned from Spider-Man, it's that with great power comes great responsibility. (Secondary lesson: everybody loves a guy in tights.) Cloud hosting delivers on the great power, but it's not helping us with the great responsibility.

Let me explain what I mean by responsibility by pointing to a typical cloud hosting scenario for a growing enterprise. This company has a few developers and a few VMs. From time to time, a developer needs to spin up one or more new instances to help with particularly complex computation; hey, no problem, that's like $1 an hour! This company backs up the important stuff, maybe partial or whole images, and shoves it all onto S3 or Rackspace Files. Not much is monitored or analyzed, but in the short term, this is all just fine.

The problem arises when the company continues to operate this way. Even if it's a small technical team, they'll quickly reach a point where no one knows what all these VMs are for. No one will know which backup belongs to which instance, or whether these backups even work. The company will see a gigantic bill and a ton of resources, and not know which of them are waste and which are actually powering the business.
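To make that concrete, here's a rough sketch of the kind of audit that would surface the problem. Everything in it is an assumption for illustration: the `Instance` and `Backup` records, their fields, the `audit` function, and the sample data are all made up, and a real version would pull its inventory from whatever your cloud provider reports rather than a hand-written list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Hypothetical inventory record; the fields are assumptions, not a provider API.
    instance_id: str
    owner: Optional[str]      # who asked for it, if anyone wrote that down
    purpose: Optional[str]    # what it is for, if anyone wrote that down
    hourly_cost: float        # dollars per hour


@dataclass
class Backup:
    # Hypothetical backup record.
    backup_id: str
    source_instance_id: Optional[str]  # the instance it was taken from, if known


def audit(instances, backups):
    """Return (unexplained instances, orphaned backups, unexplained hourly spend)."""
    known_ids = {i.instance_id for i in instances}
    # An instance is unexplained if nobody recorded an owner or a purpose.
    unexplained = [i for i in instances if not i.owner or not i.purpose]
    # A backup is orphaned if its source is missing or no longer exists.
    orphaned = [b for b in backups if b.source_instance_id not in known_ids]
    unexplained_spend = sum(i.hourly_cost for i in unexplained)
    return unexplained, orphaned, unexplained_spend


# Made-up sample data.
instances = [
    Instance("vm-1", "alice", "web frontend", 0.25),
    Instance("vm-2", None, None, 1.00),
    Instance("vm-3", "bob", None, 0.50),
]
backups = [
    Backup("bk-1", "vm-1"),
    Backup("bk-2", "vm-9"),   # source instance no longer exists
    Backup("bk-3", None),     # nobody recorded the source
]

unexplained, orphaned, spend = audit(instances, backups)
print([i.instance_id for i in unexplained])
print([b.backup_id for b in orphaned])
print(f"${spend:.2f} per hour unaccounted for")
```

The code is the easy part. The sketch only works if someone records an owner and a purpose when an instance is created, which is exactly the kind of process that tends to get skipped.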

Think back to the crufty old IT department at your last job. They were a pain to deal with, because they insisted on process for everything. Want a new server? Justify your case, go through the proper channels, and you'll get it in a couple of weeks. It wasn't fast or efficient, but it was... responsible.

That wasn't the end of their responsible behavior. Once you got your server, those IT dudes monitored it, kept its OS up to date, backed it up, and ensured that the failover procedure actually worked. Again, this is all a lot of work and process and time, but there are some great side effects. You know how much you're spending; you know who's using what; you know that if something goes wrong, you can get back up quickly.

Like a lot of other folks, I left a larger IT organization for a startup, and along the way, I gave up hosting or colocating my own servers in favor of the cloud. We get a ton more power at a fraction of the cost, and we can scale up and down on the fly. In moving to the cloud, though, a lot of us have forgotten something: Amazon or Rackspace provides the tools, but they don't fulfill the responsibilities.

This brings me to two separate points. First, there's a huge opportunity: find the ways we're using the cloud irresponsibly, and fix them for us. I know one startup looking at this, but there's just so much to do.

Second, there's an opportunity for all of us to learn a lot here. Imagine something like a cloud maturity model, where the people who've had a lot of success here can help lay out a roadmap for the right way to use all of these powerful tools. Given the scenario I described above (which I think is totally common), where do we go next? Anybody have any ideas on how to get started here?

By maturity model, I do NOT mean something huge and bloated, like the Capability Maturity Model. I just picked a catchy name. Something short and sweet, like the Agile Manifesto, would be better.

About the Author

The Art of Delightful Software is written by Cody Powell. I'm currently Director of Engineering at TUNE here in Seattle. Before that, I worked on Amazon Video. Before that, I was CTO at Famigo, a venture-funded startup that helped families find and manage mobile content.

Twitter: @codypo
Github: codypo
LinkedIn: codypo's profile
Email: firstname + firstname lastname dot com