July 2011 Archives

I like my laptop.  It's great for email, web browsing, Spotify, and a ton of other applications.  It sucks for programming, though.

Most of my programming time is spent on our platform and supporting services at Famigo.  It's a nice Django app that relies on a few different datastores, and much of the CPU-bound work occurs in asynchronous tasks fired off by Celery.  It's not a terribly complex environment, and we keep it manageable and portable with virtualenv (Ruby folk, imagine I just referenced rvm).  All our config files are stored in GitHub, as are shell scripts to automate directory creation, etc.
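To make that concrete, here's a hedged sketch of the kind of bootstrap script we're talking about — the names and paths are made up for illustration, not our actual setup:

```shell
# Hypothetical bootstrap script of the sort kept alongside the config
# files in git: create an isolated virtualenv and install pinned
# dependencies from requirements.txt.
bootstrap() {
  python3 -m venv env                        # in 2011 terms: virtualenv env
  ./env/bin/pip install -r requirements.txt  # pinned versions live in git
}
```

You run it once per fresh checkout; everything after that happens inside env/, insulated from the system Python.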

Even with all of that, it still takes hours to set up a new environment for the first time on someone's machine.  It's a damn nightmare just to get everything installed.  I'll spend hours googling cryptic error messages about a mangled dependency several layers deep in the OS.  (The exact error will differ depending on whether it's Ubuntu 10.04, 10.10, or 11.04, of course.  Ditto Windows XP versus Windows 7, and whatever kitty cat they're naming OS X releases after.)

When I finally get everything installed and I run our test suite, I discover a whole new set of baffling dependency issues, perhaps because the database versions are slightly off.  When that works, I'll run some of our asynchronous tasks and discover yet another esoteric problem, related to directory permissions or path issues or Spaghetti Monster knows what.  (It's easy for me to accidentally skip this step, and only discover that things aren't working a week later.  Whoopsie!)

Once the new development environment is up and functioning properly, we still see environment issues, but they're now reversed.  Here, we make changes that look great locally, then we push to staging or production and watch everything explode.  After much complicated debugging, it's typically another unpredicted dependency issue, where the dev environment OS comes with v1.1.8.4 of an imaging library while our production instances have v1.1.7.1.  This is way scarier, because we've probably been frustrating actual users.

I've made this sound awful, but there is good news!  You don't have to worry about this.  There's an easy technical solution that, once implemented, allows you to save your brain power for interesting problems, like actual programming or creating a thunderously-jammin' Turntable.fm playlist.

We already have a wonderful environment where everything works: it's called Production.  For us, it's all hosted in the cloud via Amazon EC2 instances.  If you wish to work on our great projects, then I just take a snapshot of Production and spin up a new EC2 instance for you.  This process takes about a minute.  Sure, there's a bit of maintenance: every once in a while, when our production environment changes, we need to update our dev instances with new snapshots.  This takes another minute.  Through cloud hosting, we've turned our development environment into a commodity.
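That minute of work looks roughly like this.  The sketch below uses today's AWS CLI rather than 2011's ec2-api-tools, and the instance and image IDs are placeholders; the helper only prints each command so you can see the shape of it before pointing it at a real account:

```shell
# Dry-run sketch: image the production box, then boot a dev copy of it.
# Replace run() with real execution once the IDs match your account.
run() { echo "+ $*"; }

spin_up_dev() {
  prod_id=$1
  # snapshot production into a new machine image
  run aws ec2 create-image --instance-id "$prod_id" \
      --name "dev-snapshot-$(date +%Y%m%d)"
  # boot a dev instance from that image (placeholder AMI id)
  run aws ec2 run-instances --image-id ami-12345678 \
      --instance-type m1.large --key-name dev-key
}

spin_up_dev i-0abc1234   # placeholder production instance id
```

When production changes, you re-run the same two steps to refresh the dev snapshot.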

Of course, this doesn't work for everything.  If you do a lot of development without internet access (perhaps you're a hobo riding the rails?), this won't work.  (You could do something quite similar via the aptly-named Vagrant, though!)  It's also not such a great idea for non-web development.  You unlucky folks get to wrangle your own local development environments.  Have fun with that!

One powerful tactic at the heart of lean startup is continuous deployment. For many web apps, continuous deployment is a no brainer: by delivering smaller batches, you can please your customers more frequently, you stay in the flow, you can experiment more, and less work languishes in a queue.

Unfortunately, continuous deployment doesn't work for all software. Imagine you're writing the firmware for a missile; there's no room to experiment with your coordinate system and then push a quick update if you accidentally blow up Euro Disney. The same has been true for software that goes through a gatekeeper, like Apple with iOS apps. There, you submit a release to Apple and wait a week or more while they test it. While you wait, you might've deployed your API and website 50 times. It seems arbitrary that you can't do the same thing with your user interface, but it's Apple's store, so Apple makes the rules.

The great news for mobile developers is that Android is completely different. When you push a new build of an Android app to the Android Market, it goes live within an hour. While you definitely can deploy and iterate very rapidly in the Android Market, you can't automate these deployments: the Android Market has no API allowing for this. Without that, truly continuous deployment isn't an option for the Android Market.

This does not rule out continuous deployment for Android apps, though. Almost all Android devices have a setting that, once configured, allows them to install non-market apps. It's through this setting that third party app stores like Amazon work.

Here's the interesting part: if this setting is configured properly, you can distribute your own Android application. Since you're distributing the app, you can also automate the building and deployment of your Android app. This can all be done in a simple shell script inside of Jenkins or any other continuous integration framework. You run your tests, you build in release mode using ant, you sign the apk using keytool and jarsigner, then you place your apk at a public URL. Once you do that, bam, you're continuously deploying your mobile app!
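A minimal dry-run sketch of that script follows — file names, the keystore, and the server are all hypothetical, and the keystore itself is a one-time artifact you generate with keytool:

```shell
# Each step is printed rather than executed; replace step() with "$@"
# once the names and paths match your project.
step() { echo "+ $*"; }

deploy_apk() {
  step ant clean test                                # run the test suite
  step ant release                                   # build in release mode
  step jarsigner -keystore release.keystore \
       bin/app-release-unsigned.apk release_alias    # sign the apk
  step scp bin/app-release-unsigned.apk \
       deploy@example.com:/var/www/downloads/app.apk # publish at a public URL
}

deploy_apk
```

Wire that into a Jenkins job that fires on every commit, and you've got the pipeline described above.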

We actually tried this at Famigo, and the results were... not good. Okay, they were bad. Well, let me be a little more descriptive: they were straight up cover-your-eyes, hide-the-children, ye-gods-what-have-we-done? bad.

We have a site where we review hundreds of Android apps to determine if they're safe for your family; this content is visible via the web, but you can also access it via an app. Once we detected a visitor to the site was on an Android device, we'd render a little banner saying, "Hey, download our Android app!" We weren't linking to the Android Market there, we were linking to our own, continuously deployed Android app, residing on our web server.

While we were running this experiment, we saw several users follow the link to install the app (don't have the numbers, but it was greater than 20). How many actually installed the app? Zero. The results were bleak enough that, after 2 days, we started redirecting people to the Android Market again. Our conversion rate quickly went way up, effectively an infinite percent increase. Uhh, hooray?

It was a beautiful strategy, but it failed. Why? First, many Android users don't have "Allow install of non-Market applications" checked. In fact, many devices (looking at you, AT&T!) actively prevent their users from setting this. (This is also an interesting commentary on the relative popularity of third party app stores.) Even if the user could set that option, they were unwilling to do so for us. That brings us to the second, larger reason this failed: no one else is distributing their own app, so users assumed we were doing something sketchy. After all, if it's just a regular app, why isn't it in the Market instead of some random website?

As more third party app stores emerge, users may no longer be wary of apps distributed outside of the Android Market. Until then, continuous deployments of your Android app are a technical success that will probably lead to a business failure.

Amazon EC2 Lessons Learned

I am so impressed with Amazon's Elastic Compute Cloud service that I just don't see myself worrying over hosting or colocating servers ever again.  It's cheap, it's powerful, and it makes me sound like somebody in an IBM commercial because I get to use the word 'cloud' a lot. Currently, at Famigo, we exclusively use EC2 for all production and API development instances.  

It almost feels like science fiction to be able to spin up a powerful VM in seconds, use it for an hour, and terminate it, spending just pennies in the process.  At the same time, there are enough options and complexity with EC2 that certain facets are complicated at best, and baffling at worst.  I've put together a few lessons learned for the bold folks looking to join me in an EC2 wonderland.

Community AMIs are wonderful. Ubuntu is my Linux distro of choice, but the only Amazon Machine Images offered for Linux are Redhat Enterprise, Suse Enterprise, and Amazon Linux. (Don't be confused by the Amazon Linux distro; it's a secure version of Redhat optimized for AWS.) Are we Ubuntu fans hopelessly hosed? Nope. There's also a wide variety of community AMIs (over 6000!), including official Ubuntu images.  Just make sure you pick the right image; among those thousands are images for alphas, betas, and release candidates.

Watch where your data goes. Upon launching an EC2 instance, you absolutely must check the size of your partitions. A Linux instance will typically launch with a relatively small root partition (/) of around 10GB, and then a gigantic partition mounted from an ephemeral drive.  If you attempt to persist all your data under your root partition, you will run out of space very quickly.  Locate your big directories accordingly; I put rapidly-expanding directories like logs, db data, and my collection of Bieber jams on the giant partition.
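One simple way to act on this (a sketch — the paths are examples, and you'd stop the relevant service before moving its data): move the directory onto the big mount and leave a symlink behind, so nothing else has to know it moved.

```shell
# Relocate a fast-growing directory to the large ephemeral partition
# and symlink it back into place.
relocate() {
  src=$1; dest=$2
  mv "$src" "$dest"
  ln -s "$dest" "$src"
}

# Example (hypothetical paths): relocate /var/lib/mongodb /mnt/mongodb
```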

EBS is the only way to go.  An EC2 instance can either boot from the local instance store or from an Elastic Block Storage snapshot.  Simply put, I have no idea why you would want to use instance-store.  EBS instances allow for vastly quicker backups and restores, they can be paused and resumed, they don't lose their instance storage on a crash (for the root partition, at least), and they scale much faster, since you can quickly spin up new instances based off a snapshot.

Internal IPs rule.  Each EC2 instance has a public hostname (ec2-127-0-0-1.compute-1.amazonaws.com), a private hostname (ip-127-0-0-1.ec2.internal), and a private IP address. When instances must communicate with one another, use private IP addresses; any alternative requires either a ton of typing or an unhealthy amount of firewall rejiggering. I also find it very useful to define a bunch of my own hostname aliases inside of /etc/hosts for these private IP addresses.
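For example, a dev instance's /etc/hosts might grow a block like this (the IPs and names are made up):

```
# private-IP aliases for our other instances
10.254.1.12   db1
10.254.1.13   queue1
10.254.1.14   web1
```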

Micro instances are beautiful for experimentation.  I love how cheaply I can test complicated environment changes via EC2.  This has become even more compelling recently, with Amazon announcing their Free Usage tier.  Included in this free tier is 750 usage hours of a micro instance.  Micro instances aren't too powerful, but they work very well for limited-use testing, like architecture spikes or emergency recovery scenarios.  Even if you do exceed the free usage on a micro instance, the actual cost is only a couple of cents per hour.

Termination protection protects you from your own idiocy.  It's easy to get carried away with creating and terminating new EC2 instances, especially if you're as excitable as I am.  On more than one occasion, I've accidentally attempted to terminate the wrong instance.  You can block this from occurring by enabling Termination Protection on all of your important instances.

Multiple Availability Zones protect you from the fickle nature of the Amazon gods.  This is obvious, but important.  If all of your instances are in the same availability zone and there's an outage in that zone, then you're dead to the outside world.  The simple solution here is to launch instances in separate zones before the outage occurs.  Notice that I said 'before the outage occurs'; if you simply try to spin up a new image from a snapshot when you notice an outage, it might not work, especially if the outage affects EBS replication.  That sounds awfully specific, but that's what happened this April for many EC2 users.

With EC2, Amazon has not only given us enough rope to hang ourselves with, but they've also given us enough rope to hang innocent bystanders, pets, and various bits of shrubbery.  Despite that and the sheer amount of complexity you initially face, it's an incredibly powerful tool, particularly if you're on a budget.

About the Author

The Art of Delightful Software is written by Cody Powell. I'm currently Director of Engineering at TUNE here in Seattle. Before that, I worked on Amazon Video. Before that, I was CTO at Famigo, a venture-funded startup that helped families find and manage mobile content.

Twitter: @codypo
Github: codypo
LinkedIn: codypo's profile
Email: firstname + firstname lastname dot com