Today is my last day as a full-timer at Famigo. It's been a great 2+ years, and I'm very grateful for all the things I've learned and all the friends I've made. However, it's time for my next adventure. Next week, we move to Seattle where new, big challenges await. (I'll go into more detail on my new job in a later post.)
Why are we leaving? We've built fabulous products at Famigo, raised VC money, had lots of cool stuff written about us, and the company continues to charge ahead. Not only that, we have great friends in this city and a wonderful house. Why go? Well, all of this startup stuff has been way more complicated than I expected. Allow me to explain.
You Can't Put Life On Hold
I joined Famigo at the ground-level as cofounder and CTO. My plan was to immerse myself in my work and make this company a great success. I knew that would probably take years, but I was ready to make the commitment. In true comedic fashion, we immediately found out that my wife was pregnant. All of a sudden, questions about user retention and referral mechanisms looked trivial compared to questions about diaper rash and tummy time.
My plan to put real life on hold and focus on the startup failed in a million different ways. We had health issues, car issues, house issues, and family issues. Not only was all of that stressful for our entire family on top of the startup rollercoaster, it was also expensive. And, unfortunately, I couldn't pay for any of these expenses with Famigo stock options, even if I offered them with no vesting and a ridiculously low strike price. (Note to American economy: come on, man!)
Even more than the issue of money, there's the issue of time. The idea of doing a startup is something that had appealed to me for a long time. It was my dream. After a couple of years, I began to realize that my dream was causing all other dreams to be deferred. We wanted more kids, my wife wanted to take some time to be a stay-at-home mom, and we wanted to give my son a neat childhood with plenty of adventure. Slowly, I began to realize that pursuing this startup dream to the exclusion of everything else was a little bit selfish.
The lifecycle of a startup wasn't what I was expecting. There's a lot written about huge successes like Instagram, where lots of good things happened. There's also a decent amount written about startup failures, where lots of bad things happen. There's not much written about all the startups in the middle, where you experience some success but Facebook isn't exactly shaking in its boots.
Everything was a surprise; that goes for just about every good and bad thing that happened to us. You can read about startups all day long, but ultimately, that's no preparation for the experience itself. Deals take way longer or way shorter than expected, helpful people appear or disappear as if by magic, and the key human relationships beneath the business can spontaneously combust. Not all of these surprises are bad, but they are a constant.
The tough part about all of the surprises is that they make it impossible to plan anything. That's tough, because things like families, investors, and employees like to know what might happen next. Do the inherent surprises behind a startup ever end? I'd venture a guess, but then life would find a way to surprise me.
Do It Anyway, But Do It Right
Given what I've written thus far, would I do this again? Would I recommend this experience to others? Absolutely. I have had a blast, I've learned so much, and I have met some of my best friends this way. I am incredibly proud of the entire experience. I also discovered an unexpected benefit: if you're a technical person and you do it right, people notice.
When I say 'do it right', I mean build something fabulous that people know about. Both parts of that are equally important. As an introspective person, I feel weird drawing attention to the work I do, either here on this blog or on Twitter. (You probably would not guess this from the giant ego fest that is my website!) It simply must be done, though. If you don't do it, no one will, and then you run the risk of all your hard work never getting the notice it deserves.
If you do manage to build something fabulous that people know about, you will be inundated with incredible opportunities. Then, when you decide you need to find your own next adventure, you will quickly find something great. That's what I did. Onwards!
I think we can do mobile media better. Let me explain.
When I watch a movie, I'll often have the movie itself on our big TV and then the movie's IMDB page up on my iPad or Kindle Fire. When I watch a baseball game on TV, I'll also follow the game on MLB's website or its At Bat app so I can see the pitch tracker and look at stats (I'm slightly obsessed/infatuated with the pitch tracker). If I watch a live event, I'll often have Twitter up so I can see what my friends and other funny tweeters are saying about it.
This is now a pretty common use case: people watching something on one big device, then diving deeper into that content on a smaller device. It's kind of weird though, isn't it? Why do we need multiple devices for this? It's inefficient, it's cumbersome, it looks weird (that's according to my wife; I personally think it's a very debonair look). Even more than that, it's hard to truly pay attention to anything when your head keeps swiveling back and forth. There's a completely new media usage pattern here, but we're not taking advantage of it yet.
Here's one way we could approach this: combine streaming and navigation on a mobile device.
I don't mean toggling back and forth between an app that is streaming and a web browser. I want to stream my TV show or album as I normally would, then I'd like to pull up a translucent browser window on top of that where I can navigate wherever I'd like. The user then isn't constantly switching apps and thus switching contexts; they can see and hear everything, while still having the freedom to browse. It's one big, glorious context that the user controls.
Would that experience work as well on a TV? I don't think so, given how hard it is to navigate with a remote control. (Seriously, you could watch the director's cut of Das Boot while I try to search for a YouTube video on my TV.) I'm not sure about a laptop or a desktop, either. The inputs are there, but the use case I described above really feels to me like a living room activity, not an office activity. That's why I think this is a distinctly mobile opportunity.
I am in favor of the navigation being totally free-form. As a user, let me decide where I want to navigate instead of locking me into an IMDB tab and a Twitter tab with a predetermined hashtag. There's a lot of neat content out there to supplement my media; let me go find it! Much of the time, I might not even want the option to navigate. When I do want it, I should be in charge.
I don't think this would be easy to implement. It'd take a lot of playing with window sizes, locations, and aspect ratios to get this right; a maximized browser window on top of a maximized streaming baseball game would probably be disorienting. I bet we can find some ratios here that make sense, though, depending on the form factors involved.
There are loads of opportunities beyond this, in terms of mobile media and users joining the conversation. The first step is actually finding that conversation, though, and that's most easily done through this one big context.
The first week of a new development job is usually a sludge pit of paperwork, orientation, and environment configuration. Often, it's the worst week you'll have at that job. We recently had two interns join the Famigo development team for the summer, which led to an interesting question: is there a better way to do all that?
As soon as the interns arrived, I set out a goal for them: push code to production on your first day. While you can't avoid the paperwork and orientation part of a new job, at least they'd be contributing from the very beginning. Why is that important?
- In order to push to production, you'll need a development environment set up.
- You'll also need a bit of understanding about the codebase.
- You'll need to understand some of the core concepts behind our process: unit testing, continuous deployment, etc.
- It sets a good precedent. We're a startup here; we're allowed to move fast.
Is it reasonable to expect an intern to handle all of that on their first day? No, not on their own. Rather, each intern paired up with an experienced developer. The catch: the intern did the typing. I think that works pretty well, for a few reasons.
- The new person gets firsthand experience with the environment and dev tools. It's incredibly helpful to actually hit the keys yourself.
- If an error pops up (spoiler alert: it totally will), there's an experienced person right there to help.
- The new person gets a guided tour of the codebase, but they're the ones doing the navigation, so they're more likely to remember what's where.
This process actually worked a little too well. With the experienced person guiding the process and the new person doing the typing, we actually had both interns push quality code before lunch. Unfortunately for them, that meant they then had to dive into paperwork. Oh well, that's employment for you.
I was recently at the DevOpsDays conference, where I got into a conversation about build automation. I mentioned how we practice continuous deployment, so we may deploy to production 20 times a day. The guy replied, "That sounds great for some tiny startup, but what would happen if you had actual users?"
Allow me to respond in 2 parts. First, ouch. Second, continuous deployment is not at odds with a great user experience or high uptime requirements.
Between our website and our API at Famigo, we handle hundreds of thousands of HTTP calls every day. We've practiced continuous deployment for 2 years. You know how many complaints we've had about a cruddy user experience due to frequent deployments? Zero. Why were these deployments essentially invisible to all of our users? Because that's a requirement of our build process, and so we've focused on that part as much as the actual act of building and deploying.
How Does It Work?
First, let's talk about what our production environment looks like. We have a few different VMs hosting our web app; these are all based off of the same original image. Our load balancer distributes traffic across these instances evenly. Since our web app and API are both based on Django, we use virtualenv to manage all of our Python dependencies on each instance. Each instance also runs Jenkins, which does the heavy-duty work of building and deploying.
All of the important data comes from MongoDB or Redis. I point that out just to note that, with this backend, we rarely do schema migrations. Big honking ALTER TABLE statements can cause serious downtime; just ask the guy in the Oracle shirt crying into his keyboard right now.
How Do We Build?
We have one instance that's constantly polling our GitHub repo for changes. When a change is found, it pulls down the repo. Our environment dependencies are part of that repo, so we make a call to virtualenv to ensure the environment is up to date. Then we run all of our tests; there are around 900 of these. When that's done, we rsync the files over to our production directories and restart our fcgi process. We then make a call to the next instance's Jenkins remote access API to kick off a build, and the whole process starts again.
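That cycle can be sketched as a short script. To be clear, this is a hypothetical illustration, not our actual Jenkins configuration: every path, URL, and service name below (`/opt/app/repo`, `web2`, the `fcgi` service) is an invented example.

```python
# Hypothetical sketch of one instance's build/deploy cycle.
# All paths, hostnames, and service names are illustrative assumptions.
import subprocess

REPO_DIR = "/opt/app/repo"      # assumed checkout location
DEPLOY_DIR = "/var/www/app"     # assumed production directory
NEXT_JENKINS = "http://web2:8080/job/deploy/build"  # next instance's Jenkins API

def build_steps():
    """Ordered commands for one cycle: pull, sync env, test, deploy, hand off."""
    return [
        ["git", "-C", REPO_DIR, "pull"],                          # grab latest code
        ["virtualenv", REPO_DIR + "/env"],                        # ensure the env exists
        [REPO_DIR + "/env/bin/pip", "install",
         "-r", REPO_DIR + "/requirements.txt"],                   # sync dependencies
        [REPO_DIR + "/env/bin/python", "manage.py", "test"],      # run the ~900 tests
        ["rsync", "-a", "--delete", REPO_DIR + "/", DEPLOY_DIR],  # downtime begins...
        ["service", "fcgi", "restart"],                           # ...and ends here
        ["curl", "-X", "POST", NEXT_JENKINS],                     # kick off next instance
    ]

def deploy():
    for cmd in build_steps():
        subprocess.run(cmd, check=True)  # any failure aborts the rolling chain
```

The key property is the ordering: tests run before rsync ever touches production, so a red build never rolls out, and the hand-off to the next instance happens last.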
The only portion of the build process that involves any downtime is when we rsync and then restart fcgi. Those steps take maybe a second or two. Since we build and deploy one instance at a time, that second of downtime rolls from machine to machine; in other words, we never have one second of downtime for all users on all instances.
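A toy timeline makes the rolling-downtime claim concrete. Assuming, hypothetically, that each build takes five minutes and each restart one second, and that each instance only triggers the next one after it finishes, the restart windows can never overlap:

```python
# Toy model of sequential, rolling deploys: at most one instance is
# ever down at a time. The durations below are assumed, not measured.
def downtime_windows(num_instances, build_seconds, restart_seconds):
    """Return the (start, end) restart window for each instance, given
    that each build kicks off the next instance only after it completes."""
    windows = []
    t = 0
    for _ in range(num_instances):
        t += build_seconds                 # pull, env sync, tests, rsync
        windows.append((t, t + restart_seconds))
        t += restart_seconds               # fcgi restart finishes
    return windows

wins = downtime_windows(3, build_seconds=300, restart_seconds=1)
# Each window ends before the next begins, so the site as a whole
# is never completely dark.
```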
One thing to keep in mind here is that our load balancer constantly pings our instances to ensure they're up. (After all, that's the whole point of these load balancer thingies.) If, for whatever reason, our downtime is longer than a few seconds, the load balancer will stop distributing traffic to that instance until it's back up.
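In pseudocode terms, the balancer's behavior boils down to filtering the instance pool by the latest ping result. This toy model (the host names are made up) shows a mid-restart instance dropping out of rotation:

```python
# Toy model of health-check routing: traffic only goes to instances
# that answered their most recent ping. Host names are invented.
def healthy_instances(instances, ping):
    """Keep only the hosts for which ping(host) returns True."""
    return [host for host in instances if ping(host)]

# web2 is mid-restart and misses its ping, so it leaves the rotation
# until it starts answering again.
up = healthy_instances(["web1", "web2", "web3"],
                       ping=lambda host: host != "web2")
```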
As you can see, you have to be a little bit lucky (unlucky, rather) to ever see downtime here. You need to hit one particular instance with a request during its one second of downtime, before the load balancer has realized that instance is down.
Does That Downtime Even Matter?
Please break out your slide rule, as we're going to do some math. Per instance, if we do 20 deployments with 1 second of downtime for each, that's 20 seconds. There are 86400 seconds in a day. 20/86400 is, in purely mathematical terms, teensy weensy. (I don't know how to calculate downtime across all instances because of the load balancer and its outage detection, so I'm just sticking with one instance here.)
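Spelled out, with the same per-instance numbers as above:

```python
# Per-instance downtime from deployments, using the figures above.
deploys_per_day = 20
downtime_per_deploy = 1            # seconds per deploy
seconds_per_day = 24 * 60 * 60     # 86400

daily_downtime = deploys_per_day * downtime_per_deploy   # 20 seconds
fraction = daily_downtime / seconds_per_day              # ~0.00023
uptime_pct = 100 * (1 - fraction)                        # ~99.977% uptime
```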
Now, if we were processing credit cards or something like that, 20 seconds of downtime per day due to deployments would be unacceptable. (Note: we don't do that.) By contrast, if your traffic is largely mobile, as ours is, then 20 seconds a day is nothing. In fact, we expect far worse. The reason is that, in the land of mobile, you get in the habit of trying and retrying everything related to the network, because the coverage can be so spotty.
Continuous deployment does not necessarily mean giant swaths of downtime throughout the day. In fact, as you scale up in environment infrastructure, deployment smarts, and hopefully users, you gain tools that can make this downtime negligible. Now, back to my actual users.