<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>The Art Of Delightful Software</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/" />
    <link rel="self" type="application/atom+xml" href="http://www.codypowell.com/taods/atom.xml" />
    <id>tag:www.codypowell.com,2009-11-22:/taods//11</id>
    <updated>2013-03-18T05:29:27Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.32-en</generator>

<entry>
    <title>It&apos;s Not Refactoring, It&apos;s Untangling</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2013/03/its-not-refactoring-its-untangling.html" />
    <id>tag:www.codypowell.com,2013:/taods//11.1224</id>

    <published>2013-03-18T05:26:29Z</published>
    <updated>2013-03-18T05:29:27Z</updated>

    <summary><![CDATA[Recently, I was catching up with a former colleague. &nbsp;He mentioned a service that I wrote years ago, and how it has since become known as the Career Killer. &nbsp;Basically everyone who touched the Career Killer ended up leaving the...]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>Recently, I was catching up with a former colleague. &nbsp;He mentioned a service that I wrote years ago, and how it has since become known as the Career Killer. &nbsp;Basically everyone who touched the Career Killer ended up leaving the company. &nbsp;If the company wanted to have &gt; 0 developers, the only solution at this point was to take a few months and refactor this service completely.</div><div><br /></div><div>I have two things to say about this. &nbsp;First, that code was at 85% unit test coverage when I left so don't go blaming me. &nbsp;Second, this huge refactoring? &nbsp;It's not going to work.</div><div><br /></div><div>Every codebase has at least one component that is widely hated and feared. &nbsp;It does too much, it has too many states, too many other entities call it. &nbsp;When it comes time to pay down technical debt, you should definitely focus on this component. &nbsp;However, if you have an incomplete understanding of this component and you stop everything to completely rewrite it, your odds of success are low. &nbsp;That component, as scary and complex as it appears, is actually way more scary and complex than you think.&nbsp;</div><div><br /></div><div>How do you think that component got into this unfortunate shape? &nbsp;Is it because the company hired a nincompoop and let him run wild in the codebase for years? &nbsp;Or is it because the component was originally a sound abstraction, but its scope of responsibilities had grown over the years due to changing requirements? &nbsp;(For the sake of my ego, I'm hoping the Career Killer is the latter.) &nbsp;In all likelihood, this component arrived at its current, scary state via smart people with good intentions. &nbsp;You know what you are right now? &nbsp;Smart people with good intentions. 
&nbsp;If you proceed with a big refactor, you'll trade one form of technical debt for another.</div><div><br /></div><div>In order to truly pay this debt down, you need to untangle the complexity around the problem. &nbsp;You need to spend time looking at all the clients calling this component. &nbsp;You need to spend time talking with your colleagues, learning more about the component's history and how it's used. &nbsp;You need to make a few simplifying changes around the periphery of the component and see what works. &nbsp;Each week, you spend a little more time and untangle the problem just a little bit more. &nbsp;Given a long enough timeframe, you'll eventually untangle all of the complexity and bring a teeny bit of order to the universe.</div><div><br /></div><div>Practically speaking, what do you do here? &nbsp;Rather than spending 3 full months on a complete refactor, spend 25% of your time over the next year. &nbsp;It's the same time commitment either way, but with the 25% plan, you get time to analyze and plan. &nbsp;You get time to untangle.</div> ]]>
        
    </content>
</entry>

<entry>
    <title>Ship It!</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2013/03/ship-it.html" />
    <id>tag:www.codypowell.com,2013:/taods//11.1223</id>

    <published>2013-03-11T00:11:17Z</published>
    <updated>2013-03-11T00:13:49Z</updated>

    <summary><![CDATA[Several years ago, I had a job that, at the time, seemed like heaven. &nbsp;We were a new team building a new product. &nbsp;We were using new technology: C# 2.0 (yes, people were once excited about major releases of C#)....]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>Several years ago, I had a job that, at the time, seemed like heaven. &nbsp;We were a new team building a new product. &nbsp;We were using new technology: C# 2.0 (yes, people were once excited about major releases of C#). &nbsp;We were using new techniques, like scrum and test-driven development. &nbsp;It was greenfield development in every possible sense, except for the one where our desks would actually be situated in a green field.</div><div><br /></div><div>I lived in this environment for a few years. &nbsp;I learned a lot about software development, technical leadership, and how to build big systems. &nbsp;Ultimately though, I think I wasted those years. &nbsp;Why? &nbsp;We never shipped.</div><div><br /></div><div>Something magical happens when you ship software: your decisions suddenly have consequences. &nbsp;You suddenly must consider trade-offs. &nbsp;Hopefully, people suddenly care. &nbsp;If they don't, you suddenly must correct that.</div><div><br /></div><div>What's the big deal about decisions and consequences? &nbsp;Any fool with a text editor can write code, but only an amazing few can code and make good choices around trade-offs. &nbsp;That's the most valuable skill a developer can possess: the ability to make hard decisions. &nbsp;(That's actually a great way to make career choices: opt for the place that'll let you make harder decisions.) &nbsp;Like riding a bike or juggling chainsaws, the only way to get good at making hard decisions is by doing it a lot. &nbsp;Each time you make one of these decisions, gather data and iterate accordingly.</div><div><br /></div><div>When I was working on that project that never shipped, I felt like I was making hard decisions. &nbsp;We had big meetings and loud arguments about things that seemed important at the time. &nbsp;You can bet your sweet bippy that we came to conclusions on all sorts of things. 
&nbsp;However, since we never shipped, we never got any data about any of the choices we made around things like architecture, code coverage, implementation decisions, featureset, and user interface. &nbsp;Without that data, we had no way of knowing if we had chosen correctly. &nbsp;Did we get better at making hard decisions? &nbsp;Without users, how could you tell?</div><div><br /></div><div>There was an easy solution to the problem I faced at that job: ship the dang thing. &nbsp;That decision wasn't up to me, so I should've done the next best thing: join a different team, one that shipped a ton of code. &nbsp;Even if the codebase is worse and the product is less interesting, find a role where you ship; it's the only way you get better.</div> ]]>
        
    </content>
</entry>

<entry>
    <title>Software Karma</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2013/01/software-karma.html" />
    <id>tag:www.codypowell.com,2013:/taods//11.1222</id>

    <published>2013-01-12T21:38:35Z</published>
    <updated>2013-01-12T21:42:16Z</updated>

    <summary><![CDATA[I make a lot of jokes at work about code review karma. &nbsp;Here's the idea: each time a person volunteers to review others' code, that person builds their code review karma. &nbsp;Then, when it comes time for that person's own...]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="career" label="career" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>I make a lot of jokes at work about code review karma. &nbsp;Here's the idea: each time a person volunteers to review others' code, that person builds their code review karma. &nbsp;Then, when it comes time for that person's own code to be reviewed, the reviews go smoothly due to the store of code review karma.</div><div><br /></div><div>Build karma works the same way. &nbsp;When someone jumps in to help fix the build, they're accruing build karma. &nbsp;Thanks to build karma, the builds that kick off from that person's own commits are far more likely to be successful.</div><div><br /></div><div>If you have a lot of time on your hands, you can actually see software karma all over the place: QA karma by helping out your testers, refactoring karma by cleaning up the scariest bits of the code base, planning karma by leading sprint planning, etc. &nbsp;The more you help, the more the software gods reward you in the future.</div><div><br /></div><div>I see two explanations behind software karma. &nbsp;The first is that there actually are software gods who sit upon Mount Codelympus and keep a running tally of all these helpful acts. &nbsp;They probably keep this tally in Emacs, since that was clearly not designed for mortals. &nbsp;If there is any truth to this explanation, then let's all quickly sacrifice a 'PHP for Dummies' book to appease them.</div><div><br /></div><div>The other explanation is less exciting, but slightly more practical. &nbsp;By reviewing others' code, you build a mental model around what great code looks like. &nbsp;When you go to write your own code, you apply your model and reap the rewards immediately. &nbsp;By fixing broken builds, you build a mental model around the build process and how it can go wrong; your own builds are far less likely to go wrong thanks to that model. 
&nbsp;Same thing goes for helping QA, refactoring, sprint planning, and so forth.</div><div><br /></div><div>I'm not taking sides on which explanation is correct, since I don't want to get struck by a lightning bolt or incite a plague. &nbsp;All I do know is this: if you want to get better as a developer, boost your karma.</div> ]]>
        
    </content>
</entry>

<entry>
    <title>The Start of a New Adventure</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/07/the-start-of-a-new-adventure.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1220</id>

    <published>2012-07-20T16:28:04Z</published>
    <updated>2012-07-20T16:40:16Z</updated>

    <summary>Today is my last day as a full-timer at Famigo. It&apos;s been a great 2+ years, and I&apos;m very grateful for all the things I&apos;ve learned and all the friends I&apos;ve made. However, it&apos;s time for my next adventure. Next...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="career" label="career" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="startup" label="startup" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>Today is my last day as a full-timer at <a href="http://www.famigo.com">Famigo</a>.  It's been a great 2+ years, and I'm very grateful for all the things I've learned and all the friends I've made. However, it's time for my next adventure.  Next week, we move to Seattle where new, big challenges await.  (I'll go into more detail on my new job in a later post.)</p>

<p>Why are we leaving?  We've built fabulous products at Famigo, raised VC money, had lots of cool stuff written about us, and the company continues to charge ahead.  Not only that, we have great friends in this city and a wonderful house.  Why go?  Well, all of this startup stuff has been way more complicated than I expected.  Allow me to explain.</p>

<p><strong>You Can't Put Life On Hold</strong><br/>
I joined Famigo at the ground-level as cofounder and CTO.  My plan was to immerse myself in my work and make this company a great success.  I knew that would probably take years, but I was ready to make the commitment.  In true comedic fashion, we <em>immediately</em> found out that my wife was pregnant.  All of a sudden, questions about user retention and referral mechanisms looked trivial compared to questions about diaper rash and tummy time.</p>

<p>My plan to put real life on hold and focus on the startup failed in a million different ways.  We had health issues, car issues, house issues, and family issues.  Not only was all of that stressful for our entire family on top of the startup rollercoaster, it was also expensive.  And, unfortunately, I couldn't pay for any of these expenses with Famigo stock options, even if I offered them with no vesting and a ridiculously low strike price.  (Note to American economy: come on, man!)</p>

<p>Even more than the issue of money, there's the issue of time.  The idea of doing a startup is something that had appealed to me for a long time.  It was my dream.  After a couple of years, I began to realize that my dream was causing all other dreams to be deferred.  We wanted more kids, my wife wanted to take some time to be a stay-at-home mom, and we wanted to give my son a neat childhood with plenty of adventure.  Slowly, I began to realize that pursuing this startup dream to the exclusion of everything else was a little bit selfish. </p>

<p><strong>while(true) surprise();</strong><br/>
The lifecycle of a startup wasn't what I was expecting.  There's a lot written about huge successes like Instagram, where lots of good things happened.  There's also a decent amount written about startup failures, where lots of bad things happen.  There's not much written about all the startups in the middle, where you experience some success but Facebook isn't exactly shaking in its boots.</p>

<p>Everything was a surprise; that goes for just about every good and bad thing that happened to us.  You can read about startups all day long, but ultimately, that's no preparation for the experience itself.  Deals take way longer or way shorter than expected, helpful people appear or disappear as if by magic, and the key human relationships beneath the business can spontaneously combust.  Not all of these surprises are bad, but they are a constant.</p>

<p>The tough part about all of the surprises is that they make it impossible to plan anything.  That's a problem, because families, investors, and employees like to know what might happen next.  Do the inherent surprises behind a startup ever end?  I'd venture a guess, but then life would find a way to surprise me.</p>

<p><strong>Do It Anyway, But Do It Right</strong><br/>
Given what I've written thus far, would I do this again?  Would I recommend this experience to others?  Absolutely.  I have had a blast, I've learned so much, and I have met some of my best friends this way.  I am incredibly proud of the entire experience.  I also discovered an unexpected benefit: if you're a technical person and you do it right, people notice.</p>

<p>When I say 'do it right', I mean build something fabulous that people know about.  Both parts of that are equally important.  As an introspective person, I feel weird drawing attention to the work I do, either here on this blog or on <a href="http://twitter.com/codypo">Twitter</a>.  (You probably would not guess this from the giant ego fest that is my website!)  It simply must be done, though.  If you don't do it, no one will, and then you run the risk of all your hard work never getting the notice it deserves.</p>

<p>If you <em>do</em> manage to build something fabulous that people know about, you will be inundated with incredible opportunities.  Then, when you decide you need to find your own next adventure, you will quickly find something great.  That's what I did.  Onwards!</p>
]]>
        

    </content>
</entry>

<entry>
    <title>One Big, Glorious Context: How to Improve Mobile Media</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/07/one-big-glorious-context-improving-mobile-media.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1219</id>

    <published>2012-07-17T19:03:15Z</published>
    <updated>2012-07-17T20:17:21Z</updated>

    <summary>I think we can do mobile media better. Let me explain. When I watch a movie, I&apos;ll often have the movie itself on our big TV and then the movie&apos;s IMDB page up on my iPad or Kindle Fire. When...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="mobiledevelopment" label="mobiledevelopment" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mobileux" label="mobileux" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>I think we can do mobile media better.  Let me explain.</p>

<p>When I watch a movie, I'll often have the movie itself on our big TV and then the movie's IMDB page up on my iPad or Kindle Fire.  When I watch a baseball game on TV, I'll also follow the game on MLB's website or its At Bat app so I can see the pitch tracker and look at stats (I'm slightly obsessed/infatuated with <a href="http://www.sportsgeekery.com/wp-content/uploads/2012/02/mlbabatipad8.jpg">the pitch tracker</a>).  If I watch a live event, I'll often have Twitter up so I can see what my friends and other funny tweeters are saying about it.</p>

<p>This is now a pretty common use case: people watching something on one big device, then diving in deeper into that content on a smaller device.  It's kind of weird though, isn't it?  Why do we need multiple devices for this?  It's inefficient, it's cumbersome, it looks weird (that's according to my wife, I personally think it's a very debonair look).  Even more than that, it's hard to truly pay attention to anything when your head keeps swivelling back and forth.  There's a completely new media usage pattern here, but we're not taking advantage of it yet.</p>

<p>Here's one way we could approach this: <strong>combine streaming and navigation on a mobile device</strong>.</p>

<p>I don't mean toggling back and forth between an app that is streaming and a web browser.  I want to stream my TV show or album as I normally would, then I'd like to pull up a translucent browser window on top of that where I can navigate wherever I'd like.  The user then isn't constantly switching apps and thus switching contexts; they can see and hear everything, while still having the freedom to browse.  It's one big, glorious context that the user controls.</p>

<p>Would that experience work as well on a TV?  I don't think so, given how hard it is to navigate with a remote control.  (Seriously, you could watch the director's cut of Das Boot while I try to search for a YouTube video on my TV.)  I'm not sure about a laptop or a desktop, either.  The inputs are there, but the use case I described above really feels to me like a living room activity, not an office activity.  That's why I think this is a distinctly mobile opportunity.</p>

<p>I am in favor of the navigation being totally free-form.  As a user, let me decide where I want to navigate instead of locking me into an IMDB tab and a Twitter tab with a predetermined hashtag.  There's a lot of neat content out there to supplement my media; let me go find it!  Much of the time, I might not even want the option to navigate.  When I do want it, I should be in charge.</p>

<p>I don't think this would be easy to implement.  It'd take a lot of playing with window sizes, locations, and aspect ratios to get this right; a maximized browser window on top of a maximized streaming baseball game would probably be disorienting.  I bet we can find some ratios here that make sense, though, depending on the form factors involved.</p>

<p>There are a load of opportunities beyond this, in terms of mobile media and users joining the conversation.  The first step is actually finding that conversation, though, and that's easiest done through this one big context.</p>

<p>(Many thanks to <a href="http://c-lo.net/">Carlo Longino</a> for chatting through a lot of this with me at <a href="http://www.unclebillys.com/bs-landing/">Uncle Billy's</a> the other night.  Also, many thanks to the brewmasters at Uncle Billy's; y'all do fine work.)</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Can&apos;t Fix It?  Go Home.</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/06/solve-hard-problems-by-going-home.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1218</id>

    <published>2012-06-27T21:29:02Z</published>
    <updated>2012-06-27T21:40:54Z</updated>

    <summary><![CDATA[We ran into a positively bewildering bug in our Android app last week. &nbsp;It was the perfect storm of bug-dom: it looked incredibly simple, but actually debugging it was horrendous due to several threads, API calls, and strangely-timed UI events....]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="debugging" label="debugging" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="management" label="management" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>We ran into a positively bewildering bug in our Android app last week. &nbsp;It was the perfect storm of bug-dom: it looked incredibly simple, but actually debugging it was horrendous due to several threads, API calls, and strangely-timed UI events. &nbsp;On top of everything, this bug didn't really affect any user; it just made the UI look <i>slightly </i>weird if you were paying a lot of attention.</div><div><img alt="famigo_sandbox_screenshot.png" src="http://www.codypowell.com/taods/famigo_sandbox_screenshot.png" width="319" height="354" class="mt-image-right" style="float: right; margin: 0px 0px 20px 20px; " /></div><br class="Apple-interchange-newline" /><div>If you look at the <a href="https://play.google.com/store/apps/details?id=com.famigo.sandbox">Famigo Sandbox</a> app, you'll notice there's a scroller at top where we list app recommendations. &nbsp;The issue we had was that the app scroller would occasionally list a blank app. &nbsp;Instead of an app title and an icon, we'd show TextView as the app name and no icon at all. &nbsp;You can scroll through hundreds of app recommendations, but you'd only get the blank app once. &nbsp;Sounds trivial, no?</div><div><br /></div><div>Last week, we were readying a great new version of the app to push to Google Play. &nbsp;We had one known issue left: the blank spot in the app scroller. &nbsp;A few of us had actually spent a bit of time looking at this bug in the past, but no one had figured it out. &nbsp;We had a bit of time in our schedule before we needed to push the app, and I thought it'd be great to finally nail that bug. &nbsp;Fixing it would mean no known issues; hooray! &nbsp;Also, even though the bug was largely harmless, I worried that it'd be a <a href="http://pragprog.com/the-pragmatic-programmer/extracts/software-entropy">broken window</a>&nbsp;that might lead us towards sloppiness in the future. 
&nbsp;</div><div><br /></div><div>I paired up with John, one of our developers, to squash that sucker. &nbsp;When we sat down, we fully expected to fix the bug within the hour; I think I actually said that out loud. &nbsp;(Sidenote: never do that on a weird bug. &nbsp;If you do say something like that, the software gods might overhear and punish you for your hubris.)</div><div><br /></div><div>We began creating breakpoints, logging everything, and stepping through the code. &nbsp;Within a few minutes, I was baffled; this particular bit of functionality was far more complex than I thought. &nbsp;Due to the number of threads, it was even hard to figure out if it was an issue with the rendering logic or with the underlying data structure. &nbsp;Our joint mindset quickly went from "Haha, let's solve this silly bug" to "Hmm, interesting" to "I don't get it" to "Is there another line of work we're qualified for? &nbsp;Maybe garbage men, or is that a union deal?"</div><div><br /></div><div>After several hours, we literally had no idea what the problem might be. &nbsp;The rest of the office seemed to really enjoy our sighs, profanity, and nonsensical ramblings as we talked through what might be happening.</div><div><br /></div><div>It was tempting to stay at the office until we found the bug. &nbsp;I have tried that before, and I found that, past a certain amount, my efforts become detrimental. &nbsp;I get tired, I mess things up even worse, and I spend the entire next day just working my way back to the original bug. &nbsp;We made a note of what we were last looking at, then we left the office broken and dejected.</div><div><br /></div><div>That night, I actually dreamed about the bug. &nbsp;Yep, I couldn't get away from it even in my sleep. &nbsp;(Even worse, in the dream, I was pair programming with Dog the Bounty Hunter. &nbsp;Let's all choose not to analyze that.)</div><div><br /></div><div>Both John and I got back into the office early that next day. 
&nbsp;I imagine we looked like a couple of grizzled, old soldiers headed into battle, since no one dared to joke about the bug or even make eye contact with us. &nbsp;As we sat down together, I thought it made sense to put a time limit on our debugging. &nbsp;If we couldn't fix this ridiculous, silly bug in 90 minutes, then it was a sign from the cosmos that the app was destined to ship with one empty app in the app scroller.</div><div><br /></div><div>We referred to the note we had made the night before on where to pick back up and we jumped back into code. &nbsp;About 10 lines below, we saw something very, very strange. &nbsp;If we got a certain response from the API, we would insert a blank app into the app scroller to keep the numbering even across our paged requests. &nbsp;"That's... weird," we both said, trying not to sound optimistic.</div><div><br /></div><div>We deleted that chunk of code, tested the app, and saw that everything now worked great. &nbsp;We had 72 minutes left on the timer, and we had actually broken for 5 minutes for our standup! &nbsp;We had been looking right at that function the night before, and we missed the issue entirely.</div><div><br /></div><div>The lesson? &nbsp;When you've stared at something for hours and still don't get it, go home. You'll see the problem with new eyes tomorrow.</div> ]]>
        
    </content>
</entry>

<entry>
    <title>What&apos;s the Best Way to Get a New Developer Started?</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/06/whats-the-best-way-to-get-a-new-developer-started.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1217</id>

    <published>2012-06-21T15:58:50Z</published>
    <updated>2012-06-21T16:02:15Z</updated>

    <summary>The first week of a new development job is usually a sludge pit of paperwork, orientation, and environment configuration. Often, it&apos;s the worst week you&apos;ll have at that job. We recently had two interns join the Famigo development team for...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="management" label="management" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="teamwork" label="teamwork" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>The first week of a new development job is usually a sludge pit of paperwork, orientation, and environment configuration.  Often, it's the worst week you'll have at that job.  We recently had two interns join the <a href="http://www.famigo.com">Famigo</a> development team for the summer, which led to an interesting question: is there a better way to do all that?</p>

<p>As soon as the interns arrived, I set out a goal for them: push code to production on your first day.  While you can't avoid the paperwork and orientation part of a new job, at least they'd be contributing from the very beginning.  Why is that important?</p>

<ul>
<li>In order to push to production, you'll need a development environment set up.</li>
<li>You'll also need a bit of understanding about the codebase.</li>
<li>You'll need to understand some of the core concepts behind our process: unit testing, continuous deployment, etc.</li>
<li>It sets a good precedent.  We're a startup here; we're allowed to move fast.</li>
</ul>

<p>Is it reasonable to expect an intern to handle all of that on their first day?  No, not on their own.  Rather, each intern paired up with an experienced developer. The catch: the intern did the typing.  I think that works pretty well, for a few reasons.</p>

<ul>
<li>The new person gets firsthand experience with the environment and dev tools.  It's incredibly helpful to actually hit the keys yourself.</li>
<li>If an error pops up (spoiler alert: it totally will), there's an experienced person right there to help.</li>
<li>The new person gets a guided tour of the codebase, but they're the ones doing the navigation, so they're more likely to remember what's where.</li>
</ul>

<p>This process actually worked a little too well.  With the experienced person guiding the process and the new person doing the typing, we actually had both interns push quality code before lunch.  Unfortunately for them, that meant they then had to dive into paperwork.  Oh well, that's employment for you.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Taco Driven Development and Learning to Listen</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/05/taco-driven-development-and-learning-to-listen.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1216</id>

    <published>2012-05-05T14:43:15Z</published>
    <updated>2012-05-05T14:49:02Z</updated>

    <summary><![CDATA[We try to have fun together at Famigo. &nbsp;Every Wednesday we have Taco Twednesday, in which we all leave the office to eat at and review various taco establishments. &nbsp;(We're also logging these reviews, taking steps towards the world's first...]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="career" label="career" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="teamwork" label="teamwork" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>We try to have fun together at <a href="http://www.famigo.com">Famigo</a>. &nbsp;Every Wednesday we have Taco Twednesday, in which we all leave the office to eat at and review various taco establishments. &nbsp;(We're also logging these reviews, taking steps towards the world's first big taco data powerhouse.) &nbsp;I also regularly invite everybody over to my house to make homebrew beer. (Ask about our not-quite-award-winning ale, Electric Brewgaloo.) &nbsp;Finally, we devote one afternoon a month to Mandatory Fun Day, where one rotating employee plans an afternoon of fun for the rest of the company.</div><div><br /></div><div>What's the point of all this camaraderie? &nbsp;Sure, it's fun to get out of the office, it's great for retention, and it helps with recruiting. &nbsp;We're doing something more important than all of that when we're having fun, though. &nbsp;We're doing something that every software organization needs help with: we're learning to listen to each other.</div><div><br /></div><div>Writing software is an isolating experience. &nbsp;Many years ago**, I had a job where I'd sit down, I'd put on my headphones, I'd bang away at the keyboard, and I'd go home 10 hours later. &nbsp;That was all day, every day, for a few years. &nbsp;Actually, there was an occasional exception. &nbsp;Sometimes, I'd hit a hard problem and I'd spend 20 or 30 hours at the keyboard before I went home. &nbsp;I wasn't the exception; the rest of the team was like that too. &nbsp;And let me tell you, we made some crappy, crappy software.</div><div><br /></div><div>It was a little surprising at the time, because, individually, we were all smart. &nbsp;As a group then, shouldn't we have been <i>very </i>smart, making <i>very</i> good software? &nbsp;Spoiler alert: <b>no</b>. &nbsp;No one listened to anyone else, and so each person on the team was left on his own to make crucial mistakes. 
&nbsp;These mistakes compounded over the years until the product crashed.</div><div><br /></div><div>Why weren't we talking? &nbsp;I can only speak anecdotally here. &nbsp;My teammates were friendly enough, but there were differences when it came to age, outside interests, background, and politics. &nbsp;All we had to discuss were the typical programming debates (vim vs. emacs, etc.) or company rumors, both of which were charged conversation topics. &nbsp;In short order, everyone was still talking, but no one was listening.</div><div><br /></div><div>We had important questions for each other, things like "What do you think of Feature X?", "Do these requirements make sense?", and "What the hell is it we're actually building?" &nbsp;Even at the time, I knew I should be asking them, but that wasn't enough to get me to participate in an awkward conversation with people I didn't know very well. &nbsp;Collectively, we could've answered those and built something important. &nbsp;Instead, each of us charged ahead with a deeply flawed idea of what we were building.</div><div><br /></div><div>Would tacos, beer making, or a go-kart outing have fixed that? &nbsp;Maybe so! &nbsp;At the very least, it's a shared experience. &nbsp;It's a starting point. &nbsp;We could begin having agreeable conversations, starting with questions like "How about these tacos?" &nbsp;or "Wow, we made some crappy beer, didn't we?" These simple conversations build rapport, and soon I would trust my teammate's opinion on something. &nbsp;All of that is crucial if I want to feel comfortable asking harder questions like, "I can't figure this out; will you help me?" or "Are we building the right thing?" &nbsp;That is why it's only partly a joke when I say we practice taco-driven development.</div><div><br /></div><div>**The job in question is wayyyy back in time, and has nothing to do with any recent employers.</div> ]]>
        
    </content>
</entry>

<entry>
    <title> I Ain&apos;t Afraid of No Downtime: Scaling Continuous Deployment</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/04/i-aint-afraid-of-no-downtime-scaling-continuous-deployment.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1215</id>

    <published>2012-04-12T21:56:29Z</published>
    <updated>2012-04-12T22:00:20Z</updated>

    <summary>I was recently at the DevOpsDays conference, where I got into a conversation about build automation. I mentioned how we practice continuous deployment, so we may deploy to production 20 times a day. The guy replied, &quot;That sounds great for...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="devops" label="devops" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="leanstartup" label="leanstartup" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>I was recently at the DevOpsDays conference, where I got into a conversation about build automation.  I mentioned how we practice continuous deployment, so we may deploy to production 20 times a day.  The guy replied, "That sounds great for some tiny startup, but what would happen if you had actual users?"</p>

<p>Allow me to respond in 2 parts.  First, ouch.  Second, continuous deployment is <i>not</i> at odds with a great user experience or high uptime requirements.</p>

<p>Between our website and our API at <a href="http://www.famigo.com">Famigo</a>, we handle hundreds of thousands of HTTP calls every day.  We've practiced continuous deployment for 2 years.  You know how many complaints we've had about a cruddy user experience due to frequent deployments?  Zero.  Why were these deployments essentially transparent to all of our users?  That's a requirement for our build process, and so we've focused on that part as much as the actual act of building and deploying.</p>

<p><b>How Does It Work?</b><br/>
First, let's talk about what our production environment looks like.  We have a few different VMs hosting our web app; these are all based off of the same original image.  Our load balancer distributes traffic across these instances evenly.  Since both our website and our API are built on Django, we use <a href="http://www.virtualenv.org/en/latest/index.html#what-it-does">virtualenv</a> to manage all of our Python dependencies on each instance.  Each instance also runs <a href="http://jenkins-ci.org/">Jenkins</a>, which does the heavy duty work of building and deploying.</p>

<p>All of the important data comes from MongoDB or Redis.  I point that out just to note that, with this backend, we rarely do schema migrations.  Big honking ALTER TABLE statements can cause serious downtime; just ask the guy in the Oracle shirt crying into his keyboard right now.</p>

<p><b>How Do We Build?</b><br/>
We have one instance that's constantly polling our GitHub repo for changes.  When a change is found, it pulls down the repo.  Our environment dependencies are part of that repo, so we make a call to virtualenv to ensure the environment is up to date.  Then we run all of our tests; there are around 900 of these.  When that's done, we rsync the files over to our production directories and restart our fcgi process.  We then make a call to the next instance's Jenkins remote access API to kick off a build, and the whole process starts again.</p>

<p><b>Downtime?</b><br/>
The only portion of the build process that involves any downtime is when we rsync and then restart fcgi.  Those steps take maybe a second or two.  Since we build and deploy one instance at a time, that second of downtime rolls from machine to machine; in other words, we never have one second of downtime for all users on all instances.</p>

<p>One thing to keep in mind here is that our load balancer constantly pings our instances to ensure they're up.  (After all, that's the whole point of these load balancer thingies.)  If, for whatever reason, our downtime is longer than a few seconds, the load balancer will stop distributing traffic to that instance until it's back up.</p>

<p>As you can see, you have to be a little bit lucky (unlucky, rather) to ever see downtime here.  You need to hit one particular instance with a request during its 1 second of downtime, before the load balancer has realized the instance is down and stopped sending traffic there.</p>

<p><b>Does That Downtime Even Matter?</b><br/>
Please break out your slide rule, as we're going to do some math.  Per instance, if we do 20 deployments with 1 second of downtime for each, that's 20 seconds.  There are 86400 seconds in a day.  20/86400 is, in purely mathematical terms, teensy weensy: about 0.023% of the day.  (I don't know how to calculate downtime across all instances because of the load balancer and its outage detection, so I'm just sticking with one instance here.)</p>
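
<p>For anyone without a slide rule handy, here is the same arithmetic as a small Python sketch.  The function name is made up for illustration; the inputs are the 20 deployments and 1 second of downtime from above.</p>

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def downtime_fraction(deploys_per_day, seconds_down_per_deploy):
    """Fraction of the day one instance is unavailable because of deploys."""
    return deploys_per_day * seconds_down_per_deploy / SECONDS_PER_DAY


fraction = downtime_fraction(deploys_per_day=20, seconds_down_per_deploy=1)
print(f"{fraction:.4%} of the day down, {1 - fraction:.4%} up")
# prints: 0.0231% of the day down, 99.9769% up
```

<p>This is per instance only; it ignores the load balancer, which should make the number users actually see smaller still.</p>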

<p>Now, if we were processing credit cards or something like that, 20 seconds of downtime per day due to deployments would be unacceptable.  (Note: we don't do that.)  On the other hand, if your traffic is largely mobile, as ours is, then 20 seconds a day is nothing.  In fact, <i>we expect far worse</i>.  The reason is that, in the land of mobile, you get in the habit of trying and retrying everything related to the network, because the coverage can be so spotty.</p>

<p><b>Conclusion</b><br/>
Continuous deployment does not necessarily mean giant swaths of downtime throughout the day.  In fact, as you scale up in environment infrastructure, deployment smarts, and hopefully users, you gain tools that can make this downtime negligible.  Now, back to my actual users.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Code Like Clarkson</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/03/code-like-clarkson.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1214</id>

    <published>2012-03-12T00:08:57Z</published>
    <updated>2012-03-12T00:12:10Z</updated>

    <summary><![CDATA[Like anyone else with a brain and a heart, I love Top Gear. &nbsp;In fact, I love it so much that I find myself borrowing wisdom from the show and applying it to other domains entirely. As you may know, there's a...]]></summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<div>Like anyone else with a brain and a heart, I love Top Gear. &nbsp;In fact, I love it so much that I find myself borrowing wisdom from the show and applying it to other domains entirely.</div><div><br /></div><div>As you may know, there's a portion of the show where they put a celebrity in a reasonably priced car, which the celebrity then races around the Top Gear track. &nbsp;We then watch the celebrity watch their lap. &nbsp;Often, the celebrity will say something about how it looks as if they're going really slow. &nbsp;Jeremy Clarkson always has the same retort: if you look like you're going slow, you're probably going quite fast.</div><div><br /></div><div>This idea maps nicely to software. &nbsp;To the uninformed, it looks like we're going slow when we write tests. &nbsp;It looks like we're going slow when we learn and utilize new tools like Hadoop or Cassandra. &nbsp;It looks like we're going slow when we perform A/B tests. &nbsp;It looks like we're going slow when we pair program. &nbsp;And yet, all of these "slow" activities are tremendously helpful to going fast in the long run.</div><div><br /></div><div>Conversely, if it looks like you're writing software quickly, you're probably not. &nbsp;Yes, you can get all cranked up on caffeine, code for 20 hours, and write a thousand lines of code. &nbsp;You can do that day after day, just coding, releasing gigantic features constantly. &nbsp;I think you'd quickly encounter a day of reckoning. &nbsp;Things would burst into flames, and you'd realize it's more trouble than it's worth to fix this mountain of code. &nbsp;You'd realize that it looked as if you were going fast, but you were really going slow.</div> ]]>
        
    </content>
</entry>

<entry>
    <title>Bytes Matter</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/02/bytes-matter.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1213</id>

    <published>2012-02-28T03:57:37Z</published>
    <updated>2012-02-28T04:05:10Z</updated>

    <summary>I love to profile applications, because I always learn something that surprises me. Initial Profiler Surprise: Client Side Case in point, I was recently profiling our Android application, the Famigo Sandbox. This app sends a lot of data back and...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="mobiledevelopment" label="mobiledevelopment" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mongodb" label="mongodb" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="scalability" label="scalability" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>I love to profile applications, because I always learn something that surprises me.</p>

<p><strong>Initial Profiler Surprise: Client Side</strong><br/>
Case in point, I was recently profiling our Android application, the <a href="http://www.famigo.com/sandbox/">Famigo Sandbox</a>.  This app sends a lot of data back and forth with our API, as we try to determine which of the apps on your phone are safe for your kids.  I always assumed that, if app performance suffered during some of the chattier features, it was probably due to slow cell reception.</p>

<p>The profiler told me that I was wrong; the transfer time was almost always negligible.  What <i>wasn't</i> negligible was the amount of CPU time it took to parse the JSON coming from the API into native types.  (Note that I'm measuring JSON parse time across an average app session, not just for one call.)</p>

<p>Like most JSON decoders, we parse everything, regardless of whether we use it or not.  I took another look at our API responses and learned that our app actually didn't need half of what we were sending.</p>

<p>Now, we weren't doing anything too crazy on any individual API call.  We consistently returned too much data everywhere, though, across many API calls.  In aggregate, <b>these bytes mattered</b>.  Once we learned this, we streamlined the data returned from our API and quickly saw our JSON parsing bottleneck go away.</p>
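
<p>To get a feel for the effect, here is a rough sketch in Python rather than the Android code we actually profiled.  The record shape, field names, and sizes are all invented for illustration; the only point is that a decoder parses every byte it is handed, whether or not the caller reads the result.</p>

```python
import json
import timeit

# Hypothetical API response: each record carries fields the client never reads.
full_records = [
    {"id": i, "name": f"app-{i}", "rating": 4.5,
     "description": "x" * 500, "screenshots": ["url"] * 10}
    for i in range(200)
]

# The fields this imaginary client actually uses.
NEEDED_FIELDS = ("id", "name", "rating")
trimmed_records = [{key: record[key] for key in NEEDED_FIELDS} for record in full_records]

full_payload = json.dumps(full_records)
trimmed_payload = json.dumps(trimmed_records)

# Time repeated parsing of each payload; exact numbers vary by machine.
full_time = timeit.timeit(lambda: json.loads(full_payload), number=50)
trimmed_time = timeit.timeit(lambda: json.loads(trimmed_payload), number=50)

print(f"payload bytes: full={len(full_payload)} trimmed={len(trimmed_payload)}")
print(f"parse time:    full={full_time:.4f}s trimmed={trimmed_time:.4f}s")
```

<p>The timings will differ from run to run, so treat them as a direction, not a benchmark.</p>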

<p><strong>Subsequent Profiler Surprise: Server Side</strong><br/>
Here, I was profiling our website, which is essentially <a href="http://www.famigo.com">an app recommendation engine for families</a>.  We consistently see some calls take a long time, and I assumed it was the complexity of the queries.  For example, our queries to find and sort <a href="http://www.famigo.com/best-iphone-apps/">the best iphone apps</a> or <a href="http://www.famigo.com/newest-free-android-apps/">free android apps</a> take into account a lot of disparate data from our own reviewers, the app stores, and all of our family users.</p>

<p>When I profiled these calls again, I was shocked.  The queries were actually well-tuned (as the author of these queries, yes, this is shocking); the slowness was coming from the ORM (pedantic note: it's really an ODM - <em>shakes TI85 threateningly</em>) we use to turn our MongoDB documents into our lovely Python models.</p>

<p>This problem was actually very similar to the problem seen in our Android app.  MongoDB documents are encoded in BSON, which is very similar to JSON, and our ORM is responsible for parsing that BSON into usable types.  On almost all of these queries, we were asking our db drivers to parse the entire document when we really only needed a small subset (1/3 or 1/4) of the fields.  That's hardly noticeable when you're dealing with a few documents, but it becomes quite a bottleneck with thousands of documents.  Again, I realized that bytes matter.</p>

<p>Once I figured out the problem, the fix was easy.  Instead of asking for every field on every document in the query, I simply specified the fields I wanted.  When this change went live, the bottleneck disappeared and we got an easy 40% improvement in average render time.</p>
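
<p>As a minimal sketch of what "specifying the fields" can look like: this assumes you are talking to the pymongo driver directly, whose <code>find()</code> takes a projection document as its second argument.  Your ODM will have its own spelling for the same idea, and the helper name and field names here are hypothetical.</p>

```python
def find_fields(collection, query, fields):
    """Run a query, asking the database to return only the named fields.

    `collection` is assumed to be a pymongo Collection (or anything with a
    compatible find(filter, projection) method).
    """
    projection = {name: 1 for name in fields}  # 1 means "include this field"
    return collection.find(query, projection)


# Hypothetical usage, given a pymongo database handle `db`:
#   cursor = find_fields(db.apps, {"platform": "iphone"}, ["name", "rating"])
```

<p>MongoDB still returns <code>_id</code> unless you exclude it explicitly, which is usually what you want anyway.</p>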

<p><strong>Let Us Conclude</strong><br/>
I don't think I need to restate this, but I will, because it's my website and we hammer points into the ground 'round these parts.  The lesson is that the more data you return, the more you must process.</p>

<p>This is so basic that it's often easy to ignore entirely.  However, once you have real users and real data, bytes matter, and they matter more and more as you scale.  Use them wisely.</p>
]]>
        

    </content>
</entry>

<entry>
    <title> Understanding-Driven Development</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/02/understanding-driven-development.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1211</id>

    <published>2012-02-17T03:35:17Z</published>
    <updated>2012-02-17T03:41:16Z</updated>

    <summary>I have a weird idea. What if, with every change we made to our codebase, we tried to increase our understanding of it a little bit? Entropy Tries to Thwart Us This is challenging because codebases always go in the...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="implementation" label="implementation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="process" label="process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>I have a weird idea.  What if, with every change we made to our codebase, we tried to increase our understanding of it a little bit?</p>

<p><b>Entropy Tries to Thwart Us</b><br/>
This is challenging because codebases always go in the opposite direction.  As you make more changes and new people join the team, everybody understands less and less of what ought to be happening; the fact the code works at all is nearly miraculous!  Soon, everyone who touches the codebase adopts an "If it ain't broke, don't fix it" attitude.</p>

<p>Success depends on understanding, though.  We have to understand the code to add new features, fix important bugs, refactor, and bring new teammates aboard.  Not only that, but problems that are deeper than code, like architecture and scalability, can't be addressed without first understanding.</p>

<p><b>Understanding Must Be Widely Distributed</b><br/>
One person understanding isn't enough.  After all, what happens if that one person gets eaten by a komodo dragon?</p>

<p>There are deeper problems than that, though.  Imagine that your brain becomes tightly coupled with a bit of code.  The first problem is that your brain is faulty, and you will forget.  The second (scarier) problem is that, if you're the only person who understands a piece of code, you own it and you'll maintain it.  Forever.  It doesn't matter what else you progress to; when a problem arises with that code, it's your problem.  That encourages context switching and lots of tiny, strange code silos.</p>

<p><b>How to Create Understanding</b><br/>
How do you increase understanding on a large scale, then?  Let's go through a few approaches, none of which are earth shattering.</p>

<ol>
<li><p>Automated tests.  When you have simple, isolated tests that are run often, it means anyone can learn about the code, make a change, see the effects, and feel good about the work they just did; you are creating understanding.  Unit tests, BDD-style tests, integration tests?  All of these work.</p></li>
<li><p>Refactoring.  As you are adding features or fixing bugs, you can create understanding if you're constantly working to make the code as clear as possible.  The great thing about these changes is that they can be trivial.  One technique is just to revisit the names used in a chunk of old code.  If a variable contains sales invoices and you change its name from temp to sales_invoices, you have succeeded.  Make more changes like that!</p></li>
<li><p>Documentation.  Yes, documentation can create understanding, but only if it accurately reflects the current state of your code.  The most effective way to do this is to generate it dynamically based on the code itself: method signatures, assertions, url routes, the requirements stated in your BDD tests.</p></li>
<li><p>Environment automation.  There are probably a lot of magical bits in your environment.  Maybe your build process doesn't work unless this one particular directory is owned by this one particular user, or your CDN occasionally serves up old assets and you have to poke around in the Amazon Web Services dashboard to fix it.  These weird workarounds are often simple, but you encounter them infrequently enough that no one remembers exactly what happened or why.  Do your brains a favor: automate all of this.  Once it's written, it can be understood.</p></li>
</ol>
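
<p>The renaming technique from the refactoring item is small enough to show in full.  The function names and invoice shape below are invented for illustration; only the temp to sales_invoices rename comes from the list above.</p>

```python
# Before: the names say nothing about what the data is.
def total(temp):
    return sum(t["amount"] for t in temp)


# After: identical behavior, but a reader no longer has to guess.
def total_invoice_amount(sales_invoices):
    return sum(invoice["amount"] for invoice in sales_invoices)


invoices = [{"amount": 10}, {"amount": 5}]
print(total(invoices), total_invoice_amount(invoices))  # prints: 15 15
```

<p>Nothing about the logic changed, which is exactly why this kind of change is cheap to make alongside a bug fix.</p>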

<p><b>How to Create Misunderstanding</b><br/>
You can easily abuse all of the methods I just described, and actually use them to create misunderstanding.</p>

<ol>
<li><p>A test creates misunderstanding if it depends on data that's changed by other tests.  If your tests don't repeatably succeed, regardless of order, you're causing confusion.</p></li>
<li><p>Refactoring can create misunderstanding if you take well-understood code and change it dramatically, without also writing tests.</p></li>
<li><p>Documentation often causes more harm than good.  Think about the nearest gigantic, outdated Word doc, or the comments in your code you fail to revise as you refactor.  At some point, someone will read that and get confused.</p></li>
<li><p>Environment automation causes misunderstanding if it doesn't accurately reflect the state of your environment.  Maybe you have some disaster recovery scripts lying around.  Do they work, or would looking at them only give you misconceptions about the way your environment used to look?</p></li>
</ol>
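
<p>For the first item, one common way to keep tests from depending on each other's data is to build fresh data before every test.  Here is a small sketch using Python's standard unittest module; the test class and invoice data are made up for illustration.</p>

```python
import unittest


class InvoiceTotalTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test starts from the same
        # data no matter which tests ran first or what they changed.
        self.invoices = [{"amount": 10}, {"amount": 5}]

    def test_total(self):
        self.assertEqual(sum(i["amount"] for i in self.invoices), 15)

    def test_total_after_adding_invoice(self):
        # Mutating the data here cannot leak into test_total.
        self.invoices.append({"amount": 1})
        self.assertEqual(sum(i["amount"] for i in self.invoices), 16)
```

<p>If the list lived at module level instead of in setUp, the result of test_total would depend on whether the other test had already run.</p>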

<p><b>Conclusion: Be Smarter.</b><br/>
Ultimately, software development is really, really hard.  We have to think at every level, from single bits to clusters of super-powered VMs.  The best (only?) way to work effectively together and build great things is to constantly and collectively work towards a better understanding of our code.</p>
]]>
        

    </content>
</entry>

<entry>
    <title> Things Get Weird at Scale</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/01/things-get-weird-at-scale.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1210</id>

    <published>2012-01-23T16:57:00Z</published>
    <updated>2012-01-23T19:06:54Z</updated>

    <summary>Something scary happened on Saturday. At Famigo, we have several different monitoring systems for our production environment. At about 3 AM, they all collectively went nuts. I happened to be up then because my son had a coughing fit, so...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="scalability" label="scalability" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>Something scary happened on Saturday.  At <a href="http://www.famigo.com">Famigo</a>, we have several different monitoring systems for our production environment.  At about 3 AM, they all collectively went nuts.  I happened to be up then because my son had a coughing fit, so I checked the site, verified that nothing weird was happening, and chalked it up to code gremlins.</p>

<p>When I woke up a few hours later, I saw roughly 38 gazillion more alerts.  Based on the amount of load we were seeing, you'd think Barack Obama had interrupted all network programming to give a Famigo plug (note to Barack: you should do this).  Still, I didn't actually see many users on the site, just a lot of load.  Also, we're load balanced in production, and each traffic-handling node was under heavy load; it wasn't just one machine.</p>

<p><b>Maybe It's Not a Problem?</b><br/>
Saturday mornings are a key time for us at Famigo because we send out a report to each activated user, showing what apps their kids played along with some personalized app recommendations.  When I saw the load, I immediately thought something was wrong with the email generation process.  Maybe we dropped an index somewhere and the queries were suddenly taking a long time?  I resolved not to worry any more about this until the emails were done.</p>

<p>Here's the problem: the emails wouldn't finish.  They were going so slowly due to the load that we'd be sending these emails for days.  I now began to worry more.  I canceled our email task and began troubleshooting in earnest.</p>

<p><b>Bizarre Facts Emerge</b><br/>
As I SSHed into our various boxes, I noticed, via a top command, that it was the Python process serving our website and API that was consuming 100% of the CPUs.  That was intriguing.  I restarted all of the usual suspects (MongoDB, lighttpd, our web app), only to see things quickly begin to degrade again.  Within a few minutes, Python was once again consuming 100% of the CPU.</p>

<p>Like most web apps, we don't really do much work that's CPU-bound.  In fact, one of the laws of performance I've learned is that if you do have work that's CPU-intensive, always do it in the background.  And yet, we were clearly taxing the CPU.  Maybe we accidentally pushed a commit that attempts to generate pi to 1 million digits every time someone made a web request?</p>

<p>It seemed clear that the issue was with our code.  Here's another weird thing: we hadn't changed much lately.  I went through all of the commits for the past 2 days, and it was all pretty boring stuff.  Just to verify, I pointed my dev instance to our production database (don't try this at home, kids) and began to actually navigate through these recent changes.  Like I initially thought, there was nothing earth-shattering; it was all pretty standard web stuff.</p>

<p>At this point, I began to think I was hallucinating.  I hadn't seen <a href="http://www.imdb.com/title/tt0172493/">Girl, Interrupted</a>, but I imagined that Winona Ryder got committed in that movie because of the mental strain of debugging in production.</p>

<p><b>A-ha! (Or, How to Solve Problems Through Random Mouse Clicks)</b><br/>
I didn't know what to do.  I just began clicking around our site on my dev instance, monitoring page load time in Chrome.  Everything looked just fine on my dev instance at first, and then I began to notice one trivial view taking a little bit longer than it should've.  The page itself should've rendered in a second or so, and it was taking closer to 1.5 seconds.  The more I reloaded, the longer it took.  This was particularly interesting because this view was our application view (<a href="http://www.famigo.com/app/lame-castle-1/">here's an example</a>).  We can render over 30,000 apps with that view, so while the logic is very simple, it's constantly being rendered because of all the traffic.</p>

<p>Now, we get to the funny part.  As I mentioned, we analyze tens of thousands of apps and, if you knew the right app slug, you could actually render any app, even dirty ones, with that view.  While we didn't advertise this fact, you could get to stuff like famigo.com/app/super-sexy-sex-time/.  Those are clearly not the apps we want families to see, even progressive European families.  So, I had recently put in just a bit of work to keep sex apps from rendering.  Every time someone requested an app, we'd check to make sure it was in the set of allowed, non-sex apps before we rendered anything.</p>

<p>It's really just one line of code that does the no-sex-app check.  The no-sex-app check wasn't being done in the database; I was basically saying, in Python, 'raise 404 if app not in good_apps'.  That's so simple!  On my development instance, it worked fine.  It didn't slow down our unit tests.  However, when I wrote a quick bash script to request that URL 10 times simultaneously, things began to explode.  In production, where we regularly have 25+ visitors requesting that URL at all times, everything truly burst into flames.  It turns out that this particular view <b>was</b> CPU-bound; I just didn't know it until it encountered some scale.</p>
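
<p>I'm not showing the real code here, so take this as one plausible way an innocent 'not in' check gets expensive, not a reconstruction.  The sketch assumes good_apps is a plain Python list of about 30,000 slugs; with a list, every membership test walks the elements one by one, while a set does a hash lookup.  All names and data are hypothetical.</p>

```python
import timeit

# Hypothetical data: roughly the number of apps we can render.
good_app_list = [f"app-{i}" for i in range(30000)]
good_app_set = set(good_app_list)

slug = "app-29999"  # worst case for the list: the very last element

# The same membership check against each container, repeated 200 times.
list_time = timeit.timeit(lambda: slug in good_app_list, number=200)
set_time = timeit.timeit(lambda: slug in good_app_set, number=200)

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

<p>One check like this is invisible on a dev box.  Dozens of them running at once, on every request to the busiest view, is a different story.</p>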

<p><b>Scale Drives You Mental</b><br/>
I think this is a fascinating bug.  It's very simple logic that would work fine if we had 3 or 5 people on the site at any moment.  With 10x that traffic, it was catastrophic to the rest of the platform.  Imagine how crazy this gets at 100x or 10,000x that traffic.  That's what makes scalability fun: gigantic issues at scale often come from very innocuous code.  At least for me, the root cause is never what I expect.</p>

<p>How do we prevent this from happening in the future?  I'm not entirely sure.  It's not really something that fits into a unit test or an integration test.  As a stopgap solution, we wrote a Python decorator that wraps all of our views and logs how long they took to render.  Based on that, we can calculate how long a view should take to render, and alert ourselves if the render time is outside a reasonable span of time.  It's not perfect, but it's a start.  Anybody have a better idea?</p>
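
<p>Here's a stripped-down sketch of that kind of decorator.  It isn't our production code: the logger name, the fixed threshold, and the example view are assumptions for illustration, and a real version would derive the expected render time from the logged history instead of hard-coding it.</p>

```python
import functools
import logging
import time

logger = logging.getLogger("view_timing")  # hypothetical logger name


def log_render_time(threshold_seconds=1.0):
    """Log how long a view takes and warn when it exceeds the threshold."""
    def decorator(view):
        @functools.wraps(view)  # keep the view's name and docstring intact
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return view(*args, **kwargs)
            finally:
                # Runs whether the view returned normally or raised.
                elapsed = time.perf_counter() - start
                logger.info("%s rendered in %.3fs", view.__name__, elapsed)
                if elapsed > threshold_seconds:
                    logger.warning("%s took longer than %.1fs",
                                   view.__name__, threshold_seconds)
        return wrapper
    return decorator


@log_render_time(threshold_seconds=0.5)
def app_detail(request, slug):
    # Stand-in for a real Django view.
    return f"rendered {slug}"
```

<p>Because the timing lives in a finally block, views that raise an exception still get logged, which is often when you most want the number.</p>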
]]>
        

    </content>
</entry>

<entry>
    <title> The Beautiful Marriage of MongoDB and Redis</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2012/01/the-beautiful-marriage-of-mongodb-and-redis.html" />
    <id>tag:www.codypowell.com,2012:/taods//11.1209</id>

    <published>2012-01-08T20:24:58Z</published>
    <updated>2012-01-08T20:26:52Z</updated>

    <summary>I am on the record as being a MongoDB fan, admirer, and devotee. I never quite felt the same way about Redis, though. My friends would talk excitedly about Redis and I&apos;d say, &quot;But I have a perfectly good key...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    <category term="nosql" label="NoSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>I am on the record as being a MongoDB fan, admirer, and devotee.  I never quite felt the same way about Redis, though.  My friends would talk excitedly about Redis and I'd say, "But I have a perfectly good key value store in memcached and a perfectly good document oriented database in MongoDB.  Between those two, I will solve all hard problems, excluding global warming!"  Slowly, though, I began to run into harder problems.</p>

<p><b>A Harder Problem</b><br/>
At <a href="http://www.famigo.com">Famigo</a>, we do many, many distinct, complex queries when it comes to recommending apps for families (eg, give me the top 1000 puzzle games for young adults that are free on the Amazon App Store, sorted by user rating).  Doing all these queries on demand proved to be a little bit slow (average query time is about a second), so I decided to cache the results of each distinct query for 8 hours.  That's slightly more complex, but it wasn't like I was writing an Erlang compiler in Visual Basic.</p>

<p><b>Initial Approaches</b><br/>
Take 1 of cache implementation: Use memcached for the cache.  Ten minutes later, curse memcached for not having an ordered datatype.</p>

<p>Take 2 of cache implementation: Use MongoDB for the cache.  Many minutes later, celebrate success (prematurely).</p>

<p>What did our cached query results look like in MongoDB?  Each document in the cache had a cache key (eg, most-popular-puzzle-games-for-young-adults), an expiration date, an ordinal, and a reference to the application document that we wanted to render.</p>

<p><b>Warning Signs</b><br/>
There were already hints that I was doing it wrong.  Case in point: I had to manage all of the cache expiration myself.  In MongoDB, you can specify a maximum number of documents that a collection can store (which I was doing; I specified a max of 500k docs), but that's not at all the same thing as caching these results for exactly 8 hours.  Speaking of which: hey 10gen, we want TTL collections!</p>

<p>Another sign I was doing it wrong: I had to do a lot of index tuning to make my interactions with the MongoDB cache fast.  Every time I checked the cache, I had to specify the cache key, expiration date, and sort by the ordinal; for that to be fast, <i>all of those</i> needed to be covered by an index.  While the index sped up my finds, it slowed down my inserts.  I had a hell of a time finding the right balance.</p>

<p>Unfortunately, I'm not yet done listing the signs that I wasn't doing it right.  You can't delete from a MongoDB capped collection.  That's no problem if you're just collecting logs, but from time to time, we must invalidate our cache.  Since I couldn't delete these documents from the cache, I had to add another field that stored an Active status, which also required an index, since we had to query by it every time.</p>

<p><b>How Did It Work? (Spoiler Alert: Not So Great.)</b><br/>
We ended up running in production on my MongoDB app query cache for a month or two.  It was definitely faster than performing all of the complex queries in real time (~300ms instead of 1s), but there was a new delay when we had to add results to the cache (~200ms).  As both app data and users scaled up by an order of magnitude, it was clear that this would just burst into flames at some point.</p>

<p><b>A New Solution Emerges!</b><br/>
I decided to try something new.  I knew that Redis had a sorted set datatype, so I started to play with that.  Rather than cache these app query results in MongoDB, I created a sorted set of app ids for each query.  I let Redis handle all of the cache expiration business by setting a TTL value for each key.  When I wanted to pull from the cache, I did so, then did a find in MongoDB using the $in operator with all of the app ids, then I reordered that in Python based on the app ordering in Redis.  I knew it wasn't as pretty, but was it effective?</p>

<p>For my first test, I merely timed how long it took to add a few hundred results to my Redis-backed cache.  With the MongoDB cache, that regularly took around 200ms; it was now down to 1 or 2ms.  Impressive... but then that should be fast.  I refused to be impressed until I started pulling from the cache.</p>

<p>Was it faster to pull the app ids from Redis, use that to pull the documents from MongoDB, then use Python to reorder everything?  Actually, yes.  Thus far, getting from the cache takes 1/3 of the time that it did before.  Meanwhile, adding to the cache is essentially free.</p>

<p><b>How Not to Do MongoDB, or Any Other Datastore</b><br/>
It turns out that, technically, I was correct.  I <i>could</i> use MongoDB as a key-value store for caching, much like I <i>could</i> use my Mazda 3 as an amphibious assault vehicle.  In practice, neither would be optimized for those use cases.</p>

<p>A key part of determining your architecture is understanding the strengths and weaknesses of your technology choices.  The primary strength of MongoDB is how it allows you to simplify and decouple your data modeling via document-orientation.  What about Redis?  Its primary strength is how it enables very fast access to a few key data structures, like sets and dictionaries.  With both of those stated, it becomes clear in which situations you can combine MongoDB and Redis to build delightful software.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Building Software Is like Escaping from Prison</title>
    <link rel="alternate" type="text/html" href="http://www.codypowell.com/taods/2011/12/building-software-is-like-escaping-from-prison.html" />
    <id>tag:www.codypowell.com,2011:/taods//11.1208</id>

    <published>2011-12-23T19:05:17Z</published>
    <updated>2011-12-23T19:11:11Z</updated>

    <summary>If there&apos;s one thing that the earth has enough of, it&apos;s social media professionals. If there&apos;s another thing that the earth has enough of, it&apos;s software development analogies. Regardless, I&apos;m going to spin one here. You know what building software...</summary>
    <author>
        <name>Cody</name>
        <uri>http://www.codypowell.com</uri>
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://www.codypowell.com/taods/">
        <![CDATA[<p>If there's one thing that the earth has enough of, it's social media professionals.  If there's <i>another</i> thing that the earth has enough of, it's software development analogies.  Regardless, I'm going to spin one here.  You know what building software is like?  It's like escaping from prison.</p>

<p>Think back to your favorite prison break movie, whether it's <a href="http://www.imdb.com/title/tt0057115/">the Great Escape</a>, <a href="http://www.imdb.com/title/tt0111161/">Shawshank Redemption</a>, the cleverly-titled TV show <a href="http://www.imdb.com/title/tt0455275/">Prison Break</a>, or <a href="http://www.imdb.com/title/tt0117500/">the Rock</a> (which, granted, is about breaking <i>into</i> prison - further proof that Nick Cage doesn't play by your rules).  What was involved?</p>

<p><b>A cast of quirky characters, drawn together by a shared goal.</b><br/>
On a prison break team, you might have a group of guys digging the tunnel, another group laying down the track inside the tunnel, and then some more guys securing fake IDs and afro wigs for once you've escaped.  There's always (ALWAYS!) a scene earlier in the movie showing how none of these guys liked each other originally.  Once they settle on the idea of a prison break, they quickly become inseparable and ready to lay down their lives for each other.</p>

<p>Between developers, marketing, biz dev, management, and investors, the typical software organization is a collection of people who'd never gather together for any other reason.  It rivals the most dysfunctional family in the world.  And yet, the end goal, be it world domination, billions of dollars, or just happier users, is enough to get this odd bunch of folks to put in thousands of hours in extremely stressful situations.  Weekends are skipped, holidays go unobserved, and your kids come to refer to you as 'That smelly guy who drops by occasionally to swear at us and change underwear', all for the sake of the software.</p>

<p><b>A group of antagonists, out to thwart the escape.</b><br/>
In the prison break movie, there's always a group of guards charged with preventing the prisoners from escaping.  We might get a scene showing what happens when some other group tried and failed to escape.  We come to learn that these guards are some rough hombres.</p>

<p>Similarly, your competition is there to keep you from the rich rewards you'll gain upon your glorious software release.  This competition could be a competing company in your market, or it could be another department in your organization.  Regardless, an inordinate amount of energy is spent worrying about these antagonists.</p>

<p><b>Fortunately, the antagonists aren't very smart.</b><br/>
Remember how, in all of those movies, the prisoners empty one handful of dirt at a time into the yard, so no one realizes they're tunneling out?  And how they put a dummy made out of socks in their beds each night so the guards don't notice their absence?  If the guards were actually intelligent and engaged, they'd probably notice that kind of thing.  They never do.</p>

<p>Your software group's antagonists are probably the same.  You'll spend a ton of time worrying that they've figured out what you're up to, based on a few sentences on your website or a line in some PR piece that slipped out.  Oh God, maybe they even signed up for a beta account!  And then you'll worry that once they figure this out, they'll beat you to market and steal the money, fame, and silk snuggies that accompany both of those.  This will probably never happen either.  </p>

<p><b>Months and months of drudgery.</b><br/>
Unsurprising fact: it's a lot of work to tunnel through a building, under the prison yard, and out to safety.  Especially when all you have is a spoon.</p>

<p>Along the same lines, it's a lot of work to write software.  No matter how many practices, methodologies, and tools we use, it is ultimately just a hell of a lot of typing.  Not all of this is exciting; you'd probably rather dig through a wall with a spoon than revisit your password reset logic.  Nonetheless, the goal depends on this, so we type and type and type.</p>

<p><b>Followed by short bursts of brain-melting terror.</b><br/>
What happens when the prisoners finally escape?  All hell breaks loose: somebody realizes the fake IDs never showed up, there's a police-looking-guy in front of the rally point, and the prisoners can hear the guards and their dogs a few hundred yards away.</p>

<p>A software release is quite similar.  What should be a triumphant moment quickly turns sour, as the production environment goes down, Google de-indexes us for some reason, and we discover a bug that's led us to charge some people 100x more than we ought to.</p>

<p><b>Hooray, we escaped from prison and everything's great!  Except for when it isn't.</b><br/>
Here's the surprising part: for all the cliches, prison break movies diverge when it comes to the conclusion. Some end happily, with everyone free.  In some, no one escapes.  And then some end on a mixed note, with a few prisoners escaping and the rest being captured/shot/nibbled on by guard dogs.  In general, the endings aren't anything to be depressed over, because there's always a high chance of a sequel.</p>

<p>Software works the same way.  There are endings that are happy, non-happy, and all emotions in between.  If that's the case and we're not guaranteed success regardless of the sacrifices, then we should focus on the practice itself of software development.  Let's have fun, treat each other excellently, and learn.  Let's understand that there's an element of luck in these results, but that shouldn't detract from our satisfaction in doing a job really well and building something wonderful.</p>

<p>If we take over the world in the process, that's all the better.  Even if we don't, we'll still be in fine position for the sequel.</p>

<p><i>Many thanks to everybody who helped me think of prison break cliches on Twitter.</i></p>
]]>
        

    </content>
</entry>

</feed>
