June 2012 Archives

Can't Fix It? Go Home.

We ran into a positively bewildering bug in our Android app last week.  It was the perfect storm of bug-dom: it looked incredibly simple, but actually debugging it was horrendous due to several threads, API calls, and strangely-timed UI events.  On top of everything, this bug didn't really affect any user; it just made the UI look slightly weird if you were paying a lot of attention.

If you look at the Famigo Sandbox app, you'll notice there's a scroller at the top where we list app recommendations.  The issue we had was that the app scroller would occasionally list a blank app.  Instead of an app title and an icon, we'd show the literal text "TextView" as the app name and no icon at all.  You can scroll through hundreds of app recommendations, but you'd only get the blank app once.  Sounds trivial, no?

Last week, we were readying a great new version of the app to push to Google Play.  We had one known issue left: the blank spot in the app scroller.  A few of us had actually spent a bit of time looking at this bug in the past, but no one had figured it out.  We had a bit of time in our schedule before we needed to push the app, and I thought it'd be great to finally nail that bug.  Fixing it would mean no known issues; hooray!  Also, even though the bug was largely harmless, I worried that it'd be a broken window that might lead us towards sloppiness in the future.  

I paired up with John, one of our developers, to squash that sucker.  When we sat down, we fully expected to fix the bug within the hour; I think I actually said that out loud.  (Sidenote: never do that on a weird bug.  If you do say something like that, the software gods might overhear and punish you for your hubris.)

We began creating breakpoints, logging everything, and stepping through the code.  Within a few minutes, I was baffled; this particular bit of functionality was far more complex than I thought.  Due to the number of threads, it was even hard to figure out if it was an issue with the rendering logic or with the underlying data structure.  Our joint mindset quickly went from "Haha, let's solve this silly bug" to "Hmm, interesting" to "I don't get it" to "Is there another line of work we're qualified for?  Maybe garbage men, or is that a union deal?"

After several hours, we literally had no idea what the problem might be.  The rest of the office seemed to really enjoy our sighs, profanity, and nonsensical ramblings as we talked through what might be happening.

It was tempting to stay at the office until we found the bug.  I have tried that before, and I found that, past a certain point, my efforts become detrimental.  I get tired, I mess things up even worse, and I spend the entire next day just working my way back to the original bug.  We made a note of what we were last looking at, then we left the office broken and dejected.

That night, I actually dreamed about the bug.  Yep, I couldn't get away from it even in my sleep.  (Even worse, in the dream, I was pair programming with Dog the Bounty Hunter.  Let's all choose not to analyze that.)

Both John and I got back into the office early that next day.  I imagine we looked like a couple of grizzled, old soldiers headed into battle, since no one dared to joke about the bug or even make eye contact with us.  As we sat down together, I thought it made sense to put a time limit on our debugging.  If we couldn't fix this ridiculous, silly bug in 90 minutes, then it was a sign from the cosmos that the app was destined to ship with one empty app in the app scroller.

We referred to the note we had made the night before on where to pick back up, and we jumped back into the code.  About 10 lines below that spot, we saw something very, very strange.  If we got a certain response from the API, we would insert a blank app into the app scroller to keep the numbering even across our paged requests.  "That's... weird," we both said, trying not to sound optimistic.

We deleted that chunk of code, tested the app, and saw that everything now worked great.  We had 72 minutes left on the timer, and we had actually broken for 5 minutes for our standup!  We had been looking right at that function the night before, and we missed the issue entirely.
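
For the curious, here is a rough sketch of what that kind of padding logic can look like. This is not the actual Famigo code. Every name in it (AppScrollerSketch, appendPage, labelFor, PAGE_SIZE) is hypothetical, and it assumes the "certain response" was a page that came back with fewer apps than expected. It also assumes that a blank entry leaves the layout's default "TextView" label in place, which would match what showed up on screen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the real app.
class AppScrollerSketch {
    // Assumed number of apps per paged API request.
    static final int PAGE_SIZE = 3;

    // Stand-in for the default text a layout's TextView shows if nothing sets it.
    static final String LAYOUT_DEFAULT_LABEL = "TextView";

    /** Appends one page of API results to the scroller's backing list. */
    static void appendPage(List<String> scrollerItems, List<String> page, boolean padShortPages) {
        scrollerItems.addAll(page);
        if (padShortPages) {
            // The suspect chunk: pad a short page with blank (null) apps so that
            // item index / PAGE_SIZE still lines up with the page number.
            for (int i = page.size(); i < PAGE_SIZE; i++) {
                scrollerItems.add(null);
            }
        }
    }

    /** The label a row would display; a blank app never replaces the default. */
    static String labelFor(String appTitle) {
        return appTitle == null ? LAYOUT_DEFAULT_LABEL : appTitle;
    }

    /** Loads three pages (the middle one short) and returns the visible labels. */
    static List<String> visibleLabels(boolean padShortPages) {
        List<String> items = new ArrayList<>();
        appendPage(items, Arrays.asList("App A", "App B", "App C"), padShortPages);
        appendPage(items, Arrays.asList("App D", "App E"), padShortPages);
        appendPage(items, Arrays.asList("App F", "App G", "App H"), padShortPages);

        List<String> labels = new ArrayList<>();
        for (String title : items) {
            labels.add(labelFor(title));
        }
        return labels;
    }

    public static void main(String[] args) {
        // With padding: [App A, App B, App C, App D, App E, TextView, App F, App G, App H]
        System.out.println(visibleLabels(true));
        // Without padding: [App A, App B, App C, App D, App E, App F, App G, App H]
        System.out.println(visibleLabels(false));
    }
}
```

In a sketch like this, the blank row shows up exactly once per short page, which is why it is so easy to scroll past hundreds of apps and see it only one time. Dropping the padding branch is the equivalent of the chunk we deleted.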

The lesson?  When you've stared at something for hours and still don't get it, go home. You'll see the problem with new eyes tomorrow.

The first week of a new development job is usually a sludge pit of paperwork, orientation, and environment configuration. Often, it's the worst week you'll have at that job. We recently had two interns join the Famigo development team for the summer, which led to an interesting question: is there a better way to do all that?

As soon as the interns arrived, I set out a goal for them: push code to production on your first day. While you can't avoid the paperwork and orientation part of a new job, at least they'd be contributing from the very beginning. Why is that important?

  • In order to push to production, you'll need a development environment set up.
  • You'll also need a bit of understanding about the codebase.
  • You'll need to understand some of the core concepts behind our process: unit testing, continuous deployment, etc.
  • It sets a good precedent. We're a startup here; we're allowed to move fast.

Is it reasonable to expect an intern to handle all of that on their first day? No, not on their own. Rather, each intern paired up with an experienced developer. The catch: the intern did the typing. I think that works pretty well, for a few reasons.

  • The new person gets firsthand experience with the environment and dev tools. It's incredibly helpful to actually hit the keys yourself.
  • If an error pops up (spoiler alert: it totally will), there's an experienced person right there to help.
  • The new person gets a guided tour of the codebase, but they're the ones doing the navigation, so they're more likely to remember what's where.

This process actually worked a little too well. With the experienced person guiding the process and the new person doing the typing, we actually had both interns push quality code before lunch. Unfortunately for them, that meant they then had to dive into paperwork. Oh well, that's employment for you.

About the Author

The Art of Delightful Software is written by Cody Powell. I'm currently Director of Engineering at TUNE here in Seattle. Before that, I worked on Amazon Video. Before that, I was CTO at Famigo, a venture-funded startup that helped families find and manage mobile content.

Twitter: @codypo
Github: codypo
LinkedIn: codypo's profile
Email: firstname + firstname lastname dot com