November 2011 Archives

MongoDB Is a Tool, Not THE Tool


There's been a lot of angst (exhibit A, B) and counter-angst (exhibit C, D) directed at MongoDB lately. We're enthusiastic users of MongoDB at Famigo but we're not zealots, so our approach towards Mongo may be instructive to others.

Our take on MongoDB: it's a great tool, but it's not the only tool. MongoDB is fast, easy to administer, and has a great API for many different use cases. When you find yourself in a situation where those three no longer apply, you should use a different database. Let's consider these, point by point.

MongoDB is fast
This has long been one of Mongo's selling points; if you're gullible enough to believe database benchmarks, there's proof scattered about the web. Lately, there's been much debate about how some of 10gen's design decisions could potentially kill db performance, particularly Mongo's global write lock. Under a write-heavy load (which many Mongo instances carry), writes could queue up behind that lock and Mongo could bottleneck on it, which would be catastrophic for performance.

That sounds terrifying (global write lock omgwtfbbq?!?), but we have never experienced this. Under a reasonable load, you're unlikely to run into this if you follow Mongo's one guiding performance principle: your working set must fit into physical RAM. I cannot overemphasize this point. Mongo relies on memory-mapped files for performance, and you'll see a gigantic degradation in response time if it must go to the hard drive to read or write.

Remember, for outstanding performance, your working set must fit into physical memory, not virtual memory. If your data is too large for that, you should either shard it so that it will fit into physical memory on multiple machines, or choose another database.
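To make the "shard it or switch" decision concrete, here's a back-of-the-envelope sketch. It's not anything Mongo ships; the function name, the 80% headroom figure, and the assumption that data spreads evenly across shards are all ours, so treat the numbers as a starting point rather than a capacity plan.

```python
import math


def shards_needed(working_set_gb: float, ram_per_machine_gb: float,
                  headroom: float = 0.8) -> int:
    """Estimate how many machines keep the working set in physical RAM.

    Assumptions (ours, not Mongo's): data is spread evenly across shards,
    and only `headroom` of each machine's RAM is available to the database.
    """
    if working_set_gb < 0 or ram_per_machine_gb <= 0 or not 0 < headroom <= 1:
        raise ValueError("sizes must be positive and headroom in (0, 1]")
    usable_gb = ram_per_machine_gb * headroom
    # At least one machine, even for a tiny working set.
    return max(1, math.ceil(working_set_gb / usable_gb))


# A 10 GB working set on a 16 GB box fits on one machine;
# a 100 GB working set on the same hardware needs several.
small = shards_needed(10, 16)
large = shards_needed(100, 16)
```

If the answer is 1, you're fine on a single instance. If it's a number of machines you aren't willing to run, that's the cue to look at another database.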

MongoDB is easy to administer
It's quite simple to configure MongoDB for replication; if you know the difference between bash and batman, you could do it in less than 30 minutes. The same goes for sharding. Compared to the amount of time I've spent configuring and freaking out over Oracle clusters of similar size, my MongoDB administration time is a rounding error's rounding error.
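For a sense of how little there is to it, here's a rough sketch of a three-member replica set config built in Python. The host names and the set name "rs0" are made up for illustration, and the commented-out pymongo calls assume each mongod was already started with a matching --replSet option.

```python
def build_replica_set_config(set_name, hosts):
    """Build the document that the replSetInitiate command expects.

    Each member gets a unique integer _id and a "host:port" string.
    """
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


# Hypothetical hosts; substitute your own.
config = build_replica_set_config(
    "rs0",
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
)

# With a running mongod and pymongo installed, initiating the set
# would look roughly like this (left commented so the sketch runs anywhere):
#
#   from pymongo import MongoClient
#   client = MongoClient("db1.example.com", 27017)
#   client.admin.command("replSetInitiate", config)
```

That one document is most of the configuration work; the rest is starting the processes and waiting for a primary to be elected.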

There are scary stories here too, particularly with rebalancing shards and the availability of all the requisite services on a large, sharded database under heavy load. In those situations, again, I wonder if MongoDB is the right choice. If you're looking for scalability and high availability via replication, I would try Cassandra. (Random aside: Cassandra's performance actually scales linearly as you add instances, which sounds like magic. Neato graph and other stuff here.)

MongoDB has a great API for many different use cases
Considering that Mongo uses a JSON-like encoding for all its data, the query language is simply amazing (awesomeness ahoy!). Not only that, but there's built-in support for map/reduce across your collections. When it comes to standard CRUD work or ad-hoc querying (via its query language or map/reduce), Mongo delivers nearly everywhere.
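Here's a small illustration of both ideas. The query document uses Mongo's real $gte operator, but the "apps" data and field names are invented, and the map_reduce function is a pure-Python imitation of the concept, not Mongo's implementation (which takes JavaScript functions and runs on the server).

```python
from collections import defaultdict

# A query is just a JSON-like document. With pymongo, this dict would be
# passed to something like db.apps.find(query) on a hypothetical collection.
query = {"platform": "android", "rating": {"$gte": 4}}


def map_reduce(docs, map_fn, reduce_fn):
    """Group every (key, value) pair emitted by map_fn, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def emit_platform(doc):
    # Map step: emit one count per document, keyed by platform.
    yield doc["platform"], 1


def count(key, values):
    # Reduce step: sum the emitted counts for a single key.
    return sum(values)


# Made-up documents standing in for a collection.
apps = [
    {"name": "Shapes", "platform": "android", "rating": 5},
    {"name": "Letters", "platform": "ios", "rating": 4},
    {"name": "Numbers", "platform": "android", "rating": 3},
]

counts = map_reduce(apps, emit_platform, count)
```

The shape is the point: the query is plain data, and the aggregation is two tiny functions.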

Where isn't it so great? One example is full text searching. You can technically kinda do it, but it lacks basic functionality like stemming. Given the sheer number of simple, powerful full text search engines, you should just supplement Mongo with something like Solr for searching. That's what we do.
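If you're wondering why stemming matters, here's a toy demonstration. The documents are invented, and toy_stem only strips a plural "s"; it's nowhere near a real stemmer like the ones a search engine such as Solr provides. It just shows what exact token matching misses.

```python
def tokens(text):
    """Split text into lowercase whitespace-separated tokens."""
    return text.lower().split()


def toy_stem(word):
    # Toy rule: drop a trailing plural "s" from longer words.
    # Real stemmers handle far more (tenses, suffixes, exceptions).
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Made-up documents keyed by id.
docs = {
    1: "Fun math game for kids",
    2: "Puzzle games for toddlers",
}


def search(term, normalize=lambda w: w):
    """Return ids of docs containing the term, after normalizing both sides."""
    wanted = normalize(term.lower())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if wanted in {normalize(t) for t in tokens(text)}
    )


exact_hits = search("game")              # exact matching misses "games"
stemmed_hits = search("game", toy_stem)  # normalizing both sides finds it
```

A user searching for "game" expects both results, which is why we hand this job to a tool built for it.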

Okay, so MongoDB doesn't work superbly for all problems in all deployments at all levels of load. What does?

I like it that Mongo doesn't solve all my problems. One of the great aspects of the NoSQL movement is the sheer number of amazing tools available. I love that, in the course of building great software, I get to work with Mongo, Redis, Solr, and others. It's fun, and I learn; these are good things.

About the Author

The Art of Delightful Software is written by Cody Powell. I'm currently Director of Engineering at TUNE here in Seattle. Before that, I worked on Amazon Video. Before that, I was CTO at Famigo, a venture-funded startup that helped families find and manage mobile content.

Twitter: @codypo
Github: codypo
LinkedIn: codypo's profile
Email: firstname + firstname lastname dot com