Non-Relational Databases Are a Performance Hack
7 July 2009
Posted by manniwood in SQL.
The rather incendiary title of this post borrows a quote from Eric Raymond that is seared into my mind: “Threading is a performance hack.” His full quote, from page 159 (chapter 7) of The Art of Unix Programming, is:
A closely related red herring is threads (that is, multiple concurrent processes sharing the same memory-address space). Threading is a performance hack. To avoid a long diversion here, we’ll examine threads in more detail at the end of this chapter; the summary is that they do not reduce global complexity but rather increase it, and should therefore be avoided save under dire necessity.
I write this in response to a very well-written blog post by Adam Wiggins on why RDBMSs scale vertically and not horizontally. I am with him all the way until he reaches his conclusion:
So where do we go from here? Some might reply “keep trying to make SQL databases scale.” I disagree with that notion. When hundreds of companies and thousands of the brightest programmers and sysadmins have been trying to solve a problem for twenty years and still haven’t managed to come up with an obvious solution that everyone adopts, that says to me the problem is unsolvable. Like Kirk facing the Kobayashi Maru, we can only solve this problem by redefining the question.
I disagree that we should stop trying to make SQL (or, more properly, relational) databases scale.
Now don’t get me wrong. I understand that even though the relational data model does not state anything explicitly about concurrency, transactions, or scalability, one of its requirements is that the database should always be in a consistent state, and that, from a user’s point of view, this consistency is best maintained by making it appear as though all interactions with the database happen serially, not concurrently. I also understand that, in general, this is how the major RDBMSs have implemented the relational model. So this is obviously a huge issue for horizontal scalability! I think Adam Wiggins is spot on when he describes the current difficulties in scaling RDBMSs.
Where I think he’s wrong is in saying we should therefore stop trying to make RDBMSs scale. Look again at Raymond’s statement that threads are a performance hack. We’ve all noticed there are still threaded applications in the world. Why? Sometimes the performance hack is worth it. But, if we could get the performance without the threads, would we willingly use threads? Of course not. Who would want the added complexity?
The same goes for non-relational databases. If Google’s BigTable could have the awesome performance it has today, but still support the relational data model, do you think it would? Of course it would. Who likes having to code all sorts of data integrity checks in their business logic when simply using an RDBMS would take care of all that far more robustly?
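To make that concrete, here is a minimal sketch of what "the RDBMS takes care of it" looks like, using Python's built-in sqlite3 module. The customers and orders tables and the helper names build_db and try_insert_order are my own illustration, not anything taken from Wiggins's post or from BigTable. The integrity rules live in the schema, so the application code contains no validation logic at all.

```python
import sqlite3


def build_db():
    """Create an in-memory database whose schema carries the integrity rules."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " quantity INTEGER NOT NULL CHECK (quantity > 0))"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    conn.commit()  # keep the seed row even if a later insert is rolled back
    return conn


def try_insert_order(conn, customer_id, quantity):
    """Attempt an insert; the database, not this function, decides validity."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)",
                (customer_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_db()
    print(try_insert_order(db, 1, 2))   # valid order
    print(try_insert_order(db, 99, 1))  # no such customer
    print(try_insert_order(db, 1, 0))   # quantity fails the CHECK constraint
```

With a store that offers no such constraints, every code path that writes an order would have to repeat both checks itself, and would have to get them right under concurrent writes as well.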
So, unlike Adam Wiggins, I think we should continue trying to make RDBMSs scale (even if it turns out to be a quixotic quest!) because the relational data model is so useful. I have no desire to poorly re-implement the relational data model in my application code because I have opted to use a database that does not offer such features out of the box.
But if there’s one thing Wiggins got really, really right, it is that sometimes you have no choice but to throw guaranteed data integrity out the window to meet massive scalability requirements. I happily acknowledge that sometimes you just need a performance hack. But please realise that it is a performance hack, not a best practice. It should only be used sometimes, and certainly not as often as current thinking would have you believe.