Only the R Matters in RDBMS 29 June 2009
Posted by manniwood in SQL.trackback
“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
– Frederick P. Brooks, Jr., The Mythical Man Month, chapter 9, section “Representation Is the Essence of Programming”.
Bradford’s blog entry, “Social Media Kills the Database”, really struck a chord with me. Although I am an unabashed RDBMS fanboy, I may be an unusual one, because in some ways, I wish RDBMSs were used less than they are today. Not very fanboy-ish of me, but let me explain.
I was struck by the way Bradford called modern RDBMSs the “Swiss Army knives” of data storage, and referred to their “jack-of-all-trades nature”.
At first I thought that wasn’t a fair characterisation of RDBMSs, because I only use the relational features of RDBMSs, and figured everybody else did too. But then I thought of how many RDBMS vendors have added non-relational capabilities to their database products, like object-relational features and XML support; and how their marketing departments describe their products as all things to all people. And I realised Bradford was on to something: RDBMS vendors are trying to sell their products as the Swiss Army knives of data storage! I would argue that this hurts RDBMSs, and it confuses developers who try to use RDBMSs for things RDBMSs aren’t good at.
So what are RDBMSs good at? They’re pretty good implementations of the relational data model. And the relational data model is a pretty useful way of representing (and thinking about) data.
Now one thing I consider to be a bit of a misperception of RDBMSs is that
Traditional relational databases (like those used in finance) evolved in the 70’s and 80’s as a glorified spreadsheet-and-file-cabinet. The metaphor prevails today.
The truth is actually a bit more peculiar and interesting.
The relational data model began as a set of research papers published by IBM employee E. F. Codd in 1969 and throughout the 1970s. So unlike Google’s BigTable, or Amazon’s SimpleDB, which began as implementations, the relational model of data began as a set of ideas, not as source code.
This really appeals to me: the relational data model strives to be a mathematically and logically rigorous way of representing data, and exists outside of any one implementation. I can’t think of any other data model that this is true of.
In fact, work on the relational model continues to this day! C. J. Date in particular continues to work on, and publish books about, the relational data model. It’s particularly enlightening (not to mention entertaining) to read books like Database in Depth, where Date says that modern RDMBSs — and standard SQL — do a poor job of implementing the relational data model.
But I must say, even though RDBMSs are an imperfect implementation of the relational data model, I still love RDBMSs (and the relational model) to death. Do you think Unix/Linux is cool for being built around a central metaphor (“everything’s a file”)? SQL is way cooler: it tries to be an implementation of an actual data model based in logic and set theory!
Of course, we have to ask: is the relational data model, as implemented by SQL in today’s RDBMSs, actually useful for anything?
Honest answer: it depends.
If you use heavily object-oriented code, and have a bunch of objects that you wish to store and retrieve, don’t use an RDBMS; use an object store. The impedence mismatch is too great. (Here I will link to my favourite blog entry of all time, “The Vietnam of Computer Science”, which says that Object/Relational Mapping is the Vietnam of computer science.)
If you have a bunch of hierarchical data to store, consider a filesystem, or XML. Don’t use an RDBMS.
If you have a bunch of documents to store, consider some sort of document store. Don’t use an RDBMS (unless you need transactions or guarantees of data integrity, and don’t mind storing your documents as BLOBs).
On the other hand, if you have a problem that is particularly data-driven, and you’d like to store and act upon your data in a way that is reasonably rigorous, and follows rules and constraints that you’d like to define, use an RDBMS.
Or, if your problem is particularly data-driven, and you want to use a tool that helps you think and explore ideas in a data-oriented language, use an RDBMS.
In fact, let me repeat Brooks’ quote here:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
Often times, once you’ve correctly understood how to model your problem’s data, you’ll find that you’ve come to understand your problem at a deeper level as well, and the solutions to your problems will have become obvious. RDBMSs and SQL really help here. They are not a panacea, but they are useful tools.
A favourite example of mine is the writings of Philip Greenspun. Most of his code examples are in SQL. For instance the User Registration and Management chapter of Software Engineering for Internet Applications has all of its code examples in SQL. It’s just easier to think of data representation problems in SQL.
If data correctness is paramount, use an RDBMS. If data corruption is unacceptable, use an RDBMS.
Want to know my favourite (paraphrased) definition of what a relational database is? It comes from C. J. Date: A relational database is a series of propositions that always must evaluate to true.
RDBMSs are designed from the ground up not to lose data, not to corrupt data, not to have your data in an inconsistent state. RDBMSs try very hard to live up to C. J. Date’s ideal: every proposition in an RDBMS at any time ideally evaluates to true. This is incredibly useful behaviour for a multitude of purposes.
If you’re going to ask questions of your data, and you need correct, repeatable, reproducible answers, use an RDBMS.
Once you’ve found the best way to model your problem’s data, you can have your RDBMS enforce your schema and reject bad data. This goes a long way to helping you solve your problem and ensuring correct results. The declarative nature of SQL is a real boon for defining data and their relationships. Look at the sample table definitions in the User Registration and Management chapter of Software Engineering for Internet Applications, and tell me if you’d rather define all those relationships and constraints using EJBs. I didn’t think so.
And when I say use an RDBMS, I mean use your RDBMS.
Learn to think in SQL.
Write your data model in a text editor, coding the table and constraint definitions by hand. Keep your schema code in a plain text SQL file that you can load into your database via its command-line client — this will be a runnable/loadable definition of your database. Keep the SQL file under source-code management, as you do with your application code, so that you can maintain the history of your data schema as it evolves.
Do not hide your RDBMS behind an ORM! Realise that objects, though popular, are not the best representation of relational data: the relational model is the best representation of relational data. This means that you should try to hold onto that representation even in your business logic code: lists of maps translate very well to relations of tuples (or tables of rows; or result sets of rows). Objects do not. (Again, look at “The Vietnam of Computer Science”, and my own blog entry on why objects are overrated.) Instead, prefer systems like iBATIS or (shameless plug) Pybatis.
Finally, if you need many ways to report on the same set of data, use an RDBMS. It’s very sad that many developers seem to fear SQL. Do not fear SQL! Pick up a copy of the SQL Cookbook and marvel at the the concise power of SQL queries, and ask yourself: “would I really want to do this in Java?”
Unfortunately, a lot of misinformation surrounds RDBMSs, and some of it is perpetuated by over-eager marketing departments of the major vendors, trying to sell their wares as all things to all people. An RDBMS is not an object store. It’s not a heirarchical file system. It’s not Google’s BigTable. Even if a vendor tells you it is!
But if you use your RDBMS for its core competency (storing and querying data according to the relational model — at least as currently codified by SQL) it won’t take you long to wonder how you got by without it.
I agree with this post wholeheartedly :D
[...] Manni Wood A very opinionated coder. « Only the R Matters in RDBMS [...]
[...] some ways, my blog entry Only the R matters in RDBMS answers that question, but I was not specifically talking on this [...]