
FizzBuzz, Nostalgia, and Baby Steps in a New Language 9 September 2009

Posted by manniwood in Programming.

Jeff Atwood wrote about FizzBuzz, a programming “challenge” that apparently a lot of CS grads have trouble solving. (I wonder how one graduates with a CS degree not knowing about if/else and looping? I’m self-taught, so I can’t relate any stories about classmates who just didn’t get it.)

Anyway, while reading about FizzBuzz, I quickly hacked up a solution in Python’s REPL (apparently, this desire to hack a solution is very common), but then I figured I should hack a solution up in Lisp, which I’ve been learning lately:

(loop for i from 1 to 100 do
  (cond ((and (= (mod i 3) 0) (= (mod i 5) 0))
         (print "FizzBuzz"))
        ((= (mod i 3) 0)
         (print "Fizz"))
        ((= (mod i 5) 0)
         (print "Buzz"))
        (t
         (print i))))

Of course, this got me to thinking about all the other programming languages I know, so I had to code something up in every C-syntax language I know:

In C, I can’t do anything outside of a function, so we have the standard main(). Before I learned C, I had C envy, and I figured “real” programming languages did not let you just hack code outside of a containing function (or method and class in OOP languages). But Lisp is considered a real language, and my solution, above, executes just fine.

#include <stdio.h>

int main() {
    int i;
    for (i = 1; i <= 100; i++) {
        if (i % 3 == 0 && i % 5 == 0) {
            printf("FizzBuzz\n");
        } else if (i % 3 == 0) {
            printf("Fizz\n");
        } else if (i % 5 == 0) {
            printf("Buzz\n");
        } else {
            printf("%i\n", i);
        }
    }
}

In Java, of course, everything has to be an object, so I have to declare a class and a main method just to begin coding. And let’s not forget System.out.println instead of printf(). All this extra typing always makes me feel so Java-ish. ;-) The code is visibly wider than the C code. :-0

public class FizzBuzz {
    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            if (i % 3 == 0 && i % 5 == 0) {
                System.out.println("FizzBuzz");
            } else if (i % 3 == 0) {
                System.out.println("Fizz");
            } else if (i % 5 == 0) {
                System.out.println("Buzz");
            } else {
                System.out.println(i);
            }
        }
    }
}

Turning to JavaScript was fun: I discovered that Ubuntu 9.04 has a command-line JavaScript interpreter built in. (Or had I installed it before?) So you can just write .js files with shebang notation at the top, and you’re one “chmod +x” away from running your JavaScript on the command line. Go, Rhino! Unlike the other C-like languages, JavaScript puts the Script in JavaScript: I just start hacking code outside of a function definition, as I would with Perl or Python:

#!/usr/bin/js

for (var i = 1; i <= 100; i++) {
    if (i % 3 == 0 && i % 5 == 0) {
        print("FizzBuzz");
    } else if (i % 3 == 0) {
        print("Fizz");
    } else if (i % 5 == 0) {
        print("Buzz");
    } else {
        print(i);
    }
}

I’ll always have a soft spot for Perl: it’s the first language I was paid to code in; it launched my career.

#!/usr/bin/perl

for ($i = 1; $i <= 100; $i++) {
    if ($i % 3 == 0 && $i % 5 == 0) {
        print "FizzBuzz\n";
    } elsif ($i % 3 == 0) {
        print "Fizz\n";
    } elsif ($i % 5 == 0) {
        print "Buzz\n";
    } else {
        print $i, "\n";
    }
}

Bash is only slightly yukkier than Perl. Doing math inside $(()) is just a little odd, syntactically, but it gets the job done:

#!/bin/bash

for i in $(seq 100)
do
    if [ $((i % 3)) == 0 ] && [ $((i % 5)) == 0 ]; then
        echo 'FizzBuzz'
    elif [ $((i % 3)) == 0 ]; then
        echo 'Fizz'
    elif [ $((i % 5)) == 0 ]; then
        echo 'Buzz'
    else
        echo $i
    fi
done

SQL fanboy that I am, I couldn’t resist doing this in PostgreSQL. It came out rather clean and terse, which I guess shouldn’t surprise me too much. SQL is my favourite domain-specific language!

select case when i % 3 = 0 and i % 5 = 0 then 'FizzBuzz'
            when i % 3 = 0               then 'Fizz'
            when i % 5 = 0               then 'Buzz'
            else                         cast(i as text)
       end
  from (select generate_series(1, 100) as i) as lst;

And here’s the code I hacked together in Python’s REPL, starting this whole trip down languages-I’ve-known memory lane:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print 'FizzBuzz'
    elif i % 3 == 0:
        print 'Fizz'
    elif i % 5 == 0:
        print 'Buzz'
    else:
        print i

I must admit, looking at the Python, I’m very glad I learned it. Is it just me, or does Python look like pseudo code? Like if you were thinking in your head “well, first I’d have to loop through the numbers 1 through 100, and then I’d have to check modulo 3 and 5, and then…” would the pseudo code in your head look a lot like Python? Python has often been called runnable pseudo code, and looking at all the languages I know, FizzBuzz certainly looks the cleanest in Python.

Then there’s Lisp, which I’ve been teaching myself lately. By contrast, Lisp is just a little bit yucky. I understand that, as Paul Graham says, Lisp doesn’t actually have a syntax, and its lack of syntax allows Lisp to support all sorts of interesting features like its awesome macros.

But tell me I’m not the only one who thinks “((and (= (mod i 3) 0) (= (mod i 5) 0))” is not very much fun for human eyes to parse. If nothing else, prefix notation puts verbs and nouns a little bit too far from each other. (I wonder if I’d be more OK with this if my mother tongue was Latin? Word order doesn’t matter in Latin…)

Makes me wonder what FizzBuzz would look like in PHP and Ruby.

Or Maybe I’ll Learn Lisp Instead 9 September 2009

Posted by manniwood in Programming.

I promised myself that I was going to learn more about ORM, but in my off time, I’ve been learning Lisp instead.

I still want to learn more ORM stuff (it’s everywhere, and popularity is a good reason to learn any technology) but it struck me that I already have strong SQL skills and 10+ years hacking Java (not to mention reading the GoF book) so learning ORM really well would be about as difficult as learning a new library. Combine my knowledge of OOP with my knowledge of SQL and use that knowledge to learn a library like Hibernate, or Django’s ORM, or Rails’ ORM. Done.

What I really wanted to do was learn Arc or Lisp, and really get a handle on a language whose feature set Paul Graham says other languages are converging on anyway.

Besides: Andrew Hunt and David Thomas recommend in their book, The Pragmatic Programmer, that I should learn a new programming language every year. I moved my J2EE project to Python/Django back in November; it’s been almost a year already! Time to crack open my copy of Practical Common Lisp and get serious once and for all.

Lorenzo Alberton’s Graphs in the Database 9 September 2009

Posted by manniwood in SQL.

Here’s a recommendation to read Lorenzo Alberton’s Graphs in the database: SQL meets social networks.

Alberton shows how to represent both directed and undirected graphs in SQL, with a heavy focus on how that applies to LinkedIn- and Facebook-style social networking.

The SQL fanboy in me loves articles like this.

Git Branches and Remote Repositories 5 September 2009

Posted by manniwood in Uncategorized.

I’ll be honest: it took me a while to wrap my head around Git, how it really works, and how to use it effectively.

I’m going to put in a high recommendation for Scott Chacon’s excellent Git Internals, published by PeepCode. It’s a US$9.00 pdf, and it’s better than any other book or online resource I’ve ever read about Git.

Chacon does a better job than anybody else of showing you how Git works, so that by the time he gets into everyday tasks, why you are doing what you are doing makes perfect sense, and you’re just learning the commands and syntax to leverage Git’s capabilities.

When it comes to sharing Git branches between repositories, there’s still not a perfectly good, clear resource out there, so I’m going to share with you what I scribbled on my copy of the tear-out “Git Command Quick Reference” from Travis Swicegood’s Pragmatic Version Control Using Git. Hopefully, what I had to scribble on my quick reference card will appear in a second edition.

First off, let me describe what I want Git to do for me.

I want to have a clone of my repository on another geographically remote server. (For purposes of this discussion, I will assume that it has already been created, and is called myremote.)

I want to create a new branch in my local repository on my development machine at my desktop, and at the end of the work day, I want to push this branch out to my remote repository—perhaps for sharing, perhaps just for easy backup purposes.

Git does not push new local branches out by default, and I (and maybe it’s just me) find the documentation very uninformative on how to manage remote branches.

Here’s how.

Let’s say I’m at my desktop computer and I’m using my local Git repository.

Let’s say I make a new branch based on master:


git branch my-new-branch master

Now let’s say it’s the end of the work day, and my work in the branch my-new-branch is not complete. I don’t want to merge my-new-branch back into master, but I do want to push my-new-branch out to my remote repository for backup purposes.

Here’s how:


git push myremote my-new-branch

From now on, while working locally in my-new-branch, doing a


git push myremote

should do the right thing and push out changes I make in my-new-branch.

Now let’s say it’s a day or two later, and I’ve merged my-new-branch back into master. I know I no longer need my-new-branch, so I delete it locally:


git branch -d my-new-branch

Done.

Of course, my-new-branch still exists in my remote repository; I’ve only deleted it in my local repository.

I delete my remote copy of my-new-branch like so:


git push myremote :my-new-branch

Yes, that’s right: putting a colon in front of a branch and pushing it will delete it on the remote repository. Not very obvious, is it?

I may go into detail about pushing local tags out to remote repositories in another post.

Happy Gitting in the meantime.

Comments on Mark Pilgrim’s [XML/XHTML] Thought Experiment 4 September 2009

Posted by manniwood in Uncategorized.

I think Mark Pilgrim is spot on about the realities of XML (especially XHTML) today: browsers have been accepting malformed XHTML since forever, so generating correctly-formed XHTML is really difficult, because every XHTML generator has a long history of never having had to.

Here’s the thing, though: I really wish Pilgrim had emphasised this point: the only reason why we can’t/won’t/don’t generate valid XHTML today is because of bad decisions that were made in the beginning. Conversely, the reason why the C programming language is parsed in such a consistent way is because of decisions that were made in the beginning of C. There’s no rule that says everything everywhere has to be poorly specified from the start, poorly implemented from the start, become popular, and have to therefore stay poorly specified and poorly implemented because “it’s always been that way, and it’s too difficult to change now”.

If anything, XHTML should be a lesson in the benefits and pitfalls of non-rigorous format specification and parsing.

I would be horrified if the takeaway from Pilgrim’s article was that we should always design clients for all new formats to be as accepting of garbage as possible.

Nonsense!

One lesson should be: take a little more care the next time you write a specification.

I can think of a great example: JSON.

Can you point to trillions of lines of malformed legacy JSON out in the wild? Nope. Bad JSON doesn’t parse. But bad XHTML does. Why the difference?

JSON is simpler than XHTML. It’s easy to implement, and easy to parse.
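
To make the “bad JSON doesn’t parse” point concrete, here is a minimal sketch using the json module from Python’s standard library. The sample strings and the helper name try_parse are made up for illustration; the only point is that a strict parser refuses malformed input instead of guessing at what was meant.

```python
import json

def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # A strict parser rejects the input outright instead of guessing.
        return None

good = '{"name": "fizz", "divisor": 3}'
bad = '{"name": "fizz", "divisor": 3,}'  # trailing comma: not valid JSON

print(try_parse(good))  # the parsed dictionary
print(try_parse(bad))   # None, because the document was rejected
```

Because the producer of the bad document gets an error right away, the mistake tends to be fixed at the source rather than accumulating in the wild.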

Another lesson should be: design simpler markup languages.

Then again, sometimes, you need something with the complexity and expressiveness of XHTML.

Yet another lesson could be: if you’re going to design something with the complexity and expressiveness of XHTML, expect the benefits and pitfalls of XHTML.

In other words, perhaps there is an inverse relationship between the “richness” of a markup language’s feature set and the robustness we can expect of the parsers that will have to parse it.

I’ve been (re-)reading a lot of Paul Graham lately, and one thing he says that rings true to me is:

Everyone by now presumably knows about the danger of premature optimization. I think we should be just as worried about premature design—deciding too early what a program should do.

—Paul Graham, Hackers and Painters

If there’s one thing I think XML generally (not just XHTML in particular) suffers from, it’s a culture of premature design, especially in the way that it is used.

I remember a horrible phase of the late 1990s and early 2000s where everything had to be stored in XML. Key/value pairs were stored in XML instead of .ini files; tabular data was stored as XML rather than as .csv or fixed width files; hierarchical data were stored as XML rather than as JSON; sometimes entire databases were stored in XML instead of in an RDBMS… it was a horror show.

George Orwell’s second rule in his essay “Politics and the English Language” was “Never use a long word where a short one will do.” I think the same applies to markup schemes: never use a complex one where a simple one will do.

Or: Don’t use XML unless you absolutely have to.

When it comes to the current state of browsers, though, I think there’s a catch: the current demands we put on our browsers require us to use XML, or something of equal complexity that would end up looking a lot like it. There is no .csv or JSON solution for the browser’s markup problem. We need something like XML.

With XHTML, that’s exactly what we have.

HTML5 and the abandonment of XHTML 2.0 actually improve the situation: there’s a tacit admission that the way we (mis)parse XHTML4 now is its own markup language that is neither valid SGML nor valid XML. HTML5 is not strictly SGML or XML, but it’s something of equal complexity that ended up looking a lot like it. ;-)

So I have at least two lessons from Pilgrim’s thought experiment:

1. We cannot turn back the clock and correctly implement XHTML as actual, correct XML. Much to its credit, HTML5 accepts this: it is neither SGML nor XML; it has become its own markup language that merely looks like its forebears. Much of the markup that was “wrong” under XHTML (even though it would parse anyway) is now “correct” under HTML5 (because, well, it parses anyway).

2. Friends don’t let friends use XML. As the evolution of (X)HTML(5) has shown, large, feature-rich markup languages are hard to get right, and although they carry many benefits, they also carry problems. So if you need to solve a problem with markup, really look to see if you can use JSON or .ini or .csv or even a fixed-width flat file before jumping on the XML bandwagon. XML is often overkill anyway (except when it’s not).
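
As a rough illustration of “use the simplest format that fits”, here is a sketch that reads key/value pairs from .ini text, tabular data from .csv text, and hierarchical data from JSON text, all with Python’s standard library. The sample data (the server section, the name and divisor columns, the post document) is invented for the example.

```python
import configparser
import csv
import io
import json

# Key/value pairs: an .ini file is enough.
ini_text = "[server]\nhost = example.org\nport = 8080\n"
config = configparser.ConfigParser()
config.read_string(ini_text)
port = config.getint("server", "port")

# Tabular data: .csv is enough.
csv_text = "name,divisor\nFizz,3\nBuzz,5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hierarchical data: JSON is enough.
json_text = '{"post": {"title": "FizzBuzz", "tags": ["lisp", "python"]}}'
doc = json.loads(json_text)

print(port)                    # 8080
print(rows[1]["name"])         # Buzz
print(doc["post"]["tags"][0])  # lisp
```

None of the three needs a schema, a namespace, or a DOM to get at the data, which is most of the argument for reaching for them first.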

Another reason why I need to learn Lisp 31 August 2009

Posted by manniwood in Uncategorized.

Today’s Hacker News linked to Why Lisp macros are cool, a Perl perspective, and it made me want to learn Lisp. I just need to make the time…

One thing that Lisp doesn’t have going for it is its syntax. Lisp syntax is so uniform that it’s not particularly user-friendly. It’s macro-friendly (read the above link and you’ll see why) but not as human-readable as programming languages with more syntactic sugar.

Apparently, Perl 6 is going to have Lisp-like macros, even though Perl 6 will still have Perl-like syntax. It will be interesting playing with that when Perl 6 (Rakudo?) has a beta release. Is it possible that we could have it all? Richer, less-uniform syntax, and yet still the power of macros? That would be nice…

Notes on Design Patterns, Frameworks, Ever-Higher-Level Languages, and Why I Want to Learn Lisp 29 August 2009

Posted by manniwood in Programming.

Here’s a shout-out to a blog entry that says “Design patterns are common structures that appear in computer programs, but that can’t themselves be abstracted into a reusable component”.

I remember early on, when porting some of my Java code to Python, I wanted to make a singleton in Python. I turned to my copy of the Python Cookbook, and found that singletons are very rarely needed in Python, because Python’s modules exhibit essentially the same behaviour. It was one of those learning moments for me, when I realised that I was trying to write Python like Java, and that I should try to write Python more like Python. (In my own defence, we all do this when learning a new language. :-)
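
Here is a small sketch of why a module can stand in for a singleton: Python caches imported modules in sys.modules, so every import of the same name hands back the same object. To keep the example in one file, the module is built in memory with types.ModuleType; the name app_settings and its debug attribute are hypothetical, and in a real project this would simply be a file called app_settings.py.

```python
import sys
import types

# Stand-in for a file named app_settings.py (hypothetical), built in memory
# so that the sketch is self-contained.
app_settings = types.ModuleType("app_settings")
app_settings.debug = False
sys.modules["app_settings"] = app_settings

# Two separate imports, as two different parts of a program might do.
import app_settings as first
import app_settings as second

first.debug = True

print(first is second)  # True: both names refer to the one module object
print(second.debug)     # True: the change is visible everywhere
```

There is no getInstance() and no private constructor; the import system already guarantees a single shared instance.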

Here’s an observation by the always-fascinating Steve Yegge on design patterns:

About half of all design patterns out there (not just the GoF patterns) appear to be ways take perfectly natural design ideas and twist them to fit into someone’s static type system: recipes for pounding square pegs into round holes.

—Steve Yegge, Is Weak Typing Strong Enough?

Yegge’s quote also rings true to me, because I just don’t find myself doing a lot of design-patterny stuff in Python, presumably because Python’s dynamic typing and powerful built-in data structures (lists and dictionaries) prevent me from having to use a lot of objects and their attendant design patterns in the first place.
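
As one possible illustration (my own, not taken from Yegge’s essay), here is how something like the Strategy pattern tends to shrink in Python: the “strategies” are plain functions, and the collection that holds them is a plain list. The function names fizz, buzz, and label are made up for this sketch.

```python
def fizz(n):
    """Rule: contribute 'Fizz' for multiples of 3."""
    return "Fizz" if n % 3 == 0 else ""

def buzz(n):
    """Rule: contribute 'Buzz' for multiples of 5."""
    return "Buzz" if n % 5 == 0 else ""

# A plain list of functions plays the role of a set of Strategy objects.
rules = [fizz, buzz]

def label(n):
    """Apply every rule in order; fall back to the number itself."""
    text = "".join(rule(n) for rule in rules)
    return text or str(n)

print([label(n) for n in (1, 3, 5, 15)])  # ['1', 'Fizz', 'Buzz', 'FizzBuzz']
```

Adding a new rule means appending one function to the list; there is no interface to declare and no class hierarchy to extend.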

This makes me want to really, really learn Common Lisp, or Arc, or Scheme, or some language with full-fledged macros. What does this have to do with design patterns? I can’t really tell you, because I don’t know Common Lisp yet.

But here’s a sneaking suspicion I have.

I suspect that the same way half of the design patterns that Yegge mentions were invented to compensate for languages that lack dynamic typing and higher-level constructs, the other half of design patterns, and even entire frameworks (think: Ruby on Rails, Django) compensate for languages that lack true macros.

Now it could turn out that I’m wrong about this, or that there’s only a kernel of truth in my suspicion, but that the whole truth is more complicated.

But readers of my blog know that I give a lot of credence to Paul Graham’s idea that mainstream programming languages are slowly converging on the feature set of Lisp.

I’ve never heard of a Lisp or Arc framework for building database-backed web sites. Where’s Lisp’s Django, or Arc’s Rails, or Scheme’s Struts? I suspect that these languages don’t need frameworks, because macros are essentially tools for building project-specific frameworks.

Who among us has taken a framework like Django or Rails or Struts, and used the parts that solved our problems, and re-written or ignored the parts that didn’t solve our problems, or that introduced new problems? Now imagine what it would be like using a programming language that was essentially a framework construction kit; that allowed you to write a framework more quickly than learning and adjusting to a pre-existing framework?

The only higher-level abstraction I can think of off the top of my head is a language that enables you to easily construct your own domain-specific languages. (I like regular expressions for text searching and manipulation, and SQL for data searching and manipulation; imagine your own DSL for your exact business problem! That would even beat frameworks or even macros…)

Apparently, Perl 6 and the Parrot VM are going to try to tackle that exact problem. The problem is, Perl 6 hasn’t shipped yet. (I’ve only been waiting 10 years… ;-)

In the meantime, I really need to learn Lisp or Arc once and for all. I really want to learn about macros.

Does the Internet No Longer Operate on Internet Time? 27 August 2009

Posted by manniwood in Java, Programming.

As sectors of the economy get more established, it seems that progress slows down and becomes more incremental. I’ve been wondering lately if the Internet no longer runs on Internet Time (a popular phrase from the roaring 90s).

I have two reasons to believe that the Internet is slowing down:

1) IE6 is still around.

2) Java is still the top programming language for web sites (although perhaps I’m ignoring the popularity of C#)

IMPORTANT NOTE: First off, I want to point out that I am not equating Java with IE6’s badness. Whereas I think most developers would cheer if IE6 disappeared tomorrow, I don’t think anybody wants to drive a stake into the heart of Java. I promised myself I’d try to make nuance a feature of my blog, so let me put some nuance right here: I’m not gunning for Java the way I am for IE6.

I’ll address IE6 first. In the late 90s, when IE became more popular than Netscape, Netscape seemed to disappear within a few years. It didn’t take long for most sites to simply stop supporting Netscape.

But when you look at how small the browser population was back then, in retrospect it seems like it would have been easy to turn away from Netscape in a couple of years, like turning a small boat.

On the other hand, when you look at the popularity curve of IE, waiting for it to go away will be more like trying to turn a battleship. Happily, the mindshare of developers is fully focused on the newer browsers; fortunately, even Microsoft wants everybody to upgrade to IE8; fantastically, large sites are starting to post warnings about discontinuing support for IE6.

But when you look at the longevity of IE6, even if its popularity dropped off a cliff tomorrow, it’s had a much larger, longer run than Netscape ever had. It’s as though the Internet is slowing down as it gets larger.

Now let’s look at the continuing popularity of Java.

Why do I equate the continuing popularity of Java with a slowing Internet? Because I think that if the Internet moved as quickly as it did in the roaring 90s, the current crop of Java programmers would happily be using what they deemed to be Java’s worthy successor by now, the way a lot of C developers switched to C++ (perhaps after some grumbling) and the way a lot of C++ developers switched to Java (perhaps after some more grumbling).

When you look at a rough time line, C seems to have been ascendant for about a decade, and then C++ for another decade, and now there seems to be Java. Java’s already been going strong for a decade, but unlike C and C++, it’s very unclear at this point if Java is on a downward slope or not.

If we are to believe Bruce Tate’s book Beyond Java, published four years ago, we would think that Java would be on its way out by now. Tate did a good job describing Java’s pain points. All languages have them, and Java is no more exempt from this than C or C++ are. But four years on, we are definitely not beyond Java.

It seems sort of inevitable that the same way C++ tried to correct C’s shortcomings, and Java tried to correct C++’s shortcomings, that some Java-community-anointed successor language should be here trying to correct Java’s shortcomings. After all it’s been more than 10 years since Java took over (largely) from C++ (not to mention clobbering Perl/CGI in the web application realm).

I suppose a whole blog entry could be written on why Java has no clear successor. Both Ruby and Python do not follow the C/C++/Java syntax, and that’s probably a real no-no in trying to win over Java developers. The Java community itself seems to be incrementally improving Java rather than creating a new successor language the way C++ aspired to succeed C, and Java aspired to succeed C++.

But today’s blog post is not going to try to explore the many reasons, good or bad, for why the Internet seems to be slowing down. I’m just going to use IE and Java as two data points to observe that the Internet is, in fact, no longer running on Internet Time.

RDBMS Static Typing vs. General Purpose Language Dynamic Typing 23 August 2009

Posted by manniwood in Programming, Python, SQL, Uncategorized.

One thing that’s fun about being on the job interview circuit is that you get to re-think a lot of your biases and tastes.

Something that has occurred to me is that I prefer to program in dynamically typed languages when I can (lately that would be manifested as a preference for Python over Java) and yet this fails to explain my love for RDBMSs, and the static typing that goes with them!

For general-purpose programming, Paul Graham’s opinion definitely rings true, so I’ll quote him at length:

As far as I can tell, the way they taught me to program in college was all wrong. You should figure out programs as you’re writing them, just as writers and painters and architects do.

Realizing this has real implications for software design. It means that a programming language should, above all, be malleable. A programming language is for thinking of programs, not for expressing programs you’ve already thought of. It should be a pencil, not a pen. Static typing would be a fine idea if people actually did write programs the way they taught me to in college. But that’s not how any of the hackers I know write programs. We need a language that lets us scribble and smudge and smear, not a language where you have to sit with a teacup of types balanced on your knee and make polite conversation with a strict old aunt of a compiler.

—Paul Graham, Hackers and Painters

What’s interesting is that I find for the projects I’ve worked on, I really like to nail down the data model in an RDBMS, and then code the business logic and display code using as flexible a language as possible. I think this bias comes from two requirements I’ve encountered a lot in my programming career:

1) make stuff easy to report on

2) change data capture and display on a fairly routine basis.

To make stuff easy to report on, it’s nice to have constraints and other forms of data integrity enforcement, so that your reports are already half-way complete just because of how your RDBMS stores (or refuses to store) your data: you know your data are correct, and you just have to aggregate them. Honestly, I sometimes find RDBMSs a bit too rigid to deal with the ever-changing needs of business, but all mature RDBMSs seem to have good support for alter <anything> commands.
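
A minimal sketch of that idea, using SQLite through Python’s sqlite3 module as a stand-in for a full RDBMS such as PostgreSQL. The payment table and its columns are invented for the example. The check constraint makes the database refuse a bad row, so the report that follows is a plain aggregate with no defensive filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table payment (
        id     integer primary key,
        payer  text    not null,
        amount integer not null check (amount > 0)
    )
""")
conn.execute("insert into payment (payer, amount) values ('alice', 30)")
conn.execute("insert into payment (payer, amount) values ('alice', 12)")

try:
    # Violates the check constraint, so the database refuses to store it.
    conn.execute("insert into payment (payer, amount) values ('bob', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The report only has to aggregate; the stored data are already valid.
totals = conn.execute(
    "select payer, sum(amount) from payment group by payer order by payer"
).fetchall()

print(rejected)  # True
print(totals)    # [('alice', 42)]
```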

And if you can get your data stored in a consistent way in an RDBMS, SQL is at your beck and call to report on your data in all sorts of interesting ways. This generally pleases the business very much.

For the projects I’ve tended to work on, after all that hard work has gone into getting the data model right, I generally prefer to make as thin a layer as possible over that database, so that the database can essentially speak for itself. A dynamically typed language comes in handy here, because it will have good support for lists and maps, and not a whole lot of casting or type conversion has to happen to shuttle data back and forth from the display to the RDBMS, and vice versa. Manipulating and validating form input is generally much easier in higher-level languages as well, with their somewhat easier out-of-the-box implementations of regular expressions, etc, etc.
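
Here is a rough sketch of what such a thin layer can look like, again with SQLite via Python’s sqlite3 module standing in for the real database. The person table and the helper name fetch_all are hypothetical. Rows come back as ordinary dictionaries, so display code can use them without any mapping classes in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be read by column name
conn.execute(
    "create table person (id integer primary key, name text, city text)"
)
conn.executemany(
    "insert into person (name, city) values (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

def fetch_all(sql, params=()):
    """Run a query and return a list of plain dicts, one per row."""
    return [dict(row) for row in conn.execute(sql, params)]

people = fetch_all("select name, city from person where city = ?", ("London",))
print(people)  # [{'name': 'Ada', 'city': 'London'}]
```

The SQL stays hand-written and visible, and the only “mapping” is from a row to a dictionary.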

Of course, this leaves open some interesting possibilities. If my next project has ever-changing needs, and a data model cannot really be pinned down, will I try to squeeze that design into an RDBMS, or will I try to find a more flexible, non-traditional data store? I can easily imagine turning to a non-traditional data store if the data are too unruly, and don’t follow the sorts of patterns that would suggest using an RDBMS.

If the data were to rarely be reported on, I might not even miss SQL that much. After all, SQL really shines when it comes to reporting.

But if I had to generate reports from non-relational data, that might start to get interesting. As much as I love dynamically typed languages, this might be one area where I’d wish for SQL’s strong typing, because programmatically preparing reports would feel sort of like re-inventing the SQL wheel.

I don’t know if I’ll ever be able to explain the seeming inconsistency in my love of dynamic typing in general-purpose programming languages, and my love of SQL and the strong typing that goes with it. Maybe it’s because I like storing my data in a formal way (when the data are amenable to it), but I like manipulating my data in a flexible way, especially when I know I can count on them being correct.

All I know is that on the projects I’ve worked on, it’s been a great combination. But I also know that all projects are different, and that not all data are relational. Maybe one day I’ll work on a project that uses dynamic typing from top to bottom. That would be interesting.

Excited About Non-Traditional Data Stores 21 August 2009

Posted by manniwood in Programming.

Any readers of my blog will know that my experience with data lies almost exclusively in data that are best modelled in a relational way. So I have a lot of biases that have evolved through playing a lot in that problem space:

  • I like SQL a lot, but realise the SQL standard is an imperfect implementation of the relational model.
  • (Heck, I know what the relational model is, after reading lots of C. J. Date.)
  • I like writing SQL by hand, (roughly analogous to the way designers like to code their HTML/CSS by hand) because I think I get the best results that way.
  • I think the object/relational impedance mismatch is best solved by simply avoiding the situation: don’t use ORM.
  • If your data set is small but must be accurate (as has been the constraint on most of my projects), use a database that does this for you. (That is, declaring constraints in DDL and having your RDBMS do the rest is a beautiful thing.)

On the other hand… :-)

I’ve been interviewing at places that use non-traditional data stores, sometimes on massive data sets, where ACID compliance is not paramount.

The challenges some of these projects face sound really cool, and the tools they are using to meet those challenges would be a lot of fun to learn.

So first off, I should make something really clear: I have reasonably passionate views about RDBMSs, especially regarding small, accurate data sets, because I’ve done a lot of work in this area, with real-world consequences if I didn’t get things correct.

However, I also feel that not all data are relational!

Although I agree with C. J. Date that no other data model compares to the relational data model (at least in terms of being able to be specified in a rigorous way), not all data have to be stored in a way that conforms to the relational model.

For instance, I remember being a little bit afraid early in the development of Windows Longhorn (later to become Vista) when it seemed like they were going to make the entire file system a database. Really? I rather like the hierarchical file system! It turns out the Windows design team came to the same conclusion too.

Take server access logs. They are stored as flat files. Would they benefit from being stored in an RDBMS? I don’t know. It would be nice being able to run SQL queries on them. But you could also make the argument that sometimes a flat file is just a flat file.
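
For a sense of what “a flat file is just a flat file” looks like in practice, here is a sketch that answers a simple reporting question (hits per HTTP status) straight from log lines, with no database involved. The three log lines are made up, and the regular expression assumes the Common Log Format; a real log may need a different pattern.

```python
import re
from collections import Counter

# Made-up lines in the Common Log Format.
log_lines = [
    '192.0.2.1 - - [21/Aug/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.2 - - [21/Aug/2009:10:00:05 +0000] "GET /missing HTTP/1.1" 404 128',
    '192.0.2.1 - - [21/Aug/2009:10:00:09 +0000] "GET /about.html HTTP/1.1" 200 256',
]

# Captures: client address, then the three-digit status code.
pattern = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+$')

hits_by_status = Counter()
for line in log_lines:
    match = pattern.match(line)
    if match:
        hits_by_status[match.group(2)] += 1

print(sorted(hits_by_status.items()))  # [('200', 2), ('404', 1)]
```

This is roughly a hand-rolled “select status, count(*) ... group by status”, which is fine for one question and starts to feel like re-inventing SQL after the tenth.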

Take documents. Some people store them in Lotus Notes, some people store them as binary blobs in RDBMSs, and most of us still keep documents on whatever hierarchical file system our operating system currently uses.

Although I love the relational data model, as soon as you decide to use it for your data, you are immediately making certain decisions about the nature of your data, either implicitly, or explicitly.

I’ve played in the relational world for quite some time now, and am excited about looking at data through a different lens.

So maybe I’ll work on a project where modelling the data with objects rather than relations is the absolute right thing to do!

Maybe I’ll be on a project where transactions and constraints have to be enforced in the application layer, because the underlying data store cannot enforce these things itself (and that would be because the underlying data store has other strengths).
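
A very rough sketch of what enforcing constraints in the application layer might involve. A plain dictionary stands in for a key/value store that enforces nothing itself; the names store, validate, and save_all, and the record fields, are all hypothetical. Validating the whole batch before writing any of it gives a crude all-or-nothing behaviour, which is far weaker than a real transaction (there is no isolation and no crash safety).

```python
class ValidationError(Exception):
    """Raised when a record breaks an application-level rule."""

store = {}  # stand-in for a data store with no constraints of its own

def validate(record):
    """Check the rules a database constraint would otherwise enforce."""
    email = record.get("email")
    if not email or "@" not in email:
        raise ValidationError("a valid email is required")
    if record.get("age", 0) < 0:
        raise ValidationError("age must not be negative")

def save_all(records):
    """Validate every record first, then write them all."""
    for record in records:
        validate(record)
    for record in records:
        store[record["email"]] = record

save_all([{"email": "a@example.org", "age": 30}])

try:
    # The second record is invalid, so nothing from this batch is written.
    save_all([{"email": "b@example.org", "age": 5}, {"email": "nope", "age": 1}])
except ValidationError as error:
    print("rejected:", error)

print(sorted(store))  # ['a@example.org']
```

Every application that writes to the store has to repeat these checks, which is exactly the work a constraint-enforcing database would otherwise do once.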

Maybe I’ll be on a project where the object model can be the definitive data model, because no other applications are ever given direct access to the underlying data store; so the data store will not have to be the final arbiter of what the data model is. Instead, the data will only ever be accessible as a web service, and so the API that we expose to the data will be the only legitimate view of the data.

I’ve played in the world of small, accurate relational data sets for quite some time. It could be fun to work with really large, semi-structured data sets, and see what sorts of solutions they require, and maybe challenge some of my core assumptions about data.