SQL NULLs are Evil! Part II 21 July 2009
Posted by manniwood in SQL, SQL NULLs.trackback
In SQL NULLs are Evil (Part I), I detailed the evilness of SQL NULLs, and solutions for more accurately representing missing data in tables.
One thing I did not cover was the evilness of NULLs in foreign keys, and solutions to address that.
For instance, let’s say we have a table of products where we want to keep a record of which user created a row, and which user modified a row, like so:
create sequence user_ids start 1;
create table users (
id bigint constraint "User id must be unique."
primary key not null,
...
);
create sequence product_ids start 1;
create table products (
id bigint constraint "Product id must be unique."
primary key not null,
name varchar(80) constraint
"Product name must be unique."
unique not null,
inserted_on timestamp with time zone
default current_timestamp not null,
updated_on timestamp with time zone
default current_timestamp not null,
inserted_by bigint constraint
"Product must be inserted by existing user."
references users(id) not null,
updated_by bigint constraint
"Product must be updated by existing user."
references users(id) not null
);
One thing we could do for freshly-inserted products is just cheat, and say that the updated_by user is the same as the inserted_by user, even though, strictly speaking, freshly-inserted products have never been updated. (In fact, that’s the cheat we use for updated_on; we just use the current time the new row is inserted. This is not exactly accurate; another solution could be to use a sentry date that is impossible in the real world.)
Another solution is to literally make an unknown user and set any new row’s inserted_by foreign key to the unknown user. As you can see, I’ve set up the sequences (used for generating primary keys) to begin at 1; I’ll use ids of less than 1 for special purposes. 0 seems like a good idea for the unknown user:
insert into users (id, username, ...)
values (0, 'unselected' ...);
I could even go back and put a default value of 0 in products.updated_by’s column definition, so that freshly-inserted products’ updated_by would default to the unknown user automatically.
Getting More Extreme
Once you start to think this way, you can decide what sorts of sentry values you want or need. For instance, let’s say you are working on a database where columns in a bugs table may have foreign keys to a program features table. Now, consider this list of program features:
program_features -3 'Any' -- used for search dropdowns -2 'Unsure' -1 'Not Applicable' 0 'Unselected' 1 'Back End' 2 'Business Logic' 3 'User Interface' ...
Furthermore, let’s assume there is only a handful of features, and that they fit in a drop-down list when the user enters a new bug. ‘Unselected’ could be entered as the program feature of a new bug if the user did not bother to select a program feature from the drop-down menu. ‘Not Applicable’ could be used if the bug did not pertain to any particular feature of the program, or ‘Unsure’ could be selected if the user was unsure.
In search menus, where there may be a drop-down to search for bugs by program feature, you may be using -3 as a sentry value to mean ‘Search all’. You may (or may not, depending on your tastes) encode the ‘all’ sentry value into your table as a reminder of what sentry value you are using (and, maybe you’d add a check to your bugs table to disallow -3 as a program feature foreign key).
These are, of course, just suggestions. I’ve found some of these techniques useful in the databases I build. Used judiciously, these sorts of sentry values let you have non-null foreign keys in many areas of your database where you might have put up with NULLs.
You can also usefully represent kinds of missing data, and search on them later. So, for instance, instead of looking for bugs where the product feature is unknown (which is what SQL NULL really means; not “missing”, but “unknown”), you can actually search for bugs where the user purposefully didn’t choose a program feature, or expressly didn’t know which program feature contained the bug, or whatever. When these sorts of things need to be tracked, it’s nice to know you can do so.
If you start to think about your data this way, you’ll find that there are different flavours of “missing”. You can decide how much you care about such things, and design your database accordingly. As a pleasant benefit, there will be one less place in your data model where NULLs are allowed.
[...] July 2009 Posted by manniwood in Uncategorized. trackback After reading SQL NULLs are Evil! and SQL NULLs are Evil! Part II, if you are still not convinced that SQL NULLs are evil, read David Wheeler’s Neither NULL [...]