This article stems from a interesting event on a project I was working on
recently. Overall it's about defaults, but as you'll see it's just as much about
design principles and communication.
I did the database design consisting of about 50 tables, driven by what I'd
consider to be fairly standard design goals:
- Normalize the design (reasonably!)
- Avoid nulls as much as possible
- Date stamp rows where appropriate
- Provide default values where they make sense
- etc, etc
I also had a light data dictionary where I documented the purpose of each
table along with the column names and their purposes as well. For those that are
wondering, I built my dictionary by hand in Word. I have no real objection to
using the built in extended properties that can be used for the same purpose,
but I didn't have a tool handy to generate a doc from the properties and
maintaining the doc manually wasn't a lot of work for a project this size.
I also documented some of the standards to be used and how some of the
business processes would work.
Now going back to the design principles, I hope most of you would agree with
me at least on normalizing and avoiding nulls. Date stamping (requiring you to
track at least the date the record was added to the table) isn't uncommon, but
in my experience tends to be under used in table design. I'm not trying to build
a full auditing solution as I will typically implement that with triggers once
the design has stabilized, I just like to know when rows were added. For the
tables where I felt a date stamp was needed I used a datetime data type
(arguably in some of these cases smalldatetime would have been sufficient, but
we'll save that for another day) and a default of GETUTCDate().
Note: After using getdate() for much of my career, in the past couple
years I've started to use getutcdate() instead. For those of you who may not
have used them, getdate() returns the local server time and getutcdate() returns
universal (GMT) time. If you're running a single server you can use either,
though getdate() is probably more comfortable since the time you see in your
queries will match the time on your watch. Once you start crossing time zones on
multiple servers you'll start to find getutcdate() more interesting.
I also listed 'provide default values where they make sense' as a design
goal. For example, in this project there is a business process where users start
out as unverified, may get moved to a status of unverified (follow up sent), and
hopefully to a status of verified. I prefer numbers over text for this kind of
situation, so I had a look up table where:
- 0 = Unverified
- 1 = Verified
- 2 = Unverified (Follow up sent)
- 3 = Expired (Can be deleted)
In the users table I set the default to zero. It's not that I couldn't or
wouldn't ever add a user directly with a status of verified, but in the standard
business flow I knew that users would start at zero and by setting it as a
default, I could save passing in one value to a stored procedure and also
indirectly enforce the workflow I had in mind by making sure users started at
the correct status.
I was fortunate enough to be able to outsource some of the early data access
work to a contractor I had worked with many times before and I provided him with
a copy of my docs. Some time later after that initial work was done I was doing
some testing and was not getting the expected results. Digging in to the problem
a bit I found that the test row I had added and expected to see was being
excluded due to a where clause built around the dateadded column. Looking a
little further I finally found the problem in the stored procedure used to add
the row, he had explicitly specified getdate()!
Talk about an "aha" moment. My first thought was that using GetDate() made
sense in his context. Most of us use it routinely and he had worked with me
before when I used it, and of course I had not documented that I was using UTC
for dates throughout. So my fault, and easy enough to fix.
But wait, there's more. Even if he had used GetUTCDate, to me it would have
been bad coding. The column had a default and it would never make sense outside
of some type of administrative action to enter a different value for the date
the record was added, so I would have just omitted the column from the insert
altogether. A middle of the road approach would have been to use the DEFAULT
keyword as the value for the date. The worst part is that if I changed the
default on the table, perhaps to reflect a change in the business approach, the
stored procedure would still be implementing the original behavior.
I think that we wind up with four possible philosophies:
- Defaults are set for a reason, overriding is treason! Columns with
default values should be excluded from all insert statements
- Defaults are ok, you can include columns with defaults in your insert,
but if you truly want the default value the correct way is to use the
DEFAULT keyword
- Defaults are evil. Most likely you would set the column as not null and
force whoever is writing the insert to do the homework required to figure
out what a good initial value is or where it should come from.
- Chaos, where everyone does it however they like on the project!
Aside from avoiding chaos, I'm not sure which to recommend. I've used
defaults for a long time and am used to working that way, but is it helpful or
confusing behavior? Or is it just a matter of setting the standards up front so
there is no collision of philosophies? One downside to not using defaults is you
would no longer be able to do "insert into table default values" which I find
useful when generating a few rows for testing, but have never used in production
code. You could enforce defaults via an instead of trigger, but that seems like
a lot of work that you may regret at some point anyway.
Is it a Zen question, or is there really a right answer? I'm looking forward
to your comments so that we can figure it out together!