To Date or Not To Date….

OK….so I’m in the middle of a very interesting Effective Dating project…basically we’re dealing with some tables that only store current data, but there is a need for future-dated data as well. Because the base table can’t hold that data, we have to come up with additional tables and a process to store future-dated data and then move it into the base table whenever we get to the future date. Very cumbersome and fragile, and that’s why we’re looking at this project.

There is a great book on this subject by Richard Snodgrass which you can find at Amazon…this book is highly recommended (read: required) for anyone looking at Effectively Dating tables. Trust me, you won’t believe how complicated the subject is…here is the link to the book:

http://www.amazon.com/Developing-Time-Oriented-Database-Applications-Management/dp/1558604367/ref=sr_1_1?ie=UTF8&s=books&qid=1242268718&sr=1-1

There is also a book by C.J. Date on the subject, titled “Temporal Data & the Relational Model”, but I haven’t read it….if anyone has read it please feel free to post your feedback on it!

Anyways, the book makes an interesting case for some major enhancements to the Relational Database products out there, SQL Server included. Basically, what you need when effectively dating tables is to add the valid date/time period for each row. So, if you have an e-commerce website, for example, and you want to schedule a sales promotion for 3 days, you would indicate in your PROMOTION table that this particular sales promotion is only valid for 3 days. The beauty of the solution is that the SQL queries in the application would query the tables by using the system date, as in “give me the promotions that are effective for today (the system date)”…so the website content can change as needed without any manual intervention.

If the table was not effectively dated, you would have to schedule a job to run at the start date/time of the promotion and make all the data changes right then and there. Of course, the promotion will probably start at midnight, and something will break when the scheduled job runs. The promotion won’t show on the website, and the DBA will get called at half-past midnight in order to fix the problem to the tune of frantic curses and imprecations from the pained executive staff…but I won’t elaborate any further on the joys of our profession 🙂

Now, how do you specify a valid date/time period for each row? The obvious answer at first glance is to add 2 date columns: StartDate and EndDate, and then add them to the Primary Key of the table. That should work, right?

You won’t believe how off the mark that solution is. Because what’s happening here is that the database is just storing 2 dates….it’s not really storing a date/time PERIOD (or INTERVAL). The database doesn’t know that this period should include ALL DATES IN BETWEEN the 2 dates you’re storing. So it’s possible to insert rows that overlap… as long as the StartDate and EndDate are different the Primary Key constraint will allow the INSERT…but you probably won’t want 2 of the same promotions running at the same time…..

And what about Foreign Keys? As it turns out, these won’t work either….in our example, if you have a FK in the ORDER table pointing to the PromotionID, well now you can have duplicate promotionIDs in the PROMOTION table. What if you have a promotion that you run every year for 3 weeks? You keep the same promotion ID, because the substance of the promotion is the same, but every year you insert a new row with different effective dates. So, if a customer bought something from PromotionID 6, and you’ve been running that promotion for 5 years, you now have 5 rows with PromotionID = 6 in the PROMOTION table. The FK can’t handle that…it needs a unique constraint to point to.

So you need lots of constrants or triggers to get the job done….this is what the Snodgrass book deals with…..and by the way, the above is just a delicious sampling of the issues encountered…it’s not an all-inclusive list.

In response to this, I have entered a suggestion in MS Connect in order to enhance SQL Server to fix this problem. The solution entails defining a DATE/TIME PERIOD or INTERVAL datatype. It turns out that the ANSI SQL standard already has this, but it’s not implemented in SQL Server. Below I pasted the text of the Connect entry…if it sounds interesting to you, please follow this link and vote for it:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=432281

Hefty subject…I think I should get 3 cents for this one…. 🙂

The text of the MS Connect entry follows:

“Problem: SQL Server does not have an ANSI INTERVAL data type. When we create effectively dated tables we have to simulate a time period by using 2 DATETIME columns (StartDate, EndDate). However, often we don't want to allow 2 rows to have overlapping date ranges and many times we don't want to allow gaps in there either. All of these validations have to be done in SQL. It is cumbersome and probably doesn't perform as well as a data type would.

Solution: Create an 'Interval' or 'Time Period' data type. It should probably contain 2 values (StartDate, EndDate) stored in the same column and it can then treat all time periods between these 2 dates as a continuous length of time instead of just having 2 separate date columns in the DB.

It should also contain a special value for 'forever' or 'until the end of time'. This is heavily used for rows that don't currently have an end date. NULLs are often used for this, but they create problems such as not being able to participate in a Primary Key and they also are treated differently in SQL and can cause performance problems. So many times people just use an artificial value like '9999-12-31' for this end date, but this is not standardized, nor is its meaning obvious to the user.

It should have functions or operators to check for the gaps and overlaps.

It can also contain options to make it more flexible, such as "should the period be inclusive of the start & end dates?" and "allow gaps/overlaps?".

The benefits of this would be that we don't have to code triggers and also the logic would be very neatly encapsulated in the datatype and would not clutter up our SQL excessively. I imagine it would be faster as well.

Microsoft already did something like this in SQL 2008, where they created the Spatial Data Types. Maybe some of the existing features developed for Spatial support can be reused for this data type? Spatial data already has an 'Overlaps' operator, which is similar to what we would need for this data type.”

Rate

Share

Share

Rate

To Date or Not To Date….

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts