Do you find yourself scrambling all the time to implement some commonly occurring processes, such as sales promotions, in your organization? Have you ever had to create a new set of tables that is mostly redundant except for the dates in which the data occurs? If so, you may be in dire need of Effective Dating and not even know it.
Effective Dating is a Design Pattern that recognizes that most data is only effective for a specific period of time. This is also referred to as "Temporal Database Design". In the case of sales promotions, they usually have a start date and an end date, so you should take these into account in your design. This, in the simplest sense, is what Effective Dating is. However, as with many things, "the Devil is in the details", and in this case, it's a really big Devil 🙂
Representing date-time data seems like a very basic concept at first, yet in many, many applications it is ignored and replaced with heinous workarounds. I have seen entire sets of tables and processes that are completely redundant, existing only because the base tables were not Effectively Dated. Indeed, ignoring date/time data in applications has the effect of producing many complications that make you wonder why life is so hard and a little voice creeps into your head wondering "shouldn't this be done another way?"
So, why is it so often ignored?
One reason is that representing date-time data is fraught with many complications and pitfalls, so implementing it can be discouraging. Another possible reason is that with all things database-related, once the initial decision is made to commit to a bad design it's rather difficult, though not impossible, to undo its effects.
Accordingly, much research has been done regarding temporal databases, such as the book by Richard Snodgrass "Developing Time-Oriented Database Applications in SQL" (you can find a PDF version of this book here: http://www.cs.arizona.edu/~rts/tdbbook.pdf). If you've ever read up on this subject, you know how complicated it gets. But simply stated, what exactly is the problem with Date/Time data and why is it so complicated? What benefits do you get by properly expressing this data in an Effective Dating scenario? In this article I'm going to provide a clear, simple answer to this question and also lay the foundation for the later parts of this series of articles that explain what happens to Relational Databases when you introduce the concept of Effective Dating.
Suffice to say for now that the impact to a RDBMS is heavy; most of the concepts that are well-known and that we sometimes take for granted such as Primary Keys and Foreign Keys tend to change drastically the minute that you introduce effective dates into a database schema. Nonetheless, it's still worth the trouble to implement it in the right scenarios, and I will show one such real-world scenario in this article.
The Concept
Although we don't always express this in our database schemas, most data is temporal; it actually only lives over a period of time. Yet in many tables that we design we don't put in effective dates to signify this. Many times there is a good reason for this: maybe we're not interested in historical data, or maybe we're not interested enough in order to justify the cost in adding dates to the design.
However, there are times where putting these effective dates in will tremendously help out a project. Unfortunately, many times we don't see this benefit at the start, and our initial design is not effectively dated even though it really is in our best interest. We normally realize this when we find that we need to add all sorts of workarounds just because our system is not date-aware by design. Yet many times we go on blissfully ignoring the effectively dated nature of our business. We keep on piling workarounds, each more ridiculous than the last, on top of a design that is pretending to be eternal, but failing. This is what I call the "Delusions of Eternity" anti-pattern. It's sort of like ignoring our own mortality (we humans are also "effectively dated"), so maybe that is where this attitude comes from. Who knows, but maybe our own mortality tends to put a negative slant on the whole subject of Effective Dating and causes us not to think about it 🙂
The Scenario
Let's solidify all of this with an example. Let's say we're designing a database that handles promotions for retail sales in an e-commerce environment. A common design for this is to have a Promotions table which contains all the promotion details and then a 'PromotedProducts' table which links promotions to products. Below is a simple diagram of this schema:
The Problem
So what's the problem here? It seems like a good design at first, but the fact is that Promotions are by definition Effectively Dated, and this design ignores that. Soon the cracks in the design start to appear. Let's illustrate how this design painfully falls apart:
Yes, you can store promotion details and associate them to products, but what happens when a promotion starts and then ends? You have to switch prices in the products that are affected. So, how do you do this? Well, obviously you run some kind of batch job, at exactly the right time, to change these prices. The batch job would insert the row into the PromotedProducts table at the beginning of the promotion, and then would delete the row at the end of the promotion. The correct price for the product would be written to the OrderDetail table by using a SELECT with a LEFT JOIN on the PromotedProducts table. The price from PromotedProducts would be used, and if it's NULL, then the price from the Products table would be used instead. No big deal, right? Wrong, by several counts:
Once you accept that you have to run a batch job, things start to go downhill from there:
- What if management wants to start a promotion at midnight? You'd better be up and watching in case that batch job fails.
- If the batch job does fail, you're dealing with downtime or bad pricing, both of which result in a direct cost to the business.
- Because you are deleting the promotion as soon as it's over, you don't have the capability to analyze how many orders the promotion generated, or do any other kind of analysis.
- Someone will probably get the idea that they should store this information off in an Excel spreadsheet somewhere so they can report it to management. Viola! You have now created an unnecessary business process and a questionable source of data to base management decisions on.
- Every time you do a promotion you have to set up a new batch job in order to insert the data. So there is manual intervention involved with every promotion setup, which means that every promotion turns into a development project, with development and testing phases. It's a simple project, but a project nonetheless. This results in the following sorry list of circumstances: a much higher cost of doing promotions
- Since you're doing new development every time you do a promotion there is always a good possibility of failure, even if you're doing "the same thing" every time in the eyes of management. After a few failures, management will be frustrated and start asking the question "why can't we get this right?".
- The business will undoubtedly be slowed down in its ability to do promotions and could be outrun by the competition in this regard.
- Forget about the possibility of doing many promotions at once....it's too difficult to manage.
So we see many problems here, not the least of which are frantic, screaming managers and sleepless DBAs. And this reinforces another famous saying: data modeling is good for your health (OK, it's not famous, but it should be).
We also see how the effects of a bad design can spill over into the other departments in a company, raising costs and blood pressures and creating many fragile and unnecessary business processes to accommodate the bad design. Surely there must be a better way.
The Solution
Now, let's see what happens if we break down, repent, and just accept the fact that these promotions live and die by dates. We would include 2 dates in the PromotedProducts table signifying the Start and End dates of the Promotion. The first thing we then notice is that these 2 new columns must be included in the Primary Key, since there can now be several promotions for the same product. Here is the new database schema:
All we did was add the 2 dates in the PromotedProducts table, but the real magic is what the application does with this data: the website application would be designed so that it looks at these dates and compares them to the current date/time to see what rows should be displayed. It would look in the PromotedProducts table using the effective dates to determine if the promotions are active and should be displayed. The pricing would still be calculated using the LEFT JOIN method we described above.
Using this design, let's look at the weaknesses of the previous design and see what's happened to them:
- No batch job needs to be run. The system is date-aware and automatically displays the right price at the right time. So, no downtime is possible from a Development standpoint, and no staying up at night 🙂
- Setting up a new promotion goes from being a development project to a few simple data changes done by users with a simple interface over the Promotions tables. The Development Department doesn't even have to be involved. This produces a much nicer list of circumstances:
- Much lower cost of doing promotions.
- Promotions are effortless from a Development standpoint, so Management oversight will instead be directed at the users who set up the promotions.
- This makes the company much more agile and able to respond quickly to business opportunities.
- It's possible to manage as many promotions as you like at the same time, since the system itself is handling all the complex shuffling of dates and prices.
- The promotions are persisted in the table even after they are finished, so you have the ability to go back and analyze the statistics of the promotions.
Conclusion
This simple example illustrates how beneficial it is to consider including Effective Dates into your database schema. I have seen both of these scenarios at work and the difference is too vast to be ignored. You go from a frantic, stressful environment focused on how to make the systems work to a relaxed, confident environment focused on how to better run the business. And all because you let the software handle all the work. And isn't that what software is supposed to do in the first place?
So that is the good news. In the next article of the series I will present the impact and pitfalls of the Effectively Dated design above and how to overcome them. We will build on the simple data model that we started with here and see what issues surface as you take Effective Dating further and further. You will then start to see how far RDBMS's still have to go to properly represent this type of data.