Is a history table required for maintaining a column's previous value?

Question

Is a history table required for maintaining a column's previous value?

Jesse Reich

Hall of Fame

Points: 3020
More actions
June 16, 2009 at 1:01 pm

#384782

I have a table with names that can be updated by a monthly process. The business requirement states that we need to know which names changed the last time it was updated, but we only need to know the most recent change.
Approach 1)
Table: Item
ItemID int NOT NULL
ItemShortName varchar(50) NOT NULL
ItemLongName varchar(50) NOT NULL
PreviousItemLongName varchar(50) NOT NULL
PK: ItemID
Before running the update process for ItemLongName, we will put the all the existing ItemLongName values into PreviousItemLongName
Approach 2)
Table: Item
ItemID int NOT NULL
ItemShortName varchar(50) NOT NULL
ItemLongName varchar(50) NOT NULL
PK: ItemID
Table: Item_History
ItemID int NOT NULL
ItemUpdateDate datetime NOT NULL
ItemLongName varchar(50) NOT NULL
PK: ItemID, ItemUpdateDate
Before running the update process for ItemLongName, we will put all the current ItemLongName values into the Item_History table.
It seems like the first approach is possibly violating first normal form (by having 2 long names stored). But the business requirements are not asking to maintain a history of changes to the field, only the last changes made. Does anyone have any suggestions on the best approach? Perhaps something completely different?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

One question is do they always change every month?

If not, then the first solution might not work, because you might be looking for "what was the name last month", and end up with either the current name (because it didn't change but the column was updated anyway), or data from longer ago than one month (because it didn't change and thus wasn't updated). Either way, you lose the data on when the change was made, unless you also add a "LastUpdated" column and store that there.

Keeping it in a log table allows you to track more, and even though right now the business requirement is to only keep the last month's name, odds are you're going to need to report on frequency of changes or something of that sort, somewhere down the road.

I'd be more inclined to generate an audit log. The chance of scope-creep on this one is pretty high. Plus, I've had really good experiences with audit logs coming in useful for a large number of things. (Enough so that I published articles on the subject on this site.)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Jeff Moden SSC Guru Points: 1004376 More actions · Answer 2

Another possible method is Type 2 SCD's (Slowly Changing Dimensions). See the following article...

http://en.wikipedia.org/wiki/Slowly_Changing_Dimension

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 3

[font="Verdana"]The "type 2 slowly changing dimension" is a good approach for not needing a separate history or audit log table.

You don't actually have to use a dimensional model to use that same concept (in fact, I have been using it for several decades now in OLTP systems.) I used this as a template for every table that required historical data trails.

The idea is that you mark each record to indicate whether it is the current record, and from what date it applied. I used to put the following fields on my historical tables:

IsCurrent bit not null default (1)

CurrentFrom datetime not null default (getdate())

CurrentTo datetime not null default ('99991231')

You can actually make the IsCurrent field a calculated field if you wish, but hey, it's only 1 bit. Also, I filled in the value of CurrentTo so you can easily match to the correct version of a row on the date by saying something like "where mydate between CurrentFrom and CurrentTo" and not need to worry about nulls. Better for indexing too.

When an update comes in, you do the following:

1. update the last version of the row being updated to set the IsCurrent field to 0 and the CurrentTo field to getdate()

2. insert the new version of the row being updated

And yes, you can do all of that as set based. I'd do the updates first, because step 2 can also insert any new rows.

Now, check out the MERGE statement on SQL Server 2008, because there's an option there to use an OUTPUT clause, and use that clause as the input to an insert. Seriously rocks! You end up doing step 1 and step 2 all in one statement!

[/font]

Jesse Reich Hall of Fame Points: 3020 More actions · Answer 4

Here is my understanding of the Slowly Changing Dimension article as applied to the original example. There are multiple methodologies used to track historical data and each one is specified by a type number (I've left out Type 0):

--Type 1

ItemID ItemShortName ItemLongName

123 PST Pasta

-- ** after update **

ItemID ItemShortName ItemLongName

123 PST Pasta W/Sauce

--Type 2

ItemID ItemShortName ItemLongName IsCurrent FromDt ToDt

123 PST Pasta 0 20090101 20090615

124 PST Pasta W/Sauce 1 20090616 99991231

--Type 3

ItemID ItemShortName ItemLongName PreviousLongName

123 PST Pasta W/Sauce Pasta

--Type 4

[Item]

ItemID ItemShortName ItemLongName

123 PST Pasta W/Sauce

[ItemHistory]

ItemID ItemLongName CreateDate

123 Pasta 20090101

-- Hybrid (my interpretation of BWC's recommendation)

RowKey ItemID ItemShortName ItemLongName IsCurr FromDt ToDt

1 123 PST Pasta 0 20090101 20090615

2 123 PST Pasta w/Sauce 1 20090616 99991231

My understanding of the difference between the Hybrid and Type 2 is that in the Hybrid you don't have to increment the ItemID which would be useful for integrating the new table with original queries.

Management seems to want us to take the approach of Type 3 since our current requirement is only to show the last value previous to the update (not necessarily last month's value). But I posed the question on the forum because Type 3 struck me as problematic due to the first normal form violation (although nobody has mentioned it so maybe it's not a valid concern). It also seems limiting in regards to future requirements.

That leaves Type 2, 4, and the Hybrid. My only concern for our system with Type 2 and the Hybrid is that it seems like all our existing tables with foreign keys against ItemID would have to be re-written (as well as our queries).

I'm curious about Type 2 and Hybrid; do you have to have some kind of check constraint with a column to protect against date gaps between rows? Or do you simply assume it will be handled by the logic in the stored procedures.

Meanwhile, thanks for all the great input.

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 5

8kb (6/17/2009)
Management seems to want us to take the approach of Type 3 since our current requirement is only to show the last value previous to the update (not necessarily last month's value). But I posed the question on the forum because Type 3 struck me as problematic due to the first normal form violation (although nobody has mentioned it so maybe it's not a valid concern). It also seems limiting in regards to future requirements.

[font="Verdana"]I'm not entirely sure that it violates first normal firm, as technically the data isn't repeated: you would store the differences between the last version and the current version, which are (by definition) different.

I agree: the type 3 approach is limiting for future requirements, but it is also easier to implement.[/font]

8kb (6/17/2009)
That leaves Type 2, 4, and the Hybrid. My only concern for our system with Type 2 and the Hybrid is that it seems like all our existing tables with foreign keys against ItemID would have to be re-written (as well as our queries).

[font="Verdana"]True, to some degree. By creating views that only list the "current" rows, you can avoid having to rewrite queries. But I used this as a design template, so the idea of historical versioning was built in right from the start. You're patching it into an existing design, so minimilising rework is obviously a concern.[/font]

8kb (6/17/2009)
I'm curious about Type 2 and Hybrid; do you have to have some kind of check constraint with a column to protect against date gaps between rows? Or do you simply assume it will be handled by the logic in the stored procedures.

[font="Verdana"]It's actually really hard to build it in as a constraint. You can sort of semi get it with unique constraints. However, my designs tended to use procedural gateways to the database (no direct table data changes), so it was fairly easy to capture a check there. I suppose you could do it in trigger code if you don't want to enforce procedural gateways, but personally I prefer the gateway option.[/font]

Jesse Reich Hall of Fame Points: 3020 More actions · Answer 6

It looks like the Type 3 approach has won out. But on the flipside, if management changes their mind for new features in the future, I have a much better understanding of new some approaches. Thanks for the detailed explanations...