December 9, 2013 at 4:03 am
Guys,
I am working on a framework for a new enterprise-level DW. One of the requirements is to store all historical changes in a single history table, so if you update 3 fields in a claims record and 8 fields in the customer record, all the changes are held in the same history table (yes, I am concerned about performance!). The structure is still being defined but will basically hold the following data:
Source table identifier (either ID or name)
Source column identifier (either ID or name)
Data type
Old Value
New Value
Date/time stamp of change
User responsible for change
ID of the data load on which the change occurred
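As a rough sketch, the structure above might look like this in T-SQL (all table, column and type names here are illustrative assumptions, not the final design):

```sql
-- Hypothetical sketch of the proposed generic history table
CREATE TABLE dbo.ChangeHistory (
    ChangeHistoryID bigint IDENTITY(1,1) PRIMARY KEY,
    SourceObjectID  int           NOT NULL,  -- sys.objects.object_id of the source table
    SourceColumnID  int           NOT NULL,  -- sys.columns.column_id within that table
    DataTypeName    sysname       NOT NULL,  -- data type of the value at the time of change
    OldValue        nvarchar(max) NULL,      -- values stored as strings for genericity
    NewValue        nvarchar(max) NULL,
    ChangedAt       datetime2(3)  NOT NULL DEFAULT SYSUTCDATETIME(),
    ChangedBy       sysname       NOT NULL DEFAULT SUSER_SNAME(),
    DataLoadID      int           NOT NULL   -- ID of the load that made the change
);
```

Storing old/new values as `nvarchar(max)` is one common compromise for a generic table like this; the recorded data type name then allows the original typed value to be reconstructed.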
Looking at the information required, some of it seems to replicate the data held in sys.objects and sys.columns, and it seems a bit pointless to re-invent the wheel, so I am proposing that we use the table's object_id as the source table identifier.
I would like to do the same for columns, but SELECT * FROM sys.columns returns the table's object_id and the column_id (as in display sequence order). The problem would arise if the columns in the atomic tables were re-sequenced: all of the history would be broken, as I have no control over the column name or column_id in sys.columns. Is there a unique ID available for the column objects, or do I need to build my own version of sys.objects and sys.columns to deal with slowly changing dimensions? I need to be able to protect the history against any modifications to the atomic tables (names, IDs, data types, constraints, column sequence etc.). In theory I may have to store changes to atomic table structures in history so that I can recreate the record, including column sequence and data type, at any point in the past - not nice :w00t:.
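For reference, the catalog views in question can be joined to resolve a table/column pair to its IDs; a minimal sketch (the table name 'Customer' is purely illustrative):

```sql
-- Resolving table and column identifiers from the standard catalog views
SELECT o.object_id,
       c.column_id,
       o.name AS table_name,
       c.name AS column_name,
       t.name AS data_type
FROM sys.objects AS o
JOIN sys.columns AS c ON c.object_id = o.object_id
JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
WHERE o.name = N'Customer';  -- illustrative table name
```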
December 9, 2013 at 4:32 am
Columns are uniquely identified by the combination of object_id (for the table) and column_id. They don't have a single unique ID of their own.
The only time a table's object_id can change is if the table is dropped and recreated. The only time a column's column_id changes is if the column or table is dropped and recreated.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 9, 2013 at 6:26 am
Drat! 🙂
I suspected that was the case. Looks like I'll be building it myself then.
December 9, 2013 at 6:57 am
Why? Unless you're dropping and recreating tables, the object id and column id will remain the same. They don't change at random times for no reason. If you're frequently dropping and recreating tables in a DW, history is probably not the biggest concern.
Also, if this is a data warehouse, why a history table and not slowly-changing dimensions?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 9, 2013 at 8:38 am
That, my friend, would be a whole different thread 😀
My data sources are spread across a number of different technologies (SQL Server 2000, 2005 and 2008, Oracle 11i and 12, Progress 9 and 10, as well as flat files, Excel and Access databases). Some atomic data tables will be populated by more than one system, and some of the applications are reaching end-of-life and will be replaced in the next 18 months. I am a contractor and just trying to cover all of the bases.
The decision has been made that the atomic tables will only hold the current values for each entity record and that changed data will be held in a metadata table. The plan is to keep the atomic data store as generic as possible: entity data will be held in entity-specific tables, but reference values and the command-and-control structure, including SCD, will be held in generic tables. [This is being taken to the extreme in that reference values are held in a Scheme table with a SchemeValue table holding an FK into the Scheme table; the Schemes themselves are held in a scheme called 'Scheme', with each scheme name and its SchemeID held in the SchemeValue table.]
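A minimal sketch of the generic reference-data pair described above, assuming column names based purely on the description:

```sql
-- Generic reference-data pattern: Scheme / SchemeValue (names are assumptions)
CREATE TABLE dbo.Scheme (
    SchemeID   int IDENTITY(1,1) PRIMARY KEY,
    SchemeName varchar(100) NOT NULL UNIQUE
);

CREATE TABLE dbo.SchemeValue (
    SchemeValueID int IDENTITY(1,1) PRIMARY KEY,
    SchemeID      int NOT NULL REFERENCES dbo.Scheme (SchemeID),
    ValueCode     varchar(50)  NOT NULL,  -- short code used by source systems
    ValueName     varchar(200) NOT NULL   -- display value
);
```

Under the "extreme" variant described, the list of schemes would itself be one row in dbo.Scheme, with each scheme name appearing as a row in dbo.SchemeValue.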
One of these generic tables is the history table, which needs to hold enough information to identify the entity type, entity reference and entity property that has been updated. The idea is that ETL packages then have a generic pattern to determine the atomic entity table, the entity record and the fields that need to be updated. The history table is densely populated and only contains changed fields, whereas a traditional SCD row would repeat all 247 fields for the 3 fields that have been updated.
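The "only changed fields" pattern might look something like this for a single column, assuming a history table shaped like the field list earlier in the thread (here called dbo.ChangeHistory; all table, column and schema names are illustrative):

```sql
-- Sketch: record a history row only where one column's value actually changed
DECLARE @DataLoadID int = 42;  -- illustrative load ID

INSERT dbo.ChangeHistory
    (SourceObjectID, SourceColumnID, DataTypeName, OldValue, NewValue, DataLoadID)
SELECT OBJECT_ID(N'dbo.Customer'),
       COLUMNPROPERTY(OBJECT_ID(N'dbo.Customer'), N'CustomerName', 'ColumnId'),
       N'varchar',
       tgt.CustomerName,         -- old value
       stg.CustomerName,         -- new value
       @DataLoadID
FROM dbo.Customer      AS tgt
JOIN staging.Customer  AS stg ON stg.CustomerID = tgt.CustomerID
WHERE ISNULL(tgt.CustomerName, '') <> ISNULL(stg.CustomerName, '');
```

A real implementation would presumably generate one such comparison per column from metadata rather than hand-writing 247 of them; the ISNULL guard is one simple way to catch NULL-to-value transitions, though it conflates NULL with the empty string.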
One of the other challenges is that I KNOW the structure of the atomic entity tables will change over time: changed data types, column names and/or column order. So I need to be able to regenerate what a table looked like at any given point in time (in this situation, I don't think that performance is going to be critical - at least I hope not!), and so we want to be able to store the changes to the table structures in the history table as well :hehe:
The history table will also be used to back-out any bad data loads in a recovery scenario.
Part of the challenge is that we don't know what the future requirements are going to be so we are trying to build a very generic model - I am hoping to blog the development process because if we can get it to work, it will be of major interest.