October 17, 2008 at 5:54 am
Hi,
I am working on an application where I need to keep different versions of data, and I would like to know the best way to design it. This is similar to source control, where a file can have a checked-out version, the latest checked-in version, and a list of history versions.
For each row in the table, I need to store three versions: one pending, one live, and a list of history versions. Should I store all three versions in one table? I think this will cause performance issues when querying for live data, because the table could grow to 500,000 rows or more. Other options are to store pending and live in one table and history data in another table, or to keep each version in a different table and use a common key across all of them to track the changes.
Also, when a row gets deleted from the live table, how do I mark it as deleted in the history table? Should I add a bit flag column called "IsDeleted" to the history table and use that to identify the deletion?
The requirement for history data is that I should be able to query the history for a set of records over any given date range.
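For example (table and column names here are just placeholders, and @RangeStart/@RangeEnd would be supplied by the caller), the kind of history query I have in mind looks something like this:

-- Hypothetical history table: one row per superseded version of a product
CREATE TABLE ProductHistory (
    ProductHistoryId int IDENTITY(1,1) PRIMARY KEY,
    ProductId        int          NOT NULL,
    Name             varchar(100) NOT NULL,
    ValidFrom        datetime     NOT NULL,  -- when this version became live
    ValidTo          datetime     NULL,      -- when it was replaced (NULL = still current)
    IsDeleted        bit          NOT NULL DEFAULT 0
);

-- History for a set of records over a given date range
SELECT ProductId, Name, ValidFrom, ValidTo, IsDeleted
FROM   ProductHistory
WHERE  ProductId IN (1, 2, 3)
AND    ValidFrom <= @RangeEnd
AND    (ValidTo IS NULL OR ValidTo >= @RangeStart);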
Thanks.
October 17, 2008 at 6:21 am
We built a similar system. I outline the design and our tests in this article at Simple-Talk. To summarize, we create a version table, to which all the other tables have a link, and then we get the top value less than a given version. This can give you the latest record or a snapshot of a moment in time from all the records.
The original system is still running, and the data is still growing. We also tried a different design where we moved the historical records to a separate database. That one isn't very big, so I can't tell you whether it worked better or worse.
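Very roughly, the pattern looks something like this (simplified, and these names are not the ones from the article):

-- One row per published version
CREATE TABLE Version (
    VersionId   int IDENTITY(1,1) PRIMARY KEY,
    VersionDate datetime NOT NULL DEFAULT GETDATE()
);

-- Every versioned table links back to the version table
CREATE TABLE Product (
    ProductId int NOT NULL,
    VersionId int NOT NULL REFERENCES Version (VersionId),
    Name      varchar(100) NOT NULL,
    PRIMARY KEY (ProductId, VersionId)
);

-- Snapshot as of @VersionId: the top version at or below it for each product.
-- Pass the current maximum VersionId to get the latest records.
SELECT p.ProductId, p.Name, p.VersionId
FROM   Product AS p
WHERE  p.VersionId = (SELECT MAX(p2.VersionId)
                      FROM   Product AS p2
                      WHERE  p2.ProductId = p.ProductId
                      AND    p2.VersionId <= @VersionId);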
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 17, 2008 at 6:58 am
We had a similar situation with a database that was not too big and did not change too often.
We created a table to record the changes to the original table, populated by triggers. The triggers record the date and time the change occurred as well as the values the data changed from and to. The next day a report displays the changes. The data is moved weekly to a database that holds the historical records, so you have the current week's changes in one table in the same DB and the historical data in another DB. It worked well for our organization. I think it really depends on how big the database is and how often the changes happen.
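As a rough illustration only (our real trigger did more than this, and the table and column names here are made up):

-- Change-log table populated by a trigger on the original table
CREATE TABLE ProductChangeLog (
    ChangeLogId int IDENTITY(1,1) PRIMARY KEY,
    ProductId   int          NOT NULL,
    OldName     varchar(100) NULL,
    NewName     varchar(100) NULL,
    ChangedAt   datetime     NOT NULL DEFAULT GETDATE()
);
GO

CREATE TRIGGER trg_Product_Audit ON Product
AFTER UPDATE
AS
BEGIN
    -- Record what each changed row went from and to
    INSERT INTO ProductChangeLog (ProductId, OldName, NewName)
    SELECT d.ProductId, d.Name, i.Name
    FROM   deleted  AS d
    JOIN   inserted AS i ON i.ProductId = d.ProductId;
END;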
October 17, 2008 at 7:44 am
Thanks for taking the time to reply to my questions.
The solutions look fine, but let me explain the problem in a different way. Let's say I have a product record in the product table. I want to change the name of the product to something else, but that change goes through a workflow and requires approval. Only when the change is approved should the name be updated in the main table, with the existing record appended to the audit or history table. So I have to store the change that is not yet approved while the workflow runs (which can take any number of days), the live data for the products, and the audit data. I can also create new products or delete existing products, and those changes also require approval before they are committed to the live table.
This may be a bad example for products, but I am just trying to explain the problem with an example. In this particular case, should I have tblProductPending, tblProductLive and tblProductAudit tables? I tried a design that kept all versions in the same table with some flags, but I ran into performance problems with it, especially when querying for live data.
This data has to be presented to the user in two different views. One view displays only live data; the other displays live plus pending wherever pending exists for the corresponding live row, but it shows only the changes made by the current user; a user should not be able to see unapproved changes made by other users. A user can make a bunch of changes in a session and submit them for approval.
Hope that is clear.
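To make the second view concrete, I am picturing a query roughly like this (a sketch only; @UserId would be the current user and @PendingStatus the status id for unapproved changes, both of which are assumptions):

-- Live data, overlaid with this user's own pending (unapproved) changes
SELECT  l.ProductId,
        COALESCE(p.name, l.name) AS name,
        CASE WHEN p.ProductPendingId IS NULL THEN 'Live' ELSE 'Pending' END AS RowState
FROM    tblProductLive AS l
LEFT JOIN tblProductPending AS p
       ON  p.ProductId = l.ProductId
       AND p.createduserid = @UserId          -- only my own pending changes
       AND p.statusid = @PendingStatus;       -- still awaiting approval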
October 17, 2008 at 8:24 am
The second versioned system we created had separate Inprocess and History storage. The Inprocess side handled a multi-day, or even multi-week, approval process. Once a change was approved, the new version and the appropriate data in the other tables were published to the history database. We also experimented with leaving it all in one database, but in two schemas. That didn't work quite as well as two databases, but it was functional.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 17, 2008 at 8:39 am
Did you try creating a definition table for the different statuses, such as pending, approved, etc.? You can add a timestamp and a user ID, which will help with query performance. Maybe I don't have the complete picture of the problem.
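Something like this is what I mean (just a sketch, assuming your pending table has columns along the lines of statusid, createduserid and modifieddate):

-- Status definition table
CREATE TABLE ChangeStatus (
    StatusId   int         NOT NULL PRIMARY KEY,
    StatusName varchar(50) NOT NULL  -- e.g. Pending, Approved, Rejected
);

-- Index the pending table on status, user and time so the
-- 'live only' and 'my pending changes' queries stay cheap
CREATE INDEX IX_Pending_Status_User
    ON tblProductPending (statusid, createduserid, modifieddate);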
October 17, 2008 at 8:50 am
I do have a status table for the different workflow statuses. The workflow tables are separate; they track the different states of the ChangeList submitted by the user, such as created, submitted, approved, rejected, etc.
The ChangeList has a collection of changes, and each change may point to a product record in the product table from the example I explained before. That product record may be updated any number of times while the workflow is in progress, but I don't need to keep versions during the workflow: if the product name changes from A to B and later from B to C while the workflow is in progress, I simply update the same pending record. Once the change is approved, that value should be committed to the live record. After it has been committed, if the user wants to change it again, he has to submit another ChangeList, which goes through the approval process again as usual.
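The commit on approval would be something along these lines (a sketch only; the table and column names match my earlier example, @ApprovedPendingId is an assumption, and the mapping of startdate/enddate is a guess):

-- On approval: archive the current live row, overwrite it from pending,
-- then remove the pending row, all in one transaction
BEGIN TRANSACTION;

INSERT INTO tblProductAudit (ProductId, name, startdate, enddate, startdateuserid, enddateuserid)
SELECT l.ProductId, l.name, l.modifieddate, GETDATE(), l.modifieduserid, p.modifieduserid
FROM   tblProductLive AS l
JOIN   tblProductPending AS p ON p.ProductId = l.ProductId
WHERE  p.ProductPendingId = @ApprovedPendingId;

UPDATE l
SET    l.name = p.name,
       l.modifieddate = GETDATE(),
       l.modifieduserid = p.modifieduserid
FROM   tblProductLive AS l
JOIN   tblProductPending AS p ON p.ProductId = l.ProductId
WHERE  p.ProductPendingId = @ApprovedPendingId;

DELETE FROM tblProductPending
WHERE  ProductPendingId = @ApprovedPendingId;

COMMIT TRANSACTION;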
October 17, 2008 at 8:59 am
That really sounds almost identical to the system we did with history & inprocess tables. While we migrated between the two, the structures and queries were pretty much as I outlined in the article.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 17, 2008 at 10:29 am
Is it possible for you to share the high-level data model that you used for that system? You could change the table and column names to something else. My current design looks like this:
tblProductLive: ProductId int, name varchar(100), createddate, createduserid, modifieddate, modifieduserid
tblProductPending: ProductPendingId int, ProductId int (FK), name varchar(100), statusid, createddate, createduserid, modifieddate, modifieduserid
tblProductAudit: ProductAuditId, ProductId int (FK), name varchar(100), startdate, enddate, startdateuserid, enddateuserid
I gave a product example to keep the question simple, but my application is actually a hierarchical data management system. Users can add nodes, update the attributes of existing nodes, or delete nodes in a tree, and these changes require approval before they are committed to the live tree. Each node has a list of configurable attributes that are subject to change at any time, so I have generic node and attribute tables with a one-to-many relationship between them. Each node and attribute has pending, current and history data.
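For the hierarchy, the live side would look roughly like this (names are made up to mirror the product example; the pending and audit tables would shadow these in the same way):

CREATE TABLE tblNodeLive (
    NodeId        int IDENTITY(1,1) PRIMARY KEY,
    ParentNodeId  int NULL REFERENCES tblNodeLive (NodeId),  -- NULL = root of the tree
    Name          varchar(100) NOT NULL,
    createddate   datetime NOT NULL DEFAULT GETDATE(),
    createduserid int NOT NULL
);

CREATE TABLE tblNodeAttributeLive (
    AttributeId    int IDENTITY(1,1) PRIMARY KEY,
    NodeId         int NOT NULL REFERENCES tblNodeLive (NodeId),  -- one node, many attributes
    AttributeName  varchar(100) NOT NULL,
    AttributeValue varchar(500) NULL,
    modifieddate   datetime NULL,
    modifieduserid int NULL
);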
October 17, 2008 at 10:50 am
I'm sorry, no, I can't.
The basic structures look a lot like what you see in the article.
I've got another article coming out in the SQL Standard this month that compares several querying methods against this type of data. That might be useful too.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 17, 2008 at 10:52 am
No problem. Thank you so much for the replies.