June 8, 2005 at 1:25 pm
At my current workplace, there is a group of programmers working on our next-generation app for in-house work. Their work includes the design of the enterprise DB, which is where I have a question for people.
In the current design, they are implementing in-table row versioning. In this implementation, when a row is modified, instead of actually modifying the existing row, the original row is flagged as 'deleted', and a new row is added. A pair of Date fields indicate the date/times that the row was valid from - to (labeled as Date added and date modified), with no IsDeleted flag indicating the current row.
Each table will have an identity field, and a second surrogate 'Identity' field for identifying the rows related to one entity. The table's PK (and a clustered index) will be based on the second 'Identity' field.
Personally, I believe that tables like this will become increasingly problematic as time goes on, growing ridiculously large and slow to query, and difficult to run applications against. However, I need some actual information on this:
1) Has anyone ever actually considered an implementation like this in SQL Server?
2) If so, what were your thoughts on this?
3) If you went to Implementation, what were the results?
4) Are there any existing theoretial works that support this methodology? If so, what are they?
This is one of the most intelligent groups I know on SQL Server questions, and I would really like an assist on answering this.
THANKS!
June 8, 2005 at 1:47 pm
Sounds like a good plan... except for the part where the history of the table is kept in the master table. I can't imagine having to go through millions of useless rows just to fetch some simple data.
I would think that it's best to put the history of the data into it's own history table... Then when you need to check the history, you go in the history and when you need the data, you go get the data .
As for theory... I would think that I wouldn't want to keep the apples that have passed their due date with the good ones in the same basket... but that's just me .
June 8, 2005 at 2:14 pm
I know of no theory that would support this implementation, but I can name one that says this is a bad implementation. How about the relational model of database management? You should not store different things in one table. A flag that makes a row change 'mode' indicates that there are actually more than one kind of entity stored in a table.
And what do you mean with two surrogate keys using identity? Since every new version of a row will receive a new pk it means rows in other tables that reference the old row will need to be updated with the new key.
June 8, 2005 at 2:37 pm
Hehe.. I missed that one (referencial integrity). But I think it's the best reason to stop (or not start) this non-sens. Not to re-mention speed, ease of use, correct design and a few others I'm surely missing again.
June 8, 2005 at 4:09 pm
this is another classic example of programmers getting their hands on the data, brainstorming, and then coming up with a solution that probably fits their needs and they can rationalise until the cows come home.
i'd say your concerns are more than justified. put your foot down and stand your ground, you have the relational model to fall back on. fight the power.
cheers
dbgeezer
June 8, 2005 at 4:29 pm
>>And what do you mean with two surrogate keys using identity?
Here is what a sample table layout would look like, with generic identifiers:
VersionID int identity
RecordID int <<--- Core of PK
RecordData varchar(12)
IsDeleted bit
Created datetime
CreatedBy Varchar(15)
LastModified datetime
LastModifiedBy varchar(15)
So, let's enter data item 1, and edit it twice:
1 1 ThisisTests 1 06/01/2005 12:30:00 Brendt 06/02/2005 10:15:00 Mike
2 1 ThisisTest 1 06/02/2005 10:15:00 Mike 06/04/2005 08:44:10 Brendt
3 1 ThisIsATest 0 06/04/2005 08:44:10 Brendt
As near as I can tell, this means RI through triggers, and no actual PK/FK relationships. Piles of data, long queries or inclusion of LastModified Is Null in all queries. Lots of additional overhead, bad indexing, and bad performance is what I see coming.
June 8, 2005 at 4:43 pm
Amen.
May the force be with you .
June 8, 2005 at 4:48 pm
Can you live with this type of query???
Select VersionID, RecordID, RecordData, isDeleted, Created, CreatedBy, LastModified, LasmodifiedBy from dbo.YourTable Main inner join
(Select RecordID, max(LastModified) as LastModified
from dbo.YourTable
group by RecordID) dtCurrentDate on Main.RecordID = dtCurrentDate.RecordID and Main.LastModiFied = dtCurrentDate.LastModified
EDIT
Sorry I forgot about the IsDeleted Column. You'd still have to scan the whole table, flush 90% of the data, to present the whole table in a report.... is this something you can live with???
June 8, 2005 at 5:10 pm
Nope - that's why I am looking for outside reinforcing opinions on this table structure. Thanks!
June 8, 2005 at 5:30 pm
If you need real punch, you should ask Joe Celko to comment on this design. They'll start running for their mommies once he's done with 'em .
June 8, 2005 at 11:07 pm
You don't even need to look at how complex the queries for this table will be. It is absolutely not an acceptable design! There is no PK (RecordID cannot be PK since there will be many rows with equal values for it). Like I said before the table contains multiple types of entities. There is unnecessary data stored with the Created/LastModified design. There is no real key (at least not in this example). Etc.
Say no, get a copy of Fabian Pascal's Practical Issues in Database Management and have the designers read that if they are going to have anything to say about the database design.
June 9, 2005 at 1:58 am
Indeed.
To the dark side poor design leads.
cheers
dbgeezer
June 9, 2005 at 3:17 am
Chris already mentioned all there is to say. You should suggest, more or less gently, that those developers take some course in database design. Maybe you can get your hands on http://www.oreilly.com/catalog/databaseid/
I've started yesterday to review it and it looks very promising. Relational theory mere mortals outside the ivory tower can understand.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
June 9, 2005 at 4:10 am
As it has been said: History should be managed with different table or you could consider using different dbms product that has a native support for multiversion data.
June 9, 2005 at 6:04 am
I have successfully implimented this type of historical schema but with a slightly different approach such as:
-- The CORE table
create table Partys
( PartyId integer identity(1,1)
, EffectiveDt datetime not null
, ExpirationDt datetime not null
, constraint Partys_P primary key (PartyId)
)
-- The Version table
-- Note that PartyVersions should not have its own identity column.
Create table PartyVersions
(PartyId integer identity(1,1) not null
,AsOfDt datetime not null
,CreatedDt datetime not null
,CreatedBy Varchar(15) not null
,LastModified datetime not null
,LastModifiedBy varchar(15) not null
-- Data Columns go here
, constraint PartyVersions_P primary key (PartyId , AsOfDt )
, constraint Partys_F_PartyVersions foreign key
(PartyId ) references Partys
)
-- A version table with a uniqueness constraint
Create table PartyTaxIdVersions
(PartyId integer identity(1,1) not null
,AsOfDt datetime not null
,CreatedDt datetime not null
,CreatedBy Varchar(15) not null
,LastModified datetime not null
,LastModifiedBy varchar(15) not null
,TaxIdentifier integer not null
, constraint PartyTaxIdVersions_P primary key (PartyId , AsOfDt )
, constraint PartyTaxIdVersions_P unique (TaxIdentifier, AsOfDt )
, constraint Partys_F_PartyTaxIdVersions foreign key
(PartyId ) references Partys
)
All Queries for this class of structure will be similiar to
select *
from Partys
join PartyVersions
on PartyVersions.PartyId = Partys.PartyId
join (select PartyVersions.PartyId
, MIN(AsOfDt) as AsOfDt
from PartyVersions
where AsOfDt >= @AsOfDt
) as PartyVersionLatest (PartyId, AsOfDt )
on PartyVersionLatest.PartyId = PartyVersions.PartyId
and PartyVersionLatest.AsOfDt= PartyVersions.AsOfDt
join PartyTaxIdVersions
on PartyTaxIdVersions.PartyId = Partys.PartyId
join (select PartyVersions.PartyId
, MIN(AsOfDt) as AsOfDt
from PartyTaxIdVersions
where AsOfDt >= @AsOfDt
) as PartyTaxIdVersionLatest (PartyId, AsOfDt )
on PartyTaxIdVersionLatest.PartyId = PartyVersions.PartyId
and PartyTaxIdVersionLatest.AsOfDt= PartyVersions.AsOfDt
Please be aware of these situations:
The typical SQL will contain a variable ( As Of Date ) within a subquery, views cannot contain variables and therefore, views cannot be used to aid re-usability.
Since stored procedure use parameter sniffing to determine the optimal query plan, the AsOfDt value must always be passed and must not be defaulted to the current date within the stored procedure. i.e. do not have this logic in any stored procedures:
if @AsOfDt is null set @AsOfDt = current_timestamp
Columns that have uniqueness constraints should be in a seperate table or updates to the other columns will cause many rows having the same value which increases the index size resulting in performance problems.
The Core table and the version table should be on different disks to increase thruput.
SQL = Scarcely Qualifies as a Language
Viewing 15 posts - 1 through 15 (of 30 total)
You must be logged in to reply to this topic. Login to reply