October 15, 2016 at 11:49 am
Comments posted to this topic are about the item Using OVER() to Fix Bad Version Numbers
October 17, 2016 at 12:49 am
Will keep this one in mind - but my hunch is, this won't sort out a situation where some field had a version 1 and two version 3s...
October 17, 2016 at 1:59 am
Hi Dave
Thanks for taking the time to read the article. When you have (as in your example), a v1 and two v3 values, all you can do is make a best guess at which of the V3 values is the latest. When I fixed up the system the article talks about, I made sure the ORDER BY in the ROW_NUMBER() call ordered by start date. This should pick up the latest value most of the time, you'll have to go with the row returned by SQL Server if two rows have the same version number and start date (unless you can more ordering criteria to differentiate the rows).
It might be a better idea to generate the version numbers dynamically in SQL Server (either on INSERT or via computed column), which if done properly should prevent this problem from occurring.
October 17, 2016 at 3:03 am
I know it's dummy data, but what would you suggest to fix the inconsistency in the end date field? As it stands, it is u/s and would be better dropped.
October 17, 2016 at 4:43 am
You can update a CTE directly, so there's no need for the #BadRecords temp table and the MULTIPLES and BAD sub-queries:
WITH CTE AS(
SELECT ContactId, FieldTypeId, FieldVersion, StartDate,
ROW_NUMBER() OVER (PARTITION BY ContactId, FieldTypeId
ORDER BY StartDate) AS CorrectVersion
FROM SCD.ContactFields)
UPDATE CTE
SET FieldVersion = CorrectVersion
WHERE FieldVersion != CorrectVersion;
October 17, 2016 at 7:35 am
You can easily put a StartDate and EndDate (as well as a current flag) into what you call the traditional row model of a data warehouse. I can't imagine how the attribute-based model you describe could ever be performant, especially when dealing with a large financial institution that would have tons of data.
Be still, and know that I am God - Psalm 46:10
October 17, 2016 at 7:58 am
Thanks for the refresher on the OVER() clause.
October 18, 2016 at 5:12 am
Hi Sam - yes, your method is valid too. There are plenty of ways to do this kind of thing, I'm just trying to give an overview of how the OVER() clause works.
David - again, I agree with you. And yes, the structure used was not particularly performant. But it was in place before I started working with the company, so the only option was to work (and fix) what was already there.
October 18, 2016 at 5:14 am
Hi Robert
With a corrected Version Number, it's relatively straightforward to set the end date of the current row to the start date of the following row.
E.g.:
FieldVersion: 3, StartDate: 2016-04-06 09:00, EndDate: 2016-08-20 00:32
FieldVersion: 4, StartDate: 2016-04-06 10:00, EndDate: NULL
You want to set the EndDate of FieldVersion 3 to 2016-04-06 10:00. This code will hopefully start you off (worked fine for the sample records I used):
SELECT ContactId, FieldId, FieldTypeId, FieldVersion, FieldVersion - 1 AS PreviousFieldVersion, StartDate, EndDate
INTO #MultiVersions
FROM Scd.ContactFields
SELECT CF.ContactId, CF.FieldId, CF.FieldVersion, CF.StartDate, CF.EndDate AS CurrentEndDate,
MV.StartDate AS NewEndDate
FROM Scd.ContactFields CF
LEFT JOIN #MultiVersions MV
ON MV.ContactId = CF.ContactId AND MV.FieldTypeId = CF.FieldTypeId
AND MV.PreviousFieldVersion = CF.FieldVersion
ORDER BY CF.ContactId, CF.FieldTypeId, CF.FieldVersion
Regards,
Mike.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply