Using OVER() to Fix Bad Version Numbers

  • Comments posted to this topic are about the item Using OVER() to Fix Bad Version Numbers

  • Will keep this one in mind - but my hunch is, this won't sort out a situation where some field had a version 1 and two version 3s...

  • Hi Dave

    Thanks for taking the time to read the article. When you have (as in your example), a v1 and two v3 values, all you can do is make a best guess at which of the V3 values is the latest. When I fixed up the system the article talks about, I made sure the ORDER BY in the ROW_NUMBER() call ordered by start date. This should pick up the latest value most of the time, you'll have to go with the row returned by SQL Server if two rows have the same version number and start date (unless you can more ordering criteria to differentiate the rows).

    It might be a better idea to generate the version numbers dynamically in SQL Server (either on INSERT or via computed column), which if done properly should prevent this problem from occurring.

  • I know it's dummy data, but what would you suggest to fix the inconsistency in the end date field? As it stands, it is u/s and would be better dropped.

  • You can update a CTE directly, so there's no need for the #BadRecords temp table and the MULTIPLES and BAD sub-queries:

    WITH CTE AS(

    SELECT ContactId, FieldTypeId, FieldVersion, StartDate,

    ROW_NUMBER() OVER (PARTITION BY ContactId, FieldTypeId

    ORDER BY StartDate) AS CorrectVersion

    FROM SCD.ContactFields)

    UPDATE CTE

    SET FieldVersion = CorrectVersion

    WHERE FieldVersion != CorrectVersion;

  • You can easily put a StartDate and EndDate (as well as a current flag) into what you call the traditional row model of a data warehouse. I can't imagine how the attribute-based model you describe could ever be performant, especially when dealing with a large financial institution that would have tons of data.

    Be still, and know that I am God - Psalm 46:10

  • Thanks for the refresher on the OVER() clause.

  • Hi Sam - yes, your method is valid too. There are plenty of ways to do this kind of thing, I'm just trying to give an overview of how the OVER() clause works.

    David - again, I agree with you. And yes, the structure used was not particularly performant. But it was in place before I started working with the company, so the only option was to work (and fix) what was already there.

  • Hi Robert

    With a corrected Version Number, it's relatively straightforward to set the end date of the current row to the start date of the following row.

    E.g.:

    FieldVersion: 3, StartDate: 2016-04-06 09:00, EndDate: 2016-08-20 00:32

    FieldVersion: 4, StartDate: 2016-04-06 10:00, EndDate: NULL

    You want to set the EndDate of FieldVersion 3 to 2016-04-06 10:00. This code will hopefully start you off (worked fine for the sample records I used):

    SELECT ContactId, FieldId, FieldTypeId, FieldVersion, FieldVersion - 1 AS PreviousFieldVersion, StartDate, EndDate

    INTO #MultiVersions

    FROM Scd.ContactFields

    SELECT CF.ContactId, CF.FieldId, CF.FieldVersion, CF.StartDate, CF.EndDate AS CurrentEndDate,

    MV.StartDate AS NewEndDate

    FROM Scd.ContactFields CF

    LEFT JOIN #MultiVersions MV

    ON MV.ContactId = CF.ContactId AND MV.FieldTypeId = CF.FieldTypeId

    AND MV.PreviousFieldVersion = CF.FieldVersion

    ORDER BY CF.ContactId, CF.FieldTypeId, CF.FieldVersion

    Regards,

    Mike.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply