Tuple Versioning

  • Dimension table with start dates and end dates. End Date is maximum value for the most recent record. (it isn't null and this can't be changed).

    create table #t ( -- drop table #t

    id int identity,

    BusKeyint,

    Name nvarchar(50),

    StartDate datetime,

    EndDate datetime

    )

    Need to return the ID and Value for the most recent BusKey row

    example data:

    insert into #t (BusKey, Name,StartDate, EndDate)

    select 1, 'Old value', '1-Jan-2008', '1-Feb-2008'

    unionselect 1, 'Old value', '1-Feb-2008', '10-Feb-2008'

    unionselect 1, 'Current value', '22-Feb-2008', '1-Dec-2008'

    unionselect 2, 'Old value', '1-Oct-2008', '1-Oct-2008'

    unionselect 2, 'Current value', '1-Oct-2008', '1-Jan-2009'

    unionselect 3, 'Value', '1-Jan-2008', '1-Oct-2008'

    unionselect 3, 'Dup Value', '1-Feb-2008', '1-Oct-2008'

    This is what I normally do do:

    ;With CurrentEndDates as (

    select BusKey, max(EndDate) EndDate

    from #t

    group by BusKey

    )

    select *

    from #t t

    inner join CurrentEndDates cur on cur.BusKey = t.BusKey and cur.EndDate = t.EndDate

    I would like to know if anyone else has other approaches to this problem.

    Ideally one that can handle multiple columns identifying the row (in the above case the start date should be the largest of those with the same EndDate). This is simple enough with another sub-query of course.

    ;With CurrentEndDates as (

    select BusKey, max(EndDate) EndDate

    from #t

    group by BusKey

    ), CurrentEndDatesMaxStartDates as (

    select t.BusKey, max(t.EndDate) EndDate, max(StartDate) StartDate

    from #t t

    inner join CurrentEndDates cur on cur.BusKey = t.BusKey and cur.EndDate = t.EndDate

    group by t.BusKey

    )

    select *

    from #t t

    inner join CurrentEndDatesMaxStartDates cur

    oncur.BusKey = t.BusKey and cur.EndDate = t.EndDate

    and cur.StartDate = t.StartDate

    This feels so messy to me.

    Thanks,

  • Use row_number

    with cte as (

    select id,BusKey,Name,StartDate,EndDate,

    row_number() over(partition by BusKey order by EndDate desc,StartDate desc) as rn

    from #t)

    select id,BusKey,Name,StartDate,EndDate

    from cte

    where rn=1

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • That's a cute trick, I shall use that more often.

    Although it only performs well when there is clustered index over the order by. (the sort kills it when there isn't).

    Much better in terms of maintainability and readability though.

    Thanks.

  • I'm working up a comparison between TOP, MAX and ROW_NUMBER that addresses this problem. You can use all three. They have different strengths and weaknesses. You can use them as sub-selects or using the CROSS APPLY functionality. It all works. Just some, in some cases, work better than others. I still haven't finished all the details yet, but here are three examples (these are using structures from the tests and I'm just not ready to publish it all yet):

    SELECT d.[DocumentName]

    ,d.[DocumentId]

    ,v.[VersionDescription]

    ,v.[VersionId]

    FROM dbo.[Document] d

    JOIN dbo.[Version] v

    ON d.[DocumentId] = v.[DocumentId]

    AND v.[VersionId] = (SELECT TOP (1) v2.VersionId

    FROM dbo.[Version] v2

    WHERE v2.DocumentId = v.DocumentId

    ORDER BY v2.DocumentId, v2.VersionId DESC

    )

    WHERE d.[DocumentId] = 9729

    SELECT d.[DocumentName]

    ,d.[DocumentId]

    ,v.[VersionDescription]

    ,v.[VersionId]

    FROM dbo.[Document] d

    JOIN dbo.[Version] v

    ON d.[DocumentId] = v.[DocumentId]

    AND v.[VersionId] = (SELECT MAX(v2.VersionId)

    FROM dbo.[Version] v2

    WHERE v2.DocumentId = v.DocumentId

    )

    WHERE d.[DocumentId] = 9729

    SELECT x.*

    FROM ( SELECT d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    v.[VersionId],

    ROW_NUMBER()OVER (ORDER BY v.VersionId DESC) AS RowNum

    FROM dbo.[Document] d

    JOIN dbo.[Version] v ON d.[DocumentId] = v.[DocumentId]

    WHERE d.[DocumentId] = 9729

    ) AS x

    WHERE x.RowNum = 1

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Richard (5/21/2008)


    That's a cute trick, I shall use that more often.

    Although it only performs well when there is clustered index over the order by. (the sort kills it when there isn't).

    Much better in terms of maintainability and readability though.

    Thanks.

    Here's another way, may be more efficient than row_number

    select a.id,a.BusKey,a.Name,a.StartDate,a.EndDate

    from #t a

    where not exists (select * from #t b

    where b.BusKey=a.BusKey

    and (b.StartDate>a.StartDate or (b.StartDate=a.StartDate

    and b.EndDate>a.EndDate)))

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Why wouldn't the max ID for each BusKey contain the latest data? :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply