TOP vs ROW_NUMBER

  • If I have a simple query like this:

    SELECT d.[DocumentName]

    ,d.[DocumentId]

    ,v.[VersionDescription]

    ,v.[VersionId]

    FROM dbo.[Document] d

    JOIN dbo.[Version] v

    ON d.[DocumentId] = v.[DocumentId]

    AND v.[VersionId] = (SELECT TOP (1) v2.VersionId

    FROM dbo.[Version] v2

    WHERE v2.DocumentId = v.DocumentId

    ORDER BY v2.DocumentId, v2.VersionId DESC

    )

    WHERE d.[DocumentId] = 9729

    I can convert it to use ROW_NUMBER like this:

    SELECT x.*

    FROM ( SELECT d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    v.[VersionId],

    ROW_NUMBER()OVER (ORDER BY v.VersionId DESC) AS RowNum

    FROM dbo.[Document] d

    JOIN dbo.[Version] v ON d.[DocumentId] = v.[DocumentId]

    WHERE d.[DocumentId] = 9729

    ) AS x

    WHERE x.RowNum = 1

    But when I have a slightly more complicated query like this:

    SELECT d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    pu.[VersionId],

    p.[PublisherName],

    pu.[PublicationDate],

    pu.[PublicationNumber]

    FROM dbo.[Document] d

    CROSS APPLY ( SELECT TOP ( 1 )

    v2.VersionId,

    v2.DocumentId,

    v2.VersionDescription

    FROM dbo.[Version] v2

    WHERE v2.DocumentId = d.DocumentId

    ORDER BY v2.DocumentId,

    v2.VersionId DESC

    ) AS v

    JOIN dbo.[Publication] pu

    ON pu.[DocumentId] = d.[DocumentId]

    AND pu.[VersionId] = ( SELECT TOP ( 1 )

    pu2.versionid

    FROM dbo.Publication pu2

    WHERE pu2.DocumentId = d.DocumentId

    AND pu2.VersionId <= v.[VersionId]

    AND pu2.PublisherId = pu.PublisherId

    ORDER BY pu2.DocumentId,

    pu2.VersionId DESC

    )

    JOIN dbo.[Publisher] p

    ON pu.[PublisherId] = p.[PublisherId]

    WHERE d.[DocumentId] = 10432

    AND p.[PublisherId] = 4813

    I'm stumped. Anyone have a good shot at this? So far the failures not pretty.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hmm...Trying to work this one out.

    The first one (in the CROSS APPLY), returns the Highest version of a given document in the VERSION TABLE.

    The second one seems to be taking an awfully hard road to get to the high version of a document in the DOCUMENT table. Are there times when the DOCUMENT table wouldn't have a row matching the highest version in the VERSION table?

    If "no" - then this might do it:

    SELECT d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    pu.[VersionId],

    p.[PublisherName],

    pu.[PublicationDate],

    pu.[PublicationNumber]

    FROM dbo.[Document] d

    inner join ( SELECTRow_number() over (Partition by v2.documentID order by v2.versionID DESC) RN,

    v2.VersionId,

    v2.DocumentId,

    v2.VersionDescription

    FROM dbo.[Version] v2

    ) AS v on d.documentID=v.documentID

    JOIN dbo.[Publication] pu

    ON pu.[DocumentId] = d.[DocumentId]

    AND pu.[VersionId] = v.versionID --just go for the max version at all times.

    JOIN dbo.[Publisher] p

    ON pu.[PublisherId] = p.[PublisherId]

    WHERE d.[DocumentId] = 10432

    AND p.[PublisherId] = 4813

    and v2.rn=1

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • hmm... never mind earlier comment. I think this one's a little better anyway.

    SELECT d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    pu.[VersionId],

    p.[PublisherName],

    pu.[PublicationDate],

    pu.[PublicationNumber]

    FROM dbo.[Document] d

    inner join

    ( SELECTRow_number() over (Partition by v2.documentID order by v2.versionID DESC) RN,

    v2.VersionId,

    v2.DocumentId,

    v2.VersionDescription

    FROM dbo.[Version] v2

    ) AS v ON d.documentID=v.documentID

    LEFT OUTER JOIN

    ( SELECT Row_number() over (Partition by pu2.documentID, publisherID order by pu2.versionID DESC) RN,

    pu2.versionid,

    pu2.documentID,

    pu2.publicationdate,

    pu2.publicationnumber,

    pu2.publisherID

    FROM dbo.Publication pu2

    ) puON pu.[DocumentId] = d.[DocumentId]

    JOIN dbo.[Publisher] p

    ON pu.[PublisherId] = p.[PublisherId]

    WHERE d.[DocumentId] = 10432

    AND p.[PublisherId] = 4813

    and v2.rn=1

    and pu.rn=1

    Edit: I think you need the second one to partition by documentID AND publisherID

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ah, partition by... crud. I was bouncing all around it and I just couldn't figure it out. That looks great. I tested it against the single row and large set and it worked fine.

    Thanks for the help.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (5/21/2008)


    Ah, partition by... crud. I was bouncing all around it and I just couldn't figure it out. That looks great. I tested it against the single row and large set and it worked fine.

    Thanks for the help.

    Any time!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Have you tried turning those row_number sub-queries into aggregates? I've tried both, and sometime the aggregates are faster.

    Something like this:

    ;with

    -- Publication Version CTE, used for Pub CTE

    PubVersion (DocID, PubID, VerID) as

    (SELECT pu2.documentID, publisherID, max(pu2.versionID)

    FROM dbo.Publication pu2

    GROUP BY pu2.documentID, publisherID),

    -- Publication data CTE, used in final query

    Pub (DocID, PubID, VerID, PublisherName, PublicationDate, PublicationNumber) as

    (select DocID, PubID, VerID, PublisherName, PublicationDate,

    PublicationNumber

    from PubVersion

    inner join dbo.Publication p

    on PubVersion.DocID = p.DocumentID

    and PubVersion.PubID = p.PublisherID

    and PubVersion.VerID = p.VersionID

    inner join dbo.[Publisher] p2

    on PubVersion.PubID = p2.PublisherID),

    -- Document Version CTE, used in final query

    DocVersion (DocID, VerID) as

    (SELECT v2.documentID, max(v2.versionID)

    FROM dbo.[Version] v2)

    SELECT

    d.[DocumentName],

    d.[DocumentId],

    v.[VersionDescription],

    v.VerID as [VersionId],

    pub.[PublisherName],

    pub.[PublicationDate],

    pub.[PublicationNumber]

    FROM

    dbo.[Document] d

    INNER JOIN dbo.[Version] v

    on d.DocumentID = v.DocumentID

    INNER JOIN DocVersion

    on v.DocumentID = DocVersion.DocID

    and v.VersionID = DocVersion.VerID

    LEFT OUTER JOIN Pub

    on d.DocumentID = Pub.DocID

    I find CTEs more readable than derived tables in the From clause, so I set it out that way. See if that'll do what you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yeah. I'm actually working on comparing MAX, TOP and ROW_NUMBER. I was hoping to outline which might be better when.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm definitely interested in your test results on that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Grant Fritchey (5/21/2008)


    Yeah. I'm actually working on comparing MAX, TOP and ROW_NUMBER. I was hoping to outline which might be better when.

    A worthy investigation. I look forward to your results.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Another thing you might test on it is using an indexed view with the aggregates already in it. That's one of the things they're good for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Grant Fritchey (5/21/2008)


    Yeah. I'm actually working on comparing MAX, TOP and ROW_NUMBER. I was hoping to outline which might be better when.

    If you want to see some REALLY interesting things - throw in the MAX() OVER () into your testing. Careful though - it might not return what you'd expect.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Now, now... we're stepping outside the bounds of the experiment.

    I'll have to see what I can do with a materialized view though. That's a good idea just to try out.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Matt Miller (5/21/2008)


    If you want to see some REALLY interesting things - throw in the MAX() OVER () into your testing.

    Max () OVER ()???? :blink:

    < wanders off in search of the Tome of All Knowledge >

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply