top row number within partition

  • I am ranking a select using ROW_NUMBER() with PARTITION BY. Is there a way of adding another column that will contain the highest rank number for each partitioned group?

    so

    1 5 val1

    2 5 val1

    3 5 val1

    4 5 val1

    5 5 val1

    1 3 val2

    2 3 val2

    3 3 val2

    1 2 val3

    2 2 val3

    thanks,

    Jules

    www.sql-library.com

  • I hope it's clear that it's the second column I want to get. The first one is generated by a ROW_NUMBER() function.

    www.sql-library.com

  • I assume you are partitioning on column 3, and that by "top-ranked" you mean the rows where row_number() = 1.

    I'm changing up the data a bit in the example below to clarify which values are actually being returned. The trick is to take each distinct value in column 3 and cross apply a subquery to pull the first row for that value.

    Please let me know if you have any questions.

    Bob

    declare @sample table (col1 int, col2 int, col3 varchar(5))

    insert into @sample
    select 1, 5, 'val1' union all
    select 2, 6, 'val1' union all
    select 3, 7, 'val1' union all
    select 4, 8, 'val1' union all
    select 5, 9, 'val1' union all
    select 1, 3, 'val2' union all
    select 2, 4, 'val2' union all
    select 3, 5, 'val2' union all
    select 1, 9, 'val3' union all
    select 2, 3, 'val3'

    --
    select distinct s.col3, ca.col1, ca.col2
    from @sample s
    cross apply (select top 1 col1, col2
                 from @sample s2
                 where s2.col3 = s.col3
                 order by s2.col1) ca
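    The same first-row-per-group result can be sketched outside SQL Server; SQLite has no CROSS APPLY, so in the sketch below a correlated subquery stands in for TOP 1 per group (the table name `sample` and the Python/sqlite3 setup are illustrative assumptions, not part of the original post):

```python
import sqlite3

# Sketch only: SQLite has no CROSS APPLY, so a correlated subquery stands in
# for "top 1 per group". The data mirrors the @sample table above.
conn = sqlite3.connect(":memory:")
conn.execute("create table sample (col1 int, col2 int, col3 text)")
conn.executemany(
    "insert into sample values (?, ?, ?)",
    [(1, 5, "val1"), (2, 6, "val1"), (3, 7, "val1"), (4, 8, "val1"),
     (5, 9, "val1"), (1, 3, "val2"), (2, 4, "val2"), (3, 5, "val2"),
     (1, 9, "val3"), (2, 3, "val3")])

# First row (lowest col1) for each distinct col3 value.
rows = conn.execute("""
    select s.col3, s.col1, s.col2
    from sample s
    where s.col1 = (select min(s2.col1)
                    from sample s2
                    where s2.col3 = s.col3)
    order by s.col3
""").fetchall()
print(rows)
```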

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • But this means having another statement. I could use a MAX subquery to get these values in a second statement, which I think would perform better than a cross apply. I was wondering if I could get those values repeated in the same select that generates the row numbers?

    thanks for your help.

    Jules

    www.sql-library.com

  • I misunderstood; sorry about that. You want to build the values you originally posted.

    A summary query either before or after would give you those counts and a simple join should run quicker than a cross apply. You could use CTEs to do that and technically it's a single query (see below) but that's just semantics. The query plan is still going to involve two table scans. I don't think that's what you're asking for.

    I can't think of a function that can "look ahead" to see your final values and then retroactively apply them to all rows. I'll stay subscribed to this in case someone else can give you a better suggestion.

    May I ask why you want that max value included in every row?

    declare @sample table (col1 varchar(5))

    insert into @sample
    select 'val1' union all
    select 'val1' union all
    select 'val1' union all
    select 'val1' union all
    select 'val1' union all
    select 'val2' union all
    select 'val2' union all
    select 'val2' union all
    select 'val3' union all
    select 'val3'

    --
    ;with cte1 as (select ROW_NUMBER() over (partition by col1 order by col1) as rowVal, col1 from @sample)
    ,cte2 as (select col1, max(rowVal) as maxVal from cte1 group by col1)
    select c1.rowVal, c2.maxVal, c2.col1
    from cte1 c1
    join cte2 c2 on c2.col1 = c1.col1
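    The two-CTE approach can be sanity-checked in SQLite via Python (a sketch only, assuming SQLite 3.25+ for window function support; the table name `sample` replaces the T-SQL table variable):

```python
import sqlite3

# Sketch of the two-CTE approach: number the rows per group, then join back
# to the per-group maxima. Assumes SQLite 3.25+ (window functions).
conn = sqlite3.connect(":memory:")
conn.execute("create table sample (col1 text)")
conn.executemany("insert into sample values (?)",
                 [("val1",)] * 5 + [("val2",)] * 3 + [("val3",)] * 2)

rows = conn.execute("""
    with cte1 as (
        select row_number() over (partition by col1 order by col1) as rowVal,
               col1
        from sample
    ),
    cte2 as (
        select col1, max(rowVal) as maxVal
        from cte1
        group by col1
    )
    select c1.rowVal, c2.maxVal, c1.col1
    from cte1 c1
    join cte2 c2 on c2.col1 = c1.col1
    order by c1.col1, c1.rowVal
""").fetchall()
print(rows)
```

    This reproduces the shape the original poster asked for: each row carries its own rank plus the highest rank in its group.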

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • For a large dataset, would it not be better to insert the results of cte1 into a temp table and run the max query on that?

    www.sql-library.com

  • Bob Hovious (5/6/2009)


    ;with cte1 as (select ROW_NUMBER() over (partition by col1 order by col1) as rowVal, col1 from @sample)
    ,cte2 as (select col1, max(rowVal) as maxVal from cte1 group by col1)
    select c1.rowVal, c2.maxVal, c2.col1
    from cte1 c1
    join cte2 c2 on c2.col1 = c1.col1

    More simply:

    select ROW_NUMBER() over (partition by col1 order by col1) as rowVal,
           COUNT(*) over (partition by col1) as maxVal,
           col1
    from @sample
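    The single-pass COUNT(*) OVER version above can be sketched the same way (again an illustration in SQLite 3.25+ via Python, with an assumed table `sample` standing in for the table variable): COUNT(*) over the same partition gives the group size, which is exactly the highest row number in that group.

```python
import sqlite3

# Single-pass version: COUNT(*) OVER the same partition yields the group
# size, i.e. the highest ROW_NUMBER() in that partition.
conn = sqlite3.connect(":memory:")
conn.execute("create table sample (col1 text)")
conn.executemany("insert into sample values (?)",
                 [("val1",)] * 5 + [("val2",)] * 3 + [("val3",)] * 2)

rows = conn.execute("""
    select row_number() over (partition by col1 order by col1) as rowVal,
           count(*)     over (partition by col1)               as maxVal,
           col1
    from sample
    order by col1, rowVal
""").fetchall()
print(rows)
```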

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Nice one, Mark. Didn't know you could use PARTITION BY with COUNT.

    www.sql-library.com

  • Obviously, neither did I. But that's worth remembering. Thanks, Mark!

    However, it seems the two-CTE approach runs a little faster with a larger number of rows. The culprit is I/O: while the two-CTE scheme scans the @sample table twice, the COUNT(*) OVER method scans it once but creates a workfile that gets scanned three times, resulting in a relatively huge number of logical reads. Something to consider if you are going to be applying this regularly to tables with thousands of rows.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Jules Bonnot (5/7/2009)


    For a large dataset, would it not be better to insert the results of cte1 into a temp table and run the max query on that?

    That's definitely an "It Depends". If you can get it to one or two scans like Bob has now, that's almost always going to be faster than any temp table shuffling, even (and especially) for large datasets.

    However, in many cases the query might try to hold too much in memory or in worktables, and then you might be better off breaking it down with intermediate temp tables. I generally write it as one query to begin with and then break it down from there if it has problems.
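    The temp-table variant under discussion can be sketched like this (again only an illustration in SQLite via Python, with assumed names `sample` and `numbered`): materialize the row-numbered set once, then join it to its own per-group maxima.

```python
import sqlite3

# Temp-table variant: materialize the row-numbered set once, then compute
# and join the per-group maxima against the materialized copy.
conn = sqlite3.connect(":memory:")
conn.execute("create table sample (col1 text)")
conn.executemany("insert into sample values (?)",
                 [("val1",)] * 5 + [("val2",)] * 3 + [("val3",)] * 2)

conn.execute("""
    create temp table numbered as
    select row_number() over (partition by col1 order by col1) as rowVal,
           col1
    from sample
""")

rows = conn.execute("""
    select n.rowVal, m.maxVal, n.col1
    from numbered n
    join (select col1, max(rowVal) as maxVal
          from numbered
          group by col1) m on m.col1 = n.col1
    order by n.col1, n.rowVal
""").fetchall()
print(rows)
```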

    -- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."
