selecting top N records by type

  • goal is simple but the structure of the query alludes me:

    to keep it simple lets say i have a table with 3 columns: salePrice, saleType, and saleDate. the table contains 10 records. 5 of them have a saleType of "credit" and 5 of them have a saleType of "cash". what would my select statement look like if i wanted a result set that showed the top 2 records of each type ordered by saleDate desc?

    i feel like the answer is so simple, yet i don't know how to approach it. btw: my realworld example could have a large number of types and many records related to a specific type.

    thanks for your help

  • One of the articles on the front page of the site today, about running totals and ranking functions, has an answer that might work for you.

    Insert into a temp table, rank based on the type and the desired sequence, then select from the table where the rank is <= the desired number of rows.

    (The ranking function is mildly complex, so I'm not going to quote it here, but it's outlined quite well in the article.)

    Of course, it's much easier in SQL 2005. Just use Row_Number() and a CTE. But in SQL 2000, it requires a bit more work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • How about

    SELECT top 1 * FROM T1 WHERE name ='cash'

    UNION ALL

    SELECT TOP 1 * FROM T1 WHERE name = 'credit'

  • Thanks for the reply. i'm not sure the union will work because my query can't use literal values like that. it needs to somehow get all records of each specific type and then only return the top N of each in one result set. so if the data looked like this:

    saleType salePrice

    cash 10

    cash 5

    cash 4

    cash 3

    cash 2

    credit 10

    credit 5

    credit 4

    credit 3

    credit 2

    and i wanted to see the top 2 of each, the result set would look like:

    saleType salePrice

    cash 10

    cash 5

    credit 10

    credit 5

    the example i give is really oversimplified in terms of the saleType. imagine that there are 1000 different possible saleTypes and each sale type has 1000 different records and i wanted to see a list of the top 5 records for each 1000 saleType

  • Is this what you mean?

    declare @temptbl table

    (salestype varchar(6), salesprice int)

    insert into @temptbl (salestype, salesprice)

    select 'cash', 5 union all

    select 'cash', 3 union all

    select 'cash', 2 union all

    select 'cash', 1 union all

    select 'credit', 5 union all

    select 'credit', 2 union all

    select 'credit', 1 union all

    select 'credit', 3

    select t1.salesprice, t1.salestype

    from @temptbl t1 where t1.salesprice in

    (select top 2 t2.salesprice from @temptbl t2 where t1.salestype = t2.salestype

    order by t2.salestype, t2.salesprice desc)

    order by t1.salestype,t1.salesprice desc

    Toni

  • Yes, this is exactly what i was looking for! Thanks so much!!!

  • Keep in mind that the suggested solution (a correlated subquery) is very resource hungry and will be quite slow on a big table with a lot of different values in the column being joined against.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared. Can you please provide the preferred alternate to a correlated subquery?

    Thanks

    Toni

  • toniupstny (2/1/2008)


    GSquared. Can you please provide the preferred alternate to a correlated subquery?

    Thanks

    Toni

    Insert into a temp table, add rankings to it, select from that.

    The correlated subquery will be fast if the column you join to has only a few distinct values, but will have to do a table/index scan for each distinct value. Thus, if there are only two (as in the sample data provided), it will probably be reasonably fast. If there are a dozen, or even hundreds, the query will crawl. I've seen correlated subqueries in such cases, and some of them could easily be rear-ended by a glacier, even on high-power hardware.

    The temp table and rankings will be slower, if you only have a couple values, but will be much, much faster if you have more than that.

    There was a very good article on the front page of SQL Server Central just a couple of days ago on how to add rankings to a temp table in SQL 2000. (In 2005, you just use the Row_Number function.) Rather than try to reproduce the whole article, I refered to it in my original answer on this same thread (cf).

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Also, since SQL 2000 doesn't allow the use of a variable in the TOP clause, a ranked temp table will add that functionality. The subquery has to have "top 2" hard-coded into it. Per the original poster, he needs the "top N rows", which sounds to me like a variable.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you GSquared.

    I have to confess on being a bit behind in reading the articles this week as I just completed, and passed, my certification test for SQL Server 2000 on Wednesday. I will definitely go back through and pay special attention to that article.

    Toni

  • Congrats on the cert! (I may one day get some certs myself. Haven't been able to afford it yet.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Here is another approach for Top N algorithm

    -- Prepare sample data

    DECLARE @Sample TABLE (SalesType VARCHAR(6), SalesPrice INT)

    INSERT@Sample

    SELECT'cash', 5 UNION ALL

    SELECT'cash', 3 UNION ALL

    SELECT'cash', 2 UNION ALL

    SELECT'cash', 1 UNION ALL

    SELECT'credit', 9 UNION ALL

    SELECT'credit', 8 UNION ALL

    SELECT'credit', 4 UNION ALL

    SELECT'credit', 2

    -- Initialize user supplied parameter

    DECLARE@Items INT

    SET@Items = 2

    -- Prepare staging table

    DECLARE@Stage TABLE (SalesType VARCHAR(6), SalesPrice INT, RecID INT)

    INSERT@Stage

    SELECT DISTINCTs.SalesType,

    NULL,

    v.Number

    FROM@Sample AS s

    INNER JOINmaster..spt_values AS v ON v.Type = 'p'

    WHEREv.Number BETWEEN 1 AND @Items

    -- Populate staging table

    WHILE @Items > 0

    BEGIN

    UPDATEst

    SETst.SalesPrice = x.SalesPrice

    FROM@Stage AS st

    INNER JOIN(

    SELECTsa.SalesType,

    MAX(sa.SalesPrice) AS SalesPrice

    FROM@Sample AS sa

    LEFT JOIN@Stage AS st ON st.SalesType = sa.SalesType

    AND st.SalesPrice = sa.SalesPrice

    WHEREst.SalesType IS NULL

    GROUP BYsa.SalesType

    ) AS x ON x.SalesType = st.SalesType

    WHEREst.RecID = @Items

    SET@Items = @Items - 1

    END

    -- Show the expected result

    SELECTSalestype,

    SalesPrice

    FROM@Stage

    ORDER BYSalestype,

    SalesPrice DESC


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply