Select distinct on multiple columns

  • Yeah, there was no forum for Sql server 2000.

    I'm going to go cry now 😉

  • steveb (3/3/2009)


    Thanks, this code should work orderd by Level

    SELECT * FROM YourTable WHERE LEVEL IN(

    SELECT MAX(LEVEL) AS LEVEL

    FROM YouTable

    GROUP BY [entry_id], [group_id])

    That will only ever allow me to return 1 row, correct? Assuming that I add in the where entry_id=? and group_id=? because otherwise it will only ever return results with the same level, where not all results that I want have the same level.

  • sqlcentral (3/3/2009)


    Yeah, there was no forum for Sql server 2000.

    Yes there is. The 2000 forums are below the 2005 forums on the forum home page.

    T-SQL (2000)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sqlcentral (3/3/2009)


    steveb (3/3/2009)


    Thanks, this code should work orderd by Level

    SELECT * FROM YourTable WHERE LEVEL IN(

    SELECT MAX(LEVEL) AS LEVEL

    FROM YouTable

    GROUP BY [entry_id], [group_id])

    That will only ever allow me to return 1 row, correct? Assuming that I add in the where entry_id=? and group_id=? because otherwise it will only ever return results with the same level, where not all results that I want have the same level.

    No, this query will return all the distinct rows in your table where the level is the highest for each entry_Id and group_id

  • GilaMonster (3/3/2009)


    Yes there is. The 2000 forums are below the 2005 forums on the forum home page.

    T-SQL (2000)

    Sorry! When I was scanning the list I didn't notice the master groupings (i.e. I was only scanning under the master header for 2005)... Whoops!

    Anyway.

    Is the with/as syntax 2000 compliant?

  • sqlcentral (3/3/2009)


    That will only ever allow me to return 1 row, correct?

    No. There's a group by in the subquery, so that query will return 1 row for each [entry_id], [group_id], the one with the highest level.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • steveb (3/3/2009)


    No, this query will return all the distinct rows in your table where the level is the highest for each entry_Id and group_id

    Ok, perhaps my sample data wasn't broad enough. The level field isn't unique across all rows, only across rows with the entry and group ids identical. For example several rows have a level of 1 but different entry and group ids, so this query doesn't appear to work (tried it) because the inner query returns levels that exist for the duplicated rows. Here is some more accurate sample data:

    entry_id,group_id,cost,level

    1001 2001 5 3

    1001 2001 4 1

    1001 2000 5 1

    1002 2001 4 2

    1002 2001 3 1

    1002 2000 2 1

    So in this case your inner query will return (1,2,3) which will cause the outer query to return all results.

  • sqlcentral (3/3/2009)


    steveb (3/3/2009)


    No, this query will return all the distinct rows in your table where the level is the highest for each entry_Id and group_id

    Ok, perhaps my sample data wasn't broad enough. The level field isn't unique across all rows, only across rows with the entry and group ids identical. For example several rows have a level of 1 but different entry and group ids, so this query doesn't appear to work (tried it) because the inner query returns levels that exist for the duplicated rows. Here is some more accurate sample data:

    entry_id,group_id,cost,level

    1001 2001 5 3

    1001 2001 4 1

    1001 2000 5 1

    1002 2001 4 2

    1002 2001 3 1

    1002 2000 2 1

    So in this case your inner query will return (1,2,3) which will cause the outer query to return all results.

    That is why i asked for a unique column that defines the order, without a primary key on your table then this is going to be difficult.

    is there a reason you are avoiding the use of primary keys?

  • steveb (3/3/2009)


    That is why i asked for a unique column that defines the order, without a primary key on your table then this is going to be difficult.

    is there a reason you are avoiding the use of primary keys?

    Sorry, maybe I don't understand.

    This is an aggregate query that generates that list based off of several databases and conditions, so where would a primary key fit in?

  • sqlcentral (3/3/2009)


    The level field isn't unique across all rows, only across rows with the entry and group ids identical.

    Then try this

    SELECT * FROM YourTable OuterTable WHERE EXISTS (

    SELECT 1

    FROM YouTable sub

    WHERE sub.[entry_id] = OuterTable.entry_id and sub.[group_id] = OuterTable.group_id

    GROUP BY [entry_id], [group_id]

    HAVING MAX(sub.LEVEL) = OuterTable.level

    )

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sqlcentral (3/3/2009)


    steveb (3/3/2009)


    That is why i asked for a unique column that defines the order, without a primary key on your table then this is going to be difficult.

    is there a reason you are avoiding the use of primary keys?

    Sorry, maybe I don't understand.

    This is an aggregate query that generates that list based off of several databases and conditions, so where would a primary key fit in?

    Okay Sorry i thought this was all in the same table

  • It ain't pretty, but will this get it?

    -----------------------------------------------------------------

    create table #temp (entry_id int ,group_id int,cost int,level int)

    insert into #temp

    select 1001, 2001, 5, 3 union all

    select 1001, 2001, 4, 1 union all

    select 1001, 2000, 5, 1 union all

    select 1002, 2001, 4, 2 union all

    select 1002, 2001, 3, 1 union all

    select 1002, 2000, 2, 1

    select t1.entry_id,t1.group_id,t1.cost,t1.level

    from #temp t1

    where cast(entry_ID as varchar(10))+cast(group_id as varchar(10))+cast(level as varchar(10))

    IN (select top 1 cast(entry_ID as varchar(10))+cast(group_id as varchar(10))+cast(level as varchar(10))

    from #temp t2

    where t2.entry_id = t1.entry_id

    and t2.group_id = t1.group_id

    order by t2.level desc)

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I guess SOMETHING did.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I took a different approach and restructured my data so that I didn't need to do this. The with/as row_number()/partition worked beautifully and did exactly what I wanted and was fairly efficient for large sets of data, however I'm stuck with using SQL Server 2000 and every other solution is horribly inefficient and requires a massive amount of DB work to do something very simple. I also played with stored procedures (cursor with line-by-line insert to a temp table with a primary key defined over the columns that I wanted was the fastest approach, but far from elegant). I figured going back to the drawing board and changing the requirement so I didn't need to do that was a better approach. It requires a bit of duplication of data in the DB (yes, ugly) but works much better overall. Sometimes you need to trade a small increase in DB size for efficiency. Disk space is cheap and we're far more concerned with the amount of time it takes to perform the transaction, so this works out better in the long run anyway.

    Thanks for all the suggestions!

  • Thanks for the reply. It's always good to know what the resolution was. I sent you that last example because I believe it will work on SQL 2000, but it wasn't tested against large datasets. If a redesign solved your problem, that's great. It's a good sign when thinking progresses from "How do I code a solution to this problem." to "Is there a better way to set up the database."

    Have a good one 🙂

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply