Select distinct on multiple columns

  • Ok, after beating my head for a day there has GOT to be an easy way of doing this. MySQL can do it in about a dozen different ways, but MSSQL seems to not be able to.

    I have an aggregate query which piles through a bunch of databases and returns a result set like so (simplified):

    entry_id,group_id,cost

    1001 2001 5

    1001 2001 4

    1001 2000 5

    1002 2001 3

    1002 2000 2

    now all I need to do is skip the second row so that I can use the cost to calculate something based on transactions. Very simple! Basically generate distinct results on entry_id and group_id but not on cost. Can't figure it out.

    I've tried distinct, but it works on all rows. This is a simplified result set, so using min/max as aggregates to get around the distinct limitation won't work because it won't choose an entire row. I've tried creating a temporary table with unique constraint, but can't find a way to get it to just insert the rows that don't fail. There doesn't appear to be a first aggreate function. I've tried doing a where not exists select but because of how complicated the initial query is it becomes a huge mess.

    All I need to do is return the result set with distinct entry_id,group_id columns by choosing only the first result that matches in the result set.

    There has got to be a way to do this, right?

    Of course, to make matters worse, I have to use SQL Server 2000.

    Thanks in advance for your assistance!

  • Do you have a primary key to determine the order?

  • Yes, the results are returned in such an order that the first result is the one that I want to keep, any subsequent results with the same values across those two columns can be ignored or discarded.

  • If you can let me know what the name of the order cokumn is then i can help you ..

    If not then this can be using ROW_ORDER()

  • I did a blog once on something like this - would this help any?

    http://www.texastoo.com/post/TSQL-UNPIVOT.aspx

  • Ok, let's assume there is another column:

    entry_id,group_id,cost,level

    1001 2001 5 7

    1001 2001 4 5

    1001 2000 5 8

    1002 2001 3 2

    1002 2000 2 6

    for any given set of rows where the entry and group ids are equal I need to keep the row that has the highest level. The results will always be returned so that for any given set of matching entry and group ids the first record is the one I want to keep.

  • sqlcentral (3/3/2009)


    Yes, the results are returned in such an order that the first result is the one that I want to keep, any subsequent results with the same values across those two columns can be ignored or discarded.

    Don't count on the order being consistent unless you have an ORDER BY which is based on a unique combination (which will guarantee the rows will always be returned in a consistent order).

    As of right now - like has been mentioned before- we're missing one column (to specify the order) in order to generate an example.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • You need a way to fully define your ordering to give you a "first".

    See if either of these help

    with cte as (

    select entry_id,group_id,cost,

    row_number() over(partition by entry_id,group_id order by level desc) as rn

    from mytable)

    select entry_id,group_id,cost

    from cte

    where rn=1

    select a.entry_id,a.group_id,a.cost

    from mytable a

    where not exists (select * from mytable b

    where b.entry_id=a.entry_id

    and b.group_id=a.group_id

    and b.level>a.level)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Lee: Thanks, that looks neat but I shudder while thinking how it would apply for non-trivial result sets. That looks like a lot of work for a relatively simple problem.

    I can't believe that there is no way to do an insert into a table and have it skip failed rows instead of trash the entire insert operation.

  • sqlcentral (3/3/2009)


    Lee: Thanks, that looks neat but I shudder while thinking how it would apply for non-trivial result sets. That looks like a lot of work for a relatively simple problem.

    I can't believe that there is no way to do an insert into a table and have it skip failed rows instead of trash the entire insert operation.

    There is a very simple way, you just need to tell us what the unique column is that defines the order of your data.

  • Steveb: I already added another post with a column called level.

  • Mark: the row_number() over partition looks like it's working!!

    Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

  • sqlcentral (3/3/2009)


    Mark: the row_number() over partition looks like it's working!!

    Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

    Unfortunately, Row_number() won't work on sql 2000.

  • sqlcentral (3/3/2009)


    Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

    Nope. 2005 and higher only. You did post in a 2005 forum.

    Doing it in 2000 is a lot more tricky and usually involves temp tables and identity columns.

    See if Mark's othe query helps.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sqlcentral (3/3/2009)


    Steveb: I already added another post with a column called level.

    Thanks, this code should work orderd by Level

    SELECT * FROM YourTable WHERE LEVEL IN(

    SELECT MAX(LEVEL) AS LEVEL

    FROM YouTable

    GROUP BY [entry_id], [group_id])

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply