Select distinct on multiple columns

Question

Post reply

Select distinct on multiple columns

sqlcentral-1032101

Old Hand

Points: 314
More actions
March 3, 2009 at 9:31 am

#128118

Ok, after beating my head for a day there has GOT to be an easy way of doing this. MySQL can do it in about a dozen different ways, but MSSQL seems to not be able to.
I have an aggregate query which piles through a bunch of databases and returns a result set like so (simplified):
entry_id,group_id,cost
1001 2001 5
1001 2001 4
1001 2000 5
1002 2001 3
1002 2000 2
now all I need to do is skip the second row so that I can use the cost to calculate something based on transactions. Very simple! Basically generate distinct results on entry_id and group_id but not on cost. Can't figure it out.
I've tried distinct, but it works on all rows. This is a simplified result set, so using min/max as aggregates to get around the distinct limitation won't work because it won't choose an entire row. I've tried creating a temporary table with unique constraint, but can't find a way to get it to just insert the rows that don't fail. There doesn't appear to be a first aggreate function. I've tried doing a where not exists select but because of how complicated the initial query is it becomes a huge mess.
All I need to do is return the result set with distinct entry_id,group_id columns by choosing only the first result that matches in the result set.
There has got to be a way to do this, right?
Of course, to make matters worse, I have to use SQL Server 2000.
Thanks in advance for your assistance!

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

steveb. SSC-Forever Points: 46733 More actions · Answer 1

Do you have a primary key to determine the order?

sqlcentral-1032101 Old Hand Points: 314 More actions · Answer 2

Yes, the results are returned in such an order that the first result is the one that I want to keep, any subsequent results with the same values across those two columns can be ignored or discarded.

steveb. SSC-Forever Points: 46733 More actions · Answer 3

If you can let me know what the name of the order cokumn is then i can help you ..

If not then this can be using ROW_ORDER()

ab5sr SSCarpal Tunnel Points: 4056 More actions · Answer 4

I did a blog once on something like this - would this help any?

http://www.texastoo.com/post/TSQL-UNPIVOT.aspx

sqlcentral-1032101 Old Hand Points: 314 More actions · Answer 5

Ok, let's assume there is another column:

entry_id,group_id,cost,level

1001 2001 5 7

1001 2001 4 5

1001 2000 5 8

1002 2001 3 2

1002 2000 2 6

for any given set of rows where the entry and group ids are equal I need to keep the row that has the highest level. The results will always be returned so that for any given set of matching entry and group ids the first record is the one I want to keep.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

sqlcentral (3/3/2009)
Yes, the results are returned in such an order that the first result is the one that I want to keep, any subsequent results with the same values across those two columns can be ignored or discarded.

Don't count on the order being consistent unless you have an ORDER BY which is based on a unique combination (which will guarantee the rows will always be returned in a consistent order).

As of right now - like has been mentioned before- we're missing one column (to specify the order) in order to generate an example.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Mark Cowne One Orange Chip Points: 26968 More actions · Answer 7

You need a way to fully define your ordering to give you a "first".

See if either of these help

with cte as (

select entry_id,group_id,cost,

row_number() over(partition by entry_id,group_id order by level desc) as rn

from mytable)

select entry_id,group_id,cost

from cte

where rn=1

select a.entry_id,a.group_id,a.cost

from mytable a

where not exists (select * from mytable b

where b.entry_id=a.entry_id

and b.group_id=a.group_id

and b.level>a.level)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

sqlcentral-1032101 Old Hand Points: 314 More actions · Answer 8

Lee: Thanks, that looks neat but I shudder while thinking how it would apply for non-trivial result sets. That looks like a lot of work for a relatively simple problem.

I can't believe that there is no way to do an insert into a table and have it skip failed rows instead of trash the entire insert operation.

steveb. SSC-Forever Points: 46733 More actions · Answer 9

sqlcentral (3/3/2009)
Lee: Thanks, that looks neat but I shudder while thinking how it would apply for non-trivial result sets. That looks like a lot of work for a relatively simple problem.
I can't believe that there is no way to do an insert into a table and have it skip failed rows instead of trash the entire insert operation.

There is a very simple way, you just need to tell us what the unique column is that defines the order of your data.

sqlcentral-1032101 Old Hand Points: 314 More actions · Answer 10

Steveb: I already added another post with a column called level.

sqlcentral-1032101 Old Hand Points: 314 More actions · Answer 11

Mark: the row_number() over partition looks like it's working!!

Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

steveb. SSC-Forever Points: 46733 More actions · Answer 12

sqlcentral (3/3/2009)
Mark: the row_number() over partition looks like it's working!!
Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

Unfortunately, Row_number() won't work on sql 2000.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

sqlcentral (3/3/2009)
Please tell me that is Sql Server 2000 compliant 🙂 (my test machine is Sql 2005 Express)

Nope. 2005 and higher only. You did post in a 2005 forum.

Doing it in 2000 is a lot more tricky and usually involves temp tables and identity columns.

See if Mark's othe query helps.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

steveb. SSC-Forever Points: 46733 More actions · Answer 14

sqlcentral (3/3/2009)
Steveb: I already added another post with a column called level.

Thanks, this code should work orderd by Level

SELECT * FROM YourTable WHERE LEVEL IN(

SELECT MAX(LEVEL) AS LEVEL

FROM YouTable

GROUP BY [entry_id], [group_id])