picking the "best" row out of a set

Question

Post reply

picking the "best" row out of a set

dongadoy

SSC Eights!

Points: 900
More actions
May 21, 2005 at 10:04 pm

#69085

Does anyone have a better solution for this problem?
In the table there are one or more rows per person (pid). Each row is an aspect of the pid with other columns (a, b, c) defining the aspect. Each row as unique identifier (i). I need a query to pick the "best" row for a particular person.
1) The "best" row is the row with the lowest value for a.
2) If multiple rows have the same value for a, pick the row with the highest value for b.
3) If multiple rows have the same value for b, pick the row with the lowest value for c.
4) If multiple rows have the same value for c, pick the row with the lowest value for i.
The query below returns the correct result; but it is unwieldily. In the real world, there are four additional columns of attributes and they are a mix of int's, varchar's, and datetime's, and some are nullable. Does anyone have a different solution to this problem that may be eaiser to manage, understand, code, or with better performance?
create table #tbl (pid int, a int, b int, c int, i int identity)
insert #tbl (pid, a, b, c) values (1, 1, 1, 1)
insert #tbl (pid, a, b, c) values (1, 1, 2, 1)
insert #tbl (pid, a, b, c) values (1, 1, 2, 2)
insert #tbl (pid, a, b, c) values (2, 2, 2, 2)
insert #tbl (pid, a, b, c) values (3, 1, 2, 1)
insert #tbl (pid, a, b, c) values (3, 2, 2, 1)
insert #tbl (pid, a, b, c) values (3, 2, 2, 2)
select *
from #tbl best
where (pid = 1)
and not exists(
   select *
   from #tbl bad
   where
      (bad.pid = best.pid)
      and (bad.i <> best.i)
      and (
         (bad.a < best.a)
         or (
            (bad.a = best.a)
            and (
               (bad.b > best.b)
               or (
                  (bad.b = best.b)
                  and (
                     (bad.c < best.c)
                     or (
                        (bad.c = best.c)
                        and (bad.i < best.i)
                        )
                     )
                  )
               )
            )
         )
   )
pid a b c i
--- - - - -
1   1 2 1 2
pid a b c i
--- - - - -
2   2 2 2 4
pid a b c i
--- - - - -
3   1 2 1 5

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

ron k SSCertifiable Points: 6103 More actions · Answer 1

Select top 1 *

From #tbl

Where pid = 1

Order By a asc, b desc, c asc, i asc

pid a b c i

----------- ----------- ----------- ----------- -----------

1 1 2 1 2

pid a b c i

----------- ----------- ----------- ----------- -----------

2 2 2 2 4

pid a b c i

----------- ----------- ----------- ----------- -----------

3 1 2 1 5

Rob Farley SSCrazy Points: 2739 More actions · Answer 2

Worth noting that you can get a list of everyone's best rows with:

select * from #tbl t

where t.i =

(

select top 1 t1.i FROM #tbl t1

where t1.pid = t.pid

order by a asc, b desc, c desc, i asc

)

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

Jason Givens SSCertifiable Points: 6048 More actions · Answer 3

Rob's answer is the best with one exception, you will need to make the sort for "c" to be ascending instead of descending.

Joe, I'm curious why you said you never use an IDENTITY column in an RDBMS? They do have their uses.

hawg

----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

ron k SSCertifiable Points: 6103 More actions · Answer 4

If I can speak for Joe I think his school of thought is that you should be able to define a PK from a set of column(s) that are naturally unique otherwise you have not done you job in designing the table.

Jason Givens SSCertifiable Points: 6048 More actions · Answer 5

I will agree that it is best to build your database where there is a logical PK but I have had multiple instances where I simply needed a unique ID for my records to use as a primary key when there is no good option and an IDENTITY field provides what I need.

For example, I have built a system to help with my SQL administration and one of the things I do is list all of the SQL Servers in a table. I don't want to use the SQL Server name as the key because I prefer to use an integer as my key in this system so I use an IDENTITY field to automatically create a server ID that I then use for all my foreign key constraints throughout the system.

----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

Jason Givens SSCertifiable Points: 6048 More actions · Answer 6

Calm down Joe before you blow a gasket.

I've never claimed to be a complete SQL expert and this was just a quick example. Regardless of what the definition is I still stand by my statement that IDENTITY [columns] have their place in an RDBMS.

hawg

----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

dongadoy SSC Eights! Points: 900 More actions · Answer 7

Back to the original question...

Column a is more important than b, and b is more important than c. A row with min(a) and max(b) and min(c) may not exist. Unfortunately, my sample data did not reflect that. Joe Celko's solution will not work.

This additional sample data illustrates the issue.

insert #tbl (pid, a, b, c) values (4, 1, 2, 1)

insert #tbl (pid, a, b, c) values (4, 2, 3, 1)

Jason Givens SSCertifiable Points: 6048 More actions · Answer 8

Rob's answer still works only with changing the "C" sort from desc to asc like the following:

select * from #tbl t

where t.i =

(

select top 1 t1.i FROM #tbl t1

where t1.pid = t.pid

order by a asc, b desc, c asc, i asc

)

----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

dongadoy SSC Eights! Points: 900 More actions · Answer 9

Yes, (4, 1, 2, 1) is the row the query should return. However, your query does not return any row for pid=4.

dongadoy SSC Eights! Points: 900 More actions · Answer 10

If the values for a, b, and c are all positive numbers under ten:

If it were only so simple!

dongadoy SSC Eights! Points: 900 More actions · Answer 11

Using TOP with ORDER BY is a great idea. So simple.

However, I simplified my situation a little too much. There are actually other columns that govern the sort order - different orders for past, current, and future rows. I'm trying to figure a way to use Rob's solution, anyway. I know expressions including CASE may be used in an ORDER BY clause; maybe that's the key. I will create a new thread if I can adequately describe the situation. Thanks all.

Rob Farley SSCrazy Points: 2739 More actions · Answer 12

Sorry - it was just a typo, putting 'desc' instead of 'asc'.

If you can describe what your actual problem is, it should be fairly easy to get a solution using similar technique.

As for the 'i' field... just use whatever is appropriate for the key. Of course, you might have a bit of fun if you have a multi-column key.

RobF

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

Michael Du Bois SSCertifiable Points: 5193 More actions · Answer 13

Just a thought: TOP is a reserved word in T-SQL and is not part of ANSI-99-SQL. If for some unexplained and undoubtedly ridiculous reason your company changes databases or you work in a multiple db environment SELECT statements using TOP will break.

The issue of portability should always be considered if only to reach the conclusion that it is not an issue because you company will never change no matter what happens.

Mike

ron k SSCertifiable Points: 6103 More actions · Answer 14

Point well taken. But what does that mean when it comes to using stored procedures? Do we not use them?