May 21, 2005 at 10:04 pm
Does anyone have a better solution for this problem?
In the table there are one or more rows per person (pid). Each row is an aspect of the pid with other columns (a, b, c) defining the aspect. Each row as unique identifier (i). I need a query to pick the "best" row for a particular person.
1) The "best" row is the row with the lowest value for a.
2) If multiple rows have the same value for a, pick the row with the highest value for b.
3) If multiple rows have the same value for b, pick the row with the lowest value for c.
4) If multiple rows have the same value for c, pick the row with the lowest value for i.
The query below returns the correct result; but it is unwieldily. In the real world, there are four additional columns of attributes and they are a mix of int's, varchar's, and datetime's, and some are nullable. Does anyone have a different solution to this problem that may be eaiser to manage, understand, code, or with better performance?
create table #tbl (pid int, a int, b int, c int, i int identity)
insert #tbl (pid, a, b, c) values (1, 1, 1, 1)
insert #tbl (pid, a, b, c) values (1, 1, 2, 1)
insert #tbl (pid, a, b, c) values (1, 1, 2, 2)
insert #tbl (pid, a, b, c) values (2, 2, 2, 2)
insert #tbl (pid, a, b, c) values (3, 1, 2, 1)
insert #tbl (pid, a, b, c) values (3, 2, 2, 1)
insert #tbl (pid, a, b, c) values (3, 2, 2, 2)
select *
from #tbl best
where (pid = 1)
and not exists(
select *
from #tbl bad
where
(bad.pid = best.pid)
and (bad.i <> best.i)
and (
(bad.a < best.a)
or (
(bad.a = best.a)
and (
(bad.b > best.b)
or (
(bad.b = best.b)
and (
(bad.c < best.c)
or (
(bad.c = best.c)
and (bad.i < best.i)
)
)
)
)
)
)
)
pid a b c i
--- - - - -
1 1 2 1 2
pid a b c i
--- - - - -
2 2 2 2 4
pid a b c i
--- - - - -
3 1 2 1 5
May 21, 2005 at 10:15 pm
Select top 1 *
From #tbl
Where pid = 1
Order By a asc, b desc, c asc, i asc
pid a b c i
----------- ----------- ----------- ----------- -----------
1 1 2 1 2
pid a b c i
----------- ----------- ----------- ----------- -----------
2 2 2 2 4
pid a b c i
----------- ----------- ----------- ----------- -----------
3 1 2 1 5
May 22, 2005 at 11:25 pm
Worth noting that you can get a list of everyone's best rows with:
select * from #tbl t
where t.i =
(
select top 1 t1.i FROM #tbl t1
where t1.pid = t.pid
order by a asc, b desc, c desc, i asc
)
Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au
May 23, 2005 at 8:03 am
Rob's answer is the best with one exception, you will need to make the sort for "c" to be ascending instead of descending.
Joe, I'm curious why you said you never use an IDENTITY column in an RDBMS? They do have their uses.
hawg
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
May 23, 2005 at 10:00 am
If I can speak for Joe I think his school of thought is that you should be able to define a PK from a set of column(s) that are naturally unique otherwise you have not done you job in designing the table.
May 23, 2005 at 10:05 am
I will agree that it is best to build your database where there is a logical PK but I have had multiple instances where I simply needed a unique ID for my records to use as a primary key when there is no good option and an IDENTITY field provides what I need.
For example, I have built a system to help with my SQL administration and one of the things I do is list all of the SQL Servers in a table. I don't want to use the SQL Server name as the key because I prefer to use an integer as my key in this system so I use an IDENTITY field to automatically create a server ID that I then use for all my foreign key constraints throughout the system.
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
May 23, 2005 at 10:21 am
Calm down Joe before you blow a gasket.
I've never claimed to be a complete SQL expert and this was just a quick example. Regardless of what the definition is I still stand by my statement that IDENTITY [columns] have their place in an RDBMS.
hawg
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
May 23, 2005 at 11:05 am
Back to the original question...
Column a is more important than b, and b is more important than c. A row with min(a) and max(b) and min(c) may not exist. Unfortunately, my sample data did not reflect that. Joe Celko's solution will not work.
This additional sample data illustrates the issue.
insert #tbl (pid, a, b, c) values (4, 1, 2, 1)
insert #tbl (pid, a, b, c) values (4, 2, 3, 1)
May 23, 2005 at 12:00 pm
Rob's answer still works only with changing the "C" sort from desc to asc like the following:
select * from #tbl t
where t.i =
(
select top 1 t1.i FROM #tbl t1
where t1.pid = t.pid
order by a asc, b desc, c asc, i asc
)
----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011
May 23, 2005 at 12:54 pm
Yes, (4, 1, 2, 1) is the row the query should return. However, your query does not return any row for pid=4.
May 23, 2005 at 2:04 pm
If the values for a, b, and c are all positive numbers under ten:
If it were only so simple!
May 23, 2005 at 2:06 pm
Using TOP with ORDER BY is a great idea. So simple.
However, I simplified my situation a little too much. There are actually other columns that govern the sort order - different orders for past, current, and future rows. I'm trying to figure a way to use Rob's solution, anyway. I know expressions including CASE may be used in an ORDER BY clause; maybe that's the key. I will create a new thread if I can adequately describe the situation. Thanks all.
May 23, 2005 at 6:48 pm
Sorry - it was just a typo, putting 'desc' instead of 'asc'.
If you can describe what your actual problem is, it should be fairly easy to get a solution using similar technique.
As for the 'i' field... just use whatever is appropriate for the key. Of course, you might have a bit of fun if you have a multi-column key.
RobF
Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au
May 24, 2005 at 4:51 am
Just a thought: TOP is a reserved word in T-SQL and is not part of ANSI-99-SQL. If for some unexplained and undoubtedly ridiculous reason your company changes databases or you work in a multiple db environment SELECT statements using TOP will break.
The issue of portability should always be considered if only to reach the conclusion that it is not an issue because you company will never change no matter what happens.
Mike
May 24, 2005 at 7:56 am
Point well taken. But what does that mean when it comes to using stored procedures? Do we not use them?
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply