June 2, 2006 at 10:40 am
In a SQL query, my results sets sometime contains records in which there are 2 or more records which are not quite duplicates. For instance, there may be 2 or more records with the same application id, but different dates. I only want one of the records.
If I could I would assign an sequence identity id to every record (in the results set) and restart the sequence identify values to 1 every time there was a change in the application id, thus all the records I wanted would have sequence id of 1.
How can you do something like that with SQL?
Thanks for any and all help.
Russ....
June 2, 2006 at 10:49 am
In SQL 2005, there are RANK functions which can generate sequences for situations like this.
In SQL2000, you typically join to a derived table, which uses 1 ofthe SQL aggregate functions to isolate the 1 record for each set. For example, in your data, if you want the Application ID plus the row with the most recent date:
>>For instance, there may be 2 or more records with the same application id, but different dates.
Select a.*
From YourApplictionTable As a
Inner Join
-- Build a derived table of most recent date per set of ApplicationID
(
Select ApplicationID, Max(AppDate) As MostRecentDate
From YourApplictionTable
Group By ApplicationID
) dtRecent
On (a.ApplicationID = dtRecent.ApplicationID And
a.AppDate = dtRecent.MostRecentDate
)
This approach obviously runs into issues if there are ties and 2 or more records have the same value for the column you're aggregating.
June 2, 2006 at 10:49 am
Do you want the earliest of the records, latest, don't care? If you want first or last, how will you break ties? If you don't care which record you keep it would make it easier to write a view or derived table returning the PKs of the desired records (but if you don't care what's in the field, do you even want the field?)
You can use numbering with an UPDATE of a temp table/variable, but a single SELECT statment is likely to be more efficient.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 10:52 am
PW you assume dates are unique within a group. Not a very safe assumption, esp. if there's no time portion, or frequent transactions.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 11:03 am
It's a bit inelegant but you can use a hash value to avoid problems with duplicate dates.
a.*
YourApplicationTable As a
Join
Select ApplicationID, cast(substring(max(select convert(char(8),App_Date,112) + str(ApplicationTablePK,10)),9,10) as int) As ApplicationTablePK
V
(a.ApplicationTablePK = V.ApplicationTablePK
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 5, 2006 at 7:25 am
Almost identical question appeared several times during the last month, I'm sure you'll be able to find some of these discussions.
General warning : if you want to simply get one of the records and don't care which one it is (often described in the requirements as "need the first row, without any ordering"), then there is something wrong with the design. It is absolutely legitimate to retrieve a row that is the oldest, newest, last updated, with greatest value in "amount" column etc., but bringing up "any first row" is not correct. Theoretically, if you run such query 5 times without any changes in data, you could get 5 different results.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply