select first entry in every set?

  • In a SQL query, my results sets sometime contains records in which there are 2 or more records which are not quite duplicates.  For instance, there may be 2 or more records with the same application id, but different dates.  I only want one of the records.

    If I could I would assign an sequence identity id to every record (in the results set) and restart the sequence identify values to 1 every time there was a change in the application id, thus all the records I wanted would have sequence id of 1.

    How can you do something like that with SQL?

    Thanks for any and all help.

    Russ....

  • In SQL 2005, there are RANK functions which can generate sequences for situations like this.

    In SQL2000, you typically join to a derived table, which uses 1 ofthe SQL aggregate functions to isolate the 1 record for each set. For example, in your data, if you want the Application ID plus the row with the most recent date:

    >>For instance, there may be 2 or more records with the same application id, but different dates. 

    Select a.*

    From YourApplictionTable As a

    Inner Join

    -- Build a derived table of most recent date per set of ApplicationID

    (

      Select ApplicationID, Max(AppDate) As MostRecentDate

      From YourApplictionTable

      Group By ApplicationID

    ) dtRecent

    On (a.ApplicationID = dtRecent.ApplicationID And

          a.AppDate = dtRecent.MostRecentDate

    )

    This approach obviously runs into issues if there are ties and 2 or more records have the same value for the column you're aggregating.

  • Do you want the earliest of the records, latest, don't care? If you want first or last, how will you break ties? If you don't care which record you keep it would make it easier to write a view or derived table returning the PKs of the desired records (but if you don't care what's in the field, do you even want the field?) 

    You can use numbering with an UPDATE of a temp table/variable, but a single SELECT statment is likely to be more efficient.

     

     

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • PW you assume dates are unique within a group. Not a very safe assumption, esp. if there's no time portion, or frequent transactions.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • It's a bit inelegant but you can use a hash value to avoid problems with duplicate dates.

    Select

    a.*

    From

    YourApplicationTable As a

    Inner

    Join

    (

    Select ApplicationID, cast(substring(max(select convert(char(8),App_Date,112) + str(ApplicationTablePK,10)),9,10) as int) As ApplicationTablePK

    From YourApplictionTable
    Group By ApplicationID

    )

    V

    On

    (a.ApplicationTablePK = V.ApplicationTablePK

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Almost identical question appeared several times during the last month, I'm sure you'll be able to find some of these discussions.

    General warning : if you want to simply get one of the records and don't care which one it is (often described in the requirements as "need the first row, without any ordering"), then there is something wrong with the design. It is absolutely legitimate to retrieve a row that is the oldest, newest, last updated, with greatest value in "amount" column etc., but bringing up "any first row" is not correct. Theoretically, if you run such query 5 times without any changes in data, you could get 5 different results.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply