duplicate records

  • Im trying to extract a single record for which there may be 2 to n not-quite duplicates. I want the most recent based on a date field and am using TOP 1 order by date desc. However, the complication is that the duplicate columns are not the ones I want to ID the record by. Im trying to use a correlated subquery but am restricted by the fact that I can only return 1 column wheras my key is 2 columns. Here is my current attempt (using NOLOCK because its a single user DB and there are 8.5 million rows):

    My problem is that I need the a.herd_identity to be equal to b.herd_identity as well i.e. the latest record in the subquery that is equal to the herd and animal ID in the outer query. I think this should work but it appears not to. Any help is welcome.

    SELECT a.*

    FROM NMR_New_Duplicate_primary a WITH (NOLOCK)

    WHERE a.animal_identity =

    (SELECT TOP 1 b.animal_identity, b.herd_identity

    FROM NMR_new_duplicate_primary b WITH (NOLOCK)

    where a.hbn_emn = b.hbn_emn

    and a.breed_id = b.breed_id

    and a.pedigree_status = b.pedigree_status

    and a.id_type = b.id_type

    ORDER by b.last_changed_date desc)

  • This was removed by the editor as SPAM

  • I think the following code should do the trick.

    In the second part of the join statement, make sure you select and join by all ID fields.

    Then, you join this to your original table, with all ID fields and the changed date.

    
    
    SELECT A.*
    FROM NMR_NEW_DUPLICATE_PRIMARY A
    INNER JOIN
    ( SELECT hbn_emn, breed_id,
    pedigree_status, id_type,
    MAX(last_changed_date) AS lcd
    FROM NMR_NEW_DUPLICATE_PRIMARY
    GROUP BY hbn_emn, breed_id,
    pedigree_status, id_type
    ) B
    ON A.hbn_emn = B.hbn_emn
    AND A.breed_id = B.breed_id
    AND A.pedigree_status = B.pedigree_status
    AND A.id_type = B.id_type
    AND A.last_changed_date = B.lcd

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply