SQL Statement to get rid of duplicates????

  • Hello all,

    Quick question regarding creating a statement.

    We import a great number of records through to a main table and due to business logic we have to import duplicates. At the end of the process I run a query to find any records that are duplicates....

    /*the following query finds all the records brought through for a particular week where there

    are duplicates. This could be because we have received the same file numerous times from them*/

    SELECT Surname, Forename, NameOrderIndicator, MiddleNames,DateOfBirth,

    Gender,FormerSurname, PreferredForname,

    Ethnicity, DateSent, COUNT (*)

    FROM  dbo.Import_Changes_FULL

    WHERE DateSent = '2006-09-29 00:00:00.000'

    GROUP BY Surname, Forename, NameOrderIndicator, MiddleNames,DateOfBirth,

    Gender,FormerSurname, PreferredForname, Ethnicity, DateSent HAVING Count (*)>1

    ORDER BY Count (*)DESC

    this shows how many duplicates of each record we have. Instead of going into each record manually and deleting is there any way we can

    DELETE FROM dbo.Import_Changes_FULL, HAVING Count (*)>1

    Leaving one unique record? I dont think this is possible but I thought I would check first.

     

  •  

    If all the Columns are Dupe, then u can do this.

    SELECT Surname, Forename, NameOrderIndicator, MiddleNames,DateOfBirth,

    Gender,FormerSurname, PreferredForname,

    Ethnicity, DateSent, COUNT (*) as Cts

    into #tmp

    FROM  dbo.Import_Changes_FULL

    WHERE DateSent = '2006-09-29 00:00:00.000'

    GROUP BY Surname, Forename, NameOrderIndicator, MiddleNames,DateOfBirth,

    Gender,FormerSurname, PreferredForname, Ethnicity, DateSent HAVING Count (*)>1

    ORDER BY Count (*)DESC

    Delete dbo.Import_Changes_FULL where exists

     (Select 1 from #tmp where ltrim(rtrim(isnull(Import_Changes_FULL.Surname,''))) = ltrim(rtrim(isnull(#tmp.Surname,'')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.Forename,''))) = ltrim(rtrim(isnull(#tmp.Forename,'')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.NameOrderIndicator,''))) = ltrim(rtrim(isnull(#tmp.NameOrderIndicator,'')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.MiddleNames,''))) = ltrim(rtrim(isnull(#tmp.MiddleNames,'')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.DateOfBirth,'1/1/1900'))) = ltrim(rtrim(isnull(#tmp.DateOfBirth,'1/1/1900')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.Gender,''))) = ltrim(rtrim(isnull(#tmp.Gender,'')))

      and ltrim(rtrim(isnull(Import_Changes_FULL.FormerSurname,''))) = ltrim(rtrim(isnull(#tmp.FormerSurname,'')))

      ....

      and ltrim(rtrim(isnull(Import_Changes_FULL.DateSent,'1/1/1900'))) = ltrim(rtrim(isnull(#tmp.DateSent,'1/1/1900')))

    &nbsp

    Insert into Import_Changes_FULL(.....)

     select .... from #tmp

    If there is Primary key on the "Source" table then your process can be modified to "Import New" and "Update Existing".

    Thanks

    Sreejith

  • Rather than inserting all the records including the duplicates, might I suggest doing that into a staging table, then using this process:

    1. Add a unique index to the "real" table so that dupes are not allowed in

    2. Use this pattern to add the new, incoming data from the staging table to the real one:

    Update realTable ... where exists ( select rows that match the staging table )

    Insert into realTable select distinct ... where not exists ( select rows that match the staging table )

    This will update existing records and insert new ones, with no oportunity for the duplicates to be created in the first place.

  • Brilliant. Its worked. Thanks loads for that. i honestly didnt think it could be done.

  • Which response worked? I have a similar situation, that I would like to test this on.

  • I went for the first one. I used the SELECT statement to select all my records into a temporary table where the records were duplicate and then I used the delete from statement.

     

    What I found was after I created the SELECT into I did a sum on the  COUNT (*) which gave me the total number of duplicates. Then when I ran the delete statement it said that that number of records was affected.

    I grabbed a few names to check and after I ran the delete statement there was only 1 record in the file.

    Hope it works for you!!!

    Debbie

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply