Duplicate Records

  • 1) What is the easiet way to check for duplicate records in a database table?

    2) What is the easiet way to delete dupicate records in a database table and correct the table?

    3)  Are there any utilities on market for SQL-2000 and SQL-2005 that facillitate correcting duplicate records?

    Thanks

    SQL2DAY

  • 1 & 2) Yes!  There are scripts on this site which I have used to identify and delete duplicates!  Unfortunately I just tried doing a search but I am getting an error returned!

    I'd suggest going to the search utility on this site and you'll find lots of good examples.  I do have the code but it's on my work PC, so if you don't have anything by tomorrow, I'll post on here.

    3) The easist way is to have good indexing!  Having a unique index on your table is the best method!

  • The answer is on this board, not sure which forum, probably the t-sql one... search for "removing duplicate records".  We had an "unusual" problem recently where the PK had a GUID in it, and was therefore, technically, not a duplicate.  The other two fields in the concatenated PK were duplicates... and someone here provided an a very slick solution.  For "normal" duplicates, there are probably a dozen solutions here.

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • The easiest way to highlight any duplicate records would be to count records per fields that should be unique.

    If your table should only have one record for Example : District and Employee, then a query like the following could be used :

    Select District, Employee, count(1) from My_Table, Group by District, Employee HAVING count(1) > 1

    This should return all groupings of more than one record.

    Depending on table sizes ect, remove duplicates with SELECT DISTINCT INTO statements.

    Reccomend looking into proper Key and update designs.

     

  • A good DBA test is to figure out how many different methods you can employ to identifiy and remove duplicate rows.

    I find common causes of duplicates include, assuming an identity or guid ( uniqueidentifier !?! ) are actually unique and not applying a unique constraint on the column, multicolumn PK's ( often including an identity or guid ) that forget to apply unique constraints to individual columns where required.

    Just to point out that an identity column is actually a column property and does not guarantee or enforce uniqueness ( ditto guids and newid() )

     

     

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thank You all for your responses.

    My production table is DCPData_Basic_2006.  It has almost 8000 duplicate enteries.  The primary key was set on the first field ID.  Which from reading above is not unique and I agree.  Two additional columns DCP_ID and DateTime are fields I used to search for duplicate records of which I found 8000.  I understand these are put into a holding table.  What I don't understand is how to apply back to production table eliminating all duplicates and only having 1 record per DCP_ID & DateTime.  Could someone please explain?

     

    sql2day

  • You also have to determine WHICH duplicate to get rid of. There's a difference between duplicate keys, and duplicate records. You may have duplicate customers: ABC Company entered twice, but maybe one of them has a more recent Activity_Date or something. If the entire record is a duplicate, then it's easy to cleanup, otherwise you might have to be more picky and careful.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply