November 19, 2012 at 8:08 pm
Comments posted to this topic are about the item Deleting Duplicate Records
November 20, 2012 at 7:40 am
It's a lot more complex problem.
In a well designed database there should be no duplicates.
The simplest solution is to have proper constraints (primary, unique key) and have trigger validation on at least natural keys
Insert into #Employee VALUES (1, 'Mark Dunn', 'HR');
Insert into #Employee VALUES (2, 'Mark.Dunn', 'HR');
Insert into #Employee VALUES (3, 'Mark Dunn', 'HR');
Based on name column, these are duplicates as well, but would pass index constraints, so you have to check on normalized form of name in insert/update trigger.
It can still happen a user (curse them!) finds a pattern not covered in your trigger logic and inserts a sort-of-duplicate.
Second part of the problem is how to merge such duplicates as they may have a lot of references.
Just a bit of food for thought. 🙂
September 11, 2015 at 7:10 am
Right, it shouldn't happen in a production database but will have to be dealt with during ETL processes, especially if you want to be able to report on such errors.
September 14, 2015 at 7:18 am
We would have an identity column so wouldn't need to be using RowNumber in the query. Also we would have a unique natural key to not allow duplicates in the first place. But thanks for the interesting read.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply