unique Index for existing table with duplicates

Question

unique Index for existing table with duplicates

WebTechie

SSChampion

Points: 11405
More actions
January 6, 2011 at 12:28 pm

#230139

Hello,
I have a table with 150,000 records. The application was developed to catch the users entries and insure there would not be any duplicates.
Well that didn't work.
As I was reviewing the table, I noticed there are many duplicates. Somehow the users found a way to introduce duplicates. Therefore, I think this needs to be addressed at the database/table level and not with the developer's code.
I need to add a primary key for guestid and a unique index.
CREATE TABLE [dbo].[tblguest](
[GuestID] [int] IDENTITY(1,1) NOT NULL, --needs to be a primary key
[FirstName] [nvarchar](255) NULL, --needs to be nvarchar(50)
[LastName] [nvarchar](255) NULL, --needs to be nvarchar(50)
[BookDate] [datetime] NULL,
[Telephone] [nvarchar](20) NULL,
[EmailAddress] [nvarchar](255) NULL,
[Birthdate] [datetime] NULL,
[Address] [nvarchar](255) NULL,
[EmployeeID] [int] NULL,
[City] [nvarchar](255) NULL,
[State] [nvarchar](2) NULL,
[Zip] [nvarchar](15) NULL,
[TransDate] [datetime] NULL,
[TransUser] [nvarchar](20) NULL
I then need to create a unique nonclustered index on firstname, lastname, emailaddress and phone. However, there are some duplicates of these four fields.
What would be the best approach to make to create the unique index and get rid of the duplicates?
This is a production table.
Thanks.
Tony
Things will work out. Get back up, change some parameters and recode.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

WebTechie38 (1/6/2011)
Hello,
What would be the best approach to make to create the unique index and get rid of the duplicates?

I'd start with getting a straight count on the number of pure duplicates (all columns are equivalent), and the number of address only duplicates. If they match, you only have pure duplicates, which can be corrected easier because then you just need to reassociate the data to a single record and remove the other (my personal preference is to always keep the lowest ID one).

If they are un-equal, you'll have to use the pure duplicate list to clean up what you can, then pull out the address only ones and have a manual review on the records to determine accuracy. You could shortcut this by only using the most recent one, but I'd have signoff in triplicate from business before you did that.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

WayneS SSC Guru Points: 95460 More actions · Answer 2

Well, you have to get rid of the duplicates before doing the unique index. Here's how I'd do it:

WITH CTE AS(

SELECT [GuestID], [FirstName], [LastName], [BookDate],

[Telephone], [EmailAddress], [Birthdate], [Address],

[EmployeeID], [City], [State], [Zip], [TransDate], [TransUser],

RN = ROW_NUMBER() OVER (PARTITION BY firstname, lastname, emailaddress, telephone ORDER BY TransDate)

FROM tblguest

)

DELETE FROM CTE

WHERE RN > 1;

Edit: change the ROW_NUMBER() ORDER BY TO TransDate DESC to keep the newest; keep it as is to keep the first.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

luckysql.kinda SSCertifiable Points: 7816 More actions · Answer 3

MS SQL Corruption: "Msg 2627 - Violation of PRIMARY KEY constraint"

-Lucky

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

luckysql.kinda (1/6/2011)
MS SQL Corruption: "Msg 2627 - Violation of PRIMARY KEY constraint"

Um, no. This is not a corrupt database, it's a table that has duplicates that need removing.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

WebTechie SSChampion Points: 11405 More actions · Answer 5

Thanks Wayne,

I followed your suggestion and was able to clean the table up.

I've heard that CTEs are things to avoid. However, in this instance it worked fine.

All cleaned up and I've created the appropriate indices.

Thanks again.

Things will work out. Get back up, change some parameters and recode.