Removing Duplicate Rows

Question

Removing Duplicate Rows

GF

Hall of Fame

Points: 3774
More actions
February 28, 2005 at 12:50 pm

#85262

I have a table called company_contacts
CREATE TABLE [dbo].[company_contact] (
[contact_id] [int] IDENTITY (1, 1) NOT NULL ,
[company_id] [int] NOT NULL ,
[contact_type_id] [smallint] NOT NULL ,
[contact_name] [varchar] (55)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[company_contact] WITH NOCHECK ADD
CONSTRAINT [PK_company_contact] PRIMARY KEY CLUSTERED
(
[contact_id]
&nbsp ON [PRIMARY]
GO
I have removed superfluous columns from the script.
Sample Data
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (108,6,'Mary Washington')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (108,6,'Mary Washington')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (685,6,'Hillary clinton')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (3,6,'Dave Navaro')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
As one can see from the sample data, multiple entries have been made by mistake. This is a result of a bug in the application that did the insert.
My problem is this. I need to remove only duplicate entries while leaving on one entry for each person and contact type. For this exercise I am dealing only with contact type, 6.
After you insert the data you can see that in order to delete the duplicate rows you need the [contact_id] of the dupliccate row. I need help in finding the contact_id of just the duplicates while leaving any non duplicated row.
As always your help is appreciated.
Gary

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

rhunt SSCertifiable Points: 5868 More actions · Answer 1

Gary, I replied earlier to your problem and have since deleted that post. I was not reading clearly and have since noticed that you have an indentity column as your primary key.

Your solution could end up being very easy, or a bit of a hassle.

Tell me - are you replicating that PK as a foreign key in other tables?

rhunt SSCertifiable Points: 5868 More actions · Answer 2

Gary, this should get you started. Assuming that the duplicate record you want to keep is arbitrary, you can left join to a derived table to get the information you want:

SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name

FROM company_contacts

GROUP BY company_id, contact_type_id, contact_name

Based on this query, you can delete your duplicate records using:

DELETE FROM company_contacts

FROM company_contacts c

LEFT JOIN

(SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name

FROM company_contacts

GROUP BY company_id, contact_type_id, contact_name) vt

ON c.contact_ID=vt.PK

WHERE vt.PK IS NULL

I have never had to delete replicated FK's being used this way but I would think you could delete those the same way - by left joining to your cleaned up company_contacts table.

Also, your original post indicates that you ommitted superfluous columns from your DDL for the post. You will want to consider your requirements for these columns and adjust the SQL above.

Again, depending on your PK to FK relationsships, definitely logic check the above to make sure it works for you.

GF Hall of Fame Points: 3774 More actions · Answer 3

rhunt, the contact_id is not a foregin key in any table. The company_id is a foregin key to another table.

I was thinking along the same line as you but I just could not put it together.

Thanks for your replies.

Gary

Vladan SSC-Insane Points: 22072 More actions · Answer 4

Hi,

I tested rhunt's SQL and it does precisely what you need - as far as I can tell. I thought there was a typing error in it (superfluous FROM on the first line), but it seems that even with that it works correctly - I didn't know you can repeat it. However, I'd prefer to write it this way, because the 2 "FROM" look confusing to me:

DELETE company_contacts

FROM company_contacts c

LEFT JOIN

(SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name

FROM company_contacts

GROUP BY company_id, contact_type_id, contact_name) vt

ON c.contact_ID=vt.PK

WHERE vt.PK IS NULL

If it doesn't do the job, please let us know what's the problem.

Kishore-132325 Hall of Fame Points: 3784 More actions · Answer 5

Hi,

One more way of removing the duplicate records would be to dump the filtered data into a new table.

SELECT company_id,contact_type_id, contact_name into company_contacts

FROM company_contact

GROUP BY company_id, contact_type_id, contact_name

Then alter the new <company_contacts> table, to add the contact_id column

alter TABLE [dbo].[company_contacts]

add [contact_id] [int] IDENTITY (1, 1) NOT NULL

Then drop the existing table and rename the new table to the name of original table.

I know this is a much lenghty method of doing it. but just wanted to share it with you guys.

--Kishore.

Calvin Lawson SSChampion Points: 11030 More actions · Answer 6

Or you can just run this statement:

delete c1

From company_contact c1

JOIN company_contact c2

on c1.company_ID = c2.Company_ID

and c1.contact_type_id = c2.contact_type_id

and c1.contact_name = c2.contact_name

where c1.contact_id > c2.contact_id

Signature is NULL