February 28, 2005 at 12:50 pm
I have a table called company_contacts
CREATE TABLE [dbo].[company_contact] (
[contact_id] [int] IDENTITY (1, 1) NOT NULL ,
[company_id] [int] NOT NULL ,
[contact_type_id] [smallint] NOT NULL ,
[contact_name] [varchar] (55)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[company_contact] WITH NOCHECK ADD
CONSTRAINT [PK_company_contact] PRIMARY KEY CLUSTERED
(
[contact_id]
  ON [PRIMARY]
GO
I have removed superfluous columns from the script.
Sample Data
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (329,6,'John Deere')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (108,6,'Mary Washington')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (108,6,'Mary Washington')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (685,6,'Hillary clinton')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1002,6,'Shania Twain')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (3,6,'Dave Navaro')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
insert into company_contacts (company_id ,contact_type_id ,contact_name ) values (1,6,'William Tale')
As one can see from the sample data, multiple entries have been made by mistake. This is a result of a bug in the application that did the insert.
My problem is this. I need to remove only duplicate entries while leaving on one entry for each person and contact type. For this exercise I am dealing only with contact type, 6.
After you insert the data you can see that in order to delete the duplicate rows you need the [contact_id] of the dupliccate row. I need help in finding the contact_id of just the duplicates while leaving any non duplicated row.
As always your help is appreciated.
Gary
February 28, 2005 at 2:06 pm
Gary, I replied earlier to your problem and have since deleted that post. I was not reading clearly and have since noticed that you have an indentity column as your primary key.
Your solution could end up being very easy, or a bit of a hassle.
Tell me - are you replicating that PK as a foreign key in other tables?
February 28, 2005 at 2:25 pm
Gary, this should get you started. Assuming that the duplicate record you want to keep is arbitrary, you can left join to a derived table to get the information you want:
SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name
FROM company_contacts
GROUP BY company_id, contact_type_id, contact_name
Based on this query, you can delete your duplicate records using:
DELETE FROM company_contacts
FROM company_contacts c
LEFT JOIN
(SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name
FROM company_contacts
GROUP BY company_id, contact_type_id, contact_name) vt
ON c.contact_ID=vt.PK
WHERE vt.PK IS NULL
I have never had to delete replicated FK's being used this way but I would think you could delete those the same way - by left joining to your cleaned up company_contacts table.
Also, your original post indicates that you ommitted superfluous columns from your DDL for the post. You will want to consider your requirements for these columns and adjust the SQL above.
Again, depending on your PK to FK relationsships, definitely logic check the above to make sure it works for you.
February 28, 2005 at 3:55 pm
rhunt, the contact_id is not a foregin key in any table. The company_id is a foregin key to another table.
I was thinking along the same line as you but I just could not put it together.
Thanks for your replies.
Gary
March 1, 2005 at 1:03 am
Hi,
I tested rhunt's SQL and it does precisely what you need - as far as I can tell. I thought there was a typing error in it (superfluous FROM on the first line), but it seems that even with that it works correctly - I didn't know you can repeat it. However, I'd prefer to write it this way, because the 2 "FROM" look confusing to me:
DELETE company_contacts
FROM company_contacts c
LEFT JOIN
(SELECT MIN(contact_ID)as PK, company_id, contact_type_id, contact_name
FROM company_contacts
GROUP BY company_id, contact_type_id, contact_name) vt
ON c.contact_ID=vt.PK
WHERE vt.PK IS NULL
If it doesn't do the job, please let us know what's the problem.
March 1, 2005 at 5:42 am
Hi,
One more way of removing the duplicate records would be to dump the filtered data into a new table.
SELECT company_id,contact_type_id, contact_name into company_contacts
FROM company_contact
GROUP BY company_id, contact_type_id, contact_name
Then alter the new <company_contacts> table, to add the contact_id column
alter TABLE [dbo].[company_contacts]
add [contact_id] [int] IDENTITY (1, 1) NOT NULL
Then drop the existing table and rename the new table to the name of original table.
I know this is a much lenghty method of doing it. but just wanted to share it with you guys.
--Kishore.
March 1, 2005 at 12:15 pm
Or you can just run this statement:
delete c1
From company_contact c1
JOIN company_contact c2
on c1.company_ID = c2.Company_ID
and c1.contact_type_id = c2.contact_type_id
and c1.contact_name = c2.contact_name
where c1.contact_id > c2.contact_id
Signature is NULL
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply