October 20, 2005 at 5:23 pm
I have a table that has many duplicate records and I need to delete all but
one of the duplicate records. Is there a way I can identify these records with a sql
statement?
Example records Where one needs to stay while the other two get deleted:
(Comment_id, site_id, date_entered, user_id, comment)
6978, AL28492, 07/17/2005, 408, No longer a Franchise, replace w/ non-franchise signage.
8161, AL28492, 08/15/2005, 408, No longer a Franchise, replace w/ non-franchise signage.
8702, AL28492, 09/11/2005, 408, No longer a Franchise, replace w/ non-franchise signage.
The comment field is a data type text, if that makes a difference.
October 20, 2005 at 5:49 pm
Mike - you don't say which two get deleted - would it be the min or max comment_id, or the min or max date_entered, that stays while the others are deleted?!
Here's a select you can "play around" with... and then you can work on your "delete" once you get the "select" to give you the results you want:
select t2.comment_id, t1.site_id, t1.date_entered, t1.user_id, t1.comment
from tblTest t1
join (select min(comment_id) as comment_id
from tblTest
group by site_id, user_id) t2 on t1.comment_id = t2.comment_id
**ASCII stupid question, get a stupid ANSI !!!**
October 21, 2005 at 7:06 am
I tried what you said and I'm getting 3 errors (all the same): "line 1: The column prefix 't1' does not match with a table name or alias name used in the query." And my little brain says it looks correct.
BTW, According to my boss, I can keep whichever comment row I decide...just as long as the others are gone. I was planning on keeping the ones with the max comment_id.
select t2.comment_id, t1.site_id, t1.date_entered, t1.user_id, t1.comment
from statusrptcomments1 t1
join
(select min(t1.comment_id) as comment_id from statusrptcomments1
group by t1.site_id, t1.user_id) t2 on t1.comment_id = t2.comment_id
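The errors come from the t1 prefixes inside the derived table - the subquery cannot see the outer alias. With those prefixes dropped (and min switched to max, since the plan is to keep the highest comment_id), a working version would be:

```sql
-- keep the row with the highest comment_id per (site_id, user_id)
select t2.comment_id, t1.site_id, t1.date_entered, t1.user_id, t1.comment
from statusrptcomments1 t1
join (select max(comment_id) as comment_id
      from statusrptcomments1
      group by site_id, user_id) t2 on t1.comment_id = t2.comment_id
```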
October 21, 2005 at 7:15 am
First decide which fields denote a duplicate record - i.e., is it the combination of site_id and user_id by itself that makes a row a duplicate?
If so, then you can do something like this:
CREATE TABLE TESTCASE (COL1 INT, COL2 DATETIME, COL3 VARCHAR(100), SITE_ID VARCHAR(20), USER_ID INT, COMMENT TEXT)
GO
INSERT INTO TESTCASE VALUES (1, GETDATE(), 'FIRST ROW', 'AL28492', 408, 'SAME ROW')
INSERT INTO TESTCASE VALUES (2, DATEADD(mi, 100, GETDATE()), 'SECOND ROW', 'AL28492', 408, 'SAME ROW')
INSERT INTO TESTCASE VALUES (3, DATEADD(mi, 1000, GETDATE()), 'THIRD ROW', 'AL28492', 408, 'SAME ROW')
GO
--PRESERVE THE LAST ROW
DELETE FROM TESTCASE
WHERE EXISTS (SELECT 1 FROM TESTCASE DUPSINNER
WHERE DUPSINNER.SITE_ID = TESTCASE.SITE_ID
AND DUPSINNER.USER_ID = TESTCASE.USER_ID
AND DUPSINNER.COL1 > TESTCASE.COL1)
--PRESERVE THE FIRST ROW
DELETE FROM TESTCASE
WHERE EXISTS (SELECT 1 FROM TESTCASE DUPSINNER
WHERE DUPSINNER.SITE_ID = TESTCASE.SITE_ID
AND DUPSINNER.USER_ID = TESTCASE.USER_ID
AND DUPSINNER.COL1 < TESTCASE.COL1)
And once you have deleted and fixed the data, you should enforce a unique constraint on the combination that should be unique in this table.
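On the test table above, such a constraint might look like this (the constraint name is illustrative; note that a TEXT column like COMMENT cannot be part of a unique constraint, so only the key columns go in):

```sql
-- Illustrative name; assumes (SITE_ID, USER_ID) is the combination that must be unique
ALTER TABLE TESTCASE
ADD CONSTRAINT UQ_TESTCASE_SITE_USER UNIQUE (SITE_ID, USER_ID)
```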
October 21, 2005 at 7:26 am
select t2.comment_id, t1.site_id, t1.date_entered, t1.[user_id], t1.comment
from dbo.statusrptcomments1 t1 inner join
(select min(comment_id) as comment_id from dbo.statusrptcomments1
-- comment is TEXT, which cannot appear in a GROUP BY directly, so cast it first
group by site_id, [user_id], cast(comment as varchar(8000))) t2 on t1.comment_id = t2.comment_id

DELETE t1
FROM dbo.statusrptcomments1 t1 LEFT OUTER JOIN
(select min(comment_id) as comment_id from dbo.statusrptcomments1
group by site_id, [user_id], cast(comment as varchar(8000))) t2 on t1.comment_id = t2.comment_id
WHERE t2.Comment_ID IS NULL
Vasc
October 21, 2005 at 7:39 am
Thanks Vasc, but I just finished using the example that Journeyman gave me and it did exactly what I needed.
Thanks to all of you for your help - now that I look at the examples, it seems so simple. Basically, as I read them, I need an inner statement that identifies the one row to keep per group, and an outer statement that deletes all the other rows. Or, in the case of Journeyman's example, an inner statement that finds the non-unique rows so the outer statement can delete them. Is this correct?
October 21, 2005 at 8:02 am
Well, rsharma is deleting all the posts made by a user at one site but keeping one (a user ends up with at most one post per site), while I am deleting only the duplicate posts a user made at a given site - rows whose comment text differs are kept : )
DECLARE @TESTCASE TABLE(COL1 INT, COL2 DATETIME, COL3 VARCHAR(100), SITE_ID VARCHAR(20), USER_ID INT, COMMENT TEXT)
INSERT INTO @TESTCASE VALUES (1, GETDATE(), 'FIRST ROW', 'AL28492', 408, 'SAME ROW')
INSERT INTO @TESTCASE VALUES (2, DATEADD(mi, 100, GETDATE()), 'SECOND ROW', 'AL28492', 408, 'SAME ROW')
INSERT INTO @TESTCASE VALUES (3, DATEADD(mi, 1000, GETDATE()), 'THIRD ROW', 'AL28492', 408, 'SAME ROW')
INSERT INTO @TESTCASE VALUES (4, DATEADD(mi, 2000, GETDATE()), 'THIRD ROW', 'AL28492', 408, 'NEW ROW')
--PRESERVE THE LAST ROW
/*
DELETE FROM @TESTCASE
WHERE EXISTS (SELECT 1 FROM @TESTCASE DUPSINNER
WHERE DUPSINNER.SITE_ID = @TESTCASE.SITE_ID
AND DUPSINNER.USER_ID = @TESTCASE.USER_ID
AND DUPSINNER.COL1 > @TESTCASE.COL1)
*/
--PRESERVE THE FIRST ROW
DELETE T FROM @TESTCASE T
WHERE EXISTS (SELECT 1 FROM @TESTCASE DUPSINNER
WHERE DUPSINNER.SITE_ID = T.SITE_ID
AND DUPSINNER.USER_ID = T.USER_ID
AND DUPSINNER.COL1 < T.COL1)
SELECT * FROM @TESTCASE
VS
declare @T table (Comment_id integer, site_id varchar(10), date_entered datetime, [user_id] integer, comment varchar(50))
--replace @T with dbo.statusrptcomments1 to run this against the real table
INSERT INTO @T(Comment_id, date_entered, site_id, [user_id], comment) VALUES (1, GETDATE(), 'AL28492', 408, 'SAME ROW')
INSERT INTO @T(Comment_id, date_entered, site_id, [user_id], comment) VALUES (2, DATEADD(mi, 100, GETDATE()), 'AL28492', 408, 'SAME ROW')
INSERT INTO @T(Comment_id, date_entered, site_id, [user_id], comment) VALUES (3, DATEADD(mi, 1000, GETDATE()), 'AL28492', 408, 'SAME ROW')
INSERT INTO @T(Comment_id, date_entered, site_id, [user_id], comment) VALUES (4, DATEADD(mi, 2000, GETDATE()), 'AL28492', 408, 'NEW ROW')
SELECT * FROM @T
select t2.comment_id, t1.site_id, t1.date_entered, t1.[user_id], t1.comment
from @T t1 inner join
(select min(comment_id) as comment_id from @T
group by site_id, [user_id],comment) t2 on t1.comment_id = t2.comment_id
DELETE t1
FROM @T t1 LEFT OUTER JOIN
(select min(comment_id) as comment_id from @T
group by site_id, [user_id],comment) t2 on t1.comment_id = t2.comment_id
WHERE t2.Comment_ID IS NULL
Vasc
October 21, 2005 at 9:41 am
If it doesn't matter which record you keep, you could create a cursor that holds each duplicate site_id along with the number of times it appears in the table. Loop through the cursor; each time, set ROWCOUNT to the number of duplicates minus one, then delete from the table where the site_id equals the cursor value.
First just to see your duplicates run this;
Select site_id,count(*) as count from duplicate_tab
group by site_id
having count(*) > 1
Then you can run this;
declare
@site varchar(50),
@cnt int,
@rowcnt int
/** This gets all Site_ID's that appear more than once in the table **/
Declare c_site cursor FOR
Select site_id,count(*) as count from duplicate_tab
group by site_id
having count(*) > 1
Open c_site
Fetch Next from c_site into @site, @cnt
While (@@Fetch_Status = 0)
Begin
set @rowcnt = @cnt - 1
set rowcount @rowcnt
Delete from duplicate_tab
Where site_id = @site
set rowcount 0
Fetch Next From c_site into @site, @cnt
End
close c_site
deallocate c_site
--------------------------------------------------------
This always worked for me.
Thanks,
Jeannine
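For comparison, the same cleanup can be done in a single set-based statement, assuming duplicate_tab has a unique comment_id column like the table earlier in this thread:

```sql
-- keep the lowest comment_id per site_id, delete the rest
delete from duplicate_tab
where comment_id not in (select min(comment_id)
                         from duplicate_tab
                         group by site_id)
```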
October 21, 2005 at 9:50 am
Thanks...I try to stay away from cursors, but I like the select statement you did.
October 21, 2005 at 9:59 am
What a wonderfully diplomatic response Mike - applause!
Jeannine - I wanted to warn you before the "CursorsAreEvil" police come swooping down on you - please search this site for cursors and you'll find an abundance of articles and posts that explain at great length why cursors are evil and how you must avoid them whenever possible...
**ASCII stupid question, get a stupid ANSI !!!**
October 21, 2005 at 10:30 am
Yes, cursors are nasty if you have a lot of data, but as a quick looping example they are fine.
I usually loop through the table using one of the column identifiers - usually the PK, or at least an indexed column.
Below is an example that determines the minimum and maximum values of the id column and then advances the id each time through the loop.
Declare
@maxsite varchar(50),
@minsite varchar(50),
@cnt int,
@rowcnt int
-- Get the min and max site_id values among the sites that have dups
-- (the subquery must be restricted to duplicated sites, not the whole table)
select @minsite = Min(site_id), @maxsite = Max(site_id)
from (select site_id from duplicate_tab
group by site_id
having count(*) > 1) dups
-- Loop through the site_id's based on the min and max
While (@minsite <= @maxsite)
Begin
select @cnt = count(*) from duplicate_tab
where site_id = @minsite
group by site_id
having count(*) > 1
set @rowcnt = @cnt - 1
set rowcount @rowcnt
Delete from duplicate_tab
Where site_id = @minsite
set rowcount 0
-- determine the next site_id that has duplicates but is still the minimum in the remaining list
select @minsite = Min(site_id)
from (select site_id from duplicate_tab
group by site_id
having count(*) > 1) dups
where site_id > @minsite
End
-Jeannine
November 20, 2005 at 10:46 pm
Hi friend,
check out this query
Delete from tblTest
where Comment_id not in (select max(Comment_id) from tblTest
group by site_id, user_id)
bye
prabhanjan
October 10, 2006 at 6:59 pm
Just to let you know that this has been a real help for me. I was looking to eliminate duplicates with different timestamps, and I got stuck for a while as I only do SQL once in a while, so I'm very rusty. Thanks for the query.
Regards,
Martin
October 10, 2006 at 7:47 pm
Glad my post could help. I was surprised to see activity on this post a year later