August 24, 2007 at 6:16 am
Hi ALL!
How can we delete duplicate records from a table in SQL Server?
The following points are to be considered while writing the query:
Example:
TABLE
Name Amount
--------------
A 15000
A 15000
A 15000
B 12000
B 12000
B 12000
B 12000
C 20000
C 20000
Output After deletion of duplicate records:
Name Amount
--------------
A 15000
B 12000
C 20000
August 24, 2007 at 8:41 am
with t as
(
select
rn= row_number() over (order by col1,col2),
col1,
col2
from tst1
)
delete t
where rn not in
(
select min(rn)
from t
group by col1, col2
)
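(Editorial aside: the MIN(rn)/GROUP BY idea above can be sketched outside SQL Server as well. Below is a hedged illustration using Python's built-in sqlite3 module; SQLite has no deletable CTEs, so the same keep-the-first-row logic is expressed with MIN(rowid) instead of MIN(rn). The table name tst1 and columns col1/col2 follow the post above.)

```python
import sqlite3

# In-memory table standing in for tst1 from the post above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tst1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO tst1 (col1, col2) VALUES (?, ?)",
    [("A", 15000)] * 3 + [("B", 12000)] * 4 + [("C", 20000)] * 2,
)

# Same idea as the CTE: keep the lowest-numbered row of each
# (col1, col2) group and delete the rest. SQLite's rowid plays
# the role that rn plays in the T-SQL version.
conn.execute("""
    DELETE FROM tst1
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM tst1 GROUP BY col1, col2
    )
""")

print(conn.execute("SELECT col1, col2 FROM tst1 ORDER BY col1").fetchall())
# [('A', 15000), ('B', 12000), ('C', 20000)]
```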
August 29, 2007 at 4:38 am
This type of post should be in Newbies, not Performance Tuning.
The GrumpyOldDBA
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
August 30, 2007 at 6:39 am
And what is with these restrictions? Sounds like homework to me. No temp tables? In any case, you have your answer.
August 30, 2007 at 8:34 am
1) Why do you restrict the use of temp tables or table variables?
2) I think this is a better version of the row_number code posted above. It avoids the cost of the MIN and grouping work. UNTESTED:
with t as
(
select
rn= row_number() over (order by col1,col2),
col1,
col2
from tst1
)
delete t
where rn > 1
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 30, 2007 at 9:01 am
1. Probably the author intended to post a SQL puzzle. Maybe the site administrators should create that kind of forum: SQL puzzles?
2. Your code actually deletes all the rows except the first one. We need to delete just the duplicates.
August 30, 2007 at 12:26 pm
CREATE TABLE xxxx(
Name varchar(20), Amount int
)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('C', 20000)
INSERT INTO xxxx (name, Amount) Values ('C', 20000)
with t (rn, name, Amount) as
(
select
rn= row_number() over (partition by name, Amount order by name, Amount),
name, Amount
from xxxx
)
delete t
where rn > 1
select * from xxxx
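(Editorial aside: the PARTITION BY pattern above carries over to any engine with window functions. A hedged sketch using Python's sqlite3, which cannot DELETE through a CTE, so the rn > 1 rows are targeted via rowid instead; requires SQLite 3.25+ for window-function support.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxxx (Name TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO xxxx (Name, Amount) VALUES (?, ?)",
    [("A", 15000)] * 3 + [("B", 12000)] * 4 + [("C", 20000)] * 2,
)

# row_number() restarts at 1 for each (Name, Amount) partition,
# so rn > 1 marks exactly the duplicate rows within each group.
conn.execute("""
    DELETE FROM xxxx WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   row_number() OVER (PARTITION BY Name, Amount
                                      ORDER BY Name, Amount) AS rn
            FROM xxxx
        ) WHERE rn > 1
    )
""")

print(conn.execute("SELECT Name, Amount FROM xxxx ORDER BY Name").fetchall())
# [('A', 15000), ('B', 12000), ('C', 20000)]
```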
* Noel
August 30, 2007 at 1:19 pm
good correction noeld. I missed that the first example didn't have the partitioning.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 30, 2007 at 2:33 pm
Partitioning is the same thing as grouping; in my tests, both queries showed the same estimated cost.
August 30, 2007 at 3:08 pm
Not sure what tests you did. Try this one. As you can see, the single-pass mechanism used by the partition query is significantly more efficient than the second method.
set nocount on
CREATE TABLE #t(
Name varchar(20), Amount int
)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
;with t (rn, name, Amount) as
(
select
rn= row_number() over (partition by name, Amount order by name, Amount),
name, Amount
from #t
)
delete t
where rn > 1
--1 scan, 7 reads, cost 0.0247
truncate table #t
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
;with t as
(
select
rn= row_number() over (order by name,amount),
name,
amount
from #t
)
delete t
where rn not in
(
select min(rn)
from t
group by name, amount
)
--2 scans, 36 reads, cost 0.0638
drop table #t
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 31, 2007 at 11:49 am
Yes, you're right. Now I get the same numbers as you. Sorry, when I tested them yesterday I apparently tested the same query twice.
September 3, 2007 at 1:59 am
I find this question extremely annoying.
Hiding under a desk from SSIS Implementation Work :crazy:
September 7, 2007 at 9:56 pm
Heh... why? If you think this is bad, do a search on "Bankers Rounding"
--Jeff Moden
Change is inevitable... Change for the better is not.
September 10, 2007 at 2:11 pm
This is what I use in my queries when finding and/or deleting duplicates.
Note: I use a counter to determine how many times to cycle through the table, in order to limit how many records I might delete. You can also add WHERE clauses to either of the subqueries to narrow things down. Credit, however, is due to either Google or this site, as I saw and modified this code from somewhere other than my own brain.
declare @Counter int
set @Counter = 80
set rowcount 1
while @Counter > 0
begin
    delete from tf
    where empid in (
        select empid
        from tf
        group by empid
        having count(empid) > 1
    )
    set @Counter = @Counter - 1
end
set rowcount 0
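(Editorial aside: to see what SET ROWCOUNT 1 inside the loop actually does, each pass removes at most one row whose key still has a duplicate, capped by the counter. Here is a hedged pure-Python sketch of that control flow, with a plain list standing in for the table; the empid values are illustrative only.)

```python
# Mirrors SET ROWCOUNT 1 + DELETE ... HAVING COUNT(empid) > 1:
# one duplicate removed per pass, at most max_passes passes.
def delete_duplicates_one_at_a_time(rows, max_passes=80):
    counter = max_passes
    while counter > 0:
        # Find a value that still appears more than once.
        dup = next((r for r in rows if rows.count(r) > 1), None)
        if dup is not None:
            rows.remove(dup)  # removes a single occurrence, like ROWCOUNT 1
        counter -= 1          # the loop keeps spinning even with no dups left
    return rows

table = ["A", "A", "A", "B", "B", "B", "B", "C", "C"]
print(delete_duplicates_one_at_a_time(table))
# ['A', 'B', 'C']
```

Note the counter also caps total deletions: with max_passes=2 only two duplicate rows would be removed, which is the "limit how many records I might be deleting" behavior described above. It is, of course, one scan of the table per deleted row.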
September 10, 2007 at 2:16 pm
Gosh is that inefficient tho . . .
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service