August 24, 2007 at 6:16 am
Hi ALL!
How can we delete duplicate records from a table in SQL Server?
The following points are to be considered while writing the query:
Example:
TABLE
Name Amount
--------------
A 15000
A 15000
A 15000
B 12000
B 12000
B 12000
B 12000
C 20000
C 20000
Output After deletion of duplicate records:
Name Amount
--------------
A 15000
B 12000
C 20000
August 24, 2007 at 8:41 am
with t as
(
select
rn= row_number() over (order by col1,col2),
col1,
col2
from tst1
)
delete t
where rn not in
(
select min(rn)
from t
group by col1, col2
)
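(Editorial aside: the MIN(rn)/GROUP BY idea above can be sketched outside SQL Server as well. Below is a hedged illustration using Python's built-in sqlite3 module; SQLite has no deletable CTEs, so the same keep-the-first-row logic is expressed with MIN(rowid) instead of MIN(rn). The table name tst1 and columns col1/col2 follow the post above.)

```python
import sqlite3

# In-memory table standing in for tst1 from the post above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tst1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO tst1 (col1, col2) VALUES (?, ?)",
    [("A", 15000)] * 3 + [("B", 12000)] * 4 + [("C", 20000)] * 2,
)

# Same idea as the CTE: keep the lowest-numbered row of each
# (col1, col2) group and delete the rest. SQLite's rowid plays
# the role that rn plays in the T-SQL version.
conn.execute("""
    DELETE FROM tst1
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM tst1 GROUP BY col1, col2
    )
""")

print(conn.execute("SELECT col1, col2 FROM tst1 ORDER BY col1").fetchall())
# [('A', 15000), ('B', 12000), ('C', 20000)]
```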
August 29, 2007 at 4:38 am
This type of post should be in Newbies, not Performance Tuning.
The GrumpyOldDBA
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
August 30, 2007 at 6:39 am
And what is with these restrictions? Sounds like homework to me. No temp tables? In any case, you have your answer.
August 30, 2007 at 8:34 am
1) Why do you restrict the use of temp tables or table variables?
2) I think this is a better version of the row_number code posted above. It avoids the cost of the MIN and grouping work. UNTESTED:
with t as
(
select
rn= row_number() over (order by col1,col2),
col1,
col2
from tst1
)
delete t
where rn > 1
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 30, 2007 at 9:01 am
1. Probably the author intended to post a SQL puzzle. Maybe the site administrators should create that kind of forum: SQL puzzles?
2. Your code actually deletes all the rows except the first one. We need to delete just the duplicates.
August 30, 2007 at 12:26 pm
CREATE TABLE xxxx(
Name varchar(20), Amount int
)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('A', 15000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('B', 12000)
INSERT INTO xxxx (name, Amount) Values ('C', 20000)
INSERT INTO xxxx (name, Amount) Values ('C', 20000)
with t (rn, name, Amount) as
(
select
rn= row_number() over (partition by name, Amount order by name, Amount),
name, Amount
from xxxx
)
delete t
where rn > 1
select * from xxxx
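(Editorial aside: the PARTITION BY pattern above carries over to any engine with window functions. A hedged sketch using Python's sqlite3, which cannot DELETE through a CTE, so the rn > 1 rows are targeted via rowid instead; requires SQLite 3.25+ for window-function support.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxxx (Name TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO xxxx (Name, Amount) VALUES (?, ?)",
    [("A", 15000)] * 3 + [("B", 12000)] * 4 + [("C", 20000)] * 2,
)

# row_number() restarts at 1 for each (Name, Amount) partition,
# so rn > 1 marks exactly the duplicate rows within each group.
conn.execute("""
    DELETE FROM xxxx WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   row_number() OVER (PARTITION BY Name, Amount
                                      ORDER BY Name, Amount) AS rn
            FROM xxxx
        ) WHERE rn > 1
    )
""")

print(conn.execute("SELECT Name, Amount FROM xxxx ORDER BY Name").fetchall())
# [('A', 15000), ('B', 12000), ('C', 20000)]
```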
* Noel
August 30, 2007 at 1:19 pm
good correction noeld. I missed that the first example didn't have the partitioning.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 30, 2007 at 2:33 pm
Partitioning is the same thing as grouping; in my tests, both queries showed the same estimated cost.
August 30, 2007 at 3:08 pm
Not sure what tests you did. Try this one. As you can see, the single-pass mechanism used by the partition query is significantly more efficient than the second method.
set nocount on
CREATE TABLE #t(
Name varchar(20), Amount int
)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
;with t (rn, name, Amount) as
(
select
rn= row_number() over (partition by name, Amount order by name, Amount),
name, Amount
from #t
)
delete t
where rn > 1
--1 scan, 7 reads, cost 0.0247
truncate table #t
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('A', 15000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('B', 12000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
INSERT INTO #t (name, Amount) Values ('C', 20000)
;with t as
(
select
rn= row_number() over (order by name,amount),
name,
amount
from #t
)
delete t
where rn not in
(
select min(rn)
from t
group by name, amount
)
--2 scans, 36 reads, cost 0.0638
drop table #t
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 31, 2007 at 11:49 am
Yes, you're right. Now I get the same numbers as you. Sorry, when I tested them yesterday I apparently tested the same query twice.
September 3, 2007 at 1:59 am
I find this question extremely annoying.
Hiding under a desk from SSIS Implementation Work :crazy:
September 7, 2007 at 9:56 pm
Heh... why? If you think this is bad, do a search on "Bankers Rounding"
--Jeff Moden
Change is inevitable... Change for the better is not.
September 10, 2007 at 2:11 pm
This is what I use in my queries when finding and/or deleting duplicates.
Note: I use a counter to determine how many times to cycle through the table, in order to limit how many records I might delete. You can also add WHERE clauses to either of the subqueries to narrow things down. Credit, however, is due to either Google or this site, as I saw and modified this code from somewhere other than my own brain.
declare @Counter int
set @Counter = 80
set rowcount 1
while @Counter > 0
begin
    delete from tf
    where empid in (
        select empid
        from tf
        group by empid
        having count(empid) > 1
    )
    set @Counter = @Counter - 1
end
set rowcount 0
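(Editorial aside: to see what SET ROWCOUNT 1 inside the loop actually does, each pass removes at most one row whose key still has a duplicate, capped by the counter. Here is a hedged pure-Python sketch of that control flow, with a plain list standing in for the table; the empid values are illustrative only.)

```python
# Mirrors SET ROWCOUNT 1 + DELETE ... HAVING COUNT(empid) > 1:
# one duplicate removed per pass, at most max_passes passes.
def delete_duplicates_one_at_a_time(rows, max_passes=80):
    counter = max_passes
    while counter > 0:
        # Find a value that still appears more than once.
        dup = next((r for r in rows if rows.count(r) > 1), None)
        if dup is not None:
            rows.remove(dup)  # removes a single occurrence, like ROWCOUNT 1
        counter -= 1          # the loop keeps spinning even with no dups left
    return rows

table = ["A", "A", "A", "B", "B", "B", "B", "C", "C"]
print(delete_duplicates_one_at_a_time(table))
# ['A', 'B', 'C']
```

Note the counter also caps total deletions: with max_passes=2 only two duplicate rows would be removed, which is the "limit how many records I might be deleting" behavior described above. It is, of course, one scan of the table per deleted row.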
September 10, 2007 at 2:16 pm
Gosh is that inefficient tho . . .
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service