How to remove duplicates from column without using Temp table

Question

Post reply

How to remove duplicates from column without using Temp table

vaibhavraut88

SSC Enthusiast

Points: 136
More actions
June 16, 2011 at 1:49 am

#239263

hey ,
I am tring to removed duplicates without distinct keyword and temp table.
please give me any idea.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Sriram.RM Hall of Fame Points: 3368 More actions · Answer 1

u might wanna try this!!!!

SET ROWCOUNT 1

DELETE YOURTABLE

FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1

WHILE @@ROWCOUNT >0

DELETE YOURTABLE

FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1

SET ROWCOUNT 0

[font="Times New Roman"]For better assistance in answering your questions
Click Here[/url][/font]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 2

You may want to use a Common Table Expression (CTE) for example:

CREATE TABLE T(Date_Stamp DATETIME,KW1 DECIMAL(5,1), KW2 DECIMAL(5,1))

INSERT INTO T

SELECT '12/10/2010', 5.3, 3.1 UNION ALL

SELECT '12/9/2010', 4, 2 UNION ALL

SELECT '12/8/2010', 3, 1 UNION ALL

SELECT '12/7/2010', 7.4, 5 UNION ALL

SELECT '12/7/2010', 7.4, 5

;with cte

as (select row_number() over(partition by Date_Stamp,KW1,KW2 order by Date_Stamp) as rn,

KW1,KW2,Date_Stamp

from T)

SELECT * FROM cte

Result):

rnKW1KW2Date_Stamp

17.45.02010-12-07 00:00:00.000

27.45.02010-12-07 00:00:00.000

37.45.02010-12-07 00:00:00.000

13.01.02010-12-08 00:00:00.000

14.02.02010-12-09 00:00:00.000

15.33.12010-12-10 00:00:00.000

25.33.12010-12-10 00:00:00.000

Once you have verified the data, then replace the SELECT * FROM CTE statment with:

DELETE FROM cte WHERE rn > 1

For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 3

I like using subqueries and an Identity column. If you don't have an Identity, add it to the table. Then join the table to a subquery searching for the MIN() Identity. In the ON clause, match on the PK (or all the columns except the PK if the PK is the identity) where IdentityValue <> IdentityValue.

Do it as a Select statement to make sure it looks right, then turn it into a DELETE.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

vaibhavraut88 SSC Enthusiast Points: 136 More actions · Answer 4

Great Job dude ur script works. but can u explain me that code.

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 5

vaibhavraut88 (6/16/2011)
Great Job dude ur script works. but can u explain me that code.

Look up Common Table Expressions and Row_Number() in Books Online, or Google the phrases. There's a lot to know about CTEs and the best way to understand it is to learn it.

If it's the rest of the SELECT statement that you're having issues with, then I suggest you reverse engineer the code to a point where you can see what's going on. Start with the base SELECT statement, then start adding things (looking up terms along the way) until you get to a point of confusion. Then ask us specifically about that point.

As one of the frequent posters on this forum likes to say, "Don't use this code in a production environment until you understand what it does. After all, you'll be the one supporting it."

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

vaibhavraut88 (6/16/2011)
Great Job dude ur script works. but can u explain me that code.

Ummm.... which script? The one with the WHILE Loop or the other one?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 7

Sriram.RM (6/16/2011)
u might wanna try this!!!!
SET ROWCOUNT 1
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1
WHILE @@ROWCOUNT >0
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1
SET ROWCOUNT 0

You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

bitbucket-25253 (6/16/2011)
For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.

You, Sir, are a gentleman and a scholar. Thank you for the kudo but it's a well known and common method. I'm just another "user" on this one. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 9

Brandie Tarvin (6/16/2011)
If you don't have an Identity, add it to the table.

I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 10

Jeff Moden (6/17/2011)
Brandie Tarvin (6/16/2011)
If you don't have an Identity, add it to the table.
I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂

Actually, I usually use it in conjunction with a temp table. @=) But he said no temp tables, so I left out that part of the suggestion.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 11

hi

on a related note , i once suggested the below approach and wanted your input on if it was a good solution, naturally i am talking about a huge table 25 million rows + and we needed to identity duplicates ,

I suggested using a hash value to identify the rows which are duplicates using Hasbytes , then counting only hashbytes to identify the dulicates rather than performing the distinct , group by , row_num operations on all columns in the table.

In retrospect i cant help feeling that this solution was too complicated.

Jayanth Kurup[/url]

Johan Bijnens SSC Guru Points: 135376 More actions · Answer 12

Keep in mind your hashes may produce duplicates as well !

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 13

Hashbytes seems overly complicated to me as well.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Sriram.RM Hall of Fame Points: 3368 More actions · Answer 14

Jeff Moden (6/17/2011)
Sriram.RM (6/16/2011)
u might wanna try this!!!!
SET ROWCOUNT 1
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1
WHILE @@ROWCOUNT >0
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1
SET ROWCOUNT 0
You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.

My Bad... that indeed is a bad one... mistook it for the SQL 2000 forum!!!

[font="Times New Roman"]For better assistance in answering your questions
Click Here[/url][/font]