June 16, 2011 at 1:49 am
hey ,
I am tring to removed duplicates without distinct keyword and temp table.
please give me any idea.
June 16, 2011 at 5:12 am
u might wanna try this!!!!
SET ROWCOUNT 1
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1
WHILE @@ROWCOUNT >0
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1
SET ROWCOUNT 0
June 16, 2011 at 6:00 am
You may want to use a Common Table Expression (CTE) for example:
CREATE TABLE T(Date_Stamp DATETIME,KW1 DECIMAL(5,1), KW2 DECIMAL(5,1))
INSERT INTO T
SELECT '12/10/2010', 5.3, 3.1 UNION ALL
SELECT '12/10/2010', 5.3, 3.1 UNION ALL
SELECT '12/9/2010', 4, 2 UNION ALL
SELECT '12/8/2010', 3, 1 UNION ALL
SELECT '12/7/2010', 7.4, 5 UNION ALL
SELECT '12/7/2010', 7.4, 5 UNION ALL
SELECT '12/7/2010', 7.4, 5
;with cte
as (select row_number() over(partition by Date_Stamp,KW1,KW2 order by Date_Stamp) as rn,
KW1,KW2,Date_Stamp
from T)
SELECT * FROM cte
Result):
rnKW1KW2Date_Stamp
17.45.02010-12-07 00:00:00.000
27.45.02010-12-07 00:00:00.000
37.45.02010-12-07 00:00:00.000
13.01.02010-12-08 00:00:00.000
14.02.02010-12-09 00:00:00.000
15.33.12010-12-10 00:00:00.000
25.33.12010-12-10 00:00:00.000
Once you have verified the data, then replace the SELECT * FROM CTE statment with:
DELETE FROM cte WHERE rn > 1
For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.
June 16, 2011 at 6:01 am
I like using subqueries and an Identity column. If you don't have an Identity, add it to the table. Then join the table to a subquery searching for the MIN() Identity. In the ON clause, match on the PK (or all the columns except the PK if the PK is the identity) where IdentityValue <> IdentityValue.
Do it as a Select statement to make sure it looks right, then turn it into a DELETE.
June 16, 2011 at 11:45 pm
Great Job dude ur script works. but can u explain me that code.
June 17, 2011 at 4:40 am
vaibhavraut88 (6/16/2011)
Great Job dude ur script works. but can u explain me that code.
Look up Common Table Expressions and Row_Number() in Books Online, or Google the phrases. There's a lot to know about CTEs and the best way to understand it is to learn it.
If it's the rest of the SELECT statement that you're having issues with, then I suggest you reverse engineer the code to a point where you can see what's going on. Start with the base SELECT statement, then start adding things (looking up terms along the way) until you get to a point of confusion. Then ask us specifically about that point.
As one of the frequent posters on this forum likes to say, "Don't use this code in a production environment until you understand what it does. After all, you'll be the one supporting it."
June 17, 2011 at 5:01 am
vaibhavraut88 (6/16/2011)
Great Job dude ur script works. but can u explain me that code.
Ummm.... which script? The one with the WHILE Loop or the other one?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 17, 2011 at 5:03 am
Sriram.RM (6/16/2011)
u might wanna try this!!!!SET ROWCOUNT 1
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1
WHILE @@ROWCOUNT >0
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1
SET ROWCOUNT 0
You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 17, 2011 at 5:08 am
bitbucket-25253 (6/16/2011)
For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.
You, Sir, are a gentleman and a scholar. Thank you for the kudo but it's a well known and common method. I'm just another "user" on this one. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
June 17, 2011 at 5:10 am
Brandie Tarvin (6/16/2011)
If you don't have an Identity, add it to the table.
I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
June 17, 2011 at 5:13 am
Jeff Moden (6/17/2011)
Brandie Tarvin (6/16/2011)
If you don't have an Identity, add it to the table.I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂
Actually, I usually use it in conjunction with a temp table. @=) But he said no temp tables, so I left out that part of the suggestion.
June 18, 2011 at 4:29 am
hi
on a related note , i once suggested the below approach and wanted your input on if it was a good solution, naturally i am talking about a huge table 25 million rows + and we needed to identity duplicates ,
I suggested using a hash value to identify the rows which are duplicates using Hasbytes , then counting only hashbytes to identify the dulicates rather than performing the distinct , group by , row_num operations on all columns in the table.
In retrospect i cant help feeling that this solution was too complicated.
June 18, 2011 at 7:31 am
Keep in mind your hashes may produce duplicates as well !
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
June 20, 2011 at 5:14 am
Hashbytes seems overly complicated to me as well.
June 21, 2011 at 5:34 am
Jeff Moden (6/17/2011)
Sriram.RM (6/16/2011)
u might wanna try this!!!!SET ROWCOUNT 1
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1
WHILE @@ROWCOUNT >0
DELETE YOURTABLE
FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1
SET ROWCOUNT 0
You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.
My Bad... that indeed is a bad one... mistook it for the SQL 2000 forum!!!
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply