July 22, 2008 at 11:08 pm
Comments posted to this topic are about the item Delete Duplicate values
July 22, 2008 at 11:13 pm
Abhijit,
just give a short explanation of your topic, so it will be more clear to all.
Cheers!
Sandy.
--
October 9, 2008 at 11:03 pm
I agree that you need to write something to go with your script.
And I believe that your script is slightly flawed, it looks like it will only delete the first duplicate record of each duplicate set and will leave all the rest intact (your test data only had 2 duplicates in any set, as far as I noticed, so it worked fine for that data set). I would suggest making the following minor change, which would keep the first record of each duplicate set and delete all others (the first one in theory anyway, SQL gives no guarantee on order of duplicate records, and how would you know anyway? 😉 ).
DELETE FROM cteDV WHERE RID <> 1
You could also saved a keystroke and said '> 1', but both will do the same thing in this case.
For a quick demo using asterisk in the select statement of your CTE works but I would suggest spelling out the columns so that the code doesn't break should the source table structure change.
Over all a nice little script for getting rid of annoying duplicate records, thank you for sharing it with us.
October 10, 2008 at 2:53 am
Kenney,
I appreciate your suggestion. thank you :).
Abhijit - http://abhijitmore.wordpress.com
October 10, 2008 at 5:54 am
Just to give physical representation of Kenny's suggestions...
--===== Create a temporary demonstration table and populate it with duplicated info
-- This is not part of the solution... it's just creating a demo.
-- This DROP is just to support reruns for test purposes
IF OBJECT_ID('tempdb..#DuplicateVals') IS NOT NULL
DROP TABLE #DuplicateVals
CREATE TABLE #DuplicateVals
(
Col1VARCHAR(3),
Col2VARCHAR(3),
Col3VARCHAR(3)
)
INSERT INTO #DuplicateVals
(Col1,Col2,Col3)
SELECT 'aa1','aa1','aa1' UNION ALL --More than 1 duplicate from here...
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL --... to here
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa3','aa3','aa3' UNION ALL --No duplicates. Will it survive?
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa5','aa5','aa5' UNION ALL --No duplicates. Will it survive?
SELECT 'aa6','aa6','aa6' UNION ALL --More than 1 duplicate from here...
SELECT 'aa6','aa6','aa6' UNION ALL
SELECT 'aa6','aa6','aa6' --... to here
--===== Sanity check: Display the contents of the test table BEFORE the deletion
SELECT * FROM #DuplicateVals
--===== Delete duplicates without any temporal information present.
-- Note that there isn't actually any reason to return any columns from the table in the CTE...
-- Just the RowNum will suffice.
;WITH cteDV AS
(
SELECTROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM#DuplicateVals
)
DELETE FROM cteDV WHERE RowNum > 1
--===== Sanity check: Display the contents of the test table AFTER the deletion
SELECT * FROM #DuplicateVals
... the details are in the code but I wanted to point out that you don't really need to return any of the table's columns in the CTE for this method to work... just the RowNum.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2008 at 6:36 am
Great comments everyone.
This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement? :unsure:
I got my information from the following site: http://www.valentina-db.com/dokuwiki/doku.php?id=paradigma:public:en:releases:2.5:statement_with_option
Keith Wiggans
October 10, 2008 at 7:37 am
Very clever! I never thought to delete from a CTE!
October 10, 2008 at 8:10 pm
Vert Cool~~
But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statemen.
So amazing and so confusing.
Can anyone give me the answer? Many thanks
October 10, 2008 at 11:57 pm
Cade.Bu (10/10/2008)
Vert Cool~~But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statement.
So amazing and so confusing.
Can anyone give me the answer? Many thanks
Cade.Bu,
It's like kwiggans said, a CTE is like a View:
kwiggans (10/10/2008)
Great comments everyone.This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement? :unsure:
Since I'm different from most (or is that contrary? :hehe: ), I like to think of CTEs as derived tables that I can alias then refer to as many times in my query as I want simply by using the alias. Actually, an argument can be made that a View is simply a derived table that you can access with out defining it for each query, you only define it the one time. But I digress, back to the issue.
So, a CTE works pretty much like a local view, and follows the same rules as a view in order for it to be updateable. Which means, an updateable CTE can be used just like an updateable view, you can perform Select, Insert or Delete queries on them.
Kwiggans,
The With statement is just the mechanism used to define the CTE. It's the fact that the CTE acts like a View that allows you to delete records through it.
I hope I helped more than I befuddled,
Kenney
Alternate Method:
This example can be done without the use of a CTE as well. All you have to do is to replace the CTE code and the delete statement with this piece of code that uses a derived table in a Delete statement to achieve the same goal:
DELETE NumberedDups
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM #DuplicateVals) AS NumberedDups
WHERE RowNum > 1
Performance Note:
An article I've read stated and showed that a CTE is faster than using a temp table, and I've found this to be true up to a point. The temp table is slower mostly due to the hard drive access need to create it in the first place, but this is just a onetime hit. The problem with the CTE is that SQL runs it each time you reference it in your query, so if your CTE takes 3 seconds to run and you use it 5 times in your query then you're looking at up to 15 seconds (or less due to caching) just for the CTE to be ran all 5 times and then you still have to add the time it takes for the rest of your query to run. But if you used a temp table then after the first hit of creating the temp table you start gaining time because it takes a lot less time to retrieve the records from a temp table than it does to reproduce the result set each time VIA a CTE (or any other query mechanism).
What does this mean? It means that if your CTEs are slow you will want to think twice about using them if you need to reference them multiple times in the same query batch.
Extra Study Idea:
You might want to look into a cool feature of CTEs. They can refer to themselves from within their own With statement (this is a recursive CTE). Books on Line has a little piece on this functionality where they show you how to get a list of someone(s) supervisor and their supervisor and their supervisor, etc. I think it's well worth a look since it opens up a path to do complicated things in a less complicated way.
October 11, 2008 at 1:38 am
Hi Kenney,
[font="Verdana"]I see..Thank you very much for your explanation.:)[/font]
Cade
October 11, 2008 at 11:33 am
Outstanding explanation, Kenney.
On the recursive thing... do keep in mind that recurssion is nothing more than a loop and has the same performance impedements as a loop. In other words, it IS a form of RBAR. There are many ways to avoid the use of any type of RBAR on many things. Even when it comes to Hierarchies, there are some really good ways to precalculate the hierarchy (nested sets) and then do the lookups using set based technologies instead of RBAR.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 11, 2008 at 10:59 pm
Thank you Jeff.
And thanks for pointing out the RBAR aspect of recursive queries. I'm still not very good at spotting hidden RBAR, and you are the master of that subject. 🙂 I remember reading an article by you about triangular joins being RBAR, it was very enlightening.
Do you happen to have a posted example of a non-RBAR hierarchal query? I'd really like to see how that is done. I've been very impressed by your postings that I have read.
Thanks again,
Kenney
October 13, 2008 at 8:00 am
Kenny,
Thanks for pointing out the performance implications of using a CTE versus a temp table multiple times in a query.
October 13, 2008 at 8:34 am
Kenney Hill (10/11/2008)
Thank you Jeff.And thanks for pointing out the RBAR aspect of recursive queries. I'm still not very good at spotting hidden RBAR, and you are the master of that subject. 🙂 I remember reading an article by you about triangular joins being RBAR, it was very enlightening.
Do you happen to have a posted example of a non-RBAR hierarchal query? I'd really like to see how that is done. I've been very impressed by your postings that I have read.
Thanks again,
Kenney
You bet...
http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html
http://www.codeproject.com/KB/database/nestedsets.aspx
What I end up doing is preserving both models on hierarchies that don't change much (most don't). People have an easier time maintaining and thinking about the adjacency model and the code runs better on the nested set model.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2009 at 3:27 am
Abhijit,
Great Article.
Moreover, nicely explained and modified by Jeff.
Really appriciate.
🙂
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply