October 10, 2013 at 8:34 am
curious_sqldba (10/10/2013)
Very nice article and good explanation. Can you please add your thoughts on performance? And is this the recommended way to do it?
I find the performance to be quite good; I have this or similar code in production in a number of places and I've never had issues with speed. I haven't had the opportunity to compare its performance to a CTE (there shouldn't be much difference at all) or a cursor (there could be a large difference if there's a large number of records). I don't know whether it is THE recommended way to do it, but it is the way I would recommend doing it.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
October 11, 2013 at 7:33 am
Nice introductory article, Stefan. Good job!
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 15, 2013 at 12:38 pm
I tried this solution on a table with over 7 million rows (around 100,000 duplicates each). The query ran for around 57 minutes, then my server crashed and the database I was running it on went into recovery mode... (I needed to reboot the server to make it available again :-)).
So the function is great for smaller databases, but it wasn't a real solution in my case for a database with 7+ million rows. (In the end I had a total of 1,654 rows left, so yeah, a LOT of duplicates.) In my case I was lucky that I could run:
SELECT DISTINCT
    CreateStatement,
    dbo.split(dbo.split(CreateStatement, '[', 2), ']') AS Tablename
FROM CB_I_Drop_Create_Indexes
The above query took around 4 minutes to complete the same job. 🙂 (The table name for the index was in the CreateStatement column.) Hope someone else has some use for it too.
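(The dbo.split function itself isn't shown in the post, so the sketch below is only a guess at its shape: a scalar function that returns the Nth piece of a string split on a single-character delimiter. The name, parameters, and body are assumptions inferred from how it is called above; note the outer call passes only two arguments, so the real function may differ, e.g. by using an optional index parameter.)
-- Hypothetical dbo.split: returns the @n-th piece (1-based) of @input when split on @delim.
-- This is a sketch of an assumed implementation, not the poster's actual function.
CREATE FUNCTION dbo.split
(
    @input VARCHAR(MAX),   -- string to split
    @delim CHAR(1),        -- single-character delimiter
    @n     INT             -- 1-based index of the piece to return
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @pos INT, @i INT = 1;

    -- Discard the first @n - 1 pieces
    WHILE @i < @n
    BEGIN
        SET @pos = CHARINDEX(@delim, @input);
        IF @pos = 0 RETURN NULL;   -- fewer pieces than requested
        SET @input = SUBSTRING(@input, @pos + 1, LEN(@input));
        SET @i += 1;
    END;

    -- Return everything up to the next delimiter (or the remainder if none is left)
    SET @pos = CHARINDEX(@delim, @input);
    RETURN CASE WHEN @pos = 0 THEN @input ELSE LEFT(@input, @pos - 1) END;
END;
With a function shaped like that, dbo.split(CreateStatement, '[', 2) would return the text between the first and second '[' characters, and the outer call would then trim it at the first ']'.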
October 15, 2013 at 12:54 pm
prennings (10/15/2013)
I tried this solution on a table with over 7 million rows (around 100,000 duplicates each). The query ran for around 57 minutes, then my server crashed and the database I was running it on went into recovery mode... (I needed to reboot the server to make it available again :-)). So the function is great for smaller databases, but it wasn't a real solution in my case for a database with 7+ million rows. (In the end I had a total of 1,654 rows left, so yeah, a LOT of duplicates.) In my case I was lucky that I could run:
SELECT DISTINCT
    CreateStatement,
    dbo.split(dbo.split(CreateStatement, '[', 2), ']') AS Tablename
FROM CB_I_Drop_Create_Indexes
The above query took around 4 minutes to complete the same job. 🙂 (The table name for the index was in the CreateStatement column.) Hope someone else has some use for it too.
I think in that case you might even be better off just copying the good rows to a new table, dropping the old one, and renaming the new one. How did you end up with a table that dirty?
I wouldn't expect this query to crash your system or make the database unavailable. For that to happen, I'd think there would have to be some configuration problems with the server, and it would have to be a rather small server.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
October 15, 2013 at 5:48 pm
prennings (10/15/2013)
I tried this solution on a table with over 7 million rows (around 100,000 duplicates each). The query ran for around 57 minutes, then my server crashed and the database I was running it on went into recovery mode... (I needed to reboot the server to make it available again :-)). So the function is great for smaller databases, but it wasn't a real solution in my case for a database with 7+ million rows. (In the end I had a total of 1,654 rows left, so yeah, a LOT of duplicates.) In my case I was lucky that I could run:
SELECT DISTINCT
    CreateStatement,
    dbo.split(dbo.split(CreateStatement, '[', 2), ']') AS Tablename
FROM CB_I_Drop_Create_Indexes
The above query took around 4 minutes to complete the same job. 🙂 (The table name for the index was in the CreateStatement column.) Hope someone else has some use for it too.
I would love to see the split function. Any chance of you posting it, please? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2015 at 11:14 pm
Great article, thanks! Will definitely use it.
January 23, 2015 at 12:23 am
http://deleteduplicaterecords.blogspot.in/
You get all answers here
January 23, 2015 at 7:21 am
Here is another way to delete dups.
WITH dup AS
(
    SELECT Policy,
           ROW_NUMBER() OVER (PARTITION BY agentnum, policy ORDER BY (SELECT 0)) AS RN
    FROM   dbo.YourTable
)
DELETE FROM dup
WHERE RN > 1;
January 23, 2015 at 7:23 am
jeknight44 (1/23/2015)
Here is another way to delete dups.
WITH dup AS
(
    SELECT Policy,
           ROW_NUMBER() OVER (PARTITION BY agentnum, policy ORDER BY (SELECT 0)) AS RN
    FROM   dbo.YourTable
)
DELETE FROM dup
WHERE RN > 1;
This thread is about a question of the day (a sort of quiz) from 1.5 years ago.
I think most people here already know how to remove duplicates by now.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 23, 2015 at 7:44 am
Sometimes the easiest way is not the most elegant. When I just have one set of duplicates, I select the data, stash it in Notepad or Excel, delete all the duplicate records, and re-insert it with a paste using the 'Edit Top 200 Rows' feature in SSMS.
I find this thread very helpful for when you have multiple duplicate records in a table that was poorly designed, without an ID or key constraint preventing duplicate records.
January 23, 2015 at 8:03 am
nicolas_johnson 52979 (1/23/2015)
Sometimes the easiest way is not the most elegant. When I just have one set of duplicates, I select the data, stash it in Notepad or Excel, delete all the duplicate records, and re-insert it with a paste using the 'Edit Top 200 Rows' feature in SSMS. I find this thread very helpful for when you have multiple duplicate records in a table that was poorly designed, without an ID or key constraint preventing duplicate records.
Sometimes the problem is that the data you receive doesn't have a distinct ID, or you get the same data twice. Your table can be perfect, but if the data coming in isn't, this will keep your system from crashing. And it is something you can automate, so you don't have to manually process any duplicates you have. When I set up my imports to dedupe this way, I rarely have to pay attention to duplicates after that.
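For what it's worth, one way to automate that kind of dedupe on import is to rank the incoming rows and only insert one copy of each. Below is a minimal sketch of that idea; dbo.Staging_Policy and dbo.Policy are made-up names, and the agentnum/policy columns are just borrowed from the example earlier in the thread.
-- Dedupe-on-import sketch; table and column names are illustrative assumptions.
WITH ranked AS
(
    SELECT agentnum,
           policy,
           ROW_NUMBER() OVER (PARTITION BY agentnum, policy ORDER BY (SELECT 0)) AS rn
    FROM   dbo.Staging_Policy
)
INSERT INTO dbo.Policy (agentnum, policy)
SELECT agentnum, policy
FROM   ranked
WHERE  rn = 1;   -- only one row per agentnum/policy pair ever reaches the target table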
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
January 23, 2015 at 9:12 am
Shankar Walvekar (1/23/2015)
http://deleteduplicaterecords.blogspot.in/
You get all answers here
Why? Sioban's article has all the answers along with reasons why it works.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2015 at 9:30 am
Jeff Moden (1/23/2015)
Shankar Walvekar (1/23/2015)
http://deleteduplicaterecords.blogspot.in/
You get all answers here
Why? Sioban's article has all the answers along with reasons why it works.
Thanks Jeff.
I feel like I always see someone self-promoting like this in the comments. That's why I didn't bother to respond. Or click through. :-)
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
January 23, 2015 at 9:33 am
Hi,
I sometimes use a CTE to delete duplicates. It works fine.
Paul
January 23, 2015 at 9:44 am
pmusiej (1/23/2015)
Hi, I sometimes use a CTE to delete duplicates. It works fine.
Paul
Sure! That's one way to do it. Want to post some code to share with people? I find that knowing multiple ways of doing things is both nice and can help if one way runs into trouble.
For example, sometimes CTEs can run long on large datasets.
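One way to work around that (a sketch, not necessarily the article's approach) is to delete the duplicates in batches so that each pass stays short; the table and column names below are just the ones from the earlier example and may not match your schema.
-- Batched duplicate delete to limit log growth and lock duration; names are assumptions.
DECLARE @deleted INT = 1;

WHILE @deleted > 0
BEGIN
    ;WITH dup AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY agentnum, policy ORDER BY (SELECT 0)) AS rn
        FROM   dbo.YourTable
    )
    DELETE TOP (50000) FROM dup
    WHERE rn > 1;

    SET @deleted = @@ROWCOUNT;   -- stop once a pass finds nothing left to delete
END;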
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams