February 11, 2014 at 12:17 pm
table has 500 million rows. There are 21 million rows that have one ore more fully duplicate ones, totalling in 70 million duplicate rows altogether.
Can someone recommend (T-SQL) a good way to DEDUPLICATE THIS TABLE without having to create another huge table (reloading dup data into it..) ? Is there a way to do it at the ONE TABLE level,?
not creating other perm tables to hold data etc?
Thanks!
Likes to play Chess
February 11, 2014 at 12:24 pm
Take a look at this post
http://jasonbrimhall.info/2011/03/21/dedupe-data-cte/
You could easily set this up to batch through your table at 10,000 records at a time. I'd probably just set it up in a job and set the job to run every few minutes.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 11, 2014 at 12:32 pm
I would use something like this: with DupTable as (
select ID, ROW_NUMBER() over (partition by ID order by ID) as rn
from Table
group by ID
having count(*) > 1)
delete DupTable
where rn > 1
I wouldn't do the whole table at once since that isn't very tlog friendly. You can add a select top n to the cte.
February 11, 2014 at 12:33 pm
February 11, 2014 at 12:43 pm
Groundhog Day, Keith. He's seen this once or twice before 😉
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
February 11, 2014 at 12:49 pm
Keith Tate (2/11/2014)
Jason was too quick for me. Guess that's what I get for typing out my code instead of having a blog post ready to go. 😀
Hahaha
I still had to go find the URL 😉
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 11, 2014 at 12:50 pm
ChrisM@home (2/11/2014)
Groundhog Day, Keith. He's seen this once or twice before 😉
Yeah something like that.
If memory serves, we had this same discussion with this OP just last week.:Whistling:
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply