August 21, 2013 at 3:02 am
I have this query for marking duplicate rows. It works on a small database but takes more than 2 days to run if the db is large. Is there a way of making this execute faster. Thank you
UPDATE DSObject_table
SET Object_isDeleted = 1
where handle_id IN
(
select handle_id from DSObject_table as dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1))
and DSObject_table.handle_id NOT IN
(
SELECT Min(handle_id) FROM DSObject_table AS dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1)
group by Civil_Case__original_file_name, Civil_Case_Files_size);
August 21, 2013 at 3:26 am
I think if you can speed up the update by using the row_number function:
-- display rownr's assigned to each 'block' of similar rows (only for display purpose)
select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *
from DSObject_table
order by Civil_Case__original_file_name, rownr
-- update the value using the rownr definition (see select query above)
update DSObject_table
set Object_isDeleted = 1
from (select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *
from DSObject_table) DSObject_table
where rownr > 1
August 21, 2013 at 5:32 am
Thanks, to display it only took 4min 11sec to execute. What does over partition by do?
August 21, 2013 at 5:38 am
It basically gives each record a unique number
see this http://technet.microsoft.com/en-us/library/ms186734.aspx
August 21, 2013 at 7:49 am
imran.adam (8/21/2013)
see this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D
That is the correct link but your description is incorrect.
Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 21, 2013 at 7:59 am
Sean Lange (8/21/2013)
imran.adam (8/21/2013)
It basically gives each record a unique numbersee this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D
That is the correct link but your description is incorrect.
Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.
Yes that is what i meant, thanks for clarifying. Basically the data is split into subsets with a unique number e.g 1,2,3 then next will be 1,2 and then 1,2,3,4 and so on.....
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy