August 21, 2013 at 3:02 am
I have this query for marking duplicate rows. It works on a small database but takes more than 2 days to run if the db is large. Is there a way of making this execute faster. Thank you
UPDATE DSObject_table
SET Object_isDeleted = 1
where handle_id IN
(
select handle_id from DSObject_table as dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1))
and DSObject_table.handle_id NOT IN
(
SELECT Min(handle_id) FROM DSObject_table AS dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1)
group by Civil_Case__original_file_name, Civil_Case_Files_size);
August 21, 2013 at 3:26 am
I think if you can speed up the update by using the row_number function:
-- display rownr's assigned to each 'block' of similar rows (only for display purpose)
select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *
from DSObject_table
order by Civil_Case__original_file_name, rownr
-- update the value using the rownr definition (see select query above)
update DSObject_table
set Object_isDeleted = 1
from (select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *
from DSObject_table) DSObject_table
where rownr > 1
August 21, 2013 at 5:32 am
Thanks, to display it only took 4min 11sec to execute. What does over partition by do?
August 21, 2013 at 5:38 am
It basically gives each record a unique number
see this http://technet.microsoft.com/en-us/library/ms186734.aspx
August 21, 2013 at 7:49 am
imran.adam (8/21/2013)
see this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D
That is the correct link but your description is incorrect.
Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 21, 2013 at 7:59 am
Sean Lange (8/21/2013)
imran.adam (8/21/2013)
It basically gives each record a unique numbersee this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D
That is the correct link but your description is incorrect.
Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.
Yes that is what i meant, thanks for clarifying. Basically the data is split into subsets with a unique number e.g 1,2,3 then next will be 1,2 and then 1,2,3,4 and so on.....
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply