Marking duplicate rows

Question

Post reply

Marking duplicate rows

masangombali2

SSC Enthusiast

Points: 181
More actions
August 21, 2013 at 3:02 am

#278919

I have this query for marking duplicate rows. It works on a small database but takes more than 2 days to run if the db is large. Is there a way of making this execute faster. Thank you
UPDATE DSObject_table
SET Object_isDeleted = 1
where handle_id IN
(
select handle_id from DSObject_table as dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1))
and DSObject_table.handle_id NOT IN
(
SELECT Min(handle_id) FROM DSObject_table AS dso
where exists (select Civil_Case__original_file_name, Civil_Case_Files_size,Count(handle_id)
from DSObject_table
where DSObject_table.Civil_Case__original_file_name = dso.Civil_Case__original_file_name
andDSObject_table.Civil_Case_Files_size = dso.Civil_Case_Files_size
group by DSObject_table.Civil_Case__original_file_name, DSObject_table.Civil_Case_Files_size
having count(DSObject_table.handle_id) > 1)
group by Civil_Case__original_file_name, Civil_Case_Files_size);

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

HanShi SSC-Dedicated Points: 33511 More actions · Answer 1

I think if you can speed up the update by using the row_number function:

-- display rownr's assigned to each 'block' of similar rows (only for display purpose)

select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *

from DSObject_table

order by Civil_Case__original_file_name, rownr

-- update the value using the rownr definition (see select query above)

update DSObject_table

set Object_isDeleted = 1

from (select row_number() over (partition by Civil_Case__original_file_name, Civil_Case_Files_size order by handle_id) as rownr, *

from DSObject_table) DSObject_table

where rownr > 1

** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **

masangombali2 SSC Enthusiast Points: 181 More actions · Answer 2

Thanks, to display it only took 4min 11sec to execute. What does over partition by do?

imran.adam Ten Centuries Points: 1008 More actions · Answer 3

It basically gives each record a unique number

see this http://technet.microsoft.com/en-us/library/ms186734.aspx

Sean Lange SSC Guru Points: 286602 More actions · Answer 4

imran.adam (8/21/2013)

It basically gives each record a unique number

see this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D

That is the correct link but your description is incorrect.

Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

imran.adam Ten Centuries Points: 1008 More actions · Answer 5

Sean Lange (8/21/2013)

imran.adam (8/21/2013)
It basically gives each record a unique number
see this http://technet.microsoft.com/en-us/library/ms186734.aspx%5B/quote%5D
That is the correct link but your description is incorrect.
Using PARTITION in the context of ROW_NUMBER divides the rows into sets which means that the value returned will restart at 1. This does not mean that each row gets a unique number, it restarts the counter.

Yes that is what i meant, thanks for clarifying. Basically the data is split into subsets with a unique number e.g 1,2,3 then next will be 1,2 and then 1,2,3,4 and so on.....

Marking duplicate rows

Cookies on SQLServerCentral