February 29, 2012 at 3:07 am
Hi,
I have a query of the form:
DELETE st1
FROM StageTable AS st
INNER JOIN Meter m ON m.meter_key = st.meter_key
INNER JOIN Meter m1 ON m1.meter_point = m.meter_point
INNER JOIN StageTable AS st1 ON st1.meter_key = m1.meter_key
    AND st1.read_type_key = st.read_type_key
    AND st1.date_key = st.date_key
    AND st1.settlement_key = st.settlement_key
WHERE (m.meter_point = 'x')
  AND (
        (st1.source_id < st.source_id)
        OR ((st1.source_id = st.source_id) AND (st1.load_key < st.load_key))
      )
The query is meant to find duplicate rows and delete them from my staging table. It is extremely slow and takes over 10 minutes. The table has 46,694 rows, and when I turn the delete into a select of st1 it runs in under 5 seconds and returns 16,707 rows, so identifying which rows need to be deleted doesn't seem to be the problem. Some of the rows returned are duplicates of themselves, so the delete would actually only remove 14,412 rows.
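(The select I'm comparing against is essentially the same statement with DELETE st1 replaced by SELECT st1.*, keeping the joins and WHERE clause unchanged.)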
I have found that I can speed the query up so that it runs in seconds by putting a TOP clause on the DELETE, even if I set TOP to a number higher than the number of rows in the dataset, like 500,000.
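For reference, the fast TOP version is the original statement with just that clause added (500,000 being an arbitrary figure well above the table's row count):

DELETE TOP (500000) st1
FROM StageTable AS st
INNER JOIN Meter m ON m.meter_key = st.meter_key
INNER JOIN Meter m1 ON m1.meter_point = m.meter_point
INNER JOIN StageTable AS st1 ON st1.meter_key = m1.meter_key
    AND st1.read_type_key = st.read_type_key
    AND st1.date_key = st.date_key
    AND st1.settlement_key = st.settlement_key
WHERE (m.meter_point = 'x')
  AND (
        (st1.source_id < st.source_id)
        OR ((st1.source_id = st.source_id) AND (st1.load_key < st.load_key))
      )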
I can also get it to run in seconds by running the select first and then joining that result set back onto my staging table to pick out the rows to delete. All of these approaches seem to do the same thing to me, but I don't know why the original query is so inefficient compared to the other methods, as I am running more or less the same query.
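Roughly, the "select first" version looks like the sketch below (#ToDelete is just an illustrative name, and I'm assuming the combination of meter_key, read_type_key, date_key, settlement_key, source_id and load_key is enough to match the rows back to the staging table):

SELECT st1.meter_key, st1.read_type_key, st1.date_key,
       st1.settlement_key, st1.source_id, st1.load_key
INTO #ToDelete
FROM StageTable AS st
INNER JOIN Meter m ON m.meter_key = st.meter_key
INNER JOIN Meter m1 ON m1.meter_point = m.meter_point
INNER JOIN StageTable AS st1 ON st1.meter_key = m1.meter_key
    AND st1.read_type_key = st.read_type_key
    AND st1.date_key = st.date_key
    AND st1.settlement_key = st.settlement_key
WHERE (m.meter_point = 'x')
  AND (
        (st1.source_id < st.source_id)
        OR ((st1.source_id = st.source_id) AND (st1.load_key < st.load_key))
      );

-- Join the captured rows back onto the staging table and delete them
DELETE s
FROM StageTable AS s
INNER JOIN #ToDelete AS d
    ON  d.meter_key      = s.meter_key
    AND d.read_type_key  = s.read_type_key
    AND d.date_key       = s.date_key
    AND d.settlement_key = s.settlement_key
    AND d.source_id      = s.source_id
    AND d.load_key       = s.load_key;

DROP TABLE #ToDelete;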
Can anyone explain why getting the dataset first through a select, or adding a TOP, is so much more efficient than the plain direct delete?
Edit: I have compared the execution plans of the select and the delete, and from what I can see they are similar, with similar relative costs.