March 15, 2013 at 3:26 am
I have a Data Flow Task that reads from a Flat File Source (orders.csv) and then does an OLE DB Command to do a delete from a SQL database table if a match is found. This is the SQL command in the OLE DB Command Task:
Delete orders from orders with(index(IX_orderno))
WHERE orderno = ? and ordersuf = ? and oetype in ('r','t','d') and cono = ?
(The index IX_orderno is defined as non-clustered and is orderno + ordersuf + oetype + cono. )
There are 400,000 records in the orders.csv and the delete is extremely slow (over 1 hour.)
We do have 8 indexes on this table, all non-clustered. Would this have any ramifications on the deletion speed?
Help would greatly appreciated.
March 15, 2013 at 3:46 am
I tried with removing all the indexes from the table, except for the one being used, and then reran the task. The results were the same - no speed improvement.
March 15, 2013 at 5:54 am
mberman (3/15/2013)
I have a Data Flow Task that reads from a Flat File Source (orders.csv) and then does an OLE DB Command to do a delete from a SQL database table if a match is found. This is the SQL command in the OLE DB Command Task:Delete orders from orders with(index(IX_orderno))
WHERE orderno = ? and ordersuf = ? and oetype in ('r','t','d') and cono = ?
(The index IX_orderno is defined as non-clustered and is orderno + ordersuf + oetype + cono. )
There are 400,000 records in the orders.csv and the delete is extremely slow (over 1 hour.)
We do have 8 indexes on this table, all non-clustered. Would this have any ramifications on the deletion speed?
Help would greatly appreciated.
The OLE DB Command Task will execute a Sql Command for every row in an ssis dataset.
This is typically not what you would want to do for wholesale deletes as what this means is that your Sql Command is running, possibly thousands of times deleting one row each time.
You can check this by spinning up Profiler and you will most likely see thousands of Delete actions.
This is probably why the Delete is slow.
You could try importing the data from the flat file into a scratch/temp. table in the database and then perform a batch delete via JOINS/MERGE or some such.
March 15, 2013 at 5:59 am
Just to clarify I meant the OLE DB Sql Command is run for every qualifying row not every row
I would add that the times I have seen the OLE DB Command used is normally to simulate some kind of MERGE. My experience has been it is faster to do that kind of thing in Pure Sql, if you are able to do so.
March 15, 2013 at 6:36 am
Have you tried removing the index hint, just to guage any difference in performance?
What I would do, as suggested, is move all csv rows into a staging table, perform DELETE FROM ... JOIN function to remove matching rows. SQL is much better working on sets of data.
'Only he who wanders finds new paths'
March 15, 2013 at 7:28 am
OTF (3/15/2013)
The OLE DB Command Task will execute a Sql Command for every row in an ssis dataset.
...
This is probably why the Delete is slow.
Not probably. It's with great certainty you can say the OLE DB command is the reason for the slow performance.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 19, 2013 at 11:09 pm
Thank you for your replies. The information was right on.
I created a scratch table in the database (orders_del) that is just being used to hold the order#'s to be deleted. (I do a truncate on the table first thing.)
Then the Execute SQL Command Taks does the delete:
delete FROM dbo.orders WHERE exists
(SELECT * FROM dbo.orders_del AS tmp
WHERE tmp.cono = dbo.orders.cono
AND tmp.orderno = dbo.orders.orderno
AND tmp.ordersuf = dbo.orders.ordersuf)
This reduced the time from 3 hours to 11 minutes!
March 20, 2013 at 8:06 am
mberman (3/19/2013)
Thank you for your replies. The information was right on.I created a scratch table in the database (orders_del) that is just being used to hold the order#'s to be deleted. (I do a truncate on the table first thing.)
Then the Execute SQL Command Taks does the delete:
delete FROM dbo.orders WHERE exists
(SELECT * FROM dbo.orders_del AS tmp
WHERE tmp.cono = dbo.orders.cono
AND tmp.orderno = dbo.orders.orderno
AND tmp.ordersuf = dbo.orders.ordersuf)
This reduced the time from 3 hours to 11 minutes!
I guess Koen was right... a dead certainty that it was the issue 🙂
Glad you are getting better performance out of it now.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply