July 14, 2008 at 7:35 am
Hi,
we have an ETL process that runs every night, and we get data (text files) of about 4 million rows. Only about 300 thousand of those rows have been updated, so we have to run a delete statement for the rest of the rows, which takes a while to execute. Does anyone know another way around this? We thought of cursors, but the overhead for cursors is high as well. A temp table has the same problem: tempdb would fill up and we'd have the same performance issue.
Thanks,
FK
July 14, 2008 at 7:45 am
Cursors = really, really bad idea
Could you describe what happens in a little more detail please?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 14, 2008 at 7:49 am
Agreed, cursors are bad and we are staying away from them!!
The current delete statement takes about 872729 ms (roughly 14.5 minutes), which is kind of high. What the delete does is remove the unchanged data (3.5 million rows) and keep the changed ones (.5 million rows).
July 14, 2008 at 7:58 am
Could you describe the ETL process a bit more please?
You may find it more efficient to copy the rows you want to keep to a temp table, truncate the original table and copy the rows back.
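Very roughly, something like the sketch below. The table, column, and flag names are only placeholders, and it assumes nothing references the table with a foreign key (TRUNCATE won't run against a table that is referenced by one) and that there's no identity column to preserve.

SELECT *
INTO   #keep
FROM   dbo.BigTable              -- placeholder table name
WHERE  ChangedFlag = 'Y';        -- placeholder condition: the rows you want to keep

TRUNCATE TABLE dbo.BigTable;     -- minimally logged, far cheaper than the big DELETE

INSERT INTO dbo.BigTable
SELECT * FROM #keep;

DROP TABLE #keep;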
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 14, 2008 at 8:06 am
Well, I think that should work. Thanks.
Our ETL process does the following:
We have mainframe files that get loaded at night. The ETL loads each file into a table, and once it's there we use a slowly changing dimension to check various fields for whether they were updated or what needs to be updated. For the rows we don't want to do anything with, we delete based on a flag column, and then continue loading the rest into the appropriate fact tables. That delete step was taking a while; we created a non-clustered index on the flag column, but it still didn't help. I will try creating a temp table, or even a regular table, and doing the truncate. Makes sense. Thanks.
July 14, 2008 at 10:58 am
Which column did you create the index on? How big is the table? If the table is big and there are many indexes on it, the delete will take a while.
Did you mention whether you used a cursor, or just the 'DELETE' statement?
DELETE x
FROM Tablex x
INNER JOIN temptable t ON x.col1 = t.col1
WHERE x.Delcol = 'Y'
July 14, 2008 at 11:03 am
The table is about 4 million rows, very big, but the temp table idea seems to be working. Thanks.
July 14, 2008 at 11:47 pm
Generally, if you're deleting up to half the table (maybe even more), it's faster to save the rows you want and then truncate the table than it is to run the delete.
Less impact on the tran log too, since truncate is a minimally logged operation.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 15, 2008 at 3:09 pm
I was on a project where we had the same issue, but with Excel. What we did was move all the SQL validation code and operations we had in the ETL into stored procedures, and we cut the run time by almost 80%. It worked. Also, if the file contains information that isn't needed, it can be deleted before doing any operations.
July 16, 2008 at 7:12 am
The truncate solution seems best, but if you really get stuck having to delete millions of rows, one technique that has worked for me is setting a rowcount of something like 50000 or 100000 and looping through the delete statement until the row count comes back as 0. It keeps the tran log manageable. Each delete statement usually takes a few seconds, and even if you loop through 30-40 times, the total time may only be a few minutes.
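For example, something along these lines. The table and column names are just borrowed from the earlier example in this thread, so treat them as placeholders; on SQL 2005 and later, DELETE TOP (n) gives the same batching effect as SET ROWCOUNT.

DECLARE @batch INT;
SET @batch = 50000;              -- rows per batch

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM dbo.Tablex              -- placeholder table
    WHERE Delcol = 'Y';          -- rows flagged for removal

    IF @@ROWCOUNT = 0
        BREAK;                   -- nothing left to delete, stop looping
END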
July 16, 2008 at 7:54 am
Be aware that the SCD transform in SSIS generally does not scale well on large data sets.