February 7, 2006 at 3:44 pm
I have a table Claimdata with about 19 million rows. This table is imported weekly from SAS. The import takes about an hour using the Transform Data Task, and then the following update is run from an Execute SQL Task, which takes 4-5 hours to complete.
UPDATE Claimdata
SET Claimdata.LagCategory = FinancialCategories.LagCategory
FROM Claimdata
INNER JOIN FinancialCategories
ON FinancialCategories.ClaimType = Claimdata.ClaimType
Here’s the FinancialCategories DDL
CREATE TABLE [FinancialCategories] (
    ClaimType varchar(50),
    LagCategory varchar(50),
    FinancialCategory varchar(50)
) ON [PRIMARY]
GO
There are no indexes on any of the fields involved in the query. Does anyone have any suggestions on how I could reduce the time it takes to UPDATE this field? I really appreciate any advice on this. Thanks !
February 7, 2006 at 4:01 pm
also, there are only about 15 rows in FinancialCategories. Maybe I should just put it in a function?
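For illustration only, such a function might look like this (the fn_GetLagCategory name is made up; bear in mind a scalar function would be called once per row, so against 19 million rows it may well be slower than the set-based join):
CREATE FUNCTION dbo.fn_GetLagCategory (@ClaimType varchar(50))
RETURNS varchar(50)
AS
BEGIN
    -- Look up the lag category for one claim type
    DECLARE @LagCategory varchar(50)
    SELECT @LagCategory = LagCategory
    FROM FinancialCategories
    WHERE ClaimType = @ClaimType
    RETURN @LagCategory
END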
February 7, 2006 at 4:27 pm
1. Since there is no clustered index, the table may be a highly fragmented heap, requiring more disk I/O than necessary
2. If the database is running in full recovery mode, the UPDATE will be generating large amounts of log activity
3. If the column Claimdata.LagCategory is initially loaded as NULL, and then set to a value, this prevents "Update in place", so each row UPDATE will actually be implemented as a DELETE followed by an INSERT behind the scenes. If #2 is true, then the combination of #2 & #3 is a real performance killer
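Both of the first two points are easy to check (assuming SQL Server 2000, which the DTS tasks suggest):
-- Point 1: show fragmentation on the heap
DBCC SHOWCONTIG ('Claimdata')
-- Point 2: show the current recovery model
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Recovery')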
February 7, 2006 at 4:50 pm
Thanks for the reply PW.
1.) Do you mean a clustered index on FinancialCategories or Claimdata?
2.) It is running in Simple recovery mode.
3.) Interesting, there is no transformation mapped to that field, so you're right, it probably is putting NULL into that column. How can I get around this?
Thanks !
February 8, 2006 at 6:54 am
You could try mapping a transformation from any field into Claimdata.LagCategory, choose ActiveX script, and have it write a fixed literal as long as the longest value that will ever be placed in this field. Or you could use a DEFAULT on the column definition.
If you put a DEFAULT on, I would try to do it while Claimdata is empty (if possible).
If LagCategory will not change, then you could look it up through your join on FinancialCategories.
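A sketch of the DEFAULT approach, assuming the Transform Data Task leaves LagCategory out of its column list so the default actually fires (the constraint name and padding literal are made up; make the literal as wide as the widest real LagCategory so the later update does not have to grow the row):
ALTER TABLE Claimdata
    ADD CONSTRAINT DF_Claimdata_LagCategory
    DEFAULT 'XXXXXXXXXXXXXXXXXXXX' FOR LagCategory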
February 8, 2006 at 8:10 am
I would recommend starting by setting up an index on ClaimType on both tables. Depending on the data you get, it could be a clustered index. Having said that, if there is a date field on the Claimdata table, I would use that for the clustered index in combination with ClaimType and break the update up by date range.
The general rule when building an index is that your goal is to decrease the number of records SQL has to search through, ideally jumping straight to the record it needs. Breaking the update statement up by date will help too.
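For example (the index names are illustrative, and ClaimDate is only a guess at whatever date field Claimdata has):
CREATE CLUSTERED INDEX IX_Claimdata_ClaimDate_ClaimType
    ON Claimdata (ClaimDate, ClaimType)
CREATE INDEX IX_FinancialCategories_ClaimType
    ON FinancialCategories (ClaimType)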
Hope this helps.
Vadim.
February 8, 2006 at 1:28 pm
A clustered index on the large table should help the most, but any index on the referenced columns should give you better performance.
Also, adding this to the beginning
SET ROWCOUNT 100000
and this to the end
WHERE Claimdata.LagCategory IS NULL OR Claimdata.LagCategory != FinancialCategories.LagCategory
then either wrap it in a WHILE loop, or use SQL Agent to run 10 or 20 updates with no loop.
This should keep each transaction small and limit the amount of work per batch nicely. You should change the rowcount to fit your environment.
If you want it automated, I'd recommend the loop, once you've sized the rowcount accordingly.
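Put together, a sketch of the batched version, assuming SQL Server 2000 where SET ROWCOUNT still limits UPDATE statements (the IS NULL test matters because the freshly loaded NULLs never compare unequal to anything, so != alone would skip them):
SET ROWCOUNT 100000
WHILE 1 = 1
BEGIN
    UPDATE Claimdata
    SET Claimdata.LagCategory = FinancialCategories.LagCategory
    FROM Claimdata
    INNER JOIN FinancialCategories
    ON FinancialCategories.ClaimType = Claimdata.ClaimType
    WHERE Claimdata.LagCategory IS NULL
       OR Claimdata.LagCategory != FinancialCategories.LagCategory
    -- Stop when a pass finds nothing left to change
    IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0   -- always reset ROWCOUNT afterwards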
February 8, 2006 at 4:20 pm
What about not running the update at all?
What is the cost of modifying the views that reference the data, so that the join above is used there instead?
Not running your update would also leave your data normalised to some extent.
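For illustration only, something along these lines would give readers the LagCategory at query time without ever materialising it in Claimdata (the view name and column list are made up):
CREATE VIEW dbo.vClaimdataWithLagCategory
AS
SELECT c.ClaimType,
       -- ...other Claimdata columns as needed...
       f.LagCategory
FROM Claimdata AS c
INNER JOIN FinancialCategories AS f
    ON f.ClaimType = c.ClaimType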
February 9, 2006 at 11:51 am
Thanks for all the replies! The clustered index definitely speeds up the update, but it takes a long time to add the index to the table (all indexes are dropped before the load).
What about using a Lookup in the Transform Data Task? I've never done one before, so 1.) do you think this would be a good idea? and 2.) would this be the right way to go about doing it? Thanks!
Using the Lookup Query below:
SELECT FinancialCategories.LagCategory
FROM FinancialCategories INNER JOIN CLAIMDATASET ON FinancialCategories.ClaimType = CLAIMDATASET.ClaimType
WHERE (CLAIMDATASET.ClaimType = ?)
Function Main()
DTSDestination("LagCategory") = DTSLookups("lkp_LagCategory").Execute(DTSSource("ClaimType"))
Main = DTSTransformStat_OK
End Function
February 9, 2006 at 12:01 pm
Have you considered leaving the Clustered index in place and just keeping it at a low fill factor/Padding?
This way your data should load almost as fast, but you won't have to rebuild any indexes, just add the nonclustered indexes that support your update. When that is done, add the rest back.
19 million rows isn't very much. Maybe you need to look at your disk drive configuration and available memory too. Adding a one- or two-column clustered index to a standard table with 19 million rows should take 10-15 minutes max.
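In SQL 2000 syntax that would be something like this (the key column and index name are illustrative):
CREATE CLUSTERED INDEX IX_Claimdata_ClaimType
    ON Claimdata (ClaimType)
    WITH PAD_INDEX, FILLFACTOR = 70
A fill factor of around 70 leaves roughly 30% free space per page for the weekly load; tune it to how much the table actually grows.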
February 9, 2006 at 12:03 pm
Lookups can be useful if you need to look something up on a different data source.
With 2 tables in the same data source, a lookup will nearly always be slower. You can always benchmark it, just remember to set the cache size large enough.
>>
SELECT FinancialCategories.LagCategory
FROM FinancialCategories INNER JOIN CLAIMDATASET ON FinancialCategories.ClaimType = CLAIMDATASET.ClaimType
WHERE (CLAIMDATASET.ClaimType = ?)
This looks like different functionality than the original UPDATE.
Where did table CLAIMDATASET enter the picture, and what size is it ?
February 10, 2006 at 6:57 am
A couple of things that we have done with similar issues are:
1. Use WITH (TABLOCKX) to avoid the use of multiple page locks (if this is a problem you will see screenfuls of page locks building up in tempdb).
2. If possible, split the update into several updates - for example, write a cursor to pick up each category, then update the table for each category (see the sketch below). The aim is to split the 19 million rows into 19 x 1 million. This can work because you're less likely to be over-stretching the memory.
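A sketch of that per-category approach (the cursor and variable names are illustrative):
DECLARE @ClaimType varchar(50), @LagCategory varchar(50)
DECLARE category_cursor CURSOR FOR
    SELECT ClaimType, LagCategory FROM FinancialCategories
OPEN category_cursor
FETCH NEXT FROM category_cursor INTO @ClaimType, @LagCategory
WHILE @@FETCH_STATUS = 0
BEGIN
    -- One smaller update per category, with the table lock hint from point 1
    UPDATE Claimdata WITH (TABLOCKX)
    SET LagCategory = @LagCategory
    WHERE ClaimType = @ClaimType
    FETCH NEXT FROM category_cursor INTO @ClaimType, @LagCategory
END
CLOSE category_cursor
DEALLOCATE category_cursor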
February 11, 2006 at 11:22 pm
You could also try avoiding the update altogether by importing your SAS data into a staging table, then having a second transformation that inserts the joined data into Claimdata, i.e. with transformation source SQL such as:
SELECT StagingClaimdata.<attributes>, FinancialCategories.LagCategory
FROM StagingClaimdata
INNER JOIN FinancialCategories
ON FinancialCategories.ClaimType = StagingClaimdata.ClaimType
If the import from SAS is taking about an hour, this should take much the same time.