Table population way too slow!

  • Hello,

    I am having great difficulty in getting my application to run at an acceptable speed.

    I have been given a Text file containing 675,000 Order records that I need to load into a central Order Table currently holding 200,000 records. I have one clustered index on this table based on two columns.

    Using the wizards I have obtained some VB DTS code to get the text file into a Working Table. I am then executing a select top 1 from this Table to obtain a row of data.

Once I have the data in an ADO recordset I perform an exists check on the Order Table, then perform some calculations. I then either insert the new Order, or delete the current one and replace it. Once processed, the Order is deleted from the working table and I retrieve the next.
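    Roughly speaking, each cycle issues something like the following against the server (the key and column names here are simplified for illustration - the real tables have more columns):

    select top 1 * from WorkingTable                              -- fetch one row into the ADO recordset
    select 1 from OrderTable where Key1 = ? and Key2 = ?          -- exists check
    insert into OrderTable (Key1, Key2, ...) values (?, ?, ...)   -- new Order
    -- or, if it already exists:
    delete from OrderTable where Key1 = ? and Key2 = ?
    insert into OrderTable (Key1, Key2, ...) values (?, ?, ...)
    delete from WorkingTable where Key1 = ? and Key2 = ?          -- remove the processed row

    So that is four or five separate round trips from ADO for every one of the 675,000 rows.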

    It's not too complicated. However, performance is dire. I have validated that all Inserts, Updates and Deletes are using the correct index columns.

    However, for a load of 675,000 records, my timer code (which is accurate for smaller loads) estimates that the processing will take 160 hours. That's nearly a week!

    Being a newbie at SQL performance tracking, I was wondering if you guys could offer me some suggestions as to what could be hindering things and where to look for improvements / how to trace whatever is slowing things down.

    Any help will be much appreciated as I have been struggling with this for days and my deadline has passed.

  • Hi there,

    It seems you are doing a lot of work to achieve your goal - possibly two deletes and one update/insert per row. Aren't there any memory-based lookups in DTS to validate existing rows in the target?

    Handling updates one row at a time with a cursor is usually very inefficient. If possible, you should try to take advantage of joined updates/deletes.

    Suppose TmpOrder is the table where you load the data from the file (hopefully a bulk load) and Order is the target with PK OrderNo:

    1. Update the existing rows in the target + calculations

    Update A
    set col1 = B.col1, col2 = B.col2 * 1.5, ....
    From Order A, TmpOrder B
    Where A.OrderNo = B.OrderNo

    2. Delete the matching rows from TmpOrder

    Delete TmpOrder
    From TmpOrder, Order
    Where TmpOrder.OrderNo = Order.OrderNo

    3. Insert the new rows with calculations

    Insert into Order (col1, col2,....) Select col1, col2*1.5... From TmpOrder

    4. truncate TmpOrder
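    That last step is a single statement (truncate just deallocates the pages rather than logging every row, so it is effectively instant compared to deleting):

    Truncate Table TmpOrder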

    Ville

    It's slow because you're pulling one row at a time (you have to sift through the working table 675k times), and then (I think) pulling the data into an external program (the ADO recordset?) from which you subsequently issue the insert/update/delete statements?

    It can go much faster if you process all the data on the SQL Server, with no external program involved.

    In the right circumstances, Ville's approach is the fastest way. But it depends on how big the data is, what your hardware is, and whether you have down-time (nobody accessing Orders) in which to do it. You may have too much data for a single transaction (for your configuration), in which case this approach will never finish. Also, you need a long enough downtime window to complete the big operation (it's going to block anyone who attempts to use the Orders table, and you probably don't want users accessing Orders in between the delete & the insert).
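    If you still want the set-based route but can't afford one huge transaction, a common compromise is to do it in batches so that no single transaction gets too big. Here's a rough sketch for the insert side only (key/column names assumed from the posts above, and the 10,000 batch size is arbitrary):

    declare @rows int
    set @rows = 1
    set rowcount 10000                      -- limit each pass to 10,000 rows
    while @rows > 0
    begin
            insert into Orders (key1, key2, col1, ...)
            select  w.key1, w.key2, w.col1, ...
            from    WorkTable w
            where   not exists (select 1 from Orders o
                                 where o.key1 = w.key1
                                   and o.key2 = w.key2)
            set @rows = @@rowcount          -- 0 when nothing is left to insert
    end
    set rowcount 0                          -- back to normal

    Each pass commits on its own, so no single transaction covers more than 10,000 rows, and if it's interrupted you can simply re-run it. A batched update is trickier, because updated rows still match the join, so you'd need some way to flag rows as already processed.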

    I don't have downtime, and big transactions frequently do not finish when I try them. So what I would do is write a server-side cursor to traverse the worktable. Within the cursor loop you check Orders to see whether an insert or an update should be done, do the insert or update, then check for errors. This will definitely work, won't block users, and can be implemented in production at any time. Also, as Ville's last step suggests, there's no need to delete each processed row from the WorkTable - just truncate it at the end.

    declare @err int, @n int, @key1 char(3), @key2 char(3), ...

    set @n = 0

    declare c1 cursor for
            select key1, key2, ... from WorkTable
    for read only

    open c1

    while (1 = 1)
    begin
            fetch c1 into @key1, @key2, ...

            if @@fetch_status = -1
                    break

            set @n = @n + 1

            --status msg every 1k rows
            if (@n % 1000) = 0
                    raiserror('processing row %d ...', 0, 1, @n) with nowait

            --insert or update
            if exists(select 1 from Orders where key1 = @key1 and key2 = @key2)
                    update  Orders
                       set  ...
                     where  key1 = @key1
                       and  key2 = @key2
            else
                    insert  Orders values (...)

            --on error just show msg & continue processing
            select @err = @@error
            if @err <> 0
                    raiserror('error %d processing key1=%s, key2=%s', 0, 1, @err, @key1, @key2) with nowait
    end

    close c1
    deallocate c1

    print 'done.'

    This should easily be orders of magnitude faster than 160 hours.

     
