May 28, 2004 at 6:22 am
Hello,
I am having great difficulty in getting my application to run at an acceptable speed.
I have been given a text file containing 675,000 order records that I need to load into a central Order table currently holding 200,000 records. The table has one clustered index based on two columns.
Using the wizards, I have generated some VB DTS code to load the text file into a working table. I then execute a SELECT TOP 1 against this table to obtain a row of data.
Once I have the data in an ADO recordset, I perform an exists check against the Order table and then perform some calculations. I then either insert the new order, or delete the current one and replace it. Once processed, the order is deleted from the working table and I retrieve the next row.
It's not too complicated. However, performance is dire. I have verified that all inserts, updates and deletes are using the correct index columns.
However, for a load of 675,000 records, my timer code (which is accurate for smaller loads) estimates that the processing will take 160 hours. That's nearly a week!
Being a newbie at SQL performance tuning, I was wondering if you guys could offer me some suggestions as to what could be hindering things, where to look for improvements, and how to trace whatever is slowing things down.
Any help will be much appreciated as I have been struggling with this for days and my deadline has passed.
May 28, 2004 at 11:54 am
Hi there,
it seems you are doing a lot of work to achieve your goal - possibly two deletes and one update/insert per row. Aren't there any memory-based lookups in DTS to validate existing rows in the target?
Handling updates one row at a time with a cursor is usually very inefficient. If possible, you should try to take advantage of joined updates/deletes.
Suppose TmpOrder is the table where you load the data from file (hopefully a bulk load) and Order is the target with PK OrderNo:
1. Update the existing rows in the target + calculations
Update A
set col1=B.col1, col2=B.col2*1.5, ...
From [Order] A, TmpOrder B
Where A.OrderNo=B.OrderNo
2. Delete matching rows from tmp
Delete TmpOrder
From TmpOrder, [Order]
Where TmpOrder.OrderNo=[Order].OrderNo
3. Insert the new rows with calculations
Insert into [Order] (col1, col2, ...) Select col1, col2*1.5, ... From TmpOrder
4. truncate TmpOrder
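If the target needs to stay consistent while the steps run, the whole sequence can be wrapped in one transaction. A sketch under the assumptions above (OrderNo is the key, col1/col2 are placeholder columns; [Order] is bracketed because ORDER is a reserved word in T-SQL):

```sql
begin transaction

    -- 1. update rows that already exist in the target, with calculations
    update A
    set col1 = B.col1, col2 = B.col2 * 1.5
    from [Order] A, TmpOrder B
    where A.OrderNo = B.OrderNo

    -- 2. remove the matched rows from the staging table
    delete TmpOrder
    from TmpOrder, [Order]
    where TmpOrder.OrderNo = [Order].OrderNo

    -- 3. whatever is left in TmpOrder is new: insert it
    insert into [Order] (OrderNo, col1, col2)
    select OrderNo, col1, col2 * 1.5
    from TmpOrder

    -- 4. empty the staging table for the next load
    truncate table TmpOrder

commit transaction
```

Note this holds locks on [Order] for the duration of all four steps, which is exactly the down-time trade-off discussed below.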
Ville
June 1, 2004 at 1:25 am
It's slow because you're pulling one row at a time (you need to sift through the working table 675k times), and then (I think) pulling the data into an external program (the ADO recordset?) from which you subsequently issue insert/update/delete statements?
It can go much faster if you process all the data on the SQL Server, with no external program.
In the right circumstances, Ville's approach is the fastest way. But it depends on how big the data is, what your hardware is, and whether you have down-time (nobody accessing Orders) in which to do it. You may have too much data for a single transaction (for your configuration), in which case this approach will never finish. Also, you need a long enough window to complete the big operation (it's going to block anyone who attempts to use the Orders table, and you probably don't want users accessing Orders in between the delete and the insert).
I don't have downtime, and big transactions frequently do not finish when I try them. So what I would do is write a server-side cursor to traverse the worktable. Within the cursor loop you check Orders to see if insert or update should be done, do the insert or update, then check for error. This will definitely work and won't block users, and can be implemented in production at any time. Also, as Ville suggests, there's no need to delete from WorkTable.
declare @err int, @n int, @key1 char(3), @key2 char(3), ...
set @n = 0
declare c1 cursor for
select key1, key2, ... from WorkTable
for read only
open c1
while (1 = 1)
begin
    fetch c1 into @key1, @key2, ...
    if @@fetch_status = -1
        break
    set @n = @n + 1
    --status msg every 1k rows
    if (@n % 1000) = 0
        raiserror('processing row %d ...', 0, 1, @n) with nowait
    --insert or update
    if exists(select 1 from Orders where key1 = @key1 and key2 = @key2)
        update Orders
        set ...
        where key1 = @key1
        and key2 = @key2
    else
        insert Orders values (...)
    --on error just show msg & continue processing
    select @err = @@error
    if @err <> 0
        raiserror('error %d processing key1=%s, key2=%s', 0, 1, @err, @key1, @key2) with nowait
end
close c1
deallocate c1
print 'done.'
This will definitely be a few orders of magnitude faster than 160 hours.
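A middle ground between the one big transaction and the row-at-a-time cursor is to upsert in batches, so no single transaction grows too large and blocking stays short. A sketch only, assuming hypothetical columns key1, key2, col1 and that (key1, key2) uniquely identifies a row in WorkTable; SET ROWCOUNT is the SQL 2000-era way to limit a statement to N rows:

```sql
declare @rows int
create table #batch (key1 char(3), key2 char(3), col1 int)

while (1 = 1)
begin
    -- pull the next 10,000 unprocessed rows into the batch table
    set rowcount 10000
    insert #batch (key1, key2, col1)
    select key1, key2, col1 from WorkTable
    select @rows = @@rowcount   -- capture before SET resets @@rowcount
    set rowcount 0

    if @rows = 0
        break

    begin transaction

    -- update orders that already exist
    update O
    set O.col1 = B.col1
    from Orders O, #batch B
    where O.key1 = B.key1 and O.key2 = B.key2

    -- insert the genuinely new ones
    insert Orders (key1, key2, col1)
    select B.key1, B.key2, B.col1
    from #batch B
    where not exists (select 1 from Orders O
                      where O.key1 = B.key1 and O.key2 = B.key2)

    -- drop the processed rows from the worktable
    delete W
    from WorkTable W, #batch B
    where W.key1 = B.key1 and W.key2 = B.key2

    commit transaction
    truncate table #batch
end

drop table #batch
```

Each iteration commits on its own, so users are only ever blocked for the length of one 10,000-row batch, and a failure part-way through loses at most the current batch.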