Table population way too slow!

  • Hello,

    I am having great difficulty in getting my application to run at an acceptable speed.

    I have been given a Text file containing 675,000 Order records that I need to load into a central Order Table currently holding 200,000 records. I have one clustered index on this table based on two columns.

    Using the wizards I have obtained some VB DTS code to get the text file into a Working Table. I am then executing a select top 1 from this Table to obtain a row of data.

Once I have the data in an ADO recordset I perform an exists check on the Order Table, then perform some calculations. I then either insert the new Order, or delete the current one and replace it. Once processed, the Order is deleted from the working table and I retrieve the next.
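    Roughly speaking, each cycle issues something like the following against the server (the key and column names here are simplified for illustration - the real tables have more columns):

    select top 1 * from WorkingTable                              -- fetch one row into the ADO recordset
    select 1 from OrderTable where Key1 = ? and Key2 = ?          -- exists check
    insert into OrderTable (Key1, Key2, ...) values (?, ?, ...)   -- new Order
    -- or, if it already exists:
    delete from OrderTable where Key1 = ? and Key2 = ?
    insert into OrderTable (Key1, Key2, ...) values (?, ?, ...)
    delete from WorkingTable where Key1 = ? and Key2 = ?          -- remove the processed row

    So that is four or five separate round trips from ADO for every one of the 675,000 rows.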

    It's not too complicated. However, performance is dire. I have validated that all Inserts, Updates and Deletes are using the correct index columns.

    However, for a load of 675,000 records, my timer code (which is accurate for smaller loads) estimates that the processing will take 160 hours. That's nearly a week!

    Being a newbie at SQL performance tracking, I was wondering if you guys could offer me some suggestions as to what could be hindering things and where to look for improvements / how to trace whatever is slowing things down.

    Any help will be much appreciated as I have been struggling with this for days and my deadline has passed.

  • Hi there,

    It seems you are doing a lot of work to achieve your goal - possibly two deletes and one update/insert per row. Aren't there any memory-based lookups in DTS to validate existing rows in the target?

    Handling updates one row at a time with a cursor is usually very inefficient. If possible, you should try to take advantage of joined updates/deletes.

    Suppose TmpOrder is the table where you load the data from the file (hopefully a bulk load) and Order is the target with PK OrderNo:

    1. Update the existing rows in the target + calculations

    Update A
    set col1 = B.col1, col2 = B.col2 * 1.5, ....
    From Order A, TmpOrder B
    Where A.OrderNo = B.OrderNo

    2. Delete the matching rows from TmpOrder

    Delete TmpOrder
    From TmpOrder, Order
    Where TmpOrder.OrderNo = Order.OrderNo

    3. Insert the new rows with calculations

    Insert into Order (col1, col2,....) Select col1, col2*1.5... From TmpOrder

    4. truncate TmpOrder
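    That last step is a single statement (truncate just deallocates the pages rather than logging every row, so it is effectively instant compared to deleting):

    Truncate Table TmpOrder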

    Ville

    It's slow because you're pulling one row at a time (you have to sift through the working table 675k times), and then (I think) pulling the data into an external program (the ADO recordset?) from which you subsequently issue the insert/update/delete statements?

    It can go much faster if you process all the data on the SQL Server, with no external program involved.

    In the right circumstances, Ville's approach is the fastest way. But it depends on how big the data is, what your hardware is, and whether you have down-time (nobody accessing Orders) in which to do it. You may have too much data for a single transaction (for your configuration), in which case this approach will never finish. Also, you need a long enough downtime window to complete the big operation (it's going to block anyone who attempts to use the Orders table, and you probably don't want users accessing Orders in between the delete & the insert).
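    If you still want the set-based route but can't afford one huge transaction, a common compromise is to do it in batches so that no single transaction gets too big. Here's a rough sketch for the insert side only (key/column names assumed from the posts above, and the 10,000 batch size is arbitrary):

    declare @rows int
    set @rows = 1
    set rowcount 10000                      -- limit each pass to 10,000 rows
    while @rows > 0
    begin
            insert into Orders (key1, key2, col1, ...)
            select  w.key1, w.key2, w.col1, ...
            from    WorkTable w
            where   not exists (select 1 from Orders o
                                 where o.key1 = w.key1
                                   and o.key2 = w.key2)
            set @rows = @@rowcount          -- 0 when nothing is left to insert
    end
    set rowcount 0                          -- back to normal

    Each pass commits on its own, so no single transaction covers more than 10,000 rows, and if it's interrupted you can simply re-run it. A batched update is trickier, because updated rows still match the join, so you'd need some way to flag rows as already processed.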

    I don't have downtime, and big transactions frequently do not finish when I try them. So what I would do is write a server-side cursor to traverse the worktable. Within the cursor loop you check Orders to see whether an insert or an update should be done, do the insert or update, then check for errors. This will definitely work, won't block users, and can be implemented in production at any time. Also, as Ville's last step suggests, there's no need to delete each processed row from the WorkTable - just truncate it at the end.

    declare @err int, @n int, @key1 char(3), @key2 char(3), ...

    set @n = 0

    declare c1 cursor for
            select key1, key2, ... from WorkTable
    for read only

    open c1

    while (1 = 1)
    begin
            fetch c1 into @key1, @key2, ...

            if @@fetch_status = -1
                    break

            set @n = @n + 1

            --status msg every 1k rows
            if (@n % 1000) = 0
                    raiserror('processing row %d ...', 0, 1, @n) with nowait

            --insert or update
            if exists(select 1 from Orders where key1 = @key1 and key2 = @key2)
                    update  Orders
                       set  ...
                     where  key1 = @key1
                       and  key2 = @key2
            else
                    insert  Orders values (...)

            --on error just show msg & continue processing
            select @err = @@error
            if @err <> 0
                    raiserror('error %d processing key1=%s, key2=%s', 0, 1, @err, @key1, @key2) with nowait
    end

    close c1
    deallocate c1

    print 'done.'

    This should easily be orders of magnitude faster than 160 hours.

     
