Eliminating Duplicate Records with SQL Code vs. DTS Options

  • I need to load a table from a text data file using a DTS Package. I do not want to load records from the text file that are already in the table (based on a primary key of ReportingAccount, TradeDate, TransactionReferenceNumber). I do not want to use the DTS Package Exception File and Max Error Count options, because I want loading to continue when there are duplicate records, but I still want the DTS Package to stop on any other exception (anything other than a duplicate key).

    This is the DTS method I have used to resolve this problem. For my first DTS task, I load the text file (as is) into a temp table. For the second DTS task, I execute the SQL code below to pull the records from the temp table which are not already in the permanent production table (MikeHistory) and load them into it. The temp table and the permanent production table have the same structure.

    SELECT M1.*
    FROM MikeHistory_Temp M1
    WHERE M1.AsOfDate = '07/29/2005'
      AND NOT EXISTS (
            SELECT M2.*
            FROM MikeHistory M2
            WHERE M2.ReportingAccount = M1.ReportingAccount
              AND M2.TradeDate = M1.TradeDate
              AND M2.TransactionReferenceNumber = M1.TransactionReferenceNumber
              AND M2.AsOfDate = '07/29/2005'
          )
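
    For clarity, when the second task is run as an Execute SQL task, the select above gets wrapped in an insert, roughly like this (assuming both tables have identical column lists so that SELECT * lines up column for column):

    INSERT INTO MikeHistory
    SELECT M1.*
    FROM MikeHistory_Temp M1
    WHERE M1.AsOfDate = '07/29/2005'
      AND NOT EXISTS (
            SELECT 1
            FROM MikeHistory M2
            WHERE M2.ReportingAccount = M1.ReportingAccount
              AND M2.TradeDate = M1.TradeDate
              AND M2.TransactionReferenceNumber = M1.TransactionReferenceNumber
              AND M2.AsOfDate = '07/29/2005'
          )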

    Please let me know if this is the best way to handle this type of problem. Also, please provide other ways to handle this problem.

    BTW, I may have another issue where I need to pull or read just one record from the temp table when there are duplicate records present in the temp table (which is loaded from the text file). Is there an easy way to accomplish this?

    Thanks in advance, Kevin (We have SQL Server 2000, SP3)

  • I don't know if it would be any faster (depends on resulting execution plans, table sizes, etc), but what I've done in the past is just set the max error count to a value greater than what you'd expect in terms of duplicates.

    - Rick

  • I do not think this would work because I still want the DTS Package to stop executing if there are errors other than duplicate records.

  • I don't think you can do this simply in DTS; I've had a similar function to perform and had to use a stored procedure.

    I loaded the data into a temp table, then ran the procedure, which uses a cursor to scan the temp table and insert records into the "main" table.

    The reason I had to do this was that while the input file had fixed-width records, it had 3 record "types" with wildly different layouts (and lengths!), so I had to process it in a "custom" way. I loaded it into SQL Server as a table with a single variable-width column, because I had to use variable width to actually load it. Then I used SUBSTRING to "chop up" the fields.

    What you would need to do is expand on this: add a second cursor (or lookup) to reference the main table and check whether the record you are about to insert already exists, and if it does, skip the insert (see the sketch after the procedure below).

    the procedure was:

    create proc proc_input as
    declare @cdate datetime
    declare @ccount smallint
    -- cursor for reading in the input table
    declare c_input cursor for select * from input_table
    -- buffer to store the current input record
    declare @cbuf varchar(255)
    -- datetime "point" stamped on every inserted row
    set @cdate = getdate()
    -- init counters
    set @ccount = 0
    open c_input
    fetch next from c_input into @cbuf
    while @@fetch_status = 0
    begin
        -- only process the 'T' record type; other record types are skipped here
        if (substring(@cbuf,1,1) = 'T')
        begin
            insert into main (main_datetime,
                              main_original_pay_date,
                              main_original_method,
                              main_original_amount,
                              main_reference)
            values (@cdate,
                    substring(@cbuf,55,8),
                    substring(@cbuf,63,2),
                    substring(@cbuf,34,9) + '.' + substring(@cbuf,43,2),
                    substring(@cbuf,19,15))
            set @ccount = @ccount + 1
        end
        fetch next from c_input into @cbuf
    end
    close c_input
    deallocate c_input
    go
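
    For the duplicate check, a simple IF NOT EXISTS inside the loop does the same job as a second cursor. A rough sketch against Kevin's table and key columns (@acct, @tdate and @refno are hypothetical variables the loop would fill from the buffer with SUBSTRING):

    -- sketch only: declare @acct varchar(20), @tdate datetime, @refno varchar(30)
    -- and fill them from @cbuf before this point
    if not exists (select 1
                   from MikeHistory
                   where ReportingAccount = @acct
                     and TradeDate = @tdate
                     and TransactionReferenceNumber = @refno)
    begin
        insert into MikeHistory (ReportingAccount, TradeDate, TransactionReferenceNumber)
        values (@acct, @tdate, @refno)   -- plus the rest of the row's columns
    end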

  • BTW, I may have another issue where I need to pull or read just one record from the temp table when there are duplicate records present in the temp table (which is loaded from the text file). Is there an easy way to accomplish this?
     
    You can do this:
     
    1) Using DTS/Bulk Insert/BCP, load the text file into the temp table (set up an auto-incrementing IDENTITY(1,1) column on it -> this will help in step #2).
    2) Run a SQL to eliminate any dupes from it.
    3) Run the SQL to load the records into the main table.
     
    Steps 1) and 3) you already know.  For #2, you can use a piece of code like this:
     
    DELETE FROM MikeHistory_Temp
    WHERE EXISTS (SELECT 1 FROM MikeHistory_Temp DUPSINNER
                  WHERE DUPSINNER.ReportingAccount = MikeHistory_Temp.ReportingAccount
                    AND DUPSINNER.TradeDate = MikeHistory_Temp.TradeDate
                    AND DUPSINNER.TransactionReferenceNumber = MikeHistory_Temp.TransactionReferenceNumber
                    AND DUPSINNER.ROWNUM < MikeHistory_Temp.ROWNUM)

    Here, ROWNUM is the auto-incrementing identity column, and the join is on the columns that uniquely identify a record in your table.
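
    The temp table definition might then look something like this (a sketch only; the data types are assumptions, and ROWNUM is the identity column described above):

    CREATE TABLE MikeHistory_Temp (
        ROWNUM                     int IDENTITY(1,1) NOT NULL,  -- load-order surrogate
        ReportingAccount           varchar(20)  NULL,           -- assumed type
        TradeDate                  datetime     NULL,           -- assumed type
        TransactionReferenceNumber varchar(30)  NULL,           -- assumed type
        AsOfDate                   datetime     NULL            -- assumed type
        -- ...plus the remaining columns from the text file
    )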
     
    If there are lots of records in the temp table, then you can use SET ROWCOUNT with a WHILE loop to DELETE and COMMIT in chunks (see the sketch below).  If the temp table is going to have data from previous runs as well, then include the filter criteria (like the date criteria that you have shown in your example) and make sure that column is indexed.
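
    A sketch of that chunked delete (the 10,000-row batch size is arbitrary):

    DECLARE @rows int
    SET @rows = 1
    SET ROWCOUNT 10000                    -- delete at most 10,000 rows per pass
    WHILE @rows > 0
    BEGIN
        BEGIN TRAN
        DELETE FROM MikeHistory_Temp
        WHERE EXISTS (SELECT 1 FROM MikeHistory_Temp DUPSINNER
                      WHERE DUPSINNER.ReportingAccount = MikeHistory_Temp.ReportingAccount
                        AND DUPSINNER.TradeDate = MikeHistory_Temp.TradeDate
                        AND DUPSINNER.TransactionReferenceNumber = MikeHistory_Temp.TransactionReferenceNumber
                        AND DUPSINNER.ROWNUM < MikeHistory_Temp.ROWNUM)
        SET @rows = @@ROWCOUNT            -- capture immediately after the DELETE
        COMMIT TRAN
    END
    SET ROWCOUNT 0                        -- back to unlimited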
     
    Also, for #3, since you are querying on a date field, if you have a lot of records in this temp table, make sure you index that field for faster performance. 
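
    For example (the index name here is arbitrary):

    CREATE NONCLUSTERED INDEX IX_MikeHistory_Temp_AsOfDate
        ON MikeHistory_Temp (AsOfDate)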
     
  • Adam, I think my post the other day was similar to what you encountered .... trying to separate different text records to be imported into SQL.  Richard Kemp responded and suggested using:

    FIND.EXE "TextString" extract_filename.txt >just_the_detail.txt 

    to extract specific records from one text file into another, then that "clean" text file can more easily be imported via DTS.   It was a nice clean solution.
