capturing duplicate rows in bulk inserts

  • I'm building a process that imports rows from a .csv file. the bulk of this data should be new, but there may be instances where a few of them would be updates, so they would already exist in the destination table.  I can uniquely identify them from a reference number & date combination.  In this instance I would want to put the identified rows into a temporary holding table, then as part of my import process, I would revisit the duplicates and use them to perform an update. My process reads blocks of data from the csv file (batches of 1000) and puts them into a staging table.  I then have a stored procedure that inserts these roles from the staging table into the live table.  This is performed inside a transaction.  So I may have 1000 rows as part of my insert, 10 of these may fail because of a key violation (in which case they will be updates).  At the moment, if this process fails on 10 rows, the entire transaction is rolled back and im left with the original 1000 rows in the staging table.  Is it possible to just somehow pick out the 10 that have failed and allow the others to be inserted ? Or is there some other way i can identify these specific rows ?

  • Sounds like your process moving data from your staging table to your production table is just inserting all the data, it's not doing any checks for the existance. You'll need to add a WHERE clause to check for the existence of the key in your production, and ignore those records. You may want to even check for those first, as if you do it later 9after the data has been inserted), all the keys will be found. In VERY simple (and untested) terms:
    --Find the rows to UPDATE and store them elsewhere.
    INSERT INTO UpdateTable
    SELECT *
    FROM StagingTable ST
    WHERE ST.YourKeyColumn IN (SELECT PT.YourKeyColumn
                               FROM ProductionTable);

    --INSERT the new rows.
    INSERT INTO ProductionTable
    SELECT *
    FROM StagingTable ST
    WHERE ST.YourKeyColumn NOT IN (SELECT PT.YourKeyColumn
                                   FROM ProductionTable);

    --Do your updates
    UPDATE ProductionTable
    SET YourColumn1 = UT.YourColumn1,
        YourColumn2 = UT.YourColumn2,
        YourColumn3 = UT.YourColumn3,
        ...
    FROM UpdateTable UT
    WHERE UT.YourKeyColumn = ProductionTable.YourKeyColumn;

    --Clean out the staging Tables
    TRUNCATE UpdateTAble;
    TRUNCATE Stagingtable;
    GO

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • post deleted...  misread the original post

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply