Avoiding duplicates when stopping and restarting an SSIS package that uses batch inserts

  • Hi,

    I am using VS 2022 to build a package that selects and then inserts a large number of records into a table. The source table holds more than 1 billion rows. As this will take a long time to process and my maintenance window is only 2 hours, I have to stop and restart the package, but this causes duplicate records to be inserted into the target table. The SQL Server version is 2019.

    Is there a way I can avoid getting duplicates when stopping and restarting the package? Any help would be appreciated.

    Thanks

    Kindest Regards, RJ

  • Hi, I can think of these options:

    1. Can you use batch processing and checkpoint those batches? For example, wrap a batch of 10 million records in a single transaction, so it is all or nothing for that particular batch, and the checkpoint tells you where to start again (see the sketch below this list).
    2. Or, if I remember correctly, SSIS has built-in checkpoints to resume where you left off.
    3. Or use table partitioning, with a control table to record which partition you were writing to before the restart, and then start from there.
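
    For option 1, here is a rough, untested T-SQL sketch of what I mean (dbo.SourceTable, dbo.TargetTable and the ascending key SourceID are placeholder names, not your real schema). Each batch commits together with its checkpoint row, so a restart simply resumes from the last committed key with no duplicates:

        IF OBJECT_ID('dbo.LoadControl') IS NULL
        BEGIN
            -- One row per load keeps track of how far we got.
            CREATE TABLE dbo.LoadControl (LoadName sysname PRIMARY KEY,
                                          LastKeyLoaded bigint NOT NULL);
            INSERT dbo.LoadControl (LoadName, LastKeyLoaded) VALUES ('BigCopy', 0);
        END;

        DECLARE @BatchSize int = 10000000,   -- 10 million rows per batch
                @LastKey bigint, @NextKey bigint;

        SELECT @LastKey = LastKeyLoaded FROM dbo.LoadControl WHERE LoadName = 'BigCopy';

        WHILE 1 = 1
        BEGIN
            -- Upper key of the next batch; NULL means nothing is left to copy.
            SELECT @NextKey = MAX(SourceID)
            FROM (SELECT TOP (@BatchSize) SourceID
                  FROM dbo.SourceTable
                  WHERE SourceID > @LastKey
                  ORDER BY SourceID) AS nextBatch;

            IF @NextKey IS NULL BREAK;

            BEGIN TRAN;   -- all or nothing for this batch

            INSERT dbo.TargetTable (SourceID, Col1, Col2)
            SELECT SourceID, Col1, Col2
            FROM   dbo.SourceTable
            WHERE  SourceID > @LastKey AND SourceID <= @NextKey;

            -- Move the checkpoint inside the same transaction as the insert.
            UPDATE dbo.LoadControl SET LastKeyLoaded = @NextKey
            WHERE  LoadName = 'BigCopy';

            COMMIT;

            SET @LastKey = @NextKey;
        END;

    You can stop the loop when the maintenance window ends and run the same script again in the next window; it picks up wherever the control table says it left off.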

    Thanks,
    Abhinav

  • Can you somehow modify the process to make it incremental, or are all the rows always new?

    If you have space, perhaps you could try running the insert to a staging table outside of the maintenance window and then use T-SQL to update the main table – this may or may not improve things.
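
    Something along these lines, untested and assuming a unique key exists on both tables (SourceID here is a placeholder). The slow SSIS load goes into dbo.Staging outside the window; the step inside the window only copies rows the target does not already have, so re-running it cannot create duplicates:

        INSERT dbo.TargetTable (SourceID, Col1, Col2)
        SELECT s.SourceID, s.Col1, s.Col2
        FROM   dbo.Staging AS s
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.TargetTable AS t
                           WHERE  t.SourceID = s.SourceID);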

    If the > 1 billion rows is the complete record set for the table, have you considered having two identically structured tables and using aliases to switch between them? This would also enable you to do things outside of your maintenance window. I can outline the process in more detail if it's a possibility.
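
    In case it helps to picture it, here is an untested sketch of that switch using a synonym (all names are placeholders). The B copy is loaded at leisure outside the window, and only the repoint, which is nearly instant, needs the window:

        -- One-off setup: readers and writers reference the synonym, not the tables.
        CREATE SYNONYM dbo.TargetTable FOR dbo.TargetTable_A;

        -- ... load dbo.TargetTable_B outside the maintenance window ...

        -- Inside the window: swap the alias to the freshly loaded copy.
        BEGIN TRAN;
        DROP SYNONYM dbo.TargetTable;
        CREATE SYNONYM dbo.TargetTable FOR dbo.TargetTable_B;
        COMMIT;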

