Batch Processing in SSIS

  • Hi Everyone,

    I was trying to develop a package which process 500 records each time and insert into the destination using the for each loop.

    First of all, is it the good way to accomplish the task of Batch Processing.

    Kindly Let me know what is the best way or approach to accompish this task. Let me also share what i have tried :

    -> I have For each loop in my package , Configured with enumerator "ADO Enumerator",select variable "user :: BatchSize" and the property "Rows in the First Table" is checked.

    And i have data flow task and inside this i have oledb source an oledb destination configured properly.

    When i ran this package , i get an error : "Error: Variable "User::BatchSize" does not contain a valid data object". I am unable to grasp , why is this error.

    Please help me in resolving this issue , also need your guidlines to accomplish the task "batch process".

    Regards

    Chaithanya M

  • I wouldn't use a For Each, but a for loop, as you know upfront how many batches of 500 rows you need to transfer.

    In the OLE DB Source, I would do something like this:

    DECLARE @batchsize = ?;

    SELECT TOP(@batchsize) ...

    FROM ...

    In the parameter mapping, map your SSIS variable BatchSize to the 0 ordinal (this corresponds with the question mark placeholder).

    After the dataflow, I would either delete the transferred rows or mark them as transferred.

    By the way, 500 rows is extremely small for SSIS. You can easily take 50000 rows in one batch or even more. What are you doing inside the dataflow?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi Koen,

    Thank you very much for replying. I tried the way you suggested i.e .

    I took a Data Flow Task in Control Flow and in Data Flow , i took oledb source and Configured Data Access mode as 'Sql Command' and used the following batch :

    DECLARE @Batchsize BIGINT=?

    SELECT TOP(@Batchsize) * FROM chk1

    When i parsed this i get this error :

    "Syntax error, Permission Violation, or other nonspecific error" .

    Please suggest me how to resolve this . Also i just want to confirm if i understood the process that you sugggestd ,

    1. You asked me to take up Data Flow task in Control Flow

    2.Create a Package Variable "BatchSize "

    3.In Data FLow oledb source and destination , oledb source configured as Data Access mode ->'Sql Command' , and use the above batch T-sql.

    3.Configure the destination

    4. Use a Execute-SQL taks to Delete the processed Rows using the @Batchsize Varible value again.

    Please confirm if above steps are correct.

    Regards

    Chaithanya M

  • Don't forget to do the parameter mapping in the OLE DB source:

    SSIS – Pass a variable to a OLE DB Source in a Data Flow

    [/url]

    This can be a bit buggy from time to time, so an alternative is to create dynamic SQL with a variable:

    SSIS: Using dynamic SQL in an OLE DB Source component

    Regarding the delete step:

    you of course don't want to delete random 500 rows, so you need to know which one you selected in the dataflow. In the dataflow, use a multicast to duplicate the stream. Write the PK of the source to a staging table. Now, do a delete with an inner join on the staging table:

    DELETE FROM myTable

    FROM

    myTable s

    INNER JOIN

    myStagingTable t

    ON s.ID = t.ID

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I forgot:

    if the source and destination table are on the same server, you can replace the DELETE statement with a trigger as an alternative.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi Koen,

    Is there any way of doing it without deleting the rows from the source table.

    How to mark the rows as transferred instead of deleting it. I should not delete the rows.

    Thanks in advance

  • preethi.kaliyamoorthy (5/13/2015)


    Hi Koen,

    Is there any way of doing it without deleting the rows from the source table.

    How to mark the rows as transferred instead of deleting it. I should not delete the rows.

    Thanks in advance

    Change the DELETE statement I posted earlier to an UPDATE statement?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen,

    I am sorry. I couldnt understand.

    Can you please explain.

  • preethi.kaliyamoorthy (5/13/2015)


    Koen,

    I am sorry. I couldnt understand.

    Can you please explain.

    UPDATE s

    SET IsTransferred = 1

    FROM

    myTable s

    INNER JOIN

    myStagingTable t

    ON s.ID = t.ID;

    Do you understand the code?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I do understand it. But my doubt is if we set the IsTransferred property to 1 for the transferred rows, those rows will not be selected while fetching the next set of rows ? Please correct me if I am wrong.

  • preethi.kaliyamoorthy (5/13/2015)


    I do understand it. But my doubt is if we set the IsTransferred property to 1 for the transferred rows, those rows will not be selected while fetching the next set of rows ? Please correct me if I am wrong.

    Isn't that the idea?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Sorry if I am asking silly doubts. What is IsTransferred here. Is it a property for every table. I am just a beginner. I dont get it.

  • preethi.kaliyamoorthy (5/13/2015)


    Sorry if I am asking silly doubts. What is IsTransferred here. Is it a property for every table. I am just a beginner. I dont get it.

    You asked yourself:

    How to mark the rows as transferred instead of deleting it. I should not delete the rows.

    The code I gave you marks the rows as transferred. Whatever you do with it and how that flag behaves is up to you.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • preethi.kaliyamoorthy (5/13/2015)


    Sorry if I am asking silly doubts. What is IsTransferred here. Is it a property for every table. I am just a beginner. I dont get it.

    IsTransferred is not a built-in column on every table. If you have a requirement to only transfer routed one to need to engineer something to help you keep track of what you previously transferred. Adding a BIT column to the source table called IsTransferred that is initialized to 0 that you can update to 1 when you transfer the row is one way. Deleting the source row is technically another way.

    One option I use often that is not obtrusive to the source table but that requires a unique key to be present in the source table is to store the unique key for the rows I have transferred. When I select rows for transfer I exclude those already-transferred rows.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks Orlando.

    I understand that we need to add that BIT column to the source table. But I shouldnt change the structure of the source table. How can I do that.

    And the method you mentioned, can you explain that.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply