Batch processing by reading data from a table to an OLE DB destination

  • Hi,

    I have 1000 records in a table....

    I need to process these 1000 records in batches (100 records per batch)...

    Read 100 records from the table and insert them into another table in the DB...

    Mark these 100 records as processed...

    Then read the next 100 records which are not yet processed...

    and so on...

    How can this be done using SSIS controls?

    Can I use a For Loop container for the looping? If so, how do I limit each iteration to 100 records?
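
    In plain T-SQL, the pattern I mean is something like this minimal sketch (dbo.SourceTable and dbo.TargetTable are placeholder names, and SourceTable is assumed to have a Processed bit column):

    -- Placeholder table/column names; not the real schema
    DECLARE @batch TABLE (Col1 int, Col2 nvarchar(100));

    WHILE EXISTS (SELECT 1 FROM dbo.SourceTable WHERE Processed = 0)
    BEGIN
        DELETE FROM @batch;

        -- Claim the next 100 unprocessed rows and mark them in one statement
        UPDATE TOP (100) dbo.SourceTable
        SET    Processed = 1
        OUTPUT inserted.Col1, inserted.Col2 INTO @batch (Col1, Col2)
        WHERE  Processed = 0;

        -- Copy the claimed batch into the destination table
        INSERT INTO dbo.TargetTable (Col1, Col2)
        SELECT Col1, Col2 FROM @batch;
    END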

    Thanks in advance,

    Naveen

  • I wouldn't use SSIS for this. This is one of those instances where I'd use a loop. Can you provide table structures and some more info as to why you need to move the data and why you need to limit to 100 per batch? 1000 rows should move quickly.

    100 records was just an example... Actually, by 1000 records I meant inserting a large number of records...

    The working of the application is explained below...

    There is an application from which a user will upload an Excel file...

    The max file size allowed for upload is 2 GB, so there can be a large number of records in the Excel file... When the user uploads the Excel file, the SSIS package starts working and inserts the data from the Excel sheet into the tables in the database. During this insertion the application can still be used by other users...

    i.e. if the data is being uploaded to a user table, other users can access the application and create user records at the same time... which means parallel insertion into the tables should be possible, without the table being locked by one insertion... That's why I wish to insert data batch by batch, say 100 or 1000 or more records at a time...

    The structure of the table into which we insert the data from Excel (and from which it then moves to the main table) is below:

    CREATE TABLE [tblUserTemp]
    (
        [UserId] [int] IDENTITY(1,1) NOT NULL,
        [UserName] [nvarchar](75) NULL,
        [Password] [nvarchar](200) NULL,
        [FirstName] [nvarchar](100) NULL,
        [LastName] [nvarchar](100) NULL,
        [Processed] [bit] NOT NULL DEFAULT ((0))
    )

    Thanks,

    Naveen

  • You should be able to accomplish this by setting the Maximum Insert Commit Size property on your OLE DB Destination fast-load component.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • John Rowan (8/10/2009)


    You should be able to accomplish this by setting the Maximum Insert Commit Size property on your OLE DB Destination fast-load component.

    Duh, why didn't I think of that?

  • Jack Corbett (8/10/2009)


    John Rowan (8/10/2009)


    You should be able to accomplish this by setting the Maximum Insert Commit Size property on your OLE DB Destination fast-load component.

    Duh, why didn't I think of that?

    That's why we all work together here, right? I've been "duh'd" too many times :-).

    Anyway, that is assuming that the OLE DB Destination w/ fast-load is being used.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • Hi,

    Adding some more information....

    Actually I'm planning to use an Execute SQL Task so that I can call a stored procedure with parameters...

    The reason I'm not using the OLE DB Destination is that I have a master-detail relationship between these tables...

    i.e. when I'm inserting data into tblUsers, after inserting say 1000 records into tblUsers I'll have to insert the corresponding records (the recently inserted 1000 records) from another table (say tblUserPersonalDetailsTemp) into the main table (say tblUserPersonalDetails). These tables have the identity property on their primary keys, so inserting a record into tblUsers generates an identity value. This identity value (say UserId) is a foreign key in the table tblUserPersonalDetails.

    The id value supplied in the Excel file is only a temporary value, for matching purposes... The real id value is generated as an identity value when each row is inserted, and I use this identity value as the foreign key column for the other table...

    I'm not sure whether this can be performed without using an SP, with just the SSIS controls...

    Please let me know if there is another way..

    In short, what I need to do:

    1) Read the data from the different data sheets in an Excel file into temp tables...

    2) Each sheet represents a table..

    3) Insert data from each temp table in batches into its master table.

    What I tried

    ------------

    1) Read data from the Excel file (data sheet 1, for tblUsers) into a temp table tblUserTemp -- a Data Flow Task.

    2) Read data from the Excel file (data sheet 2, for tblUserPersonalDetails) into a temp table tblUserPersonalDetailsTemp -- also a Data Flow Task.

    3) Get all rows from tblUserTemp which are not processed -- an Execute SQL Task, using an SP for this..

    4) Loop through the result set -- a Foreach Loop Container.

    5) Inside the loop, use an Execute SQL Task (calling an SP that inserts into the master and detail tables) -- the SP is written as a transaction..

    So each row is treated as its own transaction... a sketch of what such a per-row SP looks like is below.
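
    For clarity, the per-row SP is along these lines (a simplified sketch only; the procedure name and the @Address detail column are stand-ins, not the real definitions):

    -- Simplified sketch of the per-row procedure; @Address stands in
    -- for the real detail columns
    CREATE PROCEDURE dbo.usp_InsertUserWithDetails
        @UserName  nvarchar(75),
        @Password  nvarchar(200),
        @FirstName nvarchar(100),
        @LastName  nvarchar(100),
        @Address   nvarchar(200)
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        INSERT INTO dbo.tblUsers (UserName, [Password], FirstName, LastName)
        VALUES (@UserName, @Password, @FirstName, @LastName);

        -- The identity generated here becomes the FK for the detail row
        DECLARE @UserId int;
        SET @UserId = SCOPE_IDENTITY();

        INSERT INTO dbo.tblUserPersonalDetails (UserId, Address)
        VALUES (@UserId, @Address);

        COMMIT TRANSACTION;
    END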

    Limitations

    ----------

    The process seems to be slow, as it takes more than an hour to process 30,000 records for the tblUsers table...

    Is there any alternative so that I can increase performance?

    Thanks in advance,

    Naveen

  • Yuk, I don't like the sound of that. If you are bringing all of the details into a temp table first and then processing from there out to master/child tables, what I would do is use the PK on the temp table to help you avoid RBAR updates ....

    Add another indexed field to your master table and, when you create your master records, populate this field with the PK from the temp table.

    Then, when you create your child records, you can look up the 'proper' FK from the master table via the new temp PK field on the master table. You should be able to do this as a set-based create, vastly speeding things up.
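
    In T-SQL terms, the two steps would look something like this sketch (TempUserId is an assumed name for the new indexed field on tblUsers, Address stands in for the real detail columns, and d.UserId is assumed to be the temp id carried over from the Excel sheet):

    -- 1) Set-based master insert, carrying the temp table's PK along
    --    in the new indexed column
    INSERT INTO dbo.tblUsers (UserName, [Password], FirstName, LastName, TempUserId)
    SELECT UserName, [Password], FirstName, LastName, UserId
    FROM   dbo.tblUserTemp
    WHERE  Processed = 0;

    -- 2) Set-based child insert: recover the real identity by joining
    --    on the carried temp PK instead of looping row by row
    INSERT INTO dbo.tblUserPersonalDetails (UserId, Address)
    SELECT u.UserId, d.Address
    FROM   dbo.tblUserPersonalDetailsTemp AS d
    JOIN   dbo.tblUsers AS u ON u.TempUserId = d.UserId;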

    Hope that makes sense.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Naveen,

    And all of this because you don't know how to do this in the data flow? How long does it take the data flow to run and populate the 30,000 rows into your temp table? Surely not more than a few seconds, right? I would bet the vast majority of the time spent during that hour-long run is in looping through the rows.

    Can you tell us how long the data flow portion of your package takes to load the temp tables?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • Hi,

    It just takes a few seconds...

    If I run it after the cache has been warmed up, it takes just 2-3 seconds...

    Thanks,

    Naveen

  • Hi Naveen,

    Is there a reason why this has to be part of a transaction? Generally with batch inserts you populate the master table first, then look up the newly inserted data to populate the child table.

    Consider these steps...

    1. Remove the temp table stage.

    2. Create 2 data flows... In the first data flow, insert your data into the master table from your first Excel sheet.

    3. In the second data flow you populate your foreign key table. As you have already populated the master table (in the previous step), you can use a lookup in this data flow to return the identity of the master table records to satisfy the foreign key constraint (see the query sketch below).
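
    For example, the Lookup in step 3 might use a reference query like this (a sketch; UserName as the column linking the two sheets is an assumption):

    -- Reference query for the Lookup transformation: returns the
    -- identity values generated by the first data flow
    SELECT UserId, UserName
    FROM   dbo.tblUsers;

    The Lookup then matches each incoming detail row on UserName and adds the UserId column, which maps to the foreign key column in the tblUserPersonalDetails destination.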

    I'd expect this to take a similar amount of time as it took to populate the temp tables you specified in your earlier post (2-3 seconds I think you said).

    Hope this helps

    Kindest Regards,

    Frank Bazan

  • Naveen,

    Frank is spot on. That is exactly where I was heading! Use the data flows to do the work for you and get rid of the staging table with the loops.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
