Loop to insert in batches

  • I am hoping someone can help me with this issue that I having. I am trying to create a table that I need for testing purposes. What I need to do is insert Table A into Table B 10 times. My issue is that Table A is 50 mil records so I want to do it in batches on an ID in the table.

    What I want to do is a loop, that performs in batches, and inserts 10 times. (I have created a loop that can do this using a cartesian join to a table with 10 records in it, which creates my exploded table. Yes very hard on the server and cache).

    For example, this might help what I am trying to get across:

    Table A has 1000 records. Within this 1000 records are 10 unique ids. I need a loop that can pick out an ID and all its records and insert it into Table B ten times.

    CAn anyone help. I'm sorry this is so vague but I'm at home trying to figure out where I might have gone wrong to begin with.

    Thanks,

    Steve

  • Steve,

    I gotta ask... you're saying that you basically want to copy all of table A into table B 10 times (regardless of the loop you requested)???   To what end?  Maybe there's something else we can do without ultimately creating table B as a 500 million row table....

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Do the new ids need to be unique?

    Is this more complicated than:

    insert b select * from A

    insert b select * from A

    insert b select * from A

    ...

    ?

  • Thanks guys. I sat down, and went throught it logically and was able to build the two loops that i needed to complete this. There was some logic I needed to apply and have built into the loops and be able to utilise the cache.

    I just panicked last night instead of sitting down and modeling it out.

    Thanks

  • Can we see the final result.. maybe we can improve it?!

  • Yeah... and how many rows did you ultimately end up with?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is the code. It works fast....50mil start, 500mil final.

    Declare@StartPointint,

    @minstoreint,

    @MaxStoreint,

    @IncrimentSeedint,

    @Countint

    SET@StartPoint=0

    SET@minstore=1

    SET@MaxStore=500

    SET@Count=(Select Max(PlaceID)

    From Schema.SchemaTestStage)

    WHILE@minstore <= @Count

    BEGIN

    SET@IncrimentSeed=0

    WHILE@IncrimentSeed<10

    BEGIN

    BEGIN TRANSACTION

    INSERT INTOSchema.SchemaTest

    (

    PlaceID,

    MONTH_YEAR,

    Number,

    TOTAL,

    UNITS,

    COST,

    NEW_TOTAL,

    NEW_UNITS,

    NEW_COST,

    [Our Number],

    SUPPLIERCD,

    SUPPLIERID,

    PrescriberID

    )

    SELECTPlaceID,

    MONTH_YEAR,

    Number,

    TOTAL,

    UNITS,

    COST,

    NEW_TOTAL,

    NEW_UNITS/10,

    NEW_COST,

    [Our Number],

    SUPPLIERCD,

    SUPPLIERID,

    Cast(PlaceID as varchar) + Cast(Right(ROWID, 1) as varchar)

    FROMSchema.SchemaTestStage

    WHEREPlaceID Between @minstore and @MaxStore

    COMMIT TRANSACTION

    SET @IncrimentSeed = @IncrimentSeed + 1

    END

    SET@StartPoint=@MaxStore

    SET@minstore=@StartPoint + 1

    SET@MaxStore=@StartPoint + 500

    END

  • Thanks for the posted code.... still, why do you need to manufacture 10 rows for each row in the source table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I need to test a process which involves run time. Once the actual "real" table is produced, we will be running reports based on certain metrics. The long and short of it is 6400 runs, based on a table that is 500 mil records. We need a baseline for time to produce.

    The looping structure can be used with a join on another data table to produce the result set of the real table.

  • I'm not QA expert but doesn't having 10 times a small set of data vs having 1 real size data affect the index distribution in such a way it could alter the results of the test?

  • This purpose was not a result set test as it was a resource and system time test. The end result metrics will have to be QA'd, and that table will be the 500mil table.

  • Still, Remi is correct... it will affect even a system time test because of the distribution unless you know for sure that the distibution of the actual data will be very similar.  It was a good idea you had, though... most don't take the time to test on anything more than a couple of dozen rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree. The metrics and distribution were tested on the smaller table to ensure that when we exploded it 10 fold, which is now 100 after last night, would remain the same in baseline. Not enough people do these types of tests to ensure that sytem resources are being allocated properly. A lot of programmers take for granted the SANs and mainframes and the power they house.

    This was run using about 10 different ranges until the optimal one was found. Testing this has been an eye opener for some.

    Thanks a lot for the input here. I am just starting in the sytems architechture game and optimisation of these systems is in turn a DBA's best friend.

  • Spot on especially about testing (or the general lack, there of)... thanks for the feedback. 

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply