SELECT INTO

  • We are planning to migrate from sql 2000 to sql 2008. One of the tables in a datawarehouse database has 4 billion records. Instead of doing a backup/restore we are planning to populate the new tables of sql 2008 using SELECT * INTO because only selected tables are going to 2008.

    I'm not 100% sure if all 4 billion records could be populated in a single shot. My major concern is what if it breaks in the middle and I might have to start over again.

    Is it possible to build a query to populate 100000 rows at a time? Eventhough it breaks in the middle, it should let me start from where it has stopped.

    Thanks.

  • hi,

    try this after u stop any kind of access to the db except u.

    stop the app pool.

    add a column identity to every table

    alter table add columnname int identity default (1,1)

    then

    after that move the records by that identity column.

    insert into tablename select columnname from tablename where identitycolumn between 1 to 10000

    insert into tablename select columnname from tablename where identitycolumn between 10000 to 20000

    carry on till ur count overs to 100000

    after that remove the identity column from tables.

    ALTER TABLE tablename DROP COLUMN columnname

  • Depending upon what you are doing you could partition the table(s) are perform a Merge Split.

    http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in-sql-server-part-1-the-basics.aspx

    http://blogs.msdn.com/b/felixmar/archive/2011/08/29/partitioning-amp-archiving-tables-in-sql-server-part-2-split-merge-and-switch-partitions.aspx

    I did this a few times when this was not an option but I don't have the code.

    I created a control table where I stored the batch size.

    I also had a column that indicated the Total Number of Records to Process.

    I also had a column to store a boolean value (Active_Inactive) so that I could determined whether to terminate the program before it processed the next batch.

    I create a Stored that had a While Loop and it each time it would check the Active Column as well as the batch size (in the event that you decided to change the batch size).

    I also created a log table so that I could track what occured and when, etc.

    I hope this helps.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • sunny.tjk (9/8/2011)


    We are planning to migrate from sql 2000 to sql 2008. One of the tables in a datawarehouse database has 4 billion records. Instead of doing a backup/restore we are planning to populate the new tables of sql 2008 using SELECT * INTO because only selected tables are going to 2008.

    I'm not 100% sure if all 4 billion records could be populated in a single shot. My major concern is what if it breaks in the middle and I might have to start over again.

    Is it possible to build a query to populate 100000 rows at a time? Eventhough it breaks in the middle, it should let me start from where it has stopped.

    Thanks.

    What does the PK look like on that table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What about SSIS? I see it as the most sensible option here.

    -- Gianluca Sartori

  • You've probably thought of this, but... depending on what proportion of the tables and/or data are going to be retained, another option is to do a backup/restore and just drop the tables you don't want.

  • Would it not be far easier to restore your backups (I assume you are taking backups of some sort; FULL + DIFF + TRNs) from the current SQL 2000 instance onto the SQL 2008 and carry out your archiving once migrated (using partitioning?)?

    You could restore your full + diff, leave the db as non-recovered and apply the trn's up to the point of migration. This option means you can leave your full backup restoring for as long as it needs, there's no rush to do it all in an outage or "window of opportunity".

    If you start deleting data from your pre-migrated db how are you going to get it back if you have accidentally removed something you need later? I would go for the option that gives me a solid backout plan, which to me sounds like the backup + restore option. Unless of course I'm missing something completely obious!

    Just a thought....

  • Create a second table on the primary server with identity column so that each record can be uniquely identified and then start the insert in batches 10000 at a time. If it fails then identify you start the insert again based on the row ID. It should look something like this. But this is a bit tedious process unlike the backup and restore..

    DECLARE @BatchSize int = 10000

    WHILE 1 = 1

    BEGIN

    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008

    (

    FirstName

    ,LastName

    ,EmailAddress

    ,PhoneNumber

    )

    SELECT TOP(@BatchSize)

    s.FirstName

    ,s.LastName

    ,s.EmailAddress

    ,s.PhoneNumber

    FROM [dbo].[Source] s

    WHERE NOT EXISTS (

    SELECT 1

    FROM dbo.Destination

    WHERE PersonID = s.PersonID

    )

    IF @@ROWCOUNT < @BatchSize BREAK

    END

    Thank You,

    Best Regards,

    SQLBuddy

  • sqlbuddy123 (9/11/2011)


    Create a second table on the primary server with identity column so that each record can be uniquely identified and then start the insert in batches 10000 at a time. If it fails then identify you start the insert again based on the row ID. It should look something like this. But this is a bit tedious process unlike the backup and restore..

    DECLARE @BatchSize int = 10000

    WHILE 1 = 1

    BEGIN

    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008

    (

    FirstName

    ,LastName

    ,EmailAddress

    ,PhoneNumber

    )

    SELECT TOP(@BatchSize)

    s.FirstName

    ,s.LastName

    ,s.EmailAddress

    ,s.PhoneNumber

    FROM [dbo].[Source] s

    WHERE NOT EXISTS (

    SELECT 1

    FROM dbo.Destination

    WHERE PersonID = s.PersonID

    )

    IF @@ROWCOUNT < @BatchSize BREAK

    END

    Thank You,

    Best Regards,

    SQLBuddy

    Not sure what you meant by the second table?

  • Welsh Corgi (9/8/2011)


    Depending upon what you are doing you could partition the table(s) are perform a Merge Split.

    http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in-sql-server-part-1-the-basics.aspx

    http://blogs.msdn.com/b/felixmar/archive/2011/08/29/partitioning-amp-archiving-tables-in-sql-server-part-2-split-merge-and-switch-partitions.aspx

    I did this a few times when this was not an option but I don't have the code.

    I created a control table where I stored the batch size.

    I also had a column that indicated the Total Number of Records to Process.

    I also had a column to store a boolean value (Active_Inactive) so that I could determined whether to terminate the program before it processed the next batch.

    I create a Stored that had a While Loop and it each time it would check the Active Column as well as the batch size (in the event that you decided to change the batch size).

    I also created a log table so that I could track what occured and when, etc.

    I hope this helps.

    Sounds like a plan but little complicated for me... 😛

  • sunny.tjk (9/11/2011)


    Not sure what you meant by the second table?

    Quite literally... make a copy of the table at the schema level. Generate the CREATE TABLE statement for the original table, change the name of the table slightly, and create the "second table" with it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff got it perfectly right. Just use CREATE Table and add an identity column to it.

    Thank You,

    Best Regards,

    SQLBuddy

  • sqlbuddy123 (9/11/2011)


    Jeff got it perfectly right. Just use CREATE Table and add an identity column to it.

    Thank You,

    Best Regards,

    SQLBuddy

    But should I create that table on the primary sevrer or the destination server? I'm little confused since you replied to my post saying "Create a second table on the primary server with identity column so that each record can be uniquely identified and then start the insert in batches 10000 at a time.

    If it fails then identify you start the insert again based on the row ID. It should look something like this. But this is a bit tedious process unlike the backup and

    restore.."

    Thanks,

    Sunny.

  • sqlbuddy123 (9/11/2011)


    Jeff got it perfectly right. Just use CREATE Table and add an identity column to it.

    Thank You,

    Best Regards,

    SQLBuddy

    I'm still confused if I should choose this approach because I think doing a cross server will hurt the performance.

    i.e. SELECT * INTO destination table

    FROM remoteserver.dbname.sourcetablename.

  • Create the second table on the primary server itself which will be quicker and easy. Creating on the destination server defeats the purpose.

    Thank You,

    Best Regards,

    SQLBuddy

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply