September 8, 2011 at 3:09 pm
We are planning to migrate from SQL 2000 to SQL 2008. One of the tables in a data warehouse database has 4 billion records. Instead of doing a backup/restore, we are planning to populate the new tables in SQL 2008 using SELECT * INTO, because only selected tables are going to 2008.
I'm not 100% sure that all 4 billion records can be populated in a single shot. My major concern is: what if it breaks in the middle and I have to start over again?
Is it possible to build a query that populates 100,000 rows at a time? Even if it breaks in the middle, it should let me start from where it stopped.
Thanks.
September 8, 2011 at 11:35 pm
Hi,
Try this after you stop all access to the database except your own.
Stop the app pool.
Add an identity column to every table:
ALTER TABLE tablename ADD identitycolumn INT IDENTITY(1,1)
Then move the records by that identity column:
INSERT INTO newtablename SELECT columnname FROM tablename WHERE identitycolumn BETWEEN 1 AND 10000
INSERT INTO newtablename SELECT columnname FROM tablename WHERE identitycolumn BETWEEN 10001 AND 20000
Carry on until you have covered the full row count.
After that, remove the identity column from the tables:
ALTER TABLE tablename DROP COLUMN identitycolumn
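If typing out each BETWEEN range by hand gets tedious, the same idea can be driven by a loop. A minimal sketch, where OldTable, NewTable, the columns, and the RowID identity column are all made-up names:

DECLARE @Start BIGINT, @BatchSize BIGINT, @MaxID BIGINT
SET @Start = 1
SET @BatchSize = 100000
SELECT @MaxID = MAX(RowID) FROM dbo.OldTable

WHILE @Start <= @MaxID
BEGIN
    -- Each pass copies one contiguous identity range, so a failure
    -- loses at most the current batch; restart from the last range
    -- that committed.
    INSERT INTO dbo.NewTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.OldTable
    WHERE RowID BETWEEN @Start AND @Start + @BatchSize - 1

    SET @Start = @Start + @BatchSize
END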
September 8, 2011 at 11:54 pm
Depending upon what you are doing, you could partition the table(s) and perform a merge/split.
I did this a few times when that was not an option, but I don't have the code.
I created a control table where I stored the batch size.
I also had a column that indicated the total number of records to process.
I also had a column to store a boolean value (Active_Inactive) so that I could determine whether to terminate the program before it processed the next batch.
I created a stored procedure with a WHILE loop; on each pass it would check the Active column as well as the batch size (in the event that you decided to change the batch size).
I also created a log table so that I could track what occurred and when, etc.
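Since the original code is gone, here is a rough skeleton of that design; every object and column name below is made up:

CREATE TABLE dbo.BatchControl
(
    BatchSize INT NOT NULL,        -- rows per batch; can be changed mid-run
    TotalRecords BIGINT NOT NULL,  -- total number of records to process
    Active_Inactive BIT NOT NULL   -- set to 0 to stop before the next batch
)

CREATE TABLE dbo.BatchLog
(
    LogID INT IDENTITY(1,1) PRIMARY KEY,
    RowsCopied INT NOT NULL,
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
)

-- Inside the stored procedure:
DECLARE @BatchSize INT, @Active BIT, @Rows INT

SELECT @BatchSize = BatchSize, @Active = Active_Inactive
FROM dbo.BatchControl

WHILE @Active = 1
BEGIN
    -- ... copy one batch of @BatchSize rows here ...
    SET @Rows = @@ROWCOUNT  -- row count from the batch statement above

    INSERT INTO dbo.BatchLog (RowsCopied) VALUES (@Rows)

    -- Re-read the control row so a batch-size change, or the Active
    -- flag being flipped from another session, takes effect next pass.
    SELECT @BatchSize = BatchSize, @Active = Active_Inactive
    FROM dbo.BatchControl
END

The point of re-reading the control row on every pass is that you can throttle or stop a multi-hour copy from another session without killing it.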
I hope this helps.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 9, 2011 at 1:06 am
sunny.tjk (9/8/2011)
We are planning to migrate from SQL 2000 to SQL 2008. One of the tables in a data warehouse database has 4 billion records. Instead of doing a backup/restore, we are planning to populate the new tables in SQL 2008 using SELECT * INTO, because only selected tables are going to 2008. I'm not 100% sure that all 4 billion records can be populated in a single shot. My major concern is: what if it breaks in the middle and I have to start over again?
Is it possible to build a query that populates 100,000 rows at a time? Even if it breaks in the middle, it should let me start from where it stopped.
Thanks.
What does the PK look like on that table?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2011 at 2:06 am
What about SSIS? I see it as the most sensible option here.
-- Gianluca Sartori
September 9, 2011 at 2:42 am
You've probably thought of this, but... depending on what proportion of the tables and/or data are going to be retained, another option is to do a backup/restore and just drop the tables you don't want.
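In T-SQL that amounts to a plain restore followed by drops; the file and table names here are hypothetical:

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse.bak'
WITH MOVE 'MyWarehouse_Data' TO 'D:\Data\MyWarehouse.mdf',
     MOVE 'MyWarehouse_Log' TO 'D:\Logs\MyWarehouse.ldf',
     RECOVERY

-- Then prune whatever isn't coming along to 2008.
DROP TABLE dbo.UnwantedTable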
September 9, 2011 at 2:53 am
Would it not be far easier to restore your backups (I assume you are taking backups of some sort: FULL + DIFF + TRNs) from the current SQL 2000 instance onto the SQL 2008 instance and carry out your archiving once migrated (using partitioning?)?
You could restore your full + diff, leave the db non-recovered, and apply the trns up to the point of migration. This option means you can leave your full backup restoring for as long as it needs; there's no rush to do it all in an outage or "window of opportunity".
If you start deleting data from your pre-migration db, how are you going to get it back if you have accidentally removed something you need later? I would go for the option that gives me a solid backout plan, which to me sounds like the backup + restore option. Unless of course I'm missing something completely obvious!
Just a thought....
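Roughly, with hypothetical backup file names:

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_full.bak'
WITH NORECOVERY  -- leave the db restoring so later backups still apply

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_diff.bak'
WITH NORECOVERY

-- Apply every log backup taken since the diff, in order...
RESTORE LOG MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_log_001.trn'
WITH NORECOVERY

-- ...and only at the cutover bring the database online.
RESTORE DATABASE MyWarehouse WITH RECOVERY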
September 11, 2011 at 6:52 pm
Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time. If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and restore..
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008
    (
        PersonID  -- must be copied, or the NOT EXISTS below never filters anything
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[Source] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination d
        WHERE d.PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK
END
Thank You,
Best Regards,
SQLBuddy
September 11, 2011 at 7:46 pm
sqlbuddy123 (9/11/2011)
Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time. If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and restore..
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008
    (
        PersonID  -- must be copied, or the NOT EXISTS below never filters anything
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[Source] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination d
        WHERE d.PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK
END
Thank You,
Best Regards,
SQLBuddy
Not sure what you meant by the second table?
September 11, 2011 at 7:57 pm
Welsh Corgi (9/8/2011)
Depending upon what you are doing, you could partition the table(s) and perform a merge/split. I did this a few times when that was not an option, but I don't have the code.
I created a control table where I stored the batch size.
I also had a column that indicated the total number of records to process.
I also had a column to store a boolean value (Active_Inactive) so that I could determine whether to terminate the program before it processed the next batch.
I created a stored procedure with a WHILE loop; on each pass it would check the Active column as well as the batch size (in the event that you decided to change the batch size).
I also created a log table so that I could track what occurred and when, etc.
I hope this helps.
Sounds like a plan, but a little complicated for me... 😛
September 11, 2011 at 9:18 pm
sunny.tjk (9/11/2011)
Not sure what you meant by the second table?
Quite literally... make a copy of the table at the schema level. Generate the CREATE TABLE statement for the original table, change the name of the table slightly, and create the "second table" with it.
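For instance, reusing the hypothetical Person columns from the batch example above (the column types are guesses):

-- Schema-only copy of the original table, renamed, with an identity
-- column added so every copied row gets a restartable ID.
CREATE TABLE dbo.Person_Copy
(
    RowID BIGINT IDENTITY(1,1) NOT NULL,
    FirstName VARCHAR(50) NULL,
    LastName VARCHAR(50) NULL,
    EmailAddress VARCHAR(100) NULL,
    PhoneNumber VARCHAR(20) NULL
)

INSERT INTO dbo.Person_Copy (FirstName, LastName, EmailAddress, PhoneNumber)
SELECT FirstName, LastName, EmailAddress, PhoneNumber
FROM dbo.Person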
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2011 at 11:03 pm
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it.
Thank You,
Best Regards,
SQLBuddy
September 12, 2011 at 9:56 am
sqlbuddy123 (9/11/2011)
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it. Thank You,
Best Regards,
SQLBuddy
But should I create that table on the primary server or the destination server? I'm a little confused, since you replied to my post saying "Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time.
If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and
restore.."
Thanks,
Sunny.
September 12, 2011 at 2:16 pm
sqlbuddy123 (9/11/2011)
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it. Thank You,
Best Regards,
SQLBuddy
I'm still confused about whether I should choose this approach, because I think a cross-server query will hurt performance,
i.e. SELECT * INTO destinationtable
FROM remoteserver.dbname.dbo.sourcetablename
September 12, 2011 at 4:58 pm
Create the second table on the primary server itself, which will be quicker and easier. Creating it on the destination server defeats the purpose.
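Putting the pieces together, a sketch of the batched cross-server pull, run from the SQL 2008 side; the linked server name, the staging table, and its RowID identity column are all made up:

DECLARE @Start BIGINT, @BatchSize BIGINT, @MaxID BIGINT
SET @Start = 1
SET @BatchSize = 10000

-- The staging copy (the "second table" with the identity column)
-- lives on the SQL 2000 server.
SELECT @MaxID = MAX(RowID)
FROM remoteserver.dbname.dbo.Person_Copy

WHILE @Start <= @MaxID
BEGIN
    INSERT INTO dbo.Destination (FirstName, LastName, EmailAddress, PhoneNumber)
    SELECT FirstName, LastName, EmailAddress, PhoneNumber
    FROM remoteserver.dbname.dbo.Person_Copy
    WHERE RowID BETWEEN @Start AND @Start + @BatchSize - 1

    -- If the run dies, note the last committed @Start and resume from it.
    SET @Start = @Start + @BatchSize
END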
Thank You,
Best Regards,
SQLBuddy