September 8, 2011 at 3:09 pm
We are planning to migrate from SQL 2000 to SQL 2008. One of the tables in a data warehouse database has 4 billion records. Instead of doing a backup/restore, we are planning to populate the new tables in SQL 2008 using SELECT * INTO, because only selected tables are going to 2008.
I'm not 100% sure that all 4 billion records can be populated in a single shot. My major concern is: what if it breaks in the middle and I have to start over again?
Is it possible to build a query that populates 100,000 rows at a time? Even if it breaks in the middle, it should let me start from where it stopped.
Thanks.
September 8, 2011 at 11:35 pm
Hi,
Try this after you stop all access to the database except your own.
Stop the app pool.
Add an identity column to every table:
ALTER TABLE tablename ADD identitycolumn INT IDENTITY(1,1)
Then move the records by that identity column:
INSERT INTO newtablename SELECT columnname FROM tablename WHERE identitycolumn BETWEEN 1 AND 10000
INSERT INTO newtablename SELECT columnname FROM tablename WHERE identitycolumn BETWEEN 10001 AND 20000
Carry on until you have covered the full row count.
After that, remove the identity column from the tables:
ALTER TABLE tablename DROP COLUMN identitycolumn
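If typing out each BETWEEN range by hand gets tedious, the same idea can be driven by a loop. A minimal sketch, where OldTable, NewTable, the columns, and the RowID identity column are all made-up names:

DECLARE @Start BIGINT, @BatchSize BIGINT, @MaxID BIGINT
SET @Start = 1
SET @BatchSize = 100000
SELECT @MaxID = MAX(RowID) FROM dbo.OldTable

WHILE @Start <= @MaxID
BEGIN
    -- Each pass copies one contiguous identity range, so a failure
    -- loses at most the current batch; restart from the last range
    -- that committed.
    INSERT INTO dbo.NewTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.OldTable
    WHERE RowID BETWEEN @Start AND @Start + @BatchSize - 1

    SET @Start = @Start + @BatchSize
END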
September 8, 2011 at 11:54 pm
Depending upon what you are doing, you could partition the table(s) and perform a merge/split.
I did this a few times when that was not an option, but I don't have the code.
I created a control table where I stored the batch size.
I also had a column that indicated the total number of records to process.
I also had a column to store a boolean value (Active_Inactive) so that I could determine whether to terminate the program before it processed the next batch.
I created a stored procedure with a WHILE loop; on each pass it would check the Active column as well as the batch size (in the event that you decided to change the batch size).
I also created a log table so that I could track what occurred and when, etc.
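Since the original code is gone, here is a rough skeleton of that design; every object and column name below is made up:

CREATE TABLE dbo.BatchControl
(
    BatchSize INT NOT NULL,        -- rows per batch; can be changed mid-run
    TotalRecords BIGINT NOT NULL,  -- total number of records to process
    Active_Inactive BIT NOT NULL   -- set to 0 to stop before the next batch
)

CREATE TABLE dbo.BatchLog
(
    LogID INT IDENTITY(1,1) PRIMARY KEY,
    RowsCopied INT NOT NULL,
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
)

-- Inside the stored procedure:
DECLARE @BatchSize INT, @Active BIT, @Rows INT

SELECT @BatchSize = BatchSize, @Active = Active_Inactive
FROM dbo.BatchControl

WHILE @Active = 1
BEGIN
    -- ... copy one batch of @BatchSize rows here ...
    SET @Rows = @@ROWCOUNT  -- row count from the batch statement above

    INSERT INTO dbo.BatchLog (RowsCopied) VALUES (@Rows)

    -- Re-read the control row so a batch-size change, or the Active
    -- flag being flipped from another session, takes effect next pass.
    SELECT @BatchSize = BatchSize, @Active = Active_Inactive
    FROM dbo.BatchControl
END

The point of re-reading the control row on every pass is that you can throttle or stop a multi-hour copy from another session without killing it.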
I hope this helps.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 9, 2011 at 1:06 am
sunny.tjk (9/8/2011)
We are planning to migrate from SQL 2000 to SQL 2008. One of the tables in a data warehouse database has 4 billion records. Instead of doing a backup/restore, we are planning to populate the new tables in SQL 2008 using SELECT * INTO, because only selected tables are going to 2008. I'm not 100% sure that all 4 billion records can be populated in a single shot. My major concern is: what if it breaks in the middle and I have to start over again?
Is it possible to build a query that populates 100,000 rows at a time? Even if it breaks in the middle, it should let me start from where it stopped.
Thanks.
What does the PK look like on that table?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 9, 2011 at 2:06 am
What about SSIS? I see it as the most sensible option here.
-- Gianluca Sartori
September 9, 2011 at 2:42 am
You've probably thought of this, but... depending on what proportion of the tables and/or data are going to be retained, another option is to do a backup/restore and just drop the tables you don't want.
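In T-SQL that amounts to a plain restore followed by drops; the file and table names here are hypothetical:

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse.bak'
WITH MOVE 'MyWarehouse_Data' TO 'D:\Data\MyWarehouse.mdf',
     MOVE 'MyWarehouse_Log' TO 'D:\Logs\MyWarehouse.ldf',
     RECOVERY

-- Then prune whatever isn't coming along to 2008.
DROP TABLE dbo.UnwantedTable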
September 9, 2011 at 2:53 am
Would it not be far easier to restore your backups (I assume you are taking backups of some sort: FULL + DIFF + TRNs) from the current SQL 2000 instance onto the SQL 2008 instance and carry out your archiving once migrated (using partitioning?)?
You could restore your full + diff, leave the db non-recovered, and apply the trns up to the point of migration. This option means you can leave your full backup restoring for as long as it needs; there's no rush to do it all in an outage or "window of opportunity".
If you start deleting data from your pre-migration db, how are you going to get it back if you have accidentally removed something you need later? I would go for the option that gives me a solid backout plan, which to me sounds like the backup + restore option. Unless of course I'm missing something completely obvious!
Just a thought....
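Roughly, with hypothetical backup file names:

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_full.bak'
WITH NORECOVERY  -- leave the db restoring so later backups still apply

RESTORE DATABASE MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_diff.bak'
WITH NORECOVERY

-- Apply every log backup taken since the diff, in order...
RESTORE LOG MyWarehouse
FROM DISK = 'D:\Backups\MyWarehouse_log_001.trn'
WITH NORECOVERY

-- ...and only at the cutover bring the database online.
RESTORE DATABASE MyWarehouse WITH RECOVERY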
September 11, 2011 at 6:52 pm
Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time. If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and restore..
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008
    (
        PersonID  -- must be copied, or the NOT EXISTS below never filters anything
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[Source] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination d
        WHERE d.PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK
END
Thank You,
Best Regards,
SQLBuddy
September 11, 2011 at 7:46 pm
sqlbuddy123 (9/11/2011)
Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time. If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and restore..
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008
    (
        PersonID  -- must be copied, or the NOT EXISTS below never filters anything
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[Source] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination d
        WHERE d.PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK
END
Thank You,
Best Regards,
SQLBuddy
Not sure what you meant by the second table?
September 11, 2011 at 7:57 pm
Welsh Corgi (9/8/2011)
Depending upon what you are doing, you could partition the table(s) and perform a merge/split. I did this a few times when that was not an option, but I don't have the code.
I created a control table where I stored the batch size.
I also had a column that indicated the total number of records to process.
I also had a column to store a boolean value (Active_Inactive) so that I could determine whether to terminate the program before it processed the next batch.
I created a stored procedure with a WHILE loop; on each pass it would check the Active column as well as the batch size (in the event that you decided to change the batch size).
I also created a log table so that I could track what occurred and when, etc.
I hope this helps.
Sounds like a plan, but a little complicated for me... 😛
September 11, 2011 at 9:18 pm
sunny.tjk (9/11/2011)
Not sure what you meant by the second table?
Quite literally... make a copy of the table at the schema level. Generate the CREATE TABLE statement for the original table, change the name of the table slightly, and create the "second table" with it.
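For instance, reusing the hypothetical Person columns from the batch example above (the column types are guesses):

-- Schema-only copy of the original table, renamed, with an identity
-- column added so every copied row gets a restartable ID.
CREATE TABLE dbo.Person_Copy
(
    RowID BIGINT IDENTITY(1,1) NOT NULL,
    FirstName VARCHAR(50) NULL,
    LastName VARCHAR(50) NULL,
    EmailAddress VARCHAR(100) NULL,
    PhoneNumber VARCHAR(20) NULL
)

INSERT INTO dbo.Person_Copy (FirstName, LastName, EmailAddress, PhoneNumber)
SELECT FirstName, LastName, EmailAddress, PhoneNumber
FROM dbo.Person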
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2011 at 11:03 pm
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it.
Thank You,
Best Regards,
SQLBuddy
September 12, 2011 at 9:56 am
sqlbuddy123 (9/11/2011)
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it. Thank You,
Best Regards,
SQLBuddy
But should I create that table on the primary server or the destination server? I'm a little confused, since you replied to my post saying "Create a second table on the primary server with an identity column, so that each record can be uniquely identified, and then start the insert in batches of 10,000 at a time.
If it fails, you can start the insert again based on the row ID. It should look something like this. But this is a bit tedious, unlike backup and
restore.."
Thanks,
Sunny.
September 12, 2011 at 2:16 pm
sqlbuddy123 (9/11/2011)
Jeff got it perfectly right. Just use CREATE TABLE and add an identity column to it. Thank You,
Best Regards,
SQLBuddy
I'm still confused about whether I should choose this approach, because I think a cross-server query will hurt performance,
i.e. SELECT * INTO destinationtable
FROM remoteserver.dbname.dbo.sourcetablename
September 12, 2011 at 4:58 pm
Create the second table on the primary server itself, which will be quicker and easier. Creating it on the destination server defeats the purpose.
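Putting the pieces together, a sketch of the batched cross-server pull, run from the SQL 2008 side; the linked server name, the staging table, and its RowID identity column are all made up:

DECLARE @Start BIGINT, @BatchSize BIGINT, @MaxID BIGINT
SET @Start = 1
SET @BatchSize = 10000

-- The staging copy (the "second table" with the identity column)
-- lives on the SQL 2000 server.
SELECT @MaxID = MAX(RowID)
FROM remoteserver.dbname.dbo.Person_Copy

WHILE @Start <= @MaxID
BEGIN
    INSERT INTO dbo.Destination (FirstName, LastName, EmailAddress, PhoneNumber)
    SELECT FirstName, LastName, EmailAddress, PhoneNumber
    FROM remoteserver.dbname.dbo.Person_Copy
    WHERE RowID BETWEEN @Start AND @Start + @BatchSize - 1

    -- If the run dies, note the last committed @Start and resume from it.
    SET @Start = @Start + @BatchSize
END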
Thank You,
Best Regards,
SQLBuddy