April 28, 2011 at 1:21 pm
We are in the process of importing millions of rows of data into an existing database. To speed up the process I was going to switch the database from full recovery to simple mode and drop all the indexes. I would recreate the indexes after the import and switch back to Full mode. Is this a good idea and how do I script out the existing indexes for recreation?
April 29, 2011 at 2:08 am
Was my question that stupid? lol
April 29, 2011 at 2:39 am
smitty-1088185 (4/29/2011)
Was my question that stupid? lol
Of course not! I would recommend inserting such a huge record set in batches; otherwise your recovery model setting won't help you.
SQL Server will treat the whole insert as a single transaction, and that transaction remains active until the insertion completes, so even under the simple recovery model the log will keep growing and won't get truncated.
So it's better to divide the load into batches of, let's say, 100,000 rows, and measure the time and resources each one consumes; if everything seems to be under control, try another batch and continue like that.
If you expect the table to be very big, you can also look at partitioning, which could help boost your query performance.
April 29, 2011 at 5:47 am
Dropping all indexes may help with the inserts, but then again it might not. It really depends on whether any of them are necessary for lookup operations during the inserts. If not, then yes, dropping them and recreating them is the way to go. To script them out you can use the scripting utility built into Management Studio: right-click the database and select Tasks > Generate Scripts...
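If you'd rather take an inventory with T-SQL before trusting the wizard, the catalog views can at least list what you're about to drop. A sketch (assumes SQL Server 2005 or later; it shows nonclustered indexes and their key columns, not the full CREATE INDEX text):

```sql
-- Inventory nonclustered indexes and their key columns before dropping them.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        c.name                   AS column_name,
        ic.key_ordinal
FROM    sys.indexes i
JOIN    sys.index_columns ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN    sys.columns c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE   i.type = 2               -- nonclustered only
  AND   i.is_primary_key = 0    -- leave the PK structure alone
ORDER BY table_name, index_name, ic.key_ordinal;
```

The Generate Scripts wizard is still the easier route for a complete, faithful script (included columns, filegroups, index options); this is just a quick safety net.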
You already know about simple recovery from the previous post.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 29, 2011 at 11:12 am
I would leave the database at full recovery and separate the insert into smaller batches, then perform a transaction log backup between the batches.
If you plan on going ahead with simple recovery, then make sure you perform a full backup before and after.
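That batch-plus-log-backup pattern might look roughly like this (a sketch only: MyDB, the backup path, and dbo.InsertNextBatch are placeholders for your own database, file, and batch logic):

```sql
-- Insert in batches, backing up the log after each one so log space can be reused.
DECLARE @i int = 1;
WHILE @i <= 10
BEGIN
    EXEC dbo.InsertNextBatch;           -- placeholder: loads the next slice of rows

    BACKUP LOG MyDB
        TO DISK = N'D:\Backups\MyDB_log.trn'
        WITH NOINIT;                    -- append each log backup to the same file

    SET @i += 1;
END
```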
-------------------------------------------------
Will C,
MCITP 2008 Database Admin, Developer
April 30, 2011 at 12:09 am
Did you drop the clustered index too?
You can script out all the indexes: right-click on the specific database, then check the attachment for further steps.
Regards,
Syed Jahanzaib Bin Hassan
MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA
My Blog
May 1, 2011 at 8:14 am
Very well said, Sachnam....
Follow his advice. You should never insert millions of records at a time. Insert them in small batches.
Go with the simple recovery model and small batches.
Thank You,
Best Regards,
SQLBuddy
May 2, 2011 at 4:33 am
WOW! That is great information from everyone. You guys rock!!! I will go with simple and batch jobs with backups in between. Now I have to figure out how to separate the data into smaller chunks 😉
May 2, 2011 at 2:13 pm
You don't specify where the data you're importing is coming from (flat file, another table, etc.), but you can use a WHILE loop to process your records in batches. You can try something like this:
SET NOCOUNT ON
GO
/* Declarations */
DECLARE
@BatchSize int, @PhyDB sysname, @Table varchar(25),
@Schema varchar(3), @sql nvarchar(500), @Error int,
@RowCount int, @LoopCount int
/* Set the variables */
SELECT
@PhyDB = 'MyDB', @Schema = 'dbo', @Table = 'MyTable',
@BatchSize = 10000, @RowCount = 0, @LoopCount = 0
/* Create the statement -- it should SELECT and INSERT one batch of @BatchSize rows */
SET @sql = 'SQL Statement to SELECT and INSERT your data '
WHILE @BatchSize <> 0
BEGIN
/* Insert one batch of data */
EXEC sp_executesql @sql
SELECT @Error = @@ERROR, @BatchSize = @@ROWCOUNT
IF @Error <> 0 BREAK /* stop if the batch failed */
SELECT @RowCount = @RowCount + @BatchSize, @LoopCount = @LoopCount + 1
IF @LoopCount = 10
BEGIN
CHECKPOINT
PRINT('CHECKPOINT REACHED (100,000 rows inserted)')
SET @LoopCount = 0
END
PRINT ('Records inserted: ' + CAST(@BatchSize as varchar(25)) + ', Total count: ' + CAST(@RowCount as varchar(25)))
END
SET NOCOUNT OFF
GO
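If the source is another table with an ascending key, one concrete way to fill in that placeholder statement (a sketch; dbo.SourceTable, dbo.TargetTable, and the Id column are hypothetical names):

```sql
-- Copy rows in fixed-size slices; each slice commits as its own transaction.
SET @sql = N'
    INSERT INTO dbo.TargetTable (Id, Col1, Col2)
    SELECT TOP (10000) s.Id, s.Col1, s.Col2
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id)
    ORDER BY s.Id;'
```

The NOT EXISTS check makes the loop restartable, but it gets slower as the target grows; tracking the last copied Id in a variable and filtering on `s.Id > @LastId` scales better.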
______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience.
May 3, 2011 at 2:46 pm
If you drop and recreate the indexes you might want to recompute statistics
May 3, 2011 at 4:36 pm
Syed Jahanzaib Bin hassan (4/30/2011)
Regards,
Syed Jahanzaib Bin Hassan
MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA
My Blog
http://www.aureus-salah.com
Syed,
You remarked about dropping the clustered index. I've seen the subject of inserting into a heap come up before and never seen a definitive answer.
Question, anyone:
Is bulk insertion into a heap more efficient than bulk insertion into a clustered index?
LC
May 3, 2011 at 5:29 pm
crainlee2 (5/3/2011)
Question, anyone:Is bulk insertion into a heap more efficient than bulk insertion into a clustered index?
Depends on a few things.
One is it will depend on if you're inserting to a clean table or not. In particular:
"If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if a table has a clustered index and is non-empty, data pages and index pages are both fully logged regardless of the recovery model."
Since this usually only applies to staging tables, it's worth knowing about, but it's probably not a factor in what you're dealing with.
Check out this link for the full ruleset:
http://msdn.microsoft.com/en-us/library/ms191244.aspx
Another thing that matters heavily is how your clustered index is organized. If you're inserting all over the place, whether a rebuild is better than just working through it will depend on data volume. If you're only inserting at the tail, it doesn't matter as much.
The big deal about the clustered index existence is page splitting during large inserts. The reason you can't find a definitive answer is "It Depends." 🙂
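For the empty-target case in that link, the minimally logged path is typically requested with a table lock. A sketch (assumes the simple or bulk-logged recovery model on SQL Server 2008 or later; table names are hypothetical):

```sql
-- INSERT...SELECT WITH (TABLOCK) into a heap (or an empty clustered table)
-- can qualify for minimal logging, keeping the log small during the load.
INSERT INTO dbo.TargetHeap WITH (TABLOCK) (Id, Col1)
SELECT Id, Col1
FROM dbo.SourceTable;
```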
EDIT: Bad brain, almost reintroduced a myth, sorry.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
May 4, 2011 at 10:38 am
Thanks, Craig.
Viewing 13 posts - 1 through 12 (of 12 total)