Importing data performance

  • We are in the process of importing millions of rows of data into an existing database. To speed up the process I was going to switch the database from full recovery to simple recovery and drop all the indexes. I would recreate the indexes after the import and switch back to full recovery. Is this a good idea, and how do I script out the existing indexes for recreation?
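
    A minimal sketch of the recovery-model part of that plan (MyDB and the backup path are placeholders; the index drop/recreate and the load itself would go in between):

    -- Sketch only: MyDB is a placeholder database name.
    ALTER DATABASE MyDB SET RECOVERY SIMPLE;

    -- ...drop indexes, import the data, recreate the indexes...

    ALTER DATABASE MyDB SET RECOVERY FULL;
    -- Switching back breaks the log backup chain, so take a full (or differential)
    -- backup afterwards before resuming log backups.
    BACKUP DATABASE MyDB TO DISK = N'D:\Backups\MyDB_after_import.bak';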

  • Was my question that stupid? lol

  • smitty-1088185 (4/29/2011)


    Was my question that stupid? lol

    Of course not! I would recommend inserting such a huge record set in batches; otherwise your recovery model setting (and the rest of your plan) won't do what you expect.

    SQL Server will treat the whole load as a single transaction, and that transaction will remain active until the insert completes, so even if your recovery model is simple the log will keep growing and cannot be truncated.

    So it is better to divide the load into batches of, let's say, 100,000 rows and measure the time and resources each batch consumes. If everything seems to be under control, try another batch and continue like that (see the sketch at the end of this post).

    If you expect the table to be very big, you can also look at partitioning, which could help boost your query performance.
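
    A minimal sketch of that batching pattern, assuming a hypothetical staging table dbo.ImportStage feeding dbo.TargetTable on a key column Id (all names are placeholders):

    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        -- Copy the next 100,000 rows that are not in the target yet.
        INSERT INTO dbo.TargetTable (Id, Col1, Col2)
        SELECT TOP (100000) s.Id, s.Col1, s.Col2
        FROM dbo.ImportStage AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id)
        ORDER BY s.Id;

        SET @Rows = @@ROWCOUNT;   -- stop when nothing is left to copy
    END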

  • Dropping all indexes may help with the inserts, but then again it might not. It really depends on whether any of them are needed for lookup operations during the inserts. If not, yes, dropping them and recreating them is the way to go. To script them out you can use the scripting utility built into Management Studio: right-click the database and select Tasks, Generate Scripts...

    You already know about simple recovery from the previous post.
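
    If you would rather query the metadata than use the wizard, here is a rough sketch that lists the nonclustered indexes and their key columns, so you know what to script out and recreate after the load (you would still need the full definitions, including included columns and filters, before dropping anything):

    SELECT  OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
            OBJECT_NAME(i.object_id)        AS TableName,
            i.name                          AS IndexName,
            STUFF((SELECT ', ' + c.name
                   FROM sys.index_columns AS ic
                   JOIN sys.columns AS c
                     ON c.object_id = ic.object_id AND c.column_id = ic.column_id
                   WHERE ic.object_id = i.object_id
                     AND ic.index_id = i.index_id
                     AND ic.is_included_column = 0
                   ORDER BY ic.key_ordinal
                   FOR XML PATH('')), 1, 2, '') AS KeyColumns
    FROM sys.indexes AS i
    WHERE i.type = 2                                         -- nonclustered only
      AND OBJECTPROPERTY(i.object_id, 'IsMSShipped') = 0     -- skip system objects
    ORDER BY SchemaName, TableName, IndexName;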

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I would leave the database at full recovery and separate the insert into smaller batches. Then perform a transaction log backup between the batches.

    If you plan on going ahead with the simple recovery plan, then make sure you perform a full backup before and after.
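
    A minimal sketch of the full-recovery variant, with placeholder database name and backup paths:

    -- Sketch only: MyDB and the DISK paths are placeholders.
    BACKUP DATABASE MyDB TO DISK = N'D:\Backups\MyDB_before_import.bak';   -- full backup first

    -- ...insert one batch of rows...

    BACKUP LOG MyDB TO DISK = N'D:\Backups\MyDB_import_01.trn';            -- log backup between batches

    -- ...next batch, next log backup, and so on...

    BACKUP DATABASE MyDB TO DISK = N'D:\Backups\MyDB_after_import.bak';    -- full backup when done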

    -------------------------------------------------
    Will C,
    MCITP 2008 Database Admin, Developer

  • Did you drop the clustered index too?

    You can script out all the indexes: right-click a specific database in Management Studio, then check the attachment for the further steps.

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

  • Very well said, Sachnam.

    Follow his advice. You should never insert millions of records at a time; insert them in small batches.

    Go with the simple recovery model and small batches.

    Thank You,

    Best Regards,

    SQLBuddy

  • WOW! That is great information from everyone. You guys rock!!! I will go with simple recovery and batch jobs with backups in between. Now I have to figure out how to separate the data into smaller chunks 😉

  • While you don't specify where the data you're importing is coming from (flat file, another table, etc.), you should use a WHILE loop to batch-process your records. You can try something like this:

    SET NOCOUNT ON;

    /* Declarations */
    DECLARE @BatchSize int, @PhyDB sysname, @Table varchar(25),
            @Schema varchar(3), @sql nvarchar(500), @Error int,
            @RowCount int, @LoopCount int;

    /* Set the variables (@PhyDB, @Schema and @Table are there for you to use
       when you build the dynamic statement below) */
    SELECT @PhyDB = 'MyDB', @Schema = 'dbo', @Table = 'MyTable',
           @BatchSize = 10000, @RowCount = 0, @LoopCount = 0;

    /* Create the statement -- it should SELECT and INSERT at most @BatchSize rows
       per execution (e.g. SELECT TOP (10000) from your source) */
    SET @sql = 'SQL Statement to SELECT and INSERT your data ';

    WHILE @BatchSize <> 0
    BEGIN
        /* Insert one batch of data */
        EXEC sp_executesql @sql;

        SELECT @Error = @@ERROR, @BatchSize = @@ROWCOUNT;
        SELECT @RowCount = @RowCount + @BatchSize, @LoopCount = @LoopCount + 1;

        IF @LoopCount = 10
        BEGIN
            CHECKPOINT;
            PRINT('CHECKPOINT REACHED (100,000 rows inserted)');
            SET @LoopCount = 0;
        END

        PRINT('Records Inserted: ' + CAST(@BatchSize as varchar(25)) + ', Total Count: ' + CAST(@RowCount as varchar(25)));
    END

    SET NOCOUNT OFF;
    GO

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • If you drop and recreate the indexes, you might want to recompute the statistics.
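
    A minimal sketch of that step, assuming the imported table is dbo.MyTable (a placeholder). Note that rebuilding or creating an index already refreshes that index's statistics with a full scan, so this mostly matters for the remaining column statistics:

    -- Sketch: refresh statistics after the load. dbo.MyTable is a placeholder.
    UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;

    -- Or refresh statistics database-wide:
    EXEC sp_updatestats;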

  • Syed Jahanzaib Bin Hassan (4/30/2011)


    Did you drop the clustered index too? ...

    Syed,

    You remarked about dropping the clustered index. I've seen the subject of inserting into a heap come up before and never seen a definitive answer.

    Question, anyone:

    Is bulk insertion into a heap more efficient than bulk insertion into a clustered index?

    LC

  • crainlee2 (5/3/2011)


    Question, anyone:

    Is bulk insertion into a heap more efficient than bulk insertion into a clustered index?

    Depends on a few things.

    One thing it will depend on is whether you're inserting into an empty table or not. In particular:

    If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if a table has a clustered index and is non-empty, data pages and index pages are both fully logged regardless of the recovery model

    Since this usually only applies to staging tables, it's worth knowing about, but it probably doesn't affect what you're dealing with.

    Check out this link for the full ruleset:

    http://msdn.microsoft.com/en-us/library/ms191244.aspx

    Another thing that will matter heavily is how your clustered index is organized. If you're inserting all over the place, whether dropping and rebuilding is better than just working through it depends on the data volume. If you're only inserting at the tail, it doesn't matter as much.

    The big deal about having the clustered index in place is page splitting during large inserts. The reason you can't find a definitive answer is "It Depends." 🙂

    EDIT: Bad brain, almost reintroduced a myth, sorry.
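
    A sketch of the pattern those rules reward, assuming the target can be locked for the load and the database is in SIMPLE or BULK_LOGGED recovery (table and column names are placeholders):

    -- Under SIMPLE or BULK_LOGGED recovery, INSERT...SELECT WITH (TABLOCK) into a heap
    -- (or an empty table with a clustered index) can qualify for minimal logging.
    INSERT INTO dbo.TargetTable WITH (TABLOCK) (Id, Col1, Col2)
    SELECT s.Id, s.Col1, s.Col2
    FROM dbo.ImportStage AS s;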


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Thanks, Craig.
