Inserting records into a large table

  • Let's say I have a table with 20 million records, and a process that adds about 50,000 records to it every day. Before the insert I drop the indexes on the table, and after the insert I rebuild them. Is dropping and re-creating the indexes always the best approach in this situation? What if it takes only 10 minutes to insert the extra records with the indexes intact, but up to 3 hours to rebuild the indexes after the insert? Is it still necessary to drop and re-create the indexes? Is it necessary to redo the statistics on the tables too?

  • How many indexes do you have, and in what order are you dropping and rebuilding them? (Include whether each index is clustered or nonclustered.)

  • Where I work, I started this same type of operation based on other people's suggestions while our tables were smaller. After a while, I realized it was not the best approach for our system. We have tables with approximately half a billion records. Our inserts range from one million to four million rows and take anywhere from a few minutes to 20 minutes. We are on a SAN with many drives. These particular large tables have a clustered index and just 3 non-clustered indexes.

    The only way to determine what will work best is to analyze your indexes. The more non-clustered indexes you have, the longer the inserts will take to complete. If you do decide to drop and re-create the indexes, make sure you create the clustered index first (if you have one), since it physically sorts the data. The non-clustered indexes point to the physical location, so creating the clustered index first will avoid extra work.
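The drop-then-rebuild pattern described above can be sketched as follows. This is a minimal illustration only, using Python's sqlite3 as a stand-in for SQL Server (SQLite has no clustered indexes; an INTEGER PRIMARY KEY, stored in rowid order, is the closest analogue). The table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# INTEGER PRIMARY KEY plays the role of the clustered index here.
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, cust INTEGER, amt REAL)")
cur.execute("CREATE INDEX ix_sales_cust ON sales(cust)")

# Bulk-load pattern: drop the nonclustered-style indexes first...
cur.execute("DROP INDEX ix_sales_cust")
cur.executemany(
    "INSERT INTO sales (cust, amt) VALUES (?, ?)",
    [(i % 100, i * 0.5) for i in range(50_000)],
)
# ...then re-create them after the load completes, so each index is
# built once over sorted data instead of maintained row by row.
cur.execute("CREATE INDEX ix_sales_cust ON sales(cust)")
conn.commit()

row_count = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(row_count)  # 50000
```

On SQL Server the equivalent steps would be DROP INDEX / bulk insert / CREATE INDEX, creating the clustered index before the nonclustered ones for the reason given above.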

  • richard garth (11/5/2011)


    What if it takes only 10 minutes to insert the extra records with the indexes intact, but up to 3 hours to rebuild the indexes after the insert?

    Gosh, Richard. I'm surprised you even asked that question. Based on the numbers you gave, which do YOU think is the more effective of the two methods? 😉

    Shifting gears... it shouldn't take 10 minutes to insert 50,000 rows into a table, no matter how big the table is. Such long times are usually caused by the clustered index being on the wrong columns for quick inserts, or by poorly formed non-clustered indexes on columns with very few unique values.

    There are techniques to help avoid such long inserts: a clustered index based on something temporal, partitioned views and partitioned tables, and NOT using such a large table for reporting purposes, which means you don't need so many non-clustered indexes (use "Divide'n'Conquer" methods to quickly isolate the current report data in a temp table).
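The "Divide'n'Conquer" idea above can be sketched roughly as follows, again using sqlite3 as a stand-in and hypothetical table/column names. The point is to copy just the current reporting slice into a small temp table once, then run all report queries against that instead of the big table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, sale_date TEXT, amt REAL)")
cur.executemany(
    "INSERT INTO big_table (sale_date, amt) VALUES (?, ?)",
    [("2011-%02d-01" % (i % 12 + 1), float(i)) for i in range(10_000)],
)

# Divide'n'Conquer: isolate only the current period in a temp table,
# so the big table needs no extra reporting indexes.
cur.execute(
    "CREATE TEMP TABLE report_data AS "
    "SELECT id, sale_date, amt FROM big_table WHERE sale_date = '2011-11-01'"
)
n = cur.execute("SELECT COUNT(*) FROM report_data").fetchone()[0]
print(n)  # 833
```

On SQL Server this would be a SELECT ... INTO #report_data with whatever WHERE clause isolates the current report range.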

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • richard garth (11/5/2011)


    Is it always the best thing to drop and re-create indexes in this situation?

    No, not always. The main advantages of dropping and re-creating indexes are:

    (1) It can often be faster, especially if parallelism is used and the index rebuild is minimally logged.

    (2) Rebuilt nonclustered indexes will likely be denser and more contiguous.

    It is perfectly possible that a query plan that maintains the nonclustered indexes while modifying rows in the table will execute faster, but even with a wide update plan the physical index structure will likely be stored less optimally than if the index were rebuilt.
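The trade-off being weighed here (maintain indexes during the insert vs. drop, load, and rebuild) can be measured directly. Below is a hedged sketch using sqlite3; the table name, row counts, and data are invented, and which approach wins on a real SQL Server table depends on the factors described above, so no timing result is asserted:

```python
import sqlite3
import time

def bulk_insert(drop_first: bool) -> float:
    """Time a bulk load, either keeping the index in place or
    dropping it first and rebuilding afterwards."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    cur.execute("CREATE INDEX ix_val ON t(val)")
    rows = [(i * 7919 % 100_000,) for i in range(100_000)]
    start = time.perf_counter()
    if drop_first:
        cur.execute("DROP INDEX ix_val")
    cur.executemany("INSERT INTO t (val) VALUES (?)", rows)
    if drop_first:
        cur.execute("CREATE INDEX ix_val ON t(val)")
    conn.commit()
    return time.perf_counter() - start

keep = bulk_insert(drop_first=False)
rebuild = bulk_insert(drop_first=True)
print(f"keep indexes: {keep:.3f}s, drop+rebuild: {rebuild:.3f}s")
```

Running both variants against a copy of the real table (with realistic row counts) is the honest way to answer the original question for a specific system.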

    richard garth (11/5/2011)


    Is it necessary to redo the statistics on the tables too?

    If the changes significantly affect the distribution of values in the table, yes; you cannot necessarily rely on auto-update to take care of this for you.
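Refreshing statistics explicitly after the load is a one-line step. As a rough sketch, again using sqlite3 (where ANALYZE is the analogue of SQL Server's UPDATE STATISTICS, and the table/index names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
cur.execute("CREATE INDEX ix_val ON t(val)")
cur.executemany("INSERT INTO t (val) VALUES (?)", [(i % 50,) for i in range(5_000)])

# After a bulk load that shifts the value distribution, refresh the
# optimizer statistics explicitly instead of relying on auto-update.
# (SQL Server equivalent: UPDATE STATISTICS dbo.t;)
cur.execute("ANALYZE")
stats = cur.execute("SELECT tbl, idx FROM sqlite_stat1").fetchall()
print(stats)  # sqlite_stat1 now holds per-index row estimates for t
```

In SQL Server, UPDATE STATISTICS (optionally WITH FULLSCAN) on the affected tables right after the insert serves the same purpose.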
