Slow process

  • I have a table of 82,439,877 records. I want to create a clustered index and modify a few existing indexes on this table. When I try to do so it runs for around 15-18 hours and then I had to stop it. Any suggestions for making this faster?

  • When you create a new clustered index, you are rebuilding the table.  The entire table is copied to build the clustered index, and on 82M records this will take some time.

    You can use the SORT_IN_TEMPDB index option to speed it up a bit, but it will still take hours.
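
    For example, a minimal sketch (the table, column, and index names are hypothetical, not from this thread):

    -- SORT_IN_TEMPDB = ON performs the index build's sort runs in tempdb,
    -- spreading the I/O if tempdb sits on separate drives.
    CREATE CLUSTERED INDEX CIX_BigTable_ID
        ON dbo.BigTable (ID)
        WITH (SORT_IN_TEMPDB = ON);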

    You can try exporting your data, sorting it, then loading it into a new table.  This will still take many hours, but can be done alongside normal use of your existing table.  However, you will have to apply any insert/update/delete (IUD) activity from your live table to your new table before swapping the table names.

    Original author: SQL FineBuild (https://github.com/SQL-FineBuild/Common/wiki/) - 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • One option, with a backup in place, is to use a reorg scenario.  Unload the data to a flat file, sort it in descending key sequence, build just the clustered index, and then reload the data without logging.  After the data is loaded, recreate the other indexes.  A rough sketch of that sequence follows.
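
    This is only a sketch under assumed names (dbo.BigTableNew, [Key], the file path), not the poster's exact procedure:

    -- Assumes the data was already unloaded to a flat file presorted on the key,
    -- e.g. with: bcp "SELECT * FROM dbo.BigTable ORDER BY [Key] DESC" queryout ...

    -- 1. Build just the clustered index while the target table is still empty.
    CREATE CLUSTERED INDEX CIX_BigTableNew_Key ON dbo.BigTableNew ([Key] DESC);

    -- 2. Reload with minimal logging: an empty indexed target plus TABLOCK
    --    qualifies, and the ORDER option skips re-sorting presorted data.
    BULK INSERT dbo.BigTableNew FROM 'C:\unload\BigTable.dat'
        WITH (DATAFILETYPE = 'native', TABLOCK, ORDER ([Key] DESC));

    -- 3. Recreate the remaining indexes only after the load completes.
    CREATE NONCLUSTERED INDEX IX_BigTableNew_OtherCol ON dbo.BigTableNew (OtherCol);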

  • Here are all the things that need to be true to do minimally logged bulk inserts, from BOL (a sketch follows the list)...

    • The recovery model is simple or bulk-logged.
    • The target table is not being replicated.
    • The target table does not have any triggers.
    • The target table has either 0 rows or no indexes.
    • The TABLOCK hint is specified. For more information, see Controlling the Locking Behavior.
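
    As a hedged illustration of meeting those conditions (the database and table names are placeholders):

    -- Switch to bulk-logged recovery for the duration of the load.
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    -- TABLOCK on the target is the hint the last condition refers to.
    INSERT INTO dbo.BigTableNew WITH (TABLOCK)
    SELECT * FROM dbo.BigTable;

    -- Switch back afterwards and take a log backup to restart the log chain.
    ALTER DATABASE MyDatabase SET RECOVERY FULL;
    BACKUP LOG MyDatabase TO DISK = 'C:\backup\MyDatabase.trn';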

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Depending on your HD capacity, you might create a new table with the same column structure and with the proper indexes you want, and execute

    Insert into [newtable] select * from [oldtable]

    What's more, if you have a unique identifier column, you may execute this statement in groups:

    Insert into [newtable] select * from [oldtable] where [UI] between 1 and 1000000

    So you have control over it.  The problem is the HD capacity (you might need 3 times the space the table occupies on the HD), but it should take about as long as a plain [select * from] statement would.  A loop over those ranges might look like the sketch below.
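
    This is just an illustrative sketch; the batch size and the assumption that [UI] is a dense integer key are mine:

    DECLARE @lo BIGINT, @max BIGINT, @batch BIGINT
    SELECT @lo = 1, @batch = 1000000, @max = MAX([UI]) FROM [oldtable]

    WHILE @lo <= @max
    BEGIN
        INSERT INTO [newtable]
        SELECT * FROM [oldtable]
        WHERE [UI] BETWEEN @lo AND @lo + @batch - 1

        SET @lo = @lo + @batch   -- next range; keeps each transaction small
    END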

    Later you execute

    drop table [oldtable]

    and finally execute

    sp_rename 'newtable', 'oldtable'

    (Note: the new name passed to sp_rename must be unqualified and unquoted; writing '[oldtable]' would make the brackets part of the new name.)

    So, what do you think?

    best regards

    Jorge Escobar

  • I would use Integration Services...


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • Very valuable suggestion.

    The only problem is that the statement

    Insert into [newtable] select * from [oldtable]

    will lock [oldtable] for the entire insertion period.

    If you apply (NOLOCK) or do it in groups, you need to take care of rows being updated during that long-running insert.

    I would create a timestamp column on [oldtable] and copy its values into a corresponding varbinary(8) column in [newtable].

    Then I'd perform the (NOLOCK) copy.

    After that I'd start a loop for:

    - UPDATE rows where oldtable.timestamp <> newtable.timestampcopy (updated since copying started),

    - DELETE from the new table rows that have been deleted from [oldtable], and

    - INSERT rows that have been inserted since.

    After each statement I'd collect the number of affected rows using @@ROWCOUNT.

    If the total number is > 0, repeat the loop.

    Else: begin a transaction, drop [oldtable], rename [newtable], drop the timestampcopy column, commit.  A sketch of the loop is below.
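
    A hedged sketch of that loop, assuming [UI] is the unique key, [ts] is the timestamp column on [oldtable], [tscopy] is its varbinary(8) copy on [newtable], and [SomeCol] stands in for the data columns:

    DECLARE @changes INT
    SET @changes = 1
    WHILE @changes > 0
    BEGIN
        SET @changes = 0

        -- 1. Re-copy rows modified since the initial copy.
        UPDATE n
        SET    n.[SomeCol] = o.[SomeCol],   -- repeat for every data column
               n.[tscopy]  = o.[ts]
        FROM   [newtable] n
        JOIN   [oldtable] o ON o.[UI] = n.[UI]
        WHERE  o.[ts] <> n.[tscopy]
        SET @changes = @changes + @@ROWCOUNT

        -- 2. Remove rows deleted from the source.
        DELETE n
        FROM   [newtable] n
        WHERE  NOT EXISTS (SELECT 1 FROM [oldtable] o WHERE o.[UI] = n.[UI])
        SET @changes = @changes + @@ROWCOUNT

        -- 3. Pick up rows inserted since the copy.
        INSERT INTO [newtable] ([UI], [SomeCol], [tscopy])
        SELECT o.[UI], o.[SomeCol], o.[ts]
        FROM   [oldtable] o
        WHERE  NOT EXISTS (SELECT 1 FROM [newtable] n WHERE n.[UI] = o.[UI])
        SET @changes = @changes + @@ROWCOUNT
    END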

    P.S. It should not take 18 hours or more. I'm more than sure that when you do the reindexing you are getting into a deadlock situation. That's why it cannot complete the task.

    _____________
    Code for TallyGenerator

  • Apart from other suggestions, make sure you have not got AutoShrink on - I have seen this get locked up when doing massive operations.

    Remember that data is stored in the order of the clustered index, so adding one means moving all the actual data to the right position in the index B-tree.

    One of the fastest ways I have come across to do this is to BCP out the data in the order of the proposed clustered index and load it back in with the BCP flag indicating the data is in order (can't recall the BCP flags offhand).
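
    The flag being described is presumably bcp's -h hint with an ORDER clause (my assumption; the server, database, file, and key names below are placeholders):

    REM Reload a natively unloaded file, telling bcp it is presorted on the
    REM clustered key; adding TABLOCK also allows a minimally logged load.
    bcp MyDB.dbo.BigTableNew in BigTable.dat -n -T -S MyServer -h "ORDER(ID ASC), TABLOCK"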
