Index Rebuild - Job running over 16 hours.....

  • So I have a large heap table. I has over 40 million rows, 250 GB 7.2 GB of index space.

    Data is inserted daily - all day every day. The oldest data is rolled off nightly. The non-clustered indexes become fragmented though the week and I rebuild the indexes on the weekend. Right now (Friday) 3 of the indexes have > 40% fragmentation. I'm running SQL 2008 Std., Dual Core, 8 GB Memory. The issue is that the Index Rebuild job is taking over 16 hours. I thought that stopping the process that inserts to the table would speed up the index rebuild but I tried that and it still takes 16+ hours. Any advice to speed up the rebuild?

    Any Help would be greatly appreciated!

    Details below.

    SHOWCONTIG output;

    DBCC SHOWCONTIG scanning 'Journal' table...

    Table: 'Journal' (149575571); index ID: 0, database ID: 7

    TABLE level scan performed.

    - Pages Scanned................................: 16744610

    - Extents Scanned..............................: 2093574

    - Extent Switches..............................: 2093573

    - Avg. Pages per Extent........................: 8.0

    - Scan Density [Best Count:Actual Count].......: 99.98% [2093077:2093574]

    - Extent Scan Fragmentation ...................: 42.53%

    - Avg. Bytes Free per Page.....................: 3276.2

    - Avg. Page Density (full).....................: 59.52%

    Table and indexes are below;

    CREATE TABLE [dbo].[Journal](

    [uuid] [varchar](50) NOT NULL,

    [message_id] [varchar](50) NOT NULL,

    [message_version] [varchar](20) NOT NULL,

    [message_type] [varchar](20) NOT NULL,

    [direction] [varchar](3) NOT NULL,

    [processing_code] [char](1) NOT NULL,

    [event_time] [datetime] NOT NULL,

    [sending_app] [varchar](45) NULL,

    [sending_facility] [varchar](45) NULL,

    [receiving_app] [varchar](45) NULL,

    [receiving_facility] [varchar](45) NULL,

    [person_id] [varchar](45) NULL,

    [person_last_name] [varchar](45) NULL,

    [person_first_name] [varchar](45) NULL,

    [person_dob] [varchar](45) NULL,

    [j_path] [varchar](255) NOT NULL,

    [j_service_name] [varchar](45) NOT NULL,

    [ack] [nvarchar](max) NULL,

    [message] [nvarchar](max) NOT NULL

    ) ON [PRIMARY]

    With the following indexes;

    CREATE NONCLUSTERED INDEX [IX_j_service_name] ON [dbo].[Journal]

    (

    [j_service_name] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Journal] ON [dbo].[Journal]

    (

    [uuid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Journal_EventTime] ON [dbo].[Journal]

    (

    [event_time] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Journal_Lastname] ON [dbo].[Journal]

    (

    [person_last_name] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Journal_Person_Id] ON [dbo].[Journal]

    (

    [person_id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    GO

    Thank you.

  • This was removed by the editor as SPAM

  • I would have to do some testing, but I wonder if perhaps a clustered index would make these operations more efficient. Hmmm. Interesting question.

    Here is what I am thinking, but I'll need to test it:

    1. Drop all of the indexes

    2. Add an artificial identity column

    3. Make the artificial identity column the primary key

    4. Create a clustered index on that column

    5. Recreate the non-clustered indexes

    Obviously, this would take a long time to do. So, you wouldn't want to try it without testing it first, but I think it might help.

    Please give us some details about the hardware on which this database is running. There might be some other things you could do as well.

  • Stewart "Arturius" Campbell (10/28/2011)


    If the fragmentation is bad weekly, have you tried:

    > reorganising on a daily basis;

    > drop and create regularly?

    also, do you have a clustered index on your table? If so, what is the fragmentation thereon?

    I am guessing, but they probably went to weekly rebuilds on the indexes because they had problems with the daily index reorganizations.

    Dropping and recreating the indexes would be faster. (That is a good point! :-)) However, the indexes would be offline during that operation, and this table is heavily used.

    That brings up a good question, though. May we see the index maintenance scripts?

  • This was removed by the editor as SPAM

  • Thanks for the replies. There isn't a natural primary key on the table - no column is really unique now. To add one I would have to create an identity column (auto increment).

    I think I may try to drop and recreate the indexes. Seems like a strange solution but my guess is that it will be faster. I guess I'm not really sure why the rebuild is taking so long....

  • Alright, since no one has mentioned it I'm going to throw it out there for consideration.

    Index rebuilds in full recovery are fully logged. Because of this, the index may be taking a long time because of transaction log issues. It might be running out of disk space (or reaching its preset limit) or might be waiting for another transaction to finish (blocking).

    Might try switching to bulk-logged before the index maintenance and switching back afterwards. Doing this shouldn't break your log chain and won't fully log the rebuild operation.

    Another thing to try would be to turn SORT_IN_TEMPDB = ON, offloading some work to the tempdb. But make sure your tempdb can handle it with appropriate file sizes and disk setup.

    good luck!

    ______________________________________________________________________________________________
    Forum posting etiquette.[/url] Get your answers faster.

  • Index rebuilds in full recovery are fully logged. Because of this, the index may be taking a long time because of transaction log issues. It might be running out of disk space (or reaching its preset limit) or might be waiting for another transaction to finish (blocking).

    Might try switching to bulk-logged before the index maintenance and switching back afterwards. Doing this shouldn't break your log chain and won't fully log the rebuild operation.

    Another thing to try would be to turn SORT_IN_TEMPDB = ON, offloading some work to the tempdb. But make sure your tempdb can handle it with appropriate file sizes and disk setup.

    That is a good point: Index rebuilds are logged. Dropping an index and then recreating it will also bypass the transaction log, however. Switching your database from one recovery mode to another has a lot of different effects. For one thing, it can break your backup chain (assuming you are using transaction log backups). You should figure out what recovery mode you need for your database and just leave it there.

    We really do need to know about your hardware before we can make other recommendations.

  • David Moutray (10/28/2011)


    Index rebuilds in full recovery are fully logged. Because of this, the index may be taking a long time because of transaction log issues. It might be running out of disk space (or reaching its preset limit) or might be waiting for another transaction to finish (blocking).

    Might try switching to bulk-logged before the index maintenance and switching back afterwards. Doing this shouldn't break your log chain and won't fully log the rebuild operation.

    Another thing to try would be to turn SORT_IN_TEMPDB = ON, offloading some work to the tempdb. But make sure your tempdb can handle it with appropriate file sizes and disk setup.

    That is a good point: Index rebuilds are logged. Dropping an index and then recreating it will also bypass the transaction log, however. Switching your database from one recovery mode to another has a lot of different effects. For one thing, it can break your backup chain (assuming you are using transaction log backups). You should figure out what recovery mode you need for your database and just leave it there.

    We really do need to know about your hardware before we can make other recommendations.

    Dropping and recreating the indexes does not bypass the transaction log.

    Switching from Full Recovery model to Bulk_Logged recovery model will not break the log chain.

    If you do switch to bulk_logged before the index maintenance and back to full recovery after, be sure to run a t-log backup prior to the index maintenance and another after. You will not be able to do a point in time restore using the t-log after the index maintenance but your log chain will remain intact.

  • This was removed by the editor as SPAM

  • jackimo (10/28/2011)


    Thanks for the replies. There isn't a natural primary key on the table - no column is really unique now. To add one I would have to create an identity column (auto increment).

    Clustered Index != Primary Key.

    A Clustered Index can be non-unique, although it does actually create a hidden column to create uniqueness internally. If the majority of your queries filter on eventtime ranges and your archiving process removes rows based on event time, this may well be a worthwhile design change.

  • jackimo (10/28/2011)


    The oldest data is rolled off nightly.

    These are actually a fairly easy problems (rolloff of old data and index maintenance) to solve but I need to know something first. How do you determine what the "oldest data" actually is and which columns are you using to do that? I can guess that you do it based on the "EventTime" column but guess waste time. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes the oldest data is rolled off using the column EventTime. Thanks!

  • Thanks, Jackimo. My apologies because I should have asked this other question at the same time because it will make a difference in the solution...

    Are you running Standard Edition or Enterprise Edition?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Alright... while we're waiting for that answer, let me just come right out and tell you what I'm thinking...

    The word is "partitioning" and you have the perfect candidate "key" for such a think in the form of the EntryTime column for either of two methods.

    If you're using the Standard Edition, check Books Online for how to make a partitioned [font="Arial Black"]view [/font]work.

    If you're using the Enterprise Edition, check Books Online for how to make a patitioned [font="Arial Black"]table [/font]work.

    Both have roughly the same affect... they both split up the data into separate pieces (actual separate tables for the partitioned views) and allow you to reindex each part separately. The fun part about that is if you have a frag checker in your index process, you find that only the latest 1 or 2 parts needs to be reindexed.

    On the subject of reindexing, for this table, you should have a clustered, non-unique index on the EntryTime column... at all times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply