Index Defragmentation Alternatives

  • Databases become fragmented - dba's counter it by defragmenting.

    Is there a good substitute/alternative to this process. Is there technology/processes which can auto defragment ?

    • This topic was modified 4 years, 3 months ago by  mtz676.
    • This topic was modified 4 years, 3 months ago by  mtz676.
  • mtz676 wrote:

    Databases become fragmented - dba's counter it by defragmenting.

    Is there a good substitute/alternative to this process. Is there technology which can auto defragment a disk?

    Yes, there are products that will automatically defragment a "DISK"... just not indexes on your tables and the ones that come close generally have the same issues as any manual process because they generally default to the current "Best Practices" for index maintenance, which are actually very INCORRECT.  If you read the MS documentation on the subject, it explicitly states that the 5/30 rule (previously 10/30 rule) was meant only as a suggested starting point and explicitly states that "experimentation" to find out what's best for each index must be accomplished.  Of course, 99.99999% of people ignore that because they don't have the knowledge or the time to do such a thing even if they had the knowledge.  Nor do they understand where all the blocking on their systems is coming from on the "morning after" index maintenance (if they're even aware that such blocking is occurring and most are not).

    The "best" way to keep from having to defragment indexes is to set them up to not fragment in the first place.  And, to be sure, that's not possible on ALL indexes... But it is possible on a lot of the larger indexes (which are usually Clustered Indexes).

    One of the leading culprits that causes fragmentation of other-wise fragment free indexes (such as indexes that have an ever-increasing key) is when people have variable width columns that aren't populated during the insert and, instead, are populated sometime shortly after the insert.

    For example... a lot of people have a "poor man's audit" column that usually follows a naming convention of something like "LastModifiedBy".  It's normally a VARCHAR() or NVARCHAR() and it normally starts off as a NULL during the initial insert.

    Except for one very rare instance, inserts don't follow the Fill Factor.  Instead, inserts try to fill pages to their maximum for whatever row size they have (usually, very close to 100% full).  A lot of people have ever-increasing indexes (due to a clustered index on an IDENTITY or date/time column) and so the logical end of the index is where all the inserts put their data and that also means that, regardless of Fill Factor, that data fills those most recent pages to 100% and it usually does so with zero fragmentation).

    Then, along comes some process that processes and changes the content of those 100% full pages.  A typical part of such a process is to mark the "LastModifiedDate" and "LastModifiedBy" columns.  Changing the LastModifiedDate from NULL to some value is no big deal because, if it is a temporal data type, it's a fixed width column and the change from NULL to some value does NOT increase the size of any given row.

    Not true, though, with that bloody LastModifiedBy column.  Since it's normally a VARCHAR()/NVARCHAR(), it's a variable width column and changing such a column from NULL to a non-null value (even an empty string) will cause the given row size to expand.  Since all of those recent pages are 100% full because of the nature of inserts, all such pages that are updated WILL cause page splits and the ensuing fragmentation that results.

    There are a ton of different ways to prevent that fragmentation from happening, including making the LastModifiedBy column an INT and having the update-code populate it with an ID associated with a name table and, if you can make it so the variable width columns are not "ExpAnsive" in nature, your ever-increasing indexes will be mostly "Defragmented by Default".  Since those are normally the clustered indexes, you can seriously decrease the amount of time you spend defragmenting your indexes.

    Non-clustered indexes are a bit of a pain because they are normally NOT in the same order as the inserts occur.  But, they're usually also a whole lot smaller than the clustered indexes and so don't create such large problems for index maintenance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply