rebuild on clustered index

  • Sorry for the confusion. It doesn't release space to OS, I mean to say free space in the DB increases, when we rebuild PK. Probably the best way to handle this issue is to introduce partitioning on the table.

  • muthyala_51 (11/9/2012)


    Sorry for the confusion. It doesn't release space to OS, I mean to say free space in the DB increases, when we rebuild PK. Probably the best way to handle this issue is to introduce partitioning on the table.

    is the PK a clustered index? and what is the PK on the table (guid, identity, natural key)? the next question is what is the fill factor set to when you rebuild the index?

    To those more experienced:

    Am i barking up the wrong tree that there may be allot of page splitting if the fill factor is set to high, leading to a large number of half empty pages? then when the Clustered Index is rebuilt the data is moved around on the data pages freeing up quite a bit of space?

    EDIT: After posting i realized that is the exact definition of index fragmentation.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • What I think may be occurring:

    The page splits caused by inserts can create less than full pages that are now out of physical order (external index fragmentation) and the deletes can cause previously full pages to become almost empty (low page density, aka internal index fragmentation). Rebuilding an index can cause the amount of pages required to hold the data in the index to significantly drop when the original fill factor is again re-established and all those less-than-full pages are now close to full again, thereby requiring fewer pages and freeing more space within the data file.

    Edit:

    What is the fill factor? Consider lowering it in your next rebuild and se how it affect fragmentation the following night.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • fill factor is set to 100. It's the clustered index on primary key and it's an identity column. So when new data is added it will added at the end causing more number of pages and deletes causing almost empty pages as you said.

  • muthyala_51 (11/10/2012)


    fill factor is set to 100. It's the clustered index on primary key and it's an identity column. So when new data is added it will added at the end causing more number of pages and deletes causing almost empty pages as you said.

    In your case, since the leading edge of your clustered index is an IDENTITY column, new pages created by inserts will not be the cause of mid-page page-splits (source of external fragmentation) so you should be getting as full of a page as possible given the actual size of the inserted rows. The deletions however will empty existing pages and over time your index will begin to appear as if it has sprawled out on disk, requiring far more pages than would be if you did a rebuild, which you are seeing.

    What about updates? That would be another source of external fragmentation. Are any updates issued against the table? If not, then lowering the fill factor of the clustered index likely will not help you at all, and may even hurt you since initially after a rebuild your index would most likely be larger than it would have been if you left it at 100.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • No updates on this table.

  • I think you're doing everything right with how you're describing the workload and how things are setup. Depending on how well-defined the subset of rows that accepts deletes is, you might be able to isolate the non-changing subset (if one exists), the subset that accepts deletes, and the new edge accepting inserts using table partitioning and only rebuild the index for the partitions that change.

    edit: spelling

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • This doeesn't make any sense. Correct me if I'm wrong, please. The op says that the amount of space used in the database drops by 1/5th when they rebuild the CI yet the CI is on an IDENTITY column. They also say that the CI is rebuilt at "{sic} > 4 fragmentation". If the CI is truly on the IDENTITY column, I'm thinking there should be zero fragmentation (even with a fill factore of 100) especially since there are supposedly no updates to the table.

    I can't even imagine stats going haywire on such a thing.

    Something's not quite right here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Everyday we delete millions of rows, so this will cause some of the pages to be empty and new rows added to the new pages so that's the reason for increase in size of the table.

  • Jeff Moden (11/11/2012)


    This doeesn't make any sense. Correct me if I'm wrong, please. The op says that the amount of space used in the database drops by 1/5th when they rebuild the CI yet the CI is on an IDENTITY column. They also say that the CI is rebuilt at "{sic} > 4 fragmentation". If the CI is truly on the IDENTITY column, I'm thinking there should be zero fragmentation (even with a fill factore of 100) especially since there are supposedly no updates to the table.

    I can't even imagine stats going haywire on such a thing.

    Something's not quite right here.

    Mid-page page-splits are not in play here because of how the clustered index is defined, plus the fact that there are no updates issued against the table. There are only inserts into the last page of the clustered (per the IDENTITY) which will not cause lasting fragmentation, and deletes which only cause internal fragmentation (i.e. low page density). Throughout the course of a day, after many rows in a large swath of the indexes pages are deleted, the pages that still have rows on them will still occupy 8KB each in the index but will be less than 100% full. When the index is rebuilt that night, the space from those less-than-full pages is mostly reclaimed as rows are moved onto new pages that are filled to 100% per the fill factor, allowing the index to occupy far less space on disk than before the rebuild.

    This is a fantastic example of the type of workload that causes internal fragmentation in a production scenario and how it might affect performance by causing the index to occupy far more space than it technically needs to represent the data.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (11/12/2012)


    Jeff Moden (11/11/2012)


    This doeesn't make any sense. Correct me if I'm wrong, please. The op says that the amount of space used in the database drops by 1/5th when they rebuild the CI yet the CI is on an IDENTITY column. They also say that the CI is rebuilt at "{sic} > 4 fragmentation". If the CI is truly on the IDENTITY column, I'm thinking there should be zero fragmentation (even with a fill factore of 100) especially since there are supposedly no updates to the table.

    I can't even imagine stats going haywire on such a thing.

    Something's not quite right here.

    Mid-page page-splits are not in play here because of how the clustered index is defined, plus the fact that there are no updates issued against the table. There are only inserts into the last page of the clustered (per the IDENTITY) which will not cause lasting fragmentation, and deletes which only cause internal fragmentation (i.e. low page density). Throughout the course of a day, after many rows in a large swath of the indexes pages are deleted, the pages that still have rows on them will still occupy 8KB each in the index but will be less than 100% full. When the index is rebuilt that night, the space from those less-than-full pages is mostly reclaimed as rows are moved onto new pages that are filled to 100% per the fill factor, allowing the index to occupy far less space on disk than before the rebuild.

    This is a fantastic example of the type of workload that causes internal fragmentation in a production scenario and how it might affect performance by causing the index to occupy far more space than it technically needs to represent the data.

    Yep... that's mostly what I said but the OP didn't say there were deletes involved. The CI on this table should not be suffering from any fragmentation whatsoever.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/12/2012)


    opc.three (11/12/2012)


    Jeff Moden (11/11/2012)


    This doeesn't make any sense. Correct me if I'm wrong, please. The op says that the amount of space used in the database drops by 1/5th when they rebuild the CI yet the CI is on an IDENTITY column. They also say that the CI is rebuilt at "{sic} > 4 fragmentation". If the CI is truly on the IDENTITY column, I'm thinking there should be zero fragmentation (even with a fill factore of 100) especially since there are supposedly no updates to the table.

    I can't even imagine stats going haywire on such a thing.

    Something's not quite right here.

    Mid-page page-splits are not in play here because of how the clustered index is defined, plus the fact that there are no updates issued against the table. There are only inserts into the last page of the clustered (per the IDENTITY) which will not cause lasting fragmentation, and deletes which only cause internal fragmentation (i.e. low page density). Throughout the course of a day, after many rows in a large swath of the indexes pages are deleted, the pages that still have rows on them will still occupy 8KB each in the index but will be less than 100% full. When the index is rebuilt that night, the space from those less-than-full pages is mostly reclaimed as rows are moved onto new pages that are filled to 100% per the fill factor, allowing the index to occupy far less space on disk than before the rebuild.

    This is a fantastic example of the type of workload that causes internal fragmentation in a production scenario and how it might affect performance by causing the index to occupy far more space than it technically needs to represent the data.

    Yep... that's mostly what I said but the OP didn't say there were deletes involved. The CI on this table should not be suffering from any fragmentation whatsoever.

    I am not sure why you're saying there will be 0% fragmentation...you can achieve fragmentation solely from deleting rows. Proof:

    USE tempdb

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.wide_column')

    AND type IN (N'U') )

    DROP TABLE dbo.wide_column;

    GO

    -- table that will fit only two rows on a page

    CREATE TABLE dbo.wide_column

    (

    wide_column_id INT IDENTITY(1, 1)

    CONSTRAINT [pk_dbo.wide_column] PRIMARY KEY,

    some_data CHAR(3000),

    some_date DATETIME

    );

    -- insert 600 rows to our table

    INSERT INTO dbo.wide_column

    (

    some_data,

    some_date

    )

    SELECT TOP 600

    'blah',

    GETDATE()

    FROM master.sys.columns t1

    CROSS JOIN master.sys.columns t2;

    -- frag ~= 0, as expected

    SELECT *

    FROM sys.dm_db_index_physical_stats(2, OBJECT_ID(N'dbo.wide_column'), 1, NULL, 'detailed');

    -- only delete one row from each page

    DELETE FROM dbo.wide_column

    WHERE wide_column_id % 2 = 1;

    -- pages still = 300, frag ~= 0, as expected

    SELECT *

    FROM sys.dm_db_index_physical_stats(2, OBJECT_ID(N'dbo.wide_column'), 1, NULL, 'detailed');

    -- now do a selective delete to empty some pages

    DELETE FROM dbo.wide_column

    WHERE wide_column_id % 8 = 0

    AND (

    wide_column_id BETWEEN 100 AND 200

    OR wide_column_id BETWEEN 400 AND 500

    );

    -- wait for the ghost-cleanup process to deallocate our empty pages

    WAITFOR DELAY '00:00:20'

    -- page count = 274, and ~10% fragmentation

    SELECT *

    FROM sys.dm_db_index_physical_stats(2, OBJECT_ID(N'dbo.wide_column'), 1, NULL, 'detailed');

    EDIT: ~0

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply