REBUILD for heaps

  • opc.three (9/26/2011)


    Sorry Grant and Gail if I am being dense, but I am still not sure I got an answer for my original question. Maybe I am incorrect, but it seems both of you originally answered the question as to whether the indexes would be updated with the new RIDs as the heap was rebuilt and pages were physically moved. I was never in doubt as to whether the non-clustered indexes would be updated with the new RID locations, rather I was in doubt as to the efficiency of simply calling ALTER TABLE...REBUILD versus whether I should take control of the index rebuilds that would certainly need to occur by disabling them first and then rebuilding them after the heap was rebuilt. I started creating a test-bed for this problem but found that creating fragmented heaps from scratch was more difficult than it initially seemed like it would be. Am I chasing minor bits here?

    There isn't anything to take control of. When the heap gets rebuild, the non-clustered indexes will get rebuilt too. You don't then need to rebuild them separately because they've been done for you, automatically.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • opc.three (9/26/2011)


    @MM Something occurred to me, you may still have heavy fragmentation from the continuous addition of new rows and the deleting of existing rows over time. Here is what I use for finding heap fragmentation:

    ...

    Note this will check all DBs on the instance. I use DETAILED because LIMITED does not show you the forwarded_record_count. It takes longer but it's relevant for my system, it may not be for yours.

    Thanks for that. I tried it and found that I did indeed have high fragmentation, but zero forward entries as expected.

    I thought I would try rebuilding using the same method as you - of adding an identity and a unique clustered index on that identity, then removing it....my fragmentation increased!

    As far as your question about the most efficient way to do this, my testing shows that adding the clustered index also rebuilds the non-clustered indexes that you have previously disabled, then removing the clustered index rebuilds them again... so it seems to be right that you do not have the control you hoped for that way.

    I have tried dropping the non-clustered indexes first then re-creating them after the rebuild and that seems to most efficient on sql 2005 as it only rebuilds them once that way.

    On SQL 2008, I only have a smaller database to test, but testing there showed that ALTER TABLE...REBUILD in my case did an efficient rebuild of all indexes once only. Disabling the indexes first makes no difference as they are rebuilt by the ALTER TABLE ... REBUILD command anyway as Grant stated also below.

    Edit: mention Grant's previous reply.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (9/27/2011)


    opc.three (9/26/2011)


    @MM Something occurred to me, you may still have heavy fragmentation from the continuous addition of new rows and the deleting of existing rows over time. Here is what I use for finding heap fragmentation:

    ...

    Note this will check all DBs on the instance. I use DETAILED because LIMITED does not show you the forwarded_record_count. It takes longer but it's relevant for my system, it may not be for yours.

    Thanks for that. I tried it and found that I did indeed have high fragmentation, but zero forward entries as expected.

    Good deal, HTH.

    I thought I would try rebuilding using the same method as you - of adding an identity and a unique clustered index on that identity, then removing it....my fragmentation increased!

    In my system fragmentation drops, but not to the levels I would like. Some heaps remain above 30% fragmentation, but that was down from 99+%. I chalked it up to the evilness of heaps, but it may just be related to my specific situation. I am adding a unique clustered index to an existing identity column defined as an INT, i.e. I am not actually adding a new identity column. Did you use an INT for your identity column? Try it with BIGINT. I have a theory that since a RID is 8 bytes that using a BIGINT will be better for defragmenting.

    As far as your question about the most efficient way to do this, my testing shows that adding the clustered index also rebuilds the non-clustered indexes that you have previously disabled, then removing the clustered index rebuilds them again... so it seems to be right that you do not have the control you hoped for that way.

    I have tried dropping the non-clustered indexes first then re-creating them after the rebuild and that seems to most efficient on sql 2005 as it only rebuilds them once that way.

    Disabled indexes are not maintained...unless BOL is wrong which is something Gail has compelled me to consider given all the times she has cited incorrect docs 😉 That is the whole idea of disabling the indexes, they are not supposed to take any updates. What do you see that makes you think the disabled indexes are being updated when the unique clustered index is added and removed? After I disable an index it no longer shows up in sys.dm_db_index_physical_stats and sp_spaceused shows the index space has been freed.

    Docs:

    Disabling Indexes

    From Guidelines for Disabling Indexes and Constraints: "The index is not maintained while it is disabled."

    On SQL 2008, I only have a smaller database to test, but testing there showed that ALTER TABLE...REBUILD in my case did an efficient rebuild of all indexes once only. Disabling the indexes first makes no difference as they are rebuilt by the ALTER TABLE ... REBUILD command anyway as Grant stated also below.

    Edit: mention Grant's previous reply.

    It is good to know that ALTER TABLE...REBUILD did an efficient job reducing fragmentation. I know the end result is that the indexes are rebuilt, but I am frustrated with myself that I cannot get my point across.

    Thanks (everyone) for hanging in there with me on this thread. Consider this, if I rebuild a heap and a bunch of RIDs change in the process then a bunch of non-clustered indexes must be updated to hold these new RIDs where they once held the old RIDs. Now I go and rebuild those non-clustered indexes, moving the pages around to defragment that index. I have just done two things:

    1) updated a fragmented non-clustered index so pages that contained old RIDs now contain new RIDs, and

    2) rebuilt fragmented non-clustered indexes to remove fragmentation.

    Is this how ALTER TABLE...REBUILD works internally?

    Now consider, if I were to disable the non-clustered indexes before issuing ALTER TABLE...REBUILD on the heap and then rebuilt the disabled indexes could I save the work from item 1) above?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (9/27/2011)


    Disabled indexes are not maintained...unless BOL is wrong which is something Gail has compelled me to consider given all the times she has cited incorrect docs 😉 That is the whole idea of disabling the indexes, they are not supposed to take any updates. What do you see that makes you think the disabled indexes are being updated when the unique clustered index is added and removed? After I disable an index it no longer shows up in sys.dm_db_index_physical_stats and sp_spaceused shows the index space has been freed.

    Disabled indexes are not maintained during DML (disabling removes the b-tree), but removing or adding a clustered index rebuilds all nonclustered indexes, including disabled ones. Remember that you enable a disabled index by rebuilding it. So when the cluster is removed or added, SQL rebuilds all nonclustered indexes essentially recreating the disabled ones.

    Now consider, if I were to disable the non-clustered indexes before issuing ALTER TABLE...REBUILD on the heap and then rebuilt the disabled indexes could I save the work from item 1) above?

    The Alter Table rebuild would (afaik) rebuild those disabled indexes for you. In essence gaining you nothing.

    http://sqlskills.com/BLOGS/KIMBERLY/post/Trusting-the-tools!.aspx

    Some changes to table structure (changing from a heap to a clustered or a clustered to a heap) cause disabled nonclustered to get rebuilt (and re-enabled). That's not even SSMS - that's a general SQL Server problem. In my opinion this means that disabling indexes is somewhat broken. So... gotta use drop/create (which is a bummer!).

    Not 100% sure if the alter table rebuild does the same, so test.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • opc.three (9/27/2011)


    Did you use an INT for your identity column? Try it with BIGINT. I have a theory that since a RID is 8 bytes that using a BIGINT will be better for defragmenting.

    I did use an INT - I can try a BIGINT.

    Edit: Tried BIGINT, no better - still got increased fragmentation...never mind, this is your thread, not mine!

    Disabled indexes are not maintained...unless BOL is wrong which is something Gail has compelled me to consider given all the times she has cited incorrect docs 😉 That is the whole idea of disabling the indexes, they are not supposed to take any updates. What do you see that makes you think the disabled indexes are being updated when the unique clustered index is added and removed? After I disable an index it no longer shows up in sys.dm_db_index_physical_stats and sp_spaceused shows the index space has been freed.

    Ok, the docs may be wrong or misleading... After adding a clustered index, you see table scans relating to the index rebuild and the indexes all end up re-enabled. The same for the ALTER TABLE REBUILD (ATR) - the indexes are enabled again after the operation and you can see the table scans in the statistics and, of course, the execution plan shows it all...

    Thanks (everyone) for hanging in there with me on this thread. Consider this, if I rebuild a heap and a bunch of RIDs change in the process then a bunch of non-clustered indexes must be updated to hold these new RIDs where they once held the old RIDs. Now I go and rebuild those non-clustered indexes, moving the pages around to defragment that index. I have just done two things:

    1) updated a fragmented non-clustered index so pages that contained old RIDs now contain new RIDs, and

    2) rebuilt fragmented non-clustered indexes to remove fragmentation.

    Is this how ALTER TABLE...REBUILD works internally?

    Circumstantial evidence (the plan) shows that the answer is no. The heap is rebuilt, then the indexes are rebuilt.

    Now consider, if I were to disable the non-clustered indexes before issuing ALTER TABLE...REBUILD on the heap and then rebuilt the disabled indexes could I save the work from item 1) above?

    No, because they are rebuilt anyway.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • GilaMonster (9/27/2011)


    opc.three (9/27/2011)


    Disabled indexes are not maintained...unless BOL is wrong which is something Gail has compelled me to consider given all the times she has cited incorrect docs 😉 That is the whole idea of disabling the indexes, they are not supposed to take any updates. What do you see that makes you think the disabled indexes are being updated when the unique clustered index is added and removed? After I disable an index it no longer shows up in sys.dm_db_index_physical_stats and sp_spaceused shows the index space has been freed.

    Disabled indexes are not maintained during DML (disabling removes the b-tree), but removing or adding a clustered index rebuilds all nonclustered indexes, including disabled ones. Remember that you enable a disabled index by rebuilding it. So when the cluster is removed or added, SQL rebuilds all nonclustered indexes essentially recreating the disabled ones.

    The bolded statement above is not an intuitive outcome to me. Even after reading the documentation carefully it was not clear that we would see this behavior. Plus I was fooled by the results of sys.dm_db_index_physical_stats and sp_spaceused after disabling the indexes. In 2005 I have reverted back to the practice of dropping and recreating all non-clustered indexes when rebuilding heaps. By using DISABLE I was hoping to get the ancillary benefit of avoiding having to deal with the index DDL, oh well.

    And ALTER TABLE...REBUILD is the way to go when on 2008.

    Now consider, if I were to disable the non-clustered indexes before issuing ALTER TABLE...REBUILD on the heap and then rebuilt the disabled indexes could I save the work from item 1) above?

    The Alter Table rebuild would (afaik) rebuild those disabled indexes for you. In essence gaining you nothing.

    Sorry if you (and everyone) were saying this all along. It is clear now what is happening.

    http://sqlskills.com/BLOGS/KIMBERLY/post/Trusting-the-tools!.aspx

    Some changes to table structure (changing from a heap to a clustered or a clustered to a heap) cause disabled nonclustered to get rebuilt (and re-enabled). That's not even SSMS - that's a general SQL Server problem. In my opinion this means that disabling indexes is somewhat broken. So... gotta use drop/create (which is a bummer!).

    I agree with Kim, "somewhat broken" is a fair assessment...booooo 😛

    Not 100% sure if the alter table rebuild does the same, so test.

    Thank you as always for assisting Gail! Thanks Grant and MM!

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • mister.magoo (9/27/2011)


    opc.three (9/27/2011)


    Did you use an INT for your identity column? Try it with BIGINT. I have a theory that since a RID is 8 bytes that using a BIGINT will be better for defragmenting.

    I did use an INT - I can try a BIGINT.

    Edit: Tried BIGINT, no better - still got increased fragmentation...never mind, this is your thread, not mine!

    Yet another reason not to use heaps...it appears as if you cannot defragment them effectively in some scenarios, at least not on 2005. Sad, but given your scenario and SQL version you may be better of leaving the heaps alone.

    I have exhausted my knowledge on trying to assist you with your issue. Maybe a new thread would not be a bad idea.

    Thanks for your time and effort responding to my issue!

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (9/27/2011)


    mister.magoo (9/27/2011)


    opc.three (9/27/2011)


    Did you use an INT for your identity column? Try it with BIGINT. I have a theory that since a RID is 8 bytes that using a BIGINT will be better for defragmenting.

    I did use an INT - I can try a BIGINT.

    Edit: Tried BIGINT, no better - still got increased fragmentation...never mind, this is your thread, not mine!

    Yet another reason not to use heaps...it appears as if you cannot defragment them effectively in some scenarios, at least not on 2005.

    Bear in mind that fragmentation on a heap is different from fragmentation of an index. You can't directly compare them. Apples and watermelons again.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/27/2011)


    opc.three (9/27/2011)


    mister.magoo (9/27/2011)


    opc.three (9/27/2011)


    Did you use an INT for your identity column? Try it with BIGINT. I have a theory that since a RID is 8 bytes that using a BIGINT will be better for defragmenting.

    I did use an INT - I can try a BIGINT.

    Edit: Tried BIGINT, no better - still got increased fragmentation...never mind, this is your thread, not mine!

    Yet another reason not to use heaps...it appears as if you cannot defragment them effectively in some scenarios, at least not on 2005.

    Bear in mind that fragmentation on a heap is different from fragmentation of an index. You can't directly compare them. Apples and watermelons again.

    Oh no...are you referring to the fragmentation % from the physical index stats check, or the effect fragmentation has on heap performance, or what specifically? And in your example is the heap the apple or the watermelon?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Both.

    I'm not saying one is the apple, I'm just saying they're not the same thing and so can't be directly compared.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/27/2011)


    Both.

    From my experience with this client, and admittedly it is my first major dealing with heaps, I see that a high percentage affects performance adversely. As to how high is bad enough to warrant a defrag...that's open for debate. With what little guidance there is out there on this topic (other than to add a clustered index) I started with the same percentage I use for indexes, 30% fragmentation warrants a rebuild. That said, after a rebuild I rarely get large heaps below 20%, and sometimes I start at 90+% and end up with ~70%.

    I'm not saying one is the apple, I'm just saying they're not the same thing and so can't be directly compared.

    My comment was half-joking and half-kidding 😀

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (9/27/2011)


    GilaMonster (9/27/2011)


    Both.

    From my experience with this client, and admittedly it is my first major dealing with heaps, I see that a high percentage affects performance adversely.

    It probably will, but it won't be as bad as logical fragmentation of indexes (which is a measure of what percentage of pages are out of order). Heaps don't have a logical order, so the only fragmentation they can have is extent fragmentation, defined as:

    Percentage of out-of-order extents in scanning the leaf pages of an index. An out-of-order extent is one for which the extent containing the current page for an index is not physically the next extent after the extent containing the previous page for an index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/27/2011)


    opc.three (9/27/2011)


    GilaMonster (9/27/2011)


    Both.

    From my experience with this client, and admittedly it is my first major dealing with heaps, I see that a high percentage affects performance adversely.

    It probably will, but it won't be as bad as logical fragmentation of indexes (which is a measure of what percentage of pages are out of order). Heaps don't have a logical order, so the only fragmentation they can have is extent fragmentation, defined as:

    Percentage of out-of-order extents in scanning the leaf pages of an index. An out-of-order extent is one for which the extent containing the current page for an index is not physically the next extent after the extent containing the previous page for an index.

    OK, is this fair to say?

    In a read-heavy system where indexes serve the majority of the queries, or in a write-heavy system where all columns are fixed-length, index maintenance will give the most benefit overall, rendering heap maintenance secondary.

    In a write-heavy system with lots of updates to variable-length columns (inner-table page splits) and queries producing table scans heap fragmentation can play a bigger role in performance problems and therefore heap maintenance should be a consideration.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (9/27/2011)


    In a write-heavy system with lots of updates to variable-length columns (inner-table page splits) and queries producing table scans heap fragmentation can play a bigger role in performance problems and therefore heap maintenance should be a consideration.

    Heaps do not ever incur page splits. Page splits can only occur when there is a logical ordering of rows necessitating that new or updated rows go onto a particular page.

    Heap fragmentation (which is extent fragmentation) is solely from inserts, allocating new extents that are not contiguous in the file.

    Updates to a heap cause forwarding pointers, which is far more a problem than extent fragmentation and would probably require heap rebuilds to fix before the extent fragmentation becomes a problem (depending of course on your insert, update and select patterns)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/27/2011)


    opc.three (9/27/2011)


    In a write-heavy system with lots of updates to variable-length columns (inner-table page splits) and queries producing table scans heap fragmentation can play a bigger role in performance problems and therefore heap maintenance should be a consideration.

    Heaps do not ever incur page splits. Page splits can only occur when there is a logical ordering of rows necessitating that new or updated rows go onto a particular page.

    Sorry, you are right, the term "page split" is misplaced. The result of updating a variable length column in a heap so the row no longer fits on the page creates a forwarded record. Those do contribute to fragmentation, as do a page splits, but they are not the same.

    Heap fragmentation (which is extent fragmentation) is solely from inserts, allocating new extents that are not contiguous in the file.

    Inserts and the creation of forwarded records, right? Physically speaking, would a new page allocation due to the creation of a forwarded record not be analogous to an insert and create extent frag?

    Updates to a heap cause forwarding pointers, which is far more a problem than extent fragmentation and would probably require heap rebuilds to fix before the extent fragmentation becomes a problem (depending of course on your insert, update and select patterns)

    I see a lot of forwarded records in this particular system.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Viewing 15 posts - 16 through 30 (of 31 total)

    You must be logged in to reply to this topic. Login to reply