Index Rebuild and Reorg - how to do it in a 24/7 VLDB environment

  • I wanted to share my findings with you (and a script)

    We have a 24/7 workload with no downtime, up to 200K IOPS, and multiple databases of 10-30 TB each, on a few servers with 80 TB in total.

    I used RESUMABLE = ON on Enterprise Edition, and tried to do as much as possible with REORGANIZE and plain REBUILD on servers with Standard Edition (a minimal sketch of both forms is at the end of this post).

    ONLINE = ON and RESUMABLE = ON are not a silver bullet, and even a slow and 'safe' REORGANIZE can bring your system down.

    Read what I learned:

    https://habr.com/ru/articles/761518/
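
    For anyone who just wants the shape of those commands, here is a minimal sketch (index and table names are placeholders, not objects from my databases):

    -- Enterprise Edition: online, resumable rebuild that can be paused and resumed
    ALTER INDEX IX_MyIndex ON dbo.MyTable
        REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

    -- Standard Edition: reorganize (always online, non-resumable) or a plain rebuild
    ALTER INDEX IX_MyIndex ON dbo.MyTable REORGANIZE;
    ALTER INDEX IX_MyIndex ON dbo.MyTable REBUILD;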

  • I'm curious as to what measurable benefit you realized from performing reindexing / reorg on your indexes?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    I'm curious as to what measurable benefit you realized from performing reindexing / reorg on your indexes?

    😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • tzimie wrote:

    I wanted to share my findings with you (and a script)

    We have a 24/7 workload with no downtime, up to 200K IOPS, and multiple databases of 10-30 TB each, on a few servers with 80 TB in total.

    I used RESUMABLE = ON on Enterprise Edition, and tried to do as much as possible with REORGANIZE and plain REBUILD on servers with Standard Edition.

    ONLINE = ON and RESUMABLE = ON are not a silver bullet, and even a slow and 'safe' REORGANIZE can bring your system down.

    Read what I learned:

    https://habr.com/ru/articles/761518/

    In your article, your first myth is this:

    Myth 1. We use SSD (or super duper storage), so we should not care about fragmentation. False. An index rebuild compacts a table; with compression it sometimes makes it several times smaller, improving the cache hit ratio and overall performance (this happens even without compression).

    Reindexing itself does not improve performance. Statistics are updated as part of the reindexing process. That is what results in better performance.

    From Microsoft's documentation:

    Customers often observe performance improvements after rebuilding indexes. However, in many cases these improvements are unrelated to reducing fragmentation or increasing page density.

    An index rebuild has an important benefit: it updates statistics on key columns of the index by scanning all rows in the index. This is the equivalent of executing UPDATE STATISTICS ... WITH FULLSCAN, which makes statistics current and sometimes improves their quality compared to the default sampled statistics update. When statistics are updated, query plans that reference them are recompiled. If the previous plan for a query was not optimal because of stale statistics, insufficient statistics sampling ratio, or for other reasons, the recompiled plan will often perform better.

    Customers often incorrectly attribute this improvement to the index rebuild itself, taking it to be result of reduced fragmentation and increased page density. In reality, the same benefit can often be achieved at much cheaper resource cost by updating statistics instead of rebuilding indexes.
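
    In other words, the cheaper alternative the documentation describes is just a statistics update with a full scan; a minimal sketch (table and index names are placeholders):

    -- Refresh the statistics of one index with a full scan instead of rebuilding it
    UPDATE STATISTICS dbo.MyTable IX_MyIndex WITH FULLSCAN;

    -- Or refresh all statistics on the table
    UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;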


    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I am using DATA_COMPRESSION = PAGE.

    Even without compression, tables become 1.5-2 times smaller after a rebuild.

    With compression, below is an extract from the output of my script. You can see that tables sometimes become 4-5 times smaller. Even with our 2 TB of RAM, it improves the cache hit ratio a lot. (A sketch of the rebuild command behind these numbers follows the listing.)

    stats updated, 240613 Mb -> 89863 Mb, delta 150750 Mb
    stats updated, 241733 Mb -> 83994 Mb, delta 157739 Mb
    stats updated, 251678 Mb -> 95739 Mb, delta 155939 Mb
    stats updated, 319968 Mb -> 107648 Mb, delta 212320 Mb
    stats updated, 335910 Mb -> 123393 Mb, delta 212517 Mb
    stats updated, 350021 Mb -> 151067 Mb, delta 198954 Mb
    stats updated, 53 Mb -> 33 Mb, delta 20 Mb
    stats updated, 81 Mb -> 58 Mb, delta 23 Mb
    stats updated, 617 Mb -> 490 Mb, delta 127 Mb
    stats updated, 1671 Mb -> 535 Mb, delta 1136 Mb
    stats updated, 3809 Mb -> 1995 Mb, delta 1814 Mb
    stats updated, 4268 Mb -> 2177 Mb, delta 2091 Mb
    stats updated, 4345 Mb -> 2658 Mb, delta 1687 Mb
    stats updated, 5272 Mb -> 2721 Mb, delta 2551 Mb
    stats updated, 11700 Mb -> 2655 Mb, delta 9045 Mb
    stats updated, 15428 Mb -> 4785 Mb, delta 10643 Mb
    stats updated, 19375 Mb -> 2228 Mb, delta 17147 Mb
    stats updated, 20488 Mb -> 6032 Mb, delta 14456 Mb
    stats updated, 25631 Mb -> 0 Mb, delta 25631 Mb
    stats updated, 29572 Mb -> 12956 Mb, delta 16616 Mb
    stats updated, 29657 Mb -> 17659 Mb, delta 11998 Mb
    stats updated, 59463 Mb -> 16095 Mb, delta 43368 Mb
    stats updated, 59533 Mb -> 15927 Mb, delta 43606 Mb
    stats updated, 60228 Mb -> 16438 Mb, delta 43790 Mb
    stats updated, 60645 Mb -> 12131 Mb, delta 48514 Mb
    stats updated, 60963 Mb -> 16156 Mb, delta 44807 Mb
    stats updated, 64684 Mb -> 17540 Mb, delta 47144 Mb
    stats updated, 64826 Mb -> 17200 Mb, delta 47626 Mb
    stats updated, 352689 Mb -> 153812 Mb, delta 198877 Mb
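
    For clarity, the rebuild behind these numbers is essentially of this shape (index and table names are placeholders; the options in my actual script are more involved):

    -- Estimate the savings before committing to the rebuild
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo', @object_name = 'MyTable',
        @index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';

    -- Online, resumable rebuild with page compression (Enterprise Edition)
    ALTER INDEX IX_MyIndex ON dbo.MyTable
        REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, RESUMABLE = ON);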

  • I am aware of that effect. We do update statistics, which (sometimes) changes execution plans.

    We monitor Query Store a lot for regressed queries and check for execution plan changes (something along the lines of the query sketched at the end of this post).

    As you can see, a rebuild makes tables several times smaller. That is a great result.
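
    As a rough illustration of the kind of Query Store check I mean (not the exact query we run), something like this lists queries that have picked up more than one plan:

    -- Queries with more than one plan captured in Query Store: candidates for plan-change review
    SELECT q.query_id, qt.query_sql_text, COUNT(DISTINCT p.plan_id) AS plan_count
    FROM sys.query_store_query AS q
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    GROUP BY q.query_id, qt.query_sql_text
    HAVING COUNT(DISTINCT p.plan_id) > 1
    ORDER BY plan_count DESC;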


  • tzimie wrote:

    I am using DATA_COMPRESSION = PAGE.

    Even without compression, tables become 1.5-2 times smaller after a rebuild.

    With compression, below is an extract from the output of my script. You can see that tables sometimes become 4-5 times smaller. Even with our 2 TB of RAM, it improves the cache hit ratio a lot.

    While there are a lot of size differences listed, I still cannot see how this translates into something that is actually better. Do the queries return faster? Is there less blocking? What is the effect on the system while the reindexing is executing? And what is a DBA's time worth to create, and likely support, a process that may not be improving anything?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • tzimie wrote:

    I am aware of that effect. We do update statistics, which (sometimes) changes execution plans.

    We monitor Query Store a lot for regressed queries and check for execution plan changes.

    As you can see, a rebuild makes tables several times smaller. That is a great result.

    Same question.  What measurable improvements have you observed from this effort?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • We reversed the organic database growth: the used space in the databases, which had been steadily increasing, dropped sharply within a few months. That bought us 1-2 years of life on the same SSDs. As you understand, with bare-metal servers you can't just 'increase the disk' as you would in a cloud or virtualized infrastructure.

    Regarding performance: since I am still rebuilding indexes, I am adding extra workload to the servers, so my goal is first of all to do no harm. My script includes a custom .ps1 module that checks 20+ parameters (job durations and delays in data flows), so the rebuild is paused whenever any metric goes out of range (the pause/resume mechanics are sketched below).

    I can't claim that 'this process is now 2 times faster' because there are too many tables, too many processes, and the metrics are not straightforward. But when I finish the first pass of the rebuild, I will do that analysis.
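
    To give an idea of the pause/resume mechanism itself (the actual health checks live in the .ps1 module and are not shown here; index and table names are placeholders):

    -- Check the state and progress of resumable operations
    SELECT name, state_desc, percent_complete, start_time
    FROM sys.index_resumable_operations;

    -- Pause a running resumable rebuild when a health metric goes out of range
    ALTER INDEX IX_MyIndex ON dbo.MyTable PAUSE;

    -- Resume it later, once the metrics are back to normal
    ALTER INDEX IX_MyIndex ON dbo.MyTable RESUME;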

  • Also, since the reindexing cycle takes several months and the business is growing quickly, with ever more data, there are too many variables.

    At least we keep meeting all SLAs with a growing amount of data.

  • tzimie wrote:

    Also, since the reindexing cycle takes several months and the business is growing quickly, with ever more data, there are too many variables.

    At least we keep meeting all SLAs with a growing amount of data.

    Here is the point I am trying to make. All of your effort, which is to be commended, is probably a waste. I too was in that trap of reindexing, thinking it was a good thing. Based upon my own testing, and largely on the research of Jeff Moden, I stopped reindexing.

    Prior to stopping reindexing, we averaged 21 tickets per week in ServiceNow that were caused by database performance (deadlocks, slowness, timeouts, etc.). After I stopped reindexing, the number of tickets was ZERO in FIVE YEARS.

    There is a whole series of YouTube videos that Jeff created. Here is one: https://www.youtube.com/watch?v=rvZwMNJxqVo

    Google "Jeff Moden Black Arts Index Maintenance"


    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    tzimie wrote:

    Also, since the reindexing cycle takes several months and the business is growing quickly, with ever more data, there are too many variables.

    At least we keep meeting all SLAs with a growing amount of data.

    Here is the point I am trying to make. All of your effort, which is to be commended, is probably a waste. I too was in that trap of reindexing, thinking it was a good thing. Based upon my own testing, and largely on the research of Jeff Moden, I stopped reindexing.

    Prior to stopping reindexing, we averaged 21 tickets per week in ServiceNow that were caused by database performance (deadlocks, slowness, timeouts, etc.). After I stopped reindexing, the number of tickets was ZERO in FIVE YEARS.

    There is a whole series of YouTube videos that Jeff created. Here is one: https://www.youtube.com/watch?v=rvZwMNJxqVo

    Google "Jeff Moden Black Arts Index Maintenance"

    Deadlocks, slowness, timeouts - that is exactly the point I was making in my article: you can't just start an INDEX REBUILD in, say, Management Studio and let it run for a day on a big index. It would cause locks, slowness, LDF and AlwaysOn runaway growth, etc. That is why I created the script, which runs the rebuild while constantly checking the health of all metrics (two of the cheapest signals it polls are sketched below).

    And of course, if I had unlimited resources, I would not do index rebuilds at all.
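
    For instance, two of the cheapest 'LDF and AlwaysOn runaway' signals such a watchdog can poll look roughly like this (my script checks many more metrics than these):

    -- Transaction log usage of the current database
    SELECT used_log_space_in_percent
    FROM sys.dm_db_log_space_usage;

    -- Log send and redo queues of Availability Group replicas
    SELECT database_id, log_send_queue_size, redo_queue_size
    FROM sys.dm_hadr_database_replica_states;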

