Index rebuild / stats update taking lot of disk space.

  • Dear Friends,

    Please advise why the Index rebuild and Update stats  jobs on one server consume lots of disk space after they run..I have to manually shrink the data file to release the space....Thank you in advance.

     

     

  • Oh man never (unless it is a drastic data delete and a one off shrink) shrink the data file, your getting yourself into a shrink rebuild shrink cycle and that is not good for you or your database.

    The rebuild requires space to sort the index before the old and new index is swapped in/out, especially if the index is rebuilt online.

    The old index remains in place and a new index is built, then when ready the index pages are swapped around.

    So you could end up using around 2x the amount of space of your largest index and then that becomes white space after the pages are swapped and dropped.

    If your largest index is say 100GB, you can use up to around 200GB extra just to sort and store that while the rebuild is running.

     

     

    Now you need to ask yourself why are you rebuilding indexes, is it for performance, if so what type of storage are you on?  Maybe you never need to rebuild indexes ever and just use statistics maintenance instead.

    Even Microsoft say you now may never need to rebuild indexes, I would look at what you are doing and why you are doing it and potentially re-work your maintenance routines.

    https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver16#index-maintenance-strategy

  • Arsh wrote:

    Dear Friends,

    Please advise why the Index rebuild and Update stats  jobs on one server consume lots of disk space after they run..I have to manually shrink the data file to release the space....Thank you in advance.

    As Ant-Green correctly advises (and has sited the correct documentation), I'd recommend that you actually stop doing index maintenance until you know more about it.  The reason why Microsoft finally changed this is that a lot of us have proven that the supposed "Best Practices" than 98% of the world has been following is horribly incorrect and has been wrong since the beginning.  Even Paul Randal, the fellow that came up with those number under gun-point by Microsoft said way back in 2008 "to take those [5/30] numbers with a grain of salt and don't treat them as absolute".

    https://www.sqlskills.com/blogs/paul/where-do-the-books-online-index-fragmentation-thresholds-come-from/

    What's worse is that the documentation on REORGANZE and WITH ONLINE = ON is a bit confusing to most.  In the last couple of years, I've proven that REORGANIZE does NOT do what they think it does (like I said, although the documentation on it is correct, it leaves most folks thinking that it "follows the Fill Factor just like REBUILD does and it does NOT) nor is "less resource intensive" as both the documentation and the rest of the world claim.  It will cause log file entries many times the size of whatever index you're using it on including some huge Clustered Indexes.

    I've also proven that it's MUCH better to do no index maintenance that it is to do it wrong.  That's why I stopped doing any general index maintenance on Monday, the 18th of January, 2016.  One of the reasons is because of the massive "morning after blocking" that will occur.

    I and a lot of other good folks have also proven that logical fragmentation mostly does NOT matter and, if you look at all the 'tubes and blogs and articles on the subject of index maintenance, none of them actually prove that it does matter.  They just repeat the same old crap about how it CAN matter but never do a demonstration.

    The information about index maintenance is so wrong that people also believe that Random GUIDs are a fragmentation problem because they're Random... in fact, them being so evenly Random makes them the very epitome of how people think an index should behave and they can have very high insert rates and, quite literally, go for month with < 1% logical fragmentation.  And, I PROVE all that and more in the following video.  Watch it all because it's not just about GUIDs.  And watch the out-take after the Q'n'A section.

    https://www.youtube.com/watch?v=rvZTwMNJxqVo

    So, my advice to you is to stop doing index maintenance and pay more attention to rebuilding stats.  Once you know more about your indexes, science out what they're doing and only rebuild indexes when their average page density gets below 80%. And, even then, you've got to be careful... if they're fragmenting like that and you rebuild them at the default of "0", will that cause more blocking and you need to reduce the fill factor or will reducing the fill factor do NOthing to help?

    Whatever you do, STOP following the supposed "Best Practices"... they're actually a WORST practice and many of us have proven it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I will add that rebuilding indexes is not about performance - and should not be considered as a method to improve performance.  Index maintenance is about resource usage.

    How much wasted storage and wasted memory do you have if you have a 1TB clustered index that has an average page density of 50%?  For any cloud based systems, paying for an additional 300GB of space just because you are not performing index maintenance is an expensive cost for a lot of wasted space.

    One rule of thumb I use is that if an index is selected to be rebuilt (doesn't matter what you use to determine this) - do not rebuild that index with a FILLFACTOR of 0 (100) unless you know exactly how and when data is inserted/updated/deleted.  For example, if the process is part of an ETL and the data is only inserted/updated during the ETL process - then you can rebuild after the load to 100 since you also know the data won't change until the next load.

    The next thing you need to determine is how fast the index is fragmenting and what you can do to reduce that frequency.  That is what FILLFACTOR is all about - making sure you have enough space available on each page to support new rows and updates to avoid expensive page splits.  If you are able to set a reasonable fill factor to get you to a day/week/month/quarterly schedule for that index - without consuming too much space/memory then you can schedule those processes to manage those indexes.

    Final Note: you can reduce the amount of space used in the data file by setting SORT_IN_TEMPDB on.  But be aware this will require much more space in your tempdb and you need to ensure tempdb is capable of handling the additional workload.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Ya gotta be careful, though.  If you do the index maintenance incorrectly, you could be setting up the index to do massive page splits which will cost you dearly in the hours and, sometimes, days after.  If you do have an index with a 50% average page density and you don't actually know if the fragmentation occurred only at the broom tip (hot spot at the insertion point of the end of the index), then rebuild it at 82%.  You be gaining a lot in space savings without the possibility of it killing you with page splits because you made the mistake of rebuilding it with a "0" fill factor, leaving no space.  If you CAN prove that the fragmentation only occurs in the hot-spot, the rebuild it at 97 % so that you know just by looking at the fill factor, what kind of index it is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Ya gotta be careful, though.  If you do the index maintenance incorrectly, you could be setting up the index to do massive page splits which will cost you dearly in the hours and, sometimes, days after.  If you do have an index with a 50% average page density and you don't actually know if the fragmentation occurred only at the broom tip (hot spot at the insertion point of the end of the index), then rebuild it at 82%.  You be gaining a lot in space savings without the possibility of it killing you with page splits because you made the mistake of rebuilding it with a "0" fill factor, leaving no space.  If you CAN prove that the fragmentation only occurs in the hot-spot, the rebuild it at 97 % so that you know just by looking at the fill factor, what kind of index it is.

    Agreed - I would only ever rebuild an index to 100 if I absolutely know that there are no inserts, updates or deletes that can be performed on that table by any other process.  Generally, that is going to be some type of ETL process where we have absolute control of the population of the tables.  Another place where a 100 fill factor is reasonable would be lookup tables, which generally don't change very often and are normally smaller in size.

    As to what fill factor to use - that is where the analysis needs to be done.  You need to know how the application(s) insert, update and delete data for that table.  You also need to know the volume of changes and determine the best fill factor for that system.  If you have a large enough maintenance window and can rebuild the indexes on that table every night then you can get away with a much higher fill factor - but if that isn't the case then lowering the fill factor *may* allow time to put off rebuilding for a week or month or longer.

    Frankly, not rebuilding indexes is going to lead to much higher costs - especially considering how cloud services charge based on compute resources and memory, storage, IO and CPU all factor into the total cost.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here's the "formula" I've been using in production

    FF

    100 - Static or usually static.

    99 - Usually, some form of ever-increasing keyed index that suffers no fragmentation, which also means no "ExpAnsive" updates anywhere.

    98 - Permanently fragmented "Sequential Silos".  The odd critters fragment massively but stay near 100% full all the time because, oddly enough, there is no page splitting.  Sounds impossible but it's not.

    97 - Like type 99 BUT they suffer from "broom tip" fragmentation by the bad pattern of doing inserts followed soon after by "ExpAnsive" updates.  It's just a huge waste to use a lower fill factor because none of the rest of the index is affected.  I rebuild these when they're wasting enough memory/disk space to warrant it.

    96 - Random Silos... fragment forever and might not be able to prevent page splits but need space recovery .  Usually not worth leaving any headroom but could become more like a TYPE 1 index when analyzed if the Silos are wide enough to benefit from.

    Type 1 - these are usually evenly distributed fragmenters like Random GUIDs and other things.  Depending on their insert rates, I'll rebuild them at 91, 81, or 71 FF (always ending with a "1" to identify what they are.  The big reason to rebuild these is to prevent the massive page splits they would otherwise suffer.  If the Logical fragmentation hit's 1%, it means it's time to rebuild because the area above the fill factor is full and all hell is getting ready to break loose.

    Type 2 - Usually set to 82 (ending with a "2")  These are indexes that fragment and you need to recover space but you haven't analyzed then yet.  The "2" is to remind you that you still have an analysis "TO DO". 😉  They frequently turn out to be a "97" but haven't done the analysis yet.

    I don't leave "o" fill factor indexes... those are new indexes that need to be classified.  The same goes for existing indexes that don't have the values above.  Someone else added a Fill Factor that usually ends in a 0 or a 5 and they need to be identified and checked.

    I also use the Swap'n'Drop method when recovering space on monster indexes to that I don't leave a bunch of totally unnecessary free space.

    And... I might do index maintenance somewhere between once a month and once a quarter or so.  I don't do this on a schedule.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear All,

    Firstly , Thank you to all of you sharing your knowledge..The issue was with an Agent Jjob for Index Rebuild , done for all the Indexes irrespective of the Fragmentation  percentage. Since many tables are over 20 GB , hence the issue I suppose. Also found , was 'Update Stats' agent job that was running every week with Full Scan , indiscriminately. Not sure if that contributed to this issue  , but I disabled for the time being...Would be glad if anybody can shed some light on 'Update Stats' too in the context of this problem. The space consumption is in much control now 50%, but I still wonder what else can be the cause. This drive , only has the data files of two databases.. Thank you in advance.

    • This reply was modified 1 year, 9 months ago by  Arsh.
  • Regardless of how indexes are selected to be rebuilt - space is required in the data file to rebuild the indexes.  If you are constantly shrinking the data file, rebuilding indexes - shrinking the data file then you are causing the file to need to grow each time and the shrink itself is going to cause those indexes to fragment.

    If you just left the data file alone - it wouldn't need to grow again since it can use the space that is available to perform the rebuilds.

    Update Stats doesn't take up a lot of space and that is something you really need.  That job should not be disabled - especially if you decide to not rebuild indexes, since rebuilding indexes will update the statistics with a full scan.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks everyone for your advices..I did a Rebuild of only selected indexes , with a frag percent more than 30% unlike the earlier maintenance plan that had a blanket Rebuild for all..Now the disk space issue has been solved 50% but still I see the space being taken up..strange this is that the Database properties show that the free-Space is 98% , still I see the so much of disk space taken up..Thank you for any comments/opinions.

    Rgds

  • Hi,

    looking at other side of it, is there a way to reclaim the space on the drive after Index Rebuild ? thank you.

  • Arsh wrote:

    Hi,

    looking at other side of it, is there a way to reclaim the space on the drive after Index Rebuild ? thank you.

    Yes, but you really shouldn't do it

    You would shrink the file, but then you introduce fragmentation to a point where it is then nearly probably you will need to rebuild the indexes again, so you are constantly in a rebuild, shrink, rebuild cycle.

    Your best bet, just leave the space alone.

  • Thank you Ant-Green. I have changed the maintenance plan and categorized the indexes rather into Rebuild and Reorganized categories depending on which side they fall on the 30% mark replacing a policy of doing a blanket run for all 600+ indexes on a database sized 700 GB around. Now the problem with the Disk exhaustion is solved 60%. Still I the datafiles growing , which can also be due to the application as it may require to write data.

    Last comment I would need is if you can advise on reclaiming the space taken by index defragmentation.thank you in advance.

  • Well what is the reasoning behind rebuilding indexes?  Performance? And has it been proven that the index rebuild actually helps or is it the UPDATE STATISTICS by product which is actually helping here for performance?

     

    The 5% / 30% values that where "supposed" best practise are actually a "worst" practise.

     

    Microsoft say in their index maintenance strategy that if your rebuilding for performance, stop and just do UPDATE STATISTICS instead and see if that solves you problem.

    Read back up to my original post, it shows a sample index maintenance strategy.

     

    And again no, don't reclaim the space, you will end up in a constant shrink, rebuild (caused growth), shrink cycle.

  • Ant-Green wrote:

    Microsoft say in their index maintenance strategy that if your rebuilding for performance, stop and just do UPDATE STATISTICS instead and see if that solves you problem.

    It does now but only as of 20 April 2021 when they finally "got the word".  They had another change after that to the current wording.  They still don't identify the extreme dangers of using REORGANIZE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply