Shrinking a Database (pro\con)

  • OK, as a rule, I think shrinking a database or database file is bad. I would not normally allow such behavior, but I have a rather unique situation. We have our data partitioned across filegroups, and these filegroups are partitioned on YEAR. So all of our 2013 data is in a 2013 partition, all of our 2012 data is in a 2012 partition, and so on. Once a year is locked down, we mark its data files \ filegroups read only because that data will never change again. It is purely read-only data.

    As part of the year-end cleanup process, we see that we can reclaim 100+ GB of space by shrinking the 2013 partitioned data file. This file will NEVER be written to again (2013 is now history), and we could use that 100+ GB for our 2014 partitioned file.

    In this case, would it be OK to shrink the 2013 database file prior to locking it down and setting it to read only? Also, assuming it is OK to shrink, what would be best practice for correcting any fragmentation that results from the file shrink? I'm assuming rebuilding indexes post-shrink would be necessary.

    Any guidance or thoughts here would be appreciated. I'm looking for feedback \ opinions \ thoughts.

    Thanks

  • I had a similar situation with a month-end read-only database created for an application.

    I did several steps:

    1. Compress all the tables in the database (partitions for older years in your case)

    Then:

    2. Shrink the files for the older data to eliminate any unused space.

    3. Reorganize all the table partitions for the older data.

    Repeat 2 and 3 until there is no more space to eliminate. I often went through 5 or 6 cycles of this.

    4. Update statistics on all tables with FULLSCAN, all columns.

    5. Set the file to read only.

    Index rebuilds would leave the indexes in better condition, but then you would need to leave enough free space to rebuild the largest partition. If the older data is not accessed that often, the reorganized partitions will be OK. A rough T-SQL sketch of the steps above follows.
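    In rough T-SQL, it looks something like this (a minimal sketch; dbo.MonthEnd, the logical file name MonthEnd_Data, the filegroup FG_MonthEnd, and the database name are all invented examples, so substitute your own):

    -- 1. Compress the tables (or just the older partitions in your case)
    ALTER TABLE dbo.MonthEnd REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- 2. Shrink the data file toward a target size (in MB)
    DBCC SHRINKFILE (N'MonthEnd_Data', 1024);

    -- 3. Reorganize the indexes to repair the damage the shrink just did
    ALTER INDEX ALL ON dbo.MonthEnd REORGANIZE;

    -- ...repeat 2 and 3 until the file stops getting smaller...

    -- 4. Update all statistics, all columns, with a full scan
    UPDATE STATISTICS dbo.MonthEnd WITH FULLSCAN, ALL;

    -- 5. Set the filegroup read only (this needs exclusive access to the database)
    ALTER DATABASE MonthEndDB MODIFY FILEGROUP FG_MonthEnd READ_ONLY;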

  • jbarry 87949 (1/29/2014)


    OK, as a rule, I think shrinking a database or database file is bad. I would not normally allow such behavior, but I have a rather unique situation. We have our data partitioned across filegroups, and these filegroups are partitioned on YEAR. So all of our 2013 data is in a 2013 partition, all of our 2012 data is in a 2012 partition, and so on. Once a year is locked down, we mark its data files \ filegroups read only because that data will never change again. It is purely read-only data.

    As part of the year-end cleanup process, we see that we can reclaim 100+ GB of space by shrinking the 2013 partitioned data file. This file will NEVER be written to again (2013 is now history), and we could use that 100+ GB for our 2014 partitioned file.

    In this case, would it be OK to shrink the 2013 database file prior to locking it down and setting it to read only? Also, assuming it is OK to shrink, what would be best practice for correcting any fragmentation that results from the file shrink? I'm assuming rebuilding indexes post-shrink would be necessary.

    Any guidance or thoughts here would be appreciated. I'm looking for feedback \ opinions \ thoughts.

    Thanks

    I'm actually doing this at work and writing both a presentation and an article on the subject. Microsoft didn't make any of this easy.

    A few facts for anyone else who might read this...

    1. Shrinking a file causes massive fragmentation of the data in that file. You're practically guaranteed to have both the B-Tree and Leaf levels of all indexes, including the Clustered Index, fragmented to 99%. (There's a query after point 4 below that will show you this.)

    2. Reorganizing indexes has two problems. It's ALWAYS fully logged, and it does not reorganize the B-Tree. Some will say that the condition of the B-Tree doesn't matter because it doesn't affect singleton lookups or Index Seeks that are followed by range scans, but I'd just as soon not take the chance. Besides that, it can still suffer page splits, which waste space there just like anyplace else.

    3. Rebuilding indexes also has a major problem, especially on partitions. While it allows for minimally logged rebuilds, if there are more than 128 extents (that's just 8MB), the old index (or heap) is preserved until after the new index is created. So, if you have a 100GB table in a partition on a filegroup other than PRIMARY, that partition will blow out to at least 200GB when you rebuild the Clustered Index. If you shrink the database after that, you get massive fragmentation again. It's a vicious cycle unless you use REORGANIZE, and that won't defrag the B-Tree.

    4. Does having a fragmented B-Tree really affect reporting THAT much? "It depends" on the indexes and the code, of course. The real key is that the process of doing multiple shrinks and REORGANIZEs (as I'm sure Michael Valentine Jones will confirm) is pretty much hit'n'miss and requires a bit of luck. If it takes 5 or 6 reorgs, and your database is in any Recovery Model other than SIMPLE, then you have 5 or 6 times the size of the table to back up in your log files, because REORGANIZE is fully logged (it's logged even in SIMPLE, but log reuse occurs in that mode), not to mention all the disk and CPU activity of doing such a thing 5 or 6 times.
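    To put numbers on point 1, sys.dm_db_index_physical_stats in DETAILED mode reports fragmentation at every level of every index. Here's a minimal example, along with the per-partition commands from points 2 and 3 (dbo.SalesHistory and the partition number are invented examples):

    -- Measure fragmentation at every index level (DETAILED scans the B-Tree too)
    SELECT  i.name, ips.index_level, ips.partition_number,
            ips.avg_fragmentation_in_percent, ips.page_count
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID(N'dbo.SalesHistory'), NULL, NULL, N'DETAILED') AS ips
    JOIN    sys.indexes AS i
            ON  i.object_id = ips.object_id
            AND i.index_id  = ips.index_id
    ORDER BY i.name, ips.partition_number, ips.index_level;

    -- Point 2: REORGANIZE can target a single partition, but only fixes the Leaf level
    ALTER INDEX ALL ON dbo.SalesHistory REORGANIZE PARTITION = 3;

    -- Point 3: REBUILD fixes the B-Tree too, but keeps the old copy around
    -- until the new one is built, so the filegroup needs room for both
    ALTER INDEX ALL ON dbo.SalesHistory REBUILD PARTITION = 3;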

    The code to do what I've done isn't difficult, but it is complex. To make a real long story short, I:

    1. Do one REORGANIZE of the partition, then measure the size of the data and the indexes (the "Magic Number").

    2. Copy the data from the partition to a staging table.

    3. Build a new partition file using the Magic Number as the initial size (and a very small growth, usually less than 10MB).

    4. Rebuild the clustered index on the staging table to move it to the new file, and rebuild all the same non-clustered indexes.

    5. Switch the old partition out to a drop table and drop that table.

    6. Merge the empty partition, then rebuild the partition using the Partition Scheme and Partition Function (a previous read-only partition makes this more fun than it should be).

    7. Switch the new file into the rebuilt partition.

    You will have successfully shrunk the partition to its smallest size and, if you do it right, only the initial REORGANIZE is fully logged and the total time of data unavailability is sub-second. Since the data is only moved twice, it's not bad for performance either, but that really doesn't matter so much because of the sub-second time previously mentioned.
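    In outline, the switch/merge/split mechanics from steps 4 through 7 look something like this. It's a heavily simplified sketch, not my production code: every object name is invented, the staging table and its non-clustered indexes are assumed to already exist, and all the read-only juggling is left out.

    -- 4. Move the staging copy of the 2013 data to the new, right-sized
    --    file by rebuilding its clustered index there
    CREATE UNIQUE CLUSTERED INDEX cix_Sales2013_Stage
        ON dbo.Sales2013_Stage (SaleDate, SaleID)
        WITH (DROP_EXISTING = ON)
        ON FG2013_New;

    -- 5. Switch the old, bloated 2013 partition (say, partition 3) out to a
    --    disposable table on the old filegroup, then drop it
    ALTER TABLE dbo.SalesHistory SWITCH PARTITION 3 TO dbo.Sales2013_Drop;
    DROP TABLE dbo.Sales2013_Drop;

    -- 6. Merge the now-empty 2013 partition away, point the scheme at the
    --    new file, and split the 2013 boundary back in
    ALTER PARTITION FUNCTION pfSalesByYear() MERGE RANGE ('20130101');
    ALTER PARTITION SCHEME psSalesByYear NEXT USED FG2013_New;
    ALTER PARTITION FUNCTION pfSalesByYear() SPLIT RANGE ('20130101');

    -- 7. Switch the staging table (indexes matching the target) into the
    --    rebuilt partition; this is a metadata-only, sub-second operation
    ALTER TABLE dbo.Sales2013_Stage SWITCH TO dbo.SalesHistory PARTITION 3;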

    The total time that the users are without the data in just that one partition is measured in milliseconds. Don't forget that if you make things Read Only, you have to set the whole database to SINGLE_USER, set the partition's filegroup to Read Only, and then set the database back to MULTI_USER. If you don't understand the significance of that, I'll be more blunt: it will throw everyone and everything except you out of the database, including but certainly not limited to all web servers and other client connections. That's why I make a separate database for these types of archives.
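    The read-only dance itself is just three statements (MyArchiveDB and FG2013 are placeholder names):

    -- Requires exclusive access: this kicks everyone else out of the database
    ALTER DATABASE MyArchiveDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- Mark the old year's filegroup read only
    ALTER DATABASE MyArchiveDB MODIFY FILEGROUP FG2013 READ_ONLY;

    -- Let everyone back in
    ALTER DATABASE MyArchiveDB SET MULTI_USER;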

    I hope that helps. I'd post code for what I've done but it's not fully debugged/acid tested and it's a bit esoteric for use at my current company. Besides, it would make an article on the subject a bit of an anti-climax.

    I'll also state that this process could be done in fewer steps but I couldn't figure out a better way to keep the data available for as long as possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

