Restructuring File Groups and consolidation of multiple files

  • We are currently preparing for a transition of our VLDB (250 GB) production database from SQL 2000 to SQL 2005 and are planning to consolidate the database filegroups and flatten as many of them as possible. The database currently has 11 filegroups spread over a total of 33 files. Each filegroup has its set of files located on the same logical volume.

    What I am interested in knowing is: what are some ways to determine how the filegroups should be structured, and what general criteria should be used to determine the number of filegroups needed? Also, when I do consolidate/flatten the filegroups, what is the best method for doing the actual work once my plan has been established?

  • It is hard to give you an answer without knowing the reason you are planning to do this.

    So what is the reason?

  • Reasons:

    - All database files for each filegroup reside on the same array, thus negating the effectiveness of the structure that was put in place prior to consolidation of the arrays within the production hardware.

    - General housekeeping after 4 years of use since go-live. The overall structure of the filegroups was designed around 12 separate arrays at go-live and has since changed to 4 arrays. We want to consolidate to help us better monitor file growth trends and maintenance criteria.

    These are the two key reasons we want to consolidate/flatten the database files and filegroups if at all possible. Since this is my first experience with a re-design of this nature, I don't really know where to start or what to analyze to determine object allocation across filegroups, nor do I know how to analyze what performance gains we might get from consolidating the data files for each filegroup. Ideally we would like to have 5 filegroups instead of 11, and each would be placed on its own separate array within the LUN on the SAN. However, before we can get there I have been tasked with preparing the migration plan for this action. Thus, I am in need of a bit of the good old-fashioned "hand holding" to guide me through the steps needed to properly plan for this restructuring.
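
    For what it is worth, once we are on 2005 I am assuming a query roughly along these lines (non-partitioned objects only; the object names are just placeholders) would give a starting picture of how much space each table and index takes in each filegroup:

        SELECT  fg.name                      AS filegroup_name,
                OBJECT_NAME(i.object_id)     AS table_name,
                i.name                       AS index_name,
                SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
        FROM    sys.indexes AS i
                JOIN sys.filegroups AS fg
                    ON fg.data_space_id = i.data_space_id
                JOIN sys.dm_db_partition_stats AS ps
                    ON ps.object_id = i.object_id
                   AND ps.index_id  = i.index_id
        WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
        GROUP BY fg.name, OBJECT_NAME(i.object_id), i.name
        ORDER BY fg.name, reserved_mb DESC;

    If that is the wrong way to go about the analysis, please let me know.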

  • I don’t see a good reason to do what you are planning.

    The problem with having one filegroup for each array is that you do not know how much space each will need in the future, and you do not know the actual IO load on each filegroup.

    I would recommend creating one file for each filegroup on each array in order to distribute the IO load and space usage as evenly as possible over all arrays. You do not have to do a lot of analysis to determine the hot spots, because SQL Server will distribute the data more or less evenly over the arrays. You can consolidate filegroups if you want, but there is probably not much advantage to it.
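
    As a rough sketch (the database, filegroup, file names, and paths are only placeholders), adding one file per array for an existing filegroup looks like this; you would repeat it for each filegroup and each array:

        ALTER DATABASE MyVLDB
        ADD FILE
            (NAME = N'MyVLDB_FG1_Array1',
             FILENAME = N'E:\SQLData\MyVLDB_FG1_Array1.ndf',
             SIZE = 10GB, FILEGROWTH = 1GB),
            (NAME = N'MyVLDB_FG1_Array2',
             FILENAME = N'F:\SQLData\MyVLDB_FG1_Array2.ndf',
             SIZE = 10GB, FILEGROWTH = 1GB)
        TO FILEGROUP [FG1];

    New allocations are then spread across the files by proportional fill; existing data only spreads out as it is reloaded or as indexes are rebuilt.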

  • Depending on how you structure your SAN, you might not get benefits from splitting out on the SAN.

    You have some good reasons, but I've not seen any really good articles or papers on how the filegroups work when flattening them out. If you come up with something, I'd like to publish it.

    I'd only split out things based on

    - performance. If you can move indexes and gain speed, it makes sense. Same for archiving data.

    - backup. If you can reduce backups by moving archival data (or read-only data) to a separate filegroup, then that's a good thing. A quick sketch of that pattern is below.
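
    For the backup case, the pattern is roughly this (the database, filegroup, and path names are just examples):

        -- Move historical data to its own filegroup, then mark it read-only
        ALTER DATABASE MyVLDB MODIFY FILEGROUP [ARCHIVE] READ_ONLY;

        -- Back that filegroup up once; routine backups can then focus on the read/write filegroups
        BACKUP DATABASE MyVLDB
            FILEGROUP = 'ARCHIVE'
        TO DISK = N'G:\Backups\MyVLDB_ARCHIVE.bak';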

  • OK, so consolidating the data files within each of our filegroups IS a good idea, but consolidating the number of filegroups isn't? If this is true, then as I go back to basics in this re-design of the filegroups and their overall layout, should I plan to allocate the larger, more heavily accessed data tables evenly across all filegroups? Or should I maintain filegroups that are fairly disproportionate in their utilization?

    Steve,

    Thanks for the thoughts related to the SAN. I will for sure feed back my findings and notes regarding how flattening the filegroups affects performance in a SAN implementation.

  • I normally just create one filegroup called DATA and make it the default filegroup. Then I create one data file for that filegroup on each array, assuming that they are more or less equal in size, and let SQL Server distribute the data across the files.

    Since your database already has multiple filegroups, I would just create one file on each array for each filegroup to spread the data across the arrays.

    If you want, you can combine some of the filegroups, but you probably will not see much benefit from that, so I would be inclined to leave filegroups as they are.
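
    If you do decide to consolidate, the mechanics are fairly simple. Roughly (all names below are placeholders): create the new filegroup with one file per array, make it the default, and then move tables onto it by rebuilding their clustered indexes. Nonclustered indexes have to be rebuilt separately, and text/image data stays on its original filegroup.

        ALTER DATABASE MyVLDB ADD FILEGROUP [DATA];

        ALTER DATABASE MyVLDB
        ADD FILE
            (NAME = N'MyVLDB_Data_Array1',
             FILENAME = N'E:\SQLData\MyVLDB_Data_Array1.ndf',
             SIZE = 20GB, FILEGROWTH = 2GB),
            (NAME = N'MyVLDB_Data_Array2',
             FILENAME = N'F:\SQLData\MyVLDB_Data_Array2.ndf',
             SIZE = 20GB, FILEGROWTH = 2GB)
        TO FILEGROUP [DATA];

        ALTER DATABASE MyVLDB MODIFY FILEGROUP [DATA] DEFAULT;

        -- Rebuilding the clustered index on the new filegroup moves the table's data
        CREATE UNIQUE CLUSTERED INDEX CIX_Orders
            ON dbo.Orders (OrderID)
            WITH (DROP_EXISTING = ON)
            ON [DATA];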

  • Michael,

    In a multiple-data-file-per-filegroup structure, how do you keep indexes healthy when you can't see how fragmented each data file is within the filegroup without being highly creative in your determination of re-indexing needs for each? This is another issue we are currently combating in our 2000 (non-SAN) implementation of this database. If each filegroup is on its own array, isn't index health easier to analyze, and maintenance needs easier to determine, for each filegroup? I know that at minimum Scan Density is skewed on indexes located in filegroups spanning multiple data files (regardless of whether the filegroup's data files are on multiple arrays or a single array). Is this still true with 2005 implementations as well?

    Also, what effect does the spanning of arrays have on statistics and the histograms used by the query processor?

  • Scott Clark (1/14/2008)


    Michael,

    In a multiple-data-file-per-filegroup structure, how do you keep indexes healthy when you can't see how fragmented each data file is within the filegroup without being highly creative in your determination of re-indexing needs for each? This is another issue we are currently combating in our 2000 (non-SAN) implementation of this database. If each filegroup is on its own array, isn't index health easier to analyze, and maintenance needs easier to determine, for each filegroup? I know that at minimum Scan Density is skewed on indexes located in filegroups spanning multiple data files (regardless of whether the filegroup's data files are on multiple arrays or a single array). Is this still true with 2005 implementations as well?

    Also, what effect does the spanning of arrays have on statistics and the histograms used by the query processor?

    I am not sure what you are asking. If you want to determine index fragmentation, you can use DBCC SHOWCONTIG. Maintenance is performed on tables and indexes, not filegroups or files. Just reindex or defragment the indexes that you feel are too fragmented.
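
    If your concern is the numbers rather than the method, note that on 2005 sys.dm_db_index_physical_stats replaces DBCC SHOWCONTIG, and its avg_fragmentation_in_percent (logical fragmentation) is meaningful regardless of how many files the filegroup has, unlike Scan Density, which assumes contiguous allocation in a single file. A rough example (the object names are placeholders):

        -- 2000 style, per index
        DBCC SHOWCONTIG ('dbo.Orders', 'CIX_Orders') WITH FAST;

        -- 2005 style: logical fragmentation for every index in the database
        SELECT  OBJECT_NAME(ips.object_id) AS table_name,
                i.name                     AS index_name,
                ips.avg_fragmentation_in_percent,
                ips.page_count
        FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
                JOIN sys.indexes AS i
                    ON i.object_id = ips.object_id
                   AND i.index_id  = ips.index_id
        WHERE   ips.page_count > 1000        -- ignore very small indexes
        ORDER BY ips.avg_fragmentation_in_percent DESC;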

  • When an index is contained in a filegroup which spans multiple database files, DBCC SHOWCONTIG isn't as accurate as it is on indexes contained within a filegroup with a single data file. This is what I am concerned about, as we currently can't utilize half of the results provided by DBCC SHOWCONTIG for determining index health because of the spanning.

  • Scott Clark (1/14/2008)


    When an index is contained in a filegroup which spans multiple database files, DBCC SHOWCONTIG isn't as accurate as it is on indexes contained within a filegroup with a single data file. This is what I am concerned about, as we currently can't utilize half of the results provided by DBCC SHOWCONTIG for determining index health because of the spanning.

    What is your basis for this statement? Can you cite a reference?

  • Don't have a reference other than what we have experienced in our production environments. This has already caused us to move indexes on several occasions to filegroups which have only one data file, located on a single array. Our re-indexing scheme is to re-index tables which have a scan density > 80% and a fragmentation level < 5%. Prior to moving one specific index from a spanning filegroup to a non-spanning filegroup, it was never re-indexed via our process because these two conditions were never met. Following the relocation, this same index is re-indexed almost nightly and doesn't require manual re-indexing when performance degrades on it. We have also seen degradation of this same index in Profiler traces that wasn't reflected in the SHOWCONTIG output prior to relocating the index to a non-spanning filegroup. Since we have seen this behavior multiple times, it is one of the key drivers of our flattening project. Granted, there will be tables and indexes which benefit from spanning; however, as far as the use of DBCC SHOWCONTIG results for determining index health is concerned, indexes are negatively affected by spanning.
