MDF File Size Management

  • Here's the deal. One of the DBs I manage is about 1.5 TB in size. I want to make the data files more manageable by making them smaller. I'm not going to go crazy with it, but in the instances where I have to deal with the individual files, it's ALWAYS a pain dealing with a single MDF that big. I'm thinking 3 data files would be sufficient.

    I created a second data file to hold the non-clustered indexes. That helped a little (about 1/3 of the data). That still leaves me with a 1 TB data file. Some of my larger tables are set up such that the primary key does not match the clustered index. That's not going to change. I'd like to hear the general consensus about moving non-clustered, primary-key indexes to a different filegroup. Will it 1) affect performance the same as moving any other non-clustered index, 2) decrease performance, 3) probably make no difference, or 4) something else, since all good lists have 4 options to choose from?

    So, in the olden days it was a best practice to move non-clustered indexes into their own filegroup. I can't find anything saying that's still true in SQL Server 2008. Every time I talk to someone about the benefits of moving non-clustered indexes to their own filegroup, they pretty much say it's the thing to do because that's the way they've ALWAYS DONE IT. Yeah, I get it: if they're on different spindles from the data itself, I could see where this would be a benefit. But I still can't find anything current that discusses this as a performance enhancer. Can anyone point me to a scholarly resource that confirms or disputes this?
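
    Just so we're all talking about the same operation, here is a rough sketch of the kind of move I mean. All database, file, table, and index names below are made up:

    ```sql
    -- All names here are hypothetical.
    -- 1. Add a new filegroup with its own file.
    ALTER DATABASE BigDb ADD FILEGROUP NCIndexes;
    ALTER DATABASE BigDb ADD FILE (
        NAME = N'BigDb_NCIndexes1',
        FILENAME = N'E:\Data\BigDb_NCIndexes1.ndf',
        SIZE = 100GB,
        FILEGROWTH = 10GB
    ) TO FILEGROUP NCIndexes;

    -- 2. Rebuild a non-clustered index onto the new filegroup. This also
    --    works for the index backing a PRIMARY KEY constraint, as long as
    --    the index definition matches the existing one.
    CREATE UNIQUE NONCLUSTERED INDEX PK_BigTable
        ON dbo.BigTable (BigTableId)
        WITH (DROP_EXISTING = ON)
        ON NCIndexes;
    ```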

  • I would say option 1): there is no performance difference between the index used for the primary key and any other non-clustered index.

    Instead of placing different objects in different filegroups, have you considered partitioning your largest tables and placing different partitions in different filegroups?

    If you want multiple files for maximum performance, I would not bother. If you have multiple disks, it is a lot easier to just use striping to create a single large striped volume across all the disks.

    Manually putting different objects in different filegroups to get maximum performance is usually more work than it's worth.

    But of course, as usual: It depends. 🙂

    /SG

  • As I more or less alluded to above, this isn't for performance reasons. I work for a mega-company, and as such a lot of our ancillary processes are farmed out to different departments, one of them being backup and recovery. We've had problems in the past restoring a single 1+ TB db, and we would have faced fewer obstacles if we could have restored the db in smaller data files.

    However, you bring up a point that I've been trying to justify to my higher-ups... partitioned tables. I think we could really benefit from using partitioned tables. One reason is what you stated: I could have a table in different filegroups. (As an FYI to my concerns above, I also thought about moving our bigger tables into their own filegroup, but I know the only benefit of that would be the stated goal of creating a manageable filegroup size.)

    Our biggest table has over 1.1 billion rows and consumes about 211 GB of data alone. If I understand partitioned tables correctly, I could see a real performance benefit by partitioning this table in SQL 2008. As soon as we load our monthly data for one client, we're loading the next client while simultaneously aggregating the data for the previously loaded client. So our EOM processing is pretty huge, and it seems like loading data into one partition of this table while aggregating from a different partition could only increase performance, even if the entire table were on the same drive. Your thoughts?
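
    To make that concrete, here's the kind of thing I have in mind. This is purely a sketch, with hypothetical table, function, and filegroup names and made-up boundary values, partitioning on client number:

    ```sql
    -- Hypothetical sketch: range-partition the big table by client number.
    CREATE PARTITION FUNCTION pfClient (int)
        AS RANGE LEFT FOR VALUES (100, 200, 300);   -- made-up client boundaries

    -- The filegroups are assumed to exist already; 3 boundaries -> 4 ranges.
    CREATE PARTITION SCHEME psClient
        AS PARTITION pfClient TO (FG1, FG2, FG3, FG4);

    -- Rebuild the existing clustered index onto the partition scheme so
    -- each client range lives in its own partition.
    CREATE CLUSTERED INDEX CIX_MonthlyData
        ON dbo.MonthlyData (ClientNumber, AccountingPeriod)
        WITH (DROP_EXISTING = ON)
        ON psClient (ClientNumber);
    ```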

  • You can partition tables to gain some performance, but if this is just a management issue, I would say that you might want to move some tables (by moving their clustered index) to new files/filegroups.
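
    Moving a table really just means rebuilding its clustered index on the target filegroup; a rough sketch, with hypothetical names:

    ```sql
    -- The table's data follows its clustered index to the new filegroup.
    CREATE CLUSTERED INDEX CIX_SomeBigTable
        ON dbo.SomeBigTable (SomeKeyCol)
        WITH (DROP_EXISTING = ON)
        ON SecondaryFG;
    ```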

    However, what issue do you have restoring a 1TB file? A backup is still going to contain all the data, so if you are having issues there, I'd look at solving those issues.

  • It sure sounds like you could benefit from partitioning.

    It sounds like you have a very large table with data that covers a long time period and the table only gets updated and inserted to at the end.

    This is a perfect scenario for partitioning.

  • The issue we most recently had was finding enough space to restore the file to. We were still on SQL Server 2000 at the time. Our backup/restore is handled by a different department. We only needed to restore a specific set of data (i.e. a single customer's records). We've had problems with this (them) in the past, and we weren't going to let go of the old database until we knew we could restore successfully. They couldn't. We ended up having to go back six months to find a backup that worked (3rd-party tool, not SQL Backup); so we made the decision to restore the six-month-old data, copy over the data we needed to 'fix', and manually load the last six months of data for this customer, since we had kept the source data extracts for them.

    Among the multiple problems this outside dept faced was how to instantly come up with over a TB of disk space. If the data had been in multiple files, they could have come up with that much space, just not all in one place.

    I'm only repeating what they said. After the last fiasco we did manage to switch to a different backup/restore tool. We haven't had to use it yet but I have high hopes that it will work better (TSM TDP). It certainly backs up the db a lot faster than the previous solution did.

  • If the issue is finding space, then separate filegroups will definitely help. Just didn't want you to think that you could easily piecemeal restore. You can do some filegroup restores, but you need to practice that for sure.
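
    For anyone curious, a piecemeal restore runs roughly like this. Sketch only, with hypothetical names and paths, and it assumes the full recovery model (or read-only secondary filegroups under simple):

    ```sql
    -- Start with the primary filegroup; WITH PARTIAL marks this sequence
    -- as a piecemeal restore.
    RESTORE DATABASE BigDb
        FILEGROUP = 'PRIMARY'
        FROM DISK = N'E:\Backups\BigDb_full.bak'
        WITH PARTIAL, NORECOVERY;

    -- Bring back additional filegroups as space (or need) allows.
    RESTORE DATABASE BigDb
        FILEGROUP = 'HotData'
        FROM DISK = N'E:\Backups\BigDb_full.bak'
        WITH NORECOVERY;

    -- Roll forward and bring the database online.
    RESTORE LOG BigDb
        FROM DISK = N'E:\Backups\BigDb_log.trn'
        WITH RECOVERY;
    ```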

  • Stefan_G (4/12/2010)


    It sure sounds like you could benefit from partitioning.

    It sounds like you have a very large table with data that covers a long time period and the table only gets updated and inserted to at the end.

    This is a perfect scenario for partitioning.

    I don't want to get off topic too much from my OP, but to clarify, our data does cover a long period of time, but it is not added to the end. For the most part, our data is sorted (clustered index) by the client number then the accounting period.

  • BobMcC (4/12/2010)


    I don't want to get off topic too much from my OP, but to clarify, our data does cover a long period of time, but it is not added to the end. For the most part, our data is sorted (clustered index) by the client number then the accounting period.

    I just wanted to add a couple of thoughts about partitioning if you are interested in it.

    When I started at my current company, we were in much the same situation you are in now. We had one data file that was about 800 GB and about to overflow the 1 TB drive it was on. So, like your original thought for this thread, I split the data up between data and indexes. That fixed the problem I had at the time, but that's not why I am answering your post: if your database is large, partitioning is the way to go.

    After the initial split of our database I did some work with partitioning and ended up using a few partitions across the database, and the performance gains are tremendous. Not only do you get the multiple files you are looking for, but your seek and load times can drop dramatically when the partitions are done correctly. The same database I started with now has 5 partitions running throughout the tables, is 2.5 TB in size (before compression), and has about 400 files. As far as rows go, the last time I did a count it had about 4 billion rows total.

    My load times are what I found most interesting when this work was completed. Our database grows about 10 GB a week from the data we are ingesting, so load time is very important around here. After the partitioning was completed, I saw load times on my large tables drop to between 25% and 50% of what they were.

    You mentioned trying to get your larger tables into their own filegroups; partitions are excellent for that. In fact, if you work them out correctly, they can help you archive off data without affecting the rest of your database. Or, if you are keeping everything and some data is used more than the rest, you can keep the most-used data on the fastest drives.

    Partitioning can be a very flexible solution to a lot of large-database issues, but when you are trying to architect your solution, keep in mind the big picture of your database and the applications that use it. Don't think about how to partition only one table; a good partitioning architecture has to work across your entire database to really see the gains it can yield.

    In answer to your question: if the partition key is correct for the table, it won't matter where the data is inserted, in the middle or at the end; you will still see performance gains from partitioning.
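
    Since archiving came up: once a table is partitioned, sliding old data out is a metadata-only operation. Here's a self-contained sketch; all names are hypothetical, and it uses a single filegroup just to keep the demo short (in a real database the partitions would be spread across filegroups):

    ```sql
    -- Hypothetical demo: range-partition a fact table by accounting period.
    CREATE PARTITION FUNCTION pfPeriod (int)
        AS RANGE RIGHT FOR VALUES (201001, 201002);  -- 3 partitions

    CREATE PARTITION SCHEME psPeriod
        AS PARTITION pfPeriod ALL TO ([PRIMARY]);    -- one FG, demo only

    CREATE TABLE dbo.Fact (
        ClientNumber     int   NOT NULL,
        AccountingPeriod int   NOT NULL,
        Amount           money NOT NULL
    ) ON psPeriod (AccountingPeriod);

    -- Archive target: identical structure, on the same filegroup as
    -- the partition being switched out, and empty.
    CREATE TABLE dbo.Fact_Archive (
        ClientNumber     int   NOT NULL,
        AccountingPeriod int   NOT NULL,
        Amount           money NOT NULL
    ) ON [PRIMARY];

    -- Instant, metadata-only move of the oldest slice out of the big table.
    ALTER TABLE dbo.Fact SWITCH PARTITION 1 TO dbo.Fact_Archive;
    ```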

  • Steve Jones - Editor (4/12/2010)


    If the issue is finding space, then separate filegroups will definitely help. Just didn't want you to think that you could easily piecemeal restore. You can do some filegroup restores, but you need to practice that for sure.

    Nope, the big issue in this case was finding space. That's why I was looking for logical ways to create manageable data file sizes; and if I could improve performance a bit too, then so much the better. One thing I didn't want to do, though, was HURT performance. I couldn't think of a reason why moving the non-clustered index backing the primary key would be any different from moving any other non-clustered index, so I thought I'd throw it out here to the masses. In my mind, three 500 GB data files seemed to be a good combination of manageable data size (not being too big) and number of files (not having too many). Regarding piecemeal restore... unfortunately, filegroup restore is not an option for us since this particular database is set to simple recovery.
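
    For the record, here's roughly how I picture the split — a sketch with hypothetical file names and paths. Note that the original file can never be fully emptied (it holds system objects), and moving pages this way fragments indexes, so a rebuild afterwards would be in order:

    ```sql
    -- Add two more data files to the same (default) filegroup.
    ALTER DATABASE BigDb ADD FILE (
        NAME = N'BigDb_Data2',
        FILENAME = N'F:\Data\BigDb_Data2.ndf',
        SIZE = 500GB
    );
    ALTER DATABASE BigDb ADD FILE (
        NAME = N'BigDb_Data3',
        FILENAME = N'G:\Data\BigDb_Data3.ndf',
        SIZE = 500GB
    );

    -- Migrate existing pages out of the original file toward the new ones.
    -- EMPTYFILE stops with an error once it reaches pages it cannot move,
    -- but by then most of the data has been redistributed.
    DBCC SHRINKFILE (N'BigDb_Data1', EMPTYFILE);

    -- Cap the original file at roughly 500 GB (target size is in MB).
    DBCC SHRINKFILE (N'BigDb_Data1', 512000);
    ```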

    Oh, and thanks for the replies; and I enjoy the podcasts.

    Bob McC
