Changing a database from a single mdf file to mutiple files

Question

Changing a database from a single mdf file to mutiple files

Dan Barber

SSC Enthusiast

Points: 124
More actions
May 18, 2005 at 4:28 am

#109407

Hi, I have a database which only has one mdf file yet it is sitting on a RAID 5 3-Disk array so i want to make use of the number of disks in the array by adding 2 secondary data files (This means i can have three threads running instead of just one). I can create the 2 extra ndf files no problem but have tried several ways to restore a backup accross the 3 files but without success. I'm not sure even whether this is possible but wanted to know for definite. Is my only choice to restore the database to a single mdf file and then only use the 2 secondary files for data added after that? Hope i have explained this clearly
Thanks
Dan

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Andrew Scott-149650 SSC Enthusiast Points: 172 More actions · Answer 1

You are already using the 3 disks with the single mdf file, the file will be striped with parity across the three disks, having 3 files will not mean that each file will sit on an independant disk they two will be stipped with parity accross the 3 disks. If you want to split the database up so different files exist on different disks you will need to use more disks and have 3 Raid 5 arrays each with a minimum of 3 disks.

Dan Barber SSC Enthusiast Points: 124 More actions · Answer 2

I have been told that if you have 3 disks in your raid array it is best for performance to have a database file per disk in your array as then in my case you can have 3 simultaneous threads running instead of only 1. With only one file, it can only be accessed by one person at one time. Having three files would allow 3 people access to the same information all at the same time. Is this not correct?

Dan

Johnny Ed Old Hand Points: 326 More actions · Answer 3

Dan,

"a database file per disk in your array "

I assume you inferring having the same number of files as disks within your array, rather than a file on each disk within the array. If not, the youll find RAID 5 is different to, say, RAID 1. ( see http://www.pcguide.com/ref/hdd/perf/raid/levels/singleLevel5-c.html )

With RAID 5 ( where data and parity is stripped across all drives) you cannot control where the files are located. Therefore a single file would probably be distributed across the physical disks in a similar fashion to having 3 files totalling to same size. The performance bottleneck is always going to be with I/O and the of disk drive reading the data.

So AFAIK, it wont make any difference having 3 files instead of 1. Out of interest, what you are suggesting is not that dissimilar to using different filegroups.

Books online suggests that "The same performance improvement [from using filegroups] can be accomplished with a single file created on a RAID (redundant array of independent disks) stripe set"

We are running a large warehouse (>120GB) with essentially 1 large datafile spread across a large RAID 5 array. The only reason I can see for splitting up a datafiles within the same RAID array is if it reaches maximum file size, or for specific object placement.

See the following BOL excerpt on filegroups;

"Secondary Filegroups

These files hold all of the data that does not fit in the primary data file. If the primary file can hold all of the data in the database, databases do not need to have secondary data files. Some databases may be large enough to need multiple secondary data files or to use secondary files on separate disk drives to spread data across multiple disks.

...

Filegroups allow files to be grouped together for administrative and data allocation/placement purposes. For example, three files (Data1.ndf, Data2.ndf, and Data3.ndf) can be created on three disk drives, respectively, and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks, thereby improving performance. The same performance improvement can be accomplished with a single file created on a RAID (redundant array of independent disks) stripe set. Files and filegroups, however, allow you to easily add new files on new disks. Additionally, if your database exceeds the maximum size for a single Microsoft Windows NT® file, you can use secondary data files to allow your database to continue to grow."

Dan Barber SSC Enthusiast Points: 124 More actions · Answer 4

Hi Edward thanks for your explanation.

In ref : I assume you inferring having the same number of files as disks within your array, rather than a file on each disk within the array. If not, the youll find RAID 5 is different to, say, RAID 1. ( see http://www.pcguide.com/ref/hdd/perf/raid/levels/singleLevel5-c.html )

Sorry badly worded yes you were correct i did mean same number of files as disks within the array. Ooops....

Maybe thats why i could find anyone else who had done this . I recently attended a microsoft datawarehousing course and was told amoungst other things to do as i described. The guy was an MCSE, MCDBA so i thought what he said would be correct. It seems not so. Looks like i will stick to my original single mdf file then. What could he have meant about the 3 threads rather than one?

Thanks again both of you for your explainations

Dan

ChrisMoix-87856 SSCertifiable Points: 7288 More actions · Answer 5

regarding the thread comment, SQL Server will spawn a seperate thread for each file (say you have a filegroup of three files, SQL Server will spawn three threads to read from a table on that filegroup), regardless of whether the three files are on the same windows drive (this could be a physical drive or a Raidset, Mirrorset, etc.) or not.

The performance gained by doing this will obviously be better if the files are on their own Windows drive (and that windows drive is mapped to a physical drive or raidset, not partitioned). This stuff can get complicated depending on your underlying disk setup (local drives vs. RAID arrays vs. SAN).

The best performance I had (assuming no change in the underlying hardware) was to have three mirrored drives (actually RAID 10 arrays). I had a filegroup comprising three files, one per windows drive. I created tables on the filegroup (therefore the tables were 'striped' accross the three windows disks). I was able to bring the Average Disk Queue length down from a number getting up to as high as 900/sec. to around 5-15 per second from the previous setup (the filegroup made up of one file on a RAID 5 set).

David Scotland-132255 SSCertifiable Points: 5599 More actions · Answer 6

As previously stated having a numerous files on a single Raid 5 array will likely make little difference to your performance but there is a more important consideration, in my opinion, which is related to how many processors you have in your server.

This is because you can only write to one file with one thread. So if you have multiple processors and only one file then only one of them will be able to write data to disk. To see if this is a problem monitor your server and see if only one processor seems to be doing all the work while the rest are havinf a nap. This can have a significant hit on performance as you don't want anything waiting on disk I/O unecessarily.

Regards

David

ChrisMoix-87856 SSCertifiable Points: 7288 More actions · Answer 7

Good point, David. I remember another thread from a few days ago where someone said something to the same effect.

David Scotland-132255 SSCertifiable Points: 5599 More actions · Answer 8

I should also have noted that if you use multiple files do not rely on auto grow to grow your databases as it will only increase the size of one file at a time and you will end up with same problem that only one processor can write to disk at a time.

David

Johnny Ed Old Hand Points: 326 More actions · Answer 9

David,

Couple of questions for you;

"So if you have multiple processors and only one file then only one of them will be able to write data to disk. To see if this is a problem monitor your server and see if only one processor seems to be doing all the work while the rest are havinf a nap. This can have a significant hit on performance as you don't want anything waiting on disk I/O unecessarily"

How would monitor which processor is doing what?

When MSS2k writes across multiple files does it write evenly across all files?

JE

David Scotland-132255 SSCertifiable Points: 5599 More actions · Answer 10

It writes to each of the files in a round robin fashion so that data is evenly distributed accross each.

From Books Online:

As data is written to the filegroup, Microsoft® SQL Server™ writes an amount proportional to the free space in the file to each file within the filegroup, rather than writing all the data to the first file until full and then writing to the next file.

As soon as all the files in a filegroup are full, SQL Server automatically expands one file at a time in a round-robin fashion to accommodate more data (provided that the database is set to grow automatically).

Also you may want to look at this microsoft UK guys blog (it was him that pointed out the auto grow pitfall to me)

http://blogs.technet.com/mat_stephen/archive/2005/02/02/365325.aspx

http://blogs.technet.com/mat_stephen/archive/2005/2/2.aspx

You just need to monitor your processors and see if one of them is much busier than the others. You can

do this in task manager or using performance monitor if you laso want to capture processor queue lengths etc.

Regards

David

Shawn Denogeon SSChasing Mays Points: 634 More actions · Answer 11

With RAID configurations, the primary benefit seems to be if one of the files becomes corrupt. Supposedly, if the index file becomes corrupt, if it's in a separate file, then, the database is still intact and recovery is a matter of reindexing. If the indexes were in the same file as the databases, then, recovery might or likely will require a backup restore.

The secondary benefit seems to be that multiple processing threads will be used which could possibly increase process throughput.

I still don't know if it's a good overall idea: e.g. additional maintenance requirements vs. performance and fault tolerance increase.