February 11, 2009 at 9:12 am
Hello everybody:
This is my situation:
I had a database composed of four files: 1 MDF and 3 NDFs.
Now I'm moving the database to a new server because of the growth in database size and use. The new server has 8 processors and I would like to have 1 database file per processor, so I created 8 files. But when the database is restored and used, the first 4 files take up all the database size and grow as the database grows, while the other 4 files keep only their initial creation size and do not grow at all. How can I achieve balanced growth across the 8 database files?
Thanks in advance,
Felix
February 11, 2009 at 9:52 am
The balanced data distribution across the 8 files that you're after can be achieved by:
1. Find out the newly-restored database file sizes
2. Make the original 4 data files the same size
3. Expand the 4 new data files (no data yet) to the same size as Step 2
New data will then be distributed proportionally across the data files based on their available free space. A sketch of the resizing follows below.
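As a minimal sketch of Steps 1 to 3, assuming a database named MyDB with hypothetical logical file names Data1 through Data8 (substitute your own names and target sizes):

-- Step 1: check the current size and used space of each data file.
USE MyDB;
SELECT name,
       size / 128 AS size_mb,                             -- size is in 8 KB pages
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';

-- Steps 2 and 3: grow the files so all eight end up the same size,
-- e.g. 50 GB each (pick a size at least as large as the biggest file).
ALTER DATABASE MyDB MODIFY FILE (NAME = Data5, SIZE = 51200MB);
ALTER DATABASE MyDB MODIFY FILE (NAME = Data6, SIZE = 51200MB);
ALTER DATABASE MyDB MODIFY FILE (NAME = Data7, SIZE = 51200MB);
ALTER DATABASE MyDB MODIFY FILE (NAME = Data8, SIZE = 51200MB);

SQL Server's proportional fill will then send most new allocations to the files with the most free space, so the files converge over time.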
February 11, 2009 at 10:41 am
FelixG (2/11/2009)
The new server has 8 processors and I would like to have 1 database file per processor,
Why? That recommendation is solely for TempDB, because of potential allocation bottlenecks, not for user databases. For user databases the recommendation is no more than 1/4 the number of processors, and less on a data warehouse.
You will see absolutely no advantage unless those files are on separate physical drives.
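A quick way to check where the files actually live, as a minimal sketch (MyDB is a hypothetical database name; substitute your own):

-- List each data file's logical name, physical location and size.
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       physical_name,
       size / 128 AS size_mb            -- size is in 8 KB pages
FROM sys.master_files
WHERE DB_NAME(database_id) = 'MyDB'     -- hypothetical name
  AND type_desc = 'ROWS';

If all eight physical_name values resolve to the same underlying drive, the extra files buy you nothing.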
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 11, 2009 at 10:57 am
I have read this is valid for user databases too. The recommendation is from 1/4 to 1 database file per processor. In this case each file will be located on a separate physical disk. I would be OK with 4 files, but now I have the possibility of 8 separate disk arrays and want to take advantage of that fact.
February 11, 2009 at 1:15 pm
Then you should start off with the 4 files, baseline performance, then add the 4 new files and check performance again.
Are all these files in the same filegroup? How do you know you won't do even better by separating out new filegroups for your indexes, or perhaps separating tables that are joined a lot?
Throwing files at user databases seems to be becoming very popular; I suspect this might be one of those urban myths...
To protect myself from over-zealous devs I'm trying to come up with a good admin reason why this is over the top.
---------------------------------------------------------------------
February 11, 2009 at 1:28 pm
george sibbald (2/11/2009)
Throwing files at user databases seems to be becoming very popular; I suspect this might be one of those urban myths...
It is. There are articles on the SQLCAT site and Microsoft whitepapers that explain why more files are good for TempDB, under what circumstances more files are needed, and why it's not necessarily the right thing to do to a user database.
The recommendation for tempDB is due to bottlenecks on the allocation pages (mainly the SGAM) when there are multiple processors, very frequent table creation and only one file. It's much less of a problem on 2005 and above than it was on 2000, for assorted reasons that are beyond the scope of this post.
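If you want to see whether TempDB is actually hitting that bottleneck, a rough check (a sketch, not a definitive diagnosis) is to look for PAGELATCH waits on TempDB pages while the system is busy:

-- Database id 2 is tempdb; resource_description is dbid:fileid:pageid.
-- Heavy PAGELATCH waits on pages 1, 2 or 3 (PFS/GAM/SGAM) suggest
-- allocation contention.
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';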
User databases are split for two reasons.
1) IO performance. If the drive that the data file is on is not coping, then split the data over more than one drive. To get the best results, spread tables and indexes out deliberately; don't just let SQL put data in whichever file it feels like.
2) Availability. Enterprise Edition's partial database availability and file/filegroup backups allow quicker backups, quicker availability in case of a disaster and more options for restores. For this to work, the database has to be carefully laid out over the files/filegroups to ensure that partial availability is actually useful (see the sketch below).
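As a minimal sketch of that kind of deliberate layout (all names, paths and sizes here are hypothetical):

-- Add a filegroup on a separate drive and place an index on it explicitly.
ALTER DATABASE MyDB ADD FILEGROUP Indexes;
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_Indexes1, FILENAME = 'F:\Data\MyDB_Indexes1.ndf', SIZE = 10GB)
TO FILEGROUP Indexes;

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    ON Indexes;

-- With the data laid out this way, filegroup backups become an option:
BACKUP DATABASE MyDB FILEGROUP = 'Indexes' TO DISK = 'G:\Backup\MyDB_Indexes.bak';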
Hmmm, maybe I should write an article on this...
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 11, 2009 at 2:00 pm
Gail,
thanks for the feedback. It confirms my thoughts on the subject. One thing being quoted at me is that, with a lot of databases on SANs nowadays using RAID, there can be performance gains from multiple files on the same drive, because you are effectively getting striping of the files across your RAID devices, and I/O improvements that way. Any comments on that?
Even if the above gives some improvement, it just feels intrinsically wrong to me, and the older, established reasons for splitting out filegroups should come first. I wonder if I am being a stick-in-the-mud because a database with loads of files just looks a mess to me. I would want to argue it's harder to administer, e.g. for DR, backup and recovery, refreshes to dev etc., but all these complications can be overcome by automating the process with scripts. So is there a killer argument why a database with multiple files is going to cause administration problems? I can't think of one.
---------------------------------------------------------------------
February 11, 2009 at 2:25 pm
george sibbald (2/11/2009)
Gail, thanks for the feedback. It confirms my thoughts on the subject. One thing being quoted at me is that, with a lot of databases on SANs nowadays using RAID, there can be performance gains from multiple files on the same drive, because you are effectively getting striping of the files across your RAID devices, and I/O improvements that way. Any comments on that?
Even if the above gives some improvement, it just feels intrinsically wrong to me, and the older, established reasons for splitting out filegroups should come first. I wonder if I am being a stick-in-the-mud because a database with loads of files just looks a mess to me. I would want to argue it's harder to administer, e.g. for DR, backup and recovery, refreshes to dev etc., but all these complications can be overcome by automating the process with scripts. So is there a killer argument why a database with multiple files is going to cause administration problems? I can't think of one.
I personally tested two variants, multiple files on the same drive (array) versus multiple files located on separate drives, and the performance gains are significant with the second approach for OLTP databases. I do know it's mandatory to split out filegroups to optimize the overall performance, but I just wanted to know whether having one file per processor (as many docs claim) will carry additional performance gains.
By the way, I can't find a reason to think that multiple files will cause administration problems... I would like to hear more about this.
February 11, 2009 at 2:42 pm
FelixG (2/11/2009)
I do know it's mandatory to split out filegroups to optimize the overall performance.
It's not mandatory. It's only needed when the drive that the data file is on cannot handle the IO load that the database generates, which it sounds like was the case in your test. It is not something that must be done to every single database everywhere, all the time. It's something that you test and evaluate on a case-by-case basis, and the exact number of files depends on the workload, the drives and other factors, not on the number of processors.
That's my problem with multiple files: not that it doesn't help (it does), but that it's overdone where it's not needed at all, and lots of people seem to do it without understanding why or what it's for.
If your database needs 16 files spread across 16 different RAID 10 arrays, then go for it. If it doesn't, then one file is OK.
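One rough way to evaluate whether a drive is coping, as a sketch (what counts as an acceptable stall is workload-dependent):

-- Cumulative IO and stall figures per file since the last restart.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads, vfs.io_stall_read_ms,
       vfs.num_of_writes, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id;

High stall times relative to the number of IOs point at the drive that isn't keeping up.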
There was a post from one of the SQLCAT folks saying that excessive files slowed down backups and restores, but I don't recall the details.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 11, 2009 at 3:00 pm
FelixG (2/11/2009)
I personally tested two variants, multiple files on the same drive (array) versus multiple files located on separate drives, and the performance gains are significant with the second approach for OLTP databases.
Did you test a single file against multiple files on the same array? Any figures on that?
---------------------------------------------------------------------
February 12, 2009 at 7:13 am
george sibbald (2/11/2009)
Did you test a single file against multiple files on the same array? Any figures on that?
Yes, I did... and again it's better to have multiple files (even in the same array)... of course, as Gail said, it depends on the database size and I/O. I'm talking about a large, heavily used OLTP database.
In my particular case I use Dell equipment (server and storage) and this is Dell's advice:
"The recommended number of disks for data should equal the number of physical CPU sockets, and each virtual disk created in the storage system is mapped to an operating system file partition. To achieve optimal performance that scales with heavy workloads, Microsoft recommends that the number of data files configured for a SQL Server 2005 database equal the total number of CPU cores installed on the server. Therefore, [highlight=#ffff11]within each of the partitions used for data, the recommended number of data files should equal the number of cores per socket[/highlight]. However there is a cost associated with having multiple files in a filegroup. If these files share spindles then this can lead to contention when they are accessed simultaneously. Therefore a decision to balance the decreased contention of multiple files against the increased I/O load has to be made. [highlight=#ffff11]The recommendation is to break data files into a number of files equal to half the number of cores[/highlight]."
Gail, I did not mean exactly that every database must be split into filegroups. I was talking about my particular case, where the OLTP database has high I/O activity and its size is about 1 TB.
February 12, 2009 at 8:15 am
Dell
To achieve optimal performance that scales with heavy workloads, Microsoft recommends that the number of data files configured for a SQL Server 2005 database equal the total number of CPU cores installed on the server
Microsoft does not make that recommendation for any database other than TempDB. The recommendation for user databases is no more than 1/4-1/2 the number of cores.
I'll search for the docs when I get home; I think I have them saved somewhere.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability