Data File Sizes - Best Practice?

Question

Data File Sizes - Best Practice?

SouthernConfusion

Mr or Mrs. 500

Points: 545
More actions
July 19, 2010 at 7:34 pm

#223504

Anyone have any info regarding Best Practices relating to max data file size for large DB's?
The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.
Thanks

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Tara Kizer SSCommitted Points: 1789 More actions · Answer 1

Reply deleted.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
Ramblings of a DBA (My SQL Server Blog)[/url]
Subscribe to my blog

gaojiwen_ustc99 SSC Rookie Points: 47 More actions · Answer 2

Hi Tara,

Do you have a document from MS and says less than 64 GB?

Thank you

Tara Kizer SSCommitted Points: 1789 More actions · Answer 3

I removed my reply as it was a bad recommendation by me. I misunderstood the MS engineer.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
Ramblings of a DBA (My SQL Server Blog)[/url]
Subscribe to my blog

SQLisAwe5oMe SSChampion Points: 14052 More actions · Answer 4

SouthernConfusion (7/19/2010)
Anyone have any info regarding Best Practices relating to max data file size for large DB's?
The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.
Thanks

Well, a Data file(.mdf) can grow up to 16 TB and a database can grow up to maximum of 524,258 TB. So, 80GB per data file is not big at all, however, if you are looking for performance wise it is recommended to create multple secondary(.ndf) files. I am not really sure what's the best practice, I will see what others have to say about this topic.

Thanks,

TA

Regards,
SQLisAwe5oMe.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 5

SQLCrazyCertified (7/23/2012)
SouthernConfusion (7/19/2010)
Anyone have any info regarding Best Practices relating to max data file size for large DB's?
The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.
Thanks
Well, a Data file(.mdf) can grow up to 16 TB and a database can grow up to maximum of 524,258 TB. So, 80GB per data file is not big at all, however, if you are looking for performance wise it is recommended to create multple secondary(.ndf) files. I am not really sure what's the best practice, I will see what others have to say about this topic.
Thanks,
TA

Do you have a reference for that NDF thing?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 6

I guess the recommendation would depend on the RIAD configuration and what other operation you intend to perform on it. The primary reason i say this is becuase the more data files that can spread across disks the better IO you achieve. Also if you have compression etc and partitioning invovled then these numbers are likley to be very different.

Jayanth Kurup[/url]

SQLisAwe5oMe SSChampion Points: 14052 More actions · Answer 7

Jeff Moden (7/23/2012)
SQLCrazyCertified (7/23/2012)
SouthernConfusion (7/19/2010)
Anyone have any info regarding Best Practices relating to max data file size for large DB's?
The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.
Thanks
Well, a Data file(.mdf) can grow up to 16 TB and a database can grow up to maximum of 524,258 TB. So, 80GB per data file is not big at all, however, if you are looking for performance wise it is recommended to create multple secondary(.ndf) files. I am not really sure what's the best practice, I will see what others have to say about this topic.
Thanks,
TA
Do you have a reference for that NDF thing?

Jeff, I do not have any reference for that. Basically as Jayanath mentioned, the more secondary files the better performance you get. Let me know your thoughts, if you disagree.

Thanks,

TA

Regards,
SQLisAwe5oMe.

SQLRNNR SSC Guru Points: 281344 More actions · Answer 8

Here is a good reference on the multiple data files thingy...

http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Nadrek SSC-Insane Points: 20039 More actions · Answer 9

SQLRNNR (7/23/2012)
Here is a good reference on the multiple data files thingy...
http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

As Paul stated, "Now, this is a very contrived test, with a single operation in my workload - it's definitely NOT representative of a mixed-operation OLTP workload." Paul also stated, "In the best case, the eight-file case on two arrays was just over 6% faster than the single-file case on the single array." Note also that going from 1 file to 2 files on the single 8-drive DATA file array, and all 16 file configurations, resulted in a performance decrease. It's clearly not as easy as "more is better".

I would suggest that unless you're doing fairly serious levels of benchmarking across all levels of your hardware and operating system, a maximum of 6% (and quite possibly less for an arbitrarily chosen configuration) either way may well be lost in the noise of other configuration choices (RAID stripe size, choice of hardware RAID controller, RAID level variations, number of drives per RAIDSET, how much buffer you allocated SQL Server, Cost Threshold of Parallelism and default MAXDOP choices, VLF counts, which data and log files are on which spindle sets, IO depth settings on your SAN HBA drivers, etc.) If you're running enough tests to optimize all of these values, then yes, run tests on different data file configurations as well.

If you're going with a "best practice" approach, I suspect right-sizing files (i.e. capacity planning), defragmentation within SQL Server and at the OS filesystem level, estimating your IOPS and throughput requirements, separating data and log files on different spindle sets, and other basic steps will be more valuable uses of your time.

If I'm going to have more than one data file, in general I want them to be on separate, dedicated spindle sets, so sequential reads and writes stay that way. Note that I haven't worked with data files or tables/clustered indexes over the half-terabyte mark, either.

MysteryJimbo SSC-Insane Points: 24203 More actions · Answer 10

SQLCrazyCertified (7/23/2012)
if you are looking for performance wise it is recommended to create multple secondary(.ndf) files. I am not really sure what's the best practice, I will see what others have to say about this topic.

I believe that information is a bit outdated and refers to Direct Attached Storage as does the article from Paul.

Its a realistic approach and works but you can get even more benefit by intelligently looking at your logical database structure for "hotspot" objects and look to move those onto different arrays. This can apply to indexes too.

SQLRNNR (7/23/2012)
Here is a good reference on the multiple data files thingy...
http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

What kind hardware are you running on and does it include a SAN? I don't bother splitting any files until I reach drive capacity and have got 4TB files knocking around.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 11

SQLCrazyCertified (7/23/2012)
Basically as Jayanath mentioned, the more secondary files the better performance you get. Let me know your thoughts, if you disagree.

I would disagree.

It's far from as simple as 'more data files = more performance". It's perfectly possible to split into multiple data files and have no improvement. It's possible to split and have a degradation in performance.

Whether or not a DB would be split for performance reasons requires investigating things like the current primary bottleneck, the underlying IO subsystem design and layout, the IOPS the drives can manage and their latency, the access pattern of the tables and indexes, etc.

That's not even considering splitting for recoverability purposes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass