November 13, 2005 at 5:49 am
When a filegroup consists of multiple files, SQL Server spreads the data inserts proportionally across all files in the filegroup so they are kept approximately the same percentage full. If your computer has multiple processors, it can also be advantageous to spread data across as many physical drives as possible in order to improve parallel data access throughput.
I was reading my book on SQL Server and got halted at the above statement in bold. How does having multiple processors help improve parallel data access throughput? Do they have their own I/O subsystems to fetch data from their own disk drives? Please clarify this; I would greatly appreciate it.
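For reference, this is the sort of setup I understand the book to mean - a rough sketch with made-up names, paths, and sizes:
[code]
-- A filegroup spread over two files on separate physical drives.
-- Database name, filegroup name, paths, and sizes are hypothetical.
CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = SalesDB_sys,   FILENAME = 'C:\SQLData\SalesDB_sys.mdf',   SIZE = 50MB),
FILEGROUP DataFG
    (NAME = SalesDB_data1, FILENAME = 'D:\SQLData\SalesDB_data1.ndf', SIZE = 100MB),
    (NAME = SalesDB_data2, FILENAME = 'E:\SQLData\SalesDB_data2.ndf', SIZE = 100MB)
LOG ON
    (NAME = SalesDB_log,   FILENAME = 'F:\SQLLogs\SalesDB_log.ldf',   SIZE = 50MB)
-- SQL Server fills SalesDB_data1 and SalesDB_data2 proportionally,
-- so inserts generate I/O against both drives at once.
[/code]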
Karim
November 14, 2005 at 7:07 am
This has been mentioned a number of times at TechNet presentations and is said to apply to RAID arrays too.
I am unsure too, as one part of me says 'yes, I can see this' whilst another part says 'what?????!'
I'd be very interested in members' views. I run true 8-way boxes; however, splitting out my prod databases to use 8 files each would create a number of issues, including support - but if it gets 8 times the performance, I'm very interested <grin>
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
November 14, 2005 at 10:03 am
In BOL under "Files and Filegroups" it goes into this quite a lot ... but I think one important point it mentions is that the effect is seen more in large databases (I think we're talking TBs!). I think you'd be very lucky to see an 8-times improvement though! In query execution plans, when a query uses parallelism there is usually quite a noticeable cost for merging the data from the x number of threads ... which is why not much gain is seen in smaller DBs. Also, unless you have separate physical drive arrays, the multiple threads are perfectly capable of flooding your I/O ... so there can be a bit of a balancing act there!
Personally, I do deploy based on the number of processors ... so if I'm on a quad-processor machine, I'll deploy the database(s) with 4 files. To be frank though, I've never seen any significant improvement from playing around with files, filegroups (and distributing tables between them), etc. ... but I take comfort in knowing I've at least future-proofed my deployment a little! Not that I've managed to get a DB up to a TB yet!
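For what it's worth, the layout I mean is roughly this (database name, paths, and sizes are just placeholders):
[code]
-- One data file per CPU on a quad-processor box.
-- All names, sizes, and paths below are placeholders.
CREATE DATABASE MyAppDB
ON PRIMARY
    (NAME = MyAppDB_data1, FILENAME = 'D:\SQLData\MyAppDB_data1.mdf', SIZE = 500MB),
    (NAME = MyAppDB_data2, FILENAME = 'D:\SQLData\MyAppDB_data2.ndf', SIZE = 500MB),
    (NAME = MyAppDB_data3, FILENAME = 'D:\SQLData\MyAppDB_data3.ndf', SIZE = 500MB),
    (NAME = MyAppDB_data4, FILENAME = 'D:\SQLData\MyAppDB_data4.ndf', SIZE = 500MB)
LOG ON
    (NAME = MyAppDB_log,   FILENAME = 'E:\SQLLogs\MyAppDB_log.ldf',   SIZE = 200MB)
[/code]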
November 14, 2005 at 10:29 am
It appears that the statement is attempting to minimize processor and I/O contention by taking advantage of multiple processor threads and disk spindles. What it doesn't mention is that for this to fully minimize I/O contention, you need multiple controllers, or a fibre channel controller that can be logically split.
It makes sense to structure your system this way if you have the available resources; however, I agree that you would not see significant improvements on small and mid-sized databases. The exception to this rule would be the case where you already know you have I/O contention, possibly caused by table scans. Even then, you would usually see more performance improvement from SQL code optimization and/or index tuning than from tuning/optimizing your storage.
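If you suspect I/O contention, one quick way to check is the fn_virtualfilestats function, which reports cumulative I/O statistics per file. A sketch (the database name and file ID are just examples):
[code]
-- Check cumulative I/O stalls for one file of a database.
-- The database name and file ID are just examples.
SELECT DbId, FileId, NumberReads, NumberWrites, IoStallMS
FROM ::fn_virtualfilestats(DB_ID('MyAppDB'), 1)
[/code]
A high IoStallMS relative to the read/write counts points at the storage rather than the code.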
November 14, 2005 at 1:43 pm
It's this whole issue of spindles and controllers which seems to blur with the files. I have used, and understand, separate controllers/channels per RAID - I've set up RAID 10 arrays on two channels, etc. I can also follow the case for multiple files if you have individual disks, not arrays. Most of my current servers are on SAN and disk performance is not good, so any way to maximise performance would be helpful - there's a good MS article about putting tempdb on multiple files (in the primary filegroup) for performance which I can follow .. I'd still like to know if it works <grin>
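For anyone wanting to try it, the tempdb change from that article boils down to something like this sketch - logical names, paths, and sizes are made up; the idea is one data file per CPU, all the same size:
[code]
-- Add extra data files to tempdb's primary filegroup.
-- Logical names, paths, and sizes are placeholders.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\SQLData\tempdev2.ndf', SIZE = 100MB)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\SQLData\tempdev3.ndf', SIZE = 100MB)
-- Keeping all tempdb files the same size helps the proportional fill
-- spread allocations evenly across them.
[/code]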
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
November 14, 2005 at 2:00 pm
Colin,
In your first post in this thread, you mentioned that you've seen TechNet presentations. I don't know if you've seen this one, so I'll post it anyway. It talks about some issues relating to running SQL Server on a SAN, but the long and short of it is that the SAN administrator needs to check for possible contention with physical disks/controllers on whichever LUN your data has been placed on. There is a ton of information online relating to SQL Server performance tuning, specifically storage and hardware optimization.
As a general rule, it is optimal to reserve a separate disk or RAID set of disks for tempdb, the system DBs, and each of your user DBs. Of course, each disk or RAID set should ideally have its own controller. Many times this is hard to do when dealing with a SAN, as the whole point of SAN storage is to share resources. I have never been a big fan of using SANs for database storage, as I feel that dedicated hardware and storage should be the rule for any production database where performance is important.
For filegroups, I doubt you'll see an 8X increase by separating out a filegroup for each logical processor thread, but I would recommend going so far as to separate indexes and heavy-hitting tables into separate filegroups on separate disks/RAID sets.
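To make that concrete, here is a rough sketch of the kind of separation I mean (the filegroup, table, and index names are invented for the example):
[code]
-- Place a heavily hit table and its nonclustered index on different
-- filegroups, and hence different disks/RAID sets.
-- Filegroup, table, and index names are invented for the example.
ALTER DATABASE MyAppDB ADD FILEGROUP FG_Indexes
ALTER DATABASE MyAppDB ADD FILE
    (NAME = MyAppDB_ix1, FILENAME = 'G:\SQLData\MyAppDB_ix1.ndf', SIZE = 200MB)
TO FILEGROUP FG_Indexes

-- Table data stays on PRIMARY; the index goes to the new filegroup.
CREATE TABLE dbo.Orders
    (OrderID int NOT NULL PRIMARY KEY, CustomerID int NOT NULL)
ON [PRIMARY]

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID) ON FG_Indexes
[/code]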
I do not have specific experience with seeing this approach positively affect a Windows/SQL Server environment; however, I have used it when dealing with Oracle and Progress running on UNIX systems. Optimizing your storage areas and hardware configurations in those environments provides positive, measurable increases in performance due to the decrease in contention.
November 14, 2005 at 2:03 pm
Oops, here's the presentation: