December 1, 2010 at 3:31 pm
I'm setting up a new SQL Server 2008 database. I've got one huge table (around 100 million records) and other tables that I'll link to this huge table.
Questions:
1) Can I split up the location of the huge table across multiple file groups?
2) To improve 'join' speed, should I place the other smaller tables in a file group different than the huge table?
3) (Maybe a dumb question), if column B of the huge table links to column A of one of the smaller tables, do I have to index column B in the huge table, or will the query speed be the same if I don't create the index.
4) As a general rule, should data and indexes be store on separate physical drives?
5) It looks like each separate file group will have its own thread, how many file groups should be created? (relates to question 1).
Thanks for your help!
December 1, 2010 at 5:46 pm
Bear in mind our company went through some financial issues so we are not quite running 2008 yet so others may have better ideas on newer features.
1) If you partition the table you can set it up to be in multiple file groups. You may want to review some of the material on partitioning there are pluses and minuses. One place to start is with Kimberly's post, if you search there it looks like there is some updated material on 2008 updates but this is a good starting point and there is additional research information in SQLCentral on partitioning. http://www.sqlskills.com/BLOGS/KIMBERLY/post/Clarifying-LEFT-and-RIGHT-in-the-defintion-of-a-PARTITION-FUNCTION-in-SQL-Server-2005.aspx
We have some servers/DB's with partitioned tables and some without. Just beware of the overhead it might have on you maintaining the partitions (so testing a lot.) By planning how data gets loaded, accessed you can partition it in ways that best fit your needs (most of the time.) We have had some strange errors in one of our applications after we partitioned (Error 8623).
2) By having different tables from your queries in different file groups you reduce bottlenecks on the file groups but I'm not aware of any metrics out on how much that would gain you performance wise. I would generally say not as much as properly tuned/indexed tables but I do try to spread things out. In general the cost of the query returning the results is still going to be mainly driven by the cost of the slowest part.
3) I think reviewing the cost of your execution plans or the missing query index DMV's is the best way to tell. In general probably as long as the cost of maintaining the index(es assuming you have several small tables with the same relationship) doesn't impact the maintenance of the large table.
4) We generally try to create at least one Indexes file group to contain any non-clustered indexes. On separate hard drive/spindles also.
5) Depends. How fast is the table growing (need to ensure free partitions so you don't get one lopsided one at the end.) How is the data accessed? If the queries are only accessing 1-2 partitions or across all the partitions at the same time. Probably a lot more questions than answers on this.
I think also with larger tables like this you will probably need to ensure your TempDB is not your bottleneck.
December 1, 2010 at 5:53 pm
david.milburn (12/1/2010)
I'm setting up a new SQL Server 2008 database. I've got one huge table (around 100 million records) and other tables that I'll link to this huge table.Questions:
1) Can I split up the location of the huge table across multiple file groups?
Yes, look up 'partitions'
2) To improve 'join' speed, should I place the other smaller tables in a file group different than the huge table?
Debateable. Really small tables will usually just reside in memory. By rights, for fastest speed, every table/index would be on its own unique physical spindles. Reality makes this a little rediculous. If you find a couple of medium sized tables.
3) (Maybe a dumb question), if column B of the huge table links to column A of one of the smaller tables, do I have to index column B in the huge table, or will the query speed be the same if I don't create the index.
You want to index column A of the small table for sure. Column B will only matter as at least an include in the non-clustered index you build for the process.
4) As a general rule, should data and indexes be store on separate physical drives?
You can't separate the table and the clustered index. You can separate non-clustereds. If you are looking for optimization off a non-clustered, then yes, separate drives.
5) It looks like each separate file group will have its own thread, how many file groups should be created? (relates to question 1).
Thanks for your help!
Analysis of the data will tell you this. How often do you self join, to what extend, and how many subtables are you involving?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply