Partitioning for performance: filegroup per table or aligned tables?

  • Hello,

    I read some quite contradictory statements about partitioning strategies and I can't make up my mind on how to proceed. I have a very simple case and I wonder how to optimize it:

    If we have TableA and TableB and the query

    SELECT *

    FROM TableA a

    JOIN TableB b ON a.col1 = b.col1

    WHERE colID = 123

    which is better and why:

    1) put each table on its own filegroup and then partition it on that filegroup only.

    2) align tables: make TableA and TableB use the same partitioning function and partition by colID. This way all records from both tables containing having colID = 123 would be on the same partition. The benefits are explained here:

    In SQL Server 2005, related tables (i.e. Order and OrderDetails) that are partitioned to the same partitioning key and the same partitioning function are said to be aligned. When the optimizer detects that two partitioned and aligned tables are joined, SQL Server 2005 can choose to join the data which resides on the same partitions first and then combine the results. This allows SQL Server 2005 to more effectively use multiple-CPU machines.

    Source: http://www.sqlskills.com/resources/Whitepapers/Partitioning%20in%20SQL%20Server%202005%20Beta%20II.htm

    Thanks for helping me on this one!

  • SELECT *

    FROM TableA a

    JOIN TableB b ON a.col1 = b.col1

    WHERE colID = 123

    If both tables have a ColID column, your query will give you an error. So, either your join needs not be modified, or your partitioning plan 2 is not possible.

    Since you are posting what must be a simplified version of what you are dealing with, it is going to be a bit hard to give you a straight answer. What you want to do with multiple file groups is to try to reduce the amount of contention for disk resources.

    If you have a lot of contention because people are constantly querying the same table(s) for different sets of records, using a partitioning scheme that allows one query to be satisfied by one disk array and another query to be satisfied by another array - then you will have no contention for disks by the two queries.

    If you have a lot of people running queries, but the queries are very often including the same records, you want to try to split things into filegroups that allow the parallel portions of the queries to run without conflicting for disk resources. So, if you have two tables and you put them on different file groups, the speed of reading these tables can often be split across multiple dets of read heads. Also, if you have a table that has three fields updated regularly and an index on that table that does not include the regularly updated indexes, putting them on different file groups can reduce read/write contention.

    I don't know if that will help or hurt you, but this is going to be different depending a lot on the data structure and the types of requests.

  • Michael Earl (7/2/2008)


    SELECT *

    FROM TableA a

    JOIN TableB b ON a.col1 = b.col1

    WHERE colID = 123

    If both tables have a ColID column, your query will give you an error. So, either your join needs not be modified, or your partitioning plan 2 is not possible.

    Since you are posting what must be a simplified version of what you are dealing with, it is going to be a bit hard to give you a straight answer. What you want to do with multiple file groups is to try to reduce the amount of contention for disk resources.

    Right I should've been more precise: the query is more like:

    SELECT *

    FROM TableA a

    JOIN TableB b ON (a.userid= B.useridAND a.col1 = b.col1)

    WHERE a.userid= 123

    If you have a lot of contention because people are constantly querying the same table(s) for different sets of records, using a partitioning scheme that allows one query to be satisfied by one disk array and another query to be satisfied by another array - then you will have no contention for disks by the two queries.

    We will have ppl querying the same tables for different records, everyone will be looking for their own data (correspondong to their userid). Does this mean I should go for option 2) aligned tables ?

    If you have a lot of people running queries, but the queries are very often including the same records, you want to try to split things into filegroups that allow the parallel portions of the queries to run without conflicting for disk resources. So, if you have two tables and you put them on different file groups, the speed of reading these tables can often be split across multiple dets of read heads.

    I am a bit confused here, when you say split things do you mean spread tables or spread each tables' partitions?

    Also, if you have a table that has three fields updated regularly and an index on that table that does not include the regularly updated indexes, putting them on different file groups can reduce read/write contention.

    ok thanks, we shouldn't have problems with updates though.

    I don't know if that will help or hurt you, but this is going to be different depending a lot on the data structure and the types of requests.

    All input is helpful thanks!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply