November 20, 2012 at 7:39 pm
let's assume that I have scenario as following:
- Table tbl1 for customer 1, table tbl2 for customer 2.
- tbl1 has one relative table named tbl1_detail. Same for tbl2.
- I partition tbl1 and tbl1_detail into 4 months. Same for tbl2.
- Clustered index and nonclustered index have been set.
- I use 4 disk drives. Each drive stores 1 partition.
drive 1drive 2drive 3drive 4
tbl1 m1 m2 m3 m4
tbl1_detail m1 m2 m3 m4
tbl2 m1 m2 m3 m4
tbl2_detail m1 m2 m3 m4
with 2 tables tbl1 and tbl1_detail, if I query them with INNER JOIN, it just query on only 1 driver1, right? but I want other drives need to be used, it meant that different month will be in different disk drive, so, I change a little bit as follow:
drive 1drive 2drive 3drive 4
tbl1 m1 m2 m3 m4
tbl1_detail m2 m3 m4 m1
tbl2 m2 m3 m4 m1
tbl2_detail m1 m2 m3 m4
so, my questions here are:
1. with tb1, when I query m1 (month 1) and m2 (month 2), how the SQL work with drive 1 and 2?
2. with query tbl1 inner join tbl1_detail in month 1, will the SQL work in parallel by querying data in drive 1 and drive 4 in the same time? OR SQL will query data at drive 1 first, and query data at drive 4 then?
3. with query tbl1 and tbl2 by UNION ALL for month 1, will the SQL work in parallel at drive 1 and drive 4? OR SQL will query data at drive 1 first, and query data at drive 4 then?
4. What I want here is how to improve querying data, show result faster by using all available disk drives. Do you have any idea about this? do we have any algorythm to set up partition on different disk drive?
Thank a lot.
November 21, 2012 at 12:26 am
No.
Partitioning is not done for performance.
It is done for manageability.
if drive1 dies in your first example, you could still access m2, m3 and m4 while restoring m1 (partial db availability with piecemeal restore in enterprise ed)
Sql server only issues I/O's when the data isn't already cached in the buffer pool. If you have enough ram, the disks won't matter for read performance.
Sql is really, really good about caching data. And doing queries in parallel. And doing readahead reads. And doing lots of other stuff to the get the best performance.possible on hw from $1k laptops to $300k servers.
I would not partition. Instead, I would take the individual drives and create 1 big RAID drive. The advantage is that every query that reads from disk has the chance leverage multiple drives. More physical spindles = better.
November 21, 2012 at 1:35 am
SpringTownDBA (11/21/2012)
No.Partitioning is not done for performance.
It is done for manageability.
if drive1 dies in your first example, you could still access m2, m3 and m4 while restoring m1 (partial db availability with piecemeal restore in enterprise ed)
Sql server only issues I/O's when the data isn't already cached in the buffer pool. If you have enough ram, the disks won't matter for read performance.
Sql is really, really good about caching data. And doing queries in parallel. And doing readahead reads. And doing lots of other stuff to the get the best performance.possible on hw from $1k laptops to $300k servers.
I would not partition. Instead, I would take the individual drives and create 1 big RAID drive. The advantage is that every query that reads from disk has the chance leverage multiple drives. More physical spindles = better.
yes, thank you for your reply.
but, could you please answer the second question? Actually, I don't understand what will SQL do? thanks
November 21, 2012 at 1:37 am
Almost completely agree with the previous poster, Partitioning the database into seperate file groups is really only done on Very Large DB's, as it can offer some performance gains.
I'm working on a project at the moment to look at the impact of partioning a database across disks, on a data wrehouse that is 2 TB in size, with fact tables of around 300gb, and thats mainly to try and reduce the disk IO contention when doing an ETL and DW processing.
However, its also happening in conjunction with a full Code and indexing strategy review, with DB Paritioning being used to try and get every last drop of performance out of the server that is possible.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 21, 2012 at 1:43 am
lovinginmyheart (11/20/2012)
2. with query tbl1 inner join tbl1_detail in month 1, will the SQL work in parallel by querying data in drive 1 and drive 4 in the same time? OR SQL will query data at drive 1 first, and query data at drive 4 then?
Assuming that none of the data is in cache, SQL will issue async IO requests for all the data and then let the OS sort out the details. Exactly the same as if they were on the same drive.
Partitioning can give you a performance benefit, but it's not automatic, partition and get a huge gain. It requires careful planning, possibly code changes, lots of testing. Partitioning across multiple disks gives an advantage when the partitions will be accessed together, the data is not in cache and there's too much IO for a single array/LUN to handle.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply