Partitioning: I thought it would be faster...

  • In our database we have an table with 150 million records which grows with 1,5 million a day. The table contains statistic data which has an clustered index on item (foreign key to his owner) and date. For performance reasons i wanted to partition the table on the date column but a first test showed me that this was only slower.

    So i build a testcase which i want to share with you:

    --Create the partition function
    CREATE PARTITION FUNCTION fnPartitionByQuarter (smalldatetime)
    AS RANGE RIGHT 
    FOR VALUES (
    '1/1/2008',
    '3/1/2008',
    '6/1/2008',
    '9/1/2008',
    '1/1/2009',
    '3/1/2009',
    '6/1/2009',
    '9/1/2009')
    
    --create the partition scheme
    CREATE PARTITION SCHEME SASScheme
    AS PARTITION fnPartitionByQuarter
    ALL to ([Primary])
    
    --create a table that will be partitioned
    CREATE table testPart (id int identity(1,1), item int, date smalldatetime, value int) ON SASScheme(date)
    CREATE unique clustered index IX_index1 ON testPart (Item, Date) ON SASScheme(date)
    
    --create the same table without partitioning
    CREATE table testNoPart (id int identity(1,1), item int, date smalldatetime, value int) 
    CREATE unique clustered index IX_index1 ON testNoPart (Item, Date) 
    
    --fill the tables with testdata
    declare @a int 
    declare @date smalldatetime
    
    set @date = '2008-01-01'
    set @a = 1
    
    while @a < 10000
    begin
    insert into testPart (item, date, value)
    values (@a, @date, @a)
    
    insert into testNoPart (item, date, value)
    values (@a, @date, @a)
    
    set @a = @a + 1
    end
    
    while @date < getDate()
    begin
    insert into testPart (item, date, value)
    select item, @date, value
    from testPart where date = '2008-01-01'
    
    insert into testNoPart (item, date, value)
    select item, @date, value
    from testNoPart where date = '2008-01-01'
    
    set @date = @date + 1
    end
    
    --Just a select to see that the table is indeed partitioned
    select * from sys.partitions where object_name(object_id)='testPart'
    select * from sys.partitions where object_name(object_id)='testNoPart'
    
    --Set the timing on to see what the query is doing
    set statistics time on
    set statistics io on
    
    --Do a select which will pull the data from 2 of the partitions
    select item, sum(value) from testPart 
    where item between 1 and 5000
    and date between '2008-10-01' and '2009-02-01'
    group by item
    
    select item, sum(value) from testNoPart 
    where item between 1 and 5000
    and date between '2008-10-01' and '2009-02-01'
    group by item
    

    The result of the last 2 queries:

    SQL Server parse and compile time: 
       CPU time = 16 ms, elapsed time = 17 ms.
    
    (5000 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'testPart'. Scan count 2, logical reads 4229, physical reads 0, read-ahead reads 36, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
    SQL Server Execution Times:
       CPU time = 328 ms,  elapsed time = 233 ms.
    
    (5000 row(s) affected)
    Table 'testNoPart'. Scan count 5, logical reads 15083, physical reads 0, read-ahead reads 173, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    
    SQL Server Execution Times:
       CPU time = 375 ms,  elapsed time = 113 ms.
    

    As you can see, the partitioned table is (in this case) 2 times as slow as the unpartitioned table. But the number of logical reads is much smaller. Can anyone explain this to me?

  • Although the IO "seems" faster the actual plan for the Partitioned table runs a nested loop over each partition. while the Nonpartioned table simply traverse one index btree. To notice performance benefits the number of rows in the tables has to be well in the 100million or more.

    Partitioning is primary a manageability feature that "can" produce some speed ups but the primary purpose is not performance.


    * Noel

  • Noel is exactly right - partition for manageability (for example the ability to rebuild indexes for a single partition) rather than for performance - at least in 2K5.

    For performance, partitioned views can be well worth a look, but you don't get the other benefits that partitioned tables offer.

    Cheers,

    Paul

  • The relationship between partitioning and performance depends very much on the queries being executed (as per everything else to do with SQL Server, and all other DBMSs: I wish I could patent the term "It Depends").

    If the query optimiser works out a query can be serviced by reading a single partition then performance will be quicker, because the index/data it has to traverse (ie. number of reads) is much less. However if the optimiser isn't sure it will build a plan that "joins" the partitions so the number of reads may be much less than reading from a non-partitioned table but in the background its got more work to do.

    I haven't had a chance to look into SQL2k8's processing of partitions in parallel, instead of single-threaded as SQL2k5 does it, but theoretically that should improve performance significantly in the right circumstances.

    As mentioned by the others though, partitioning has other benefits, so performance isn't the only criteria for partitioning a table (just a very, very important one, at least in measuring and accepting the change in performance before implementing it).

  • Hey Glenn,

    Criag Freedman has some excellent articles on this subject here

    I think you would enjoy reading it.

    BTW partitioning can increase performance in certain fairly narrow situations, but there are a lot of hoops to jump through and caveats to be aware of. In general, and by an overwhelming margin, partitioning (other than by default!) will result in degraded performance.

    Cheers

    Paul

  • Thanks Paul, very illuminating.

    That will really help when we're looking at reasons to upgrade to SQL2k8 instead of SQL2k5. I hadn't thought of testing on an empty table: I'd been waiting to get my hands on a machine with 2TB of disk so I could test one of our real databases, with all the overhead that entails. Something like this is so simple even a manger can understand it.:-)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply