Will partitioning help?

  • Not necessarily. Properly done indexing can provide just about the same benefit. Partitioning may give you somewhere around an additional 1% performance improvement over just indexing, but is that really worth it?

    Not only 1% some gains are even higher.There are cases when it doesnt perform better than unparitioned table.so it all depends. However, index maintenance would be easier on partitioned table. So there are cases when partitioning provide lots of benefit and at other times it doesnt make much sense.

    select MAX(mycol) from mytable_unpart where id <= 10

    go

    select MAX(mycol) from mytable_part where id <= 10

    go

    If the queries are like above then gain could be much higher..

    select MAX(uselesscol) from mytable_unpart where mycol = CONVERT(char(10),12942)

    go

    select MAX(uselesscol) from mytable_part where mycol = CONVERT(char(10),12942)

    go

    Queries like above will have somewhat poor performance than unpartitioned table.

    p.s. That's based on testing, not a wild guess

    I have seen a case where clustered index(it was not distinct but had quite a lot of distinct values) was as effective as the partitioning ( almost all queries had the clustered index key as sarg). But still used partitioning because DBA's sugegsted that maintenance would be easier. But this decision was based on more like maintenance rather than performance benefit of it.

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • Gullimeel (5/23/2012)


    Not necessarily. Properly done indexing can provide just about the same benefit. Partitioning may give you somewhere around an additional 1% performance improvement over just indexing, but is that really worth it?

    Not only 1% some gains are even higher.There are cases when it doesnt perform better than unparitioned table.so it all depends. However, index maintenance would be easier on partitioned table. So there are cases when partitioning provide lots of benefit and at other times it doesnt make much sense.

    select MAX(mycol) from mytable_unpart where id <= 10

    go

    select MAX(mycol) from mytable_part where id <= 10

    go

    If the queries are like above then gain could be much higher..

    select MAX(uselesscol) from mytable_unpart where mycol = CONVERT(char(10),12942)

    go

    select MAX(uselesscol) from mytable_part where mycol = CONVERT(char(10),12942)

    go

    Queries like above will have somewhat poor performance than unpartitioned table.

    p.s. That's based on testing, not a wild guess

    I have seen a case where clustered index(it was not distinct but had quite a lot of distinct values) was as effective as the partitioning ( almost all queries had the clustered index key as sarg). But still used partitioning because DBA's sugegsted that maintenance would be easier. But this decision was based on more like maintenance rather than performance benefit of it.

    Speaking of "based on testing, not a wild guess" - have you actually TRIED your SELECT MAX(... query on a large partitioned table? I think not. Try it and see what happens. Then read here (http://connect.microsoft.com/SQLServer/feedback/details/240968/partition-table-using-min-max-functions-and-top-n-index-selection-and-performance) to understand WHY performance is horribly bad AND also a slick workaround based on the $partition function.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Speaking of "based on testing, not a wild guess" - have you actually TRIED your SELECT MAX(... query on a large partitioned table?

    I had put a real time example to support what Gail mentioned that the partitioning doesnt always mean better performance.

    Related to that max example, there are two max query which one are you talking about if second then i already mentioned that it will perform badly.If it is for the first i tested it for around 100k rows and it was better than non parttitoned.

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • Gullimeel (5/24/2012)


    Speaking of "based on testing, not a wild guess" - have you actually TRIED your SELECT MAX(... query on a large partitioned table?

    I had put a real time example to support what Gail mentioned that the partitioning doesnt always mean better performance.

    Related to that max example, there are two max query which one are you talking about if second then i already mentioned that it will perform badly.If it is for the first i tested it for around 1 million rows and it was better than non parttitoned.

    Any chance you can show your testing? I am VERY curious how you avoided the KNOWN BUG!! I could have some big wins with clients if you found a solution!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • drop table mytable_part

    go

    drop partition scheme ps_mytable

    go

    drop partition function pf_mytable

    go

    create partition function pf_mytable(int)

    as range left for values (10,20,30,40)

    go

    create partition scheme ps_mytable

    as partition pf_mytable all to ( [primary])

    go

    create table mytable_part (id int, mycol char(10) not null default 'mycol',uselesscol char(200) not null default 'uselesscol')

    on ps_mytable(id)

    go

    /*

    select * from sys.partition_functions

    select * from sys.partition_range_values

    select * from sys.partitions

    */

    go

    drop table mytable_unpart

    go

    create table mytable_unpart (id int, mycol char(10) not null default 'mycol',uselesscol char(200) not null default 'uselesscol')

    go

    insert into mytable_unpart with(tablockx) (id,uselesscol)

    select NTILE(50) over( order by cnt),

    --CONVERT(varchar,cnt%200000),

    CONVERT(varchar,cnt%50000) from Nums where cnt <= 100000

    go

    create clustered index mytable_unpart_id on mytable_unpart(id)

    go

    create nonclustered index mytable_unpart_mycol on mytable_unpart(mycol)

    go

    --select id,COUNT(*) from mytable_unpart group by id

    insert into mytable_part with(tablockx) (id,uselesscol)

    select NTILE(50) over( order by cnt),

    --CONVERT(varchar,cnt%200000),

    CONVERT(varchar,cnt%50000) from Nums where cnt <= 100000

    go

    create clustered index mytable_part_id on mytable_part(id)

    go

    create nonclustered index mytable_part_mycol on mytable_part(mycol)

    go

    select * from sys.dm_db_index_physical_stats(db_id(),object_id('dbo.mytable_unpart','U'),null,null,null)

    go

    select * from sys.dm_db_index_physical_stats(db_id(),object_id('dbo.mytable_part','U'),null,null,null)

    go

    dbcc dropcleanbuffers

    dbcc freeproccache

    go

    select MAX(mycol) from mytable_unpart where id<=10

    go

    select MAX(mycol) from mytable_part where id<=10

    go

    above is the query i ran and it was just for 100K rows... Attached is the screenshot of the query stats for last 2 queries.

    I will run more tests again on bigger machine..with more physical disks..I have just a c derive so kind of useless system for parallel and partiotioning...

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • I updated my post to include the connect bug link.

    Here are my findings:

    select MAX(mycol) from mytable_unpart where id<=10

    NC INDEX scan

    Table 'mytable_unpart'. Scan count 1, logical reads 283

    select MAX(mycol) from mytable_part where id<=10

    NC INDEX seek - clearly why this IS more efficient

    Table 'mytable_part'. Scan count 1, logical reads 72

    BUT the BUG IS FOR this TYPE OF query:

    select MAX(mycol) from mytable_unpart

    NC INDEX scan - actual number OF ROWS ONE

    Table 'mytable_unpart'. Scan count 1, logical reads 3

    select MAX(mycol) from mytable_part

    NC INDEX scan - but hits ALL DATA IN ALL PARTITIONS! actual number OF ROWS 100000

    Table 'mytable_part'. Scan count 5, logical reads 360

    The same bug EXISTS IF you USE TOP WITH ORDER BY (read the connect thread and workarounds using the $PARTITION function)

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Let me clarify two things..

    1. I was not testing the performance of max function. The reason I used max was that i did not want to use ssms resultset discard and did not want to return 20000 rows as well. Thus I used max so that i do not need to discard results and at the same time do not get too much rows in resultset.

    2. I am aware of that bug and min on ascending order index wont cause the issue but max will do.I could have very well used min but i am kind of always stuck with max. I wanted to show the performance of index scan and thus used the same value for the mycol and thus bug did not affected my perfromance.

    Thus bug is simple.. For non partitioned indexes it can use the backward scan and as soon as it find diff value it is done..

    But it is not doing the backward scan for the partitioned indexes and thus it has to read the full index.

    Thus if i want to simulate the bug..simply change the value of mycol to be more selective. The partition table does full index scan around 72 pages but non partitioned is smart and doing a backward scan and thus uses just 3 IO's.

    I mentioned in my post that benefit of partitioning is only when you can include the partitioning column in each and every query.Otherwise it might perform badly then the unpartitioned thus i did not test performance of query without the parttitoning column. Without that column query wont performa better than the unpartitoned table.

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • While i am enjoying this dicussion I wanted to point out there is a more obvious benefit to partioning than just the elimantion of unwanted partitions. Lock escalation occurs to the partition level instead of the table level and could improve concurrency. I also remember seeing something about how it improves performance when doing parallelism , i'll see if i can dig it up.

    PS:- I love it when DBAs argue 😀 its great fun to watch

    Jayanth Kurup[/url]

  • Jayanth_Kurup (5/25/2012)


    While i am enjoying this dicussion I wanted to point out there is a more obvious benefit to partioning than just the elimantion of unwanted partitions. Lock escalation occurs to the partition level instead of the table level and could improve concurrency. I also remember seeing something about how it improves performance when doing parallelism , i'll see if i can dig it up.

    PS:- I love it when DBAs argue 😀 its great fun to watch

    Newer editions of SQL Server definitely improved the partitioning picture (although the damn max/min/top bug is STILL there even in SQL 2012!!). You can indeed have multiple threads per partition now which you could not in 2005 and your statement about lock escalation is also correct. However, given just 3 partitions the lock escalation will potentially provide only marginal benefit.

    Oh, and we are debating, not arguing! :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Look... I just want to say, and I am no expert, but it seems clear from all of my research and time spent on this forum that to improve performance you do not just jump to partitioning. If you have done everything else and your data warrants a partition due to data management and how the data is being queried (If partitioning on years and you "almost" never query multiple years) it "may" help. However, the general question as posed "Will partitioning improve performance" is as always "it depends" and "if you have a performance issue, have you tried all other options."

    I believe that partitioning was designed to help better manage data, not improve performance. Sure, some things about it "may" improve query performance, but from what I can tell that was not the original goal of partitioning data.

    Jared
    CE - Microsoft

  • Gullimeel (5/24/2012)


    I am aware of that bug and min on ascending order index wont cause the issue but max will do.I could have very well used min but i am kind of always stuck with max. I wanted to show the performance of index scan and thus used the same value for the mycol and thus bug did not affected my perfromance. Thus bug is simple.. For non partitioned indexes it can use the backward scan and as soon as it find diff value it is done..But it is not doing the backward scan for the partitioned indexes and thus it has to read the full index.

    Nonsense. You have not understood the nature of the missed optimization (it is not a bug) at all. You are confusing the fact that the storage engine cannot perform a parallel backward scan or seek, with a completely separate issue related to transforming MIN and MAX to the equivalent TOP (1) + Ordered Index Scan. It has nothing to do with whether MIN or MAX is specified:

    SELECT MIN(MYCOL) FROM mytable_part;

    SELECT MAX(MYCOL) FROM mytable_part;

    SELECT MIN(MYCOL) FROM mytable_unpart;

    SELECT MAX(MYCOL) FROM mytable_unpart;

    By the way, an optimal index for the non-partitioned table in your script is:

    CREATE INDEX nc1

    ON dbo.mytable_unpart (mycol)

    WHERE id <= 20;

    Notice I didn't use id <= 10, just to show you how flexible this method is. It doesn't require partitioning the whole table, and provides many of the same benefits, e.g. locking.

    I mentioned in my post that benefit of partitioning is only when you can include the partitioning column in each and every query.Otherwise it might perform badly then the unpartitioned thus i did not test performance of query without the parttitoning column. Without that column query wont performa better than the unpartitoned table.

    Pretty much everyone on this thread has told you the same thing: partitioning is primarily about ease of maintenance. There are some cases where performance may improve, and others where it will get worse. You seem to be keen to prove that partitioning might increase performance in some cases - a point that no-one has disagreed about!

  • Gullimeel (5/24/2012)

    --------------------------------------------------------------------------------

    I am aware of that bug and min on ascending order index wont cause the issue but max will do.I could have very well used min but i am kind of always stuck with max. I wanted to show the performance of index scan and thus used the same value for the mycol and thus bug did not affected my perfromance. Thus bug is simple.. For non partitioned indexes it can use the backward scan and as soon as it find diff value it is done..But it is not doing the backward scan for the partitioned indexes and thus it has to read the full index.

    Nonsense. You have not understood the nature of the missed optimization (it is not a bug) at all. You are confusing the fact that the storage engine cannot perform a parallel backward scan or seek, with a completely separate issue related to transforming MIN and MAX to the equivalent TOP (1) + Ordered Index Scan. It has nothing to do with whether MIN or MAX is specified:

    SELECT MIN(MYCOL) FROM mytable_part;

    SELECT MAX(MYCOL) FROM mytable_part;

    SELECT MIN(MYCOL) FROM mytable_unpart;

    SELECT MAX(MYCOL) FROM mytable_unpart;

    You have misunderstood me. I was not chekcing the performance of the min or max in any way or top 1 etc. I was showing the performance of the nonclustered (covered)index scan. I was lazy not to use select * but used the max() to avoid some typing.

    I should have used something like below

    drop table #abc

    go

    select mycol into #abc

    from mytable_unpart where id <= 10

    go

    drop table #def

    go

    select mycol into #def from mytable_part where id <= 10

    go

    It has nothing to do with whether MIN or MAX is specified:

    SELECT MIN(MYCOL) FROM mytable_part;

    SELECT MAX(MYCOL) FROM mytable_part;

    SELECT MIN(MYCOL) FROM mytable_unpart;

    SELECT MAX(MYCOL) FROM mytable_unpart;

    It matters for partitioned table whether min is used or max is used. In my case i have kept the data same so it doesnt even matter.But if your data is selective enough in the mycol then min is at par with min of unpartitioned table but max does whole index scan.

    However, if you create your index in desc order instead of asc then max is at par with unpartitioned table but min will use the index scan and thus performs badly against unpartitioned table. I have attached a script where you could see the difference.

    By the way, an optimal index for the non-partitioned table in your script is:

    CREATE INDEX nc1

    ON dbo.mytable_unpart (mycol)

    WHERE id <= 20;

    Notice I didn't use id <= 10, just to show you how flexible this method is. It doesn't require partitioning the whole table, and provides many of the same benefits, e.g. locking.

    I did not mention that only way to imrpove the performance was parttitioning.

    It doesn't require partitioning the whole table, and provides many of the same benefits

    Again i have used just one query to show that but that doesnt represent the whole picture.If my query were using say 70-80% of time <=10 then you could have used filtered index but suppose it is equally likely that all partitios will be used you need to have 5 filetred index. Similarly if there are another indexes those need to have filtered indexes the same way. Other option is partitioning ,which provides the ease of maaintenence as well.

    You can partion by using <=20 as well by specifying the first value as 20.

    Pretty much everyone on this thread has told you the same thing: partitioning is primarily about ease of maintenance. There are some cases where performance may improve, and others where it will get worse. You seem to be keen to prove that partitioning might increase performance in some cases - a point that no-one has disagreed about!

    I did not disagree on this point with anyone .I showed in my script both cases where your performance is better and worse so that when you decide the partitoning just for performance what kind of queries will be benefitted and which one wont.Then based on that you can decide whether partitioning will provide the performance gain you are looking for or not.

    Below is the query which shows that min/max performance on partitioned table is based on the order of the index(asc or desc). Parttioned table indexes uses just forward scan and do not use the backward scan.

    I hope this time I have made myself more clear.

    drop table mytable_part

    go

    drop partition scheme ps_mytable

    go

    drop partition function pf_mytable

    go

    create partition function pf_mytable(int)

    as range left for values (10,20,30,40)

    go

    create partition scheme ps_mytable

    as partition pf_mytable all to ( [primary])

    go

    create table mytable_part (id int, mycol char(10) not null default 'mycol',uselesscol char(200) not null default 'uselesscol')

    on ps_mytable(id)

    go

    /*

    select * from sys.partition_functions

    select * from sys.partition_range_values

    select * from sys.partitions

    */

    go

    drop table mytable_unpart

    go

    create table mytable_unpart (id int, mycol char(10) not null default 'mycol',uselesscol char(200) not null default 'uselesscol')

    go

    insert into mytable_unpart with(tablockx) (id,

    mycol,

    uselesscol)

    select NTILE(50) over( order by cnt),

    CONVERT(varchar,cnt%200000),

    CONVERT(varchar,cnt%50000) from Nums where cnt <= 100000

    go

    create clustered index mytable_unpart_id on mytable_unpart(id)

    go

    create nonclustered index mytable_unpart_mycol on mytable_unpart(mycol)

    go

    --select id,COUNT(*) from mytable_unpart group by id

    insert into mytable_part with(tablockx) (id,

    mycol,

    uselesscol)

    select NTILE(50) over( order by cnt),

    CONVERT(varchar,cnt%200000),

    CONVERT(varchar,cnt%50000) from Nums where cnt <= 100000

    go

    create clustered index mytable_part_id on mytable_part(id)

    go

    create nonclustered index mytable_part_mycol on mytable_part(mycol)

    go

    select * from sys.dm_db_index_physical_stats(db_id(),object_id('dbo.mytable_unpart','U'),null,null,null)

    go

    select * from sys.dm_db_index_physical_stats(db_id(),object_id('dbo.mytable_part','U'),null,null,null)

    go

    --max is performing bad..

    dbcc dropcleanbuffers

    dbcc freeproccache

    go

    select max(mycol) from mytable_unpart where id<=10

    go

    select max(mycol) from mytable_part where id<=10

    go

    --min is almost same

    dbcc dropcleanbuffers

    dbcc freeproccache

    go

    select min(mycol) from mytable_unpart where id<=10

    go

    select min(mycol) from mytable_part where id<=10

    go

    --drop indexes and create the index as desc

    drop index mytable_unpart.mytable_unpart_mycol

    go

    create nonclustered index mytable_unpart_mycol_desc on mytable_unpart(mycol desc)

    go

    drop index mytable_part.mytable_part_mycol

    go

    create nonclustered index mytable_part_mycol_desc on mytable_part(mycol desc)

    go

    --now max will perform same

    dbcc dropcleanbuffers

    dbcc freeproccache

    go

    select max(mycol) from mytable_unpart where id<=10

    go

    select max(mycol) from mytable_part with (index=mytable_part_mycol_desc) where id<=10

    go

    --now min will perform badly..

    dbcc dropcleanbuffers

    dbcc freeproccache

    go

    select min(mycol) from mytable_unpart where id<=10

    go

    select min(mycol) from mytable_part where id<=10

    go

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • Gullimeel (5/26/2012)


    You have misunderstood me. I was not chekcing the performance of the min or max in any way or top 1 etc.

    You're missing the point. I'm not talking about us rewriting the query to use TOP, I'm talking about the optimizer transformation that turns MAX or MIN into a TOP (1) operation on a suitably sorted index. This is the bug we have been telling you about.

    Below is the query which shows that min/max performance on partitioned table is based on the order of the index(asc or desc).

    No, this is just the missed optimization again. In every case where you see the whole partition being scanned, there is just an index scan and a stream aggregate in the plan. In every case where you see optimal performance, there is an extra TOP iterator in the plan. You didn't write it, the optimizer put it in. This is the bug.

    Parttioned table indexes uses just forward scan and do not use the backward scan.

    Try this, and tell me which direction your partition scan is:

    SELECT

    MAX(mp.mycol)

    FROM dbo.mytable_part AS mp

    WHERE

    mp.id <= 10

    AND $PARTITION.pf_mytable(mp.id) = $PARTITION.pf_mytable(mp.id);

    SELECT

    MIN(mp.mycol)

    FROM dbo.mytable_part AS mp

    WHERE

    mp.id <= 10

    AND $PARTITION.pf_mytable(mp.id) = $PARTITION.pf_mytable(mp.id);

    With that trick in, it doesn't matter whether you use MIN or MAX or whether the index is ASC or DESC. The engine will perform a FORWARD or BACKWARD scan of the partition, and the query plan will always include a TOP iterator. Is that clear now?

  • Next time I wont save the typing efforts because it has caused much more typing then i thought it would save me(not a nice optimization decision:-)) so that discussion stays around what it has been started for.

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

  • Gullimeel (5/26/2012)


    Next time I wont save the typing efforts because it has caused much more typing then i thought it would save me(not a nice optimization decision:-)) so that discussion stays around what it has been started for.

    So did you see the TOP operators and the BACKWARD scans? 😉

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply