January 29, 2013 at 9:41 am
Hello all,
At the moment I am trying to come to grips with partitioning.
I am trying a number of scenario's to gain knowledge about partitioning and how to use partitioning. I did read a number of articles on the web, but getting your feet wet with actual code does help as wel.
Queries.
1. Selection on the partkey, (clientid,) startdttm (Est. S c. 0.0071077)
2. Selection on the startdttm (Estimated Subtree cost 0.0183308)
3. Selection on the clientid, startdttm (Est. S. c. 0.10641)
For 3 far more diskreads are done :alien:
Index on startdttm is not partitioned. (Used in 1 and 2 and 3)
Index on clientid is partitioned. (Used in 3)
To me this lookes like that the optimizer does not or hardly looks at the fact that some indexes are partitioned and some are not.
Does the optimiser take into account that indexes can be partitioned?
Sorry I am not able to post a complete example with script to script the table and the contents.
My conclusions:
1. Using partitioning is a riscy bussiness.
2. The optimiser does not handle partitioning very wel.
I can make some guesses why this is happening, but can anyone explain this behaviour to me?
ben brugman
dbcc dropcleanbuffers
dbcc freeproccache
SET STATISTICS TIME ON
SET STATISTICS IO ON
select * from contact where
PartKey = 15
and ClientID = 1000000010984
and startdttm = '2012-03-16 07:32:00.000'
-- Table 'contact'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- CPU time = 0 ms, elapsed time = 108 ms.
dbcc dropcleanbuffers
dbcc freeproccache
select * from contact where
startdttm = '2012-03-16 07:32:00.000'
-- Table 'contact'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- CPU time = 0 ms, elapsed time = 91 ms.
dbcc dropcleanbuffers
dbcc freeproccache
select * from contact where
ClientID = 1000000010984
and startdttm = '2012-03-16 07:32:00.000'
--Table 'contact'. Scan count 30, logical reads 62, physical reads 61, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- CPU time = 0 ms, elapsed time = 1202 ms.
Index information
objectnameindexname partitions
contact NUNCI_contact_STRTDTTM 1
contact NUNCI_ENCTR_CLIIDENDDT 30
contact PK_contact 30
Index information
2[PK_contact]nonclustered, unique, primary key located on fclientPScheme[ID], [PartKey]
6[NUNCI_contact_CLIIDENDDT]nonclustered located on fclientPScheme[clientID], [EndDTTM]
10[NUNCI_contact_STRTDTTM]nonclustered located on NCIndexFG[StartDTTM]
January 29, 2013 at 9:53 am
https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 29, 2013 at 4:02 pm
You may or may not need partitioning.
But you absolutely need to determine the proper clustered index for the table.
It's the column(s) you (almost) always specify in a WHERE clause, particularly if they are in a range.
In your case, it's likely "startdttm", although I can't say for sure without more details.
That will give you vastly better performance, partitioned or not.
Then, if you do partition, partition on the clustering key. Partitioning does allow you to specify different compression for each partition, if you want to (i.e., you can compress older, historical data w/o being forced to compress current data, even though both are in the same table). Sadly, I don't think we can yet specify a different FILLFACTOR for each partition, which could be tremendously helpful as well.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 31, 2013 at 2:07 am
GilaMonster (1/29/2013)
https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/
Thank you Gail for the link.
From the link:
'summary would be: don't change or implement anything without research and without knowing what it's going to affect. '
My quest at the moment is introducing Clustering for a number of tables. By now I have shown that clustering will give an significant improvement for allmost all actions. (Select/Insert/Update/Delete, complex statements, cache). And no significant degradation for any action.
But I do not have the resources to benchmark all situations. (Partitioning and thousands of users is one of them and a enormously large number of potential statements).
But the company is 'a bit' reluctant to the change. So I am trying to gain some knowledge about partitioning, do this by reading articles (yours) and by performing some simple 'tests', hence my question. With the understanding of internals I am in a stronger position.
Thanks for the link,
Ben Brugman
January 31, 2013 at 2:51 am
ScottPletcher (1/29/2013)
You may or may not need partitioning.But you absolutely need to determine the proper clustered index for the table.
I totally agree that a clustered index will help performance in our situation. But to convince the 'decision' makers, I have to know the effect in our sitiuation, hence my questions about partitioning.
Thx,
ben
(The example comes from a table with 13 indexes. And startdttm is not the most used index. Using a clustered index on another field, might give similar results.).
January 31, 2013 at 9:42 am
ben.brugman (1/31/2013)
ScottPletcher (1/29/2013)
You may or may not need partitioning.But you absolutely need to determine the proper clustered index for the table.
I totally agree that a clustered index will help performance in our situation. But to convince the 'decision' makers, I have to know the effect in our sitiuation, hence my questions about partitioning.
Thx,
ben
(The example comes from a table with 13 indexes. And startdttm is not the most used index. Using a clustered index on another field, might give similar results.).
startdttm was the only column used in all three of your sample queries. If you have other queries that are more representative of the normal work load than what you've provided us with as we try to assist, then of course go by the more represenative queries.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 31, 2013 at 10:05 am
ScottPletcher (1/31/2013)
startdttm was the only column used in all three of your sample queries. If you have other queries that are more representative of the normal work load than what you've provided us with as we try to assist, then of course go by the more represenative queries.
This was a query with a suprising effect. If you compare the 2 situation with the 3 situation you can see that the query is almost the same, but that for the 3 situation an extra field is used for selection.
compare:
2: Select * from A where B = 1
3: Select * from A where B = 1 and C = 2
What did suprise me that the second query was a ten fold more expensive. Both in estimated cost as in diskreads.
If for the situation 2 the number of diskreads is 4, for situation 3 the diskreads can always be done in the same number or less reads than in situation 2.
But the optimizer chooses to take another path this did supprise me.
Hence the question.
Ben
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply