April 10, 2017 at 7:29 am
Hi All,
I have a few related tables that are growing large in both size and row count.
The header table has a date field, which I will use to group data by year.
The detail tables linked to it have no date field, so I can think of two ways to partition the detail tables:
1. Use the detail table's primary key and assign start/end ranges based on the average row count for each year.
2. Based on the header table, which is linked to the detail tables through a foreign key relationship, derive the primary key ranges for the detail tables from the date partitioning done on the header table.
Ex: I would get the primary key range for the detail tables for each year based on the link to the header table.
I hope I have explained my idea clearly, and I look forward to your feedback.
Regards
Nader
April 10, 2017 at 7:59 am
Cluster the header table on the date column.
Cluster the detail table on the key from the header table. Get the key/fk by querying the header table first.
Edit: On the detail table, you'll definitely want to have a unique clustering key; you can add identity or some other column(s) to the end of the clustering key to guarantee that it's unique. On the header table, you can make it unique if you prefer, but it probably isn't absolutely necessary like it is for the detail table.
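A minimal sketch of that layout, assuming hypothetical OrderHeader/OrderDetail tables and column names:

```sql
-- Hypothetical table and column names; adjust to your schema.
-- Header: cluster on the date column; appending the key makes it unique.
CREATE UNIQUE CLUSTERED INDEX CIX_OrderHeader
    ON dbo.OrderHeader (OrderDate, OrderID);

-- Detail: cluster on the header's key; append the detail's own identity
-- column so the clustering key is guaranteed unique.
CREATE UNIQUE CLUSTERED INDEX CIX_OrderDetail
    ON dbo.OrderDetail (OrderID, OrderDetailID);

-- Query the header first to resolve the date range into keys,
-- then seek the detail table on those keys.
SELECT d.*
FROM dbo.OrderDetail AS d
WHERE d.OrderID IN (SELECT h.OrderID
                    FROM dbo.OrderHeader AS h
                    WHERE h.OrderDate >= '20160101'
                      AND h.OrderDate <  '20170101');
```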
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 10, 2017 at 8:00 am
Partitioning of data is meant to be a management tool, not a query performance tuning tool. It is quite rare to get performance enhancements out of partitioned data. You have to be absolutely certain that the queries will always and only filter down to a single partition. If it has to scan across partitions, then performance will be much worse than you are currently experiencing.
You'll be much better served by doing traditional query tuning and index tuning to ensure your queries run as efficiently as possible.
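To illustrate what "filter down to a single partition" means in practice, here is a sketch (hypothetical names) of yearly partitioning and the kind of predicate partition elimination depends on:

```sql
-- Hypothetical sketch: yearly RANGE RIGHT partitioning on a date column.
CREATE PARTITION FUNCTION pfYearly (date)
    AS RANGE RIGHT FOR VALUES ('20150101', '20160101', '20170101');

CREATE PARTITION SCHEME psYearly
    AS PARTITION pfYearly ALL TO ([PRIMARY]);

CREATE TABLE dbo.OrderHeader
(
    OrderID   int  IDENTITY NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_OrderHeader PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON psYearly (OrderDate);

-- Partition elimination: the predicate is on the partitioning column,
-- so only the latest partition is touched.
SELECT * FROM dbo.OrderHeader WHERE OrderDate >= '20170101';

-- No elimination: no predicate on OrderDate, so every partition's
-- B-tree must be searched separately.
SELECT * FROM dbo.OrderHeader WHERE OrderID = 42;
```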
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 10, 2017 at 8:49 am
Grant Fritchey - Monday, April 10, 2017 8:00 AM
Partitioning of data is meant to be a management tool, not a query performance tuning tool. It is quite rare to get performance enhancements out of partitioned data. You have to be absolutely certain that the queries will always and only filter down to a single partition. If it has to scan across partitions, then performance will be much worse than you are currently experiencing.
You'll be much better served by doing traditional query tuning and index tuning to ensure your queries run as efficiently as possible.
I strongly second that! And it's a bit ironic... all of the correct indexing needed to actually make partitioning work almost always makes code run much faster against a monolithic table. Like Grant said, about the only time you'll see a performance increase in code is where "partition elimination" can come into play, and I'll add that it's usually bad code that needs such a thing. Good code usually sees no benefit and is sometimes slowed down by partitioning.
Unless you're trying to shorten backups by making old static data READ_ONLY, or you're trying to shorten index maintenance (which I don't do anymore and haven't for over a year), partitioning is likely going to be a PITA for you, with many caveats, including the impact on your ability to quickly do a DR restore.
To state it again, partitioning is NOT a panacea of performance and frequently slows things down a bit.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2017 at 9:00 am
Jeff Moden - Monday, April 10, 2017 8:49 AM
To state it again, partitioning is NOT a panacea of performance and frequently slows things down a bit.
And sometimes a lot more than just a bit.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 10, 2017 at 12:58 pm
The ALTER INDEX statement supports an optional PARTITION clause. This is useful if you have very large tables, allowing you to REBUILD or REORGANIZE one partition at a time or only the latest partition.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
April 11, 2017 at 2:50 am
ScottPletcher - Monday, April 10, 2017 7:59 AM
Cluster the header table on the date column.
Cluster the detail table on the key from the header table. Get the key/fk by querying the header table first.
Edit: On the detail table, you'll definitely want to have a unique clustering key; you can add identity or some other column(s) to the end of the clustering key to guarantee that it's unique. On the header table, you can make it unique if you prefer, but it probably isn't absolutely necessary like it is for the detail table.
Thank you for your reply. I will try that and see if there is a performance gain.
April 11, 2017 at 2:54 am
Grant Fritchey - Monday, April 10, 2017 8:00 AM
Partitioning of data is meant to be a management tool, not a query performance tuning tool. It is quite rare to get performance enhancements out of partitioned data. You have to be absolutely certain that the queries will always and only filter down to a single partition. If it has to scan across partitions, then performance will be much worse than you are currently experiencing.
You'll be much better served by doing traditional query tuning and index tuning to ensure your queries run as efficiently as possible.
Unfortunately, the index tuning path has already been exhausted. We currently have more than 10 years of data, so we thought partitioning would minimize scanning through data, especially since most queries look for data within the last 3 months.
Can you please tell me why, after partitioning, queries would still have to scan through different partitions even when I specify a date range? Or please correct me if I misunderstood you.
Thanks
April 11, 2017 at 3:04 am
Thank you all for the replies. From what I see based on the replies, it's not recommended to use partitioning to solve a performance issue.
What I just need to understand is why partitioning doesn't reduce scanning through historical data by limiting queries to the current (last) partition only.
Thanks again
Nader
April 11, 2017 at 3:13 am
nadersam - Tuesday, April 11, 2017 3:04 AM
Thank you all for the replies. From what I see based on the replies, it's not recommended to use partitioning to solve a performance issue.
What I just need to understand is why partitioning doesn't reduce scanning through historical data by limiting queries to the current (last) partition only. Thanks again
Nader
The problem with partitioning is that partition elimination may happen only when a predicate on the partitioning key is encountered in the query. If such a predicate is not found, any other predicate must be checked against multiple B-Trees instead of just one, which is more expensive than its unpartitioned alternative.
If you cannot guarantee that your queries always contain a predicate on the partitioning key, partitioning won't help you.
If your queries always work on the latest portion of the data, you could also consider working with filtered indexes or indexed views, which could achieve good performance results without the management burden of partitioning. In this case, make sure that the application always uses the correct SET options when reading/writing data.
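A sketch of the filtered-index idea (hypothetical names; note that the filter predicate must be deterministic, so GETDATE() is not allowed and a fixed boundary date has to be moved forward periodically):

```sql
-- Filtered index covering only recent rows; the boundary date is a
-- hypothetical constant that must be redefined over time.
CREATE NONCLUSTERED INDEX IX_OrderHeader_Recent
    ON dbo.OrderHeader (OrderDate, CustomerID)   -- assumed columns
    INCLUDE (TotalAmount)                        -- assumed column
    WHERE OrderDate >= '20170101';
```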
-- Gianluca Sartori
April 11, 2017 at 4:47 am
spaghettidba - Tuesday, April 11, 2017 3:13 AM
nadersam - Tuesday, April 11, 2017 3:04 AM
Thank you all for the replies. From what I see based on the replies, it's not recommended to use partitioning to solve a performance issue.
What I just need to understand is why partitioning doesn't reduce scanning through historical data by limiting queries to the current (last) partition only. Thanks again
Nader
The problem with partitioning is that partition elimination may happen only when a predicate on the partitioning key is encountered in the query. If such a predicate is not found, any other predicate must be checked against multiple B-Trees instead of just one, which is more expensive than its unpartitioned alternative.
If you cannot guarantee that your queries always contain a predicate on the partitioning key, partitioning won't help you.
If your queries always work on the latest portion of the data, you could also consider working with filtered indexes or indexed views, which could achieve good performance results without the management burden of partitioning. In this case, make sure that the application always uses the correct SET options when reading/writing data.
Thank you very much, I will look into that.
April 11, 2017 at 5:16 am
spaghettidba - Tuesday, April 11, 2017 3:13 AM
nadersam - Tuesday, April 11, 2017 3:04 AM
Thank you all for the replies. From what I see based on the replies, it's not recommended to use partitioning to solve a performance issue.
What I just need to understand is why partitioning doesn't reduce scanning through historical data by limiting queries to the current (last) partition only. Thanks again
Nader
The problem with partitioning is that partition elimination may happen only when a predicate on the partitioning key is encountered in the query. If such a predicate is not found, any other predicate must be checked against multiple B-Trees instead of just one, which is more expensive than its unpartitioned alternative.
If you cannot guarantee that your queries always contain a predicate on the partitioning key, partitioning won't help you.
If your queries always work on the latest portion of the data, you could also consider working with filtered indexes or indexed views, which could achieve good performance results without the management burden of partitioning. In this case, make sure that the application always uses the correct SET options when reading/writing data.
Following what you mentioned, I found the filtered index to be very helpful in my case.
I took a query, checked its execution plan, and identified one of the indexes it uses, then created that index again with the same syntax but with a WHERE condition added to make it a filtered index.
The problem is that the query is still using the old index, not the new one; I have to keep the old one in case someone needs to fetch old data.
Can I get help, please, with why SQL Server didn't use the new index when the query has a condition that's satisfied by the new filtered index?
Thanks
Nader
April 11, 2017 at 5:19 am
Gianluca already answered the question better than I would.
It's down to having a rock solid guarantee of partition elimination. So often we hear that, oh sure, they'll always filter by dates, only to find that, well, they mostly filter by dates, or even frequently filter by dates. No filtering, no partition elimination. No partition elimination, increased execution time, not to mention more contention on disk, I/O, and CPU as you process across more, rather than fewer, objects.
As with all things, test it. Set up an environment, partition the data, run a captured copy of your standard workload against it. See how it behaves. You may be in the rarer situation where you will get partition elimination and a performance increase. More likely though, you'll be in the situation where it hurts.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 11, 2017 at 6:05 am
nadersam - Tuesday, April 11, 2017 5:16 AM
Following what you mentioned, I found the filtered index to be very helpful in my case.
I took a query, checked its execution plan, and identified one of the indexes it uses, then created that index again with the same syntax but with a WHERE condition added to make it a filtered index.
The problem is that the query is still using the old index, not the new one; I have to keep the old one in case someone needs to fetch old data.
Can I get help, please, with why SQL Server didn't use the new index when the query has a condition that's satisfied by the new filtered index? Thanks
Nader
Without seeing the query, the table or the index, these are guesses.
Because it's not a covering index, the optimizer chooses to go to the regular index. Do you have INCLUDE columns with the filtered index to make it covering?
The WHERE clause for the query and the one for the filtered index actually aren't aligning. If the optimizer can't recognize that your filtered index would work with the query, it won't use it.
The WHERE clause has additional criteria, possibly even non-sargeable criteria, that make choosing the nonclustered index impossible.
The WHERE clause of the filtered index doesn't actually filter the data very much so the regular index looks just as attractive to the optimizer, so it still uses it.
Could even be something else entirely. Without details, I'm just guessing here.
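One common variant of the "not aligning" case is worth sketching (hypothetical names): the optimizer has to prove at compile time that the query's predicate falls entirely inside the index filter, and that proof fails when the boundary is a variable.

```sql
-- Hypothetical filtered index:
CREATE NONCLUSTERED INDEX IX_Orders_Recent
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalAmount)
    WHERE OrderDate >= '20170101';

-- Likely to qualify: a literal predicate the optimizer can prove is
-- contained in the index filter, with all referenced columns covered.
SELECT CustomerID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= '20170201' AND CustomerID = 42;

-- Unlikely to qualify: with a variable, the optimizer can't prove the
-- predicate stays inside the filter, so it falls back to another index.
DECLARE @d date = '20170201';
SELECT CustomerID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= @d AND CustomerID = 42;
```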
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 11, 2017 at 6:42 am
Grant Fritchey - Tuesday, April 11, 2017 6:05 AM
nadersam - Tuesday, April 11, 2017 5:16 AM
Following what you mentioned, I found the filtered index to be very helpful in my case.
I took a query, checked its execution plan, and identified one of the indexes it uses, then created that index again with the same syntax but with a WHERE condition added to make it a filtered index.
The problem is that the query is still using the old index, not the new one; I have to keep the old one in case someone needs to fetch old data.
Can I get help, please, with why SQL Server didn't use the new index when the query has a condition that's satisfied by the new filtered index? Thanks
Nader
Without seeing the query, the table or the index, these are guesses.
Because it's not a covering index, the optimizer chooses to go to the regular index. Do you have INCLUDE columns with the filtered index to make it covering?
The WHERE clause for the query and the one for the filtered index actually aren't aligning. If the optimizer can't recognize that your filtered index would work with the query, it won't use it.
The WHERE clause has additional criteria, possibly even non-sargeable criteria, that make choosing the nonclustered index impossible.
The WHERE clause of the filtered index doesn't actually filter the data very much so the regular index looks just as attractive to the optimizer, so it still uses it.
Could even be something else entirely. Without details, I'm just guessing here.
Maybe I didn't elaborate enough. What I did is take a query that performs lots of reads on a table, then from the execution plan identify which index it uses (a covering index).
Then I created that exact same index, but with a filter condition that limits it to data within that query's criteria.
If you need me to send any details of the database structure, please tell me what's needed and I will provide it.
Thank you
Nader