July 24, 2013 at 9:54 am
I have five large (500m+ records) tables with differing informaton and column structures. However, each of these tables have five fields in common. Coincidentally, the combination of those five fields makes up the logical key for each table. As such, the tables all have a primary key built on those five fields. I have to run a daily report that draws some data from each of those tables, and when I run the SQL for this report against query analyzer, it's clear that it is using these clustered indexes appropriately and doing merge joins and/or nested loops to get it done. The query takes about 90 minutes to complete.
Then I set about trying to leverage partitioning to improve performance. I dropped the primary key on each of the tables and recreated all of the primary keys with the same five fields, but all on the same partition scheme - effectively aligning all of the tables and their clustered indexes on the same partition scheme. The scheme uses the same one of those five fields as it's partition key (a date field, if that matters) for each table.
The query now runs in about 45 minutes - not bad, right? However, in the original execution plan, all of the time was spent doing the clustered index scans - about 90% of it. In the new execution plan, only 10% of the time is spend scanning indexes, and 80% is spent sorting the data between the index scan and the join (a step which previously didn't exist). Is there a way I can eliminate the sort by structuring my query differently, or is this intermediate sort a natural product of pulling data from partitioned tables?
July 24, 2013 at 1:08 pm
In general, partitioning is not a great mechanism for improving query performance.
Without seeing the query or the execution plan, I'm only guessing.
It sounds like the order in which the data is stored is not the order in which it needs to be for a Merge operation, so the data is being reordered. Look at the sort operation and the merge to understand why each is being chosen. Possibly you can reorder your clustered key to more accurately reflect the order needed. Or, you may need to create an additional non-clustered index. But these are just guesses.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 24, 2013 at 1:24 pm
I just worked it out - I didn't have all of the tables in the join partitioned on the same scheme - I was missing one. So I guess the optimizer thought it made sense to reorder the scan results from the partitioned tables to jive with the order of the non-partitioned table (which happened to be the lead table in the query). Once I got that last table partitioned, everything went direct from the index scan to the nested loop joins with no sorting, and query time dropped another 30%. As it stands, I've been able to reduce a query that had been taking 90 minutes and reduce it to 32, just by partitioning the tables on the same scheme!
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply