August 6, 2012 at 1:50 am
I've been having a look at a project that is underperforming when selecting from a view, and noticed that in the primary View there are a number of interesting joins going on.
The structure of the tables is basically
CREATE Table #Partitioned (
Id Int Identity(1,1) NOT FOR REPLICATION
,Col1Id int
,Col2Id int
,Col3Id int
,SysId int
)
Create Table #Lookup (
ColId Int Identity(1,1) NOT FOR REPLICATION
SysId Int
ColDesc varchar(255)
)
With the table #Partitioned Partitioned by SysId, with a compound PK consisting of Id and SysId
The table #Lookup has a Compound Primary key of ColId and SysId, even though ColId is Unique, due to it being an Identity column
The query in view that joins these is basically
Select *
from #Partitioned p
LEFT JOIN #Lookup l
on p.Col1Id=l.ColId
AND p.SysId=l.SysId
LEFT JOIN #Lookup l2
on p.Col2Id=l2.ColId
AND p.SysId=l2.SysId
LEFT JOIN #Lookup l3
on p.Col2Id=l3.ColId
AND p.SysId=l3.SysId
Row counts approx 10 million rows in #Partitioned and 50,000 rows in #Lookup.
I've done some preliminary testing by removing the SysId, from the Join and had around 4-5% performance increase with no visible impact on the result set.
In both instances the SQL server Query plan (Actual and Estimated) both show a Table Scan of the #Lookup
I've asked why this was done and I received an answer i didnt think was correct, so I would like to understand the following
1) Is the SysId in the PK on the #lookup table pointless? As the col1Id is Unique by default so it doesnt need the additional column to make it unique.
2) Why doesnt SQL server use the PK on the join?
3) Would an Index hint on the join Increase the performance?
I cant give actual query plans or code, but would just love to understand the rational behind this design.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
August 6, 2012 at 5:39 pm
The optimizer is choosing to scan the lookup table because it thinks that is the fastest way for it to deliver the data. Consider that for each row in #Partitioned, there are 3 LEFT JOINs, meaning it would have to do thirty million index seek operations against the #Lookup table. I suspect it opted instead for a hash join. The execution plan would confirm this.
The SysID in the primary key of the #lookup table isn't necessarily pointless, but it really doesn't help this query, because the SysIDs are all spread out due to the fact that ColID is the primary column in the key. The index you described would only be truly helpful for queries that requested ColID and SysID only, because those queries could be answered without going to the leaf level of the index.
As for the index hint, try it and see what happens. 😉 Be sure to turn STATISTICS IO ON and get a baseline of the number of logical reads before you use the index hint.
Hope this helps.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 6, 2012 at 6:09 pm
Jason-299789 (8/6/2012)
1) Is the SysId in the PK on the #lookup table pointless? As the col1Id is Unique by default so it doesnt need the additional column to make it unique.
From your description, yes... but so is including it as a PK component on #Partitioned, since that's ALSO an identity. As part of the clustered index however it will help if you have a more restrictive where clause that only uses certain sysIDs. This may be a case of overusage.
2) Why doesnt SQL server use the PK on the join?
Depends, is the PK also the clustered? It'll also depend on column ordering and the like. Without the execution plan (which I realize you can't provide) pretty much everything we can offer would be a guess.
3) Would an Index hint on the join Increase the performance?
For a specific use case, possibly. Generically, not usually. They're the last-ditch effort to try to nail down what's wrong in the design so you can eventually remove the hints anyway and let the optimizer run properly.
In general, when you're pulling that many rows back, you need to evaluate the design in general rather than what the process is doing. The purpose of it, the expectations, etc etc. Without more details though, it's hard to help optimize a process that could have over a hundred different tweaks that are causing you more headaches than expected.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 7, 2012 at 12:34 am
Thank you to both of the replies so far its confirmed a few things, and given me a few things to ponder.
The PK is Non-clustered on the #Lookup table, and looking at the table there is no clustered Index at all, unfortunately id assumed the PK (or secondary index) was clustered, which probably explains the table scan. Id like to see what happens if you make the PK clustered.
I did some 'off-record' testing by dropping the SysId from the joins inthe actual query, and over a couple of runs there was an approx 5% performance improvement.
Also in the actual query the #Lookup table is referenced around 8 times. At the moment the query returns around 25% of the data in a little over 13-14 minutes.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
August 7, 2012 at 1:49 pm
Jason-299789 (8/7/2012)
Thank you to both of the replies so far its confirmed a few things, and given me a few things to ponder.The PK is Non-clustered on the #Lookup table, and looking at the table there is no clustered Index at all, unfortunately id assumed the PK (or secondary index) was clustered, which probably explains the table scan. Id like to see what happens if you make the PK clustered.
I did some 'off-record' testing by dropping the SysId from the joins inthe actual query, and over a couple of runs there was an approx 5% performance improvement.
Also in the actual query the #Lookup table is referenced around 8 times. At the moment the query returns around 25% of the data in a little over 13-14 minutes.
Assuming the optimizer thinks there is 25% of the table's rows coming back and that estimate is accurate you will almost certainly get scans in your case since you have no WHERE clause. The only thing that will help this is a) better hardware (especially RAM and IO) and b) not trying to return too much data to SSMS if that is what you are doing with your SELECT.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply