August 29, 2012 at 12:52 pm
I have the following query:
SELECT CE.ID ,
CA.CustomerID,
CA.PostalCode
FROM DBXN.CommunicationEntry CE
LEFT OUTER JOIN ( SELECT A.ID ,
CA.CustomerID ,
A.PostalCode,
A.EnglishAddressLine1
FROM DBXN.Customer_Address AS CA
INNER JOIN DBXN.Address AS A ON CA.AddressID = A.ID
) AS CA ON CE.CustomerID = CA.CustomerID
WHERE CE.ID = 6298103
When I run this I get very good results, there's an index seek and a key lookup on the Customer_Address table, and an index seek on the Address table - all with very low estimated number of rows (between 1-3). Each have seek predicates coming from the joined column.
However, the second I introduce this additional column:
SELECT CE.ID ,
CA.CustomerID,
CA.PostalCode,
CA.EnglishAddressLine1
FROM DBXN.CommunicationEntry CE
LEFT OUTER JOIN ( SELECT A.ID ,
CA.CustomerID ,
A.PostalCode,
A.EnglishAddressLine1
FROM DBXN.Customer_Address AS CA
INNER JOIN DBXN.Address AS A ON CA.AddressID = A.ID
) AS CA ON CE.CustomerID = CA.CustomerID
WHERE CE.ID = 6298103
...the query plan changes to a full index scan where the estimated number of rows includes every record in the table, and there are no predicates whatsoever used until further up stream when it hits a filter. By that time there's an estimated 82.5 million rows with an estimated data size of 21GB.
The EnglishAddressLine1 column is a computed column that takes several of the other fields in the Address table and builds a formatted address, there are several others like it for Line2/Line3, etc - but this is the simplest illustration of the problem.
Just wondering if anyone has any insight on why a computed column coming from a left outer join would cause this?
August 29, 2012 at 3:18 pm
Please post plans for both of the queries.
August 30, 2012 at 10:46 am
Hey Alex,
I've added the .sqlplan files to the original post... Thanks for your response!
August 30, 2012 at 1:04 pm
Just a guess but you might consider looking at your indexes. You probably have an index that covers CE.ID, CA.CustomerID and CA.PostalCode but doesn't include CA.EnglishAddressLine1.
I did a quick test and I was able to include a computed column in an index. Both as part of the index and as an included column.
Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]
August 30, 2012 at 1:37 pm
Kenneth.Fisher (8/30/2012)
Just a guess but you might consider looking at your indexes. You probably have an index that covers CE.ID, CA.CustomerID and CA.PostalCode but doesn't include CA.EnglishAddressLine1.I did a quick test and I was able to include a computed column in an index. Both as part of the index and as an included column.
Unfortunately it's not the case here. The function is referencing columns from other tables.
August 30, 2012 at 1:55 pm
Kevin,
Can you post the schema and indexes for the tables CommunicationEntry, Customer_Address, and Address please?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 30, 2012 at 2:17 pm
It's also only one of twelve similar columns on the same table, and that table currently sits at 19.7 million rows. Even if we could index each of them, I'm not sure we would. This is a db in development at the moment, and it would probably make more sense to go another route - like calling the address formatting function on the A.ID on the left side of the join, or elsewhere.
What I'm really interested in though, is determining why the join predicate gets applied so differently in each of the scenarios.
August 30, 2012 at 2:50 pm
Hey Craig,
I'm not sure I could post them in a useful format, if you're looking to try and re-create the actual problem. There ends up being quite a few foreign key relations through those three tables.
I'll check with the powers that be, and see how much I can post here.
August 30, 2012 at 3:02 pm
There are more tables in the query than those 3.
To be able to re-create the issue these tables are needed as well:
[DBXN].StreetNumberSuffixLookup
[DBXN].StreetTypeLookup
[DBXN].StreetDirectionLookup
[DBXN].RouteServiceTypeLookup
[DBXN].DeliveryInstallationTypeLookup
[DBXN].CountryLookup
[DBXN].ProvinceLookup
August 30, 2012 at 3:06 pm
Kevin Dahl (8/30/2012)
Hey Craig,I'm not sure I could post them in a useful format, if you're looking to try and re-create the actual problem. There ends up being quite a few foreign key relations through those three tables.
I'll check with the powers that be, and see how much I can post here.
I'm not looking to recreate the issue, I won't have your data disparity and other assundry bits to do that. What I'm looking to do is be able to analyze the indexes available, what's being ignored and what else it could use, things like that. With an understanding of the clustered vs. non-clustered we can offer you an alternative version of the same logic that will work with what you've already got built... hopefully.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 31, 2012 at 10:54 am
Hey Craig/Alex,
I'm still waiting to hear back (from management) on how much of the schema I am able to post here.
I'm actually not so much interested in the particulars of improving this specific query though, as I can easily do that with a few slight modifications - selecting just the Address.ID field from the left outer join and specifically calling the format function in the result set, or specifying the CustomerID in for the CommunicationEntry in the WHERE clause, for instance, both avoid the issue...
What I'm really trying to get my head around is why this behavior would occur in the first place though. It seems to be very specific to when computed columns (or OUR specific computed columns) are coming from the left outer join and the field being joined on is not specified in the WHERE clause.
This Address table is fairly pivotal to our app, and if the computed columns are going to be a problem we may have to re-evaluate that particular decision and look at other options.
August 31, 2012 at 3:18 pm
Kevin Dahl (8/29/2012)
I have the following query:
SELECT CE.ID ,
CA.CustomerID,
CA.PostalCode
FROM DBXN.CommunicationEntry CE
LEFT OUTER JOIN ( SELECT A.ID ,
CA.CustomerID ,
A.PostalCode,
A.EnglishAddressLine1
FROM DBXN.Customer_Address AS CA
INNER JOIN DBXN.Address AS A ON CA.AddressID = A.ID
) AS CA ON CE.CustomerID = CA.CustomerID
WHERE CE.ID = 6298103
When I run this I get very good results, there's an index seek and a key lookup on the Customer_Address table, and an index seek on the Address table - all with very low estimated number of rows (between 1-3). Each have seek predicates coming from the joined column.
However, the second I introduce this additional column:
SELECT CE.ID ,
CA.CustomerID,
CA.PostalCode,
CA.EnglishAddressLine1
FROM DBXN.CommunicationEntry CE
LEFT OUTER JOIN ( SELECT A.ID ,
CA.CustomerID ,
A.PostalCode,
A.EnglishAddressLine1
FROM DBXN.Customer_Address AS CA
INNER JOIN DBXN.Address AS A ON CA.AddressID = A.ID
) AS CA ON CE.CustomerID = CA.CustomerID
WHERE CE.ID = 6298103
...the query plan changes to a full index scan where the estimated number of rows includes every record in the table, and there are no predicates whatsoever used until further up stream when it hits a filter. By that time there's an estimated 82.5 million rows with an estimated data size of 21GB.
The EnglishAddressLine1 column is a computed column that takes several of the other fields in the Address table and builds a formatted address, there are several others like it for Line2/Line3, etc - but this is the simplest illustration of the problem.
Just wondering if anyone has any insight on why a computed column coming from a left outer join would cause this?
I looked at both execution plans, and it looks like the definition of the computed column DBXN.Address.EnglishAddressLine1 calls one or more functions in the process of assembling the value. This can cause all kinds of strangeness as the optimizer tries to tease out an execution plan from multiple nested function calls. After trying for a bit, the optimizer will stop looking for a better plan and go with a "good enough" plan, where "good enough" means "delivers the correct results even if it takes 12 hours to do so."
The short answer as to why the first query runs so well is that the optimizer is smart enough to see that the column EnglishAddressLine1 will not appear in the final result set and is not required for any intermediate steps. Since that column is not needed, the optimizer can just skip the work that would be required to create it.
When you include the EnglishAddressLine1 column in the outer SELECT statement so that it will appear in the final result set, the optimizer has to come up with an execution plan that will create it. In trying to build this plan, the optimizer will only try for so long to untangle the nested function calls and subqueries to come up with an efficient execution plan until it falls back on a "good enough" plan that will definitely deliver correct results. SQL Server can almost always deliver correct results with table/clustered index scans.
So that's *why* you get the performance problem with that computed column. Now, what should you do about it?
DBXN.Address calls scalar functions to create the EnglishAddressLine1 computed column. Scalar functions can cause tremendous performance problems. I didn't see anything in your execution plan that couldn't be done at run time with a well-written inline table-valued function that would perform like a dream or even a view if all you are doing is concatenating columns from the same row with no need to pass in any parameters to influence the output.
Oh, and by the way, much of the definitions of your functions can be derived from the execution plan. You may be disclosing information you do not want or intend to disclose by posting them on the forum.
Jason Wolfkill
September 4, 2012 at 3:03 pm
Thanks wolfkillj,
That's essentially what I was looking for with regards to the optimizer determining the best query plan. I've always been wary of scalar functions, but these particular ones always seemed to perform well in all our performance testing to date. At least now I'll know what to keep an eye out for.
Also, I was aware of the function definitions in the .sqlplan, I had reviewed what was there (and changed a few names) before I posted it. It's nothing particularly proprietary, and the minimal I felt I could post to exemplify the issue.
Thanks again for your explanation 🙂
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply