January 19, 2015 at 5:57 am
Hi,
I have a simple query that joins a largeish fact table (3 million rows) to a view that returns 120 rows. The SKEY in the view is returned via a scalar function. The view returns instantly if queried on it's own however when joined to the fact table in the simple query below results in a query execution plan that runs forever. Interestingly if I change the INNER JOIN to a LEFT OUTER JOIN the query returns the matched results almost instantly.
Select
Dimension.Age_Band.[10_Year_Age_Band],
Count(*)
From
Fact.APC_Episodes
Inner Join Dimension.Age_Band ON
Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY
Group By
Dimension.Age_Band.[10_Year_Age_Band]
I know joining to a view using a column generated by a scalar function is not a good recipe for performance. I also know that I could fix this by populating a physical table with the view first as I have already tested this though I hoping not to have to go down that route.
Does anyone have any ideas why a LEFT OUTER JOIN works and not an INNER JOIN or anyway I can get the query optimizer to generate an execution plan that works?
Thanks
Daniel Forrester
January 19, 2015 at 7:15 am
Hi Dan, have a look at the execution plan, the leftmost operator, SELECT. Rightclick to raise the properties tab. Look for something like "Reason for early termination..." and post the value string. Better still, post the actual execution plan as a .sqlplan attachment.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 19, 2015 at 7:26 am
How long does it take to get a return if you run the following, making the appropriate change in the FROM clause, of course?
SELECT TOP 10 *
FROM yourviewname
;
Shifting gears back to looking at the code... does the Age_Band.AGE_BAND_SKEY column contain ONLY unique values?
And, yes. Please attach the execution plan as a file along with the necessary DDL. Please see the second link under "Helpful Links" in my signature line below for what we need and how we need it to help you troubleshoot a performance problem. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 19, 2015 at 8:16 am
Hi Guys,
I found a work around which seems to trick the query optimizer into working properly by comparing the Age_Band View to other dimensions that don't seem to cause the problem. I have found that if I introduce a Union statement that determines the type of the columns (in my example AGE_BAND_SKEY) within the view/sub query it works and the results are returned instantly. In the example below if I comment out the first statement in the sub query (Select 0 As AGE_BAND_SKEY) this query never finished, put in the union and instant response.
Select
Count(*)
From
Fact.APC_Episodes
Inner Join (
Select
0 As AGE_BAND_SKEY
Where
1=2
Union All
Select
dbo.fn_Generate_SKEY(Age_Band.Age,Age_Band.Valid_From) As AGE_BAND_SKEY
From
NHSH_Reference.Dimension.Age_Band With (NoLock)
) As Age_Band ON
Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY
I have attached the plans for your interest however I don't want to spend too much time on this. The reason for the views is because I am trying to rapidly prototype a data mart based on existing data and therefore reduce the time spent on the ETL development. I'm sure I'll have to do this properly at some point.
Thanks for your interest.
Daniel
January 19, 2015 at 8:30 am
Daniel Forrester 123 (1/19/2015)
I'm sure I'll have to do this properly at some point.
Why not now? The logic and code is fresh in your mind. It almost always takes less time to do it correctly the first time instead of slamming out something that works today but you have to go back and fix at some point.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 19, 2015 at 8:41 am
I wonder if the reason is the Scalar Function, which seems to be generating an on the fly surrogate key, is there a reason why the Age band view isn't materialised as a Physical table?
Any chance you can post the scalar so it can be evaluated.
Also the WITH(NOLOCK) Hint could do with being removed as you can get dirty reads and so inconsistent data, possibly not as much of an issue on a static dataset, as it would be on an OLTP system.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
January 19, 2015 at 8:49 am
Daniel Forrester 123 (1/19/2015)
Hi Guys,I found a work around which seems to trick the query optimizer into working properly by comparing the Age_Band View to other dimensions that don't seem to cause the problem. I have found that if I introduce a Union statement that determines the type of the columns (in my example AGE_BAND_SKEY) within the view/sub query it works and the results are returned instantly. In the example below if I comment out the first statement in the sub query (Select 0 As AGE_BAND_SKEY) this query never finished, put in the union and instant response.
Select
Count(*)
From
Fact.APC_Episodes
Inner Join (
Select
0 As AGE_BAND_SKEY
Where
1=2
Union All
Select
dbo.fn_Generate_SKEY(Age_Band.Age,Age_Band.Valid_From) As AGE_BAND_SKEY
From
NHSH_Reference.Dimension.Age_Band With (NoLock)
) As Age_Band ON
Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY
I have attached the plans for your interest however I don't want to spend too much time on this. The reason for the views is because I am trying to rapidly prototype a data mart based on existing data and therefore reduce the time spent on the ETL development. I'm sure I'll have to do this properly at some point.
Thanks for your interest.
Daniel
Thanks, Daniel, but I don't see any of the plans attached. It would be fun to look this one over because the optimizer has obviously been tricked into performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 19, 2015 at 8:54 am
Jason-299789 (1/19/2015)
Also the WITH(NOLOCK) Hint could do with being removed as you can get dirty reads and so inconsistent data, possibly not as much of an issue on a static dataset, as it would be on an OLTP system.
Of course if the data is static it is highly unlikely to have locks anyway. π I agree 100% about removing that hint unless actually needed (and all too often it is just no fully understood).
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 19, 2015 at 1:16 pm
The execution plan is your bestest buddy in situations like this.
Joining to views is a common code smell in T-SQL. The language just doesn't lend itself to code reuse. While it looks like it should be great to write a series of JOINs once into a view and then use them over and over, it's actually a poor choice.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 19, 2015 at 3:58 pm
You could also defer decoding/lookup until after the counts are computed. This might make the final query more "natural-looking":
Select
age.[10_Year_Age_Band],
epi.count
From (
Select
AGE_BAND_SKEY, Count(*) AS epi_count
From
Fact.APC_Episodes
Group By
AGE_BAND_SKEY
) AS epi
Inner Join NHSH_Reference.Dimension.Age_Band age With (NoLock) ON
age.AGE_BAND_SKEY = epi.AGE_BAND_SKEY
Order by
[10_Year_Age_Band]
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 20, 2015 at 6:31 am
Hopefully the plans have attached successfully this time. Like I said before the use of views is to enable me to quickly prototype the dimensions for a data mart to support the proposed reporting requirements. With the specifications being pretty slack the views enable me to make changes and see the results straight away. It is not my recommended final solution!
Regards
Daniel
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply