September 7, 2016 at 12:52 pm
I have a view that has 6 datasets created within it. Each subsequent dataset joins on previous datasets to filter out the desired results. The very first dataset gets dates from an xref table that get passed down all the way to final select statement in the view. If I hard code a date in the first data set and then run select * from the_view, it runs under 30 seconds. If I run the view as select * from the_view where date='xx-xx-xxxx' then it takes over an hour. I cannot change the view to be a stored proc or function with date passed as parameter because the end users are only granted permissions on views. See the pseudo-code of the view below:
WITH Date_Dataset ( Effective_Dt )
AS ( SELECT DISTINCT
Effective_Dt
FROM Date_Xref
),
TcklrTypeDataset
AS ( SELECT A,B,C
FROM Tickler_Table
INNER JOIN Date_Dataset ON Tickler_Table.Date = Load_ID_Dataset.Date
INNER JOIN dbo.tbl_Dim_Tcklr_Type tcklrtype ON tcklrtype.Tcklr_Type_Key = tcklr.Tcklr_Type_Key
WHERE tcklrtype.Tcklr_Type_Cd IN ( '062', '063', '064', '065' )
),
Dataset2 AS (select x,y,z from Main_Table inner join TcklrTypeDataset)
select * from Dataset2
So when user queries the view as select * from the_View where Date = 'xx-xx-xxxx', it runs for more than an hour. I need to be able to streamline this view so I don't have to hard code a date in first dataset as users query for different dates.
September 7, 2016 at 1:17 pm
There's no way anyone can help you from what you posted. You'll need to follow the indications in this article to get actual help: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
It's not that we don't want to help, it's just that we can't see the problem.
September 8, 2016 at 1:09 am
There's no way to pass parameters to a view. Either the users apply the date filter when they query the view, or you'll need to change to a procedure or in-line table valued function
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 8, 2016 at 2:05 am
farazkhan1981 (9/7/2016)
I have a view that has 6 datasets created within it. ...
Those aren't datasets, they are CTE's, which are somewhat like views in that they are more or less substituted into the body of the query at run time rather than creating intermediate result sets. Chaining CTE's in this manner masks complexity - often the reason for choosing CTE's in the first place - and most devs know from experience that this may result in extremely poor performance as the optimiser is unable to guarantee that a good plan will be picked from the endless possibilities, and times out. Can you rewrite the code without chaining so many CTE's?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 8, 2016 at 6:43 am
There is a potential problem right up at the front. The codes starts with a "Date_Dataset" cte that does a SELECT DISTINCT to produce a list of values. The optimizer doesn't know what values might come out of that select distinct, and so has no statistics to help it find a more efficient execution plan. This forces the optimizer to take a most conservative approach. Try putting those results in a #distinct table indexed on the date column. Then join to that table rather than to the cte.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 8, 2016 at 8:58 am
The Dixie Flatline (9/8/2016)
There is a potential problem right up at the front. The codes starts with a "Date_Dataset" cte that does a SELECT DISTINCT to produce a list of values. The optimizer doesn't know what values might come out of that select distinct, and so has no statistics to help it find a more efficient execution plan. This forces the optimizer to take a most conservative approach. Try putting those results in a #distinct table indexed on the date column. Then join to that table rather than to the cte.
Actually, I think that section should be rewritten as a WHERE EXISTS. He's only pulling one field from that table, and he's using it for an inner join, so he can use the other side of the join in subsequent steps.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 8, 2016 at 11:02 am
You're right, Drew. If he's just using it to filter by, there is no need for the join.
In fact, if the original Date_Xref is indexed on date, unique or not, an EXISTS should produce the right result without having to do the select distinct at all.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 8, 2016 at 11:53 am
As others have said, CTE's are just for easy formatting, they're all run together at execution. What you're effectively running is this...
select *
from (select x,y,z
from Main_Table
inner join ( SELECT A,B,C
FROM Tickler_Table
INNER JOIN ( SELECT DISTINCT Effective_Dt FROM Date_Xref) ON Tickler_Table.Date = Load_ID_Dataset.Date
INNER JOIN dbo.tbl_Dim_Tcklr_Type tcklrtype ON tcklrtype.Tcklr_Type_Key = tcklr.Tcklr_Type_Key
WHERE tcklrtype.Tcklr_Type_Cd IN ( '062', '063', '064', '065' )
) -- missing a join condition here
) as alias
Those subselects are making it diffcult for SQL to pick a good plan. I'd write it as a single select, nothing too complex here.
The difference in performance will be because with the hardcoded date, the where is in the ( SELECT DISTINCT Effective_Dt FROM Date_Xref) part.
When run selecting from the view, it's at the end of the query.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply