August 22, 2011 at 10:18 pm
Hello, I have a view that basically joins two tables of > 1million rows.
Example Only:
create VIEW TESTVIEW as select a.*, b.* from table1 a join table2 on a.ID = b.ID
clustered index on the ID field
When I do Select * from TESTVIEW where ID = 123 it takes ~44 seconds.
When I just run the ad-hoc T-SQL code in the VIEW it take < 1 second.
Looking at the execution plans I see the VIEW has two high % steps at the beginning using a clustered index scan on both tables and returning a large number of rows before filtering in one of the final steps.
The ad-hoc query used a clustered index seek at the beginning and only returns one row at this point.
So to me anyway it looks like the VIEW returns ALL rows (hence the index scan) first and then does the WHERE clause and returns one row where as the ad-hoc T_SQL does the filter early on??
Is this normal and can this be prevented so the view runs with the same or similar execution plan as the ad-hoc code as I thought a view might have the intelligence to work out the WHERE clause is selective so process that first...but obviously not.
I know a view is not necessarily going to be faster but I am surprised by the difference.
I would be interested to know the logic behind this if anybody know....thanks
August 23, 2011 at 1:49 am
It highly depends on the view definition vs T-SQL code you're running.
Can you post the actual code and actual execution plans?
However, a couple of pointers should apply anyway:
* Are you selecting the same number of columns?
* Are filters identical?
Hope this helps
Gianluca
-- Gianluca Sartori
August 23, 2011 at 2:06 am
I was copying the exact code from inside the view and just added the "WHERE ID = 123" bit at the end, compared to running the view "select * from TESTVIEW where ID = 123
The view code has a left join between two tables as below.
As mentioned if i add "and ID = 123" on the end of this and run it as a T-SQL statement it filters first and uses an index seek thus finishing in 1 second
If I run the view all rows are returned and then the WHERE filter is applied. And with a lot of fields and CASE statements etc being returned for every row it takes over 40 seconds.
thanks
SELECT [CUSTOMER_ID]
,[Name_Local]
,[Name_Eng]
,[TITLE]
,[FIRST_NAME]
,[MIDDLE_NAME]
,[LAST_NAME]
,[SALUTATION]
,[GENDER]
,[DOB]
,[Marital_Status]
,[BUILDING]
,[BLOCK]
,[STREET_NO]
,[STREET]
,[SUBURB]
,[CITY]
,[FLOOR]
,[POST_CODE]
,[STATE]
,[COUNTRY]
,[Nationality]
,[Occupation]
,[Annual_Income]
,
,[Home_Phone]
,[Work_Phone]
,[Cell_Phone]
,[FAX]
,[Enrolled_Store_ID]
,[Most_Visited_Store_ID]
,[First_Transaction_Date]
,[Privacy]
,[Customer_Membership]
, isnull([Member_Status],'1-Never Used Card') as [Member_Status]
,[FIRST_DATA_ENTRY]
,cast(floor(cast([FIRST_DATA_ENTRY_DATE] as float)) as datetime) [FIRST_DATA_ENTRY_DATE]
,[FIRST_DATA_ENTRY_BY]
,[Contactable_By_Mail]
,[Contactable_By_Email]
,[Contactable_By_SMS]
,[Contactable_By_Phone]
,[Total_Spend_at_Enrolled]
,[Total_TXN_at_Enrolled]
,[Total_Spend_at_Other]
,[Total_TXN_at_Other]
,[Last_Transaction_Date]
,[Last_Transaction_Store_ID]
,[Last_Customer_Level]
,[Last_YTDSpend]
,[Last_Cardnumber]
,[Last_ISO_Cardnumber]
,[Last_Name_On_Card]
,[Last_Current_Points_Balance]
,[Last_Current_Savings_Balance]
,[Last_Current_Debit_Balance]
,[Total_Spend]
,rank()over(order by Total_Spend desc) as Rank_PTD_Spend
,[Total_Spend_Ex_GST]
,[Total_TXN]
,[Total_Credits]
,[Total_Points_Deducted]
,[Total_Savings_Added]
,[Total_Savings_Deducted]
,[Total_Debit_Added]
,[Total_Debit_Deducted]
,[Total_Visits]
,[Points_to_next_Voucher]
,[No_of_Interim_Awards_Issued]
,null as [Migration_flag]
,Enrol_Age
,[Enrol_Age_Group] = CASE
WHEN Enrol_Age <14 THEN '< 14 yrs'
WHEN Enrol_Age between 14 and 17 THEN '14 - 17 yrs'
WHEN Enrol_Age between 18 and 24 THEN '18 - 24 yrs'
WHEN Enrol_Age between 25 and 34 THEN '25 - 34 yrs'
WHEN Enrol_Age between 35 and 49 THEN '35 - 49 yrs'
WHEN Enrol_Age between 50 and 64 THEN '50 - 64 yrs'
WHEN Enrol_Age between 65 and 110 THEN '65 + yrs'
ELSE 'UNKNOWN'
END
,[Current_AGE]
,[CURRENT_AGE_GROUP] = case when [Current_AGE] is not null then
(CASE
WHEN [Current_AGE] <14 THEN '< 14 yrs'
WHEN [Current_AGE] between 14 and 17 THEN '14 - 17 yrs'
WHEN [Current_AGE] between 18 and 24 THEN '18 - 24 yrs'
WHEN [Current_AGE] between 25 and 34 THEN '25 - 34 yrs'
WHEN [Current_AGE] between 35 and 49 THEN '35 - 49 yrs'
WHEN [Current_AGE] between 50 and 64 THEN '50 - 64 yrs'
WHEN [Current_AGE] between 65 and 110 THEN '65 + yrs'
ELSE 'UNKNOWN'
END )
else 'UNKNOWN'
end
,[DNS]
,[DNS_DATE]
,[LifeStage_ID]
, Total_Days_since_Enrol
, Total_Days_Since_Last_Transaction
, Total_Days_Since_Last_visit
,[CUST_SUNDRY_INT_1]
,[CUST_SUNDRY_INT_2]
,isnull([CUST_SUNDRY_VARCHAR_1],0) [CUST_SUNDRY_VARCHAR_1]
,[CUST_SUNDRY_VARCHAR_2]
,[DELETED]
,C2.[Batch_Nbr]
, Batch_no
FROM dbo.[CUSTOMERS_DW] C1
LEFT JOIN dbo.[Customers_PTD_KPI] C2 ON C1.[CUSTOMER_ID] = C2.[CUSTOMER_ID]
where c1.deleted = 0
August 23, 2011 at 2:13 am
Can you attach the actual execution plans for both statements?
-- Gianluca Sartori
August 23, 2011 at 12:25 pm
Something's not right here... you should be getting an error about not being able to resolve the Customer_ID column because it lives in both tables and you don't have a table alias on it.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2011 at 1:13 pm
The last line on the ad-hoc statement (code inside the view run from SSMS) would normally be "where c1.deleted = 0 and C1.CUSTOMER_ID = 123" and when running the VIEW "Select * from TESTVIEW where customer_id = 123"
Unfortunately I do not know the technique or protocol to display two execution plans in this forum so they are viewable.
August 23, 2011 at 1:41 pm
I'm talking about the first line of the SELECT list.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2011 at 6:45 pm
it should be and was "Select C1.[CUSTOMER_ID]" -- a typo on my behalf here.
August 28, 2011 at 6:47 pm
uncle_boris (8/23/2011)
Unfortunately I do not know the technique or protocol to display two execution plans in this forum so they are viewable.
Run the code with the "Actual Execution Plan" turned on. Right click the execution plan and save it.
Start a new post here. Before you save the post, scroll down and find the "Edit Attachements" button.
Follow your nose from there. š
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2011 at 6:57 pm
Or... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 29, 2011 at 8:40 am
"When I do Select * from TESTVIEW where ID = 123 it takes ~44 seconds.
When I just run the ad-hoc T-SQL code in the VIEW it take < 1 second"
Boris, when you are selecting from the view, is the where inside the view, or outside? I would imagine that might alter the execution plan if the where was outside the view.
(I am positive someone here with better internals knowledge than I will correct me if I am wrong)
August 29, 2011 at 1:31 pm
Hello, thank you for all the replies.
I have found the issue and like a lot of problems the answer is simple and obvious.
The cause of the difference between running the VIEW and running the VIEW code ad-hoc is caused by the RANK function.
It looks like when running the ad-hoc code the WHEREclause is run first and then the RANK is done on a record set of one -- hence the < 1 second run time.
When running the view and passing in the WHERE clause the RANK is done on all the records (> 1000000) and then the WHERE clause retrieves the one row.
so the respective RANK values end up as:
Ad-hoc
Rank: 1
VIEW
Rank: 9548
even though the queries appears to be the same.
This makes sense to me anyway so an interesting lesson learnt.
Again, thank you for the replies.
August 30, 2011 at 3:21 am
Glad to hear you solved your problem!
You could have seen it clearly from the execution plans, by the way.
This is why it's so important to always include actual execution plans when posting performance problems.
Regards,
Gianluca
-- Gianluca Sartori
August 30, 2011 at 11:03 am
.
August 30, 2011 at 11:06 am
royce.bacon (8/30/2011)
Iām not positive on this but I believe what is happening is that when you execute SELECT * FROM TESTVIEW WHERE customer_id = 123 that it is first running the view which has no limits/filters in it therefore it returns all records and then the WHERE is applied to the results from the output of the view. When you execute the dynamic SQL with the WHERE in the SQL it knows to filter on the one customer id immediately.
SQL is not that stupid. The predicate applied to the view will be pushed as far down as possible. Depending on the view, that may be right down to the tables, or it may not, but it depends on what the view is defined as.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply