April 21, 2009 at 6:44 am
I have a view that selects data from several tables and also contains two columns that are sub queries of two additional tables. The sub queries are expensive for performance. When I select from the view but do not select the sub query columns it appears that the subqueries are executed due to the execution time, which is what I expected. However, when I compare the execution plans for selecting from the base tables and selecting from the view (minus the sub query columns) the execution plans are identical and do not show the sub query execution. However the run time for selecting from the view would suggest that the sub queries are being executed even though they do not show up in the execution plan.
My questions are:
Are sub queries in views always executed even if their respective columns are not selected?
If so, why does the sub query execution not show up in the execution plan for the query?
Are their any ways to turn off executing the sub query in a view if it is not selected?
Thanks in advance.
April 21, 2009 at 8:41 am
I would have to see the code to fully understand the issue. However, no, there's no way to selectively limit the query defined within a view. The query is going to run as you define it, including sub-selects even though you're not referencing those columns.
As far as identical query plans between the base tables and the view w/o the sub-query, again, I'd need to see the two queries that you're comparing, but if the query is different, you will get different execution plans.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 9:39 am
Thank you Grant for your reply and help!
Here is the setup...
/*VIEW DEFINITION*/
/*NOTE: VBaseAllUsers accesses the base tables directly*/
create view VAllUsers as
select
VBaseAllUsers.*,
(
select count(*)
from UserStudentAssoc
where UserStudentAssoc.UserID=VBaseAllUsers.iUser
and AssocType=1
and DomainID = 1
)
as iNumStudents,
(select count(*)
from UserStudentAssoc
where UserStudentAssoc.UserID=VBaseAllUsers.iUser
and AssocType>1 and DomainID = 1
)
as iNumTeamStudents,
(select
case
when sum(bOutOfCompliance) is null then 0
else sum(bOutOfCompliance)
end
from VStudents
where VStudents.iCaseManager=VBaseAllUsers.iUser
)
as iNumOutOfCompliance,
(select
case
when sum(bInCompliance) is null then 0
else sum(bInCompliance)
end
from VStudents
where VStudents.iCaseManager=VBaseAllUsers.iUser
)
as iNumInCompliance
from VBaseAllUsers
/*BASE TABLE QUERY*/
SELECT users.id,
fullname,
usercode,
usertypes.name,
workphone,
FROM users
JOIN usertypes
ON usertypes.id = users.usertype
WHERE users.id = 235
AND users.inactive = 0
/*VIEW QUERY, NOT REFERENCING SUB QUERIES IN VIEW, RESULTS SAME AS ABOVE QUERY*/
SELECT iuser,
fullname,
usercode,
usertype,
workphone,
FROM vallusers
WHERE iuser = 235
AND inactive = 0
Both the base table query and the view query return the same result and have identical execution plans. The view query is much slower than the base table query. When I explicitly select the sub query columns from the view the processing is then displayed in the execution plan but the execution time is roughly the same as when the sub query columns are not selected.
It would appear that, like you said, all parts of the query are being executed even when the sub query columns are not referenced but the sub query part of the execution plan is only being displayed if the sub query columns are referenced explicitly. There also appears to be a slight amount of overhead to actually displaying the sub query section of the execution plan as the execution time is slightly longer than that that does not reference the sub query columns. They are both far slower than the query of the base tables.
If the sub queries are being executed regardless of whether they are referenced in the select, why does the execution plan not display their contribution to the execution if the columns are not explicitly referenced? It appears that time is being consumed in running the sub queries.
I have uploaded screen captures of the execution plans for:
Execution plan of the view query not referencing the subqueries
Execution plan of the base tables query
Execution plan of the view query with explicit reference to the sub query columns.
Thanks in advance for you help!
April 21, 2009 at 12:20 pm
I just did some tests on my own. I'm getting the same results you are. I never really tried this before, breaking a view down in this way. I'm seeing two different execution plans for two different queries against a view. I'm a little surprised that SQL Server is that smart. This could be a good blog post or a short article... anyway...
I'm also seeing two different query execution times. When I run the sample query and select the column that includes a sub-select similar to yours, I'm getting more scans & reads than I get when I run the sub-select without it. It also seems to run faster (although I'd need better test data than I'm currently working with to confirm this). Can you get the STATISTICS IO from the two queries against the view? I'll keep working with this a bit on my end.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 12:43 pm
Hi Grant, Thanks for having a look at this! It would appear that a lot more is happening when explicitly selecting the sub query columns.The statistics IO output...
STATISTICS IO for query of view not referencing sub squery columns
Table 'UserTypes'. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 0, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
STATISTICS IO for query of view that explicitly references sub squery columns
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Students'. Scan count 2, logical reads 4, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UserStudentAssoc'. Scan count 2, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UserTypes'. Scan count 0, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 0, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Thanks again, I am very interested to hear your interpretation/thoughts.
April 21, 2009 at 12:46 pm
OK. I'm stupid.
Of course it gets different execution plans. What I said before about the view, that it runs as is, that's flat out wrong. Instead SQL Server unpacks the view, it breaks down the select statement against the view into a new select statement when you run it. So that's perfectly normal and, in fact, you should expect to see a performance difference between your two queries. I was wrong about that too.
That fact that you're not is the issue. I'd be curious if you ran a select against the view twice, one that included a column that is not a sub-select and a second column that is not a sub-select along with another query that includes a column this not a part of a sub-select and one that is. If possible make sure both queries return the same amount of data. I don't care about the execution plan in this case, all I want is STATISTICS IO and TIME.
I suspect that you'll see a difference although it may be small. If my theory is right, the query with two columns that are not part of a sub-select will be a little faster.
I think what's happening is that with all the sub-selects (not the best way to go about gathering data by the way) the unpacking process where SQL Server figures out what parts of the view it needs is taking a long time because of all the sub-selects. I'll have to set up an experiment here to test this theory, but you might be able to see it on your end.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 12:49 pm
Right. That IO is exactly what we should expect to see. Now we need to see the execution times... Actually, especially the compile time for the query.
Are you running this on a production system? If not, run DBCC FREEPROCCACHE prior to running the tests in order to be sure we'll get a compile on the queries. But, please, don't run that in production.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 1:10 pm
Just getting back to the thread, got caught in a webex...
I am running on my local dev box and am running dbcc dropcleanbuffers and dbcc freeproccache before each run. I'll run the queries you suggest and post the data in moment. Thanks.
April 21, 2009 at 1:39 pm
Grant, let me know if this is not what you are looking for.
AVERAGE EXECUTION TIME for query not referencing sub query columns
Time Statistics
Client processing time0.0000
Total execution time433.9000
Wait time on server replies433.9000
SQL Server parse and compile time:
CPU time = 156 ms, elapsed time = 381 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 27 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
AVERAGE EXECUTION TIME for query that explicitly references sub squery columns
Time Statistics
Client processing time1.5000
Total execution time515.1000
Wait time on server replies513.6000
SQL Server parse and compile time:
CPU time = 219 ms, elapsed time = 470 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 122 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Thanks!
April 21, 2009 at 2:07 pm
Grant, it looks like your theory is, on the surface, correct. The query that does not explicitly reference the sub query columns is consistently a little bit faster than the query that explicitly references the sub query columns. They are both, however, far slower than the query of the base tables of the view. It looks like this process of reformulating the select based on the select on the view is costly. How can one check this?
Thanks
April 21, 2009 at 3:50 pm
Yep, you've got the answer right there.
SQL Server parse and compile time:
CPU time = 156 ms, elapsed time = 381 ms.
That parse & compile time is pretty long. It's because of the unpacking process. The execution time is radically faster, 27ms vs 122ms. So it's the unpacking that is hurting you.
I'm sorry I was giving out bad information initially.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 7:35 pm
Grant, no problem. Thanks for your help in figuring this out. We have a lot of views that will need review based on this and I have a new method to do so.
April 22, 2009 at 8:21 am
After more thought it strikes me as this is almost a bug in that it would seem that the parser/normalizer should recognize early on that the sub query is not needed since it is not selected. This way it would not convert it to it's equivalent joins etc. prior to optimizing. This is a case where the time for optimizing and reformulating far exceeds the benefit of the optimization. I recognize that the sub queries are not the most efficient means to get the data but that is how the system I am on has been developed. Is there any way to influence the optimization and reformulation steps to improve this process.
Thanks in advance.
April 22, 2009 at 8:44 am
Well, the optimizer is going through a binding process on the query. It has to resolve everything involved before it begins the process of optimization. I suspect that's the step that's hurting you here. I'm still experimenting with it to replicate the behavior.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 22, 2009 at 8:47 am
Whoops. Hold on a second. These are all views. We're dealing with nested views right? Ah, that adds a pretty severe level of complexity. It's resolving all the objects on all the views and then optimizing. Nesting views is a way to mask complexity. This can be helpful for programming, but it seldom is for optimization.
I need to change my experiments and start again.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply