July 1, 2014 at 9:25 am
Hello,
I have been having a discussion with a colleague regarding different approaches to writing SQL. My colleague argues that it is more efficient to use derived tables rather than joining two tables say and filtering with a Where clause. For example, to query table of orders base on the columns values in linked orderline rows normal I would use this format:
select top
100 o.*
from
EcsPurOrder o
join ecspurextlink ol on o.id = ol.to_order
where
o.id in(5465,6855)
order by
o.id desc
However, using derived tables we could use this format:
select top
100 o.*
from
EcsPurOrder o
join
(select ol.to_order from ecspurextlink ol where ol.to_order in(5465,6855)) ol
on o.id = ol.to_order
order by
o.id desc
I would always argue in favour of the first method if only in terms or readability and assumed there couldn't be much difference, however, when checking the stats, the first method results in two scans;
Stats from normal join and where clause
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
vs
Stats using derived table sub query
Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So as I read this, it looks like the derived table generates one less table scan.
However, digging deeper, the execution plans look pretty much the same:
Plan using normal join table
https://drive.google.com/file/d/0B_g8mOkyPeZfRE83Nk1BbXFWajg/edit?usp=sharing
Plan using derived table
https://drive.google.com/file/d/0B_g8mOkyPeZfWjhtNVBobzNNU2c/edit?usp=sharing
Confused :crazy:
July 1, 2014 at 9:40 am
... My colleague argues that it is more efficient to use derived tables rather than joining two tables say and filtering with a Where clause. ...
You can argue to the death. in reallity it is not possible to generalise on this subject. Sometimes you will se better performance with first approach, sometimes with the second. It depends on many more factors other then using sub-query or not.
July 1, 2014 at 10:07 am
wilks (7/1/2014)
I would always argue in favour of the first method if only in terms or readability and assumed there couldn't be much difference, however, when checking the stats, the first method results in two scans;Stats from normal join and where clause
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
vs
Stats using derived table sub query
Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So as I read this, it looks like the derived table generates one less table scan.
Did you clean the buffer between both queries? The second one has a lot more logical reads, which mean the data is in cache.
The execution plans indicate that SQL Server engine is processing both queries the same way and the statistics indicate that the data is taken from different places (disk vs memory).
July 1, 2014 at 10:26 am
Scan count != number of table scans. Rather ignore the scan count, it's not consistent.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 1, 2014 at 10:40 am
the second query might have fewer scans, but it also has more reads, 75 vs. 12, so the first query appears to be doing less IO. Both are doing logical reads so neither is actually going to disk in this case.
The queries aren't exactly the same because the predicate is going against to_order in the second query and order.id in the first. This caused the optimizer to change the access order in the plan. In Query 1 the the Clustered Index Seek against EcsPurOrder is done first then passing those results to the Nest Loop join to get the information needs from ecspurextlink. Since the Clustered Index is also the PK, thus unique, it gets 2 rows so ecspurextlink is accessed 2 times. In the second query the index seek on ecspurextlink is done first which means it has to read more pages in because the loop is being executed once for each of the rows returned when seekign ecspurextlink. Since that isn't a unique index it could be thousands of rows.
I'd write the query you have this way because you aren't returning any data from ecspurextlink.
SELECT TOP 100
o.*
FROM
EcsPurOrder
WHERE
EXISTS ( SELECT
1
FROM
ecspurextlink ol
WHERE
o.id = ol.to_order ) AND
o.id IN (5465, 6855)
ORDER BY
o.id DESC
Logically you are only trying to return orders that have details and aren't returning any detail information. I bet it might perform even better.
Lastly, I would want to have my predicate be against the index that is unique so I'd never do what your colleague did. If I was doing a derived table I'd do it either like this:
SELECT TOP 100
o.*
FROM
EcsPurOrder o
JOIN (
SELECT
ol.to_order
FROM
ecspurextlink ol
) ol
ON o.id = ol.to_order
WHERE
o.id IN (5465, 6855)
ORDER BY
o.id DESC
or using cross apply like this:
SELECT TOP 100
o.*
FROM
EcsPurOrder o
CROSS APPLY (
SELECT
ol.to_order
FROM
ecspurextlink ol
WHERE
o.id = ol.to_order
) ol
WHERE
o.id IN (5465, 6855)
ORDER BY
o.id DESC
both of which will likely produce the same execution plan as your join operation.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 1, 2014 at 10:53 am
In the OP case, there are many ways to write the query to return the same results, eg. using temp tables for intermidiate results...
Again, it's impossible to generalise the best way based on such limitted given details. So many factors invloved, eg. indexes, table sizes, dataypes and so on and so on...
July 2, 2014 at 2:02 am
Thank you all for those excellent replies, I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.
I agree that it would make much more sense to do the seek using the PK on the Order table, but I was trying to make the DB do more work.
Crucially, as Jack pointed out, my two queries were using different fields in the where clause so I was comparing apples and pears. Having fixed that, both queries perform equally. Although the other suggestion by Jack using the cross apply seems to perform best, but scores less on the visibility front in my opinion, but that may just be familiarity.
DBCC DROPCLEANBUFFERS
select
o.*
from
EcsPurOrder o
join ecspurextlink ol on o.id = ol.to_order
where
ol.to_order in(5465,6855)
order by
o.id desc
(23 row(s) affected)
Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
select
o.*
from
EcsPurOrder o
join
(select ol.to_order from ecspurextlink ol where ol.to_order in(5465,6855)) ol
on o.id = ol.to_order
order by
o.id desc
(23 row(s) affected)
Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- using sub-query suggested by Jack - note returns different number of results
SELECT
o.*
FROM
EcsPurOrder o
WHERE
EXISTS ( SELECT
1
FROM
ecspurextlink ol
WHERE
o.id = ol.to_order ) AND
o.id IN (5465, 6855)
ORDER BY
o.id DESC
(2 row(s) affected)
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
-- using cross apply suggested by Jack
SELECT TOP 100
o.*
FROM
EcsPurOrder o
CROSS APPLY (
SELECT
ol.to_order
FROM
ecspurextlink ol
WHERE
o.id = ol.to_order
) ol
WHERE
o.id IN (5465, 6855)
ORDER BY
o.id DESC
(23 row(s) affected)
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Wilks
July 2, 2014 at 2:17 am
wilks (7/2/2014)
I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.
No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 2, 2014 at 3:39 am
GilaMonster (7/2/2014)
wilks (7/2/2014)
I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.
And average execution time. While execution time varies wildly due to resource contention, you still want to get an idea of how long it runs. Just measuring reads doesn't always tell you which query is running faster.
(and yeah, I know you know this Gail, just adding to your answer)
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 2, 2014 at 3:41 am
Back to your basic argument, I'm with you. Using the language in the standard mechanism by default is usually the better approach. There are any number of valid reasons to use derived tables, but using them in place of a perfectly straight-forward JOIN is not the way to go. Now, you really want to mess with your co-worker, ask them why not use CROSS APPLY?
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 2, 2014 at 7:29 am
It makes sense that the EXISTS query would return fewer rows. When using JOIN or CROSS APPLY you are going to get the order information for each order detail row. When using EXISTS you will only get the distinct order
information. I'd bet if you did SELECT DISTINCT with JOIN and CROSS APPLY queries you'd only get 2 rows as well.
I'd argue that the way the example query is written you really only want/need the 2 rows returned anyway.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 2, 2014 at 8:58 am
Yep, looks like a good answer to me.
Thanks!
Wilks:-D
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply