Derived tables vs Normal table joins

Question

Derived tables vs Normal table joins

wilks

Old Hand

Points: 325
More actions
July 1, 2014 at 9:25 am

#308686

Hello,
I have been having a discussion with a colleague regarding different approaches to writing SQL. My colleague argues that it is more efficient to use derived tables rather than joining two tables say and filtering with a Where clause. For example, to query table of orders base on the columns values in linked orderline rows normal I would use this format:
select top
100 o.*
from
EcsPurOrder o
join ecspurextlink ol on o.id = ol.to_order
where
o.id in(5465,6855)
order by
o.id desc
However, using derived tables we could use this format:
select top
100 o.*
from
EcsPurOrder o
join
(select ol.to_order from ecspurextlink ol where ol.to_order in(5465,6855)) ol
on o.id = ol.to_order
order by
o.id desc
I would always argue in favour of the first method if only in terms or readability and assumed there couldn't be much difference, however, when checking the stats, the first method results in two scans;
Stats from normal join and where clause
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
vs
Stats using derived table sub query
Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So as I read this, it looks like the derived table generates one less table scan.
However, digging deeper, the execution plans look pretty much the same:
Plan using normal join table
https://drive.google.com/file/d/0B_g8mOkyPeZfRE83Nk1BbXFWajg/edit?usp=sharing
Plan using derived table
https://drive.google.com/file/d/0B_g8mOkyPeZfWjhtNVBobzNNU2c/edit?usp=sharing
Confused :crazy:

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

... My colleague argues that it is more efficient to use derived tables rather than joining two tables say and filtering with a Where clause. ...

You can argue to the death. in reallity it is not possible to generalise on this subject. Sometimes you will se better performance with first approach, sometimes with the second. It depends on many more factors other then using sub-query or not.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

wilks (7/1/2014)
I would always argue in favour of the first method if only in terms or readability and assumed there couldn't be much difference, however, when checking the stats, the first method results in two scans;
Stats from normal join and where clause
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
vs
Stats using derived table sub query
Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So as I read this, it looks like the derived table generates one less table scan.

Did you clean the buffer between both queries? The second one has a lot more logical reads, which mean the data is in cache.

The execution plans indicate that SQL Server engine is processing both queries the same way and the statistics indicate that the data is taken from different places (disk vs memory).

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Scan count != number of table scans. Rather ignore the scan count, it's not consistent.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jack Corbett SSC Guru Points: 184397 More actions · Answer 4

the second query might have fewer scans, but it also has more reads, 75 vs. 12, so the first query appears to be doing less IO. Both are doing logical reads so neither is actually going to disk in this case.

The queries aren't exactly the same because the predicate is going against to_order in the second query and order.id in the first. This caused the optimizer to change the access order in the plan. In Query 1 the the Clustered Index Seek against EcsPurOrder is done first then passing those results to the Nest Loop join to get the information needs from ecspurextlink. Since the Clustered Index is also the PK, thus unique, it gets 2 rows so ecspurextlink is accessed 2 times. In the second query the index seek on ecspurextlink is done first which means it has to read more pages in because the loop is being executed once for each of the rows returned when seekign ecspurextlink. Since that isn't a unique index it could be thousands of rows.

I'd write the query you have this way because you aren't returning any data from ecspurextlink.

SELECT TOP 100

o.*

FROM

EcsPurOrder

WHERE

EXISTS ( SELECT

1

FROM

ecspurextlink ol

WHERE

o.id = ol.to_order ) AND

o.id IN (5465, 6855)

ORDER BY

o.id DESC

Logically you are only trying to return orders that have details and aren't returning any detail information. I bet it might perform even better.

Lastly, I would want to have my predicate be against the index that is unique so I'd never do what your colleague did. If I was doing a derived table I'd do it either like this:

SELECT TOP 100

o.*

FROM

EcsPurOrder o

JOIN (

SELECT

ol.to_order

FROM

ecspurextlink ol

) ol

ON o.id = ol.to_order

WHERE

o.id IN (5465, 6855)

ORDER BY

o.id DESC

or using cross apply like this:

SELECT TOP 100

o.*

FROM

EcsPurOrder o

CROSS APPLY (

SELECT

ol.to_order

FROM

ecspurextlink ol

WHERE

o.id = ol.to_order

) ol

WHERE

o.id IN (5465, 6855)

ORDER BY

o.id DESC

both of which will likely produce the same execution plan as your join operation.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 5

In the OP case, there are many ways to write the query to return the same results, eg. using temp tables for intermidiate results...

Again, it's impossible to generalise the best way based on such limitted given details. So many factors invloved, eg. indexes, table sizes, dataypes and so on and so on...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

wilks Old Hand Points: 325 More actions · Answer 6

Thank you all for those excellent replies, I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.

I agree that it would make much more sense to do the seek using the PK on the Order table, but I was trying to make the DB do more work.

Crucially, as Jack pointed out, my two queries were using different fields in the where clause so I was comparing apples and pears. Having fixed that, both queries perform equally. Although the other suggestion by Jack using the cross apply seems to perform best, but scores less on the visibility front in my opinion, but that may just be familiarity.

DBCC DROPCLEANBUFFERS

select

o.*

from

EcsPurOrder o

join ecspurextlink ol on o.id = ol.to_order

where

ol.to_order in(5465,6855)

order by

o.id desc

(23 row(s) affected)

Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

select

o.*

from

EcsPurOrder o

join

(select ol.to_order from ecspurextlink ol where ol.to_order in(5465,6855)) ol

on o.id = ol.to_order

order by

o.id desc

(23 row(s) affected)

Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- using sub-query suggested by Jack - note returns different number of results

SELECT

o.*

FROM

EcsPurOrder o

WHERE

EXISTS ( SELECT

1

FROM

ecspurextlink ol

WHERE

o.id = ol.to_order ) AND

o.id IN (5465, 6855)

ORDER BY

o.id DESC

(2 row(s) affected)

Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

-- using cross apply suggested by Jack

SELECT TOP 100

o.*

FROM

EcsPurOrder o

CROSS APPLY (

SELECT

ol.to_order

FROM

ecspurextlink ol

WHERE

o.id = ol.to_order

) ol

WHERE

o.id IN (5465, 6855)

ORDER BY

o.id DESC

(23 row(s) affected)

Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Wilks

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

wilks (7/2/2014)
I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.

No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 8

GilaMonster (7/2/2014)
wilks (7/2/2014)
I had misunderstood the scan count from IO statistics, I see now it can refer to an index or table scan.
No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.

And average execution time. While execution time varies wildly due to resource contention, you still want to get an idea of how long it runs. Just measuring reads doesn't always tell you which query is running faster.

(and yeah, I know you know this Gail, just adding to your answer)

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 9

Back to your basic argument, I'm with you. Using the language in the standard mechanism by default is usually the better approach. There are any number of valid reasons to use derived tables, but using them in place of a perfectly straight-forward JOIN is not the way to go. Now, you really want to mess with your co-worker, ask them why not use CROSS APPLY?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jack Corbett SSC Guru Points: 184397 More actions · Answer 10

It makes sense that the EXISTS query would return fewer rows. When using JOIN or CROSS APPLY you are going to get the order information for each order detail row. When using EXISTS you will only get the distinct order

information. I'd bet if you did SELECT DISTINCT with JOIN and CROSS APPLY queries you'd only get 2 rows as well.

I'd argue that the way the example query is written you really only want/need the 2 rows returned anyway.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

wilks Old Hand Points: 325 More actions · Answer 11

Yep, looks like a good answer to me.

Thanks!

Wilks:-D