Sub-Query or Left-Outer Join (Performance Wise)

Question

Sub-Query or Left-Outer Join (Performance Wise)

Abdullah M. Al-Fararjeh-343215

SSC Enthusiast

Points: 185
More actions
June 22, 2009 at 7:20 am

#211202

Which one is Better and why ?
Use SQL subselects to consolidate queries on multiple tables into a single statement
Or
Use Left Outer Join
Example:
Table Employee(Employee_ID,Employee_Name,Department_ID)
Table Department(Department_ID,Department_Name)
•Table Employee has million records, table Department has Few Records
•Some Of the employees does not have department
Which Query is better for performance?
Select
Employee_ID,
Employee_Name,
(Select Department_Name From Department Where Department .Department_ID = Employee. Department_ID) as Department_Name
From Employee
OR
Select
Employee_ID,
Employee_Name,
Department_Name
From Employee
Left Outer Join
Department
On Department .Department_ID = Employee. Department_ID

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398910 More actions · Answer 1

There are always exceptions, but in general, you're better off using the LEFT JOIN. When in doubt, write the query both ways and look at the execution plans.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DCPeterson SSCoach Points: 15266 More actions · Answer 2

In this case I'm pretty sure that the optimizer will just rewrite the subquery to be a join. I normally favor the join syntax just for stylistic reasons, but when in doubt, like Grant said, write it both ways and check.

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

*****************/

Paul White SSC Guru Points: 150468 More actions · Answer 3

Generally speaking, the query optimizer doesn't much care how you write your queries.

Ok, so that is a bold statement - but it is at least partly true, at least to a large extent on up-to-moderately complex statements.

The SQL you write goes through a great deal of analysis, conversion and optimization before it hit the execution engine. Your query text is essentially translated in an internal tree structure of logical operators which represents the logical equivalent of what you have asked for.

The SQL Server Algebrizer (which replaces the 'Normalizer' in 2000) does a fair amount of this work, including helping to decide on parallelism, how to approach sub-queries and aggregations, and where to locate objects in cache memory. That is but one stage - there are many opportunities for the process to translate your written query into something equivalent, if it decides it is worth it.

Anyway, so long as the left join and subquery are exactly equivalent I guess it comes down to a question of writing style. Personally, I almost always write it as a join.

Check the query plans involved - I would bet dollars to doughnuts that the subquery is implemented as a logical left join 🙂

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Gift Peddie SSC Guru Points: 73570 More actions · Answer 4

Check the query plans involved - I would bet dollars to doughnuts that the subquery is implemented as a logical left join

Actually in ANSI SQL 1999 Subquery is equal to an INNER JOIN so it is techincally faster than OUTER JOIN because of the OUTER JOIN default NULL. So if Microsoft implemented it in 2005 and 2008 there should be some difference.

If is the operative word.

Kind regards,
Gift Peddie

Paul White SSC Guru Points: 150468 More actions · Answer 5

Gift Peddie (6/24/2009)
Actually in ANSI SQL 1999 Subquery is equal to an INNER JOIN so it is techincally faster than OUTER JOIN because of the OUTER JOIN default NULL. So if Microsoft implemented it in 2005 and 2008 there should be some difference.

1999? 😉

SQL standards are a myth.

😀

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Grant Fritchey SSC Guru Points: 398910 More actions · Answer 6

Paul White (6/24/2009)
The SQL Server Algebrizer (which replaces the 'Normalizer' in 2000) does a fair amount of this work, including helping to decide on parallelism, how to approach sub-queries and aggregations, and where to locate objects in cache memory. That is but one stage - there are many opportunities for the process to translate your written query into something equivalent, if it decides it is worth it.

While I'm in agreement with the general statements of the post, I've got a pick a nit with this part. The algebrizer creates the parse tree, the first step in optimization. It doesn't determine parallelism or where to locate objects in cache. It does resolve object names and it does flatten operators, which is part of what we're talking about here, rearranging the query so that you can see correlated sub-queries turned into joins or LIKE statements turned into >= and < operations. It also does aggregate & grouping binding prior to the optimization process.

Sorry. Just had to get that one off my chest.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Paul White SSC Guru Points: 150468 More actions · Answer 7

Grant Fritchey (6/25/2009)
While I'm in agreement with the general statements of the post, I've got a pick a nit with this part. The algebrizer creates the parse tree, the first step in optimization. It doesn't determine parallelism or where to locate objects in cache. It does resolve object names and it does flatten operators, which is part of what we're talking about here, rearranging the query so that you can see correlated sub-queries turned into joins or LIKE statements turned into >= and < operations. It also does aggregate & grouping binding prior to the optimization process.
Sorry. Just had to get that one off my chest.

That's ok Grant, but I'm not so sure, and I don't have the SQL source code to hand to check 😉

A quick Google shows several books and articles that assert that the algebrizer does in fact get involved in parallelism and caching.

I'm not going to post a lmgtfy link 🙂 but here's the one that comes up first for me

I don't actually care that much though!

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

ab5sr SSCarpal Tunnel Points: 4056 More actions · Answer 8

Abdullah M. Al-Fararjeh (6/22/2009)

Which one is Better and why ?

Great question! Welcome to the board by the way.

My rule of thumb is...don't listen to rules of thumb and find out which is better on your own. Best-practices aside, write the code in a number of different ways and figure it out using the tools at your disposal.

I just finished up a very large query LEFT OUTER JOINing a table about a dozen times. Turns out that a UNION ALL was 10x faster. Bottom line - don't rat-hole yourself into one or two ways to do something.

Grant Fritchey SSC Guru Points: 398910 More actions · Answer 9

Paul White (6/25/2009)
That's ok Grant, but I'm not so sure, and I don't have the SQL source code to hand to check 😉
A quick Google shows several books and articles that assert that the algebrizer does in fact get involved in parallelism and caching.
I'm not going to post a lmgtfy link 🙂 but here's the one that comes up first for me
I don't actually care that much though!

I tracked down some more information. According to a Microsoft developer I spoke with, the determination for parallelism is made in the optimizer. BUT, there are functions & processes that will be marked by the algebrizer as "these can't be parallel" which will prevent the optimizer from choosing a parallel plan, but the actual determination is part of the optimizer process.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Paul White SSC Guru Points: 150468 More actions · Answer 10

Grant Fritchey (6/26/2009)
I tracked down some more information. According to a Microsoft developer I spoke with, the determination for parallelism is made in the optimizer. BUT, there are functions & processes that will be marked by the algebrizer as "these can't be parallel" which will prevent the optimizer from choosing a parallel plan, but the actual determination is part of the optimizer process.

Cool. I did say "including helping to decide on parallelism" 😉

Thanks though.

edit: added smiley

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi