October 31, 2008 at 9:10 pm
Most of the time, if I want to retrieve a single value from another table, I use a subquery, but if I want many values (more than two columns) from another table, I use a join. I don't know much about how different the two methods really are, though, especially in speed as the tables grow larger and larger.
Which is the correct way to use?
I know both approaches have their own advantages, but what are they?
Thanks for any supporting ideas and documents.
October 31, 2008 at 9:18 pm
There's virtually no difference between a JOIN and an IN (subquery)... they will both create the same execution plan and execute within milliseconds of each other... unless there's a correlation in the sub-query... then, it depends on whether you've used an equality in the correlation or not. If you've used an inequality based on something temporal (IDENTITY or DATETIME column, etc), then you could end up with a "Triangular Join"... a form of RBAR on steroids.
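Just to show the shape of the problem, here's a minimal sketch of a triangular join... the table and columns (dbo.SomeTable, SomeID, Amount) are hypothetical, purely for illustration:
--===== Hypothetical "triangular join"... a running total done with a correlated
     -- subquery and an inequality. For every row, the subquery rescans all the
     -- earlier rows, so the work grows at roughly (N^2+N)/2... RBAR on steroids.
SELECT  st.SomeID,
        st.Amount,
        (SELECT SUM(st2.Amount)
           FROM dbo.SomeTable st2
          WHERE st2.SomeID <= st.SomeID) AS RunningTotal
  FROM dbo.SomeTable st
 ORDER BY st.SomeID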
Please see the following article...
http://www.sqlservercentral.com/articles/T-SQL/61539/
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2008 at 9:52 pm
I read somewhere (I think it was AskTom) that, depending on how you want your data or on the business logic (the way you write your query when using a subquery or a join), your results may vary...
I'm not the expert, but I remember reading it and seeing the example, so I thought I would share that... not hoping to confuse... lol
John
October 31, 2008 at 10:07 pm
jsteinbeck (10/31/2008)
I read somewhere (I think it was AskTom) that, depending on how you want your data or on the business logic (the way you write your query when using a subquery or a join), your results may vary... I'm not the expert, but I remember reading it and seeing the example, so I thought I would share that... not hoping to confuse... lol
John
First, "Ask Tom" is about Oracle, not SQL Server... two different engines... two different worlds.
INNER JOIN and WHERE IN are exactly the same in SQL Server... there are a ton of posts on these forums that prove it. Some have shown that certain uses (not many) of WHERE EXISTS can be a bit faster, but normally the rest of the code is such crap that folks should concentrate more on writing the other parts of the query correctly. Correlated SUBqueries that are based on equalities will also normally resolve to an INNER JOIN... sometimes they resolve to something as bad as or far worse than a cursor.
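To put that in code, here's a minimal sketch of the three forms on a pair of hypothetical tables (dbo.Orders and dbo.Customers, with CustomerID unique in Customers)... with reasonable indexes, all three will normally get the same or nearly the same execution plan:
--===== Hypothetical tables only... INNER JOIN form
SELECT  o.OrderID, o.CustomerID
  FROM dbo.Orders o
 INNER JOIN dbo.Customers c ON c.CustomerID = o.CustomerID

--===== WHERE IN form... normally resolves to the same plan as the join
SELECT  o.OrderID, o.CustomerID
  FROM dbo.Orders o
 WHERE o.CustomerID IN (SELECT c.CustomerID FROM dbo.Customers c)

--===== WHERE EXISTS form... occasionally a bit faster, usually identical
SELECT  o.OrderID, o.CustomerID
  FROM dbo.Orders o
 WHERE EXISTS (SELECT 1 FROM dbo.Customers c WHERE c.CustomerID = o.CustomerID)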
Of course, having the wrong criteria in an INNER JOIN can also be far worse than a cursor...
... and if you use a cursor to process RBAR instead of using it to control the processing of sets of information, then you're doing something wrong. Chances are, if you're using a CLR (with very few exceptions like RegExReplace), you're also doing something wrong.
Ok... duck! Here come the zealots that will insist you can use a cursor or CLR if you're pressed for time or don't know T-SQL well enough to do it the right way. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2008 at 10:37 pm
If I can use either a correlated subquery or a join to two tables,
which way should I choose for optimal performance when there is a large number of rows?
For Example
--************ Retrieve a value from one other table ****************************
--JOIN
SELECT  col.*,
        obj.Name
  FROM syscolumns AS col
 INNER JOIN sysobjects AS obj ON col.id = obj.id
 ORDER BY obj.Name

--Correlated SubQuery
SELECT  col.*,
        (SELECT obj.Name FROM sysobjects AS obj WHERE obj.id = col.id) AS XXName
  FROM syscolumns AS col
 ORDER BY XXName
--************ Retrieve values from two other tables ****************************
--JOIN
SELECT  col.*,
        obj.Name,
        type.Name AS TypeName
  FROM syscolumns AS col
 INNER JOIN sysobjects AS obj ON col.id = obj.id
 INNER JOIN systypes AS type ON col.xtype = type.xtype AND type.xtype <> 231
 ORDER BY obj.Name

--Correlated SubQuery
SELECT  col.*,
        (SELECT obj.Name FROM sysobjects AS obj WHERE obj.id = col.id) AS XXName,
        (SELECT type.Name FROM systypes AS type WHERE type.xtype = col.xtype AND type.xtype <> 231) AS TypeName
  FROM syscolumns AS col
 ORDER BY XXName
Thanks for your reply.
November 1, 2008 at 11:06 am
ry.rith (10/31/2008)
If I can use either a correlated subquery or a join to two tables, which way should I choose for optimal performance when there is a large number of rows?
For Example ...
For the type of queries you wrote, there's no question that the INNER JOIN method would be the one that I would use. But, just to demonstrate, let's build a larger test table than the 4000-5000 rows normally available in the Master database of SQL Server 2000 and see... 100,000 rows should do it...
--=============================================================================
-- Setup to test on more data than available in SysColumns
--=============================================================================
--===== Identify the database to use (NOTHING will be written here)
USE Master
--===== Create a larger test table than SysColumns -- 22 Seconds
SELECT TOP 100000 sc1.*
INTO #BigTable
FROM dbo.SysColumns sc1
CROSS JOIN dbo.SysColumns sc2
--===== Give it the best index we can
CREATE CLUSTERED INDEX CX_BigTable_ID ON #BigTable (ID) -- 4 seconds
... and, now... your first set of queries... run them (in the same window as above) with and without the ORDER BY, and study both the Estimated and Actual Execution Plans... IGNORE THE % OF BATCH because it lies... really badly, sometimes...
--=============================================================================
-- These are the 1st set of tests like what you had in your example code
--=============================================================================
--===== Begin measuring durations
SET STATISTICS TIME ON
--************ Retrieve a value from one other table ****************************
--===== JOIN
SELECT col.*,
obj.Name
FROM #BigTable col
INNER JOIN dbo.SysObjects obj ON col.ID = obj.ID
ORDER BY obj.Name
PRINT REPLICATE('=',80)
--===== Correlated SubQuery
SELECT col.*,
(SELECT obj.Name FROM dbo.SysObjects obj WHERE obj.ID = col.ID) AS XXName
FROM #BigTable col
ORDER BY xxName
--===== Turn off duration measurements
SET STATISTICS TIME OFF
Here's the performance output I get on my box (SQL Server 2000 Developer's Edition). Clearly, the INNER JOIN method wins especially where precious CPU time is involved...
[font="Courier New"]SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(100000 row(s) affected)
SQL Server Execution Times:
CPU time = 891 ms, elapsed time = 5684 ms.
================================================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 13 ms.
(100000 row(s) affected)
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 6250 ms.[/font]
Now... let's do something similar with your second set of queries... are you in for a surprise!
--=============================================================================
-- These are the 2nd set of tests like what you had in your example code
--=============================================================================
--===== Begin measuring durations
SET STATISTICS TIME ON
--************ Retrieve values from two other tables ****************************
--===== JOIN
SELECT col.*,
obj.Name,
type.Name AS TypeName
FROM #BigTable col
INNER JOIN dbo.SysObjects obj ON col.ID = obj.ID
INNER JOIN dbo.SysTypes type ON col.XType = type.XType AND type.XType <> 231
ORDER BY obj.Name
PRINT REPLICATE('=',80)
--===== JOIN with proper use of criteria
SELECT col.*,
obj.Name,
type.Name AS TypeName
FROM #BigTable col
INNER JOIN dbo.SysObjects obj ON col.ID = obj.ID
INNER JOIN dbo.SysTypes type ON col.XType = type.XType
WHERE type.XType <> 231
ORDER BY obj.Name
PRINT REPLICATE('=',80)
--===== Correlated SubQuery (returns a different number of rows than either of the above)
SELECT col.*,
(SELECT obj.Name FROM dbo.SysObjects obj WHERE obj.ID = col.ID) AS XXName,
(SELECT type.Name FROM dbo.SysTypes type WHERE type.XType = col.XType AND type.XType <> 231) AS TypeName
FROM #BigTable col
ORDER BY xxName
--===== Turn off duration measurements
SET STATISTICS TIME OFF
First, notice that I added an extra code snippet to put the filter criteria where it belongs. As you can see below, it DOES make a difference in performance...
[font="Courier New"]
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(95044 row(s) affected)
SQL Server Execution Times:
CPU time = 2828 ms, elapsed time = 7660 ms.
================================================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(95044 row(s) affected)
SQL Server Execution Times:
CPU time = 2672 ms, elapsed time = 7343 ms.
================================================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(100000 row(s) affected)
SQL Server Execution Times:
CPU time = 2094 ms, elapsed time = 6986 ms.
[/font]
Hmmm... also notice that your query returns ALL the rows instead of being filtered... that's a common mistake when folks use sub-queries... criteria in a correlated subquery in the Select list have NO effect on the number of rows returned.
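For example, to get the filter actually applied to the rows, it has to move to the outer WHERE clause... just a sketch of the idea (not part of the timing runs above, and it still won't drop rows that have no match in SysObjects or SysTypes the way the INNER JOINs do):
--===== Correlated SubQuery with the row filter moved to the outer WHERE clause
SELECT  col.*,
        (SELECT obj.Name  FROM dbo.SysObjects obj  WHERE obj.ID = col.ID) AS XXName,
        (SELECT type.Name FROM dbo.SysTypes   type WHERE type.XType = col.XType) AS TypeName
  FROM #BigTable col
 WHERE col.XType <> 231
 ORDER BY XXName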
Now, for the biggest "surprise" of them all... look at the output of the 3 queries above... see anything wrong? Anything horribly wrong? Heh... [font="Arial Black"]the 3rd output is NOT sorted correctly even though there is a very obvious ORDER BY in the code![/font] That's because the optimizer processes by COLUMN... not by ROW... it resolves the RBAR join between #BigTable and SysObjects first and sorts it, and then finally resolves the RBAR join between that result and SysTypes, which, of course, results in the wrong sort order. That's also why it appears that the correlated sub-query is faster... the sort isn't working quite as hard.
The bottom line here is that a correlated sub-query in the Select list is a form of RBAR, and databases just weren't meant to use RBAR. Because of that, correlated sub-queries can sometimes create unexpected results and, in other cases, can be a lot slower than good set-based joins.
My recommendation would be this... with maybe 1 or 2 very special-case exceptions, don't use correlated sub-queries in the Select list... and since a correlated sub-query with equalities as criteria in the WHERE clause resolves to an INNER JOIN, there's almost never a reason to use them in the WHERE clause either. The exception to that might be that WHERE NOT EXISTS and WHERE NOT IN can resolve a bit faster than a LEFT OUTER JOIN with an IS NULL check to do the same thing.
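To make that last bit concrete, here's a minimal sketch of the two "rows with no match" patterns on the same test tables (an illustration, not a benchmark... ID is unique and not nullable in SysObjects, so both return the same rows):
--===== NOT EXISTS form... rows in #BigTable with no match in SysObjects
SELECT  col.*
  FROM #BigTable col
 WHERE NOT EXISTS (SELECT 1 FROM dbo.SysObjects obj WHERE obj.ID = col.ID)

--===== LEFT OUTER JOIN form... same result, found by keeping only the unmatched rows
SELECT  col.*
  FROM #BigTable col
  LEFT OUTER JOIN dbo.SysObjects obj ON obj.ID = col.ID
 WHERE obj.ID IS NULL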
And, don't try to justify the use of correlated sub-queries based on row counts. Estimated usage can change drastically and, even if they don't, some poor slob will copy your code and use it on something with much larger row counts.
Correlated sub-queries ARE a form of RBAR and should be avoided at all costs, with only 1 or 2 exceptions that I've ever made. The one exception I've used is finding "gaps" in serial numbers (heh... don't ask ;)).
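Just to illustrate that kind of exception (this is one common pattern for it, not necessarily the exact code I used), a gap hunt can be written with a correlated NOT EXISTS against a hypothetical dbo.Widgets table that has an integer SerialNo column:
--===== Hypothetical example... find the first missing number after each gap
SELECT  w.SerialNo + 1 AS GapStart
  FROM dbo.Widgets w
 WHERE NOT EXISTS (SELECT 1 FROM dbo.Widgets w2 WHERE w2.SerialNo = w.SerialNo + 1)
   AND w.SerialNo < (SELECT MAX(SerialNo) FROM dbo.Widgets)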
--Jeff Moden
Change is inevitable... Change for the better is not.
November 1, 2008 at 4:44 pm
Jeff, are you putting your articles in thread posts again? 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 1, 2008 at 11:01 pm
Heh... it would appear so... maybe this one would make a good article. Thanks, Barry.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2008 at 6:33 am
Hi Jeff,
Why don't you publish it as an article? That way it will be useful to a lot of folks like me.
karthik
November 3, 2008 at 2:02 pm
Hi Jeff, thanks for the excellent explanation, much appreciated. I prefer replies like these to the "ya, perhaps, maybe, try this, it depends" kind.
Regards
Kevin
November 3, 2008 at 2:05 pm
clive (11/3/2008)
Hi Jeff, thanks for the excellent explanation, much appreciated. I prefer replies like these to the "ya, perhaps, maybe, try this, it depends" kind. Regards
Kevin
When was the last time you yourself gave such an explanation to somebody else?
😉
_____________
Code for TallyGenerator
November 3, 2008 at 6:11 pm
I appreciate the comments, guys. Thank you very much. Perhaps I will write it up as an article with a couple more "gotcha" examples... Thanks again.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2011 at 5:25 am
Subqueries can slow SQL Server down, so use them as little as possible... a join is usually the better choice.
Ranjan
Developer in http://www.Hamarashehar.com
August 22, 2011 at 7:13 am
Please note: 3 year old thread.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability