July 14, 2005 at 6:48 am
Hi all! We are once again battling the age old question of performance: the use of Joins versus subqueries. I have tried to look online for an answer, and rather than purchasing yet another T-SQL How To book, I thought I would throw this one out to actual users. We are in a debate over whether or not to use a subquery in the WHERE clause or create a JOIN. Any random thoughts (as it pertains to the question would be most helpful!
Thanks again! -Marti
July 14, 2005 at 6:53 am
I like the derived table/join approach... it gives you room to add as many columns as you need down the line, and it's more often than not faster than the subquery (never seen a subquery faster than the join, only equal).
July 14, 2005 at 7:14 am
While there are exceptions, the general rule would be to use a join against either a table or a derived table. It's almost always going to be faster becuase of how the query optimizer works. That said, your mileage may vary, caveat emptor, test, test, test.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 14, 2005 at 11:22 am
Joins ARE almost always faster to execute than subqueries.
Why use subqueries? If your code will be read or modified in the future, it is often easier to decipher, and understand the logic, of a subquery.
July 14, 2005 at 11:49 am
And as I said it's easier to maintain .
July 14, 2005 at 6:12 pm
I'm still very much using the where clause over joins.
The only reason is that there's still no proof that joins are faster than where clause.
It all comes down to how you write your query.
It would be great if someone can provide two queries, one using joins and the other using where clause and we examine together?
Personally I still find where/subqueries are much readable than joins.
Unless it's proven otherwise I just stick to the where clause
July 14, 2005 at 6:35 pm
July 14, 2005 at 7:30 pm
Remi, you always got solutions in hands. Well I'm trying to catche to you.
here you go:
http://geekswithblogs.net/johnsperfblog/archive/2005/05/26/40646.aspx
July 14, 2005 at 7:52 pm
It's always more of the same... test all the versions you think may work and see which is fastest.
July 14, 2005 at 9:00 pm
I have created two queries on Northwind database, one using Join and the other using subquery.
QUERY A:
select customers.*
from customers
left join orders on orders.customerid = customers.customerid
where orders.customerid is NULL
QUERY B:
select c.*
from customers c
where not exists (select 1 from orders where customerid = c.customerid)
---------------TYPICAL SHOW PLAN RESULTS-----------
QUERY A:
|--Filter(WHERE[Orders].[CustomerID]=NULL))
|--Nested Loops(Left Outer Join, OUTER REFERENCES[Customers].[CustomerID]) WITH PREFETCH)
|--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers]))
|--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[Customers].[CustomerID]) ORDERED FORWARD)
QUERY B:
|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES[c].[CustomerID]) WITH PREFETCH)
|--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers] AS [c]))
|--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[c].[CustomerID]) ORDERED FORWARD)
IF QUERY A and B are run in one batch:
QUERY A - query cost = 52.06%
QUERY B - query cost = 47.94%
In this case can you still say that joins are faster than subqueries???
July 15, 2005 at 4:06 am
What happens if you again run query A and B in one batch, but this time query B before query A?
July 15, 2005 at 5:25 am
You compared exists to left join.
I compared inner join to in. In and exists are not the same type of operators so they can't really be compared. Try the second query like this :
select c.*
from dbo.customers c
where customerid not in (select customerid from dbo.orders)
July 15, 2005 at 7:09 am
I remember reading somewhere that "IN" subqueries were very close to joins in speed, UNLESS the IN operater references a list of comma separated values. If you are running tests, I think you need to compare value lists to subqueries to joins.
Regardless, I find subqueries or value lists to be the most readable, so I prefer them if query efficiency is not an issue.
Sometimes, its a real headache to work around value lists (with temp tables and all), so I will use them as dynamic SQL [slap, slap] until I have time to recode the beast.
July 15, 2005 at 8:20 am
July 15, 2005 at 8:36 am
For 2 :
IF Object_id('fnSplit_Set') > 0
DROP FUNCTION dbo.fnSplit_Set
GO
IF Object_id('Numbers') > 0
DROP TABLE dbo.Numbers
GO
CREATE TABLE dbo.Numbers (PkNumber int identity(1,1), dude bit null, CONSTRAINT Pk_Number PRIMARY KEY CLUSTERED (PkNumber))
GO
INSERT INTO dbo.Numbers (dude)
SELECT NULL FROM (SELECT TOP 100 NULL AS A FROM master.dbo.spt_values) dt100 cross join (SELECT TOP 80 null AS A FROM master.dbo.spt_values) dt80
GO
ALTER TABLE dbo.Numbers
DROP COLUMN dude
GO
--Select min(PkNumber) as MinA, max(PkNumber) as MaxA from dbo.Numbers
--1, 8000
GO
CREATE FUNCTION [dbo].[fnSplit_Set] (@IDS as varchar(8000), @vcDelimiter varchar(3))
RETURNS TABLE
WITH SCHEMABINDING
AS
Return
Select dtSplitted.EachID, dtSplitted.Rank from (
SELECT SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter),
CHARINDEX(@vcDelimiter, @vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter)) - N.PkNumber - len(@vcDelimiter)) as EachID
, (LEN(SUBSTRING(@IDs, 1, N.PkNumber)) - LEN (REPLACE (SUBSTRING(@IDs, 1, N.PkNumber), ',', ''))) + 1 AS Rank
FROM dbo.Numbers N
WHERE SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber, len(@vcDelimiter)) = @vcDelimiter
AND PkNumber 0
GO
DECLARE @Ids as varchar(8000)
SET @IDS = ''
Select @IDS = @IDS + CAST(id as varchar(10)) + ',' from (Select TOP 10 id from dbo.SysObjects order by NEWID()) dtIds
SET @IDS = left(@IDS, ABS(len(@IDS) - 1))
PRINT @IDS
Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by O.Name
Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by dtIDS.Rank
--Don't ever forget to cast the eachid column to the same datatype as in the join to avoid table scans.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply