Joins versus subqueries

Question

Joins versus subqueries

Marti Ryan

Ten Centuries

Points: 1058
More actions
July 14, 2005 at 6:48 am

#90307

Hi all! We are once again battling the age old question of performance: the use of Joins versus subqueries. I have tried to look online for an answer, and rather than purchasing yet another T-SQL How To book, I thought I would throw this one out to actual users. We are in a debate over whether or not to use a subquery in the WHERE clause or create a JOIN. Any random thoughts (as it pertains to the question would be most helpful!
Thanks again! -Marti

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

I like the derived table/join approach... it gives you room to add as many columns as you need down the line, and it's more often than not faster than the subquery (never seen a subquery faster than the join, only equal).

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 2

While there are exceptions, the general rule would be to use a join against either a table or a derived table. It's almost always going to be faster becuase of how the query optimizer works. That said, your mileage may vary, caveat emptor, test, test, test.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

sara karasik SSCommitted Points: 1573 More actions · Answer 3

Joins ARE almost always faster to execute than subqueries.

Why use subqueries? If your code will be read or modified in the future, it is often easier to decipher, and understand the logic, of a subquery.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

Ninja's_RGR'us

SSC Guru

Points: 294069

July 14, 2005 at 11:49 am

#574503

And as I said it's easier to maintain .

ND-215452 SSCrazy Points: 2037 More actions · Answer 5

I'm still very much using the where clause over joins.

The only reason is that there's still no proof that joins are faster than where clause.

It all comes down to how you write your query.

It would be great if someone can provide two queries, one using joins and the other using where clause and we examine together?

Personally I still find where/subqueries are much readable than joins.

Unless it's proven otherwise I just stick to the where clause

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

Here you go :

http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=201238&p=2

ND-215452 SSCrazy Points: 2037 More actions · Answer 7

Remi, you always got solutions in hands. Well I'm trying to catche to you.

here you go:

http://geekswithblogs.net/johnsperfblog/archive/2005/05/26/40646.aspx

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

It's always more of the same... test all the versions you think may work and see which is fastest.

ND-215452 SSCrazy Points: 2037 More actions · Answer 9

I have created two queries on Northwind database, one using Join and the other using subquery.

QUERY A:

select customers.*

from customers

left join orders on orders.customerid = customers.customerid

where orders.customerid is NULL

QUERY B:

select c.*

from customers c

where not exists (select 1 from orders where customerid = c.customerid)

---------------TYPICAL SHOW PLAN RESULTS-----------

QUERY A:

|--Filter(WHERE[Orders].[CustomerID]=NULL))

|--Nested Loops(Left Outer Join, OUTER REFERENCES[Customers].[CustomerID]) WITH PREFETCH)

|--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers]))

|--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[Customers].[CustomerID]) ORDERED FORWARD)

QUERY B:

|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES[c].[CustomerID]) WITH PREFETCH)

|--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers] AS [c]))

|--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[c].[CustomerID]) ORDERED FORWARD)

IF QUERY A and B are run in one batch:

QUERY A - query cost = 52.06%

QUERY B - query cost = 47.94%

In this case can you still say that joins are faster than subqueries???

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 10

What happens if you again run query A and B in one batch, but this time query B before query A?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

You compared exists to left join.

I compared inner join to in. In and exists are not the same type of operators so they can't really be compared. Try the second query like this :

select c.*

from dbo.customers c

where customerid not in (select customerid from dbo.orders)

Richard Moldwin SSCommitted Points: 1872 More actions · Answer 12

I remember reading somewhere that "IN" subqueries were very close to joins in speed, UNLESS the IN operater references a list of comma separated values. If you are running tests, I think you need to compare value lists to subqueries to joins.

Regardless, I find subqueries or value lists to be the most readable, so I prefer them if query efficiency is not an issue.

Sometimes, its a real headache to work around value lists (with temp tables and all), so I will use them as dynamic SQL [slap, slap] until I have time to recode the beast.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 13

For one :

Arrays and Lists in SQL Server

The Curse and Blessings of Dynamic SQL

Dynamic Search Conditions in T-SQL

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 14

For 2 :

IF Object_id('fnSplit_Set') > 0

DROP FUNCTION dbo.fnSplit_Set

GO

IF Object_id('Numbers') > 0

DROP TABLE dbo.Numbers

GO

CREATE TABLE dbo.Numbers (PkNumber int identity(1,1), dude bit null, CONSTRAINT Pk_Number PRIMARY KEY CLUSTERED (PkNumber))

GO

INSERT INTO dbo.Numbers (dude)

SELECT NULL FROM (SELECT TOP 100 NULL AS A FROM master.dbo.spt_values) dt100 cross join (SELECT TOP 80 null AS A FROM master.dbo.spt_values) dt80

GO

ALTER TABLE dbo.Numbers

DROP COLUMN dude

GO

--Select min(PkNumber) as MinA, max(PkNumber) as MaxA from dbo.Numbers

--1, 8000

GO

CREATE FUNCTION [dbo].[fnSplit_Set] (@IDS as varchar(8000), @vcDelimiter varchar(3))

RETURNS TABLE

WITH SCHEMABINDING

AS

Return

Select dtSplitted.EachID, dtSplitted.Rank from (

SELECT SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter),

CHARINDEX(@vcDelimiter, @vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter)) - N.PkNumber - len(@vcDelimiter)) as EachID

, (LEN(SUBSTRING(@IDs, 1, N.PkNumber)) - LEN (REPLACE (SUBSTRING(@IDs, 1, N.PkNumber), ',', ''))) + 1 AS Rank

FROM dbo.Numbers N

WHERE SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber, len(@vcDelimiter)) = @vcDelimiter

AND PkNumber 0

GO

DECLARE @Ids as varchar(8000)

SET @IDS = ''

Select @IDS = @IDS + CAST(id as varchar(10)) + ',' from (Select TOP 10 id from dbo.SysObjects order by NEWID()) dtIds

SET @IDS = left(@IDS, ABS(len(@IDS) - 1))

PRINT @IDS

Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by O.Name

Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by dtIDS.Rank

--Don't ever forget to cast the eachid column to the same datatype as in the join to avoid table scans.