Where Column IN (x,y,z) Better Than Where In (Select from table)

Question

Where Column IN (x,y,z) Better Than Where In (Select from table)

souLTower

SSCarpal Tunnel

Points: 4026
More actions
March 30, 2016 at 7:57 am

#399948

I've probably asked this before but each time I encounter it, it baffles me. I have queries similar to the following:
-- Example 1
-- Populate a temp table
Select D.X INTO #tmp FROM
(SELECT 100 AS X UNION ALL
SELECT 200 AS X ) D
SELECT A, B, C FROM myView -- Joined to other tables
WHERE A IN (SELECT T.X FROM #tmp T)
-- Example 2
SELECT A, B, C FROM myView -- Joined to other tables
WHERE A IN (100, 200)
In short, example 2 smokes example 1. It results in a completely different execution plan. Is it simply because the constant values are resolved more easily than the same values selected in another table?

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Yes. In Example 1, you've got the extra expense of compiling the query because it's more complex (it involves a join), and also of actually retrieving the values from the table. I'm surprised you notice a difference with just two rows... I imagine that you've simplified it before posting?

John

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

In short, yes. The first it two seeks against the table (assuming appropriate indexes), the second is a join between two tables, might be a loop join, might not.

The difference between the two will change as the number of entries in the IN changes. Try that again with 2000 entries in the IN vs a table with 2000 rows. Should be a different story,

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 3

I was surprised at the difference with only one entry but it was significant enough to investigate. The procedure/query will always use between 1 and 200 values so we've implemented a dynamic SQL approach.

Thanks for the explanations.

ST

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 4

Neither of these two forms is guaranteed to be faster than the other. But the plans will be different.

If you already have the values in a table, use form 1 and don't go through hoops to somehow force form 2.

If you have the values as constants, use form 2 and don't use a needless temp table as in form 1.

Only deviate from these guidelines if you find that the performance is unacceptable and the other form fixes it.

That being said, I personally only use [NOT] IN with a delimited list, never with a subquery. The latter can always be rewritten as a [NOT] EXISTS, with the additional benefits of not having issues with NULL values in the subquery and not requireing nasty workarounds if you work with composite keys.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

ScottPletcher SSC Guru Points: 101238 More actions · Answer 5

You should definitely cluster the temp table on the key column as well. It can't hurt, but it can help under certain conditions.

SELECT TOP (0) D.X INTO #tmp FROM

(SELECT 100 AS X) D

CREATE CLUSTERED INDEX tmp__CL ON #tmp ( X ) WITH ( FILLFACTOR = 100 );

INSERT INTO #tmp

SELECT D.X

FROM

(VALUES(100),(200),(300),(400),(500),(600),(700),(800),(900),(1000)) D(X)

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".