March 30, 2016 at 7:57 am
I've probably asked this before but each time I encounter it, it baffles me. I have queries similar to the following:
-- Example 1
-- Populate a temp table
Select D.X INTO #tmp FROM
(SELECT 100 AS X UNION ALL
SELECT 200 AS X ) D
SELECT A, B, C FROM myView -- Joined to other tables
WHERE A IN (SELECT T.X FROM #tmp T)
-- Example 2
SELECT A, B, C FROM myView -- Joined to other tables
WHERE A IN (100, 200)
In short, example 2 smokes example 1. It results in a completely different execution plan. Is it simply because the constant values are resolved more easily than the same values selected in another table?
March 30, 2016 at 8:05 am
Yes. In Example 1, you've got the extra expense of compiling the query because it's more complex (it involves a join), and also of actually retrieving the values from the table. I'm surprised you notice a difference with just two rows... I imagine that you've simplified it before posting?
John
March 30, 2016 at 8:05 am
In short, yes. The first it two seeks against the table (assuming appropriate indexes), the second is a join between two tables, might be a loop join, might not.
The difference between the two will change as the number of entries in the IN changes. Try that again with 2000 entries in the IN vs a table with 2000 rows. Should be a different story,
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 30, 2016 at 8:10 am
I was surprised at the difference with only one entry but it was significant enough to investigate. The procedure/query will always use between 1 and 200 values so we've implemented a dynamic SQL approach.
Thanks for the explanations.
ST
March 30, 2016 at 8:12 am
Neither of these two forms is guaranteed to be faster than the other. But the plans will be different.
If you already have the values in a table, use form 1 and don't go through hoops to somehow force form 2.
If you have the values as constants, use form 2 and don't use a needless temp table as in form 1.
Only deviate from these guidelines if you find that the performance is unacceptable and the other form fixes it.
That being said, I personally only use [NOT] IN with a delimited list, never with a subquery. The latter can always be rewritten as a [NOT] EXISTS, with the additional benefits of not having issues with NULL values in the subquery and not requireing nasty workarounds if you work with composite keys.
March 31, 2016 at 9:46 am
You should definitely cluster the temp table on the key column as well. It can't hurt, but it can help under certain conditions.
SELECT TOP (0) D.X INTO #tmp FROM
(SELECT 100 AS X) D
CREATE CLUSTERED INDEX tmp__CL ON #tmp ( X ) WITH ( FILLFACTOR = 100 );
INSERT INTO #tmp
SELECT D.X
FROM
(VALUES(100),(200),(300),(400),(500),(600),(700),(800),(900),(1000)) D(X)
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply