July 26, 2011 at 3:03 pm
I'm hoping someone can confirm my findings here, because I understood EXCEPT to be a higher optimized method of finding Anti-semi Joins.
My benchmark code is below. What I'm finding relatively consistently is that a not exists seems to be performing the best on my testbed here, and I'm hoping someone can poke some holes in why. The only time I can get EXCEPT to outperform the other methods is when secondary data is not-included. This makes sense, as EXCEPT is going to compare each field in the listing. But even using sub-query selects on only the key fields still runs slower. Though again, this makes sense in context, as we're now doing a 3 way join in essence.
Consistently, the NOT EXISTS seems to run the fatest. LEFT JOIN - IS NULL second fastest, though The 'EASY VERSION EXCEPT', a non-subqueried version, will run faster when no fields secondary to the join mechanic are involved. The Subqueried EXCEPT generally runs faster then standard two field EXCEPT, even with the extra work. The easy except seems to suicide when I drop indexes on the #tmps, for some reason. I haven't looked into execution plans on that one yet to figure out why, but just be aware that it's not just you if you see that too. 🙂
So, opinions, conversations, options, and general abuse of my methodology? 😉
SET NOCOUNT ON
/*
USE Model
GO
SELECT TOP 1000000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Let the public use it
GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC
GO
USE tempdb
GO
-- REPEAT above select.
*/
IF OBJECT_ID( 'tempdb..#blah') IS NOT NULL
DROP TABLE #blah
IF OBJECT_ID( 'tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
IF OBJECT_ID( 'tempdb..#tmp2') IS NOT NULL
DROP TABLE #tmp2
IF OBJECT_ID( 'tempdb..#tmp3') IS NOT NULL
DROP TABLE #tmp3
IF OBJECT_ID( 'tempdb..#tmp4') IS NOT NULL
DROP TABLE #tmp4
CREATE TABLE #blah
(GroupVal INT NOT NULL,
FabricatedIDINT IDENTITY( 1,1) NOT NULL,
SomeValueVARCHAR(10))
ALTER TABLE #blah ADD PRIMARY KEY CLUSTERED (FabricatedID)
CREATE TABLE #tmp
(FabricatedID INT, SomeValue VARCHAR(10))
CREATE TABLE #tmp2
(FabricatedID INT, SomeValue VARCHAR(10))
CREATE TABLE #tmp3
(FabricatedID INT, SomeValue VARCHAR(10))
CREATE TABLE #tmp4
(FabricatedID INT, SomeValue VARCHAR(10))
INSERT INTO #blah
(GroupVal, SomeValue)
SELECT TOP 1000000 -- 1 mill test
t1.n%15 AS GroupVal, -- 15 groups
REPLICATE( CHAR(t1.n), 10 ) AS SomeValue
FROM
tempdb..Tally AS t1 -- Whatever you use for tally here.
INSERT INTO #tmp
SELECT FabricatedID, SomeValue
FROM#blah
WHEREGroupVal <> 14
INSERT INTO #tmp2
SELECT FabricatedID, SomeValue
FROM#blah
WHEREGroupVal <> 14
INSERT INTO #tmp3
SELECT FabricatedID, SomeValue
FROM#blah
WHEREGroupVal <> 14
INSERT INTO #tmp4
SELECT FabricatedID, SomeValue
FROM#blah
WHEREGroupVal <> 14
-- Real testing starts here.
-- Paralleled ran slower across the board on my desktop server. Left here in case
-- someone wants to test.
/*
SET STATISTICS IO, TIME ON
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (Parallelled)'
INSERT INTO #tmp2
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
LEFT JOIN
#tmp2 AS t
ONb.FabricatedID = t.FabricatedID
WHERE
t.FabricatedID IS NULL
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (Parallelled)'
INSERT INTO #tmp3
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
WHERE
NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EXCEPT (Parallelled)'
INSERT INTO #tmp
SELECT
FabricatedID, SomeValue
FROM
#blah
EXCEPT
SELECT
FabricatedID, SomeValue
FROM
#tmp
SET STATISTICS IO, TIME OFF
*/
-- Real testing starts here.
SET STATISTICS IO, TIME ON
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'
INSERT INTO #tmp2
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
LEFT JOIN
#tmp2 AS t
ONb.FabricatedID = t.FabricatedID
WHERE
t.FabricatedID IS NULL
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'
INSERT INTO #tmp3
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
WHERE
NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'
INSERT INTO #tmp
SELECT
FabricatedID, SomeValue
FROM
#blah
EXCEPT
SELECT
FabricatedID, SomeValue
FROM
#tmp
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'
INSERT INTO #tmp4
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
JOIN
(SELECT
FabricatedID
FROM
#blah
EXCEPT
SELECT
FabricatedID
FROM
#tmp4
) AS drv
ONb.FabricatedID = drv.FabricatedID
OPTION (MAXDOP 1)
-- Force Order doesn't make a difference here, really.
-- The execution plan seems strange to me, but that's another story.
SET STATISTICS IO, TIME OFF
-- Time to see if indexing on the #tmps makes a difference here.
DELETE t FROM #tmp AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp2 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp3 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp4 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
CREATE INDEX idx_c ON #tmp (FabricatedID)
CREATE INDEX idx_c ON #tmp2 (FabricatedID)
CREATE INDEX idx_c ON #tmp3 (FabricatedID)
CREATE INDEX idx_c ON #tmp4 (FabricatedID)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'AFTER INDEX GENERATION ON #tmps'
SET STATISTICS IO, TIME ON
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'
INSERT INTO #tmp2
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
LEFT JOIN
#tmp2 AS t
ONb.FabricatedID = t.FabricatedID
WHERE
t.FabricatedID IS NULL
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'
INSERT INTO #tmp3
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
WHERE
NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'
INSERT INTO #tmp
SELECT
FabricatedID, SomeValue
FROM
#blah
EXCEPT
SELECT
FabricatedID, SomeValue
FROM
#tmp
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'
INSERT INTO #tmp4
SELECT
b.FabricatedID, b.SomeValue
FROM
#blah AS b
JOIN
(SELECT
FabricatedID
FROM
#blah
EXCEPT
SELECT
FabricatedID
FROM
#tmp4
) AS drv
ONb.FabricatedID = drv.FabricatedID
OPTION (MAXDOP 1)
-- Execution plan after the index is in play seems to behave better, though the non-subquery
-- version now outperforms it.
SET STATISTICS IO, TIME OFF
-- Alright, one last test cycle... EXCEPT without SomeValue affecting the problem.
ALTER TABLE #tmp DROP COLUMN SomeValue
ALTER TABLE #tmp2 DROP COLUMN SomeValue
ALTER TABLE #tmp3 DROP COLUMN SomeValue
ALTER TABLE #tmp4 DROP COLUMN SomeValue
DELETE t FROM #tmp AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp2 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp3 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DELETE t FROM #tmp4 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14
DROP INDEX idx_c ON #tmp
DROP INDEX idx_c ON #tmp2
DROP INDEX idx_c ON #tmp3
DROP INDEX idx_c ON #tmp4
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'INDEXES DROPPED on #tmps, SomeValue column removed'
SET STATISTICS IO, TIME ON
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'
INSERT INTO #tmp2
SELECT
b.FabricatedID
FROM
#blah AS b
LEFT JOIN
#tmp2 AS t
ONb.FabricatedID = t.FabricatedID
WHERE
t.FabricatedID IS NULL
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'
INSERT INTO #tmp3
SELECT
b.FabricatedID
FROM
#blah AS b
WHERE
NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'
INSERT INTO #tmp
SELECT
FabricatedID
FROM
#blah
EXCEPT
SELECT
FabricatedID
FROM
#tmp
OPTION (MAXDOP 1)
PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'
INSERT INTO #tmp4
SELECT
b.FabricatedID
FROM
#blah AS b
JOIN
(SELECT
FabricatedID
FROM
#blah
EXCEPT
SELECT
FabricatedID
FROM
#tmp4
) AS drv
ONb.FabricatedID = drv.FabricatedID
OPTION (MAXDOP 1)
-- Execution plan after the index is in play seems to behave better, though the non-subquery
-- version now outperforms it.
SET STATISTICS IO, TIME OFF
-- Need this otherwise it whines on reruns about differing table schemas.
IF OBJECT_ID( 'tempdb..#blah') IS NOT NULL
DROP TABLE #blah
IF OBJECT_ID( 'tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
IF OBJECT_ID( 'tempdb..#tmp2') IS NOT NULL
DROP TABLE #tmp2
IF OBJECT_ID( 'tempdb..#tmp3') IS NOT NULL
DROP TABLE #tmp3
IF OBJECT_ID( 'tempdb..#tmp4') IS NOT NULL
DROP TABLE #tmp4
A sample of my results:
------------------------------
Old-skool LEFT JOIN - IS NULL (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1076 ms, elapsed time = 1076 ms.
------------------------------
NOT EXISTS (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 936 ms, elapsed time = 932 ms.
------------------------------
EASY VERSION EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1279 ms, elapsed time = 1349 ms.
------------------------------
SUBQUERY EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1170 ms, elapsed time = 1188 ms.
------------------------------
AFTER INDEX GENERATION ON #tmps
------------------------------
Old-skool LEFT JOIN - IS NULL (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1393, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1186 ms, elapsed time = 2274 ms.
------------------------------
NOT EXISTS (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1658, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1092 ms, elapsed time = 1986 ms.
------------------------------
EASY VERSION EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 293313, physical reads 0, read-ahead reads 1652, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1950 ms, elapsed time = 3170 ms.
------------------------------
SUBQUERY EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1021, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1217 ms, elapsed time = 1953 ms.
------------------------------
INDEXES DROPPED on #tmps, SomeValue column removed
------------------------------
Old-skool LEFT JOIN - IS NULL (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1060 ms, elapsed time = 1063 ms.
------------------------------
NOT EXISTS (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 920 ms, elapsed time = 926 ms.
------------------------------
EASY VERSION EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 920 ms, elapsed time = 929 ms.
------------------------------
SUBQUERY EXCEPT (MAXDOP 1)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1139 ms, elapsed time = 1170 ms.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
July 27, 2011 at 4:53 am
I haven't used Except yet, and I seldom use Not Exists. But now, I'm going to try and see what I come up with.
You're probably crazy, but that doesn't mean you're wrong. @=) I'll let you know what I find.
July 27, 2011 at 5:36 am
http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/
I'll ping in Gail to see if she wants to blog more about this.
July 27, 2011 at 5:43 am
Are you always using MAXDOP 1 in your queries?
You don't trust MS at all are you? 😀
It used to be that LEFT JOIN with check for NULL was always the fasterst option over NOT EXISTS, however recently i've also found it's not the case. Now, sometime, they perform with the same. But I wouldn't advice to introduce the "iron rule" to always use NOT EXISTS.
I had the case recently where replacing it with LEFT JOIN reduced the query execution time significantly.
As most things in SQL are: It all depends...
July 27, 2011 at 3:05 pm
Eugene Elutin (7/27/2011)
Are you always using MAXDOP 1 in your queries?You don't trust MS at all are you? 😀
Not for benchmark tests, no. Besides, if you'll see my notes, I did try the paralleled versions and in this test case they ran slower generically.
It used to be that LEFT JOIN with check for NULL was always the fasterst option over NOT EXISTS, however recently i've also found it's not the case. Now, sometime, they perform with the same. But I wouldn't advice to introduce the "iron rule" to always use NOT EXISTS.
I had the case recently where replacing it with LEFT JOIN reduced the query execution time significantly.
As most things in SQL are: It all depends...
Of course, it was more about the EXCEPT being... well... slower overall. It's the newfangled optimal method, afterall.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
July 27, 2011 at 10:08 pm
Craig Farrell (7/26/2011)
I'm hoping someone can confirm my findings here, because I understood EXCEPT to be a higher optimized method of finding Anti-semi Joins.
There's nothing special about EXCEPT or INTERSECT. EXCEPT is TSQL shorthand for a DISTINCT on a NOT EXISTS, and INTERSECT is shorthand for DISTINCT on an EXISTS.
TSQL is not a programming language, for the most part. There are extensions for control-of-flow and things like that, but the core of SQL is a declarative, or specification language. You specify what you want, and the query processor writes a program to meet that specification.
In principle, whether you write semantically-equivalent query specifications using JOIN, EXISTS, or EXCEPT should not matter. In practice, not every possible transformation between semantically identical constructions is considered, so the written form of the query can have an effect on the final program (query plan).
The general recommendation is to start off by writing a query specification using the most natural syntax. If you want to join rows, use (OUTER) JOIN. If you want to check that a match exists (or not) in another table, use (NOT) EXISTS. If you want distinct items from one set that do (or do not) appear in another set, use EXCEPT or INTERSECT.
If the program SQL Server comes up with (some of which you can see in the graphical or XML show plan output) is not optimal, ensure you have good estimates (statistics),appropriate indexing, and a relational design. If the plan is suboptimal because of limitations in SQL Server's model, or its ability to reason about alternatives, consider alternative forms of query syntax. Ultimately, you can also encourage the engine to come up with a better plan by the use of hints or other advanced techniques.
It is quite wrong to compare the 'performance' of JOIN versus EXISTS and so on - the only thing you can meaningfully compare is (what you can see of) the program written by SQL Server to meet the specification you gave.
As an aside, the queries in your test rig have different semantics - they will produce different results in some cases. For example, the OUTER JOIN has different behaviour regarding duplicates in the preserved side compared with EXISTS. In general, different query syntax tends to have (often subtle) differences in meaning, particularly with respect to NULLs and cardinality, and some logical constructions (like EXISTS) can logically stop searching as soon as a match is found. Likewise, a NOT EXISTS can stop searching as soon as it does find a match.
I don't really expect anything I say to prevent people debating whether JOINs are inherently 'faster' than EXISTS or not (sigh).
Last thing - please be careful to specify NOT NULL or NULL on your columns. I have no idea if you intended SomeValue to be NULL in the #blah table for FabricatedIDs higher than 255, but they are there. On a SQL Server with a different effective setting for the ANSI NULL default options, your script would produce an error.
July 28, 2011 at 4:39 am
Paul,
Thank you for that post. It was very informative.
July 28, 2011 at 7:21 am
"If the program SQL Server comes up with (some of which you can see in the graphical or XML show plan output) is not optimal, ensure you have good estimates (statistics),appropriate indexing, and a relational design. If the plan is suboptimal because of limitations in SQL Server's model, or its ability to reason about alternatives, consider alternative forms of query syntax. Ultimately, you can also encourage the engine to come up with a better plan by the use of hints or other advanced techniques."
Well stated; this one paragraph pretty much says it all, no only about this particular subject but most other query performance issues period.
The probability of survival is inversely proportional to the angle of arrival.
July 29, 2011 at 1:54 pm
Paul,
I found your explanation quite informative. I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.
I don't use EXCEPT much except in update triggers to determine if something actually changed or not. Something on the order of:
SELECT ItemNO, Price
INTO #SomeTemp
FROM inserted
EXCEPT
SELECT ItemNo, Price
FROM deleted
I haven't noticed any performance issues with this, but the Item table updates usually involve only a few records.
I'll have to give this a think for future use in case I have to deal with a huge number of updates at once.
Todd Fifield
July 29, 2011 at 2:59 pm
tfifield (7/29/2011)
I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.
Hi Todd,
It depends whether the logical DISTINCT can be optimized away, for example if there is a uniqueness constraint. In your example, perhaps ItemNo is enforced to be unique, so there would likely be no sort distinct or grouping aggregate needed in the query plan. In any case, if performance is good with the natural EXCEPT syntax, it's all good!
July 30, 2011 at 10:56 am
SQLkiwi (7/29/2011)
tfifield (7/29/2011)
I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.Hi Todd,
It depends whether the logical DISTINCT can be optimized away, for example if there is a uniqueness constraint. In your example, perhaps ItemNo is enforced to be unique, so there would likely be no sort distinct or grouping aggregate needed in the query plan. In any case, if performance is good with the natural EXCEPT syntax, it's all good!
Paul,
Another good point on unique constraints. I do think about them when designing tables in order to get better query plans. I really have to think these things through a bit better when choosing which query operator to use.
Todd Fifield
July 30, 2011 at 8:42 pm
There's another thing to consider. IIRC, both INTERSECT and EXCEPT essentially do a NULL = NULL comparison without any special considerations. If such a thing is required by the nature of the data, INTERSECT and EXCEPT could end beating things like INNER JOIN with NULL handling.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2011 at 9:11 pm
Jeff Moden (7/30/2011)
There's another thing to consider. IIRC, both INTERSECT and EXCEPT essentially do a NULL = NULL comparison without any special considerations. If such a thing is required by the nature of the data, INTERSECT and EXCEPT could end beating things like INNER JOIN with NULL handling.
Yep.
August 1, 2011 at 12:06 pm
I wanted to thank everyone (particularly Paul) for helping me find, or at least learn what to go research, in regards to my confusion. I'm sorry I haven't been particularly active but I did want to thank you.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply