October 24, 2018 at 5:30 am
A little while a go I was told that NOT EXISTS will "never perform better" than NOT IN, and that NOT EXISTS is the "worst of all options". personally, this wasn't something I had experienced, and so in the interest of learning asked for any evidence of citation for such a claim; I was met with the answer "I'm an expert" and "Any citation will be by someone not as less experience, so take my word as the truth." (this is barely paraphrasing). This person claimed themselves as a SQL Expert, however, could give themselves no accreditation, and I personally had no seen their input anywhere else before (although they did have the "points", sql server was not an area that provided those). The attitude really annoyed me, but I had more pressing projects to work on, and so left it until I had such time to test.
So, today I spent a little time doing a quick comparison of the 3 options: NOT IN, LEFT JOIN ... WHERE {left table PK} IS NULL and NOT EXISTS. The queries I used are below:
I ran these against our Dev server at work; not the fastest machine, but still it very quickly proved a point, as the output was as follows:NOT IN clause
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(3168 rows affected)
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 1794, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brcledger'. Scan count 5, logical reads 194280, physical reads 1, read-ahead reads 186739, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 1, read-ahead reads 628, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3206502 ms, elapsed time = 3204974 ms.
LEFT JOIN and WHERE {left table PK} IS NULL
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(3168 rows affected)
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 7, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brcledger'. Scan count 35299, logical reads 230952, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1781 ms, elapsed time = 483 ms.
NOT EXISTS clause
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(3168 rows affected)
Table 'icp_brcledger'. Scan count 38058, logical reads 167725, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 7, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 828 ms, elapsed time = 245 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
The NOT IN was staggering slower.
So, this got me back to thinking; why was I being told that NOT IN is better? Personally, like i said, I'd had better experience with NOT EXISTS. Also Redgate's (and I'm not saying Redgate are infallible) SQL Server prompt recommends the use of EXISTS over IN. Obviously there must be a reason behind it.
Now, this could be a case of my indexing, but all 3 queries were written to do the same thing, and return the same number of rows (3168), so I see this as unlikely. Perhaps someone more obliging could share their experience, and give me a better reasoning than "trust me I'm a (self proclaimed) expert".
Note: Adding the clause BTX.Polref@ = BPY.Polref@ in the NOT IN brings performance far closer to NOT EXISTS and LEFT JOIN (rerun below), but not as good. I excluded this in the original, as the "expert" used the format I have in the above SQL.
NOT IN clause with additional BTX.Polref = BPY.PolRef@ clause
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(3168 rows affected)
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brcledger'. Scan count 5, logical reads 194415, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 7, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2922 ms, elapsed time = 784 ms.
LEFT JOIN and WHERE {left table PK} IS NULL
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(3168 rows affected)
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 7, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brcledger'. Scan count 35299, logical reads 230952, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1564 ms, elapsed time = 404 ms.
NOT EXISTS clause
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 94 ms, elapsed time = 101 ms.
(3168 rows affected)
Table 'icp_brcledger'. Scan count 38058, logical reads 167729, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 938 ms, elapsed time = 244 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Either way, in both cases NOT EXISTS was the winner in elapsed time.
Thanks all!
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 24, 2018 at 6:39 am
"SQL Expert" can compare their credentials with Gail, I can already guess how that's going to pan out.
https://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
October 24, 2018 at 7:33 am
SELECT BCM.B@,
BCM.Ref@,
BPY.Polref@
FROM dbo.ic_yyclient BCM
JOIN dbo.ic_brpolicy BPY ON BCM.B@ = BPY.B@ AND BCM.Ref@ = BPY.Ref@
WHERE BCM.B@ != 3
AND BCM.Ref@ != 'XXXX01'
AND BPY.Term_date IS NULL
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Polref@ = BPY.PolRef@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));
October 24, 2018 at 7:40 am
Jonathan AC Roberts - Wednesday, October 24, 2018 7:33 AMI'm just wondering why you didn't include "AND BTX.Polref@ = BPY.PolRef@" in the WHERE clause of the NOT IN statement?SELECT BCM.B@,
BCM.Ref@,
BPY.Polref@
FROM dbo.ic_yyclient BCM
JOIN dbo.ic_brpolicy BPY ON BCM.B@ = BPY.B@ AND BCM.Ref@ = BPY.Ref@
WHERE BCM.B@ != 3
AND BCM.Ref@ != 'XXXX01'
AND BPY.Term_date IS NULL
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Polref@ = BPY.PolRef@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));
Covered that in my post too 🙂
Thom A - Wednesday, October 24, 2018 5:30 AMNote: Adding the clause BTX.Polref@ = BPY.Polref@ in the NOT IN brings performance far closer to NOT EXISTS and LEFT JOIN (rerun below), but not as good. I excluded this in the original, as the "expert" used the format I have in the above SQL.
When I did add that, however, NOT EXISTS still performs better.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 24, 2018 at 7:50 am
Thom A - Wednesday, October 24, 2018 7:40 AMJonathan AC Roberts - Wednesday, October 24, 2018 7:33 AMI'm just wondering why you didn't include "AND BTX.Polref@ = BPY.PolRef@" in the WHERE clause of the NOT IN statement?SELECT BCM.B@,
BCM.Ref@,
BPY.Polref@
FROM dbo.ic_yyclient BCM
JOIN dbo.ic_brpolicy BPY ON BCM.B@ = BPY.B@ AND BCM.Ref@ = BPY.Ref@
WHERE BCM.B@ != 3
AND BCM.Ref@ != 'XXXX01'
AND BPY.Term_date IS NULL
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Polref@ = BPY.PolRef@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));Covered that in my post too 🙂
Thom A - Wednesday, October 24, 2018 5:30 AMNote: Adding the clause BTX.Polref@ = BPY.Polref@ in the NOT IN brings performance far closer to NOT EXISTS and LEFT JOIN (rerun below), but not as good. I excluded this in the original, as the "expert" used the format I have in the above SQL.When I did add that, however, NOT EXISTS still performs better.
Oh, I hadn't read the full post.
I always use NOT EXISTS over NOT IN. I'd never heard anyone say EXISTS is a poorer performer.
Did you look at the execution plans?
October 24, 2018 at 8:19 am
Jonathan AC Roberts - Wednesday, October 24, 2018 7:50 AMOh, I hadn't read the full post.
I always use NOT EXISTS over NOT IN. I'd never heard anyone say EXISTS is a poorer performer.
Did you look at the execution plans?
The NOT IN (when using BTX.Polref@ = BPY.Polref@) and NOT EXISTS have identical plans, the LEFT JOIN has a different one.The estimated plan does differ if the BTX.Polref@ = BPY.Polref@ isn't included in the NOT IN, there are additional parallelism added and the Clustered Index Seek on the Ledger changes to a Clustered Index Scan.
Like yourself, I always use EXISTS over IN, so why I'm asking here. I've not really witnessed anyone be so adamant about something and then not back it up (apart from saying, trust me, I'm an expert). if what theya re saying is true, I'm certainly not seeing it.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 24, 2018 at 8:39 am
I must ask, Thom, if you've compared the equivalent EXCEPT operator.
October 24, 2018 at 8:41 am
I've never actually played with EXISTS / NOT EXISTS, so I should probably try it out and see if it helps with our ITD feed processes, which are taking longer and longer to run.
I'll look at your code, see if I can adapt it, and verify whether or not it works for my specific scenario.
October 24, 2018 at 8:46 am
Okay, so I do have an additional question. Why use the specific WHERE clauses you've used? Why not test the NOT IN with specific values?
Why are you using this:
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));
instead of this?
AND BPY.PolRef@ NOT IN ('value1','value2','value3');
I usually work with specific, short lists of things a value can't be or can be. So I'm curious if NOT EXISTS is actually better than calling out specific values in NOT IN or if it only works better when having to do a subquery.
October 24, 2018 at 8:51 am
Thom A - Wednesday, October 24, 2018 8:19 AMJonathan AC Roberts - Wednesday, October 24, 2018 7:50 AMOh, I hadn't read the full post.
I always use NOT EXISTS over NOT IN. I'd never heard anyone say EXISTS is a poorer performer.
Did you look at the execution plans?The NOT IN (when using BTX.Polref@ = BPY.Polref@) and NOT EXISTS have identical plans, the LEFT JOIN has a different one.The estimated plan does differ if the BTX.Polref@ = BPY.Polref@ isn't included in the NOT IN, there are additional parallelism added and the Clustered Index Seek on the Ledger changes to a Clustered Index Scan.
Like yourself, I always use EXISTS over IN, so why I'm asking here. I've not really witnessed anyone be so adamant about something and then not back it up (apart from saying, trust me, I'm an expert). if what theya re saying is true, I'm certainly not seeing it.
Looks to me like you've met someone full of BS. (and there are plenty of people like that about)
October 24, 2018 at 8:59 am
Brandie Tarvin - Wednesday, October 24, 2018 8:46 AMOkay, so I do have an additional question. Why use the specific WHERE clauses you've used? Why not test the NOT IN with specific values?Why are you using this:
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));instead of this?
AND BPY.PolRef@ NOT IN ('value1','value2','value3');
I usually work with specific, short lists of things a value can't be or can be. So I'm curious if NOT EXISTS is actually better than calling out specific values in NOT IN or if it only works better when having to do a subquery.
It's not a short list, Brandie (there are 225810 distinct B@ + Polref@ values in the ledger). If were a small number, I totally agree, I'd use NOT IN ('Value1','Value2',etc...).
The query I've actually used (well the NOT EXISTS), isn't actually one I would normally use (it's not an EXISTS), but it was a base line query I could quickly change to a NOT EXISTS. Normally the query I have would look like this (just the FROM AND WHERE):
That filters my dataset to live (insurance) policies only (as users have a habit of creating a policy, and then instead of marking the termination date, simply delete the transaction and forget about it... /sigh).
From my perspective, what I have works great, both NOT EXISTS and EXISTS, and instead I'm wonder why I have been told with such determination that the method in "the worst", when it clearly isn't. Perhaps it's a hang over from an older version of SQL Server? Perhaps it's something that's changed in the new cardinality estimator (which I don't have access to till me upgrade next(?) year). Perhaps it's that the person just mistaken and refuses to accept that actually they're wrong. I'm using this as a learning exercise to see if I'm missing something obvious or if, going forward, when using a (NOT) IN/EXISTS I should change to IN, from EXISTS.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 24, 2018 at 10:24 am
Brandie Tarvin - Wednesday, October 24, 2018 8:46 AMOkay, so I do have an additional question. Why use the specific WHERE clauses you've used? Why not test the NOT IN with specific values?Why are you using this:
AND BPY.PolRef@ NOT IN (SELECT BTX.Polref@
FROM dbo.ic_brcledger BTX
WHERE BTX.B@ = BPY.B@
AND BTX.Trantype IN ('New Business','Transfrd NB','Renewal'));instead of this?
AND BPY.PolRef@ NOT IN ('value1','value2','value3');
I usually work with specific, short lists of things a value can't be or can be. So I'm curious if NOT EXISTS is actually better than calling out specific values in NOT IN or if it only works better when having to do a subquery.
Looks to me like the polref relationship is complicated, when the requirement given was not to include the trantypes listed. Users probably can't give a good list for the polref.
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
October 24, 2018 at 2:50 pm
One thing to take into consideration is that the first time you ran the test - the NOT IN actually loaded the buffer cache which is probably the majority of the performance hit. Look at the read-ahead reads for icp_brpolicy and icp_brcledger...
The second execution of the NOT IN with the added parameter does not have those reads.
What happens if you switch up the order and run the NOT EXISTS first and then NOT IN?
Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 1794, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_brcledger'. Scan count 5, logical reads 194280, physical reads 1, read-ahead reads 186739, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 1, read-ahead reads 628, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
October 24, 2018 at 6:04 pm
Ask the ring knocker if he likes pork chops and then feed him your results.
Lordy, I hate such "experts". As always, one good test is worth a thousand expert opinions.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 25, 2018 at 4:48 am
Jeff Moden - Wednesday, October 24, 2018 6:04 PMAsk the ring knocker if he likes pork chops and then feed him your results.Lordy, I hate such "experts". As always, one good test is worth a thousand expert opinions.
In my expert opinion, there is no such thing as a good test.
<duck> @=)
Viewing 15 posts - 1 through 15 (of 32 total)
You must be logged in to reply to this topic. Login to reply