August 20, 2013 at 4:24 am
The system I'm working on uses a lot of subqueries, and sums the results using UNIONS, which seems very fast. However, sometimes it also needs to exclude data sets as well. It is currently using EXCEPT, and this seems a lot slower.
I am looking at redeveloping this part of the code to make it more efficient, but I don't know which method will be the fastest. SQL 2012 often seems to be fairly smart when it comes to optimising queries, so I'm wondering if I will see any significant improvement by trying to use a JOIN or NOT IN instead.
My subqueries typically return 10's of millions of records, but only has one numeric column in each one.
Any suggestions are very gratefully recieved
August 20, 2013 at 4:50 am
Not sure about the EXCEPT, but...
http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/
http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 21, 2013 at 7:17 am
I have not used EXCEPT often, but when I have, I have found it very fast. The circumstances were very circumscribed: from a listing of about 200K phone numbers broken down into their three component parts, I used the except to determine which ones needed to be added to a list of millions. Usually about half were new. The alternative would have been to use a left join on three fields, which was slower.
I'm not saying it's faster in all or even most cases, but it has its place.
August 21, 2013 at 7:23 am
at least avoid use of NOT IN rather use NOT Exists
Pramod
SQL Server DBA | MCSE SQL Server 2012/2014
in.linkedin.com/in/pramodsingla/
http://pramodsingla.wordpress.com/
August 21, 2013 at 12:05 pm
Keep in mind that when you use EXCEPT you get distinct rows, so if you want to mimic it using NOT EXISTS (counting you have duplicate rows) then you will have to add DISTINCT sub-clause or GROUP BY clause appropriately and here is when you can have overhead if there is no index to support the operation.
Check both execution plans to see the differences.
--
AMB
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply