February 21, 2007 at 6:25 pm
I read an article the other day that advised against the use of the NOT IN and NOT EXISTS statements. It said that even if the system finds the value in the subselect it will continue checking every single record rather than quitting there and then. It said that there should only be a few occasions where you need to use a negative clause and you should try to rewrite your queries to use IN or EXISTS instead.
So I have found a couple of instances of the use of not in, in my application.
Could someone give me some advice of how I could re-write the following insert statement so that it doesn't use NOT IN.
The insert statement is within a proc that is used to add selected candidates to selected folders from a website. The user could select 1 or more candidates and then select 1 or more folders to add them to. The candidate IDs and FolderIDs are passed into the proc as two comma seperated strings. I then use a udf SPLIT function that splits a string on a delimiter and a cross join to populate the link table all in one hit. I thought this would be a better idea than either looping through either the cands or folders one at a time and doing multiple inserts into the table. I only want to insert the candidate into the appropriate folder if they don't already exist and thats where the NOT IN comes in.
A simplified version is:
declare @CandIDs varchar(500), @FolderIDs varchar(500)
insert into tbl_FOLDER_CANDIDATES
(FolderID,CandidateID)
select a.[Value], b.[Value]
from dbo.udf_SPLIT(@CandIDs,',') as a, dbo.udf_SPLIT(@FolderIDs,',') as b
where a.[Value] not in(select CandidateID from tbl_FOLDER_CANDIDATES where FolderID=b.[Value])
I haven't had any problems with this method so far but taking on board this articles advice I'd like to know if I could improve it by replacing the NOT IN.
Any help would be much appreciated and I thank you in advance.
February 22, 2007 at 8:05 am
i don't see how/if A is related to b, but it is something like this i think: maybe left or right joins, you'd have to examine the actual data:
select a.[Value], b.[Value]
from tbl_FOLDER_CANDIDATES as c
FULL OUTER JOIN dbo.udf_SPLIT(@CandIDs,',') as a ON c.CandidateID =a.[Value]
FULL OUTER JOIN dbo.udf_SPLIT(@FolderIDs,',') as b ON c.FolderID = b.[Value]
--how is a related to b??? only thru C i think?
where c.candidateId is NULL
Lowell
February 22, 2007 at 9:24 am
Something like the following gets rid of the NOT IN but introduces the overhead of an OUTER JOIN.
After checking the same results are produced, you will need to look at the execution plan to see if it is better.
INSERT INTO tbl_FOLDER_CANDIDATES (CandidateID, FolderID)
SELECT C.Value, F.Value
FROM dbo.udf_SPLIT(@CandIDs,',') C
CROSS JOIN dbo.udf_SPLIT(@FolderIDs,',') F
LEFT JOIN tbl_FOLDER_CANDIDATES T
ON C.Value = T.CandidateID
AND F.Value = T.FolderID
WHERE T.CandidateID IS NULL
February 22, 2007 at 9:32 am
Ok thanks for both of those I will try them out soon and compare the Plans.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply