The Pitfall of "Not Equal To" operator in queries!
INTRODUCTION
The "Not Equal To" () operator is useful when you need to filter your data and exclude some rows from the result of query, but there are cases that when you really mean MyColumn<>'SomeValue', you should not use this operator in order to get the correct result!
Here is an example: Suppose that a Testing Center registers students for IT exams. Obviously each student can take more than one exam. Let's create the required tables and populate them with some rows:
CREATE TABLE Students ( StID INT PRIMAEY KEY, StName NVARCHAR(50)) GO INSERT Students VALUES (1,'Jack') INSERT Students VALUES (2,'Anna') INSERT Students VALUES (3,'Bob') GO CREATE TABLE StudentExam ( StID INT, ExamName VARCHAR(50)) INSERT StudentExam VALUES (1,'SQL Server') INSERT StudentExam VALUES (2,'VB.NET') INSERT StudentExam VALUES (2,'C#.NET') INSERT StudentExam VALUES (1,'XML')
In a perfect design, we would need another table called Exams which stores the full specifications of each exam, but I have denormalized it with StudentExam table for the sake of simplification. Now we are asked to prepare a report that lists information of students who have taken SQL Server exam:
SELECT s.* FROM Students s JOIN StudentExam se ON s.StID=se.StID WHERE se.ExamName='SQL Server' StID StName ----------- -------------------------------------------------- 1 Jack (1 row(s) affected)
Well, everything is OK so far, but the issue turns up when we are asked to query the students who have NOT taken SQL Server exam. The first thing that might come to mind is replacing the "Equal" operator in WHERE clause with "Not Equal To" operator:
SELECT s.* FROM Students s JOIN StudentExam se ON s.StID=se.StID WHERE se.ExamName<>'SQL Server' StID StName ----------- -------------------------------------------------- 1 Jack 2 Anna 2 Anna (3 row(s) affected)
Note that although Bob did not appear in the result because he has not taken any exam to satisfy an INNER JOIN, again Jack is there!
The subtle reason of this is that Jack still satisfies the WHERE clause because of his second exam, XML! This exam causes the WHERE clause return "True" when he is being checked by the condition. Basically you must take care of this problem when dealing with one-to-many joins. If the relationship between Students and StudentExam was one-to-one (each student could take only one exam), the previous query would work fine.
To produce the correct report, you can simply use a subquery to build the list of students who have taken SQL Server exam:
SELECT StID FROM StudentExam WHERE ExamName='SQL Server'
Then any student that does not appear in this list, must be returned for our report:
SELECT * FROM Students WHERE StID NOT IN (SELECT StID FROM StudentExam WHERE ExamName='SQL Server')
As you see in the graphical execution plan, an OUTER JOIN is performed by the Query Optimizer. Alternatively you may write this query and use an OUTER JOIN yourself:
SELECT s.* FROM Students s LEFT JOIN (SELECT StID FROM StudentExam WHERE ExamName='SQL Server') se ON s.StID=se.StID WHERE se.StID IS NULL