August 31, 2009 at 11:31 am
rja.carnegie (8/31/2009)
There could be a role for this IN equivalent (I hope),
WHERE ( col = @value1 OR col = @value2 OR col = @value3 )
in which one or more of @value1, @value2, @value3 may be NULL, of course.
Is it better in that case to rewrite the query separately with one condition fewer, or to re-use one version? I would like to think that the server is smart enough not to spend much time on the comparison to NULL, which counts against writing a separate version for that case.
Sorry for beating this to death, but
You HAVE TO rewrite the query separately. If @value3 is NULL, for example, you have to re-write as
...WHERE col IN (@value1, @value2) OR col IS NULL
which will succeed if col matches @value1 or @value2, or if col is NULL.
...WHERE col IN (@value1, @value2, @value3 )
will ALWAYS fail if @value3 is NULL no matter what value is in col.
But, if your WHERE col IN
clause is a subselect::
...WHERE col IN (SELECT othercol FROM othertable)
you don't have that luxury and should avoid using IN if NULLs might be returned. In this simple case just use an INNER JOIN:
...INNER JOIN othertable ON col = othercol
if you don't want to match on NULLs and
...INNER JOIN othertable ON col = othercol OR col IS NULL
if you do.
August 31, 2009 at 6:29 pm
Andy DBA (8/31/2009)
Sorry for beating this to death, but
As am I. I think we've covered this ground already, and determined that the counter-intuitive behaviour is limited to NOT IN and NOT EXISTS...
Andy DBA (8/31/2009)
You HAVE TO rewrite the query separately. If @value3 is NULL, for example, you have to re-write as
...WHERE col IN (@value1, @value2) OR col IS NULL
which will succeed if col matches @value1 or @value2, or if col is NULL.
...WHERE col IN (@value1, @value2, @value3 )
will ALWAYS fail if @value3 is NULL no matter what value is in col.
Tsk tsk for the CAPS ;-). Also, what you have stated is incomplete or wrong, depending on how charitable the reader is:
SET ANSI_NULLS ON;
DECLARE @T TABLE (A INT NULL);
INSERT @T VALUES (1);
INSERT @T VALUES (2);
INSERT @T VALUES (NULL);
SELECT A FROM @T WHERE A IN (1, 2) -- 2 rows
SELECT A FROM @T WHERE A IN (1, 2, NULL) -- 2 rows
SELECT A FROM @T WHERE A IN (NULL) -- no rows! (query plan is a constant scan)
SET ANSI_NULLS OFF;
SELECT A FROM @T WHERE A IN (1, 2) -- 2 rows
SELECT A FROM @T WHERE A IN (1, 2, NULL) -- 3 rows!
SELECT A FROM @T WHERE A IN (NULL) -- 1 row!
SET ANSI_NULLS ON;
Andy DBA (8/31/2009)
But, if yourWHERE col IN
clause is a subselect::
...WHERE col IN (SELECT othercol FROM othertable)
you don't have that luxury and should avoid using IN if NULLs might be returned. In this simple case just use an INNER JOIN:
...INNER JOIN othertable ON col = othercol
if you don't want to match on NULLs and
...INNER JOIN othertable ON col = othercol OR col IS NULL
if you do.
This adds nothing new. NULL behaviour with IN is well documented and logical.
Consider:
SET ANSI_NULLS ON;
DECLARE @T TABLE (A INT NULL);
DECLARE @X TABLE (A INT NULL);
INSERT @T VALUES (1);
INSERT @T VALUES (2);
INSERT @T VALUES (NULL);
INSERT @X VALUES (1);
INSERT @X VALUES (NULL);
SELECT A FROM @X WHERE A IN (SELECT A FROM @T) -- 1 row (where A = 1)
SET ANSI_NULLS OFF;
SELECT A FROM @X WHERE A IN (SELECT A FROM @T) -- 2 rows (the NULL also matches now)
SET ANSI_NULLS ON;
Had you been referring to NOT IN, you would have more of a point. But again, this has been flogged to death already.
For completeness then:
SET ANSI_NULLS ON;
DECLARE @T TABLE (A INT NULL);
INSERT @T VALUES (1);
INSERT @T VALUES (2);
INSERT @T VALUES (NULL);
SELECT A FROM @T WHERE A NOT IN (1, 2) -- no rows
SELECT A FROM @T WHERE A NOT IN (1, 2, NULL) -- no rows
SELECT A FROM @T WHERE A NOT IN (NULL) -- no rows
SET ANSI_NULLS OFF;
SELECT A FROM @T WHERE A NOT IN (1, 2) -- 1 row (for the NULL)
SELECT A FROM @T WHERE A NOT IN (1, 2, NULL) -- no rows
SELECT A FROM @T WHERE A NOT IN (NULL) -- 2 rows! (for the non-NULL values)
SET ANSI_NULLS ON;
SET ANSI_NULLS ON;
DECLARE @T TABLE (A INT NULL);
DECLARE @X TABLE (A INT NULL);
INSERT @T VALUES (1);
INSERT @T VALUES (2);
INSERT @T VALUES (NULL);
INSERT @X VALUES (1);
INSERT @X VALUES (NULL);
SELECT A FROM @X WHERE A NOT IN (SELECT A FROM @T) -- No rows
SET ANSI_NULLS OFF;
SELECT A FROM @X WHERE A NOT IN (SELECT A FROM @T) -- No rows
SET ANSI_NULLS ON;
Paul
September 27, 2009 at 5:46 pm
These are my 2 cents are the OA:
RE Capitalizing keywords
IMO, this is an outdated suggestion. Back in the day when people used monochrome editors, it made sense. It might even make sense when posting to a forum which only shows code in the same color but in general, I prefer pascal casing as it is much easier to read. Frankly, what matters most is consistency and next is readability.
RE: "Use as few as possible variables."
Nonsense. I challenge the author to show noticiable metrics on the performance difference. IMO, clarity of code is almost always more important than fractional performance differences due to suggestions like this.
RE: Dynamic queries
If you use sp_executesql, then you do get the benefit of query plan reuse which the author point out later. Recompilation isn't the reason against dynamic queries. Obfuscation of code and security issues are the primary reasons against it.
RE: Recompiles
GilaMonster's post on 10-Aug says it quite well, "recompiles aren't always bad."
RE: Set vs Select
Again, clarity of code wins over fractional performance gains. The Set statement illustrates a far clearer intent than a Select statement to set variables. However, using the Select statement to set a bunch of variables should be seen as more of a coding shortcut than a real performance improver.
RE: Cast vs Convert
I would go further and emphasize that the reason to Cast is to make your intent clear. Convert can also be used to format the output which should be avoided in database access code.
RE: Avoid Distinct
Does Distinct really perform any better than Group By? Probably not. It is obviously a good idea to eliminate unnecessary uses of Distinct.
RE: Select Into
My preference would be for developers to avoid Select Into while it is fine for administrative tasks. The reason, again, is clarity of code especially when the resultset contains a large number of columns.
December 1, 2009 at 5:48 am
Sorry Arup, I wanted to rate this topic as 5 and by mistake I rated 1. Apologies for this.
The article is very good and helpful.
Regards,
Meenakshi Pande
December 9, 2010 at 12:20 pm
In regard to operators, just quickly concerning negating operations, ie... someColumn <> 'string' ,
don't forget that rows with a null in that column will not be returned. I just use
where somecolumn != 'string' or someColumn is null.
if that what's intended.
----------------------------------------------------
Viewing 5 posts - 241 through 244 (of 244 total)
You must be logged in to reply to this topic. Login to reply