NULL Equals NULL?

  • Thanks Jack, that is really good to know. That is the assumption I was leaning towards since it is the only thing that seemed to make sense to me, but it is really nice to hear someone else confirm it.

    I love getting all the details I can to help my queries perform better, so thanks for helping me increase my knowledge. 🙂

  • Thanks for your quick reply, Jack. I will give that a shot. I'm not always good at asking the right question. In my problem query, I have 10 columns of data. In each row, only one of the columns produces any data - the rest are blank. So, I'm trying to create an "AS" column that concatenates all ten columns into one (I could do this easily in Access 2007 using the IIF statement, now I've "moved up" to MS SQL). (All of this has to do with populating a Grid View for production credits for my film reviews - I'm primarily a writer, so please laugh along with me as I trip over jargon and try to venture into your area of expertise).

    Cheers,

    James:)

  • Cimode (2/26/2007)


    Obviously the article is posted by somebody who has no clue about database fundamental theory...

    [snip]

    Cimode (2/26/2007) questions whether 3VL is useful.

    Sure, NULL values complicate applications, because they cannot be treated like other values. But that is true for application languages as well. There are ways to avoid NULLs in the database and still represent UNKNOWN without the use of magic values. I'm not an expert on that, and leave it for someone else to discuss. And you'll still have to deal with NULL values AFAIK.

    However, reality is that the world is full of databases with NULL values in them. We can choose to work with them - or limit our employment opportunities to companies who don't allow NULLs at all. I'm afraid I'd be hungry if I did... :->

    Sooo - think about NULL this way:

    NULL means UNKNOWN.

    This is different from 0 or nothing. An example is "middle name". If your source data only has first and last name, the middle name will be UNKNOWN, and NULL is the best representation for middle name in a single table implementation of names. If you refine the name data, you will find that some people do not have a middle name. Then their middle name is no longer unknown, and should not be represented by a NULL value.

    NULL means ANY OF THE ABOVE - WE DON'T KNOW.

    A null doesn't represent a single value - it represents any of the possible values. It would conceivably match any specific value, we just don't know which one. Therefore we can not match it against anything. Including another unknown. If x is unknown and y is unknown, do we know if x = y? No. Do we know if x <> y? x < y? x > y? The only thing we know is that they are both unknown.

    I'm not sure what you are trying to prove with your "proofs", but it is not clear that you have a better grasp on this than most. You are either just expressing dislike with the way things are (3VL), or you haven't quite understood the use and consequences of NULLs. Assuming you do understand it better than most, the following may be useful for those that don't:

    Cimode's Proof1:

    -----------------

    Table with:

    field1 field2

    1 1

    2 1

    3 2

    4 NULL

    You say:

    "select sum(field1 + field2) from table1

    --> it returns 10

    then run...

    select(field1) + sum(field2) from table1

    which should produce the same righ. Wrong!!

    --> it return 14 !

    So use of NULLS will mess up your sum results...

    "

    Take a closer look at how these two examples evaluate - that will help clarify why you get two different results:

    select sum(field1 + field2) from table1

    will do field1 + field2 first, which gives

    1 + 1 = 2

    2 + 1 = 3

    3 + 2 = 5

    4 + NULL = NULL

    Then it will aggregate the resulting values.

    Aggregate functions ignore NULLs. This is well known and well documented. For instance, BOL states "Null values in a column are ignored while an aggregate function is operating" (See Aggregate Functions > Null Values). The ANSI SQL standard treats them the same way. (Note: Count(*) can be considered a special case). In the ANSI-92 SQL, Aggregate functions are called "Set functions" (a quick search didn't yield a SQL-1999 or SQL-2003 standard online text):

    "6.5 ", "General Rules":

    ...

    1) Case:

    a) If COUNT(*) is specified, then the result is the cardinality

    of T.

    b) Otherwise, let TX be the single-column table that is the

    result of applying the to each row of T

    and eliminating null values. If one or more null values are

    eliminated, then a completion condition is raised: warning-

    null value eliminated in set function.

    "

    (Quote from http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt)

    The aggregate Sum() yields:

    Sum(field1 + field2) = Sum({2, 3, 5, NULL}) = Sum({2, 3, 5}) = 10

    The other expression:

    Sum(field1) + Sum(field2)

    will give you:

    Sum({1, 2, 3, 4}) + Sum({1, 1, 2, NULL}) = 10 + Sum({1, 1, 2}) = 14

    Your "proofs" support the article quite nicely, it is just your dislike of having to deal with the UNKNOWN that is questionable... :->


    Regards,

    Tore Bostrup

  • I just performed a quick + dirty test to see the performance difference of using COALESCE in the WHERE clause. I ran three tests on the same simple data set:

    - A query of the form WHERE COALESCE(column, 1) = 1 generated a query plan with a cost of 0.0328005

    - Two queries, one with a WHERE column = 1 and another with WHERE column is NULL, unioned together with UNION ALL produced a query plan with a significantly lower cost of 0.008941

    - Changing the query to WHERE column = 1 OR column IS NULL produced a query plan with a cost of 0.0058602, almost 6 times better than the COALESCE version

    Using COALESCE, ISNULL, or any functions in the WHERE clause or in the ON clause for join criteria will kill your performance. As for COALESCE, it's too bad the optimizer isn't smart enough to realize that COALESCE(column, 1) is equivalent to (column = 1 OR column IS NULL).

  • FYI, COALESCE worked perfectly - one less problem to solve!

    Thanks for your patience and advice.

    Happy Thanksgiving, Canadian style:)

    James

  • Hello all,

    Formatting questions for VWD 2008.

    How to turn 9051234567 into (905) 123-4567

    L2N7L5 to L2N 7L5

    So easy to do in Access and Adobe but I've hit another brick wall in SQL.:w00t:

    Suggestions?

    Thanks!

    James

  • James,

    This question should really be posted in the Development Forum, not in the discussion forum for an article about nulls. You will likely get more responses there.

    As far as formatting goes, when you are using SQL Server you typically would handle formatting issues in the UI, not the database. While you can do string manipulation and formatting in SQL it is not designed to do that well. Although with CLR programming and UDT's you could probably accomplish it easier now. You have to remember that Access includes UI and DB functionality together, so it has more formatting functionality.

  • Thanks Jack - always helpful.

    Cheers,

    James

  • Sorry if this has already been covered but after reading 6 pages of unconstructive criticism by cimode I couldn't take any more.

    Regarding Rule 4, I had an issue when using coalesce in a procedure containing the following query

    UPDATECustomer

    SETSalutation = HF.Title,

    Initials = HF.Initials,

    Country = HF.Country,

    Email = HF.Email,

    Telephone1 = HF.Telephone,

    GiftAidFlag = CASE HF.GiftAidFlag WHEN 'Y' THEN 1 ELSE 0 END,

    AscentID = HF.AscentID

    FROMCustomer

    INNER JOIN[TMP.Housefile] HF ON Customer.Hashtext = HF.Hashtext

    --NOTE DON'T USE COALESCE FUNCTION HERE AS QUERY EXECUTION PLAN IS DIFFERENT SOMEHOW (COULDN'T SEE IN DESIGNER)

    --AND TAKES TOO LONG TO EXECUTE (IF IT DOES)

    WHEREisnull(Customer.AscentID,space(0)) = space(0)

  • Sorry if this has already been covered but after reading 6 pages of unconstructive criticism by cimode I couldn't take any more.

    Regarding Rule 4, I had an issue when using coalesce in a procedure containing the following query

    UPDATECustomer

    SETSalutation = HF.Title,

    Initials = HF.Initials,

    Country = HF.Country,

    Email = HF.Email,

    Telephone1 = HF.Telephone,

    GiftAidFlag = CASE HF.GiftAidFlag WHEN 'Y' THEN 1 ELSE 0 END,

    AscentID = HF.AscentID

    FROMCustomer

    INNER JOIN[TMPHousefile] HF ON Customer.Hashtext = HF.Hashtext

    WHEREcoalesce(Customer.AscentID,space(0)) = space(0)

    Essentially, this query takes forever to execute (as you may have guessed from the comments in my previous misposting), however as soon as I change the where clause to isnull(Customer.AscentID, space(0)) = space(0) then the query executes in no time at all.

    The query plans were different - for the coalesce where clause the cost for a clustered index scan on Tmphousefile is 91% and a subsequent 6% cost on a subsequent nested loop (inner join), but for the isnull, the table scan is only 53% and is followed by a Hash Match (inner join) with a 30% cost. As I said there is a marked difference between the performance of the 2.

    Great articles by the way, I just wanted to point this out and ask if you've come across this before.

  • It looks like you are trying to find records where Customer.AscentId Is Null, so why wouldn't you just use Where Customer.AscentId Is Null as the criteria in the Where clause, were you just using this to test performance of IsNull vs. Coalesce? You should try to avoid using a function on a column in the where clause as you will never get an index seek, the best you will get is a n index scan and many times a table scan.

  • Thanks for the comment Jack.

    I used isnull/coalesce here because I had to consider both NULLs and empty string values owing to how different processes update this table.

    As it happens I have no problems at all now that I'm using isnull so there is no pressing need to address the inconsistant data here as it causes no other issues.

    Basically I wanted to highlight the difference in the performance of the two functions that I experienced here, given Rule 4 in the four rules article; I only started using coalesce in earnest after reading these articles but I guess that Microsoft have geared performance towards their own standard in this instance, perhaps because isnull is a simpler function than coalesce?

  • Hi BonnieMan,

    COALESCE() is different from ISNULL(), and I would guess that the difference in performance you experienced is probably owing to the differences in how COALESCE() and ISNULL() implicitly cast their result. ISNULL() implicitly casts the result to the data type of the first parameter, COALESCE() tries a sort of 'best-fit' strategy for the result based on all parameters. My guess (this is just a guess) would be that this would account for the differences in your query plan and performance. Using ISNULL() has it's own issues as well

    Consider this example:

    CREATE TABLE #t (i INT PRIMARY KEY,

    j TINYINT)

    INSERT INTO #t (i, j)

    SELECT 1, 1

    UNION SELECT 5, NULL

    UNION SELECT 7, NULL

    UNION SELECT 10, 10

    UNION SELECT 255, 255

    The following query will error out with a type conversion:

    SELECT *

    FROM #t

    WHERE ISNULL(j, 257) = 257

    COALESCE() works though. I would recommend rewriting your query to eliminate the function from the WHERE clause altogether, something like this:

    UPDATE Customer

    SET Salutation = HF.Title,

    Initials = HF.Initials,

    Country = HF.Country,

    Email = HF.Email,

    Telephone1 = HF.Telephone,

    GiftAidFlag = CASE HF.GiftAidFlag WHEN 'Y' THEN 1 ELSE 0 END,

    AscentID = HF.AscentID

    FROM Customer

    INNER JOIN [TMPHousefile] HF ON Customer.Hashtext = HF.Hashtext

    WHERE Customer.AscentID IS NULL OR Customer.AscentID = ''

  • Hi Mike, thanks for your input.

    I've changed the WHERE clause now as a matter of course, but out of interest I ran both queries on exactly the same data using both the isnull and OR where clauses and found the execution plans and relative costs to be the same in both.

    BTW, out of curiousity I ran your sample script and amended the WHERE clause to be

    WHERE COALESCE(j, '257') = 257

    running the amended script results in the following message:

    Msg 244, Level 16, State 1, Line 12

    The conversion of the varchar value '257' overflowed an INT1 column. Use a larger integer column

    but using WHERE COALESCE(j,'255') = 255 returns 3 records.

    also WHERE COALESCE(j, 11.4) = 11.4 returns 2 records like your script does

    it looks like using the char value forced some kind of data typing based on the column to take place whereas using a numeric value just used the broadest numeric data type.

  • That's because the integer/smallint/tinyint data types have higher precedence in conversions, so the implicit conversions will try to convert character types to int* types. The COALESCE function you used only has two data types to choose from, tinyint and char/varchar, so it tries to implicitly cast the char/varchar to a tinyint and fails. The following query will fail for the same reason:

    SELECT CASE WHEN 257 = 'A257' THEN 'Equal' ELSE 'Unequal' END

    SQL tries to cast the 'A257' to an integer because of the type casting rules.

Viewing 15 posts - 91 through 105 (of 117 total)

You must be logged in to reply to this topic. Login to reply