So Simple, But Yet ....

  • I must be missing something silly.

    I have Table_A with 488 unique email addresses

    select distinct(email) from Table_A -- 488

    I have Table_B with several million records, including duplicate emails.

    Table_A email varchar(255)

    Table_B Email1 varchar(50)

    Zero Rcds:

    select A.Email from TableA A -- (0 row(s) affected)

    where A.Email NOT in (select email1 from Table_B)

    445 Rcds

    select A.Email from TableA A -- (445 row(s) affected)

    where A.Email in (select email1 from Table_B)

    So I have 43 records that are neither IN nor NOT IN Table_B

    (Query #2 is a copy of #1, just removed NOT. And Yes, the queries really are that simple, just changed the table names.)

    Please administer a dope slap.

    Edit: tweaked example a bit

  • If see 2 options:

    A. You're using a different column from Table1 (email & email1), but it could be a transcription error.

    B. You have null values on email column, check this:

    select email from TableA A

    where A.Email NOT in (select B.email from Table_B B WHERE B.email IS NOT NULL)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • "B" also gives me 445.

  • What if you try this in both directions?

    select a.Email

    from TableA a

    where not exists (select email1

    from Table_B b

    where b.email1 = a.email);

    If it gives you the desired result, then you have nulls in your referenced table. See Gail's article at http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/ for an excellent explanation.

    Edit: Oops. Hit {Enter} when typing the SQL. :crazy:

  • Ed, That gives me the 43 missing records.

  • So are they really missing? Check the first few and do a LIKE query to see if they really are in the second table or not? If not and the tables are supposed to match, then you have to ask where the data being populated from. If, as Luis pointed out, it's a transcription error, then you're never going to get the counts to match unless you correct the data.

    All we've shown with the exists is that you have NULLs. I prefer EXISTS over IN most of the time. I forgot who said it, but I read it in a book written by someone smarter than I am "True is never equal to false, but NULL is always unknown, which is neither true nor false." The EXISTS doesn't care about the NULLs, where IN does.

  • Yes, Table_B has 45,000 nulls in Email1, but that shouldn't be related to the 43 records. I can't see what is different about them. All 488 came from an Excel spreadsheet. Some funny hidden characters ? But they should still either be IN or NOT IN.

    I picked a few of the 43 missing, and did select * from Table_B where email1 like '%Joe@MSN.com%' and get 0

    So, I guess the Nulls are the culprint. Either go with "not Exists" or omit Nulls.

  • Your first query returns 0 rows because of the null email addresses in tableb.

    IN is essentially a shorthand for a series of equality clauses, so

    WHERE A NOT IN (1,2,3)

    becomes

    WHERE

    A <> 1

    AND A <> 2

    AND A <> 3

    The problem is when there is a null in your set, it becomes the equivalent of

    WHERE

    A <> 1

    AND A <> 2

    AND A <> 3

    AND A <> NULL

    Any evaluation against a null like that returns FALSE (actually 'unknown'), which is ANDed with the other clauses so the evaluation always returns FALSE no matter what the value of A is.

  • sestell1 (1/10/2014)


    Your first query returns 0 rows because of the null email addresses in tableb.

    IN is essentially a shorthand for a series of equality clauses, so

    WHERE A NOT IN (1,2,3)

    becomes

    WHERE

    A <> 1

    AND A <> 2

    AND A <> 3

    The problem is when there is a null in your set, it becomes the equivalent of

    WHERE

    A <> 1

    AND A <> 2

    AND A <> 3

    AND A <> NULL

    Any evaluation against a null like that returns FALSE (actually 'unknown'), which is ANDed with the other clauses so the evaluation always returns FALSE no matter what the value of A is.

    Ah... be careful with that. That actually forms a "double triangular join" which is basically an (N-1)2 lookup. In other words, it's almost a full blown Cartesian Product.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply