So Simple, But Yet ....

Question

Post reply

So Simple, But Yet ....

homebrew01

SSC Guru

Points: 55567
More actions
January 9, 2014 at 9:57 am

#281931

I must be missing something silly.
I have Table_A with 488 unique email addresses
select distinct(email) from Table_A -- 488
I have Table_B with several million records, including duplicate emails.
Table_A email varchar(255)
Table_B Email1 varchar(50)
Zero Rcds:
select A.Email from TableA A -- (0 row(s) affected)
where A.Email NOT in (select email1 from Table_B)
445 Rcds
select A.Email from TableA A -- (445 row(s) affected)
where A.Email in (select email1 from Table_B)
So I have 43 records that are neither IN nor NOT IN Table_B
(Query #2 is a copy of #1, just removed NOT. And Yes, the queries really are that simple, just changed the table names.)
Please administer a dope slap.
Edit: tweaked example a bit

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183706 More actions · Answer 1

If see 2 options:

A. You're using a different column from Table1 (email & email1), but it could be a transcription error.

B. You have null values on email column, check this:

select email from TableA A

where A.Email NOT in (select B.email from Table_B B WHERE B.email IS NOT NULL)

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

homebrew01 SSC Guru Points: 55567 More actions · Answer 2

homebrew01

SSC Guru

Points: 55567

January 9, 2014 at 10:12 am

#1678873

"B" also gives me 445.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 3

What if you try this in both directions?

select a.Email

from TableA a

where not exists (select email1

from Table_B b

where b.email1 = a.email);

If it gives you the desired result, then you have nulls in your referenced table. See Gail's article at http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/ for an excellent explanation.

Edit: Oops. Hit {Enter} when typing the SQL. :crazy:

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

homebrew01 SSC Guru Points: 55567 More actions · Answer 4

homebrew01

SSC Guru

Points: 55567

January 9, 2014 at 10:37 am

#1678880

Ed, That gives me the 43 missing records.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 5

So are they really missing? Check the first few and do a LIKE query to see if they really are in the second table or not? If not and the tables are supposed to match, then you have to ask where the data being populated from. If, as Luis pointed out, it's a transcription error, then you're never going to get the counts to match unless you correct the data.

All we've shown with the exists is that you have NULLs. I prefer EXISTS over IN most of the time. I forgot who said it, but I read it in a book written by someone smarter than I am "True is never equal to false, but NULL is always unknown, which is neither true nor false." The EXISTS doesn't care about the NULLs, where IN does.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

homebrew01 SSC Guru Points: 55567 More actions · Answer 6

Yes, Table_B has 45,000 nulls in Email1, but that shouldn't be related to the 43 records. I can't see what is different about them. All 488 came from an Excel spreadsheet. Some funny hidden characters ? But they should still either be IN or NOT IN.

I picked a few of the 43 missing, and did select * from Table_B where email1 like '%Joe@MSN.com%' and get 0

So, I guess the Nulls are the culprint. Either go with "not Exists" or omit Nulls.

sestell1 SSChampion Points: 10230 More actions · Answer 7

Your first query returns 0 rows because of the null email addresses in tableb.

IN is essentially a shorthand for a series of equality clauses, so

WHERE A NOT IN (1,2,3)

becomes

WHERE

A <> 1

AND A <> 2

AND A <> 3

The problem is when there is a null in your set, it becomes the equivalent of

WHERE

A <> 1

AND A <> 2

AND A <> 3

AND A <> NULL

Any evaluation against a null like that returns FALSE (actually 'unknown'), which is ANDed with the other clauses so the evaluation always returns FALSE no matter what the value of A is.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 8

sestell1 (1/10/2014)
Your first query returns 0 rows because of the null email addresses in tableb.
IN is essentially a shorthand for a series of equality clauses, so
WHERE A NOT IN (1,2,3)
becomes
WHERE
A <> 1
AND A <> 2
AND A <> 3
The problem is when there is a null in your set, it becomes the equivalent of
WHERE
A <> 1
AND A <> 2
AND A <> 3
AND A <> NULL
Any evaluation against a null like that returns FALSE (actually 'unknown'), which is ANDed with the other clauses so the evaluation always returns FALSE no matter what the value of A is.

Ah... be careful with that. That actually forms a "double triangular join" which is basically an (N-1)² lookup. In other words, it's almost a full blown Cartesian Product.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)