October 12, 2010 at 6:30 am
here is my query
INSERT INTO contacts (email, business)
SELECT email, business
FROM scrapes
WHERE (dbo.vaValidEmail(email) = 1)
AND (NOT email IN (SELECT email FROM failures))
AND (NOT email IN (SELECT email FROM removes))
AND (NOT email IN (SELECT email FROM contacts));
I keep getting
Msg 2627, Level 14, State 1, Line 2
Violation of UNIQUE KEY constraint 'UQ__contacts__02FC7413'. Cannot insert duplicate key in object 'dbo.contacts'.
The statement has been terminated.
I thought with the query above I was checking to see if email is in failures, removes and contacts before attempting to insert, but this is obviously not working since it crashes, how to I check to see if "email" is in the table before trying to insert row...
OR
How do I make the query skip (or drop) the duplicate and continue to the next record?
October 12, 2010 at 7:16 am
Check again - looks like it's failing on business rather than email.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 12, 2010 at 10:27 am
You may have duplicates in your SELECT statement. Adding a DISTINCT clause may resolve this.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 12, 2010 at 10:49 am
also, don't you have to exclude NULLS from the underlying emails if you are using NOT IN:
INSERT INTO contacts (email, business)
SELECT email, business
FROM scrapes
WHERE (dbo.vaValidEmail(email) = 1)
AND (NOT email IN (SELECT email FROM failures where email is not null))
AND (NOT email IN (SELECT email FROM removes where email is not null))
AND (NOT email IN (SELECT email FROM contacts where email is not null));
Lowell
October 12, 2010 at 12:22 pm
regarding the nulls, I don't think so because the imported table is already checked for nulls in that field but it can't help.
regarding hanging on the business... business is nat a UNIQUE field so why would it be hanging on business?
regarding DISTINCT... I tried a SELECT DISTINCT email, business statement, but it wanted both email and business together to be distinct. that's not what I want. I just want to check to see if the email is already in the table (no matter what the business) before inserting it. and when I have the DISTINCT on and the UNIQUE CONSTRAINT off on the email, then I get duplicate emails in the table.
The only UNIQUE field is the email, don't want dups there.
How the heck to I get them in there from I list that I know has duplicate email addresses in?
I even tried doing a SELECT DISTINCT email, business on the raw data INTO temporary table and then imported the temporary table... that is when I figured out the DISTINCT email, business is checking for a combined value of email+business.
October 12, 2010 at 12:44 pm
Yeah, INs can have odd results, and are horrible for performance.
Try NOT EXISTS(), viz:
INSERT INTO contacts (email, business)
SELECT email, business
FROM scrapes s
WHERE (dbo.vaValidEmail(email) = 1)
AND (NOT EXISTS(SELECT 1 FROM contacts c WHERE c.email = s.email))
AND (NOT EXISTS(SELECT 1 FROM failures f WHERE f.email = s.email))
AND (NOT EXISTS(SELECT 1 FROM removes r WHERE r.email = s.email))
Scott Pletcher, SQL Server MVP 2008-2010
October 12, 2010 at 12:59 pm
aniccadhamma (10/12/2010)
regarding DISTINCT... I tried a SELECT DISTINCT email, business statement, but it wanted both email and business together to be distinct. that's not what I want. I just want to check to see if the email is already in the table (no matter what the business) before inserting it. and when I have the DISTINCT on and the UNIQUE CONSTRAINT off on the email, then I get duplicate emails in the table.
It sounds like the problem you have is that you have email addresses with multiple distinct business names. You can use the ROW_NUMBER() function and partition by the email address and order by whatever expression will give you the business name that you want to insert in your OVER() clause. From there you can select only the records with ROW_NUMBER() = 1.
Drew
NOTE: If you had provided DDL for the tables and sample data, you would have gotten a tested solution.
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 12, 2010 at 1:01 pm
Try using EXCEPT
INSERT INTO contacts (email, business)
SELECT email, business
FROM scrapes
WHERE (dbo.vaValidEmail(email) = 1)
EXCEPT
SELECT email, business
FROM contacts;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537October 12, 2010 at 1:22 pm
This part is the issue:
AND (NOT email IN (SELECT email FROM failures))
AND (NOT email IN (SELECT email FROM removes))
AND (NOT email IN (SELECT email FROM contacts));
I'd be willing to bet that the email you're trying to insert exists in the contacts table, but not in either failure or removes. Remember with AND clauses, all 3 need to be true for it to go through.
I'd also recommend using EXCEPT instead of NOT IN or NOT EXISTS.
October 12, 2010 at 2:29 pm
Derrick Smith (10/12/2010)
This part is the issue:AND (NOT email IN (SELECT email FROM failures))
AND (NOT email IN (SELECT email FROM removes))
AND (NOT email IN (SELECT email FROM contacts));
I'd be willing to bet that the email you're trying to insert exists in the contacts table, but not in either failure or removes. Remember with AND clauses, all 3 need to be true for it to go through.
I'd also recommend using EXCEPT instead of NOT IN or NOT EXISTS.
Actually, that part is NOT the issue. If the email is already in contacts, then the last expression will evaluate to FALSE causing the entire expression to evaluate to FALSE--regardless of the results of the other two expressions--preventing the duplicate record from being inserted in the contacts table. That is exactly what you want.
I do agree that using an EXCEPT clause will probably be much better. You may also want to have your subquery UNION (ALL) of the results from the three tables.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 12, 2010 at 2:40 pm
drew.allen (10/12/2010)
Derrick Smith (10/12/2010)
This part is the issue:AND (NOT email IN (SELECT email FROM failures))
AND (NOT email IN (SELECT email FROM removes))
AND (NOT email IN (SELECT email FROM contacts));
I'd be willing to bet that the email you're trying to insert exists in the contacts table, but not in either failure or removes. Remember with AND clauses, all 3 need to be true for it to go through.
I'd also recommend using EXCEPT instead of NOT IN or NOT EXISTS.
Actually, that part is NOT the issue. If the email is already in contacts, then the last expression will evaluate to FALSE causing the entire expression to evaluate to FALSE--regardless of the results of the other two expressions--preventing the duplicate record from being inserted in the contacts table. That is exactly what you want.
I do agree that using an EXCEPT clause will probably be much better. You may also want to have your subquery UNION (ALL) of the results from the three tables.
Drew
Yikes, you're right - was reading this backwards with the NOT INs. Skimped on the proofreading ๐
October 12, 2010 at 3:48 pm
thanks guys, will give EXCEPT a shot there
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply