October 6, 2010 at 10:05 am
I have a table with nearly 6,000 rows that needs to be updated with new rows periodically. The update is based on whether the email address already exists. Email is the PK.
The query has been changed and tweaked, but no matter what I do (including using WHERE NOT EXISTS in place of WHERE NOT IN) I get 3 duplicate emails blocking the insert.
This is the code I am using:
INSERT INTO InternalML.dbo.TempEmail
(Email)
SELECT DISTINCT c.Email
FROM AAService.dbo.Customers c WITH (NOLOCK)
WHERE c.Email NOT IN (SELECT t.email
FROM InternalML.dbo.TempML t)
Unfortunately, I cannot post sample data that duplicates the situation. Mocking up data doesn't make this happen and the actual data consists of real email addresses.
I am hoping someone will look at this and see something I don't.
Thanks!
October 6, 2010 at 10:11 am
jerry-621596 (10/6/2010)
I have a table with nearly 6,000 rows that needs to be updated with new rows periodically. The update is based on whether the email address already exists. Email is the PK.The query has been changed and tweaked, but no matter what I do (including using WHERE NOT EXISTS in place of WHERE NOT IN) I get 3 duplicate emails blocking the insert.
This is the code I am using:
INSERT INTO InternalML.dbo.TempEmail
(Email)
SELECT DISTINCT c.Email
FROM AAService.dbo.Customers c WITH (NOLOCK)
WHERE c.Email NOT IN (SELECT t.email
FROM InternalML.dbo.TempML t)
Unfortunately, I cannot post sample data that duplicates the situation. Mocking up data doesn't make this happen and the actual data consists of real email addresses.
Subquery should check in InternalML.dbo.TempEmail rather than in InternalML.dbo.TempML 🙂
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.October 6, 2010 at 10:12 am
Do you have leading/trailing spaces on the fields showing as duplicates? You could try doing ltrim(rtrim(t.email)) on the where not in clause.
October 6, 2010 at 10:14 am
do the trimming in the SELECT DISTINCT also
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
October 6, 2010 at 10:19 am
Sorry, I should have been more clear. The query is inserting into the TempEmail table so I can join back to the table with the conflict to see which emails are duplicates.
I am checking back to TempML with this query:
SELECT e.email, t.email
FROM dbo.TempEmail e
INNER JOIN dbo.TempML t ON e.email = t.email
October 6, 2010 at 10:20 am
I did try trimming on all of the email columns. It made the query run much longer, but came up with the same results.
October 6, 2010 at 10:23 am
Can you try this?
INSERT INTO InternalML.dbo.TempEmail
(Email)
SELECT DISTINCT
Ltrim(Rtrim(c.Email))
FROM AAService.dbo.Customers c WITH (NOLOCK)
EXCEPT
SELECT Ltrim(Rtrim(t.email))
FROM InternalML.dbo.TempML t WITH (NOLOCK)
edit: ah just saw your post about no difference with the trims... I really don't know where to go from here without data.
October 6, 2010 at 10:37 am
Not sure what the difference was with EXCEPT but that worked perfectly!
Thanks!
October 6, 2010 at 10:40 am
jerry-621596 (10/6/2010)
Not sure what the difference was with EXCEPT but that worked perfectly!Thanks!
Well...logically there would be no difference, but I've been using EXCEPT on all 2005+ queries and have had more consistent results and better performance compared to where not in or where not exists..I do not know why. Good to hear it worked though 😀
October 6, 2010 at 1:05 pm
October 6, 2010 at 1:09 pm
pavan_srirangam (10/6/2010)
Try this use group by instead of distinct.Distinct utilizes more resources.
What is your reasoning for this?
You get the same execution plan either way (index scan + hash match)
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply