May 2, 2018 at 12:24 pm
CREATE TABLE #tblContacts
(
ContactID int,
Contact varchar(100)
)
GO
INSERT INTO #tblContacts (ContactID, Contact)
SELECT 1, 'Fred' UNION ALL
SELECT 2, 'Jim' UNION ALL
SELECT 3, 'Arthur'
GO
CREATE TABLE #tblEmails
(
EmailID int,
ContactID int,
EmailAddress varchar(100)
)
GO
INSERT INTO #tblEmails (EmailID, ContactID, EmailAddress)
SELECT 1, 1, 'fred@someorg.cam' UNION ALL
SELECT 2, 2, 'jim@goggle.cam' UNION ALL
SELECT 3, 3, 'Arthur@giggle.cam'
CREATE TABLE #tblNoEmail
(
EmailAddress varchar(100)
)
INSERT INTO #tblNoEmail(EmailAddress)
SELECT 'jim@goggle.cam'
GO
SELECT * FROM #tblContacts
INNER JOIN #tblEmails On #tblContacts.ContactID = #tblEmails.ContactID
WHERE #tblEmails.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)
GO
DROP TABLE #tblContacts
DROP TABLE #tblEmails
DROP TABLE #tblNoEmail
Jim has told us he does not want to be contacted by us but, for historical reasons, his contact record cannot be deleted. There are many dozens of places within a number of applications used by a number of different teams that can send an email to a list of people. I need to make sure no email is ever sent to his email address. I can't simply put a flag on the #tblEmails for reasons I won't bore you with. Apart from those reasons, I have been told to create #tblNoEmail and maintain a list of email addresses that must never be contacted. Whenever any email is sent from any of our applications (and there are about 20 teams using their own project planning systems over the same database) - I have to check any email address we are about to send an email to is not in 'tblNoEmail'
Something like ....
SELECT ContactID, Contact, EmailAddress
FROM #tblContacts
INNER JOIN #tblEmails On #tblContacts.ContactID = #tblEmails.ContactID
WHERE #tblEmails.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)
should do it (although there is a syntax error in my code above that I can't find and that statement above will not run for some reason ? ). But, given there are 200,000 contact records, about 500,000 records in #tblEmails and will be a few thousand in #tblNoEmail - what's the most efficient way of always making sure that any email in a list of email addresses I am about to send an email to will not contain any of the email addresses in #tblNoEmail
May 2, 2018 at 12:37 pm
Your query just needed aliases.
A fixed version, along with an alternative which may perform better (you should test, of course)
SELECT * FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
WHERE e.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)
SELECT * FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
WHERE NOT EXISTS (SELECT 1 FROM #tblNoEmail ne WHERE ne.EmailAddress = e.EmailAddress)
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 2, 2018 at 12:39 pm
webskater - Wednesday, May 2, 2018 12:24 PM
CREATE TABLE #tblContacts
(
ContactID int,
Contact varchar(100)
)GO
INSERT INTO #tblContacts (ContactID, Contact)
SELECT 1, 'Fred' UNION ALL
SELECT 2, 'Jim' UNION ALL
SELECT 3, 'Arthur'GO
CREATE TABLE #tblEmails
(
EmailID int,
ContactID int,
EmailAddress varchar(100)
)GO
INSERT INTO #tblEmails (EmailID, ContactID, EmailAddress)
SELECT 1, 1, 'fred@someorg.cam' UNION ALL
SELECT 2, 2, 'jim@goggle.cam' UNION ALL
SELECT 3, 3, 'Arthur@giggle.cam'CREATE TABLE #tblNoEmail
(
EmailID int
);INSERT INTO #tblNoEmail(EmailID)
SELECT 2;GO
SELECT *
FROM #tblContacts AS c
INNER JOIN #tblEmails AS e ON c.ContactID = e.ContactID
WHERE e.EmailID NOT IN (SELECT n.EmailID FROM #tblNoEmail AS n);GO
DROP TABLE #tblContacts
DROP TABLE #tblEmails
DROP TABLE #tblNoEmailJim has told us he does not want to be contacted by us but, for historical reasons, his contact record cannot be deleted. There are many dozens of places within a number of applications used by a number of different teams that can send an email to a list of people. I need to make sure no email is ever sent to his email address. I can't simply put a flag on the #tblEmails for reasons I won't bore you with. Apart from those reasons, I have been told to create #tblNoEmail and maintain a list of email addresses that must never be contacted. Whenever any email is sent from any of our applications (and there are about 20 teams using their own project planning systems over the same database) - I have to check any email address we are about to send an email to is not in 'tblNoEmail'
Something like ....
SELECT *
FROM #tblContacts AS c
INNER JOIN #tblEmails AS e ON c.ContactID = e.ContactID
WHERE e.EmailID NOT IN (SELECT n.EmailID FROM #tblNoEmail AS n);should do it (although there is a syntax error in my code above that I can't find and that statement above will not run for some reason ? ). But, given there are 200,000 contact records, about 500,000 records in #tblEmails and will be a few thousand in #tblNoEmail - what's the most efficient way of always making sure that any email in a list of email addresses I am about to send an email to will not contain any of the email addresses in #tblNoEmail
I'd say that the most efficient way is to use the EmailID instead of the full address. You would also need proper indexes and Foreign Key constraints. Your code should also use table aliases and qualified names for the columns (using the table aliases). Other than that, I don't see much improvement. I did some changes on your code.
May 2, 2018 at 2:27 pm
Here is one alternative:
With validEmail (EmailAddress)
As (
Select te.EmailAddress
From #tblEmails te
Except
Select tne.EmailAddress
From #tblNoEmail tne
)
Select tc.ContactID
, tc.Contact
, te.EmailID
, te.ContactID
, te.EmailAddress
From #tblContacts tc
Inner Join #tblEmails te On te.EmailID = tc.ContactID
Inner Join validEmail ve On ve.EmailAddress = te.EmailAddress;
Test for performance, of course...
Another option would be to create the no email the table exactly the same as the email table. Then, you can perform an easy insert from the email table into the no email table using all columns.
With validEmail
As (
Select te.EmailID
, te.ContactID
, te.EmailAddress
From #tblEmails te
Except
Select tne.EmailID
, tne.ContactID
, tne.EmailAddress
From #tblNoEmail tne
)
Select tc.ContactID
, tc.Contact
, ve.EmailID
, ve.ContactID
, ve.EmailAddress
From #tblContacts tc
Inner Join validEmail ve On ve.ContactID = tc.ContactID;
To further simplify the changes you need to make - create a view to replace the email table using the second version above and replace #tblEmail with your view.
Jeffrey Williams
βWe are all faced with a series of great opportunities brilliantly disguised as impossible situations.β
β Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 2, 2018 at 2:55 pm
As noted, use NOT EXISTS. Also, uniquely cluster the NoEmail table on the EmailAddress.
...
CREATE TABLE #tblNoEmail
(
EmailAddress varchar(100) PRIMARY KEY CLUSTERED
)
...
SELECT *
FROM #tblContacts C
INNER JOIN #tblEmails E ON C.ContactID = E.ContactID
WHERE NOT EXISTS (
SELECT 1
FROM #tblNoEmail NE
WHERE E.EmailAddress = NE.EmailAddress
)
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 2, 2018 at 5:32 pm
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.
A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?
May 3, 2018 at 8:11 am
webskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?
As mentioned before, the index on tblNoEmail should be a clustered index on the single column it has (which I still recommend to keep it as int to reduce the size of the data).
If you need to show the whole universe of emails including the ones on the "no list", that's a good way to do it. If you only need to show active emails, use NOT IN or NOT EXISTS.
I recommend to stay away from the CTE with EXCEPT because that needs to read the Emails table twice and that usually means that more resources are needed.
May 3, 2018 at 10:46 am
webskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?
That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
May 3, 2018 at 12:00 pm
george_at_sql - Thursday, May 3, 2018 10:46 AMwebskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
May 3, 2018 at 12:33 pm
sgmunson - Thursday, May 3, 2018 12:00 PMgeorge_at_sql - Thursday, May 3, 2018 10:46 AMwebskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....
Well, but the optimizer has a pre-set order of operations in some cases, including the order in which it evaluates clauses, which means for some things how you write them can still have a dramatic effect on performance. And from direct experience, I can tell you that using [NOT] EXISTS rather than LEFT JOIN is one of those things which can have a big performance impact in certain cases.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 3, 2018 at 1:12 pm
ScottPletcher - Thursday, May 3, 2018 12:33 PMsgmunson - Thursday, May 3, 2018 12:00 PMgeorge_at_sql - Thursday, May 3, 2018 10:46 AMwebskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....
Well, but the optimizer has a pre-set order of operations in some cases, including the order in which it evaluates clauses, which means for some things how you write them can still have a dramatic effect on performance. And from direct experience, I can tell you that using [NOT] EXISTS rather than LEFT JOIN is one of those things which can have a big performance impact in certain cases.
Don't disagree. Had the same experiences, but again, as you indicate, those are non-choice scenarios, whereas I was referring more generally to the optimizer as a whole, only to make a point about keeping stats up to date.
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
May 3, 2018 at 11:04 pm
ScottPletcher - Thursday, May 3, 2018 12:33 PMsgmunson - Thursday, May 3, 2018 12:00 PMgeorge_at_sql - Thursday, May 3, 2018 10:46 AMwebskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....
Well, but the optimizer has a pre-set order of operations in some cases, including the order in which it evaluates clauses, which means for some things how you write them can still have a dramatic effect on performance. And from direct experience, I can tell you that using [NOT] EXISTS rather than LEFT JOIN is one of those things which can have a big performance impact in certain cases.
I agree to that about EXISTS and LEFT JOIN , what i was mentioning about JOIN order was
SELECT cols
FROM bigtable a
JOIN smalltable b
ON a.col=b.col
JOIN smallesttable c
ON b.col=c.col
and that it doesnt matter if you put the bigtable as the first part of your join order followed by smalltable followed by smallesttable
If the statistics is up to date i wouldn't want to worry about making a choice of which table should be first in my SELECT query.
What i take care is to filter as early as possible while joining
eg: say i was joining many tables in the query i would filter the condition in the query like below.
SELECT *
FROM largeemptable a
JOIN smalltables b
ON a.col=b.col
AND a.empid=1000
JOIN anothertable c
ON c.col=b.col
.......................etc etc
JOIN 10thtable d
rather than
SELECT *
FROM largeemptable a
JOIN smalltables b
ON a.col=b.col
JOIN anothertable c
ON c.col=b.col
.......................etc etc
JOIN 10thtable d
WHERE conditions
AND some_other_conditions
AND .....
AND ..
AND a.empid=1000
May 4, 2018 at 2:59 am
Instead of remembering to always join onto your exclusion table why not update the customer's email address to an internal address/nothing and thus guarantee they never get emailed? If someone forgets to use #tblNoEmail in future and Jim gets emailed you risk running foul of things like GDPR which could have financial consequences for your company.
May 4, 2018 at 4:26 am
Have a DistributionList:
ContactId
MediaId (lookup for email, post address, phone, messangers, etc.)
(?) ExpiryDate
_____________
Code for TallyGenerator
May 4, 2018 at 4:37 am
george_at_sql - Thursday, May 3, 2018 11:04 PMScottPletcher - Thursday, May 3, 2018 12:33 PMsgmunson - Thursday, May 3, 2018 12:00 PMgeorge_at_sql - Thursday, May 3, 2018 10:46 AMwebskater - Wednesday, May 2, 2018 5:32 PMThank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?
And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail
I could do this ...(or similar using the CTE solution above)
SELECT
e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;
Is there a better way to do that?That seems to be a good way to do based on your required output.
The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)
Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.
True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....
Well, but the optimizer has a pre-set order of operations in some cases, including the order in which it evaluates clauses, which means for some things how you write them can still have a dramatic effect on performance. And from direct experience, I can tell you that using [NOT] EXISTS rather than LEFT JOIN is one of those things which can have a big performance impact in certain cases.
I agree to that about EXISTS and LEFT JOIN , what i was mentioning about JOIN order was
SELECT cols
FROM bigtable a
JOIN smalltable b
ON a.col=b.col
JOIN smallesttable c
ON b.col=c.coland that it doesnt matter if you put the bigtable as the first part of your join order followed by smalltable followed by smallesttable
If the statistics is up to date i wouldn't want to worry about making a choice of which table should be first in my SELECT query.What i take care is to filter as early as possible while joining
eg: say i was joining many tables in the query i would filter the condition in the query like below.
SELECT *
FROM largeemptable a
JOIN smalltables b
ON a.col=b.col
AND a.empid=1000
JOIN anothertable c
ON c.col=b.col
.......................etc etc
JOIN 10thtable drather than
SELECT *
FROM largeemptable a
JOIN smalltables b
ON a.col=b.col
JOIN anothertable c
ON c.col=b.col
.......................etc etc
JOIN 10thtable d
WHERE conditions
AND some_other_conditions
AND .....
AND ..
AND a.empid=1000
SQL Server will apply filters and joins in whatever order is calculated to be most efficient. In the example above, moving the filter from the where clause to the join is unlikely to change the execution plan.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply