Working with email addresses in SQL Server |
About myself My technical skills My favorites My picture album Shortcut keys My code library VB resources SQLServer resources SQLServer books Replication FAQ Scripting resources ASP resources Search my site Sign my guestbook Contact information SQL Server Articles New ![]() |
Working with email addresses in SQL Server ALTER TABLE [dbo].[Subscribers] WITH CHECK ADD CONSTRAINT [EmailValidator] CHECK ( CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0 -- No embedded spaces AND LEFT(LTRIM([Email]),1) <> '@' -- '@' can't be the first character of an email address AND RIGHT(RTRIM([Email]),1) <> '.' -- '.' can't be the last character of an email address AND CHARINDEX('.',[Email],CHARINDEX('@',[Email])) - CHARINDEX('@',[Email]) > 1 -- There must be a '.' after '@' AND LEN(LTRIM(RTRIM([Email]))) - LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1 -- Only one '@' sign is allowed AND CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3 -- Domain name should end with at least 2 character extension AND (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email]) = 0) -- can't have patterns like '.@' and '..' ) GO
The above CHECK constraint is added on the column "Email", of the table "Subscribers". Notice the comments at the end of each line. Also notice that the constraint is being added with "WITH CHECK" option. So, the constraint will not get created if your table already has invalid email addresses, in which case, you will see the following error message: SELECT * FROM Subscribers WHERE NOT ( CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0 AND LEFT(LTRIM([Email]),1) <> '@' AND RIGHT(RTRIM([Email]),1) <> '.' AND CHARINDEX('.',[Email],CHARINDEX('@',[Email])) - CHARINDEX('@',[Email]) > 1 AND LEN(LTRIM(RTRIM([Email]))) - LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1 AND CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3 AND (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email]) = 0) )Consider that the table "Subscribers", stores the email addresses (along with other details) of people subscribed to your online newsletter. Now you want to find out how many subscribers you have from each domain (for example, Hotmail, Yahoo etc.). Here's the query that does that: SELECT RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1) AS [Domain Name], COUNT(Email) AS [Number of Subscribers] FROM dbo.Subscribers GROUP BY RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1) ORDER BY [Number of Subscribers] DESC, [Domain Name]
The above query extracts the domain names from the email addresses and groups by the domain name, to get the count of subscribers from each domain. Output is sorted in the descending order of number of subscribers, followed by the domain name in ascending order, so that the most popular domain name appears at the top of the list. Here's the sample output:
Domain Name Number of Subscribers ----------------- --------------------- hotmail.com 197 yahoo.com 167 msn.com 160 aol.com 151 attbi.com 149 earthlink.net 147 lycos.com 143 mindspring.com 139 vsnl.com 135 worldnet.att.net 130
You will notice that we used an expression in the SELECT statement, to extract the domain name out of the email address. We had to repeat the same expression in the GROUP BY clause. This is no good. If you change the expression in the column list, you must update the GROUP BY clause accordingly. This makes the query a good candidate for "Views". You could create a view on "Subscribers" table, embedding the expression in the view definition and updating your queries to select from the view instead of the table. If for whatever reason, you need to change the expression that extracts the domain name, you will only change the view, but your queries will remain the same. Cool, isn't it? Here's how the view and your queries will look like:
CREATE VIEW SubscribersView AS SELECT Name, DOJ, Email, RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1) AS DomainName, Country FROM dbo.Subscribers GO SELECT DomainName, COUNT(Email) AS [Number of Subscribers] FROM dbo.SubscribersView GROUP BY DomainName ORDER BY [Number of Subscribers] DESC, DomainName GO As the expression is hidden inside the view definition, the query looks much better and easy to read and is more maintainable. Now, you are asked to generate a report containing name and email address of all those subscribers that are using Hotmail. The following query makes use of the above view and generates the report: SELECT Name, Email FROM dbo.SubscribersView WHERE DomainName = 'hotmail.com' GO
Things are looking great. But have you checked the execution plan of the above query? (You can enable execution plan display by pressing Ctrl + K in Query Analyzer). Though there is an index defined on the Email column, you will see that the query is not using the index efficiently. It is doing a slower 'index scan', rather than a faster 'index seek'. This is because the query optimizer is unable to predict the result of this expression in advance and hence unable to use the index. So, how can we get this query to do an 'index seek' then?
ALTER TABLE dbo.Subscribers ADD DomainName AS RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1) GO CREATE NONCLUSTERED INDEX NU_NC_IDX_Subscribers_DomainName ON dbo.Subscribers(DomainName) GO Now, if you run the following query, the execution plan will reveal an index seek on the computed column's index and the query performs much faster, compared to the previous query against the view SubscribersView: SELECT Name, Email FROM dbo.Subscribers WHERE DomainName = 'hotmail.com' GO
Do investigate the "Indexed Views" feature of SQL Server 2000, that allows the creation of indexes on views. Though it is useful, it has a lot of restrictions. See SQL Server Books Online for more information on Indexed Views.
CREATE TABLE Countries ( CountryCode char(4), Country varchar(40) ) GO2. Populate this Countries table with the country extension and country names, using this script! This script has 252 different country codes and names. Click here to download the script! Now, you can write queries, by joining the subscribers table with the countries table, as shown below: SELECT c.Country, COUNT(*) AS [Number of Subscribers] FROM dbo.Subscribers s JOIN dbo.Countries c ON RIGHT(s.Email, CHARINDEX('.', REVERSE(s.Email))-1) = c.CountryCode GROUP BY c.Country ORDER BY COUNT(*) Desc, c.Country ASC GOYou can simplify the above query by creating a view and querying that view instead, as shown below: CREATE VIEW dbo.CountryView AS SELECT c.Country FROM dbo.Subscribers s JOIN dbo.Countries c ON RIGHT(s.Email, CHARINDEX('.', REVERSE(s.Email))-1) = c.CountryCode GO SELECT Country, COUNT(*) AS [Number of Subscribers] FROM dbo.CountryView GROUP BY Country ORDER BY COUNT(*) DESC, Country ASC GO We covered the storage and query aspects of email addresses. Last but not the least, what about security? Email address is a very private piece of information and should be dealt with carefully. You need to make sure, only the right applications and right people are able to see this column. You don't want your junior DBA to generate a list of email addresses and sell it to your competitor, do you? If you don't trust your junior DBA or anyone else that has got access to the data, create a database role, place these users in that role and deny SELECT permissions for this role on the Email column. For example, the following scripts will add a new login called JuniorDBA, grant him access to Newsletter database, deny permission on Email column, by setting the column level permissions: USE Newsletter GO EXEC sp_addlogin 'JuniorDBA', '3rfsdc#' GO EXEC sp_grantdbaccess 'JuniorDBA' GO EXEC sp_addrolemember 'db_datareader', 'JuniorDBA' GO EXEC sp_addrole 'Restricted' GO DENY SELECT ON Subscribers(Email) TO Restricted GO EXEC sp_addrolemember 'Restricted', 'JuniorDBA' GO That's about it for now. Will update this page with more info, as and when I learn something new, so be sure to check back again. |