  • Hi Guys,

    I have written a very hasty piece of code and have not been given much time to review it. Because it was so hasty, I know in my water that it is not best practice, although it works and picks up 85% of the email addresses (along with some rubbish). It should be possible to write it so that it picks up 9x% of clean data, and I know that some of you have done similar work before.

    I am trying to pull out a single email address from the data. The test data is shown below; as you can see, it's pretty dirty. The code doesn't pick out just the email address. It ignores any row with more than one email address (ideally it would take the first), and I end up with just rows 1, 7, 8 and 9, each with more than just the email address.

    Ideally it should return all of them, with the possible exception of 4, which I don't really know how to deal with, and possibly 5.

    A quick table

    Create Table DBO.Email_Checker

    (

    KeyField Int,

    Email_Address Varchar(75)

    );

    Some Test Data

    Insert Into DBO.Email_Checker

    Values (1,'Freddy strange has an email address of');

    Insert Into DBO.Email_Checker

    Values (2,' /');

    Insert Into DBO.Email_Checker

    Values (3,'Email: & User1@Google');

    Insert Into DBO.Email_Checker

    Values (4,'');

    Insert Into DBO.Email_Checker

    Values (5,'Email USER2@giraffe,gaff');

    Insert Into DBO.Email_Checker

    Values (6,' –');

    Insert Into DBO.Email_Checker

    Values (7,'Johns sharp');

    Insert Into DBO.Email_Checker

    Values (8,'');

    Insert Into DBO.Email_Checker

    Values (9,'Mobile: 09120923123 Email');

    The Ugly Code -

    select KeyField, Ltrim(Rtrim(Lower(Email_Address))) AS Email

    from DBO.Email_Checker

    Where Lower(Email_Address) LIKE '%_@__%.__%'

    AND LEN(Email_Address) - LEN(REPLACE(Email_Address,'@','')) =1

    Drop Table

    --Drop Table DBO.Email_Checker

    Any ideas?

  • It's not much better, but here is another way that retrieves an email address if it is the last text in your string and the string contains a space.

    select reverse(substring(reverse(Email_Address),0,charindex(' ',reverse(Email_Address))))

    from DBO.Email_Checker
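
    A minimal sketch of the same idea that also copes with strings containing no space at all (where CHARINDEX returns 0 and the query above returns an empty string) and with trailing spaces. It is untested against the real data and only illustrates the fallback; the LastToken alias is just a name picked for this example.

    ```sql
    -- Sketch: return the last space-delimited token of each string.
    -- RTRIM removes trailing spaces so they are not mistaken for the delimiter.
    SELECT KeyField,
           CASE
               -- No space at all: the whole (trimmed) string is the only token.
               WHEN CHARINDEX(' ', REVERSE(RTRIM(Email_Address))) = 0
                   THEN RTRIM(Email_Address)
               -- Otherwise take everything to the right of the last space.
               ELSE RIGHT(RTRIM(Email_Address),
                          CHARINDEX(' ', REVERSE(RTRIM(Email_Address))) - 1)
           END AS LastToken
    FROM DBO.Email_Checker;
    ```

    The result is still only the last token, not necessarily an email address, so it would need the same kind of LIKE filter as in the original post.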

  • As you said, the data here is horrible to start with. Here is one way to deal with it. Use the DelimitedSplit8K function to first split each string into segments on spaces. Then take the list of all segments and determine which ones are valid emails. The code for DelimitedSplit8K can be found by following the link in my signature about splitting strings.

    The email part is in fact quite timely. This topic is being discussed right now in another thread.

    I would recommend using the function that Lowell posted, which uses System.Net.Mail.MailAddress to validate the address. Of course, this assumes you are able to use CLR.

    Using those two functions I was able to return 11 of the possible 12 emails in your sample data. The one that gets missed simply cannot be determined with code.

    Here is the code that I came up with for this.

    --this will split everything into segments first

    ;with cte as

    (

    select * from email_Checker

    cross apply dbo.DelimitedSplit8K(Email_address, ' ')

    )

    , ParseEmail as

    (

    select *, dbo.IsValidEmail(Item) as IsValidEmail from cte

    )

    select * from ParseEmail

    where IsValidEmail = 1

    Make sure you read and understand the code in both of these functions. The splitter function is super fast and easy to use, but understanding it might take a bit of reading and rereading. The CLR function is pretty straightforward.
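
    Since the original question asks for a single address per row, ideally the first one, the query above could be narrowed with ROW_NUMBER. This is only a sketch: it assumes DelimitedSplit8K returns ItemNumber and Item columns, and that dbo.IsValidEmail is the CLR function from the other thread and returns 1 for a valid address. Neither function is defined here.

    ```sql
    -- Sketch: keep only the first valid email found in each row.
    ;WITH Segments AS
    (
        -- Split each string on spaces and keep the segments that validate.
        SELECT ec.KeyField, s.ItemNumber, s.Item
        FROM dbo.Email_Checker AS ec
        CROSS APPLY dbo.DelimitedSplit8K(ec.Email_Address, ' ') AS s
        WHERE dbo.IsValidEmail(s.Item) = 1   -- assumed CLR function, 1 = valid
    ),
    Ranked AS
    (
        -- Number the valid segments by their position in the original string.
        SELECT KeyField, Item,
               ROW_NUMBER() OVER (PARTITION BY KeyField
                                  ORDER BY ItemNumber) AS rn
        FROM Segments
    )
    SELECT KeyField, LOWER(Item) AS Email
    FROM Ranked
    WHERE rn = 1;
    ```

    If CLR is not available, the IsValidEmail call could be swapped for a LIKE filter such as the one in the original post, at the cost of letting more rubbish through.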


