October 19, 2014 at 7:27 pm
...
October 19, 2014 at 8:49 pm
I'd start by simply filtering out any "document" that doesn't have an "@" sign in it. That's bound to cut out some of the documents.
The next step would be to split out each "@" along with the contiguous characters that typically appear in email addresses, both before and after the "@" sign. This would be like a two-part "split".
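The two steps above (a cheap "@" filter, then expanding left and right from each "@") can be sketched outside SQL Server like this. It is a minimal sketch, not Jeff's code: `extract_candidates`, `LOCAL_CHARS` and `DOMAIN_CHARS` are hypothetical names, and the character sets are a simplified assumption about "typical" addresses rather than the full RFC rules.

```python
import string

# Assumed, simplified character sets for "typical" addresses (not full RFC 5322).
LOCAL_CHARS = set(string.ascii_letters + string.digits + "._%+-")
DOMAIN_CHARS = set(string.ascii_letters + string.digits + ".-")

def extract_candidates(doc: str) -> list:
    """Return email-like substrings found around each '@' in doc."""
    # Step 1: cheap filter; a document with no '@' cannot hold an address.
    if "@" not in doc:
        return []
    results = []
    pos = doc.find("@")
    while pos != -1:
        # Step 2a: walk left over contiguous local-part characters.
        start = pos
        while start > 0 and doc[start - 1] in LOCAL_CHARS:
            start -= 1
        # Step 2b: walk right over contiguous domain characters.
        end = pos + 1
        while end < len(doc) and doc[end] in DOMAIN_CHARS:
            end += 1
        # Keep only candidates with text on both sides of the '@';
        # strip leading/trailing dots picked up from sentence punctuation.
        if start < pos and end > pos + 1:
            results.append(doc[start:end].strip("."))
        pos = doc.find("@", pos + 1)
    return results

# extract_candidates("Contact jeff@example.com or sue.b@mail.example.org.")
#   -> ['jeff@example.com', 'sue.b@mail.example.org']
```

In T-SQL the same idea would typically use CHARINDEX to locate each "@" and a Tally table or similar to walk the surrounding characters; the Python version only illustrates the logic.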
How many characters in your largest VARCHAR(MAX) document?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 19, 2014 at 9:27 pm
nm
October 20, 2014 at 3:34 am
Maybe a home-made email index would perform better.
Use a regular expression, via CLR or sp_OACreate 'VBScript.RegExp', to build a
#mailIndex (OriginalRowId, Pos, Mail) table. Then index it and join it with the 9 million email addresses. For an email regex, see http://www.regular-expressions.info/email.html
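The mail-index idea can be sketched as follows, with Python's `re` module standing in for the CLR or VBScript.RegExp step. This is an illustration under assumptions: `build_mail_index` and `join_with_known` are hypothetical names, the pattern is a simplified one of the kind discussed on regular-expressions.info (it will miss some valid addresses), and the dictionary lookup plays the role that an index on the `Mail` column of #mailIndex would play in SQL Server.

```python
import re
from collections import defaultdict

# Simplified email pattern (assumption); see regular-expressions.info for trade-offs.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

def build_mail_index(rows):
    """Build (OriginalRowId, Pos, Mail) tuples, mirroring the #mailIndex table."""
    index = []
    for row_id, text in rows:
        for m in EMAIL_RE.finditer(text):
            # Pos is 1-based, as SQL Server string positions are.
            index.append((row_id, m.start() + 1, m.group().lower()))
    return index

def join_with_known(index, known_mails):
    """Join the mail index with a list of known addresses via a hash lookup."""
    by_mail = defaultdict(list)
    for row_id, pos, mail in index:
        by_mail[mail].append((row_id, pos))
    matches = []
    for mail in known_mails:
        key = mail.lower()
        for row_id, pos in by_mail.get(key, []):
            matches.append((row_id, pos, key))
    return sorted(matches)

rows = [(1, "Mail bob@example.com today"), (2, "nothing"),
        (3, "cc: Bob@Example.com, ann@test.org")]
idx = build_mail_index(rows)
# idx -> [(1, 6, 'bob@example.com'), (3, 5, 'bob@example.com'), (3, 22, 'ann@test.org')]
```

Lowercasing both sides is a design choice here (domain parts are case-insensitive; treating local parts the same way is an assumption that usually holds in practice). The regex pass runs once per document, so the expensive join against the 9 million addresses becomes an equality lookup instead of a LIKE scan.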