March 31, 2016 at 9:05 am
Hello,
I am performing extensive string manipulation in a T-SQL script. Basically I'm parsing a single text string with an entire email chain in it. I need to parse out To:, From:, etc. and add all From:'s to a single column, all To:'s to a single column, etc.
I have this working in T-SQL at a rate of approx. 15 seconds per 1000 rows. At the data volumes I am working with, this won't work.
So, I'm considering a string parsing CLR function in C#. I believe this would improve speed but don't know how significantly. Any ideas?
Are there any other alternatives?
I'm looping through the dataset and for each row, I'm parsing the string for "To:", "From:", "CC:", and "BCC". I've got to consider:
1) Multiple email addresses on a single line (i.e. person1@co.com, person2@co2.com, person3@abc.com)
2) Single email per line.
(i.e. person1@co.com,
Here is a snippet of my code showing the string manipulation section:
IF PATINDEX('%To:%', @ExtractedTextLine) > 0
AND PATINDEX('%mailto%', @ExtractedTextLine) = 0
BEGIN
IF SUBSTRING(@ExtractedTextLine, LEN(@ExtractedTextLine) - 1, 1) = ','
BEGIN
SET @ToMultiLineFlag = 1;
END;
SET @To = @To + SUBSTRING(@ExtractedTextLine, PATINDEX('%To:%', @ExtractedTextLine) + 3, 10000) + ';';
IF SUBSTRING(@ExtractedTextLine, LEN(@ExtractedTextLine) - 1, 1) <> ','
BEGIN
SET @To = @To + ';';
END;
END;
IF @ToMultiLineFlag = 1
AND @Location = 'To'
AND PATINDEX('%To:%', @ExtractedTextLine) = 0
BEGIN
SET @To = @To + @ExtractedTextLine + ';';
END;
IF @ToMultiLineFlag = 1
BEGIN
SET @ToMultiLineFlag = 0;
IF @Location = 'To'
AND SUBSTRING(@ExtractedTextLine, LEN(@ExtractedTextLine) - 1, 1) = ','
BEGIN
SET @ToMultiLineFlag = 1;
END;
END;
SET @Location = '';
I'm looking for ideas on 1) how to speed up the T-SQL query or 2) how to use CLR for this.
Thanks,
Brett
March 31, 2016 at 9:21 am
CLR is by far the best way to do this, at least until SQL 2016 is released. You can use Delimited8KSplit found here on SSC.com to do this, although the 8K characters could be a limitation.
There is a massive article/forum thread on SSC.com, recently updated with SQL 2016 split command, that has an amazing collection of options (and benchmarking) for string splitting, including some very spiffy CLR code. I think Adam Machanic has some posted online too.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 31, 2016 at 9:28 am
Can you share some sample data and expected output?
When you mention single email per line, do you mean using a carriage return or setting a new row?
Which of your variables are parameters? What types are they?
This might not require CLR or splitting.
March 31, 2016 at 9:32 am
How are you with regular expressions? If you are going the clr route, look into .net's regex classes, i think you could find pre-defined patterns on the web for what you are trying to do.
March 31, 2016 at 10:19 am
Hi Luis,
I've took some data and changed the information but the format, etc. is there. This data is from a single column (nvarchar(max)) with an entire email chain. So there are CR's and LF's, etc. Since there is no guaranteed format of the data (email from Outlook, Thunderbird, iPhone, etc.), it's a little tricky.
******* Start sample data ******************
Subject: RE: Important info
From: "Johnson, Al" <al.johnson@biz.com>
Date: Thu, 1 Oct 2015 21:25:01
To: "Smith, Hector" <hector.smith@company.com>,
"Stiles, Harry" <harry.stiles@company.com>,
"Monk, Thelonious" <thelonious.monk@goodjazz.com>,
Email text. Bla bla bla.
Thoughts on important info...
Al Johnson
Office: 111-222-3333
Mobile: 777-888-9999
From: Bill Smith
Sent: Thursday, October 10, 2015 5:24 PM
To: Bach, Johann; Mozart, Amadeaus; Smith, Bob; Walters, Sally; Stewart, Marge
Subject: Important info
Dear valued person,
Email text goes here.
Very important info...
Thank you,
Bill
******* End sample data ******************
From the above example, the output should be 4 variables:
FROM = "Johnson, Al" <al.johnson@biz.com>; Bill Smith
TO = "Smith, Hector" <hector.smith@company.com>; "Stiles, Harry" <harry.stiles@company.com>; "Monk, Thelonious" <thelonious.monk@goodjazz.com>; Bach, Johann; Mozart, Amadeaus; Smith, Bob; Walters, Sally; Stewart, Marge
CC = ''
BCC = ''
Hope this helps.
Thanks,
Brett
March 31, 2016 at 10:47 am
If you are just pulling out the email addresses this is certainly something that can be resolved with a simple splitter, CHARINDEX, PATINDEX and a couple other simple tricks. I'll talk a look at what you posted a little later today.
-- Itzik Ben-Gan 2001
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply