Parse multiple dates from Text Field

Question

Parse multiple dates from Text Field

Tim Daly

SSC Enthusiast

Points: 198
More actions
May 18, 2009 at 4:25 pm

#204381

I have a text field containing from one to x number of dates along with comments such as:
'8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'
I need to extract all dates, as well as all associated status notes from the table into separate DATE & TEXT fields into a new target table.
Any suggestions as to how to accomplish this would be MUCH appreciated!
Thank you!

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Dave-3000 SSCertifiable Points: 6186 More actions · Answer 1

By "text" field, do you mean varchar? If so, is it in a SQL Server table? Is there a common delimiter among date/status groupings, like a semicolon or comma? Is there one status field per date?

There is no "i" in team, but idiot has two.

Lowell SSC Guru Points: 323755 More actions · Answer 2

first get everything into a temp table, and make sure you convert to varchar(max)...it's a lot of trouble fiddling with TEXT datatypes.

something simply like SELECT convert(varchar(max),YourTextField) as BetterTextField,* into #tmp From YourTable

then you can do something with charindex and pattern matching, like '%[0-9]/[0-9]/[0-9][0-9][0-9][0-9]%' for1/1/xxxx type dates, with multiple passes for two digit months and year combinations.

That should help to find the start of your date and comment, but you've got to be able to figure out where one item ends and the next begins.

is it always {date} {space}some text ?

if we can establish a pattern, we can use a tally table to pull out all the strings, then parse the strings for date/comment separately. I've done that parsing and split for web pages and dictionary definitions and similar stuff.

can you give us some sample data for the text fields?

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Tom257 SSChampion Points: 11543 More actions · Answer 3

If you have access to visual studio .NET it may be worth looking at a CLR function. You will have access to the regular expression class which will make pattern maching easier.

http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx

I found that one useful.

Tim Daly SSC Enthusiast Points: 198 More actions · Answer 4

Thank you Dave, Lowell & Tom for your replies.

Here are the answers to your questions:

Dave:

The source table is SQL Server, and the field is TEXT.

Unfortunately there is no common delimeter in the field, so I will have to take multiple passes for each type of entry - it may be a space, a comma or a semicolon. And there is a status for each date entry.

Lowell:

Thank you for the info on the trouble with the TEXT datatype. I'll take your recommendation to load to a temp table and convert it to varchar.

I did already start down the path of the CHARINDEX with the '%[0-9]/[0-9]/[0-9][0-9][0-9][0-9]%' , so guess I was headed down the right path afterall.

Here are a couple of actual entries that show highlight the challenge, NO consistancy!:

08/04/2003 - REC'D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)

08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC'D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD

Tom:

I do have access to Visual Studio, just have minimal knowledge of using it. I'll check out the link and see if I can figure it out as well.

Lowell SSC Guru Points: 323755 More actions · Answer 5

I'm fiddling with this, and if we have consistent delimiters, it's REALLY easy to pull out with a tally table.

my dilemma is that it looks like multiple passes of updates would have to occur to insert delimiters before and after each possible date pattern.

here's what I'm trying so far: I'm trying to exclude a charindex where the pipe character is part of the pattern...i thought [^\|] woudl work, but it is not...

see if this gets anyone's ideas working better than mine:

create table #example(exampleid int,

exampletext varchar(max)

)

insert into #example

SELECT 1,'08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

UNION ALL

SELECT 2,'08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

UNION ALL

SELECT 1,'8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

--x/x/xx dates

UPDATE #example

SET exampletext = SUBSTRING(exampletext,1,CHARINDEX('[^\|][0-9]/[0-9]/[0-9][0-9][^0-9]',exampletext))

+ '|'

+ SUBSTRING(exampletext,CHARINDEX('[^\|][0-9]/[0-9]/[0-9][0-9][^0-9]',exampletext),8000)

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Dave-3000 SSCertifiable Points: 6186 More actions · Answer 6

Tim,

If you can use something other than T-SQL to build a solution to your problem (C#, VB) it will be easier. If so, you should look at regular expressions that can find the date values and insert a delimiter. Then take whatever is left as the comments. I've done a few gigs like this, and always used C# or VB6 to read the data, perform the transformation, and write to a table. Good luck.

There is no "i" in team, but idiot has two.

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 7

Hi Tim

I took Lowells advice to use a tally table based solution in a nested style (first seen from Bob).

@Lowell:

Thanks for the test data!

This should work. I just don't know about the performance with very large data or a large count of rows.

[font="Courier New"]DECLARE @t TABLE (Id INT NOT NULL IDENTITY, Txt TEXT)

INSERT INTO @t

SELECT '8/9/08 8/31/08'

UNION ALL SELECT '08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

UNION ALL SELECT '08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

UNION ALL SELECT '8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

; WITH

t (Id, Txt) AS

(

SELECT

Id,

',' + CONVERT(VARCHAR(MAX), Txt) + ','

FROM @t

)

SELECT

Id,

t2.Item

FROM t

CROSS APPLY

(

SELECT

CONVERT(DATETIME, SUBSTRING(t.Txt, tally1.N + 1, tally2.N - (tally1.N + 1))) Item

FROM Tally tally1

CROSS APPLY

(

SELECT TOP(1)

N

FROM Tally

WHERE

N > tally1.N

AND N <= LEN(t.Txt)

AND SUBSTRING(t.Txt, N, 1) IN (',', ';', ' ', ':')

ORDER BY N

) tally2

WHERE tally1.N < LEN(t.Txt)

AND SUBSTRING(t.Txt, tally1.N, 1) IN (',', ';', ' ', ':')

AND ISDATE(SUBSTRING(t.Txt, tally1.N + 1, tally2.N - (tally1.N + 1))) = 1

) t2

[/font]

Greets

Flo

Edit: Just noticed the colon as possible delimiter

Tim Daly SSC Enthusiast Points: 198 More actions · Answer 8

Thank you Lowell & Flo.....

I'll work with this code to see if I can't get it to work!

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 9

Hi Tim

I didn't notice that this was the "Newbie" forum...

If you have any questions, feel free to come back here!

Greets

Flo

Tim Daly SSC Enthusiast Points: 198 More actions · Answer 10

Thanks again Flo.....

I may need to take you up on that! The code as is gives me some errors that I am trying to debug now.

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 11

Errors? I just copy/pasted into an empty query window and executed. I don't get any errors.

Err, I think I know what you need...

You probably don't have a Tally table. Execute this script and the previous should work:

CREATE TABLE dbo.Tally (N INT NOT NULL)

INSERT INTO dbo.Tally

SELECT TOP(11000)

ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM master.sys.all_columns c1

CROSS JOIN master.sys.all_columns c2

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally

PRIMARY KEY CLUSTERED

(N)

WITH (FILLFACTOR = 100)

To know how a tally table works you should read this article by Jeff Moden:

http://www.sqlservercentral.com/articles/T-SQL/62867/

Greets

Flo

Tim Daly SSC Enthusiast Points: 198 More actions · Answer 12

Well after scratching my head for a bit as to why the code wasn't working, found out that the source server is SQL 2000. :blush:

Guessing some of these commands are new in SQL 2005 and don't work in 2000.:unsure:

So unless anyone has any more ideas on how to handle on 2000, just going to have to foregoe the parsing.:satisfied:

Thanks for everyone's help....in particular you Flo!

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 13

Hi Tim

If you are not sure if you are not sure about your SQL Server version you can execute this statement. It shows the detailed version information:

PRINT @@VERSION

If you use SSE2k you are not able to use CROSS APPLY. Instead of this you have to use a CROSS JOIN. In addition you have to ensure that your source data are enclosed with a comma (or another specified delimiter).

Here a SSE2k version of my previous script. I added some comments to explain how it works:

[font="Courier New"]---==========================

-- Create a tally table if not yet done

IF (OBJECT_ID('dbo.Tally') IS NULL)

BEGIN

SELECT TOP 11000

IDENTITY(INT, 1, 1) AS N

INTO dbo.Tally

FROM MASTER.dbo.syscolumns c1

CROSS JOIN MASTER.dbo.syscolumns c2

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally

PRIMARY KEY CLUSTERED

(N)

WITH FILLFACTOR = 100

END

GO

---==========================

-- Your source table

DECLARE @t TABLE (Id INT NOT NULL IDENTITY, Txt VARCHAR(8000))

INSERT INTO @t

SELECT '8/9/08 8/31/08'

UNION ALL SELECT '08/04/2003 - REC''D JOB FROM KAYLA - REQUEST FOR PDC TO GET R/W; 9/7/04 JON CANCELLED PER CRAIG BULLOCK BECAUSE OWNER CANNOT BE REACHED (LISA MURPHY OF PDC)'

UNION ALL SELECT '08/21/2003 - RESEARCH GIGI. NEEDS EXHIBIT A & R/W TO BE OBTAINED BY PDC. 09/12/2003: REC''D NOTE FROM PDC THAT SURVEY WAS COMPLETED.09/16/2003 - FIELD NOTES TO MI FILE, COPY OF DWG SENT TO J. TODD'

UNION ALL SELECT '8/9/08 - out for upgrade; 8/16/08 returned to inventory. 8/31/2008 - deployed to field.'

-- We need a leading and trailing delimiter

UPDATE @t SET Txt = ',' + Txt + ','

---===========================

-- Split the source data and return the dates

SELECT

t.Id,

-- Get the text part between startPos and endPos

CONVERT(DATETIME, SUBSTRING(t.Txt, startPos.N + 1, MIN(endPos.N) - (startPos.N + 1))) Item

FROM @t t

-- First tally table to scan for date start position

CROSS JOIN Tally startPos

-- Second tally table to scan for date end

CROSS JOIN Tally endPos

WHERE

-- The start position has to be less than complete length of text

startPos.N < LEN(t.Txt)

-- Determine if the current position is one of the specified delimiter

AND SUBSTRING(t.Txt, startPos.N, 1) IN (',', ';', ' ', ':')

-- For some reason I got duplicate values in case of ". " start position

-- You can try to remove this

AND SUBSTRING(t.Txt, startPos.N + 1, 1) NOT IN (',', ';', ' ', ':')

-- The start position has to be less than the end position

AND startPos.N < endPos.N

-- The end position has to be sess than or equal to len of text

AND endPos.N <= LEN(t.Txt)

-- End position has to be one of hte specified delimiters

AND SUBSTRING(t.Txt, endPos.N, 1) IN (',', ';', ' ', ':')

-- The text between start and end position has to be a date value

AND ISDATE(SUBSTRING(t.Txt, startPos.N + 1, endPos.N - (startPos.N + 1))) = 1

GROUP BY

t.Id,

t.Txt,

startPos.N

ORDER BY

t.Id,

startPos.N

[/font]

Greets

Flo

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 14

The TEXT column in question is that of a "log" maintained by humans. Likely as not, it has some hidden delimiters in it. Please check the data for the presence of CHAR(10) and CHAR(13) separately. Might want to throw in the additional check for CHAR(9), as well. The presence of these characters would greatly simplify this problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)