December 18, 2006 at 5:06 pm
All,
here's the problem.
I have a text/blob field that has multiple entries that are pseudo date/time stamped.
I have been trying to use this as an delimiter and in line with a split parser, but am not having much luck.
I need to break out each text entry per record id and migrate to a new table.
Any ideas on how to attack?
December 19, 2006 at 9:45 pm
If you could post a (smallish, representative) sample of your text data and roughly what you might want from it, you'd probably be inundated with tips. It's a bit difficult working blind.
December 20, 2006 at 9:09 am
The datafield is similar to this:
14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed
Part of the problem is that the application is doing rtf formating and then storing that formating with the text in the field. The other rub is that the person doing the notation is not completely standardized. So the general theme is near the same, but not exact, so there's lots of extra characters, carriage returns, escape characters, etc.
As a result simple parsing using the time date stamp as aa delimiter produces varied results.
After looking at the data, its going to have to be scrubbed and then standardized before its parsed out.
The request for this was made without awareness of the state of the data and the expectation that it was a very simple operation.
However, if you have any examples of string parsing with options to parse out any non standard characters that would be awesome.
December 20, 2006 at 2:49 pm
Here is some sample code that I could think of ... this is just to get you an idea. you need to put this code in WHILE loop inorder to process the complete string. I hope "(pacific)" is ths string that would appear after each date.
DECLARE
@val varchar(4000)
DECLARE
@pos1 int
DECLARE
@pos2 int
SET
@val = '14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed '
SET
@pos1 = CHARINDEX('(Pacific)', @val) + 19
SET
@pos2 = CHARINDEX('(Pacific)', @val, @pos1)
@pos1
@pos2
SELECT
substring(@val, @pos1, (@pos2 - @pos1))
in your WHILE loop @pos2 becomes @pos1 this way you can parse it easily.
December 20, 2006 at 3:21 pm
so u want to split the record:
14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed
into two records under the same ID:
14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed
12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed
is that the idea? the reason I'm asking is that u never clearly specified what to include in the splits.
i'm assuming u use SQL 2005.
December 20, 2006 at 3:25 pm
yes, thats the idea
December 20, 2006 at 3:34 pm
Yes, all that my code do is it splits the data based on keyword. I just gave an idea on to achieve the splitting of data.
I'm using SQL2005, but I presume the same code will work in SQL2000 since the string functions are almost similar in both versions.
December 20, 2006 at 4:52 pm
yes, but the problem is Max does not know what keyword will be there up front: his data could vary. What he needs is to split on a pattern
[date] [some_text]
, which justifies using regular expressions in this case.
Max did u ever use regular expressions in SQL 2005? That's the way to handle the task.
December 20, 2006 at 4:54 pm
kkprasad,
just tried your code, it returns
'king note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06'
is that what u expected, or i'm doing smth wrong here?
December 20, 2006 at 4:55 pm
You don't need to use regular expressions, PATINDEX would do the trick (and also allow for other time zones' dates in the data). Starting position can be determined by :
select @pos1 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', @val)
December 20, 2006 at 5:00 pm
The pity is, of course, that PATINDEX only takes 2 arguments (so no "starting from" displacement) so that to get both @pos1 and @pos2 you have to do something ugly like:
select @pos1 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', @val)
,@pos2 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', substring(@val, 2, len(@val)-1))
And i'm sure there are neater ways of doing this, but yuou get the idea.
December 20, 2006 at 5:02 pm
a hidden issue that complicates things, is that the application is using rtf so there might be hidden characters, formatting that impacts the delimiter
The data field needs to be scrubbed down and reformatted then parsed
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply