September 26, 2009 at 9:38 am
So I've been tasked with fixing an issue with a bit of code that I'm a little stumped on how to accomplish.
The set up is this: The current design is setup so that users enter a "note" that is saved into a varchar(max) column. The note has some regular features: there are several "categories" and each category will have several questions, with each question followed by an answer. The problem bit of code sends this text to an SSRS report, but tries to split up the data into a temporary table first. The problem is that the table function that handles the split string doesn't always preserve the order of the original text, which is what I've been asked to try to do.
This is the current code:
(@noteid int)
Declare @Temp1 Varchar(max)
Set @Temp1 =
(SELECT
Convert(varchar(max),
'C="') +
Convert(varchar(max),
Replace(Replace(
Replace(Convert(varchar(max), note_text),
char(13)+char(10)+char(9)+char(9), '"\par | A="'),
char(13)+char(10)+char(9), '"\par | Q="'),
char(13)+char(10), '"\par | C="'))
FROM Notes
WHERE note_id = @noteid)
--Print @Temp1
Declare @Temp2 Varchar(max)
Set @Temp2 = Replace(@Temp1,'C=\par |','') --+ ' EndOfNote'
Declare @Temp3 Varchar(max)
Set @Temp3 = Replace((Replace((Replace((Replace(Replace(@Temp2,'\par',''),'C="" |','')),'"','')),'| ', '|')),'| ', '|')
If Object_id('tempdb..#notetext') is not null drop table #notetext
Create Table #notetext
( int Identity(1,1) Not null, note_id int, [format] char(1), [value] varchar(max))
Insert Into #notetext
Select
@noteid
,Left(stringname,1)
,substring(stringname,3,Len(stringname))
From dbo.fn_splitstring(@Temp3, '|') Where stringname <> ''
Select * From #notetext
order by
fn_splitstring winds up calling a piece of .NET code:
return new string(DelimitedString.Value).Split( new string(Delimeter.Value).ToCharArray());
Unfortunately, I'm more or less restricted with either a) working with the original code above to force it to always be in order (which I'm not sure is possible) or b) develop a new split string function that will preserve the original order of the text (which I'm not necessarily that confident about, either).
Any suggestions?
September 26, 2009 at 11:16 am
Yes... step 1 would be to post some data in a readily consumable format. That will do two things... it'll let us see the type of data you're trying to split and it will make it easy for us to build and test possible solutions. Please see the following article for how to do that properly.
http://www.sqlservercentral.com/articles/Best+Practices/61537/
If you post data in that fashion, most people will jump at the chance to help you and you'll also get much better answers.
Shifting gears.... Passing order sensitive information is pretty much a no-no in any database. That, not withstanding, take a look at the following article for how to do such a thing using only T-SQL.
http://www.sqlservercentral.com/articles/T-SQL/63003/
Be advised that there is a slight performance penalty to be had when working the Tally table method in that article when working against VARCHAR(MAX). As much as I hate to admit it, passing/splitting XML or using a very tight and well formed While loop may suite you better when splitting VARCHAR(MAX) parameters.
If you post (or attach) some readily consumable data as previously mentioned, we can probably show you a couple of high speed T-SQL only methods to pull this off very successfully.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 12:51 pm
Thank you for the advise. I'm looking through the article you listed atm.
I was a little hesitant to post example data because of how long some of the examples are, and because the data is medical information. Granted, the data below is from a development database with no real patient information, but old habits die hard. 🙂
Some sample data:
--drop table if exists
IF OBJECT_ID('TempDB..#Notes','U') IS NOT NULL
DROP TABLE #Notes
--Create table
create table #Notes
(note_id int, note_text varchar(max))
--Insert Data
insert into #Notes
SELECT '1323261','MEDICAL TREATMENT
WERE WOUND MEASUREMENTS OBTAINED AND DOCUMENTED INSIDE THE ASSESSMENT?
YES
' UNION ALL
SELECT '1327115','MEDICAL TREATMENT
WAS PROCEDURE COMPLETED?
YES
' UNION ALL
SELECT '1327116','MEDICAL TREATMENT
WAS LAB PROCEDURE COMPLETED?
YES
' UNION ALL
SELECT '1327118','MEDICAL TREATMENT
WAS WEIGHT/MEASUREMENTS OBTAINED?
YES
' UNION ALL
SELECT '1327119','MEDICAL TREATMENT
WAS IV/SITE CARE PERFORMED?
YES
' UNION ALL
SELECT '1327120','MEDICAL TREATMENT
WAS PROCEDURE COMPLETED?
YES
' UNION ALL
SELECT '1327122','MEDICAL TREATMENT
WAS WEIGHT/MEASUREMENTS OBTAINED?
YES
' UNION ALL
SELECT '1327123','MEDICAL TREATMENT
WAS WOUND CARE PERFORMED?
YES
' UNION ALL
SELECT '1327125','MEDICAL TREATMENT
WAS WOUND CARE PERFORMED?
YES
' UNION ALL
SELECT '1326832','HOMEBOUND STATUS
INDICATE REASONS PATIENT IS HOMEBOUND: (MARK ALL THAT APPLY)
SHORTNESS OF BREATH
UNABLE TO LEAVE HOME'
I went ahead and limited the returned data to 128 characters for the examples; that's still enough to get a single category, question, and answer. Sadly, it's not enough to demonstrate the problem, where a given "question" and/or "answer" row will wind up out of order from the original text string. A lot of the notes are much longer, though, going as high as about 25,000 characters, and it appears that it's the longer notes that have that trouble. (I can post a long example if it would be useful, but it seemed potentially a tad rude.)
The more I look at this, the more I'm hoping I can find a consistent way to use XML to get the data in order.
September 26, 2009 at 1:29 pm
Thanks. You can always attach longer data as a text file when you post (instead of listing it as part of the body of a post).
What I'm trying to figure out is something to break the questions from the answers. I see the "?" on some of the questions but on "list" type questions, I'm not seeing much in the line of a delimiter except maybe the CrLf at the end of a line.
I think what we need to see is what the original text looks like before you do all of the replacements for things like "\par".
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 2:02 pm
The attached text file is the text of the long example I've been using that I think should give you a better idea of the overall formatting.
September 26, 2009 at 2:16 pm
Just one final thing and then I may have enough to go on... can you post the table creation statement for where the split data will be inserted into? Mabe even post a couple of rows from that table so we can get a better idea? Of course, nothing personal, private, or confidential, please.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 2:38 pm
That temp table at the end of the original code is actually the only table it gets inserted into. The proc is used by an SSRS report, the results are fed into the report directly.
Looking at the SSRS report, it looks like the report itself doesn't transform the data. It uses the [format] column (which will be "C", "Q" or "A") to determine the formatting of each row, but otherwise just uses the value text as is
September 26, 2009 at 2:51 pm
Ah! Got it! My apologies but that brings up one final question... can you mark up the following lines with what the NoteID should be, please? OR... would they all have the same NoteID according to a passed in value of @NoteID? Thanks...
CLIENT DEMOGRAPHICS
(OBQI)(M0100) THIS ASSESSMENT IS CURRENTLY BEING COMPLETED FOR THE FOLLOWING REASON:
1 - START OF CARE - FURTHER VISITS PLANNED
SELECT THE PATIENT''S MARITAL STATUS:
MARRIED
(M0140) RACE/ETHNICITY (AS IDENTIFIED BY PATIENT): (MARK ALL THAT APPLY)
6 - WHITE
INDICATE THE PATIENT''S RELIGION:
CHRISTIAN
HOMEBOUND STATUS
INDICATE REASONS PATIENT IS HOMEBOUND: (MARK ALL THAT APPLY)
SHORTNESS OF BREATH
DIFFICULTY TRANSFERRING
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 3:08 pm
Yes, the Note_ID is just the @NoteID value. According to the notes I have, it's because of how the report uses it as a parameter, so it needs to be in the result set, even though it'll always be the same.
September 26, 2009 at 3:23 pm
jwhisman (9/26/2009)
Yes, the Note_ID is just the @NoteID value. According to the notes I have, it's because of how the report uses it as a parameter, so it needs to be in the result set, even though it'll always be the same.
Thanks... almost done... I'll be back shortly. You'll like this a lot.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 4:01 pm
Thank you for the help; I look forward to seeing it.
I decided to see what would happen if I converted the concept in the original proc to be XML. The below code works (i.e., it fixes my example), but I'm not 100% sure it fixes the actual problem.
(@NoteID int)
DECLARE @charactercleanup varchar(max), @ToXMLFormat varchar(max), @CloseXML varchar(max),
@NoteXML xml, @xmlDocHandle INT
Declare @NoteText table ( int Identity(1,1) Not null, note_id int, [format] char(1), [value] varchar(max))
--Clean up characters that could cause XML to error
select @charactercleanup =
replace(replace(replace(replace(note_text, '"', '"'), '&', '&'), '', '>')
from NOTES where note_id = @NoteID
--Convert text to XML style format
select @ToXMLFormat =
(SELECT
Convert(varchar(max),
' <R Format="C" Value="')+Convert(varchar(max),
'')))
--Close off XML and remove extra "C" rows
select @CloseXML =
replace(Replace(@ToXMLFormat,'<R Format="C" Value="',''),
'', '')
--Cast string as XML
select @NoteXML = CAST(@CloseXML as xml)
--Insert values into table variable
EXEC sp_xml_preparedocument @xmlDocHandle OUTPUT, @NoteXML;
Insert Into @NoteText
Select
@NoteID
,Format
,Value
FROM OPENXML(@xmlDocHandle, N'/ROOT/R', 1)
WITH (Format varchar(1), Value varchar(max));
EXEC sp_xml_removedocument @xmlDocHandle
--Return results
select , cevn_cevid, [format], [value]
from @NoteText
order by
EDIT: Er, except, wow, the forum editor apparently really didn't like that.
September 26, 2009 at 4:29 pm
Heh... the only thing I know about XML is that it uses a fair bit of memory, clogs the pipe with extra data, and is relatively slow.
Give the following a whirl... I know that it will do things in the proper order...
DECLARE @NoteText VARCHAR(MAX),
@NoteID INT
SET @NoteID = 1 --This would probably be a parameter in a proc
CREATE TABLE #NoteText
(
I INT IDENTITY(1,1),
Note_ID INT,
Format CHAR(1),
VALUE VARCHAR(MAX)
)
;
--===== This gets the note and prepares it for a very simple split method
SELECT @NoteText = REPLACE(REPLACE(Note_Text,CHAR(13)+CHAR(10),CHAR(10)),CHAR(10)+CHAR(10),CHAR(10))
FROM dbo.Notes
WHERE Note_ID = @NoteID
;
WITH --This builds a Tally table on the fly
L0 (N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
), --10
L1 (N) AS (SELECT 1 FROM L0 a CROSS JOIN L0 b), --100
L2 (N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b), --10,000
L3 (N) AS (SELECT 1 FROM L2 a CROSS JOIN L2 b), --100,000,000
cteTally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM L3), --Add row numbers
cteSplit AS
(--==== This splits the data based on the position of the LineFeeds that remain
SELECT t.N,
SUBSTRING(@NoteText,t.N,CHARINDEX(CHAR(10),@NoteText+CHAR(10),t.N)-t.N) AS Value
FROM cteTally t
WHERE t.N < LEN(@NoteText)+1
AND SUBSTRING(CHAR(10)+@NoteText,t.N,1) = CHAR(10)
)
--===== This inserts the data into a temp table in the proper order,
-- with the proper labels, and removes the Tab characters...
INSERT INTO #NoteText (Note_ID,Format,Value)
SELECT @NoteID AS Note_ID,
CASE
WHEN LEFT(s.Value,2) = CHAR(9)+CHAR(9) THEN 'A'
WHEN LEFT(s.Value,1) = CHAR(9) THEN 'Q'
ELSE 'C'
END AS Format,
REPLACE(s.Value,CHAR(9),'') AS Value
FROM cteSplit s
WHERE LEN(s.Value) > 0
ORDER BY s.N
;
--Jeff Moden
Change is inevitable... Change for the better is not.
September 26, 2009 at 4:41 pm
Sorry, I'd forgotten to populate the Note_ID column in the Temp Table... I've added it to the code above.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 27, 2009 at 12:52 pm
Wow, thank you, Jeff. I'd been trying to think how to use your Tally example, but hadn't quite wrapped my head around how to do it in a case like this.
Yeah, I can't make up my mind how I feel about XML. Some folks in the office swear by it and some can't stand it, of course.
September 27, 2009 at 7:40 pm
jwhisman (9/27/2009)
Wow, thank you, Jeff. I'd been trying to think how to use your Tally example, but hadn't quite wrapped my head around how to do it in a case like this.Yeah, I can't make up my mind how I feel about XML. Some folks in the office swear by it and some can't stand it, of course.
Heh... I just swear at it. Considering that there's normally a 16 to 1 bloat factor, I hate what it does to the pipe, if nothing else.
If you get a chance, let me know what the final outcome of this problem is. I'm always interested in final outcomes so I can tweek what I post for other folks in the future.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply