Split String and ordering

  • So I've been tasked with fixing an issue with a bit of code that I'm a little stumped on how to accomplish.

    The set up is this: The current design is setup so that users enter a "note" that is saved into a varchar(max) column. The note has some regular features: there are several "categories" and each category will have several questions, with each question followed by an answer. The problem bit of code sends this text to an SSRS report, but tries to split up the data into a temporary table first. The problem is that the table function that handles the split string doesn't always preserve the order of the original text, which is what I've been asked to try to do.

    This is the current code:

    (@noteid int)

    Declare @Temp1 Varchar(max)

    Set @Temp1 =

    (SELECT

    Convert(varchar(max),

    'C="') +

    Convert(varchar(max),

    Replace(Replace(

    Replace(Convert(varchar(max), note_text),

    char(13)+char(10)+char(9)+char(9), '"\par | A="'),

    char(13)+char(10)+char(9), '"\par | Q="'),

    char(13)+char(10), '"\par | C="'))

    FROM Notes

    WHERE note_id = @noteid)

    --Print @Temp1

    Declare @Temp2 Varchar(max)

    Set @Temp2 = Replace(@Temp1,'C=\par |','') --+ ' EndOfNote'

    Declare @Temp3 Varchar(max)

    Set @Temp3 = Replace((Replace((Replace((Replace(Replace(@Temp2,'\par',''),'C="" |','')),'"','')),'| ', '|')),'| ', '|')

    If Object_id('tempdb..#notetext') is not null drop table #notetext

    Create Table #notetext

    ( int Identity(1,1) Not null, note_id int, [format] char(1), [value] varchar(max))

    Insert Into #notetext

    Select

    @noteid

    ,Left(stringname,1)

    ,substring(stringname,3,Len(stringname))

    From dbo.fn_splitstring(@Temp3, '|') Where stringname <> ''

    Select * From #notetext

    order by

    fn_splitstring winds up calling a piece of .NET code:

    return new string(DelimitedString.Value).Split( new string(Delimeter.Value).ToCharArray());

    Unfortunately, I'm more or less restricted with either a) working with the original code above to force it to always be in order (which I'm not sure is possible) or b) develop a new split string function that will preserve the original order of the text (which I'm not necessarily that confident about, either).

    Any suggestions?

  • Yes... step 1 would be to post some data in a readily consumable format. That will do two things... it'll let us see the type of data you're trying to split and it will make it easy for us to build and test possible solutions. Please see the following article for how to do that properly.

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    If you post data in that fashion, most people will jump at the chance to help you and you'll also get much better answers.

    Shifting gears.... Passing order sensitive information is pretty much a no-no in any database. That, not withstanding, take a look at the following article for how to do such a thing using only T-SQL.

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    Be advised that there is a slight performance penalty to be had when working the Tally table method in that article when working against VARCHAR(MAX). As much as I hate to admit it, passing/splitting XML or using a very tight and well formed While loop may suite you better when splitting VARCHAR(MAX) parameters.

    If you post (or attach) some readily consumable data as previously mentioned, we can probably show you a couple of high speed T-SQL only methods to pull this off very successfully.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for the advise. I'm looking through the article you listed atm.

    I was a little hesitant to post example data because of how long some of the examples are, and because the data is medical information. Granted, the data below is from a development database with no real patient information, but old habits die hard. 🙂

    Some sample data:

    --drop table if exists

    IF OBJECT_ID('TempDB..#Notes','U') IS NOT NULL

    DROP TABLE #Notes

    --Create table

    create table #Notes

    (note_id int, note_text varchar(max))

    --Insert Data

    insert into #Notes

    SELECT '1323261','MEDICAL TREATMENT

    WERE WOUND MEASUREMENTS OBTAINED AND DOCUMENTED INSIDE THE ASSESSMENT?

    YES

    ' UNION ALL

    SELECT '1327115','MEDICAL TREATMENT

    WAS PROCEDURE COMPLETED?

    YES

    ' UNION ALL

    SELECT '1327116','MEDICAL TREATMENT

    WAS LAB PROCEDURE COMPLETED?

    YES

    ' UNION ALL

    SELECT '1327118','MEDICAL TREATMENT

    WAS WEIGHT/MEASUREMENTS OBTAINED?

    YES

    ' UNION ALL

    SELECT '1327119','MEDICAL TREATMENT

    WAS IV/SITE CARE PERFORMED?

    YES

    ' UNION ALL

    SELECT '1327120','MEDICAL TREATMENT

    WAS PROCEDURE COMPLETED?

    YES

    ' UNION ALL

    SELECT '1327122','MEDICAL TREATMENT

    WAS WEIGHT/MEASUREMENTS OBTAINED?

    YES

    ' UNION ALL

    SELECT '1327123','MEDICAL TREATMENT

    WAS WOUND CARE PERFORMED?

    YES

    ' UNION ALL

    SELECT '1327125','MEDICAL TREATMENT

    WAS WOUND CARE PERFORMED?

    YES

    ' UNION ALL

    SELECT '1326832','HOMEBOUND STATUS

    INDICATE REASONS PATIENT IS HOMEBOUND: (MARK ALL THAT APPLY)

    SHORTNESS OF BREATH

    UNABLE TO LEAVE HOME'

    I went ahead and limited the returned data to 128 characters for the examples; that's still enough to get a single category, question, and answer. Sadly, it's not enough to demonstrate the problem, where a given "question" and/or "answer" row will wind up out of order from the original text string. A lot of the notes are much longer, though, going as high as about 25,000 characters, and it appears that it's the longer notes that have that trouble. (I can post a long example if it would be useful, but it seemed potentially a tad rude.)

    The more I look at this, the more I'm hoping I can find a consistent way to use XML to get the data in order.

  • Thanks. You can always attach longer data as a text file when you post (instead of listing it as part of the body of a post).

    What I'm trying to figure out is something to break the questions from the answers. I see the "?" on some of the questions but on "list" type questions, I'm not seeing much in the line of a delimiter except maybe the CrLf at the end of a line.

    I think what we need to see is what the original text looks like before you do all of the replacements for things like "\par".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The attached text file is the text of the long example I've been using that I think should give you a better idea of the overall formatting.

  • Just one final thing and then I may have enough to go on... can you post the table creation statement for where the split data will be inserted into? Mabe even post a couple of rows from that table so we can get a better idea? Of course, nothing personal, private, or confidential, please.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That temp table at the end of the original code is actually the only table it gets inserted into. The proc is used by an SSRS report, the results are fed into the report directly.

    Looking at the SSRS report, it looks like the report itself doesn't transform the data. It uses the [format] column (which will be "C", "Q" or "A") to determine the formatting of each row, but otherwise just uses the value text as is

  • Ah! Got it! My apologies but that brings up one final question... can you mark up the following lines with what the NoteID should be, please? OR... would they all have the same NoteID according to a passed in value of @NoteID? Thanks...

    CLIENT DEMOGRAPHICS

        (OBQI)(M0100) THIS ASSESSMENT IS CURRENTLY BEING COMPLETED FOR THE FOLLOWING REASON:

            1 - START OF CARE - FURTHER VISITS PLANNED

     

        SELECT THE PATIENT''S MARITAL STATUS:

            MARRIED

     

        (M0140) RACE/ETHNICITY (AS IDENTIFIED BY PATIENT): (MARK ALL THAT APPLY)

            6 - WHITE

     

        INDICATE THE PATIENT''S RELIGION:

            CHRISTIAN

     

    HOMEBOUND STATUS

        INDICATE REASONS PATIENT IS HOMEBOUND: (MARK ALL THAT APPLY)

            SHORTNESS OF BREATH

     

            DIFFICULTY TRANSFERRING

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, the Note_ID is just the @NoteID value. According to the notes I have, it's because of how the report uses it as a parameter, so it needs to be in the result set, even though it'll always be the same.

  • jwhisman (9/26/2009)


    Yes, the Note_ID is just the @NoteID value. According to the notes I have, it's because of how the report uses it as a parameter, so it needs to be in the result set, even though it'll always be the same.

    Thanks... almost done... I'll be back shortly. You'll like this a lot.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for the help; I look forward to seeing it.

    I decided to see what would happen if I converted the concept in the original proc to be XML. The below code works (i.e., it fixes my example), but I'm not 100% sure it fixes the actual problem.

    (@NoteID int)

    DECLARE @charactercleanup varchar(max), @ToXMLFormat varchar(max), @CloseXML varchar(max),

    @NoteXML xml, @xmlDocHandle INT

    Declare @NoteText table ( int Identity(1,1) Not null, note_id int, [format] char(1), [value] varchar(max))

    --Clean up characters that could cause XML to error

    select @charactercleanup =

    replace(replace(replace(replace(note_text, '"', '"'), '&', '&'), '', '>')

    from NOTES where note_id = @NoteID

    --Convert text to XML style format

    select @ToXMLFormat =

    (SELECT

    Convert(varchar(max),

    ' <R Format="C" Value="')+Convert(varchar(max),

    '')))

    --Close off XML and remove extra "C" rows

    select @CloseXML =

    replace(Replace(@ToXMLFormat,'<R Format="C" Value="',''),

    '', '')

    --Cast string as XML

    select @NoteXML = CAST(@CloseXML as xml)

    --Insert values into table variable

    EXEC sp_xml_preparedocument @xmlDocHandle OUTPUT, @NoteXML;

    Insert Into @NoteText

    Select

    @NoteID

    ,Format

    ,Value

    FROM OPENXML(@xmlDocHandle, N'/ROOT/R', 1)

    WITH (Format varchar(1), Value varchar(max));

    EXEC sp_xml_removedocument @xmlDocHandle

    --Return results

    select , cevn_cevid, [format], [value]

    from @NoteText

    order by

    EDIT: Er, except, wow, the forum editor apparently really didn't like that.

  • Heh... the only thing I know about XML is that it uses a fair bit of memory, clogs the pipe with extra data, and is relatively slow.

    Give the following a whirl... I know that it will do things in the proper order...

    DECLARE @NoteText VARCHAR(MAX),

            @NoteID   INT

        SET @NoteID   = 1 --This would probably be a parameter in a proc

     

     CREATE TABLE #NoteText

            (

            I       INT IDENTITY(1,1),

            Note_ID INT,

            Format  CHAR(1),

            VALUE   VARCHAR(MAX)

            )

    ;

    --===== This gets the note and prepares it for a very simple split method

     SELECT @NoteText = REPLACE(REPLACE(Note_Text,CHAR(13)+CHAR(10),CHAR(10)),CHAR(10)+CHAR(10),CHAR(10)) 

       FROM dbo.Notes

      WHERE Note_ID   = @NoteID

    ;

    WITH --This builds a Tally table on the fly

          L0 (N) AS 

                 (

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 

                 ),                                                   --10

          L1 (N) AS (SELECT 1 FROM L0 a CROSS JOIN L0 b),            --100

          L2 (N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b),            --10,000

          L3 (N) AS (SELECT 1 FROM L2 a CROSS JOIN L2 b),            --100,000,000

    cteTally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM L3), --Add row numbers 

    cteSplit     AS

    (--==== This splits the data based on the position of the LineFeeds that remain

     SELECT t.N,

            SUBSTRING(@NoteText,t.N,CHARINDEX(CHAR(10),@NoteText+CHAR(10),t.N)-t.N) AS Value

       FROM cteTally t

      WHERE t.N < LEN(@NoteText)+1

        AND SUBSTRING(CHAR(10)+@NoteText,t.N,1) = CHAR(10)

    )

    --===== This inserts the data into a temp table in the proper order,

         -- with the proper labels, and removes the Tab characters...

    INSERT INTO #NoteText (Note_ID,Format,Value)

    SELECT @NoteID AS Note_ID,

    CASE

                WHEN LEFT(s.Value,2) = CHAR(9)+CHAR(9) THEN 'A'

                WHEN LEFT(s.Value,1) = CHAR(9) THEN 'Q'

                ELSE 'C'

            END AS Format,

            REPLACE(s.Value,CHAR(9),'') AS Value

       FROM cteSplit s

      WHERE LEN(s.Value) > 0

      ORDER BY s.N

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry, I'd forgotten to populate the Note_ID column in the Temp Table... I've added it to the code above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow, thank you, Jeff. I'd been trying to think how to use your Tally example, but hadn't quite wrapped my head around how to do it in a case like this.

    Yeah, I can't make up my mind how I feel about XML. Some folks in the office swear by it and some can't stand it, of course.

  • jwhisman (9/27/2009)


    Wow, thank you, Jeff. I'd been trying to think how to use your Tally example, but hadn't quite wrapped my head around how to do it in a case like this.

    Yeah, I can't make up my mind how I feel about XML. Some folks in the office swear by it and some can't stand it, of course.

    Heh... I just swear at it. Considering that there's normally a 16 to 1 bloat factor, I hate what it does to the pipe, if nothing else.

    If you get a chance, let me know what the final outcome of this problem is. I'm always interested in final outcomes so I can tweek what I post for other folks in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply