Error while parsing a string with " & " as XML

  • I have a snippet of T-SQL code that I use as a function. All it does is, take a string that has several phrases separated by a predefined separator. The goal is to parse each of the phrases out and insert them into a table as separate rows. The table also has an ID column, which is an identity.

    BEGIN

    DECLARE @UnparsedString VARCHAR(MAX) = 'This|is|a|short|sentence'

    , @Separator VARCHAR(1) = '|'

    DECLARE @ParsedDataSet TABLE([PhraseID] INT IDENTITY, [Phrase] VARCHAR(MAX))

    DECLARE @XMLString XML

    SELECT @XMLString = CONVERT(XML, SQL_TEXT)

    FROM (

    SELECT '<root><item>'

    + REPLACE(@UnparsedString, @Separator, '</item><item>')

    + '</item></root>' AS SQL_TEXT

    ) DerTab

    INSERT INTO @ParsedDataSet([Phrase])

    SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Phrase]

    FROM @XMLString.nodes('root/item') t(col)

    SELECT * FROM @ParsedDataSet

    END

    My code works absolutely fine as long as I don't use any special characters. When I try to add a phrase that has an ampersand, it throws an error.

    When an ampersand is used without any spaces around, say for e.g.

    DECLARE @UnparsedString VARCHAR(MAX) = 'This|is|a|short&Sweet|sentence')

    I get the following error...

    Msg 9411, Level 16, State 1, Line 10

    XML parsing: line 1, character 70, semicolon expected

    When an ampersand is used with spaces around, say for e.g.

    DECLARE @UnparsedString VARCHAR(MAX) = 'This|is|a|short & Sweet|sentence')

    I get the following error...

    Msg 9421, Level 16, State 1, Line 10

    XML parsing: line 1, character 66, illegal name character

    How can I avoid both these situations and parse my input string? Your help and advise is grately appreciated.

    SQLCurious

  • Try this

    Edits: Had to use a screen shot to get the coded ampersand to render as text!

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • You may want to read this article: http://www.sqlservercentral.com/articles/Tally+Table/72993/

  • Thanks, @Phil Parkin

  • Thanks, @Lynn Pettis. That's a wonderful article.

  • SQLCurious (6/1/2016)


    When an ampersand is used without any spaces around, say for e.g.

    DECLARE @UnparsedString VARCHAR(MAX) = 'This|is|a|short&Sweet|sentence')

    I get the following error...

    Msg 9411, Level 16, State 1, Line 10

    XML parsing: line 1, character 70, semicolon expected

    It's invalid XML.

    Therefore you get an error trying to parse it.

    You'll get an error too if you try to parse this string:

    DECLARE @UnparsedString VARCHAR(MAX) = 'Numbera|compared:|2>1')

    Error message will be different, but the reason the same - incorrectly formed XML.

    _____________
    Code for TallyGenerator

  • You should be very careful using xml for splitting or merging texts as there are many exceptions to deal with in xml. Some of the problems can be avoided using escape sequences like the &, > and < sequences. But if your texts can contain low or high ascii/unicode values (for example non-printable characters with ascii codes below 32) you're in for a lot more fun: these characters can't be escaped in (MS SQL's) xml. Your connection will be doomed if you try to convert any text with such characters in it into an xml-type.

    Better use the string splitting algorithms provided in Jeff's articles pointed to in earlier responses, as these do not require any 'magic characters' that need escaping.

    By the way, a way to get any characters escaped easier than using replace() is to have sql server do it for you like this:

    declare @s-2 nvarchar(1000) = '< > &';

    select (select @s-2 as [text()] for xml path(''))

    The result is:

    < > &



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply