T-SQL --> XML; dealing with non-ASCII chars

  • We have several tables which we are retrieving and returning as an unnormalized block of XML. There are several stored procedures (SP) which are already set up as standard and one SP that is run using a dynamically built and parameter passed WHERE statement.

    I have run some of these against our test tables and have had it bomb because some fields contain ASCII chars 0-30 e.g. (0x0002)

    We cannot know which tables might contain these, nor which rows or columns as we have multiple clients using these against their own databases which they may have populated from inconsistant sources.

    Our SQL is built using several nested SELECTS (to cause de-normalisation) each with a FOR XML statement.

    We are returned the following when it bombs...

    "FOR XML could not serialize the data for node 'PremisesName' because it contains a character (0x0002) which is not allowed in XML."

    We would like to recognise any such character and replace them in the SELECTS if this possible.

    If this is not possible without bloating our SPs then do we have any other options?

    Many thanks for any advice.

  • Would you mind sharing an example so we have something to play with?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This is a cut down version of the code....

    The parameters include the dynamically built WHERE clause and a few of the switches to determine whether to return a table or not. There are a lot of tables so I've cut it right down. Any one of the non-keyed text fields could potentially contain Unicode chars 0-30. In our cases that we have found so far we have 0x-0001 in one of the fields.

    And get the following message...

    System.Data.SqlClient.SqlException: FOR XML could not serialize the data for node 'PremisesName' because it contains a character (0x0002) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data

    ALTER PROCEDURE [dbo].[xxxxx]

    @WhereClause AS VARCHAR(MAX),

    @ClientSwitch AS BIT,

    @ActionDiarySwitch AS BIT,

    @ActionDetailSwitch AS BIT,

    AS

    DECLARE @sql AS VARCHAR(MAX)

    SET @sql = 'SELECT SRequest.SRequestId FROM dbo.vw_SRequest SRequest

    LEFT JOIN vw_LinkSRequestNameAddress LinkSRequestNameAddress

    ON SRequest.SRequestId = LinkSRequestNameAddress.SRequestId

    LEFT JOIN vw_NameAddress NameAddress

    ON LinkSRequestNameAddress.NameAddressId = NameAddress.NameAddressId

    LEFT JOIN vw_LinkSRequestActionDiary LinkSRequestActionDiary

    ON SRequest.SRequestId = LinkSRequestActionDiary.SRequestId

    LEFT JOIN vw_LinkSRequestActionDiary SRequestActionDiary

    ON SRequest.SRequestId = SRequestActionDiary.SRequestId

    LEFT JOIN vw_ActionDiary ActionDiary

    ON SRequestActionDiary.ActionDiaryId = ActionDiary.ActionDiaryId

    LEFT JOIN vw_ActionDetail ActionDetail

    ON ActionDiary.ActionDiaryId = ActionDetail.ActionDiaryId

    LEFT JOIN vw_LinkSRequestScheduleWork LinkSRequestScheduleWork

    ON SRequest.SRequestId = LinkSRequestScheduleWork.SRequestId

    LEFT JOIN vw_ScheduleWork ScheduleWork

    ON LinkSRequestScheduleWork.ScheduleWorkId = ScheduleWork.ScheduleWorkId

    LEFT JOIN vw_ScheduleWorkDetail ScheduleWorkDetail

    ON ScheduleWork.ScheduleWorkId = ScheduleWorkDetail.ScheduleWorkId ' +

    + @Whereclause

    + ' GROUP BY SRequest.SRequestId'

    DECLARE @result TABLE

    (

    SRequestId CHAR(6)

    )

    INSERT INTO @result EXEC(@SQL)

    SELECT SRequest.SRequestId,

    LARef,

    ReceivedDate,

    PremisesId, PremisesName, PremisesSearchName

    HouseName, HouseNo,

    AddressLine1, AddressLine2, AddressLine3, Postcode,

    UPRN, AddressStatus,

    TelephoneNo, FaxNo, MobilePhoneNo,

    EmailAddress, WebAddress,

    WardCode, WardDescription, AreaCode, AreaDescription,

    Easting, Northing, GridRef,

    --

    CASE WHEN @RootTextSwitch = 1

    THEN SRequestText.SRequestText END AS "SRequestText/SRequestText",

    --

    --SELECT EACH CLIENT ENTRY FOR THE NAMEADDRESS / SREQUEST IDS

    (

    SELECT SeqNo,

    ReceivedDate,

    OtherSRequestsFlag,

    HowReceivedCode,

    HowReceivedDescription,

    VIPType,

    VIPName,

    AccessDetail

    FROM dbo.vw_SRequestClient SRequestClient

    WHERE SRequest.SRequestId = SRequestClient.SRequestId AND @ClientSwitch = 1

    FOR XML PATH('SRequestClient'), TYPE

    ),

    --

    --SELECT THE ACTION DIARY ENTRY

    (

    SELECT ActionDiary.ActionDiaryId,

    ActionTypeCode,

    ActionTypeDescription,

    UnitCode,

    --

    CASE WHEN @ActionDiaryTextSwitch = 1

    THEN ActionDiaryText.ActionDiaryText END AS "ActionDiaryText/ActionDiaryText" ,

    --

    --SELECT EACH ACTION DETAIL ENTRY FOR THIS ACTION DIARY

    (

    SELECT ActionDetail.ActionDetailId,

    ActionDetailCode,

    ActionDetailDescription,

    OfficerCode,

    OfficerDescription,

    --

    CASE WHEN @ActionDetailTextSwitch = 1

    THEN ActionDetailText.ActionDetailText END AS "ActionDetailText/ActionDetailText"

    FROM dbo.vw_ActionDetail ActionDetail

    JOIN dbo.vw_ActionDetailText ActionDetailText

    ON ActionDetail.ActionDiaryId = ActionDiary.ActionDiaryId AND ActionDetail.ActionDetailId = ActionDetailText.ActionDetailId

    WHERE ActionDiary.ActionDiaryId = LinkActionDiary.ActionDiaryId AND @ActionDetailSwitch = 1

    FOR XML PATH('ActionDetail'), TYPE

    )

    FROM dbo.vw_ActionDiary ActionDiary

    JOIN dbo.vw_ActionDiaryText ActionDiaryText ON ActionDiaryText.ActionDiaryId = ActionDiary.ActionDiaryId

    WHERE ActionDiary.ActionDiaryId = LinkActionDiary.ActionDiaryId AND @ActionDiarySwitch = 1

    FOR XML PATH('ActionDiary'), TYPE

    ),

    [/endcode]

  • Without table definition and sample data provided as per the first link in my signature we can't really help you.

    When I asked for an example I was talking about sample data to duplicate the error you recieve.

    Something like

    CREATE TABLE #temp()

    INSERT INTO SELECT UNION ALL

    SELECT ... FROM #temp FOR XML PATH(...), TYPE



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • We have encountered the same problem.

    You will have to scrub the data stored in all of your text (char, nchar, varchar, nvarchar, etc.) columns to eliminate all invalid XML characters.

    Valid XML characters for UTF-16 (2-byte) that is supported by .Net:

    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]

    Unicode:

    [\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]

    The scrubbing will need to be performed by the application that provides the data to the database. It can be done via a Regular Expression that either removes the invalid XML characters or replaces them with a blank.

    Regular Expression to detect invalid XML:

    PowerShell: [regex] '[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]'

    C#: Regex rgxInvalidXML = new Regex(@"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")

    Note that there are numerous Regular Expressions for this issue floating around on the internet that are wrong! I.e., they do not properly contain the correct set of characters. The above logic has been coded and implemented to a.) correct existing data and b.) used by the application to prevent the bad data from getting in.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Yes, thank you very much. That is exactly the route we have gone down by wrapping a function round each such field that converts. This was going to involve a follow up question r.e. the regex, as we were having issues with structuring it (and had resorted to writing a CLR .NET function). Your post has answered it and thrown up something we were not taking account of (0009, 000A, 000D).

    Many thanks for your input. Perfect advice.

  • Thanks for the code, it works great however I did have to add the vertical tab to the regex.

    [^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD\u1011]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply