January 25, 2010 at 9:26 am
We have several tables which we are retrieving and returning as an unnormalized block of XML. There are several stored procedures (SP) which are already set up as standard and one SP that is run using a dynamically built and parameter passed WHERE statement.
I have run some of these against our test tables and have had it bomb because some fields contain ASCII chars 0-30 e.g. (0x0002)
We cannot know which tables might contain these, nor which rows or columns as we have multiple clients using these against their own databases which they may have populated from inconsistant sources.
Our SQL is built using several nested SELECTS (to cause de-normalisation) each with a FOR XML statement.
We are returned the following when it bombs...
"FOR XML could not serialize the data for node 'PremisesName' because it contains a character (0x0002) which is not allowed in XML."
We would like to recognise any such character and replace them in the SELECTS if this possible.
If this is not possible without bloating our SPs then do we have any other options?
Many thanks for any advice.
January 25, 2010 at 12:16 pm
Would you mind sharing an example so we have something to play with?
January 26, 2010 at 3:28 am
This is a cut down version of the code....
The parameters include the dynamically built WHERE clause and a few of the switches to determine whether to return a table or not. There are a lot of tables so I've cut it right down. Any one of the non-keyed text fields could potentially contain Unicode chars 0-30. In our cases that we have found so far we have 0x-0001 in one of the fields.
And get the following message...
ALTER PROCEDURE [dbo].[xxxxx]
@WhereClause AS VARCHAR(MAX),
@ClientSwitch AS BIT,
@ActionDiarySwitch AS BIT,
@ActionDetailSwitch AS BIT,
AS
DECLARE @sql AS VARCHAR(MAX)
SET @sql = 'SELECT SRequest.SRequestId FROM dbo.vw_SRequest SRequest
LEFT JOIN vw_LinkSRequestNameAddress LinkSRequestNameAddress
ON SRequest.SRequestId = LinkSRequestNameAddress.SRequestId
LEFT JOIN vw_NameAddress NameAddress
ON LinkSRequestNameAddress.NameAddressId = NameAddress.NameAddressId
LEFT JOIN vw_LinkSRequestActionDiary LinkSRequestActionDiary
ON SRequest.SRequestId = LinkSRequestActionDiary.SRequestId
LEFT JOIN vw_LinkSRequestActionDiary SRequestActionDiary
ON SRequest.SRequestId = SRequestActionDiary.SRequestId
LEFT JOIN vw_ActionDiary ActionDiary
ON SRequestActionDiary.ActionDiaryId = ActionDiary.ActionDiaryId
LEFT JOIN vw_ActionDetail ActionDetail
ON ActionDiary.ActionDiaryId = ActionDetail.ActionDiaryId
LEFT JOIN vw_LinkSRequestScheduleWork LinkSRequestScheduleWork
ON SRequest.SRequestId = LinkSRequestScheduleWork.SRequestId
LEFT JOIN vw_ScheduleWork ScheduleWork
ON LinkSRequestScheduleWork.ScheduleWorkId = ScheduleWork.ScheduleWorkId
LEFT JOIN vw_ScheduleWorkDetail ScheduleWorkDetail
ON ScheduleWork.ScheduleWorkId = ScheduleWorkDetail.ScheduleWorkId ' +
+ @Whereclause
+ ' GROUP BY SRequest.SRequestId'
DECLARE @result TABLE
(
SRequestId CHAR(6)
)
INSERT INTO @result EXEC(@SQL)
SELECT SRequest.SRequestId,
LARef,
ReceivedDate,
PremisesId, PremisesName, PremisesSearchName
HouseName, HouseNo,
AddressLine1, AddressLine2, AddressLine3, Postcode,
UPRN, AddressStatus,
TelephoneNo, FaxNo, MobilePhoneNo,
EmailAddress, WebAddress,
WardCode, WardDescription, AreaCode, AreaDescription,
Easting, Northing, GridRef,
--
CASE WHEN @RootTextSwitch = 1
THEN SRequestText.SRequestText END AS "SRequestText/SRequestText",
--
--SELECT EACH CLIENT ENTRY FOR THE NAMEADDRESS / SREQUEST IDS
(
SELECT SeqNo,
ReceivedDate,
OtherSRequestsFlag,
HowReceivedCode,
HowReceivedDescription,
VIPType,
VIPName,
AccessDetail
FROM dbo.vw_SRequestClient SRequestClient
WHERE SRequest.SRequestId = SRequestClient.SRequestId AND @ClientSwitch = 1
FOR XML PATH('SRequestClient'), TYPE
),
--
--SELECT THE ACTION DIARY ENTRY
(
SELECT ActionDiary.ActionDiaryId,
ActionTypeCode,
ActionTypeDescription,
UnitCode,
--
CASE WHEN @ActionDiaryTextSwitch = 1
THEN ActionDiaryText.ActionDiaryText END AS "ActionDiaryText/ActionDiaryText" ,
--
--SELECT EACH ACTION DETAIL ENTRY FOR THIS ACTION DIARY
(
SELECT ActionDetail.ActionDetailId,
ActionDetailCode,
ActionDetailDescription,
OfficerCode,
OfficerDescription,
--
CASE WHEN @ActionDetailTextSwitch = 1
THEN ActionDetailText.ActionDetailText END AS "ActionDetailText/ActionDetailText"
FROM dbo.vw_ActionDetail ActionDetail
JOIN dbo.vw_ActionDetailText ActionDetailText
ON ActionDetail.ActionDiaryId = ActionDiary.ActionDiaryId AND ActionDetail.ActionDetailId = ActionDetailText.ActionDetailId
WHERE ActionDiary.ActionDiaryId = LinkActionDiary.ActionDiaryId AND @ActionDetailSwitch = 1
FOR XML PATH('ActionDetail'), TYPE
)
FROM dbo.vw_ActionDiary ActionDiary
JOIN dbo.vw_ActionDiaryText ActionDiaryText ON ActionDiaryText.ActionDiaryId = ActionDiary.ActionDiaryId
WHERE ActionDiary.ActionDiaryId = LinkActionDiary.ActionDiaryId AND @ActionDiarySwitch = 1
FOR XML PATH('ActionDiary'), TYPE
),
[/endcode]
January 26, 2010 at 9:23 am
Without table definition and sample data provided as per the first link in my signature we can't really help you.
When I asked for an example I was talking about sample data to duplicate the error you recieve.
Something like
CREATE TABLE #temp()
INSERT INTO SELECT UNION ALL
SELECT ... FROM #temp FOR XML PATH(...), TYPE
January 27, 2010 at 10:16 am
We have encountered the same problem.
You will have to scrub the data stored in all of your text (char, nchar, varchar, nvarchar, etc.) columns to eliminate all invalid XML characters.
Valid XML characters for UTF-16 (2-byte) that is supported by .Net:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
Unicode:
[\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]
The scrubbing will need to be performed by the application that provides the data to the database. It can be done via a Regular Expression that either removes the invalid XML characters or replaces them with a blank.
Regular Expression to detect invalid XML:
PowerShell: [regex] '[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]'
C#: Regex rgxInvalidXML = new Regex(@"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")
Note that there are numerous Regular Expressions for this issue floating around on the internet that are wrong! I.e., they do not properly contain the correct set of characters. The above logic has been coded and implemented to a.) correct existing data and b.) used by the application to prevent the bad data from getting in.
January 28, 2010 at 8:34 am
Yes, thank you very much. That is exactly the route we have gone down by wrapping a function round each such field that converts. This was going to involve a follow up question r.e. the regex, as we were having issues with structuring it (and had resorted to writing a CLR .NET function). Your post has answered it and thrown up something we were not taking account of (0009, 000A, 000D).
Many thanks for your input. Perfect advice.
July 30, 2012 at 3:36 pm
Thanks for the code, it works great however I did have to add the vertical tab to the regex.
[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD\u1011]
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy