June 4, 2021 at 4:14 pm
Hi there
I have a piece of code which takes an input parameter of NVARCHAR(Max) and then attempts to cast it to XML as follows
declare @DataSheetXML NVARCHAR(max)
declare @DataSheetXML2 xml
set @DataSheetXML =
N'<?xml version="1.0" encoding="utf-8"?>
<Datasheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</Datasheet>'
set @DataSheetXML2= CONVERT(XML,CONVERT(NVARCHAR(MAX),@DataSheetXML))
This fails because I get the following error message:
Msg 9402, Level 16, State 1, Line 14
XML parsing: line 1, character 38, unable to switch the encoding
Is there a way that I can cast @DataSheetXML to @DataSheetXML2, ie from NVARCHAR(MAX) to XML?
June 4, 2021 at 5:55 pm
Was doing a little bit of random testing on this and I think the problem is that utf-8 is going to be VARCHAR. utf-16 would be NVARCHAR.
This is easy to test by taking your SET statement and changing the second convert from NVARCHAR(MAX) to VARCHAR(MAX). Alternately, you can change your encoding to utf-16 in the XML.
Now, to handle this in coded format, I think you would need to have an IF statement. Something along the lines of this:
declare @DataSheetXML NVARCHAR(max)
declare @DataSheetXML2 xml
set @DataSheetXML =
N'<?xml version="1.0" encoding="utf-8"?>
<Datasheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</Datasheet>'
IF (SELECT CHARINDEX('utf-8',@DataSheetXML)) = 0
BEGIN
SET @DataSheetXML2= CONVERT(XML,@DataSheetXML)
END
ELSE
BEGIN
SET @DataSheetXML2= CONVERT(XML,CONVERT(VARCHAR(MAX),@DataSheetXML))
END
if the string "utf-8" exists in your XML, then convert to VARCHAR(MAX), otherwise leave it as an NVARCHAR(MAX). No need to CONVERT @DataSheetXML from NVARCHAR(MAX) to NVARCHAR(MAX) as that isn't doing anything.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 7, 2021 at 2:25 pm
Thanks Brian...that worked perfectly for me
June 7, 2021 at 2:57 pm
Not a problem. I imagine there are other solutions as well. If you KNOW you are wanting NVARCHAR for sure, you could also do a REPLACE on the utf-8 to be utf-16. This is MAY break the XML though (shouldn't but never say never). Or if you are ALWAYS going to be working with utf-8 XML, then having the first parameter as a VARCHAR would save some conversions.
Something to note, there are a LOT of different encoding types supported by XML. I found this list: https://www.iana.org/assignments/character-sets/character-sets.xhtml
You can see there are a lot of encoding types. My approach above ONLY handles utf-8 and casts that to VARCHAR first. I am not certain if any of the other types (such as US-ASCII) would need to be converted to VARCHAR or if NVARCHAR can handle it. I expect if you don't know what encoding(s) will be used, you will need to do trial and error, with my above method, on all different encoding types and there are a lot.
As it is a large list to parse through if you needed to check all of them, I would take a different approach. A safer approach would be something like:
declare @DataSheetXML NVARCHAR(max)
declare @DataSheetXML2 xml
set @DataSheetXML =
N'<?xml version="1.0" encoding="utf-8"?>
<Datasheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</Datasheet>'
BEGIN try
SET @DataSheetXML2= CONVERT(XML,CONVERT(NVARCHAR(MAX),@DataSheetXML))
END try
BEGIN CATCH
BEGIN TRY
SET @DataSheetXML2= CONVERT(XML,CONVERT(VARCHAR(MAX),@DataSheetXML))
END TRY
BEGIN CATCH
SET @DatasheetXML2=N'<?xml version="1.0" encoding="utf-16"?>
<ERROR>ERROR ENCOUNTERED</ERROR>'
END CATCH
END CATCH
IF CONVERT(NVARCHAR(MAX),@DatasheetXML2) LIKE '%<ERROR>%')
BEGIN
--HANDLE ERROR and exit
END
The above approach ends up using 2 try catch blocks which I generally try to avoid in SQL if I can, but I think this is going to be easier than trying to test and capture all possible encoding types. Plus it handles the case where it can't convert to VARCHAR or NVARCHAR. I left the explicit NVARCHAR conversion in even though we have the NVARCHAR at the start because to me it makes it clearer to see what we are doing. That is, it increases readability for future developers. It is not required.
Plus, that last CONVERT for handling the error could be handled by parsing the XML rather than converting it. Parsing will likely give better performance, but I expect it to be very minimal performance gain. In this case, the XML is short, so converting it and doing a LIKE comparison on it should be quick and allows you to handle the error in the event some encoding type can't be converted to VARCHAR or NVARCHAR.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 13, 2022 at 2:22 pm
SQL Server 2019 now supports UTF-8 you can check if you have UTF-8 collations by running the following query :
SELECT Name, Description FROM fn_helpcollations() WHERE Name LIKE '%UTF8'
Since UTF-8 must be stored in VARCHAR (not NVARCHAR) you will need two conversions :
Here is an example where I've added a special character what would be considered an illegal character if you don't convert to UTF-8 first.
DECLARE
@DataSheetXML nvarchar(max),
@DataSheetXML2 xml
SET @DataSheetXML = N'<?xml version="1.0" encoding="utf-8"?>
<Datasheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Description>Service 5 étoiles</Description>
</Datasheet>'
DECLARE @hdoc int
SET @DataSheetXML2 = cast(@DataSheetXML as varchar(max)) COLLATE Latin1_General_100_CI_AS_SC_UTF8
--SET @DataSheetXML2 = cast(@DataSheetXML as varchar(max)) --This would throw 'XML parsing: line 3, character 25, illegal xml character'
EXEC sp_xml_prepareDocument @hdoc OUTPUT, @DataSheetXML2
SELECT *
FROM OPENXML(@hdoc, '/Datasheet', 2)
WITH ([Description] varchar(max))
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply