March 30, 2015 at 5:09 pm
Hiya,
I've been trying to find and filter out data that can cause FOR XML to fail with the error:
FOR XML could not serialize the data for node ?? because it contains a character (0x0006) which is not allowed in XML.
I thought it would be simple, just identify the rows that don't match the XML spec http://www.w3.org/TR/REC-xml/#charsets and filter them out.
However, the following doesn't work.
select fieldname
from tablename
where fieldname
like N'%[^'+nchar(0x9)+nchar(0xA)+nchar(0xD)+nchar(0x20)+N'-'+nchar(0xD7FF)+N']%'
Not only does this fail to find the offending rows, when I checked by inverted it by taking out the "^" it still returns no rows.
I'm at a loss as to why it's doing this.
When I reduced 0xD7FF down to 0xFF it returns data, but I cannot add the other range back in without all rows not matching.
I've experimented with values and found 0x02E9 was the highest I could go without all values vanishing. Even then, numbers lower than that caused a large variation in the number of rows returned.
My Field is NVarchar(100) in SQL_Latin1_General_CP1_CI_AS.
Thanks
Ian.
March 30, 2015 at 6:07 pm
Maybe you could do this to find offending rows.
select fieldname
from tablename
EXCEPT
select fieldname
from tablename
where fieldname
NOT LIKE N'%[^'+nchar(0x9)+nchar(0xA)+nchar(0xD)+nchar(0x20)+N'-'+nchar(0xD7FF)+N']%'
The NOT LIKE N'%[^Pattern]%' will return non-offending rows. The EXCEPT will return the other rows.
You have a wide range in there, that might be causing the lack of results.
March 31, 2015 at 5:49 am
Right, I think I've found the issue.
As you can see from this SQLFiddle http://sqlfiddle.com/#!6/7801f/6 (watch out, sqlfiddle doesn't display the char that caused the issue for me in the first place!), in the first recordset there's two columns, the first is a patindex without changing the collation, the second is the same patindex with an enforced "bin" collation.
You can see that the normal collation doesn't match any characters, but the second does.
What's more interesting about this is when you look at the second query.
For for some reason it would seem that the patindex is matching the range, but excluding the hyphen from the characters matched despite the hyphen being within the range selected.
Unless I'm very much mistaken, this would look like a bug in SQL server.
As soon as you enforce a binary collation this behaviour goes away.
Ian.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply