July 1, 2013 at 11:46 am
opc.three (6/30/2013)
I only meant to show the two characters that would be escaped when casting to the XML data type.Thanks for the code samples Solomon. I ran a few tests and the one difference between the XML data type and the FOR XML techniques that I saw as important to note is that the XML data type will not show the actual text in all cases, i.e. it escapes the left-angle bracket and ampersand characters as < and & respectively, which could prove to disqualify its usage some scenarios.
Hey there. Yes, the XML data-type does require escaped less-than, ampersand, and even greater-than (these are auto-translated by CONVERT it seems), but I don't necessarily agree that it disqualifies any usage. Let's consider the two options, which truly are very similar:
1) XML datatype: This method requires doing 2 explicit REPLACE functions to translate less-than and ampersand characters into their XML-encoded values. The end result is an XML string that is exactly the string being requested with the only exceptions being the three escaped values: &, <, and >. However, after clicking on the XML value and getting to the "xml" tab, you can do a find-and-replace on these three and end up with the exact string being requested.
2) processing-instruction: This method requires just the special field name "processing-instruction(something)" and the "FOR XML RAW(''), TYPE". The end result is an XML string that has an open tag of "<?something" and a close tag of "?>". The string within those tags is identical (in all but one case) to the string being requested (i.e. no XML-encodings). The one exception to the true string representation is that any occurrences of "?>" in the source string will have a space added between the "?" and ">" characters. This could be problematic in some cases as it is then impossible to determine which occurrences of "? >" are naturally in the source string as opposed to which ones are a result of having the space added by this process.
So, personally I still like the XML datatype method as I have the ability to get back to a true representation of the source string (even if it takes a few hits of Control-H) whereas the processing-instruction method might not, in some cases, be able to get back to the true representation. However, to be fair, it is probably much more likely that strings have <, >, and & characters in them than "?>" sequences so most of the time people would be safe with the processing-instruction method. Proving once again that it is good to have choices :-).
Take care,
Solomon...
PS. It seems that this forum does a double-pass decoding so it is not easy to show HTML / XML -encoded values. Meaning, typing in "&lt;" will actually display ">" instead of '<". You need to instead specify "&amp;lt;". And so you can imagine the silliness required for me to type this up ;-).
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
July 1, 2013 at 12:32 pm
That's the config for the "Results to Grid" display. The "Results to Text" config dialog has a different setting, "Maximum number of characters displayed in each column:". The max value on this dialog is 8192.
July 1, 2013 at 1:13 pm
Solomon Rutzky (7/1/2013)
1) XML datatype: This method requires doing 2 explicit REPLACE functions to translate less-than and ampersand characters into their XML-encoded values. The end result is an XML string that is exactly the string being requested with the only exceptions being the three escaped values: &, <, and >. However, after clicking on the XML value and getting to the "xml" tab, you can do a find-and-replace on these three and end up with the exact string being requested.
To each their own 🙂
SELECT CAST('>>' AS XML) AS [Search and Replace?];
PS I can appreciate the trouble you went to in creating your post so things would appear as you intended. I end up using the HTML hex codes on a regular basis to get the output I need, which I had to do for this post as well.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 1, 2013 at 4:34 pm
opc.three (7/1/2013)
SELECT CAST('>>' AS XML) AS [Search and Replace?];
Well, yes and no ;-).
Yes = in that scenario it would be impossible to get back to the source string.
No = that scenario is not what happens with the code I submitted previously since the ampersand would have been escaped, leaving "&gt;>", which could be translated back as long as you do the "&" -> "&" replacement last.
PS I can appreciate the trouble you went to in creating your post so things would appear as you intended. I end up using the HTML hex codes on a regular basis to get the output I need, which I had to do for this post as well.
It seems that maybe the real issue is that the "&" gets translated when posting. If you look at your quote of my text in your last posting, it has the actual <, >, and & characters instead of the encoded values from my post. Your usage of "&" seems to survive longer so I will switch to using that instead of "&". Thanks for the suggestion! 🙂
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
July 1, 2013 at 5:34 pm
I think you have added an abstraction layer that seems to work but at the end of the day you still have an entitization problem:
I think your approach has the problem of not differentiating when a character should be entitized, i.e. when a left-angle bracket is part of some markup or a comment versus when < is part of a string literal, however the XML parser within SQL Server that is invoked when you use TYPE as part of FOR XML does know that and it can do that entitization for us properly.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 1, 2013 at 9:30 pm
opc.three (7/1/2013)
I think you have added an abstraction layer that seems to work but at the end of the day you still have an entitization problem:
My problem seems to have been more of communication than entitization: I left too much to implication when I said 'do the conversion from "&" -> "&" last'. I should have been clearer about the need to translate the other two encoded values -- < and > -- FIRST. Since there are 3 values that are encoded, those same 3 values need to be decoded. And, it is just as import to encode the & first as it is to decode it last.
So, to re-run the test:
DECLARE @Before NVARCHAR(MAX),
@After NVARCHAR(MAX);
SET @Before = N'&gt;&>>';
SET @After = REPLACE(
CAST(dbo.ShowEntireString(@Before) AS NVARCHAR(MAX)),
N'&',
N'&'
);
-- not the same, but due to missing a REPLACE on >
SELECT @Before AS [Before], @After AS [After]
SET @After =
REPLACE(
REPLACE(
REPLACE(
CAST(dbo.ShowEntireString(@Before) AS NVARCHAR(MAX)),
N'>',
N'>'
),
N'<',
N'<'
),
N'&',
N'&'
);
-- same values
SELECT @Before AS [Before], @After AS [After]
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
July 2, 2013 at 2:58 am
OK, now we're getting somewhere. So three replacements after the conversion to XML, and in a specific order to unwind the original encoding. Thanks for clarifying Solomon.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
October 30, 2013 at 6:02 pm
Sorry... post deleted... I had already addressed the issue I was posting about.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 4, 2024 at 7:41 am
Bit late to the party on this one (only just seen the article).
Instead of using FOR XML PATH, there's a new function (I've only just started using it on SSMS 20.2 on a SQL2022 server) called STRING_AGG and it doesn't alter the string in any way (so no weird XML characters for "&", etc).
Apart from that, excellent article as always
October 4, 2024 at 8:40 am
You can also (since recent SSMS) set the limit of the number of characters returned in the grid for non-XML data (varchar(max) for exemple)
October 4, 2024 at 3:06 pm
Bit late to the party on this one (only just seen the article).
Instead of using FOR XML PATH, there's a new function (I've only just started using it on SSMS 20.2 on a SQL2022 server) called STRING_AGG and it doesn't alter the string in any way (so no weird XML characters for "&", etc).
Apart from that, excellent article as always
Thank you for the kind words, Richard. I appreciate it.
I wish they had NOT republished the article. It's more than a decade old and there were better ways to be had even on the day it was published. I still use a modified version of the method that Orlando Colamatteo posted in the discussion way back then.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 4, 2024 at 3:42 pm
You can also (since recent SSMS) set the limit of the number of characters returned in the grid for non-XML data (varchar(max) for exemple)
Thanks for the feedback and I agree for nowadays. Back when the article was written more than a decade ago, getting things over 8000 characters to go to a text output was impossible because the setting would only go to 8000, IIRC.
On the menu page you were kind enough to post above, make sure that you set the "RetainCR/LF on copy or save" option and then you can copy from a "cell" on the grid to the Text Editor of SSMS (or any other text editor) with all the CR/LFs in all the right spots.
I still like the XML method that Orlando Colamatteo posted in the discussion way back then because it makes things "clickably visible".
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 12 posts - 61 through 71 (of 71 total)
You must be logged in to reply to this topic. Login to reply