December 23, 2014 at 12:48 pm
Luis Cazares (12/23/2014)
g.britton (12/23/2014)
The problem with the fourth option is that the nested query needs to be in parentheses. So this works:
SELECT SUBSTRING( ( SELECT ', ' + productname FROM dbo.Header AS h FOR XML PATH('') ) , 2, 100)
^ ^
Interestingly though, the order of items is different!
That's because the first one has a DISTINCT which orders the values to remove duplicates while the other one just returns them the way they were found.
None of the order is guaranteed in larger sets, though.
Ah, yes! My eye skipped over the DISTINCT keyword when I first read it. And, it's interesting that DISTINCT dictates the order of the final projection too. I don't think I've ever run across an explicit statement to that effect, even though it makes sense. Best not to rely on that behavior I suppose.
Gerald Britton, Pluralsight courses
December 24, 2014 at 1:32 am
Good question. Happy to see that STUFF is getting some love here - I often see code that is way more complicated than it should be because people avoid (or simply don't know about) STUFF.
The only gripe I have is that the first answer includes a DISTINCT that the other don't, and that does not follow from the requirements (though they are vague enough that they don't invalidate the DISTINCT either). Why was this added? It affects the results (if there are duplicate names), and also reduces performance and maintainability of the code.
December 24, 2014 at 1:35 am
DrKiller (12/23/2014)
wow what a complex way of doing it.I'm usually just doing this 🙂
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + productname, productname) FROM Header
SELECT @tmpString
Others already pointed out the performance issue. I have not yet seen warnings about this code being undocumented, unsupported, and unreliable. Though this code, based on empirical observation, usually works - it does not always. And when it doesn't, MS support will slam the door in your face and tell you that you're on your own. (Well, except when you're a multi-million customer...)
December 24, 2014 at 1:38 am
g.britton (12/23/2014)
And, it's interesting that DISTINCT dictates the order of the final projection too.
In this case, it does. But don't rely on that. For any query with a bit more complexity, the optimizer may equally well choose a plan that does some stuff *after* implementing the DISTINCT.
And even when the DISTINCT is implemented last, you should still be careful. In that case the DISTINCT will indeed determine the order, but not necessarily in the way you might expect. Ordering to remove the duplicates is just one trick the optimizer has; I have also seen plans where a hash function was used to eliminate duplicates. In that case, the order of rows returned would be determined by the hash function - to most people, that would look like a random ordering.
December 24, 2014 at 6:41 am
Hugo Kornelis (12/24/2014)
g.britton (12/23/2014)
And, it's interesting that DISTINCT dictates the order of the final projection too.In this case, it does. But don't rely on that. .
..as I said. Pity you didn't quote my whole statement.
Gerald Britton, Pluralsight courses
December 24, 2014 at 10:22 am
g.britton (12/24/2014)
Hugo Kornelis (12/24/2014)
g.britton (12/23/2014)
And, it's interesting that DISTINCT dictates the order of the final projection too.In this case, it does. But don't rely on that. .
..as I said. Pity you didn't quote my whole statement.
Oops - you are right. I didn't read that carefully enough.
My apologies!
December 27, 2014 at 10:01 pm
Using select for XML to concatenate result strings is a fine idea I did not have.
Thanks for that!
The rest - I do not appreciate that much
if you want distinct, ordered, no problem with the short one, can't see limitations.
SELECT substring( (SELECT distinct ', ' + productname FROM dbo.Header order by 1 FOR XML PATH('')), 2, 100000)
Ah, brackets missing.
So, let's do the test
SELECT
ColorCommaDelimitedList = '#'+STUFF(( SELECT
', ' + productname AS [text()]
FROM
( SELECT DISTINCT
productname
FROM
Header
) x
FOR XML PATH('')
), 1, 1, '');
December 27, 2014 at 11:22 pm
Michael L John (12/23/2014)
DrKiller (12/23/2014)
wow what a complex way of doing it.I'm usually just doing this 🙂
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + productname, productname) FROM Header
SELECT @tmpString
This is a very big performance hit.
As a completely un-scientific test, I took the employee table from my system. It contains 81195 rows.
Your solution is still running after 5+ minutes.
The solution in the question runs in less than a second.
Let's do a scientific test then because there could be hidden gotcha's in the table. For example, you say you "took the employee table from my system". What is the data-type and length of the column you used, please? Also, can you post the exact code that you used for comparison sake? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2014 at 11:47 pm
Guess I'll have to do my own testing because no one (unless I just missed it) has de-entitized the data in the XML method and the variable overlay method doesn't require such a thing. De-entitization typically slows the XML method down by about 100% (in other words, makes it twice as slow). Depending on other things, that still might not be enough to lose the race but it's been years since I did such a test on these two methods. It's time I revisited them.
Also, be careful about using what I call "grooved" or "rut" data. That's non-randomized or, frequently, constant data such as (and I'm not picking on anyone in particular) using a single string literal as a part of the "test data". The optimizer will easily pick up on such a thing and take a short cut that won't be obvious. Many people think that a WHILE loop splitter will beat the DelimitedSplit8K function because of such a thing. I don't know if it'll make a difference with this particular comparison between XML and a variable overlay but I intend to test for it.
Last but not least, be aware of implicit conversions if you're testing from a table. Those can really kill one method or the other especially when COALESCE is involved because it will force implicit conversions to the highest level in the data-type hierarchy of its operands and it's relatively expensive to do so.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2014 at 11:52 pm
DrKiller (12/23/2014)
wow what a complex way of doing it.I'm usually just doing this 🙂
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + productname, productname) FROM Header
SELECT @tmpString
My only hang-up with that method is that it inherently produces a RBAR result and I normally need to produce more than one row of (for example) CSV or TSV information from a table. The XML method lends itself to that pretty well.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2014 at 12:00 am
Hugo Kornelis (12/24/2014)
DrKiller (12/23/2014)
wow what a complex way of doing it.I'm usually just doing this 🙂
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + productname, productname) FROM Header
SELECT @tmpString
Others already pointed out the performance issue. I have not yet seen warnings about this code being undocumented, unsupported, and unreliable. Though this code, based on empirical observation, usually works - it does not always. And when it doesn't, MS support will slam the door in your face and tell you that you're on your own. (Well, except when you're a multi-million customer...)
Not undocumented but rather poorly documented. Of course, I absolutely agree that it's totally unsupported and I've also seen it cause some very odd problems like it returning only the first value under certain conditions. Unfortunately, I've not preserved that particular test and I've lost track of the link that had a similar test that demonstrates the problem. I think I may have gotten the link from you in the past or it could just be ailing memory. Do you happen to have such a link?
{EDIT} Never mind. I found the link I was looking for. Here it is and it does actually show the single element return problem I spoke of.
http://sqlmag.com/sql-server/multi-row-variable-assignment-and-order
There are certainly work arounds for all of these problems but, as many will be quick to cite, they are all still unsupported and usually add complexity beyond that of the XML method even with its de-entitization problems.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2014 at 1:01 am
Jeff Moden (12/28/2014)
Hugo Kornelis (12/24/2014)
DrKiller (12/23/2014)
wow what a complex way of doing it.I'm usually just doing this 🙂
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + productname, productname) FROM Header
SELECT @tmpString
Others already pointed out the performance issue. I have not yet seen warnings about this code being undocumented, unsupported, and unreliable. Though this code, based on empirical observation, usually works - it does not always. And when it doesn't, MS support will slam the door in your face and tell you that you're on your own. (Well, except when you're a multi-million customer...)
Not undocumented but rather poorly documented. Of course, I absolutely agree that it's totally unsupported and I've also seen it cause some very odd problems like it returning only the first value under certain conditions.
Unfortunately, I've not preserved that particular test and I've lost track of the link that had a similar test that demonstrates the problem. I think I may have gotten the link from you in the past or it could just be ailing memory. Do you happen to have such a link?{EDIT} Never mind. I found the link I was looking for. Here it is and it does actually show the single element return problem I spoke of.
http://sqlmag.com/sql-server/multi-row-variable-assignment-and-order
There are certainly work arounds for all of these problems but, as many will be quick to cite, they are all still unsupported and usually add complexity beyond that of the XML method even with its de-entitization problems.
Just a quick addition, this method is used in some undocumented functions such as fn_varbintohexstr
😎
December 28, 2014 at 5:48 am
Jeff Moden (12/28/2014)
Not undocumented but rather poorly documented.
The most relevant documentation beinig this one: http://support.microsoft.com/kb/287515: Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location. This article from Microsoft Support officially states (and even includes repro code to prove it) that this "aggregate concatenation" technique can have different results based on seemingly unrelated changes in the execution plan.
The weird thing in this article is that the "cause" section explicitly states that "the correct behavior for an aggregate concatenation query is undefined". However, then in the workaround section, it states that "in order to achieve the expected results from an aggregate concatenation query, (...)". So if the "correct behavior" is undefined, I would assume that the "expected results" are also undefined - and yet we are given a way to achieve those "expected results"?
I have so far been unable to find any other MS documentation describing what the expected results of such a query should be. Not even the "rather poorly documented" description that you apparently found. Do you perhaps have a link to that?
I have deliberately kept out of the discussion on performance, because I see no sense in measuring performance of a solution that is not guaranteed to return the correct results. If you are willing to sacrifice correctness for performance, then why not simply replace all your queries with [font="Courier New"]SELECT 42;[/font]? Probably also incorrect, but you'll never get faster than that! 😉
Anyway, your remark in another post that FOR XML concatenation requires de-entitizing the data, an often-overlooked bug in most posts that mention FOR XML. Whenever I use that method in my code, I always have to spend some time trying to retrieve that code. The reason why I referenced the FOR XML method in my posts but didn't give code examples is that I didn't feel like going on another Google spree when writing that. 🙂
But since you now brought if up, here is how I would write the code for this QotD:
SELECT STUFF((SELECT ', ' + productname
FROM dbo.Header AS h
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 2, '');
December 28, 2014 at 3:07 pm
The link for the "poor documentation" that I was speaking of is actually in Books Online under UPDATE... as in "Quirky Update", which can be controlled but is absolutely not supported. Let's not go there, though. 😀
As for the link you provided, thanks for that. The thing that bothers me greatly about such articles is that, although there are some good lessons to be learned, the two queries are fundamentally different. I would naturally expect them to behave differently even without the concatenation.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2014 at 3:29 pm
Think the cause for terrible performance is, strictly speaking, not RBAR but allocating a new string for each turn, because with integers it's blazing fast. Try yourself:
drop table test_table
;
with a as (
select id = 1, text = cast('element-1' as varchar(100))
union all select id = id+1, text = cast('element-'+cast(id+1 as varchar(40)) as varchar(100)) from a where id < 30000 )
select * into test_table from a
option (maxrecursion 32767)
DECLARE @tmpString VARCHAR(MAX)
SELECT @tmpString = COALESCE(@tmpString + ', ' + text, text) FROM test_table order by id
SELECT @tmpString
DECLARE @i int
SELECT @i = COALESCE(@i + id , id) FROM test_table order by id
SELECT @i
Viewing 15 posts - 16 through 30 (of 34 total)
You must be logged in to reply to this topic. Login to reply