Creating a Comma-Separated List (SQL Spackle)

Question

Creating a Comma-Separated List (SQL Spackle)

Viewing 15 posts - 61 through 75 (of 85 total)

You must be logged in to reply to this topic. Login to reply

WayneS SSC Guru Points: 95461 More actions · Answer 1

COTS... say no more. I feel your pain.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

cs_troyk SSCertifiable Points: 5383 More actions · Answer 2

Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.

-TroyK

chris.westgate Valued Member Points: 57 More actions · Answer 3

Great solution Wayne, thank you...

But for me, here's a tiny spanner in the works.

The column that I need to produce a delimited list from is an int column - and I need the list ordered by integer. However, trying to concatenate ',' with an int of course results in a conversion error trying to convert ',' to an int... BUT, if I cast my int value as a varchar, of course, it then produces a list that runs 1,10,11, 12, 2, 20... etc...

Any ideas please?

Cheers

Chris

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

chris.westgate (10/30/2012)
Great solution Wayne, thank you...
But for me, here's a tiny spanner in the works.
The column that I need to produce a delimited list from is an int column - and I need the list ordered by integer. However, trying to concatenate ',' with an int of course results in a conversion error trying to convert ',' to an int... BUT, if I cast my int value as a varchar, of course, it then produces a list that runs 1,10,11, 12, 2, 20... etc...
Any ideas please?
Cheers
Chris

You've basically just answered you own question. The only way to concatenate INTs will be to convert them. Just add an ORDER BY to you XML based SELECT and Bob's your uncle.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

WayneS SSC Guru Points: 95461 More actions · Answer 5

chris.westgate (10/30/2012)
Great solution Wayne, thank you...
But for me, here's a tiny spanner in the works.
The column that I need to produce a delimited list from is an int column - and I need the list ordered by integer. However, trying to concatenate ',' with an int of course results in a conversion error trying to convert ',' to an int... BUT, if I cast my int value as a varchar, of course, it then produces a list that runs 1,10,11, 12, 2, 20... etc...
Any ideas please?
Cheers
Chris

Hi Chris,

In the article, I have an "ORDER BY Value" to control the ordering (just prior to the FOR XML). Just use your integer column there. As you (and Jeff) point out, just convert your int to a varchar to create the delimited list.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

chris.westgate Valued Member Points: 57 More actions · Answer 6

Sorry. Left out a couple of critical bits of information. I'm selecting two columns and ordering by one of them - but trying to get the CS string for the second column, against the first column... both columns are integers, and the second column has repeated values but I need the distinct list... and of course if distinct is used, the exact expression must appear in the order by list.

However, because I've got to cast it as a varchar to concatenate the comma, I can only order by the cast as varchar - which doesn't give the correct integer sorting...

...does that make more sense now?

chris.westgate Valued Member Points: 57 More actions · Answer 7

chris.westgate

Valued Member

Points: 57

October 30, 2012 at 11:14 am

#1554281

^^^ or indeed any sense at all? 😉

WayneS SSC Guru Points: 95461 More actions · Answer 8

So, what you are looking for is:

Col1, comma-separated list of Column 2 (in order)

Col1 should only be there once.

Col2 is an integer, and should be sorted by it's value.

Is this correct?

So, this is what you're looking for:

DECLARE @table TABLE (

Col1 INT,

Col2 INT);

INSERT INTO @table (Col1, Col2)

SELECT 1, 1 UNION ALL

SELECT 1, 2 UNION ALL

SELECT 1, 5 UNION ALL

SELECT 1, 10 UNION ALL

SELECT 3, 1 UNION ALL

SELECT 3, 5 UNION ALL

SELECT 3, 10 UNION ALL

SELECT 3, 15;

WITH cte AS

(

SELECT DISTINCT Col1

FROM @table

)

SELECT Col1,

CS = STUFF((SELECT ',' + CONVERT(VARCHAR(10), Col2)

FROM @table t1

WHERE t1.col1 = cte.Col1

ORDER BY Col2

FOR XML PATH(''), TYPE).value('.','varchar(max)'),

1, 1, '')

FROM cte

ORDER BY Col1;

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Possinator Hall of Fame Points: 3452 More actions · Answer 9

Just a note of caution - if you plan to string together integers with a comma delimiter and subsquently dump the data into Excel, Excel will try to do funny things with the string. I just ran into this today. I changed the delimiter to a pipe (|) and all is well.

[font="Courier New"]Looking for a Deadlock Victim Support Group..[/font]

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 10

cs_troyk (10/29/2012)
Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.
-TroyK

MySQL's GROUP_CONCAT() function implements a semantically different piece of functionality than what Wayne has demonstrated in the article. Notice that the query in the article is not an aggregate, i.e. no GROUP BY clause.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

cs_troyk SSCertifiable Points: 5383 More actions · Answer 11

opc.three (11/8/2012)
cs_troyk (10/29/2012)
Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.
-TroyK
MySQL's GROUP_CONCAT() function implements a semantically different piece of functionality than what Wayne has demonstrated in the article. Notice that the query in the article is not an aggregate, i.e. no GROUP BY clause.

Take a look at the CTE from the article, and consider the effect of the "DISTINCT" keyword.

Here's the MySQL solution to the comma-separated list problem:

SELECT AccountNumber, GROUP_CONCAT(Value ORDER BY Value) AS 'CommaList'

FROM T

GROUP BY AccountNumber

ORDER BY AccountNumber;

It produces the same output as the T-SQL solution.

-TroyK

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 12

cs_troyk (11/8/2012)
opc.three (11/8/2012)
cs_troyk (10/29/2012)
Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.
-TroyK
MySQL's GROUP_CONCAT() function implements a semantically different piece of functionality than what Wayne has demonstrated in the article. Notice that the query in the article is not an aggregate, i.e. no GROUP BY clause.
Take a look at the CTE from the article, and consider the effect of the "DISTINCT" keyword.
Here's the MySQL solution to the comma-separated list problem:
SELECT AccountNumber, GROUP_CONCAT(Value ORDER BY Value) AS 'CommaList'
FROM T
GROUP BY AccountNumber
ORDER BY AccountNumber;
It produces the same output as the T-SQL solution.
-TroyK

Not sure what you're getting at when you talk about DISTINCT, whether you mean in the outer query or the correlated subquery, but if the subquery you would need to provide DISTINCT when using GROUP_CONCAT() as well, if duplicate values were a concern:

SELECT AccountNumber, GROUP_CONCAT(DISTINCT Value ORDER BY Value) AS 'CommaList'

FROM T

GROUP BY AccountNumber

ORDER BY AccountNumber;

If the outer query then yes, I agree, an aggregate may be a better option. I am not arguing about the likeness of the results however. All I am pointing out is that there is a difference in implementation, semantical maybe, but there is a penalty for using GROUP BY when it is not needed when compared to using a correlated subquery as Wayne presents it.

If GROUP_CONCAT() existed on SQL Server the XML technique from the article would almost certainly outperform it so while it may seem like a fair comparison from a query results point of view it's really apples and oranges under the hood.

Now, from a syntax point of view, when the outer query must be distinct, I like the GROUP_CONCAT() method which is why I wrote GROUP_CONCAT() for SQL Server[/url] by leveraging the SQLCLR but it is not the best choice for all scenarios.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

cs_troyk SSCertifiable Points: 5383 More actions · Answer 13

opc.three (11/8/2012)
cs_troyk (11/8/2012)
opc.three (11/8/2012)
cs_troyk (10/29/2012)
Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.
-TroyK
MySQL's GROUP_CONCAT() function implements a semantically different piece of functionality than what Wayne has demonstrated in the article. Notice that the query in the article is not an aggregate, i.e. no GROUP BY clause.
Take a look at the CTE from the article, and consider the effect of the "DISTINCT" keyword.
Here's the MySQL solution to the comma-separated list problem:
SELECT AccountNumber, GROUP_CONCAT(Value ORDER BY Value) AS 'CommaList'
FROM T
GROUP BY AccountNumber
ORDER BY AccountNumber;
It produces the same output as the T-SQL solution.
-TroyK
Not sure what you're getting at when you talk about DISTINCT, whether you mean in the outer query or the correlated subquery, but if the subquery you would need to provide DISTINCT when using GROUP_CONCAT() as well, if duplicate values were a concern:
DISTINCT[/b] Value ORDER BY Value) AS 'CommaList'
FROM T
GROUP BY AccountNumber
ORDER BY AccountNumber;
If the outer query then yes, I agree, an aggregate may be a better option. I am not arguing about the likeness of the results however. All I am pointing out is that there is a difference in implementation, semantical maybe, but there is a penalty for using GROUP BY when it is not needed when compared to using a correlated subquery as Wayne presents it.
If GROUP_CONCAT() existed on SQL Server the XML technique from the article would almost certainly outperform it so while it may seem like a fair comparison from a query results point of view it's really apples and oranges under the hood.
Now, from a syntax point of view, when the outer query must be distinct, I like the GROUP_CONCAT() method which is why I wrote GROUP_CONCAT() for SQL Server[/url] by leveraging the SQLCLR but it is not the best choice for all scenarios.

My point is that the solution presented in the article is the equivalent to a solution using GROUP BY, so saying that there's no GROUP BY clause is only accurate when talking about the syntax. The query could be rewritten like this, and it's the same query from the optimizer's perspective:

SELECT AccountNumber,

CommaList = STUFF((

SELECT ',' + Value

FROM #TestData

WHERE AccountNumber = t.AccountNumber

ORDER BY Value

FOR XML PATH(''),

TYPE).value('.','varchar(max)'),1,1,'')

FROM #TestData t

GROUP BY AccountNumber

ORDER BY AccountNumber;

This produces the same query plan as the CTE version, and has the same IO profile, so it is incorrect to assert some penalty for using GROUP BY. I'm not sure how you arrive at the conclusion that the XML version would outperform GROUP_CONCAT if the latter were implemented in MS's product. Is this based on your results from implementing it using CLR?

-TroyK

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 14

cs_troyk (11/9/2012)
opc.three (11/8/2012)
cs_troyk (11/8/2012)
opc.three (11/8/2012)
cs_troyk (10/29/2012)
Not that I advocate switching to a "toy" RDBMS, but MySQL does have the GROUP_CONCAT operator which makes this type of thing a little more elegant. It's something to be aware of in case you ever need to migrate from a MySQL db.
-TroyK
MySQL's GROUP_CONCAT() function implements a semantically different piece of functionality than what Wayne has demonstrated in the article. Notice that the query in the article is not an aggregate, i.e. no GROUP BY clause.
Take a look at the CTE from the article, and consider the effect of the "DISTINCT" keyword.
Here's the MySQL solution to the comma-separated list problem:
SELECT AccountNumber, GROUP_CONCAT(Value ORDER BY Value) AS 'CommaList'
FROM T
GROUP BY AccountNumber
ORDER BY AccountNumber;
It produces the same output as the T-SQL solution.
-TroyK
Not sure what you're getting at when you talk about DISTINCT, whether you mean in the outer query or the correlated subquery, but if the subquery you would need to provide DISTINCT when using GROUP_CONCAT() as well, if duplicate values were a concern:
DISTINCT[/b] Value ORDER BY Value) AS 'CommaList'
FROM T
GROUP BY AccountNumber
ORDER BY AccountNumber;
If the outer query then yes, I agree, an aggregate may be a better option. I am not arguing about the likeness of the results however. All I am pointing out is that there is a difference in implementation, semantical maybe, but there is a penalty for using GROUP BY when it is not needed when compared to using a correlated subquery as Wayne presents it.
If GROUP_CONCAT() existed on SQL Server the XML technique from the article would almost certainly outperform it so while it may seem like a fair comparison from a query results point of view it's really apples and oranges under the hood.
Now, from a syntax point of view, when the outer query must be distinct, I like the GROUP_CONCAT() method which is why I wrote GROUP_CONCAT() for SQL Server[/url] by leveraging the SQLCLR but it is not the best choice for all scenarios.
My point is that the solution presented in the article is the equivalent to a solution using GROUP BY, so saying that there's no GROUP BY clause is only accurate when talking about the syntax. The query could be rewritten like this, and it's the same query from the optimizer's perspective:
SELECT AccountNumber,
CommaList = STUFF((
SELECT ',' + Value
FROM #TestData
WHERE AccountNumber = t.AccountNumber
ORDER BY Value
FOR XML PATH(''),
TYPE).value('.','varchar(max)'),1,1,'')
FROM #TestData t
GROUP BY AccountNumber
ORDER BY AccountNumber;
This produces the same query plan as the CTE version, and has the same IO profile, so it is incorrect to assert some penalty for using GROUP BY. I'm not sure how you arrive at the conclusion that the XML version would outperform GROUP_CONCAT if the latter were implemented in MS's product. Is this based on your results from implementing it using CLR?
-TroyK

Yes they are. I am now on the same page as far as which DISTINCT you were referring too. If the dataset you're aggregating on is not unique and you do not need to apply DISTINCT then SQL will not need to sort or de-duplicate the key column to aggregate the results, whereas the GROUP BY always will (unless the optimizer can ignore it due to the presence of a unique index). That is where the penalty I am referring too is introduced.

Here is the equivalent query using my SQLCLR:

SELECT AccountNumber,

dbo.GROUP_CONCAT(Value) AS CommaList

FROM #TestData t

GROUP BY AccountNumber

ORDER BY AccountNumber;

In my testing (test scripts are included in the CodePlex download) the SQLCLR object is on par with the XML TYPE method shown in the article in terms of performance. In some test cases the XML performs best, in others the SQL CLR performs best. The +- is roughly 10% in either direction so I am comfortable saying they are "comparable" for general use.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Neha05 Default port Points: 1494 More actions · Answer 15

Neha05

Default port

Points: 1494

December 24, 2012 at 12:30 pm

#1571354

Nice article.