Intent
Sometimes when generating reports or text for web pages or form letters, it becomes handy to concatenate multiple column values from rows of data into a single value entirely within T-SQL. For reference, a problem I encountered recently involved selecting one or more department names and converting them to a comma-separated list to appear in a drop-down list.
The SQL language itself doesn't easily facilitate this but there are ways around this. Below I outline two alternate queries using SQL Server.
Technical Goals
I wanted to concatenate values from several rows into a single value using an optional row separator / delimiter without using a cursor and without excessive wasted performance.
Consider the following table, FRUIT:
id | name |
101 | Apple |
102 | Banana |
103 | Orange |
104 | Melon |
105 | Grape |
The goal is to select rows from this table and efficiently create a single result such as:
Apple, Banana, Orange, Melon, Grape
Solution 1
This problem can be solved using common table expressions (CTE) and in particular, a recursive CTE. To make this work, I created three CTEs. The first assigns a unique and sequential row number to each row value:
with list as ( select name as value, row_number() over(order byname)as num from fruit )
CTE Output:
value | num |
Apple | 1 |
Banana | 2 |
Grape | 3 |
Melon | 4 |
Orange | 5 |
The second CTE is recursive. Recursive CTEs were not intuitive to me when I first started working with them; if this section seems a little confusing at first, don't worry - bear with me. The basic anatomy of a recursive CTE is at least two sections separated by a UNION ALL statement. The first section is known as the non-recursive section and most of the normal SQL rules apply here. Subsequent sections are recursive, that is, they may select from the parent CTE.
Here my non-recursive section selects from the first CTE and inserts a third column which is used as a temporary variable which will hold the concatenated values as recursion progresses. Essentially, this new column will be built up with each recursion:
with concatenations as ( -- non-recursive section select value as head, cast(''as varchar(MAX)) as tail, num from list union all -- recursive section select head, (select value from list where num = prev.num - 1) + tail as tail, num - 1 from concatenations as prev -- note: recursive selection happens here! where num > 0 )
CTE Output:
head | tail | num |
Apple | 1 | |
Banana | 2 | |
Grape | 3 | |
Melon | 4 | |
Orange | 5 | |
Orange | Melon | 4 |
Orange | GrapeMelon | 3 |
Orange | BananaGrapeMelon | 2 |
Orange | AppleBananaGrapeMelon | 1 |
Orange | NULL | 0 |
Melon | Grape | 3 |
Melon | BananaGrape | 2 |
Melon | AppleBananaGrape | 1 |
Melon | NULL | 0 |
Grape | Banana | 2 |
Grape | AppleBanana | 1 |
Grape | NULL | 0 |
Banana | Apple | 1 |
Banana | NULL | 0 |
Apple | NULL | 0 |
The third and final CTE aims to select only the line shown above in green. You will notice that the comma separator is missing; don't worry, we'll add that next.
with concatenated as ( select max(tail + head) as items from concatenations where num = 1 )
CTE Output:
items |
AppleBananaGrapeMelonOrange |
Thoughts
This result is very close to what we originally intended. The only differences are the lack of an item delimiter and the sort order is determined by the name and not the ID of the fruit.
Solution 1 Revised
The first thing to consider is the missing delimiter. This is easily added (the + ', ' section below the union) in the recursive CTE where the previous value is selected in the recursive section:
with concatenations as ( select value as head, cast('' as varchar(MAX)) as tail, num from list union all select head, (select value from list where num = prev.num - 1) + ', ' + tail as tail, num - 1 from concatenations as prev where</>strong> num > 0 )
The second consideration is the changed sort order. To correct that, we can simply change the CTE that assigns consecutive row numbers to each element:
with list as ( select name as value, row_number() over (order by id) as num from fruit )
Thoughts
We finally have the desired output but there is still a lurking performance bug. The number of rows selected in the recursive CTE grows exponentially with the number of input rows. For example, the above input of 5 rows creates 20 rows whereas an input of 1,000 rows creates 501,500 rows! Fortunately, there is a simple change that keeps the recursive CTE growth linear.
Optimization
Looking at the rows selected by the CTE, we see that no rows after the target row (highlighted in green above) are necessary for the final result. To eliminate them, we can add a clause to restrict the value of head to the be the final value returned from the list CTE. Note: we have to calculate this value in the non-recursive section because the rules for the recursive section don't allow doing it there.
with concatenations as ( select value as head, cast('' as varchar(MAX)) as tail, num, (select top 1 name from fruit order by id desc) as terminator from list union all select head, (select value from list where num = prev.num - 1) + ', ' + tail as tail, num - 1, terminator from concatenations as prev where num > 0 and head = terminator )
Solution 1 Finished
Below is everything listed above in one expression for your convenience:
with fruit as ( select 'Apple' as name, 101 as id unionall select 'Banana' as name, 102 as id unionall select 'Orange' as name, 103 as id unionall select 'Melon' as name, 104 as id unionall select 'Grape' as name, 105 as id ) , list as ( select name as value, row_number() over(order byid) asnum from fruit ) , concatenations as ( select value as head, cast('' as varchar(MAX)) as tail, num, (select top 1 name from fruit order by id desc) as terminator from list union all select head, (select value from list where num = prev.num - 1) + ', ' + tail as tail, num - 1, terminator from concatenations as prev where num > 0 and head = terminator ) , concatenated as ( select max(tail + head) as items from concatenations where num = 1 ) select * from concatenated
Conclusion
This solution satisfies the initial purpose by efficiently concatenating any number of rows into one string without using anything besides common table expressions.
Solution 2
One option I have explored is the FOR XML PATH expression. In this example, no CTEs are used. The syntax is more compact than the above lines, but I have found performance to be unpredictable; your mileage may vary!
Here's the equivalent of the entire CTE code above:
select replace( replace( replace( (select name from fruit order by id for xml path('')), '</name><name>', ', ' -- delimiter ), '</name>', '' ), '<name>', '' )
Final Thoughts
Considering the two solutions above, obviously the second is simpler and therefore easier to maintain and understand. I have encountered odd performance issues using the XML based solution that disappeared when I switched to the recursive CTE version. However, Solution 1 has many moving parts and it's not desirable to have this extent of duplicated logic whenever you want to concatenate items. It would be nice to put this logic into a simple stored function, but that's the topic for a future article.