Database developers need to combine values from multiple rows into a list for various reasons, including:
- displaying lists of values per group in aggregate queries;
- passing a comma-delimited list of values to a function.
The example below illustrates lists of counties generated per state. The left table contains two states with five counties per state, resulting in a total of 10 rows. The five rows for the state of 'AZ' in the left table are combined into one on the right, with the CountyID's placed in a comma separated list. Analogically, the five rows for the state of 'CA' are combined into a second row in the right table.
In terms of SQL operations, the rows in the left table are grouped by State and a list of CountyIDs is generated for each state. Let's look at how this is achieved using recursive CTEs.
Overview of Recursive CTEs
A recursive CTE requires at least one UNION ALL operator. The simplest recursive form has one UNION ALL and two queries[1]:
The anchor query is executed exactly once and the result is united with the results from the recursive member, which is executed until it returns an empty set. You have to provide a condition that will ensure the recursive member query does not run indefinitely. The easiest way to ensure termination is to use an increasing identity column in the base_table.
Using recursive CTEs to generate a list of values
Let's examine a sample database schema, which consists of tables Office, OfficeCounty and County. Each company has multiple offices. Each office serves one or more counties. Each county is served by no more than one office. The following normalized database model is employed:
The following code will generate the sample schema shown above:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[County]( [CountyId] [smallint] IDENTITY(1,1) NOT NULL, [CountyName] [varchar](60) NOT NULL, [StateAbbr] [varchar](2) NOT NULL, CONSTRAINT [PK_County_1] PRIMARY KEY CLUSTERED ( [CountyId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]; GO CREATE TABLE [dbo].[Office]( [OfficeID] [int] IDENTITY(1,1) NOT NULL, [OfficeName] [varchar](100) NOT NULL, [Address1] [varchar](100) NULL, [Address2] [varchar](100) NULL, CONSTRAINT [PK_Office_2] PRIMARY KEY CLUSTERED ( [OfficeID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]; GO CREATE TABLE [dbo].[OfficeCounty]( [OfficeID] [int] NOT NULL, [CountyID] [smallint] NOT NULL ) ON [PRIMARY] GO ALTER TABLE [dbo].[OfficeCounty] WITH CHECK ADD CONSTRAINT [FK_County] FOREIGN KEY([CountyID]) REFERENCES [dbo].[County] ([CountyId]) GO ALTER TABLE [dbo].[OfficeCounty] CHECK CONSTRAINT [FK_County] GO ALTER TABLE [dbo].[OfficeCounty] WITH CHECK ADD CONSTRAINT [FK_Office] FOREIGN KEY([OfficeID]) REFERENCES [dbo].[Office] ([OfficeID]) GO ALTER TABLE [dbo].[OfficeCounty] CHECK CONSTRAINT [FK_Office] GO SET ANSI_PADDING OFF GO
Let's insert some data into the tables. We will create two offices called 'Draper, UT' and 'Orange Park, FL'. We will also insert 12 counties in the states of Florida('FL') and Utah('UT'). State 'FL' has 5 counties; 'UT' has 7 counties. Finally, in the OfficeCounty table, we will associate the 'Orange Park, FL' office with the 5 counties in Florida and the 'Draper, UT' office with the seven counties in Utah.
--Generate Sample data /****** Object: Table [dbo].[Office] Script Date: 06/17/2013 16:02:13 ******/SET IDENTITY_INSERT [dbo].[Office] ON INSERT [dbo].[Office] ([OfficeID], [OfficeName], [Address1], [Address2]) VALUES (151978, N'Draper, UT', N'138 street 122', NULL) INSERT [dbo].[Office] ([OfficeID], [OfficeName], [Address1], [Address2]) VALUES (151981, N'Orange Park, FL', N'2 Wells Rd Ste', NULL) SET IDENTITY_INSERT [dbo].[Office] OFF /****** Object: Table [dbo].[County] Script Date: 06/17/2013 16:02:13 ******/SET IDENTITY_INSERT [dbo].[County] ON INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (325, N'Baker County, FL', N'FL') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (333, N'Clay County, FL', N'FL') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (338, N'Duval County, FL', N'FL') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (368, N'Nassau County, FL', N'FL') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (378, N'St. Johns County, FL', N'FL') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2786, N'Davis County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2798, N'Salt Lake County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2802, N'Summit County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2803, N'Tooele County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2805, N'Utah County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2806, N'Wasatch County, UT', N'UT') INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2809, N'Weber County, UT', N'UT') SET IDENTITY_INSERT [dbo].[County] OFF /****** Object: Table [dbo].[OfficeCounty] Script Date: 06/17/2013 16:02:13 ******/INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2786) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2798) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2802) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2803) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2805) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2806) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2809) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 325) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 333) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 338) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 368) INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 378)
Problem definition
We want to create a report that displays offices and comma-delimited lists of counties served by each office. We will walk through the solution in the steps below.
Step 1 - Creating the source CTE
The following script returns a list of values for OfficeID, CountyName and StateAbbr. Two additional columns are added for the purpose of the recursive CTE:
- rank_county_name: a unique sequential integer starting with 1 for the CountyName and the OfficeID/StateAbbr.
- max_rank_county_name: the total number of CountyName values for that OfficeID/StateAbbr. It is also equal to the maximum rank_county_name for the OfficeID.
SELECT o.OfficeID, c.CountyName, c.StateAbbr, ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name, COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name FROM dbo.Office o INNER JOIN dbo.OfficeCounty oc ON o.OfficeID=oc.OfficeID INNER JOIN dbo.County c ON c.CountyId=oc.CountyID
The result from the query is as follows:
OfficeID | CountyName | StateAbbr | rank_county_name | max_rank_county_name |
---|---|---|---|---|
151978 | Davis County, UT | UT | 1 | 7 |
151978 | Salt Lake County, UT | UT | 2 | 7 |
151978 | Summit County, UT | UT | 3 | 7 |
151978 | Tooele County, UT | UT | 4 | 7 |
151978 | Utah County, UT | UT | 5 | 7 |
151978 | Wasatch County, UT | UT | 6 | 7 |
151978 | Weber County, UT | UT | 7 | 7 |
151981 | Baker County, FL | FL | 1 | 5 |
151981 | Clay County, FL | FL | 2 | 5 |
151981 | Duval County, FL | FL | 3 | 5 |
151981 | Nassau County, FL | FL | 4 | 5 |
151981 | St. Johns County, FL | FL | 5 | 5 |
For simplicity, we will put the query above in a CTE:
WITH office_county_state AS (
SELECT o.OfficeID, c.CountyName, c.StateAbbr, ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name, COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name FROM dbo.Office o INNER JOIN dbo.OfficeCounty oc ON o.OfficeID=oc.OfficeID INNER JOIN dbo.County c ON c.CountyId=oc.CountyID )
The CTE above provides a convenient way to view and query only the list of OfficeId, CountyName and StateAbbr that we will need for the final resultset.
Step 2- Creating a recursive CTE
The basic idea is to start with a list of one CountyName and then consecutively append each remaining CountyName, separated by a comma. There are two questions that arise in regards to creating the recursive CTE:
- how to select the anchor query?
- how to ensure that the CTE terminates?
The anchor query should return only the first CountyName for every OfficeID/StateAbbr from the source CTE and will be invoked once. That is why we should use the predicate
WHERE rank_county_name=1.
From the sample data, the anchor query will return only two rows:
OfficeID | countynames | Stateabbr | rank_county_name | max_rank_county_name |
---|---|---|---|---|
151978 | Davis County, UT | UT | 1 | 7 |
151981 | Baker County, FL | FL | 1 | 5 |
Then the recursive member will be invoked multiple times and in each invocation it will join the previous resultset with the source CTE. Every time the recursive member is invoked, it will add one additional row to the resultset with rank_county_name greater than 1 from the rank_county_name added from the previous round, because of the predicate:
WHERE d.rank_county_name=1+recurs_CTE.rank_county_name
Eventually after the recursive member is executed several times, the row where "rank_county_name = max_rank_county_name" will be added to the resultset and the next execution of the recursive member will return an empty resultset, so the recursive CTE will stop.
The following code lists the complete code of the recursive CTE:
-- source CTE: WITH office_county_state AS ( SELECT o.OfficeID, c.CountyName, c.StateAbbr, ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name, COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name FROM dbo.Office o INNER JOIN dbo.OfficeCounty oc ON o.OfficeID=oc.OfficeID INNER JOIN dbo.County c ON c.CountyId=oc.CountyID ) --recursive CTE: ,recurs_CTE AS ( --anchor query: SELECT OfficeID, StateAbbr, rank_county_name, max_rank_county_name, CAST(CountyName AS VARCHAR(8000)) AS countynames FROM office_county_state WHERE rank_county_name=1 UNION ALL --recursive member: SELECT d.OfficeID, d.StateAbbr, d.rank_county_name, d.max_rank_county_name, CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000)) FROM recurs_CTE INNER JOIN office_county_state d ON d.OfficeID=recurs_CTE.OfficeID AND d.StateAbbr=recurs_CTE.StateAbbr WHERE d.rank_county_name=1+recurs_CTE.rank_county_name --ensures termination )
If we examine the resultset for one of the offices in state 'FL', we can see how the CountyName list expanded by every iteration of the recursive CTE:
OfficeID | StateAbbr | rank_county_name | max_rank_county_name | countyNames |
---|---|---|---|---|
151981 | FL | 1 | 5 | Baker County, FL |
151981 | FL | 2 | 5 | Baker County, FL,Clay County, FL |
151981 | FL | 3 | 5 | Baker County, FL,Clay County, FL,Duval County, FL |
151981 | FL | 4 | 5 | Baker County, FL,Clay County, FL,Duval County, FL,Nassau County, FL |
151981 | FL | 5 | 5 | Baker County, FL,Clay County, FL,Duval County, FL,Nassau County, FL,St. Johns County, FL |
As you can see, after the row with "rank_county_name = 5" has been added, the recursive member will return an empty resultset and the recursive CTE will terminate.
Because we use a UNION ALL, the values and the list of values must be explicitly converted to the same data type in the recursive CTE.
Step 3 - Generating the final resultset
The remaining task is to filter the rows, which contain the complete list of CountyName values. These are the rows that have "rank_county_name = max_rank_county_name":
--source CTE: WITH office_county_state AS ( SELECT o.OfficeID, c.CountyName, c.StateAbbr, ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name, COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name FROM dbo.Office o INNER JOIN dbo.OfficeCounty oc ON o.OfficeID=oc.OfficeID INNER JOIN dbo.County c ON c.CountyId=oc.CountyID ) --recursive CTE: ,recurs_CTE AS ( --anchor query: SELECT OfficeID, StateAbbr, rank_county_name, max_rank_county_name, CAST(CountyName AS VARCHAR(8000)) AS countynames FROM office_county_state WHERE rank_county_name=1 UNION ALL --recursive member: SELECT d.officeID, d.StateAbbr, d.rank_county_name, d.max_rank_county_name, CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000)) FROM recurs_CTE INNER JOIN office_county_state d ON d.OfficeID=recurs_CTE.OfficeID AND d.StateAbbr=recurs_CTE.StateAbbr WHERE d.rank_county_name=1+recurs_CTE.rank_county_name ) SELECT recurs_CTE.OfficeID, recurs_CTE.StateAbbr, recurs_CTE.countynames FROM recurs_CTE -- we filter the resultset from the recursive CTE to get the longest list of countyName WHERE recurs_CTE.rank_county_name=recurs_CTE.max_rank_county_name
We added the SELECT statement that queries against the recursive CTE but return only the rows that contain the complete lists.
Considerations
The default maximum recursion level for SQL Server is 100. Therefore the recursive CTE can be invoked only 100 times by default. If you expect to have a list with more than 100 values, you should change the maximum recursion level using the MAXRECURSION OPTION. The maximum list length allowed is 32,767.
In case of tables with hundreds of rows, the running time of the recursive CTE can reach several minutes. In order to decrease the query execution time significantly, it is beneficial to materialize the office_county_state CTE into a temporary table.
Conclusion
The article shows an easy, fast and flexible approach to generating lists of values from multiple rows based on recursive CTEs. In the scenario of a database with office locations, we show how to generate lists of county names for each office and state, where OfficeID is the unique row identifier, CountyName is the list value and StateAbbr is the grouping value. The presented implementation be tailored to any database schema that has a unique row id, a list value and a grouping value(s). The implementation is flexible because the list separator and the data type of the list can be easily changed and the string concatenation can be substituted with another operator such as sumation. The approach presented in this article achieves better execution time compared to using a CURSOR with fewer lines of code.
Reference
[1]"Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012"by Dejan Sarka