Recursive Queries
Having seen the basic syntax of a CTE and some its use cases in my last article, let us turn to the more interesting use: Recursive Queries. A recursive query is one in which a CTE references itself. What is the use of a recursive query? Well, all of us SQL Server 2000 users have always wanted to build hierarchical data. The most classical example is when you are given an organization structure and you wanted to represent the same in a relational database and run queries against it.
In SQL Server 2000, there was no inherent way to perform hierarchical queries and people had different implementations. You can get dozens of them if you search GOOGLE. The SQL Server 2000 Books Online also had a topic called Expanding Hierarchies where a sample implementation was provided. With the advent of recursive queries however, we now have a standard implementation for working with hierarchical data and querying against it.
When you are using a recursive CTE, the syntax of the CTE changes slightly as shown:
WITH cte_name (optional column list) AS ( Cte_query_1 UNION ALL Cte_query_2 ) Statement that uses the above CTE OPTION (MAXRECURSION n)
First of all, note that a recursive query is made up of 2 queries joined by a UNION ALL. The first query is called the anchor member and it forms the basic result set. The second query is called the recursive member and this is where the CTE references itself. The recursive member forms the next part of the result set using the anchor member as input. The recursive member is called repeatedly until no more rows are returned (called the exit condition). Here are some quick rules about the recursive member.
- Only UNION ALL is allowed between the anchor and the recursive member
- Only 1 CTE reference per recursive member is allowed
When I first read about CTEs, my thought was: Hey, this is similar to how I would write a recursive factorial program! You could not be far away from the truth and in fact, it makes a lot of sense to understand CTEs from the point of how we write recursive functions in programming languages. Here is an implementation of a recursive factorial program using VB.NET.
Module Module1 Sub Main() Dim result As Integer Dim number As Integer number = 5 result = number result = number * Factorial(number - 1) Console.WriteLine("{0}", result) Console.ReadLine() End Sub Private Function Factorial(ByVal number As Integer) As Integer Dim result As Integer If (number = 0) Then Return(1) End If result = number * Factorial(number - 1) Return(result) End Function End Module
Let’s dissect the above program and equate it to terms that we introduced in the recursive CTE definition.
- At the start of the program, we assign the input value to “result”. This is the anchor member and establishes the starting point.
- After this, we call the “Factorial” function in a recursive fashion by passing in values decremented by 1 by each call. This is similar to the recursive member starting off from the anchor.
- In the function, we have a check to see if the incoming parameter is 0 and if so, we return a value of 1. This indicates that we have reached the end of the recursion and the condition is similar to the exit condition.
- When the exit condition is reached, the stack is unwound and the values are assigned to the “result” variable and finally returned where it is combined with the anchor and the final result (120 in our example) is displayed.
Ok, having seen the basics of a recursive query, let’s turn to some real-world problems with respect to that. For the sake of understanding how recursive queries work, let us consider the case of an organizational hierarchy and the different types of queries that we want to fire against it.
Here is the sample organizational chart.
- Chief Executive Officer
- Senior Director – Product Development
- Product Development Manager
- Project Lead
- Developers
- QA Lead
- Testing Team
- Documentation Lead
- Technical Writers
- Senior Director – Finance
- Accountants
- Senior Director – Human Resources
- HR Professionals
Note that we are representing only the structure and not the people playing the roles. For example, there may be many project leads under a single product development manager, but we are interested in only the structure of the organization and not how many people of each type are present.
Let us now create the table to hold this structure and then populate it with data. Here are the sample scripts that I’ve used.
IF (OBJECT_ID ('dbo.SampleOrg', 'U') IS NOT NULL) DROP TABLE dbo.SampleOrg GO CREATE TABLE dbo.SampleOrg ( LevelIDINT NOT NULL PRIMARY KEY, PositionNVARCHAR(50) NOT NULL, ReportingLevelIDINT REFERENCES dbo.SampleOrg (LevelID) ) GO -- Insert some sample data into the table based on the structure -- shown above INSERT INTO dbo.SampleOrg SELECT 1, 'Chief Executive Officer', NULL INSERT INTO dbo.SampleOrg SELECT 2, 'Senior Director - Development', 1 INSERT INTO dbo.SampleOrg SELECT 3, 'Senior Director - Finance', 1 INSERT INTO dbo.SampleOrg SELECT 4, 'Senior Director - Human Resources', 1 INSERT INTO dbo.SampleOrg SELECT 5, 'Product Development Manager', 2 INSERT INTO dbo.SampleOrg SELECT 6, 'Project Lead', 5 INSERT INTO dbo.SampleOrg SELECT 7, 'QA Lead', 5 INSERT INTO dbo.SampleOrg SELECT 8, 'Documentation Lead', 5 INSERT INTO dbo.SampleOrg SELECT 9, 'Developers', 6 INSERT INTO dbo.SampleOrg SELECT 10, 'Testers', 7 INSERT INTO dbo.SampleOrg SELECT 11, 'Writers', 8 INSERT INTO dbo.SampleOrg SELECT 12, 'Accountants', 3 INSERT INTO dbo.SampleOrg SELECT 13, 'HR Professionals', 4 GO
Ok. Now that we have the sample data in place, let us try some recursive queries to see how they work.
Sample 1: Show the levels that directly report to the Product Development Manager
Before we dissect the logic used to achieve this requirement, let us see the query itself. Here is the snippet that produces the above requirement based on the organization chart table created earlier.
WITH SampleOrgChart (Level, Position, ReportingLevel, OrgLevel, SortKey) AS ( -- Create the anchor query. This establishes the starting -- point SELECT a.LevelID, a.Position, a.ReportingLevelID, 0, CAST (a.LevelID AS VARBINARY(900)) FROM dbo.SampleOrg a WHERE a.Position = 'Product Development Manager' UNION ALL -- Create the recursive query. This query will be executed -- until it returns no more rows SELECT a.LevelID, a.Position, a.ReportingLevelID, b.OrgLevel+1, CAST (b.SortKey + CAST (a.LevelID AS BINARY(4)) AS VARBINARY(900)) FROM dbo.SampleOrg a INNER JOIN SampleOrgChart b ON a.ReportingLevelID = b.Level WHERE b.OrgLevel < 1 ) SELECT * FROM SampleOrgChart ORDER BY SortKey
Let’s understand what this query does:
- First, we create the anchor member as the record which is for the ProductDevelopment Manager. As part of this query, we create two pseudo columns. One for indicating the level (called OrgLevel) and for sorting the records in the right fashion (called SortKey). The sort key for us is the primary key of the table converted to a binary column.
- After the anchor query, we now use this as the input and form the recursive query. Note that the recursive query increments the OrgLevel column and also builds the SortKey column.
- Since we want only the people who directly report to the product development manager, we specify the condition OrgLevel < 1. What happens if we omit this condition? That is the next sample…
Sample 2: Show all the levels that report to the Product Development Manager
Here is the query that provides all the levels that reports to the product development manager.
WITH SampleOrgChart (Level, Position, ReportingLevel, OrgLevel, SortKey) AS ( -- Create the anchor query. This establishes the starting -- point SELECT a.LevelID, a.Position, a.ReportingLevelID, 0, CAST(a.LevelID AS VARBINARY(900)) FROM dbo.SampleOrg a WHERE a.Position = 'Product Development Manager' UNION ALL -- Create the recursive query. This query will be executed -- until it returns no more rows SELECT a.LevelID, a.Position, a.ReportingLevelID, b.OrgLevel+1, CAST(b.SortKey + CAST (a.LevelID AS BINARY(4)) AS VARBINARY(900)) FROM dbo.SampleOrg a INNER JOIN SampleOrgChart b ON a.ReportingLevelID = b.Level ) SELECT * FROM SampleOrgChart ORDER BY SortKey
Note that by removing the WHERE clause in the recursive query, we keep going down the hierarchy.
OK, by now, you will have a fair understanding of the power that CTEs and Recursive Queries put in your hands. I can keep writing about many situations possible, but there is a space and word constraint as far as a technical article goes!
As an exercise to the readers, see if you can write a query that displays the organization chart (basically all the above examples, display the records, but not like a chart). That should keep you going and enjoying this feature.