Background
A Common Table Expression (CTE) in T-SQL is like a versatile tool in the toolbox of a SQL developer. It's a way to create a temporary result set that you can reference within a larger SQL statement. Think of it as a query-within-a-query that simplifies complex tasks, improves query readability, and opens up powerful possibilities in SQL programming.
Imagine you're working with a vast database, trying to extract specific information from it. Instead of writing one monolithic query that's hard to decipher, you can break it down into manageable parts using CTEs. Each CTE acts as a stepping stone, allowing you to progressively build your query, making it easier to understand and maintain.
CTEs are incredibly useful for tasks like hierarchical data retrieval, recursive queries, data transformation, and more. They provide a structured and organized way to tackle complex SQL challenges. In this guide, we'll explore CTEs in detail, from their syntax to practical examples, giving you the tools to unlock the full potential of T-SQL in your database tasks. So, let's dive into the world of Common Table Expressions and discover how they can simplify and elevate your SQL coding skills.
Common Table Expressions (CTEs) in T-SQL
During the execution of a query, you can refer to the CTE as you would any regular table. It can be used in SELECT
, INSERT
, UPDATE
, and DELETE
statements. CTEs were not originally included in the SQL language. In 1999 (SQL 3), they were included as a new feature to the SQL standard definition. They were made available in SQL Server 2005.
The example SQL queries shown in this article will be based on the table olympic_games
. The table includes the results of all Olympic games, as well as three records (gold, silver, and bronze) for each event.
Example 1: A Simple CTE
Let's build a basic query to demonstrate CTE syntax in T-SQL. Assume we want a report containing nations and the amount of gold medals each country has earned in a marathon. We can create a CTE first called gold_in_marathon
to return those rows of gold medals in men’s and women’s marathons. In the gold_in_marathon
CTE, we only need the columns city
, year
and country
.
Note that all CTEs start with the keyword WITH
, This is followed by brackets. You provide the query you want the CTE to return in brackets. The entire question is shown below. The CTE is indicated in bold type:
WITH gold_in_marathon AS ( SELECT city, year, country FROM olympic_games WHERE medal_type = 'Gold' AND sport IN ('Marathon Men', 'Marathon Women') ) SELECT country, count(*) AS gold_medals_in_marathon FROM gold_in_marathon GROUP BY country ORDER BY gold_medals_in_marathon DESC;
Let's explore Example 1 in detail, which demonstrates the use of a CTE in T-SQL to find the number of gold medals each country has earned in marathon events. We will break down the query step by step.
Objective: Create a report that shows the nations and the number of gold medals they have won in marathon events.
Step 1: CTE Definition
In this part of the query:
- WITH gold_in_marathon AS: This line initiates the definition of the CTE named "gold_in_marathon."
- (SELECT city, year, country ... ): Inside the parentheses, you define the query that forms the CTE. In this case, it selects three columns: "city," "year," and "country."
- FROM olympic_games: This specifies the source table for the query, which is "olympic_games."
- WHERE medal_type = 'Gold' AND sport IN ('Marathon Men', 'Marathon Women'): This part of the query filters the results to include only gold medals in marathon events (both men's and women's).
Step 2: Main Query
After defining the CTE, you can use it in the main query. In this part of the query:
- SELECT country, COUNT(*) AS gold_medals_in_marathon: You select the "country" column and use the COUNT(*) function to count the number of rows in the CTE, representing the number of gold medals for each country in marathon events. The result of this count is given an alias "gold_medals_in_marathon" for better readability.
- FROM gold_in_marathon: Here, you specify the CTE, gold_in_marathon, as the source of the data.
- GROUP BY country: You group the results by the "country" column. This groups all rows with the same country together.
- ORDER BY gold_medals_in_marathon DESC;: Finally, you order the results in descending order based on the count of gold medals in marathon events.
The query starts by creating a CTE named "gold_in_marathon" that selects specific columns from the "olympic_games" table, filtering for gold medals in marathon events. Then, in the main query, it counts the gold medals for each country using the CTE and presents the results in descending order. The CTE acts as a temporary table that simplifies the query and enhances its readability.
Note that the CTE only has the columns city
, year
, and country
from the table olympic_games
. The database conducts the outer query after generating the CTE, which reads the CTE gold_in_marathon, grouping the entries by country
nd using the COUNT()
Use this tool to find out how many gold medals each country has. The query yielded the following results:
Example 2: Using CTEs with Renamed (Aliased) Columns in T-SQL
Let's break down and explain Example 2 in detail. This example demonstrates how to use a Common Table Expression (CTE) in T-SQL to calculate a country's rating for marathon results based on medal points. We'll explicitly rename a column in the CTE to enhance readability.
Objective: Create a report that calculates a country's rating for marathon results (both men and women) based on medal points. Each gold medal is worth three points, each silver medal is worth two points, and each bronze medal is worth one point.
Here is our CTE code:
WITH player_points AS ( SELECT country, winner_name, SUM( CASE medal_type WHEN 'Gold' THEN 3 WHEN 'Silver' THEN 2 WHEN 'Bronze' THEN 1 END ) AS player_total FROM olympic_games WHERE sport in ('Marathon Men', 'Marathon Women') GROUP BY country,winner_name ) SELECT country, SUM(player_total) AS country_points FROM player_points GROUP BY country ORDER BY country_points DESC;
Step 1: CTE Definition
In this part of the query:
- WITH player_points AS: This line initiates the definition of the CTE named "player_points."
- (SELECT country, winner_name, SUM(...): Inside the parentheses, you define the query that forms the CTE.
- The query selects the "country" and "winner_name" columns from the "olympic_games" table.
- It calculates the "player_total" using a SUM function within a CASE statement. Depending on the "medal_type," it assigns three points for gold, two points for silver, and one point for bronze medals.
- FROM olympic_games: Specifies the source table for the query, which is "olympic_games."
- WHERE sport IN ('Marathon Men', 'Marathon Women'): Filters the results to include only marathon events (both men's and women's).
Step 2: Main Query
After defining the CTE, you can use it in the main query. In this part of the query:
- SELECT country, SUM(player_total) AS country_points:
- You select the "country" column and use the SUM function to calculate the total points for each country by summing up the "player_total" column from the CTE.
- You assign this calculated total an alias "country_points" for clarity.
- FROM player_points: Specifies the CTE "player_points" as the source of the data.
- GROUP BY country: Groups the results by the "country" column, ensuring that each country's total points are aggregated.
- ORDER BY country_points DESC;: Orders the results in descending order based on the total points, so you can see which country has the highest rating.
- SELECT country, SUM(player_total) AS country_points:
This query calculates a country's rating for marathon results by assigning points to medals and summing them up. The CTE "player_points" plays a crucial role in this calculation, and we explicitly rename the calculated column as "player_total" within the CTE for better readability. The main query then aggregates the results by country and orders them to determine the highest-rated countries based on marathon medals.
One method for defining a name is to use an alias (as we shall see later) with the clause AS
. In the CTE definition, the column was named player_total
. We arrange the results by nation in the outer query to determine the total points for each country. It is worth noting that we utilise the equation SUM(player_total)
and, once again, an alias to change the column to country_points
. The query's results are as follows:
Example 3: Using Multiple CTEs In The Same T-SQL Query
Let's break down and explain Example 3 in detail. This example demonstrates how to use multiple Common Table Expressions (CTEs) in the same T-SQL query to generate a report that includes the names of Olympic athletes who have won at least one gold and one silver medal.
Objective: Create a report that lists the names of Olympic athletes who have achieved both gold and silver medals.
Here is our CTE code.
WITH gold(winner_name,gold_medals) AS ( SELECT winner_name, count(*) FROM olympic_games WHERE medal_type = 'Gold' GROUP BY winner_name ), silver(winner_name,silver_medals) AS ( SELECT winner_name, count(*) FROM olympic_games WHERE medal_type = 'Silver' GROUP BY winner_name ) SELECT g.winner_name, g.gold_medals, s.silver_medals FROM gold g JOIN silver s ON g.winner_name = s.winner_name;
Step 1: Define the First CTE (Gold Medals)
In this part of the query:
- WITH gold(winner_name, gold_medals) AS: This line initiates the definition of the first CTE named "gold."
- (SELECT winner_name, COUNT(*) ...): Inside the parentheses, you define the query for the CTE. It selects the "winner_name" column and counts the number of gold medals for each athlete using the COUNT(*) function.
- FROM olympic_games: Specifies the source table for the query, which is "olympic_games."
- WHERE medal_type = 'Gold': Filters the results to include only gold medals.
- GROUP BY winner_name: Groups the results by the "winner_name" column to count gold medals for each athlete separately.
Step 2: Define the Second CTE (Silver Medals)
In this part of the query:
- WITH silver(winner_name, silver_medals) AS: This line initiates the definition of the second CTE named "silver."
- (SELECT winner_name, COUNT(*) ...): Similar to the first CTE, this query selects the "winner_name" column and counts the number of silver medals for each athlete.
- FROM olympic_games: Specifies the source table for the query, which is still "olympic_games."
- WHERE medal_type = 'Silver': Filters the results to include only silver medals.
- GROUP BY winner_name: Groups the results by the "winner_name" column to count silver medals for each athlete separately.
Step 3: Main Query (Joining Both CTEs)
In this part of the query:
- SELECT g.winner_name, g.gold_medals, s.silver_medals: You select columns from both CTEs and rename them using aliases to clarify which CTE they originate from.
- FROM gold g: You specify "gold" as an alias for the first CTE, allowing you to reference it as if it were a table.
- JOIN silver s ON g.winner_name = s.winner_name: You join the two CTEs using the "winner_name" column as the common key. This ensures that only athletes who have both gold and silver medals are included in the query result.
This query demonstrates the use of multiple CTEs in a single T-SQL query to generate a report that lists Olympic athletes who have won both gold and silver medals. The CTEs "gold" and "silver" calculate the counts of gold and silver medals for each athlete, respectively. The main query then joins these CTEs and presents the results with appropriate column aliases for clarity.
The result is as follows:
Example 4: An SQL Server Query With a CTE Based On Another CTE
Let's dive into Example 4 and explain it in detail. This example showcases the use of two Common Table Expressions (CTEs) in a single query, with the second CTE being based on the first one. The goal is to find the top three nations in terms of the number of medals earned in a single Olympic game while avoiding duplicate entries for the same country.
Objective: Create a report that lists the top three nations with the most medals earned in a single Olympic game, avoiding duplicate entries for the same country.
Here is the CTE code.
WITH medals AS ( SELECT city, year, country, COUNT(*) AS quantity_of_medals FROM olympic_games GROUP BY city, year, country ), country_best_game AS ( SELECT country, city, year, quantity_of_medals FROM medals m1 WHERE quantity_of_medals = ( SELECT max(quantity_of_medals) FROM medals m2 WHERE m1.country = m2.country ) ) SELECT TOP 3 country, city, year, quantity_of_medals FROM country_best_game ORDER BY quantity_of_medals DESC;
Step 1: Define the First CTE (Medals)
In this part of the query:
- WITH medals AS: This line initiates the definition of the first CTE named "medals."
- (SELECT city, year, country, COUNT(*) ...): Inside the parentheses, you define the query for the CTE. It selects the "city," "year," and "country" columns and counts the total number of medals for each combination of these columns using the COUNT(*) function.
- FROM olympic_games: Specifies the source table for the query, which is "olympic_games."
- GROUP BY city, year, country: Groups the results by the "city," "year," and "country" columns to calculate the quantity of medals for each unique combination.
Step 2: Define the Second CTE (Country Best Game)
In this part of the query:
- country_best_game AS: This line initiates the definition of the second CTE named "country_best_game."
- (SELECT country, city, year, quantity_of_medals ...): Inside the parentheses, you define the query for the CTE. It selects the "country," "city," "year," and "quantity_of_medals" columns.
- FROM medals m1: Specifies that you are using data from the first CTE "medals" and aliases it as "m1."
- WHERE quantity_of_medals = (...): This part of the query identifies the rows where the "quantity_of_medals" for a country is equal to the maximum "quantity_of_medals" for that same country, effectively finding the country's best game.
Step 3: Main Query (Top Three Nations)
In this part of the query:
- SELECT TOP 3: You select the top three rows based on the specified criteria.
- country, city, year, quantity_of_medals: You select the columns from the "country_best_game" CTE to display in the final result.
- FROM country_best_game: Specifies the CTE "country_best_game" as the source of the data.
- ORDER BY quantity_of_medals DESC;: Orders the results in descending order based on the "quantity_of_medals" column, so you can see the top three nations with the most medals in a single game.
This query uses two CTEs in a sequence to calculate the quantity of medals for each country in each Olympic game and then determine the country's best game with the most medals. Finally, it lists the top three nations with the most medals in a single game, avoiding duplicate entries for the same country. The CTEs help organize and simplify the query for this analytical task.
The query's results are as follows:
Example 5: Using A CTE in a T-SQL INSERT
n T-SQL, CTEs can also be used in UPDATE
, INSERT
and DELETE
statements. As a general rule, any SQL command allowing an embedded SELECT
(for example a CREATE VIEW
statement) can support a CTE. Let’s see an example of an INSERT
using a common table expression in T-SQL.
This example illustrates how to use a Common Table Expression (CTE) in a T-SQL INSERT statement. The goal is to populate a table called "country_medals_by_game" with data from a CTE.
Objective: Insert data into the "country_medals_by_game" table using a CTE.
WITH medals AS ( SELECT city, year, country, COUNT(*) AS medals_won FROM olympic_games GROUP BY city, year, country ) INSERT INTO country_medals_by_game(city,year,country,quantity_of_medals) SELECT city, year, country, medals_won FROM medals;
In this part of the query:
- WITH medals AS: This line initiates the definition of the CTE named "medals."
- (SELECT city, year, country, COUNT(*) ...): Inside the parentheses, you define the query for the CTE. It selects the "city," "year," and "country" columns and counts the total number of medals for each combination of these columns using the COUNT(*) function.
- FROM olympic_games: Specifies the source table for the query, which is "olympic_games."
- GROUP BY city, year, country: Groups the results by the "city," "year," and "country" columns to calculate the quantity of medals for each unique combination.
Step 2: INSERT Data into Target Table
In this part of the query:
- INSERT INTO country_medals_by_game(city, year, country, quantity_of_medals): You specify the target table "country_medals_by_game" and the columns into which you want to insert data.
- SELECT city, year, country, medals_won FROM medals;: You use a SELECT statement to retrieve data from the "medals" CTE and insert it into the target table. The columns in the SELECT statement must match the columns specified in the INSERT statement.
This query demonstrates how to use a CTE to gather data from the "olympic_games" table and insert it into the "country_medals_by_game" table. The CTE "medals" calculates the quantity of medals for each unique combination of "city," "year," and "country," and the INSERT statement transfers this data into the target table. CTEs in INSERT statements can be a powerful way to manipulate and transfer data efficiently.
Example 6: Using A CTE in a SQL Server UPDATE
Let's explain Example 6 in more detail. This example demonstrates how to use a Common Table Expression (CTE) in a SQL Server UPDATE statement to calculate and update the "delta_with_previous_game" column in a table called "country_medals_by_game." The column "delta_with_previous_game" represents the difference in medals earned by a country in two consecutive Olympic games.
Objective: Update the "delta_with_previous_game" column in the "country_medals_by_game" table to represent the difference in medals earned by each country in two consecutive Olympic games.
Here is the CTE code.
WITH medals AS ( SELECT city, year, country, COUNT(*) AS medals_won FROM olympic_games GROUP BY city, year, country ) UPDATE country_medals_by_game SET delta_with_previous_game = ( SELECT COALESCE(quantity_of_medals - COALESCE(prevgame.medals_won,0),0) FROM medals prevgame WHERE country_medals_by_game.year = prevgame.year + 4 AND country_medals_by_game.country = prevgame.country );
Step 1: Define the CTE (Medals)
In this part of the query:
- WITH medals AS: This line initiates the definition of the CTE named "medals."
- (SELECT city, year, country, COUNT(*) ...): Inside the parentheses, you define the query for the CTE. It selects the "city," "year," and "country" columns and counts the total number of medals for each combination of these columns using the COUNT(*) function.
- FROM olympic_games: Specifies the source table for the query, which is "olympic_games."
- GROUP BY city, year, country: Groups the results by the "city," "year," and "country" columns to calculate the quantity of medals for each unique combination.
Step 2: UPDATE Statement
In this part of the query:
- UPDATE country_medals_by_game: You specify the target table "country_medals_by_game" that you want to update.
- SET delta_with_previous_game = (...): You use a subquery to determine the difference in medals earned between two successive Olympic games and set it in the "delta_with_previous_game" column.
- SELECT COALESCE(quantity_of_medals - COALESCE(prevgame.medals_won,0),0): In this subquery, you calculate the difference in medals earned by subtracting the previous game's medals ("medals_won" from the CTE "prevgame") from the current game's medals ("quantity_of_medals" from the CTE "medals"). The COALESCE function handles cases where there is no data for the previous game, returning 0 in such cases.
- FROM medals prevgame: You access the "medals" CTE with an alias "prevgame" to get data from the previous Olympic game.
- WHERE country_medals_by_game.year = prevgame.year + 4 AND country_medals_by_game.country = prevgame.country: This condition ensures that you are calculating the difference in medals between two consecutive Olympic games for the same country. The "+ 4" in the year condition accounts for the four-year gap between Olympic games.
This query uses a CTE named "medals" to calculate the quantity of medals for each unique combination of "city," "year," and "country." Then, it performs an UPDATE operation on the "country_medals_by_game" table, setting the "delta_with_previous_game" column based on the difference in medals earned in two consecutive Olympic games. This query allows for the computation of the medal delta efficiently and handles cases where data for the previous game is missing.
Example 7: Using CTEs for Recursive Queries in T-SQL
Let's provide a more detailed explanation of Example 7, which involves using Common Table Expressions (CTEs) for recursive queries in T-SQL to explore a data hierarchy within the "olympic_games" table. The goal is to find a list of men's marathon world records by tracing the hierarchy of records and their previous records.
Objective: Retrieve a report listing men's marathon world records, starting with the current world record and then showing the immediately prior records in a hierarchical fashion.
Assume we want to get a report with a list of men's marathon world records. We can begin by displaying the current world record, followed by the immediately prior record, and so on. To explore the hierarchy of men's marathon records, we'll need a recursive CTE. It will seem as follows:
WITH record_history (medal_id, year, winner, country, prev_record_medal_id) AS ( SELECT medal_id, year, winner_name, country, previous_record_medal FROM olympic_games WHERE sport = 'Marathon Men' AND record = 'true' UNION ALL SELECT og.medal_id, og.year, og.winner_name, og.country, og.previous_record_medal FROM olympic_games og JOIN record_history mrh ON og.medal_id = mrh.prev_record_medal_id ) SELECT * FROM record_history;
Step 1: Add Columns for Data Hierarchy
To create a data hierarchy, two additional columns are added to the "olympic_games" table:
- record: A text column set to "true" when a medal is associated with a new world record.
- previous_record_medal: A column to store the medal_id of the previous record.
Step 2: Define the Recursive CTE (record_history)
In this part of the query:
- WITH record_history (medal_id, year, winner, country, prev_record_medal_id) AS: This line initiates the definition of the recursive CTE named "record_history." It specifies the columns that will be included in the CTE.
- The CTE is divided into two parts, connected by UNION ALL:
- The first query selects the initial world record entry by filtering rows with sport = 'Marathon Men' and record = 'true'. This represents the current world record.
- The second query retrieves the previous records by joining the "olympic_games" table (aliased as "og") with the recursive CTE "record_history" (aliased as "mrh") based on the relationship between the medal_id and prev_record_medal_id.
Step 3: Retrieve the Results
In this part of the query, you select all columns from the "record_history" CTE to retrieve the results.
This query utilizes a recursive CTE named "record_history" to explore the hierarchy of men's marathon world records in the "olympic_games" table. It starts with the current world record (identified by record = 'true') and then recursively traces back to previous records using the prev_record_medal_id column. The result is a list of men's marathon world records in chronological order, showing the current record and its previous records. Recursive CTEs are useful for traversing hierarchical data structures and can be customized to display the desired hierarchy.
The results of the query are below:
There is a limit to the number of invocations permitted in a recursive CTE to prevent an infinite loop. This is set to 100 by default in SQL Server. However, you may adjust this limit by using the MAXRECURSION
argument at the conclusion of the recursive query.
Conclusion
Common Table Expressions (CTEs) in T-SQL are a powerful tool for SQL developers, offering a structured and elegant way to solve complex database challenges. Throughout this comprehensive guide, we have explored the essence of CTEs, from their basic syntax to advanced applications through seven real-world examples.
CTEs empower developers to break down intricate queries into manageable steps, enhancing code readability and maintainability. Whether it's retrieving hierarchical data, performing recursive operations, or simplifying data transformations, CTEs are invaluable in streamlining SQL tasks.
By understanding CTEs, you not only expand your SQL toolkit but also gain the ability to tackle diverse data scenarios with confidence. They provide a bridge between abstract problem-solving and practical SQL implementation, making complex queries more accessible and comprehensible.
Incorporating CTEs into your SQL repertoire is an essential skill for any database professional or enthusiast. With the knowledge and expertise gained from this guide, you are well-equipped to leverage CTEs to their fullest potential, elevating your SQL proficiency and efficiency in working with databases.