What is a Common Table Expression (CTE) and why use them?
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query. They permit you to write queries more easily, and to write queries that could not otherwise be expressed.
Common table expressions are useful or may be necessary if a query involves multiple aggregate functions. This provides a convenient means to temporarily store sets of values.
The following example demonstrates why you would want to use a Common Table Expression. Consider the problem:
Determine which department has the most number of employees.
This is a case where multiple aggregate functions are needed. The employee table in the sample database lists all the employees in a fictional company and specifies in which department each works. The following query lists the department ID codes and the total number of employees in each department.
SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id
This query can be used to extract the department with the most employees as follows:
SELECT dept_id, n FROM ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ) AS a WHERE a.n = ( SELECT max(n) FROM ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ) AS b )
While this statement provides the correct result, it has some disadvantages.
- The repeated subquery makes this statement clumsy.
- This statement provides no clear link between the subqueries.
One way around these problems is to create a view, then use it to re-express the query. This approach avoids the problems mentioned above.
CREATE VIEW CountEmployees(dept_id, n) AS SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id; SELECT dept_id, n FROM CountEmployees WHERE n = ( SELECT max(n) FROM CountEmployees );
The disadvantage of this approach is that some overhead is required, as the engine must update the system tables when creating the view. If the view will be used frequently, this approach is reasonable. However, in cases where the view is used only once within a particular SELECT statement the preferred solution is to use a Common Table Expression.
Common Table Expression Syntax
Common table expressions are defined using the WITH clause, which precedes the SELECT keyword in a SELECT statement. The content of the clause defines one or more temporary views that may then be referenced elsewhere in the statement.
The syntax of this clause mimics that of the CREATE VIEW statement. A CTE is made up of:
- An expression name representing the CTE.
- Optional column list.
- A query defining the CTE.
After a CTE is defined, it can be referenced like a table or view can in a SELECT, INSERT, UPDATE, or DELETE statement. A CTE can also be used in a CREATE VIEW statement as part of its defining SELECT statement.
The basic syntax structure for a CTE is:
WITH CTE_name1 [ ( column_name [,...n] ) AS (query defining the CTE1) , CTE_name2 [ ( column_name [,...n] ) AS (query defining the CTE2)
- A single WITH clause may define more than one common table expression. These definitions must be separated by commas.
- The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
The statement to run the CTE is:
SELECT <column_list> FROM CTE_name
Using common table expressions
Continuing with our example… Using common table expressions, you can express the previous query as follows.
WITH CountEmployees(dept_id, n) AS ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ) SELECT dept_id, n FROM CountEmployees WHERE n = ( SELECT max(n) FROM CountEmployees )
Multiple correlation names
Just as when using tables, you can give different correlation names to multiple instances of a common table expression. Doing so permits you to join a common table expression to itself.
For example, the query below produces pairs of departments that have the same number of employees.
WITH CountEmployees(dept_id, n) AS ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ) SELECT a.dept_id, a.n, b.dept_id, b.n FROM CountEmployees AS a JOIN CountEmployees AS b ON a.n = b.n AND a.dept_id < b.dept_id
Multiple table expressions
A single WITH clause may define more than one common table expression. These definitions must be separated by commas. The following example lists the department that has the smallest payroll and the department that has the largest number of employees.
WITH CountEmployees(dept_id, n) AS ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ), DeptPayroll( dept_id, amt ) AS ( SELECT dept_id, sum(salary) AS amt FROM employee GROUP BY dept_id ) SELECT count.dept_id, count.n, pay.amt FROM CountEmployees AS count JOIN DeptPayroll AS pay ON count.dept_id = pay.dept_id WHERE count.n = ( SELECT max(n) FROM CountEmployees ) OR pay.amt = ( SELECT min(amt) FROM DeptPayroll )
Conclusion
As you can see CTEs can be very useful. I have found them particularly handy when cleaning up demoralized tables. Like imported spread sheets. I can create a CTE distinct on columns that uniquely define rows. Then by joining to it I can create a result set including the 10’s of columns that were requested. This way I avoid using a GROUP BY and having to create functions to get the columns not in the group. This task would be near impossible using DISTINCT.
References
WITH common_table_expression (Transact-SQL)
http://msdn2.microsoft.com/en-us/library/ms175972.aspx
Using Common Table Expressions
http://msdn2.microsoft.com/en-us/library/ms190766.aspx
About common table expressions
http://www.ianywhere.com/developer/product_manuals/sqlanywhere/0901/en/html/dbugen9/00000332.htm