At some point as you start to create more complex SQL code that go beyond the basic Transact-SQL statements, you may find the need to constrain your query using the results of other SELECT statements. When you embed a SELECT statement within a parent Transact-SQL statement, these embedded SELECT statements are known as either a subquery, or a correlated subquery. In this level of the Beyond the Basics stairway, I will be discussing the different aspects of a subquery, in a future level I will be discussing the correlated subquery.
What is a Subquery?
A subquery is just a SELECT statement that is contained within another Transact-SQL statement. A subquery can be used anywhere an expression can be used. Many subqueries return a single column value because they are used in conjunction with comparison operators (=, !=, <, <= , >, >=) or an expression. When a subquery is not used as an expression or with a comparison operator it can return multiple values. Additionally, subqueries can even return multiple columns and values when they are used in a FROM clause or with the keyword EXISTS.
A subquery is easy to spot within a Transact-SQL statement because it will be the SELECT statement contained within parenthesis. Since a subquery is contained within a Transact-SQL statement the subquery is often referred to as the inner query. Whereas the Transact-SQL statement that contains a subquery is referred to as the outer query. Another characteristic of a subquery is it can be run independently of the outer query and will run without error, and may return a set of rows, or an empty row set.
Another form of a subquery is correlated subquery. But the correlated subquery cannot be run independently of the outer Transact SQL statement. A correlated subquery uses a column or columns from the outer query to constrain the results returned from the correlated subquery. This is enough about a correlated subquery for this article. I will be exploring correlated subqueries in a future stairway article.
Here are some other things to consider when using a subquery:
- ntext, text and image data types are not allowed to be returned from a subqueries
- The ORDER BY clause cannot be used in a subquery unless the TOP operator is used
- Views that use a subquery can’t be updated
- COMPUTE and INTO clauses cannot be used in a subquery
Sample Data for Subquery Examples
In order to demonstrate how to use a subquery I will need some test data. Rather than create my own test data, all of my examples will use the AdventureWorks2008R2 database. If you want to follow along and run my examples in your environment then you can download the AdventureWorks2008R2 database from here: http://msftdbprodsamples.codeplex.com/releases/view/93587
Examples of Subqueries that Return a Single Value
As stated above, subqueries that are used in an expression or return a value on one side of a comparison operator are required to return a single value. There are many different places in a Transact-SQL statement where a subquery is required to return a single column value, like in a selection list, WHERE clause, etc. In this section I will provide a series of examples that will demonstrate using a subquery as an expression or with a comparison operator to meet different business requirements.
Subquery in Column List
A subquery in a column list is a SELECT statement that returns a single column value that is placed within the column list of a SELECT clause. To demonstrate how to use a subquery in the selection list let’s assume we have to produce a result set from a SELECT statement that has the following business requirements:
- Return all the Sales.SalesOrderHeader records what have a OrderDate equal to “2007-02-19 00:00:00.000”
- Order the returned records by SalesOrderID
- Number each row returned where the oldest order has a RowNumber of 1, next oldest has a RowNumber of 2, etc
- The result set needs a column named TotalOrders which needs to be populated with the number of total orders that have a OrderDate that is equal to “2007-02-19 00:00:00.000”
The code to meet these requirements is shown in Listing 1.
SELECT ROW_NUMBER() OVER (ORDER BY SalesOrderID) RowNumber , (SELECT COUNT(*) FROM [Sales].[SalesOrderHeader] WHERE ModifiedDate = '2007-02-19 00:00:00.000') AS TotalOrders , * FROM [Sales].[SalesOrderHeader] WHERE OrderDate = '2007-02-19 00:00:00.000';
In this single Transact-SQL statement you see two different SELECT clauses. The subquery is the SELECT statement that is embedded in the middle of the statement in Listing 1 and it has parentheses around it. I have pulled out the subquery statement and placed it in Listing 2, in case you want to test to verify it can be run independently of the complete Transact-SQL statement.
SELECT COUNT(*) FROM [Sales].[SalesOrderHeader] WHERE OrderDate = '2007-02-19 00:00:00.000'
By having this subquery in the column list, this Transact-SQL statement in Listing1 is able to count the number of SalesOrderHeader row that have an OrderDate of “2007-02-19 00:00:00.000” and return that information along with the detailed row information about Sales.SalesOrderHeader records that have the same OrderDate value.
Example of Subquery in WHERE Clause
There are times you want to drive the WHERE clause condition based on the results of a SELECT statement. When you a SELECT statement in WHERE clause this SELECT statement is really a subquery. To demonstrate using a subquery in a WHERE clause, suppose you needed to display the Sales.SalesOrderDetail records that contain the purchase of extra-large long-sleeve logo jerseys. The code in Listing 3 meets my display requirements by using a subquery.
SELECT * FROM [Sales].[SalesOrderDetail] WHERE ProductID = (SELECT ProductID FROM [Production].[Product] WHERE Name = 'Long-Sleeve Logo Jersey, XL');
The subquery in Listing 3 is on the right side of the WHERE condition. This subquery identifies a ProductID for a Production.Product record where the Name of the product is “Long-Sleeve Logo Jersey, XL’. This subquery allowed me to find all the Sales.SalesOrderDetail records that have a ProductID that is associated with the product name of “Long-Sleeve Logo Jersey, XL”.
Example Using a Subquery to Control the TOP Clause
The number of rows returned using the TOP clause can be controlled by an expression. The code in Listing 5 identifies the number of Sales.SalesOrderDetail rows that should be returned based on a subquery in the TOP clause.
SELECT TOP (SELECT TOP 1 OrderQty FROM [Sales].[SalesOrderDetail] ORDER BY ModifiedDate) * FROM [Sales].[SalesOrderDetail] WHERE ProductID = 716;
The code in Listing 4 uses the OrderQty value returned from the subquery to identity the value that will be used in the TOP clause. By using a subquery to control the number of rows that TOP clause returns, allows you to build a subquery that will dynamically identify the number of rows returned from your query at runtime.
Example of Subquery in the HAVING Clause
In order to demonstrate using a subquery in the HAVING clause, suppose you have the following business requirement:
Produce a result set that contains the Sales.SalesOrderHeader.OrderDate and the number of orders for each date, where the number of orders exceeds the number of orders taken on ’2006-05-01’.
In order to meet this requirement I have developed the query in Listing 6 that uses a subquery in the HAVING clause.
SELECT count(*), OrderDate FROM [Sales].[SalesOrderHeader] GROUP BY OrderDate HAVING count(*) > (SELECT count(*) FROM [Sales].[SalesOrderHeader] WHERE OrderDate = '2006-05-01 00:00:00.000');
The code in Listing 5 has the subquery on the right side of the HAVING clause and uses the COUNT function in my subquery to determine the number of orders placed on ‘2006-05-01’.
Example of using a Subquery in a Function Call
To demonstrate using a subquery in a function call, suppose that you have the requirement to display the number of days between the OrderDate and the maximum OrderDate for each Sales.SalesOrderHeader records. The code in Listing 6 meets this requirement.
SELECT SalesOrderID , OrderDate ,DATEDIFF ( dd,OrderDate ,(SELECT MAX(OrderDate) FROM [Sales].[SalesOrderHeader]) ) AS DaysBetweenOrders ,(SELECT MAX(OrderDate) FROM [Sales].[SalesOrderHeader]) AS MaxOrderDate FROM [Sales].[SalesOrderHeader];
The code in Listing 6 has two different subqueries. Both subqueries return the max OrderDate in the Sales.SalesOrderHeader table. But the first subquery is used to pass a date to the second parameter of the DATEDIFF function.
Examples of Subqueries that Return Multiple Values
All my examples so far have contained subqueries that returned only a single value in a single column. Not all subqueries have that requirement. The next couple of examples will use subqueries that return multiple values and/or multiple columns.
Example of Subquery in the FROM clause
In the FROM clause you normally identify a table or set of tables that your Transact-SQL statement will operate against. Each table provides a set of records that your query will use to determine the final result set for your query. A subquery can be thought of as a query that returns a set of records, and therefore it can be used in a FROM clause just like a table. The query in Listing 7 shows how I used a subquery in a FROM clause. When a subquery is used in the FROM clause the result set produced from the subquery is commonly called a derived table.
SELECT SalesOrderID FROM (SELECT TOP 10 SalesOrderID FROM [Sales].[SalesOrderDetail] WHERE ProductID = 716 ORDER BY ModifiedDate DESC) AS Last10SalesOrders;
- The code in Listing 7 used the subquery in the FROM clause to create a table alias, named Last10SalesOrders. My subquery returned the last 10 Sales.alesOrderDetail records that contained a ProductID of 716.
- My code in Listing 7 is a very trivial example of how to use a subquery in the FROM clause. By using a subquery in a FROM clause you could easily construct more complex FROM syntax that joins the results of a subquery with other tables, or additional subqueries, like I have done in Listing 8.
SELECT DISTINCT OrderDate FROM (SELECT TOP 10 SalesOrderID FROM [Sales].[SalesOrderDetail] WHERE ProductID = 716 ORDER BY ModifiedDate DESC) AS Last10SalesOrders JOIN [Sales].[SalesOrderHeader] AS SalesOrderHeader ON Last10SalesOrders.SalesOrderID = SalesOrderHeader.SalesOrderID ORDER BY OrderDate
In Listing 8 I took the subquery/derived table I created in Listing 7 and joined it with the SalesOrderHeader table. By doing this I could determine the distinct OrderDate’s for the last 10 times people ordered ProductID = 716.
Example of using a Subquery with the IN Keyword
Another place where you can write a subquery that returns multiple values for a column is when your subquery produces a record set that is used with the IN keyword. The code in Listing 9 demonstrates how to pass values to the IN keyword using a subquery.
SELECT * FROM [Sales].[SalesOrderDetail] WHERE ProductID IN (SELECT ProductID FROM [Production].[Product] WHERE Name like '%XL%');
The code in Listing 9 uses a subquery to return different values for ProductID from the Production.Product table that have a name that contains the characters “XL”. These ProductID values returned from the subquery are then used in the IN keyword to constrain which rows are returned from the Sales.SalesOrderDetail table.
Example of using a Subquery in a Statement that Modifies Data
All of my examples so far have been demonstrating how to use a subquery within different parts of a SELECT statement. A subquery can be also be used within an INSERT, UPDATE or DELETE statement as well. The code in Listing 10 shows how to use a subquery in an INSERT statement.
DECLARE @SQTable TABLE ( OrderID int, OrderDate datetime, TotalDue money, MaxOrderDate datetime); -- INSERT with SubQuery INSERT INTO @SQTable SELECT SalesOrderID, OrderDate, TotalDue, (SELECT MAX(OrderDate) FROM [Sales].[SalesOrderHeader]) FROM [Sales].[SalesOrderHeader] WHERE CustomerID = 29614; -- Display Records SELECT * FROM @SQtable;
In my code in Listing 10, I used a subquery to calculate the value to be inserted into the column MaxOrderDate. This is only a single example of how to use a subquery in an INSERT statement. Keep in mind a subquery can be also be used within an UPDATE and/or DELETE statement as well.
Performance Considerations between Subqueries and JOIN
If you have read the “Subquery Fundamentals” document produced by Microsoft (http://technet.microsoft.com/en-us/library/ms189575(v=sql.105).aspx) then you might have run across this statement regarding performance of statements that contain a subquery:
“In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not.”
To compare the performance of a query using subquery with an equivalent query that doesn’t use a subquery I’m going rewrite my subquery in Listing 3 to use a JOIN operation. Listing 11 shows my re-written JOIN query that is equivalent to my query in Listing 3.
SELECT SOD.* FROM [Sales].[SalesOrderDetail] AS SOD INNER JOIN [Production].[Product] AS P ON SOD.ProductID = P.ProductID WHERE P.Name = 'Long-Sleeve Logo Jersey, XL';
To compare the performance of the query in Listing 3 that uses a subquery and the query in Listing 11 that uses a JOIN I will be running both queries using the code in Listing 12.
SET STATISTICS IO ON; SET STATISTICS TIME ON; -- Listing 3 query SELECT * FROM [Sales].[SalesOrderDetail] WHERE ProductID = (SELECT ProductID FROM Production.Product WHERE Name = 'Long-Sleeve Logo Jersey, XL'); -- Listing 11 query SELECT SOD.* FROM [Sales].[SalesOrderDetail] AS SOD INNER JOIN [Production].[Product] AS P ON SOD.ProductID = P.ProductID WHERE P.Name = 'Long-Sleeve Logo Jersey, XL';
Listing 12: Code to test performance of Listing 3 and Listing 4
After running the code in listing 12 I reviewed the messages produced by the “SET STATISTICS” statements. By reviewing the statistics I found that both queries had 3,309 logical reads against the SalesOrderDetail table, and 2 logical reads against the Product table, and each used 31 ms of CPU. Additionally I reviewed the execution plan that SQL Server created for both of these queries. I found that SQL Server produced the same execution plan for both. Hence using a subquery or a JOIN query for my situation produced equivalent performance, just as documented by Microsoft.
Summary
A subquery is a SELECT statement embedded with another Transact-SQL statement. The subquery can be run independently of the outer query and therefore is sometimes referred to as an independent query. Remember that anytime you have a subquery in place of an expression, or it is used with a comparison operator, that it can only return a single column and value. Often a subquery can be rewritten using JOIN logic. Subqueries are a powerful tool to help you build your more complex Transact-SQL statements to meet your business requirements.
Question and Answer
In this section you can review how well you have comprehended using the subquery concept by answer the following questions.
Question 1:
Complete this sentence “A subquery is a SELECT statement within another Transact-SQL statement that _____________________”.
- can’t not be run independently of the complete query.
- references columns from the outer query.
- when run independently of the outer query it will return results.
Question 2:
When does a subquery require only a single column and value to be returned (select all that apply)?
- When the subquery is used in the FROM clause
- When the subquery is used in the IN clause
- When the subquery is used in an expression
- When the subquery is used with a comparison operator
Question 3:
A Transact-SQL statement that uses a subquery in the WHERE clause will always perform slower than an equivalent query that doesn’t contain a subquery (True or False)?
- True
- False
Answers:
Question 1:
The correct answer is c. A subquery can run independently of the outer query and it will return results. It does not require any columns from the outer query, and if it did have columns from the outer query it would be known as a correlated subquery.
Question 2:
The correct answer is c and d. A subquery needs to return one column value when it is used as an expression, or in a comparison operation. When a subquery is used with the IN keyword it can return a single or multiple values for a column. If a subquery is used in a FROM clause it can return just one column and one value, but it can also return multiple columns and values.
Question 3:
The correct answer is false. The SQL Server optimizer is very smart and will most likely calculate the same execution plan for two equivalent queries. If the execution plan for a query that contains a subquery and the one that is equivalent without a subquery both end up with the same execution plan then both queries will have the same performance.