The PIVOT operator was added to the Transact-SQL language in SQL Server 2005. It allows you to turn your row data on its side, so it can be presented as column data. This is useful when you want take unique column values and have them displayed as column headings, where the column headings are associated with summarized values displayed below each column heading. In this article I will be exploring how to use the PIVOT operator.
Syntax of PIVOT operator
The PIVOT operator allows you create multiple columns in a result set which are based on the unique column values in a source table or query.
In order to accomplish this a SELECT statement will need to contain the PIVOT operator using the following syntax:
SELECT <non-pivoted column>, [first pivoted column] AS <column name>, [second pivoted column] AS <column name>, ... [last pivoted column] AS <column name> FROM (<SELECT query that produces the data>) AS <alias for the source query> PIVOT ( <aggregation function>(<column being aggregated>) FOR [<column that contains the values that will become column headers>] IN ( [first pivoted column], [second pivoted column], ... [last pivoted column]) ) AS <alias for the pivot table> <optional ORDER BY clause>;
Where:
- [first pivoted column], [second pivoted column] and [last pivot column] – Identifies the unique column values that will be pivoted.
- <column_name> - Identifies the column alias name for the pivoted columns.
- <SELECT query that produces that data> - Identifies a query that produced the data that will be used by the PIVOT operator.
- <Alias for the query source> - Identifies the table alias name for the query that produces the data for the PIVOT operator.
- <aggregation function> - Identifies the function that will be used to aggregate the pivot column.
- <column being aggregated> - Identifies the column that will be aggregated by the PIVOT operator.
- <column that contains the values that will become column header> - Identifies the column from the query that that contains the unique column values that will be pivoted.
- <alias for the pivot table> - identifies the alias name for the table produced by the PIVOT operator.
- <optional ORDER BY clause> - identifies the columns used to sort the results produced by the PIVOT operator.
By reviewing this syntax you can see the PIVOT operator requires a number of different TSQL statement components to perform the PIVOT operation. To better understand this syntax let's review a few examples.
Sample Data for Examples
For all the examples in the article I will be using the AdventureWorks2012 database. If you want to follow along and you don't have the AdventureWorks2012 database you can download it from this location: http://msftdbprodsamples.codeplex.com/releases/view/93587
Example of Simple Pivot Operation
For my first example I will be creating some code that does a simple PIVOT operation. But before we look at some code that uses the PIVOT operator let's review the code in Listing 1 and the output it produces in Result 1.
USE AdventureWorks2012; GO SELECT [TerritoryID] , Year([OrderDate]) AS OrderYear , COUNT(*) NumOfOrders FROM [Sales].[SalesOrderHeader] WHERE Year([OrderDate]) in (2005,2006) GROUP BY Year([OrderDate]),[TerritoryID] ORDER BY Year([OrderDate]),[TerritoryID]
When I run the code in Listing 1 it produces the output in Result 1.
TerritoryID OrderYear NumOfOrders ----------- ----------- ----------- 1 2005 184 2 2005 40 3 2005 45 4 2005 296 5 2005 62 6 2005 127 7 2005 59 8 2005 76 9 2005 394 10 2005 96 1 2006 489 2 2006 100 3 2006 114 4 2006 748 5 2006 146 6 2006 434 7 2006 268 8 2006 233 9 2006 859 10 2006 301
If you look at the code in Listing 1 you can see it is counting the number of orders (NumOfOrders) for each TerritoryID, for each of the years listed. What I'm trying to show here is that there is one row of output produced for every unique combination of values of TerritoryID and OrderYear. By using the PIVOT operator I will be able to produce output similar toResult 1, but it will be pivoted. By using the PIVOT operator the NumOfOrders value for each TerritoryID will be a separate column value in a row for a given OrderYear. To demonstrate this let's look at the PIVOT query in Listing 2.
USE AdventureWorks2012; GO SELECT OrderYear as Num_Of_Orders_Per_Year_By_TerritoryID, [1], [2], [3], [4], [5], [6], [7], [8], [9], [10] FROM (SELECT [TerritoryID] , Year([OrderDate]) AS OrderYear ,1 Num FROM [Sales].[SalesOrderHeader] WHERE Year([OrderDate]) in (2005,2006)) AS SourceTable PIVOT ( SUM(Num) FOR [TerritoryID] IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10]) ) AS PivotTable;
When I run the code in Listing 2 I get the output in Result 2.
Num_Of_Orders_Per_Year_By_TerritoryID 1 2 3 4 5 6 7 8 9 10 ------------------------------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 2005 184 40 45 296 62 127 59 76 394 96 2006 489 100 114 748 146 434 268 233 859 301
By reviewing the output in Result 2 you can see that the summarized number of orders per year are now shown as column values for a specific OrderYear and TerritoryID. To understand how the PIVOT query created this output let's walk through the different parts for the query in Listing 2.
The query in Listing 2 can be broken up into three different parts. The first part of the query is right after the word SELECT. This first part identifies the column headings for the pivoted data. I my example those column headings would be the OrderYear, a non-pivoted column followed by 10 different pivoted values of 1 through 10. These values 1 through 10 represent the different possible TerritoryID values that are represented in SalesOrderHeader rows. I specified all the possible values for TerritoryID. You don't have to specify every value in the table. I could have specified only those column headings I wanted to display. Additionally if you specify a value that doesn't exist (such as “11”) in your data, then the value shown for that column will be NULL. The next part is a SELECT subquery that is identified by a table alias, in this case SourceTable. The SourceTable subquery identifies the source of the data for the PIVOT operator. In this example the SourceTable subquery produces a record set that contains the TerritoryID, the year of the order (OrderDate), and a constant value of 1 for the Num column for every SalesOrderHeader record where the year of the order is in 2005 or 2006. The next part of the query is the PIVOT clause, which is enclosed in parentheses and is given an alias name of PivotTable. The PIVOT clause is used to summarize the Num column to identify the summarized column values that will be pivoted. The PIVOT operator also aggregates and groups the data in the SourceTable. The data is summarized based on the columns in the SourceTable that are not being aggregated, which in this case is OrderYear and Territoryid. The PIVOT operator aggregates (in this case, sums) the Num column for each unique value of OrderYear, and TerritoryID. I used the SUM function to aggregate the Num column. Since the different Num column values are all 1's I could have also used the COUNT aggregation function to produce the same result. The aggregated values will be displayed under the column heading associated with the different TerritoryID's. For each unique values of the OrderYear column, which is the only column in the SourceTable that is not used in the PIVOT clause, there will be a separate row in the output. In my example the OrderYear column is the only column in the SourceTable that is not referenced in the PIVOT clause. Because OrderYear has two unique values there are two rows of aggregated data in Result 2.
Multiple Columns not used in the PIVOT clause
What happens when there are multiple columns in the source table that are not used in the PIVOT clause? When there is more than a single column in the source table that is not used in the PIVOT clause then there will be one row of output for every unique set of values for the non-PIVOT clausecolumns. To demonstrate this I'm going to run the code in Listing 3. In this listing I will be summarizing SalesOrderHeader data where the data is pivoted based on the sales quarter (OrderQtr), and aggregating the Num column based on OrderQtr, TerritoryID and OrderYear.
USE AdventureWorks2012; GO SELECT OrderYear, TerritoryID, [1], [2], [3], [4] FROM (SELECT Year([OrderDate]) AS OrderYear ,TerritoryID , ((Month([OrderDate])-1)/3) + 1 as OrderQtr ,1 Num FROM [Sales].[SalesOrderHeader] WHERE Year([OrderDate]) in (2005,2006)) AS SourceTable PIVOT ( COUNT(Num) FOR [OrderQtr] IN ([1], [2], [3], [4]) ) AS PivotTable ORDER BY OrderYear, TerritoryID;
When I run the code in Listing 3 I get the output in Result 3.
OrderYear TerritoryID 1 2 3 4 ----------- ----------- ----------- ----------- ----------- ----------- 2005 1 0 0 72 112 2005 2 0 0 18 22 2005 3 0 0 20 25 2005 4 0 0 127 169 2005 5 0 0 30 32 2005 6 0 0 61 66 2005 7 0 0 30 29 2005 8 0 0 30 46 2005 9 0 0 182 212 2005 10 0 0 51 45 2006 1 101 117 149 122 2006 2 20 20 30 30 2006 3 20 23 36 35 2006 4 148 208 209 183 2006 5 31 32 44 39 2006 6 108 93 128 105 2006 7 38 44 86 100 2006 8 38 56 61 78 2006 9 190 200 203 266 2006 10 47 32 108 114
By reviewing the code in Listing 3 you can see my SourceTable contains the following columns: OrderYear, TerritoryID, OrderQtr, and Num. The code in listing 3 aggregates the Num column, and pivots on the OrderQtr columns. Remember in the last section I said the PIVOT clausegroups the data based on the columns not being aggregated by the PIVOT operator. Therefore the data from the SourceTable is grouped on the OrderQtr, OrderYear and TerritoryID, because these are three columns in the SourceTable that are not aggregated in the PIVOT clause. You can see this by reviewing the output in Result 3, where there is an aggregated value displayed for each unique value of OrderQtr, OrderYear and TerritoryID. Additionally for every unique value pair of the OrderYear and TerritoryID columns, the two columns that are not used in the PIVOT clause, there is a separate row of output in Results 3. This time I used the COUNT function to determine the number of orders placed per OrderQtr, I could have used the SUM function as well since it produces the same results in this case.
Dynamically Determining PIVOT columns
What happens if you don't always know the column values you want to pivot on? Does this mean you can't write your PIVOT query in advance? Not knowing the pivot column values doesn't keep you from pivoting your data. You can use dynamic SQL code to generate your PIVOT query when the data values for the FOR clause of the PIVOT operator are not known. The code in Listing 4 generates dynamic SQL to determine the number of SalesOrderHeader records there are by TerritoryID and OrderYear.
SET NOCOUNT ON; USE AdventureWorks2012; GO DECLARE @Columns nvarchar(1000)=''; -- Identify columns to pivot SELECT @Columns=stuff(( SELECT DISTINCT ',' + QUOTENAME(CAST(Year([OrderDate]) AS CHAR(4))) as OrderYear FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] p2 ORDER BY OrderYear FOR XML PATH(''), TYPE).value('.', 'varchar(max)') ,1,1,'') --FROM (SELECT DISTINCT CAST(Year([OrderDate]) AS CHAR(4)) AS OrderYear -- FROM [Sales].[SalesOrderHeader]) AS Years DECLARE @CMD nvarchar(1000); -- Generate Dynamic SQL SET @CMD = 'SELECT TerritoryID, ' + @Columns + ' FROM (SELECT TerritoryID, Year(OrderDate) AS OrderYear ' + 'FROM [Sales].[SalesOrderHeader])AS SourceTable ' + ' PIVOT(COUNT(OrderYear) For [OrderYear] IN (' + @Columns + ')) as PivotTable'; -- Print and execute generated command PRINT @CMD EXEC sp_executesql @CMD;
When I run the code in Listing 4 I get the output in Result 4.
TerritoryID 2005 2006 2007 2008 ----------- ----------- ----------- ----------- ----------- 1 184 489 1789 2132 2 40 100 135 77 3 45 114 149 77 4 296 748 2391 2789 5 62 146 179 99 6 127 434 1597 1909 7 59 268 1127 1218 8 76 233 1062 1252 9 394 859 2662 2928 10 96 301 1352 1470
The Code in Listing 4 is broken up into three different pieces. The first part of this code identifies the different OrderYear values and sets the @Column variable to the set of unique OrderYear values with commas in between each year value. This code uses the FOR XML PATH clause to concatenate the unique OrderYear values together, separatedwithcommas. The STUFF function was used to remove the comma that was placed before the first OrderYear value. The second part of the code builds a dynamic PIVOT query which uses the @Column value to identify the column heading values for the pivoted column. The last part of the code prints out and executes the dynamic SQL.
The dynamic SQL code generated by Listing 4 can be found in Result 5.
SELECT TerritoryID, [2005],[2006],[2007],[2008] FROM ( SELECT TerritoryID, Year(OrderDate) AS OrderYear FROM [Sales].[SalesOrderHeader])AS SourceTable PIVOT(COUNT(OrderYear) For [OrderYear] IN ([2005],[2006],[2007],[2008])) as PivotTable
By reviewing the code in Results 5 you can see the code in Listing 4 identified that there were 4 different order years. And then uses those order year values to dynamically build the PIVOT query that is displayed in Results 5.
Summary
When you want to turn rows of data on its side to produce columns you should consider the PIVOT operator. With the PIVOT operator you can easily aggregate a single column, based on the unique values of another columns. Next time you need to summarize data based on specific unique column values consider using the PIVOT operator.
Question and Answer
In this section you can review how well you have understood using the PIVOT operation by answering the following questions.
Question 1:
The PIVOT operator supports summarizing many column? (True or False).
- True
- False
Question 2:
When the PIVOT clause groups data it does it based on which source columns?
- The source columns that are used in the PIVOT operator
- The source columns that are not aggregated with the PIVOT operator
- The source columns used in the FOR clause of the PIVOT operator
- All of the source column used in the PIVOT query
Question 3:
Which of the following statements are true (maybe more than one)?
- You can use the PIVOT operator even if you don't know the column values for the FOR clause.
- You can use the PIVOT operator to group on more than one column.
- The PIVOT operator uses the FOR column values as column headings
- The PIVOT operator does not required an ORDER BY clause
Answers:
Question 1:
The correct answer is b, false. The PIVOT operator supports only summarizing one column based on the values of other columns.
Question 2:
The correct answer is b. SQL Server groups data in a PIVOT clause by those column in the source table that are not aggregated by the PIVOT operator.
Question 3:
All of the answer are correct. If you don't already know the column values to be used in the FOR clause then you can generate a dynamic SQL PIVOT query. You can group the summarized column of a PIVOT statement based on all the columns that are not aggregated by the PIVOT operator, which could be more than one. When pivoting data the PIVOT operator uses the column values identified in the FOR clause as column headings in output for a PIVOT query. The “ORDER BY” clause is optional, so it doesn't need to be included in every PIVOT query.