Introduction:
In Part 1, we saw how to convert rows to columns using both Cross-Tabs and the PIVOT method of SQL Server 2005. We saw the real performance difference a thing called "Pre-Aggregation" (which is nothing more than a derived table that did most of the real work of aggregating) could make. For a review, Part 1 is located at the following link: http://www.sqlservercentral.com/articles/T-SQL/63681/
Here, in Part 2, we'll learn how to make Cross-Tabs dynamically. We'll cover the one of the most common of Cross-Tabs, aggregating data by month and year and then "pivot" the results so the months and years become column names. In the process, we'll review how to do Cross-Tabs using the methods in Part 1, then well see how to turn it into dynamic SQL and, finally, how to turn it into a Stored Procedure. We'll also learn a bunch of "tricks" on the way.
Despite the name of this series of articles and because of the performance advantage that Cross-Tabs have over PIVOT's, I'm not actually going to demonstrate how to create dynamic PIVOTS. After this article, though, I'm sure you'll be able to figure it out on your own, but I still prefer the speed and readability of the Cross-Tab over a PIVOT.
The Test Data
As is usual, in order to be able to demonstrate a concept, we need some test data that anyone can use so we're all on common ground. And, for a bit of performance testing, we need LOTS of test data. With that in mind, here's the standard test table generator that some of you have seen before...
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains a simple identical 10 part CSV for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2 --===== Add the Primary Key
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
Just as a reminder... if you want to keep the formatting of the code in these articles, put your cursor on line above code windows, like the one above, click'n'drag to one line below the code window, copy, paste into MS-Word, copy that, and then paste into either QA or SMS.
Problem Definition:
Our overall goal is to create a stored procedure that accepts a valid start date and a valid end date. Using those two dates, calculate and display the SUM of the SomeMoney column in the correct date column for each SomeLetters2. Also, each row should have a total and the entire report should have a last row called "Total" that has the totals for each month plus a grand total. For a start date in Jan 2008 and an end date in Jun 2008, the report should look something like this (left off Jun 2008 so it would fit nice on this page)...
SomeLetters2 Jan 2008 Feb 2008 Mar 2008 Apr 2008 May 2008 Total
------------ ------------------------------------------------------------
AA 460.05 477.24 594.08 574.61 639.33 3416.74
AB 811.41 492.19 378.44 619.46 391.91 3299.76
AC 1058.77 406.65 760.01 668.93 642.96 4019.70
AD 251.80 935.75 512.40 291.33 316.88 2851.43
AE 939.23 562.28 542.98 748.99 604.14 4131.55
AF 482.98 688.55 737.56 537.84 691.64 3296.28
...
...
ZW 520.13 625.99 339.65 655.31 707.82 3426.92
ZX 682.81 776.39 771.96 662.49 580.97 4003.81
ZY 923.58 683.92 785.07 613.60 717.01 4492.36
ZZ 653.54 581.82 714.86 688.47 582.53 3683.72
Total 419847.81 394365.06 426987.80 417839.48 427724.13 2503138.63
Let's get started...
Divide and Conquer
A lot of folks jump right in with trying to format the report by using a Cross-Tab (or PIVOT). As we found out in Part 1, that causes a reduction in performance. It also makes life a bit more complicated because you're also trying to solve both the aggregation and the Cross-Tab at the same time. Divide and conquer, instead.
So, what do we need to start this problem out? Well let's see...
1. Some variables for the start and end dates.
2. Something to convert them to months.
3. Something to convert them to a date range that starts at the beginning of a month and ends in that last milliseconds of a month to ensure we pick up rows with or without a time.
4. Something to include any missing months so we can zero them out.
5. Something to convert the SomeDate column to a "Month/Year".
6. Something to SUM the data for each month.
7. A Cross-Tab to "pivot" the data.
8. Something to do a total of each row.
9. Something to do a total of each column.
10. Something to do a grand total.
11. ...and, it all has to be done using dynamic SQL.
Sounds a bit overwhelming if you look at it all at once. Let's "peel one potato at a time"...
Variables for the Start and End Dates
Let's ask a couple of really important questions... like what format of date is going to be provided? If we allow different formats of dates to be entered, should we validate the dates as being real or just assume the user knows what they're providing? Are we going to have to force the user to use dates that actually start a month and end a month? What happens if they mess up and provide the dates in the wrong order? How far are we willing to go to make the stored procedure as easy as possible to use?
Sounds like a lot... but it's not. Using the DATETIME datatype, allowing for some implicit conversions and knowing a couple of common date tricks will easily solve all those important questions (except 1 which we'll cover later).
So... with that in mind, let's declare a couple of variables (these will later become parameters) as DATETIME and check to see if the dates need to be swapped. I've also included some test dates to get us started...
An Old Fashioned "Swap"
--===== These will become parameters in the final code
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2008-06-29'
SET @EndDate = '2008-01-15 --===== This variable gives us "room" to swap dates if needed
DECLARE @SwapDate DATETIME --===== If the dates are not in the correct order, swap them
IF @EndDate < @StartDate
BEGIN
SET @SwapDate = @EndDate
SET @EndDate = @StartDate
SET @StartDate = @SwapDate
END --===== Check the dates visually
SELECT @StartDate,@EndDate
You'll also find out that if you change the order of the dates in the code above, no swap is done and everything works as expected.
A More Modern "Swap"
Here's another way to do it... it shows an important concept in SQL Server that can be very useful in writing high performance code especially when it comes to functions and "quirky" updates. The important concept is that a single SELECT can be used to set multiple variables AND it does it in the same order as they appear in the SELECT list! That means that you can set and use a variable to populate another in the same SELECT. As you may have seen in the "Running Total" problem (see http://www.sqlservercentral.com/articles/Advanced+Querying/61716/), it also means that you can overwrite variables in the same SELECT and they behave exactly as expected...
--===== These will become parameters in the final code
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2008-06-29'
SET @EndDate = '2008-01-15' --===== This variable gives us "room" to swap dates if needed
DECLARE @SwapDate DATETIME --===== If the dates are not in the correct order, swap them...
-- WITHOUT using an IF
SELECT @SwapDate = @EndDate,
@EndDate = @StartDate,
@StartDate = @SwapDate
WHERE @EndDate < @StartDate --===== Check the dates visually
SELECT @StartDate,@EndDate
GO
Find the Endpoints of the Date Range
Ok, we also have to figure out a date range in months. That means the @StartDate should be at midnight of the first day of it's month and @EndDate should cover even the very last instant of it's month. We'll do this using the very well known method of using DATEADD together with DATEDIFF to find the first of the month. Notice that we add 1 month to @EndDate... that's because the last instant of a month ends just before the first instant of the next month. This way, we don't have to mess around with milliseconds or, in the case of SQL Server 2008, micro-seconds. The code will be able to run on any platform from SQL Server 2000 (or earlier) to SQL Server 2008. Obviously, we're not going to use BETWEEN for our date range, later on.
--===== These will become parameters in the final code
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2008-06-29'
SET @EndDate = '2008-01-15' --===== This variable gives us "room" to swap dates if needed
DECLARE @SwapDate DATETIME --===== If the dates are not in the correct order, swap them...
-- WITHOUT using an IF
SELECT @SwapDate = @EndDate,
@EndDate = @StartDate,
@StartDate = @SwapDate
WHERE @EndDate < @StartDate --===== Establish the full date range from the first instant of
-- the month for @StartDate through the last possible instant
-- of the month for @EndDate. That means up to and not including
-- the start of the month that follows @EndDate.
SELECT @StartDate = DATEADD(mm,DATEDIFF(mm,0,@StartDate),0),
@EndDate = DATEADD(mm,DATEDIFF(mm,0,@EndDate)+1,0) --===== Check the dates visually (comment this out for production code)
SELECT @StartDate,@EndDate
GO
Handle ANY Date?
Heh... yeah, I knew you were going to ask that. Ok... let's do a little experiment... we already know that the code above works with "proper" dates like the dashed ISO dates in the examples. Let's see what else will convert to a date...
--=============================================================================
-- Test a bunch of date formats that convert implicitly.
-- The ones that are commented out give a conversion error.
--=============================================================================
--===== Create a test variable
DECLARE @SomeDate DATETIME --===== Show some of the different formats that it will "auto-magically" convert
SET @SomeDate = '2008-06-15'; SELECT @SomeDate
--SET @SomeDate = '2008-06'; SELECT @SomeDate
SET @SomeDate = 'June 2008'; SELECT @SomeDate
SET @SomeDate = 'Jun 2008'; SELECT @SomeDate --THIS ONE IS REALLY IMPORTANT TO US!!!
SET @SomeDate = '15 Jun 2008'; SELECT @SomeDate
SET @SomeDate = '15 June 2008'; SELECT @SomeDate
SET @SomeDate = '20080615'; SELECT @SomeDate
SET @SomeDate = '200806'; SELECT @SomeDate
SET @SomeDate = '06/15/2008'; SELECT @SomeDate -- Assumes date format of mdy, bad to use
--SET @SomeDate = '15/06/2008'; SELECT @SomeDate -- Fails because is in dmy format and my machine in mdy format
--SET @SomeDate = '06/2008'; SELECT @SomeDate
SET @SomeDate = '2008 Jun'; SELECT @SomeDate
SET @SomeDate = '2008 June'; SELECT @SomeDate
SET @SomeDate = '2008 Jun 15'; SELECT @SomeDate
SET @SomeDate = '2008 June 15'; SELECT @SomeDate
SET @SomeDate = 'June 15, 2008'; SELECT @SomeDate
SET @SomeDate = '15 June, 2008'; SELECT @SomeDate
SET @SomeDate = 'Jun 15, 2008'; SELECT @SomeDate
SET @SomeDate = '15 Jun, 2008'; SELECT @SomeDate
--SET @SomeDate = 'June 15th, 2008'; SELECT @SomeDate
Basically, a DATETIME variable is pretty darned flexible when it comes to what it's given as a date. Notice the one I have marked as being "really important to us". It returns the 1st of June 2008 and is in the format we want for our report column headers.
Also notice the notes on "date format" lines of code. Sure, if the proper Date Format is setup for your area, the mm/dd/yyyy or dd/mm/yyyy format can be used... but it can be ambiguous for all days less than 13 and really shouldn't be used.
Doing the "Pre-aggregation"
Just like in Part 1, we'll pre-aggregate the data. This is mostly for performance but also allows this part of the code to be troubleshot separately when needed. We need to convert the dates to the same date of the month (the 1st is the easiest) so we can group on it and the SomeLetters2 column according to our problem definition...
--=============================================================================
-- Do the preaggregation like we did in Part 1
--=============================================================================
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= 'Jan 1 2008 12:00AM' --Start date goes here
AND SomeDate < 'Jul 1 2008 12:00AM' --End date (beginning of following month)
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
If you run the code, you'll see it works just fine. I gives monthly totals for each SomeLetters2, just like what is required.
Doing the Cross-Tab
Using the pre-aggregation as the core or source of our Cross-Tab, let's build the Cross-tab just like we did in Part 1...
--=============================================================================
-- Build the Cross-tab around the Pre-aggregation like we did in Part 1.
-- Rollup causes a "total" row here with a NULL SomeLetters2
--=============================================================================
SELECT SomeLetters2,
SUM(CASE WHEN MonthDate = 'Jan 2008' THEN Total ELSE 0 END) AS [Jan 2008],
SUM(CASE WHEN MonthDate = 'Feb 2008' THEN Total ELSE 0 END) AS [Feb 2008],
SUM(CASE WHEN MonthDate = 'Mar 2008' THEN Total ELSE 0 END) AS [Mar 2008],
SUM(CASE WHEN MonthDate = 'Apr 2008' THEN Total ELSE 0 END) AS [Apr 2008],
SUM(CASE WHEN MonthDate = 'May 2008' THEN Total ELSE 0 END) AS [May 2008],
SUM(CASE WHEN MonthDate = 'Jun 2008' THEN Total ELSE 0 END) AS [Jun 2008],
SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= 'Jan 1 2008 12:00AM'
AND SomeDate < 'Jul 1 2008 12:00AM'
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
Something Different... Totals
You'll notice something different in this Cross-Tab than what we did in Part 1. Notice the GROUP BY... it has the words WITH ROLLUP added to it. Basically, that means "Do subtotals and totals" according to the columns found in the GROUP BY. Since we are only grouping by one column, there will be no subtotals (and they are not necessary), but the last row will have a NULL in the SomeLetters2 column and the last row will contain totals for each of the month columns and the Total column. That "Total of the Total Column" is the GRAND TOTAL. Heh... requirements keep being solved almost by osmosis.
That NULL in the Total row really bothers me... it's ugly. Let's fix it. Let's make it say "Total" when the "grouping" (subtotal/total occurs) for that column occurs...
--=============================================================================
-- Build the Cross-tab around the Pre-aggregation like we did in Part 1.
-- ROLLUP causes a "total" row here with a NULL SomeLetters2.
-- GROUPING takes care of that NULL.
--=============================================================================
SELECT CASE WHEN GROUPING(SomeLetters2) = 1 THEN 'Total' ELSE SomeLetters2 END AS SomeLetters2,
SUM(CASE WHEN MonthDate = 'Jan 2008' THEN Total ELSE 0 END) AS [Jan 2008],
SUM(CASE WHEN MonthDate = 'Feb 2008' THEN Total ELSE 0 END) AS [Feb 2008],
SUM(CASE WHEN MonthDate = 'Mar 2008' THEN Total ELSE 0 END) AS [Mar 2008],
SUM(CASE WHEN MonthDate = 'Apr 2008' THEN Total ELSE 0 END) AS [Apr 2008],
SUM(CASE WHEN MonthDate = 'May 2008' THEN Total ELSE 0 END) AS [May 2008],
SUM(CASE WHEN MonthDate = 'Jun 2008' THEN Total ELSE 0 END) AS [Jun 2008],
SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= 'Jan 1 2008 12:00AM'
AND SomeDate < 'Jul 1 2008 12:00AM'
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
Just like the old days of "Unit Record" equipment... when you see a "1" in a certain column, it's a Total (or sub-total).
Not only is there the WITH ROLLUP option, there's also a WITH CUBE option which returns additional sub-totals if, like ROLLUP, you have more than one column in the GROUP BY. Read all about these options in Books Online. You'll also find out more about the GROUPING function in the WITH CUBE listing in Books Online and, as you can tell, can also be used with WITH ROLLUP.
Do notice the CASE statements... the MonthDate of the pre-aggregation returns the first of a month using high speed date logic. The CASE statement compares against a string that returns the first of the given month and year. We know this works because we saw it work in the " Handle ANY Date?" section above. Notice that it's the same as the column alias that we have on the same row of code? It may not be obvious right now, but that's going to make our dynamic SQL a bit easier and cleaner to write.
Some SQL Trivia
Last, but not least, here's a little SQL Trivia for you... makes a pretty good "NINJA Tie Breaker" interview question, as well. Where is the only place where you DON'T need an ORDER BY to guarantee that the output will be in the correct order? The answer is when you use WITH ROLLUP or WITH CUBE on a GROUP BY. In fact, you shouldn't do an ORDER BY because the NULL's that mark grouping will bubble to the top of the output.
Making ALL the Months in the Date Range
Speaking of dates, how do we return all the month names and years between @StartDate and @EndDate in that "mmm yyyy" format?
Start with a "Tally Table"
Remember my old friend the Tally table? If you don't already have one, now's the time to make one. Please see the article at the following link for how to make one and how it works to replace loops with high speed code...
http://www.sqlservercentral.com/articles/TSQL/62867/
For everyone's convenience, here's how I made the one I use...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100 --===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
Making the Months
The code to make the months we want, which will be dates for the first of each month in the date range, is pretty easy when made with a Tally Table. For example, if we want all the months from Jan 2008 through Jun 2008, the following will produce such a list of dates... Notice that we actually use July 2008 as the end date... again, there's a reason for that... If we take a monthly DATEDIFF between 20080101 and 20080601, we only get a count of 5 months. @EndDate is already set to handle this as well as allowing us to find the last instant of Jun using LESS THAN in other code... also notice how the STUFF function is used to remove the day of the month...
--===== Hard coded version of "get all the months in the date range".
SELECT N,
STUFF(CONVERT(CHAR(11),DATEADD(mm, N-1, '20080101'),100),4,3,'') AS MonthName
FROM dbo.Tally
WHERE N <= DATEDIFF(mm,'20080101','20080701')
Running the code above gives us the following result. Notice that we have the months in the "mmm yyyy" format and the "N" column gives us something to maintain the correct order because the MonthName is no longer chronologically sortable...
NMonthName
1Jan 2008
2Feb 2008
3Mar 2008
4Apr 2008
5May 2008
6Jun 2008
Converting the Cross Tab to Dynamic SQL
Ok, ok! I'm getting to it. Before you take on the Dynamic SQL, you have to have some working/tested code to model it after, first! Like I said, "Peel one potato at a time"... Divide and Conquer!
Review and Mark the Working/Tested Code
So, here we have it... just as a reminder, the working code looks like this (not including the parameter and variable definitions). The only thing different is that I used a comment "bar" so separate the different types of dynamic SQL according to level of difficulty...the more "Static" a section of code, the easier it is to convert.
--=============================================================================
-- Build the Cross-tab around the Pre-aggregation like we did in Part 1.
-- This one has a couple of "kickers" to make the Total row.
-- I've added comment "bars" to separate the trully dynamic SQL sections
-- from the relatively static SQL sections.
--=============================================================================
---------- This is "Static" -------------------------------------------------------------------
SELECT CASE WHEN GROUPING(SomeLetters2) = 1 THEN 'Total' ELSE SomeLetters2 END AS SomeLetters2,
---------- This is "Dynamic" ------------------------------------------------------------------
SUM(CASE WHEN MonthDate = 'Jan 2008' THEN Total ELSE 0 END) AS [Jan 2008],
SUM(CASE WHEN MonthDate = 'Feb 2008' THEN Total ELSE 0 END) AS [Feb 2008],
SUM(CASE WHEN MonthDate = 'Mar 2008' THEN Total ELSE 0 END) AS [Mar 2008],
SUM(CASE WHEN MonthDate = 'Apr 2008' THEN Total ELSE 0 END) AS [Apr 2008],
SUM(CASE WHEN MonthDate = 'May 2008' THEN Total ELSE 0 END) AS [May 2008],
---------- This is "Mostly Static" ------------------------------------------------------------
SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= 'Jan 1 2008 12:00AM'
AND SomeDate < 'Jul 1 2008 12:00AM'
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
If you look at the "Static" sections, there's not much that changes except that we'll need to replace the hardcoded dates in the WHERE clause with the @StartDate and @EndDate variables. Of course, since this will ALL be in NVARCHAR datatype variables, those variables will need to be encapsulated in single quotes AND they'll need to be converted into NVARCHAR. (Note that you could use just VARCHAR, but I'm trying to get more "Global" in my old age.) 😉
As you'll see, the "Dynamic" section really isn't all that difficult. Nothing changes except the date we're comparing to and the related column alias contained in brackets. The "hard" part is getting the code to generate more than one row. Remember the month names we generated with the Tally table? Using a little trick with "concatenation" will solve all our problems there. For a lot more information on the concatenation method we're going to use, please see the following article for some "how to's" and some performance pitfalls to avoid... I'd explain it all here, but this article is bloody long enough as it is...
http://www.sqlservercentral.com/articles/Test+Data/61572/
Create One Variable for Each Section of Code
You could certainly do the following with a single VARCHAR or NVARCHAR variable, but that wouldn't be as much fun to explain. Besides, we're "peeling just one potato at a time", remember?
Ok, here we go. We need a variable to hold each section of code separated by the comment bars in the code above...
--===== Declare the variables that will contain the dynamic SQL
DECLARE @SQL1 NVARCHAR(4000),
@SQL2 NVARCHAR(4000),
@SQL3 NVARCHAR(4000)
Now, why on Earth do I declare these as NVARCHAR? Well, other than all that talk about always being ready to "Go Global", I happen to know that I'm going to use the QUOTENAME function. The return of the QUOTENAME function is NVARCHAR and if you concatenate an NVARCHAR with a VARCHAR, the result is NVARCHAR. You don't have to use QUOTENAME, but it's a chance to show how yet another SQL Function works.
Do the Easy "Static" Parts First
First, let's create the "Static" and "Mostly Static" sections of the dynamic SQL because they're easy... the neat thing is, because we're using separate variables for each section, it doesn't matter what order that we create them in. It just matters how we put them together for execution...
--===== Create the "static" section of the code
SELECT @SQL1 = 'SELECT CASE WHEN GROUPING(SomeLetters2) = 1 THEN ''Total'' ELSE SomeLetters2 END AS SomeLetters2,'+CHAR(10)
Notice I ended the code with "+CHAR(10)". CHAR(10) means "use ASCII character #10" and I just happen to know that ASCII Character #10 is the LineFeed or NewLine character. This is absolutely NOT necessary for the code to work. Rather, it's to make it easier to print out if you need to troubleshoot the dynamic SQL.
There are quite a few important ASCII characters and you can find them all at the following link...
For a more detailed explanation of all ASCII characters, please see the following link...
http://en.wikipedia.org/wiki/ASCII
Also notice that where ever we need a single quote, you have to use two single quotes to get it to actually show up in the dynamic SQL.
Here's the third part... the "Mostly Static" code... we'll get to the second part in a minut
--===== Create the "Mostly Static" section of the code
SELECT @SQL3 =
' SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= ' + QUOTENAME(@StartDate,'''') + '
AND SomeDate < ' + QUOTENAME(@EndDate,'''') + '
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
' --<<LOOK! GOTTA KEEP THIS!!!!
Ok... a couple of things to notice. First, I didn't have to end every line with CHAR(10) because it's mostly one big happy chunk of dynamic SQL... the "new line" at the end of each line will automatically get picked up and the code will be formatted as seen (with the exception of the dates, of course). See the special trick used at the end of each line in the WHERE clause that also provide a "new line" without using CHAR(10),
Do notice the dates... they need to be converted to a character based date and wrapped in single quotes. One of the easiest ways to do that is with QUOTENAME. The only thing you have to be careful of is to remember that QUOTENAME returns an NVARCHAR result and anytime a VARCHAR is concatenated with an NVARCHAR, the result will be NVARCHAR. If you really want to use VARCHAR(8000) for your dynamic SQL variables, then you'll need to find another way to do the date conversion and add the necessary quotes. Such as...
WHERE SomeDate >= ''' + CONVERT(VARCHAR(11),@StartDate,112) + '''
AND SomeDate < ''' + CONVERT(VARCHAR(11),@EndDate,112) + '''
The "Hard" Part - Concatenation to Form the Date Columns
This is probably the most important part of the dynamic SQL because lots of folks just can't figure out how to do it. You need to know lots of different things to make it this simple, but once you know them, dynamic SQL becomes a real breeze. Yes, it could be done with a WHILE loop... but why???
I put "Hard" in quotes in the section title because it's really not all that difficult. If you look at each line of code for the CASE statements, they're all identical except for the quoted dates and the column aliases. Even then, those two things are the same in any given CASE statement. To make life even easier, we already have a source for those dates... remember the Tally Table thing we did in the "Making the Months" section above? THIS is where THAT comes into play with a little help from "concatenation"...
--===== The "Hard" Part - Concatenation to Form the Date Columns
SELECT @SQL2 = COALESCE(@SQL2,'')
+ ' SUM(CASE WHEN MonthDate = ' + QUOTENAME(d.MonthName,'''')
+ ' THEN Total ELSE 0 END) AS [' + d.MonthName + '],' + CHAR(10)
FROM
(--==== Produces the list of MonthNames in the mmm yyyy format
SELECT N,
STUFF(CONVERT(CHAR(11),DATEADD(mm, N-1, @StartDate),100),4,3,'') AS MonthName
FROM dbo.Tally
WHERE N <= DATEDIFF(mm,@StartDate,@EndDate)
) d
ORDER BY d.N
Notice our buddy, the Tally Table, works just as we used it before and is being used as a source of "MonthName's" for the outer query. The outer query creates each CASE statement, using MonthName, and concatenates it all into @SQL2. Again, the methods (there's more than just this method) for doing this are discussed to a great extent at the following link...
http://www.sqlservercentral.com/articles/Test+Data/61572/
Testing the Dynamic SQL
Before you even think of turning all of this into a stored procedure, you have to put all the pieces together and test the resulting dynamic SQL. So, without further ado, here's the whole shootin' match... notice the PRINT statement at the end... it generates the Dynamic SQL so we can actually execute it in another window to make it real easy to troubleshoot. Of course, I never make mistrakes, so I never have to do that... yeah, right... 😉
-- Test the Dynamic SQL -- Execute what gets printed out
--=============================================================================
-- Create some variables that will act as input parameters and swap the
-- dates WITHOUT using IF and establish the endpoints of the date range.
--=============================================================================
--===== These will become parameters in the final code
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2008-06-29'
SET @EndDate = '2008-01-15' --===== This variable gives us "room" to swap dates if needed
DECLARE @SwapDate DATETIME --===== If the dates are not in the correct order, swap them...
-- WITHOUT using an IF
SELECT @SwapDate = @EndDate,
@EndDate = @StartDate,
@StartDate = @SwapDate
WHERE @EndDate < @StartDate --===== Establish the full date range from the first instant of
-- the month for @StartDate through the last possible instant
-- of the month for @EndDate. That means up to and not including
-- the start of the month that follows @EndDate.
SELECT @StartDate = DATEADD(mm,DATEDIFF(mm,0,@StartDate),0),
@EndDate = DATEADD(mm,DATEDIFF(mm,0,@EndDate)+1,0) --===== Check the dates visually
SELECT @StartDate,@EndDate --===== Declare the variables that will contain the dynamic SQL
DECLARE @SQL1 NVARCHAR(4000),
@SQL2 NVARCHAR(4000),
@SQL3 NVARCHAR(4000) --===== Create the "static" section of the code
SELECT @SQL1 = 'SELECT CASE WHEN GROUPING(SomeLetters2) = 1 THEN ''Total'' ELSE SomeLetters2 END AS SomeLetters2,'+CHAR(10)
--===== Create the "Mostly Static" section of the code
SELECT @SQL3 =
' SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= ' + QUOTENAME(@StartDate,'''') + '
AND SomeDate < ' + QUOTENAME(@EndDate,'''') + '
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
' --===== The "Hard" Part - Concatenation to Form the Date Columns
SELECT @SQL2 = COALESCE(@SQL2,'')
+ ' SUM(CASE WHEN MonthDate = ' + QUOTENAME(d.MonthName,'''')
+ ' THEN Total ELSE 0 END) AS [' + d.MonthName + '],' + CHAR(10)
FROM
(
SELECT N,
STUFF(CONVERT(CHAR(11),DATEADD(mm, N-1, @StartDate),100),4,3,'') AS MonthName
FROM dbo.Tally
WHERE N <= DATEDIFF(mm,@StartDate,@EndDate)
) d
ORDER BY d.N PRINT @SQL1 + @SQL2 + @SQL3
The result from that is executable code ... execute it in a separate window and make any repairs that may be necessary. Then, translate those repairs to the dynamic SQL and try again until the generated code works with no repairs....
Indexing for the Code
The neat thing about printing out the dynamic SQL and running it, is that you don't have to think much about how to index it. I just ran the resulting code through the "Database Engine Tuning Advisor" and it came up with a pretty good recommendation for and index for the code. It also came up with a statistics recommendation, but it didn't do anything for the performance.
CREATE NONCLUSTERED INDEX [_dta_index_JBMTest_9_734625660__K6_K3_5] ON [dbo].[JBMTest]
(
[SomeDate] ASC,
[SomeLetters2] ASC
)
INCLUDE ( [SomeMoney]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Just to make sure that everything was going to be all nice and scalable, I reran the test data creation code... this time, with 10 million rows. Running the query for all of 2008 gave me the following performance without the index...
(677 row(s) affected)
SQL Server Execution Times:
CPU time = 7469 ms, elapsed time = 27869 ms.
After adding the index, here's what I got...
(677 row(s) affected)
SQL Server Execution Times:
CPU time = 3172 ms, elapsed time = 3884 ms.
A 10 million row table chock full of highly randomized data and we get monthly sums for a year, row totals, column totals, and a grand total... all well within what most people have as a "5 second SLA for reporting". Who says dynamic SQL is slow? 😉
Wrapping It All Up in a Stored Procedure
Home stretch... all we need to do is change a couple of variables to parameters, get rid of the start date and end date that get printed out as a sanity check, change the final PRINT to an EXEC, convert it to a stored proc, give it a name, and test it. Here's the stored proc...
--===== The FINAL stored procedure
CREATE PROCEDURE dbo.ReportMonthRangeTotalsForSomeLetters2
/*****************************************************************************************
Purpose: Dynamically generates a "Total" report for each SomeLetters2 entry of the
dbo.JBMTest table along with row and column totals. Output is grouped in monthly
columns having the "mmm yyyy" format.
Inputs: @StartDate - Anything that will convert to a date in the desired starting Month.
@EndDate - Anything that will convert to a date in the desiredending Month Revision History:
Rev 00 - 22 Nov 2008 - Jeff Moden - Initial creation, test, and release for
- SQLServerCentral.com
*****************************************************************************************/
--===== Declare the I/O parameters
@StartDate DATETIME,
@EndDate DATETIME
AS --===== These will become parameters in the final code (uncomment for troubleshooting)
--DECLARE @StartDate DATETIME
--DECLARE @EndDate DATETIME
-- SET @StartDate = '2008-06-29'
-- SET @EndDate = '2008-01-15' ------------------------------------------------------------------------------------------
-- Declare the local variables and do any environmental setup
------------------------------------------------------------------------------------------
--===== This variable gives us "room" to swap dates if needed
DECLARE @SwapDate DATETIME --===== Declare the variables that will contain the dynamic SQL
DECLARE @SQL1 NVARCHAR(4000),
@SQL2 NVARCHAR(4000),
@SQL3 NVARCHAR(4000) --===== Suppress the auto-display of any rowcounts
SET NOCOUNT ON ------------------------------------------------------------------------------------------
-- "Groom" the dates, swapping if necessary, and adjust to make them represent the
-- date range endpoints.
------------------------------------------------------------------------------------------
--===== If the dates are not in the correct order, swap them...
-- WITHOUT using an IF
SELECT @SwapDate = @EndDate,
@EndDate = @StartDate,
@StartDate = @SwapDate
WHERE @EndDate < @StartDate --===== Establish the full date range from the first instant of
-- the month for @StartDate through the last possible instant
-- of the month for @EndDate. That means up to and not including
-- the start of the month that follows @EndDate.
SELECT @StartDate = DATEADD(mm,DATEDIFF(mm,0,@StartDate),0),
@EndDate = DATEADD(mm,DATEDIFF(mm,0,@EndDate)+1,0) --===== Check the dates visually (uncomment for troubleshooting)
-- SELECT @StartDate,@EndDate ------------------------------------------------------------------------------------------
-- Build the dynamic SQL to create the report with regard to input date parameters
------------------------------------------------------------------------------------------
--===== Create the "static" section of the code
SELECT @SQL1 = 'SELECT CASE WHEN GROUPING(SomeLetters2) = 1 THEN ''Total'' ELSE SomeLetters2 END AS SomeLetters2,'+CHAR(10)
--===== Create the "Mostly Static" section of the code
SELECT @SQL3 =
' SUM(Total) AS Total
FROM (--==== Derived table "d" does preaggregation by whole month
SELECT DATEADD(mm,DATEDIFF(mm,0,SomeDate),0) AS MonthDate,
SomeLetters2,
SUM(SomeMoney) AS Total
FROM dbo.JBMTest
WHERE SomeDate >= ' + QUOTENAME(@StartDate,'''') + '
AND SomeDate < ' + QUOTENAME(@EndDate,'''') + '
GROUP BY DATEADD(mm,DATEDIFF(mm,0,SomeDate),0), SomeLetters2
) d
GROUP BY SomeLetters2 WITH ROLLUP
' --===== The "Hard" Part - Concatenation to Form the Date Columns
SELECT @SQL2 = COALESCE(@SQL2,'')
+ ' SUM(CASE WHEN MonthDate = ' + QUOTENAME(d.MonthName,'''')
+ ' THEN Total ELSE 0 END) AS [' + d.MonthName + '],' + CHAR(10)
FROM
(
SELECT N,
STUFF(CONVERT(CHAR(11),DATEADD(mm, N-1, @StartDate),100),4,3,'') AS MonthName
FROM dbo.Tally
WHERE N <= DATEDIFF(mm,@StartDate,@EndDate)
) d
ORDER BY d.N --===== Print the Dynamic SQL (uncomment for troubleshooting
-- PRINT @SQL1 + @SQL2 + @SQL3 --===== Execute the Dynamic SQL to create the desired report
EXEC (@SQL1 + @SQL2 + @SQL3)
GO
Testing the Code
This should be pretty simple...
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 'Jan 2008', 'Jun 2008'
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 '20080628', '20080115'
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 'Jul 2007', 'Jun 2008'
Those work just fine... Now, let's delete the entire month of March, 2008 and make sure that it gets reported as a zero total instead of just going missing...
DELETE dbo.JBMTest
WHERE SomeDate >= 'Mar 2008'
AND SomeDate < 'Apr 2008' EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 'Jan 2008', 'Jun 2008'
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 '20080628', '20080115'
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 'Jul 2007', 'Jun 2008'
All those work just fine, too... but when we try the following... BOOM!
EXEC dbo.ReportMonthRangeTotalsForSomeLetters2 'Jan 2000', 'Dec 2005' Msg 102, Level 15, State 1, Line 50
Incorrect syntax near 'EN'.
Msg 102, Level 15, State 1, Line 59
Incorrect syntax near 'd'.
Msg 319, Level 15, State 1, Line 60
Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.
These are some pretty strange errors for code that we absolutely know works correctly. What went wrong? Well, if you print out the dynamic SQL, you'll find it's not all there for @SQL2, the part that generates the date columns. The code got too big for an NVARCHAR to hold. How do you fix it? In SQL Server 2005, it's simple... just change the datatype for @SQL2 to NVARCHAR(MAX) and your done.
For SQL Server 2000, except for maybe change to VARCHAR as previously shown, it's not so simple... and will be the subject of a future. 😉
Quick Review
Here's a quick review of the steps we took...
1. Build the preaggregation SELECT to summarize the data.
2. Build a Cross Tab to pivot/format the data the way you want it.
3. Note all the places where parameters should be used in the code.
4. Split the code into sections... the SELECT list should always be separate because it's the most dynamic and will be solved using "concatenation".
5. Build each section in dynamic code variables.
6. Run the dynamic code and print it out. Troubleshoot if necessary. Use it to figure out your indexes if necessary.
7. Convert the code to a stored procedure and finish testing.
The "Key" to dynamic SQL is to first build non-dynamic SQL that works and then do the conversion to dynamic SQL.
Despite the length of this article (we covered a lot of ground), creating Cross-Tab reports (or, even Pivots if you prefer) using Dynamic SQL really isn't that hard. . If you back and look at the final stored procedure, it's really pretty short. Contrary to popular belief, Dynamic SQL Cross-Tab reports are pretty darned fast and, if you take the time to do a little analysis, will meet or beat most reporting SLA's.
Notice that about half of this article had nothing to do with Dynamic SQL. Heh... it's a bit like I remember my Calculus classes were. The Calculus was easy... setting up the problem and doing the Algebra was the hard part. Half of this article was about setting up and testing a Cross-Tab against some test data (could have been real). That should always be the first step... getting working/tested static code. Always use "Divide and Conquer" methods to solve these types of problems. They just make life so much easier. Then, convert that to dynamic SQL and test, test, test.
Last, but not least, I'm pretty much "old school" and a real "Data Troll". Most folks will tell you that all of this should be done in the GUI or using a reporting tool. Personally, I can't see sending all that much data across the pipe. And, I've not had the pleasure (displeasure according to rumor) of using Crystal Reports or Reporting Services and, thus, have no idea how long they would take to generate such a report. But, I have seen that they sometimes require you to write a query to help the report along. Using the methods and tools, like my buddy the Tally Table, I'm thinking that you'll be better prepared for such an eventuality.
Thanks for listening, folks.
--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row"