“That can’t be done in SQL.”
How many times have you said this to yourself (or to your boss/colleagues)?
How often have others said this to you? The fact is that, while it’s true that
T-SQL has its limitations when compared to a “real” programming language like
C++, you might be amazed at the tricks you can pull off with T-SQL if you’re
willing to do some digging and trial-and-error experimentation.
Such an opportunity to push the limits of T-SQL came to me just yesterday at my
job. I work at a talent agency, and one of the databases I manage stores
information relating to performing artists’ and bands’ live performances (or
“personal appearances” as it’s called in the industry). A personal appearance
consists of such attributes as venue locations, venue capacities, show dates,
and count of tickets sold (on a per-day as well as on an aggregate basis).
The users wanted a Crystal Report that would display some basic header
information about an artist’s or band’s upcoming shows. They also wanted one of
the columns to display a carriage-return-delimited list (one item per line) of
ticket count dates, along with (for each date shown) the total number of tickets
sold for that date and the number sold on that particular day. The original spec
called for a list of the last five ticket-count dates, with the intent that it
would look something like this:
10/16/03 - 1181 (19) 10/15/03 - 1162 (14) 10/14/03 - 1148 (28) 10/13/03 - 1120 (9) 10/10/03 - 1111 (10)
The number to the immediate right of each date represents the total number sold to that date, and the number in parenthesis represents the number sold on that day. The dates are in descending order, counting downward from the most recent ticket-sale-count date. You can see, for example, that, on 10/16/2003, 19 tickets were sold, bringing the total up from the previous day from 1162 to 1181.
I assumed that this would be fairly simple with SQL Server 2000 – but it actually proved to me more complex than I thought. I wanted my solution to be set-oriented, if at all possible, avoiding cursors and loops. Luckily I was able to create a UDF to perform this task, as the following code shows:
CREATE FUNCTION GetLast5TicketCounts (@ShowID INT, @ClientName VARCHAR(200)) RETURNS VARCHAR(8000) AS BEGIN DECLARE @Counts VARCHAR(8000) SELECT TOP 5 @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') + CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' + CAST(TicketCount - ISNULL( ( SELECT MAX(tc2.TicketCount) FROM vTicketCount tc2 WHERE tc2.ShowID = @ShowID AND tc2.ClientName = @ClientName AND tc2.CountDate < tc1.CountDate ), 0) AS VARCHAR(100)) + ')', '') FROM vTicketCount tc1 WHERE ShowID = @ShowID AND ClientName = @ClientName ORDER BY CountDate DESC RETURN ISNULL(@Counts, '') END
As you can see, this took quite a bit of coding! Note the use of the inner subquery, whose result (the previous day’s total to-date ticket count) is subtracted from the current day’s total to-date count in order to obtain the ticket count for the current day only. If the date in question happens to be the first date of ticket sales (meaning that there is no previous ticket-count date), then ISNULL forces it to return 0, and nothing is subtracted.
Also note the use of the HTML <BR> tags in the code in order to force a carriage-return/line-break after each item. The reason for this was that the T-SQL function CHAR(13) doesn’t seem to work with fields in Crystal Reports – but, Crystal fields can be set to be HTML-aware. Thus I make liberal use of HTML tags when I’m coding queries that are to be used in reports. (For some reason, I find that I need to use three <BR> tags in order to effect a line break).
I incorporated this UDF into the stored procedure that drives the report and assumed that my work was done. Then my boss informed me that the users are probably not going to want to be limited to seeing just the last five ticket-count dates; they will probably want to specify how many “last ticket-count” dates to show! Uh oh. I knew that this would require dynamic SQL if I were to use the same basic code layout (in a TOP n clause, you can’t use a variable as n – you have to write “EXEC(‘SELECT TOP ‘ + @variable + ‘)”). I also knew that you cannot use dynamic SQL in a UDF – nor can you use the SET ROWCOUNT n statement. So I explained to my boss that the users’ request was probably not possible with SQL Server, but I would do my best to find a way.
With a little experimentation, I discovered that this operation (allowing the user to specify the number of records to return) could indeed be performed in a UDF – but it required coding a WHILE loop, something I was trying to avoid (I try to stick with pure set-oriented operations as much as possible – WHILE loops and cursors are terribly inefficient in T-SQL as compared with set-oriented solutions). Here is the code I came up with:
CREATE FUNCTION GetLastNTicketCounts (@ShowID INT, @ClientName VARCHAR(200), @NumCounts INT) RETURNS VARCHAR(8000) AS BEGIN DECLARE @t TABLE (ID INT IDENTITY, TicketCount INT, CountDate DATETIME) INSERT INTO @t SELECT * FROM ( SELECT TOP 100 PERCENT TicketCount, CountDate FROM vTicketCount WHERE ShowID = @ShowID AND ClientName = @ClientName ORDER BY CountDate DESC ) t DECLARE @Counts VARCHAR(8000), @Counter INT SET @Counter = 1 WHILE @Counter <= @NumCounts BEGIN SELECT @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') + CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' + CAST(TicketCount - ISNULL( ( SELECT MAX(tc2.TicketCount) FROM @t tc2 WHERE tc2.CountDate < tc1.CountDate ), 0) AS VARCHAR(100)) + ')', '') FROM @t tc1 WHERE ID = @Counter SET @Counter = @Counter + 1 END RETURN ISNULL(@Counts, '') END
Note the use of a table variable to store the results (ordered in descending order of the count date and including an IDENTITY column for convenience in incrementally stepping through the data in the loop). Since we are only dealing with a small amount of data (it’s unlikely that the report will contain more than 50 records or that the user will opt to see more than 10 count dates per record), the addition of the loop did not cause any real performance hit.
Feeling like a hero as I presented this to my boss, I then got the bomb dropped on me when I was told that the user would not only want to see n number of last ticket-sale dates – they would also want to see (in the same field) n number of first ticket-sale dates, too (that is, dates counting up from the first date of ticket sales)! Oh, and could I also make sure a neat little line appears separating the set of “last” dates from the set of “first” dates?
I knew I was in trouble on this one, because, in order for this to work, the first part of the query (which would return the “first” set of dates) would need to have the data sorted in ascending order by date in order to work properly – just like the second part of the query (which returns the “last” dates) would need the data sorted in descending order by date. After much experimentation, attempting to coalesce the two resultsets together into the @Counts variable via adding a second WHILE loop pass (and ending up each time with the dates in the wrong order and hence inaccurate count figures – even though I was using ORDER BY), I discovered that I could get around this by declaring two separate table variables – each sorted ascendingly or descendingly as the case required.
Since I already had one half of the equation figured out (how to display the last n ticket count dates/figures), I only needed to “reverse” my logic in order to display the first n dates/figures. Rather than combining both operations into one monster UDF, I decided to create a second UDF to handle returning the “first” dates, and then concatenate the results of each together in the main stored procedure. Here is the code – which as you can see is nearly identical to that of the previous UDF, with the exception of the bolded text:
CREATE FUNCTION GetFirstNTicketCounts (@ShowID INT, @ClientName VARCHAR(200), @NumCounts INT = NULL) RETURNS VARCHAR(8000) AS BEGIN DECLARE @t TABLE (ID INT IDENTITY, TicketCount INT, CountDate DATETIME)
INSERT INTO @t
SELECT *
FROM (
SELECT TOP 100 PERCENT TicketCount, CountDate
FROM vTicketCount
WHERE ShowID = @ShowID
AND ClientName = @ClientName
ORDER BY CountDate ASC
) t
DECLARE @Counts VARCHAR(8000), @Counter INT
SET @Counter = 1
WHILE @Counter <= @NumCounts
BEGIN
SELECT @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') +
CONVERT(VARCHAR(100), CountDate, 101) + ' - ' +
CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' +
CAST(TicketCount - ISNULL( (
SELECT MAX(tc2.TicketCount)
FROM @t tc2 WHERE tc2.CountDate < tc1.CountDate ), 0) AS VARCHAR(100)) + ')', '') FROM @t tc1 WHERE ID = @Counter SET @Counter = @Counter + 1 END RETURN ISNULL(@Counts, '') END
As you can see, the only difference in the code is that the data is inserted into the table variable in ascending (rather than descending) order. Everything else is the same. I only needed to concatenate the results of both of these functions in order to return the data the users wanted.
One other minor issue remained: how to display the separator line between the “first” dates and the “last” dates. This line should only be displayed if the user has opted to display both the “first” count dates and the “last” count dates (there would be no need for a separator line if only one set of count dates were being displayed, or if no dates at all were being displayed). I added the following code to the stored procedure:
DECLARE @LineBreak VARCHAR(100) SET @LineBreak = CASE WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0 THEN '' ELSE '<BR><BR><BR>----------------------------<BR><BR><BR>' END
Note that the @FCNT and @LCNT variables represent the number of “first” and “last” ticket count dates to display, respectively. I then added this line of code to the SELECT portion of the procedure to concatenate it all together:
NULLIF( MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak + MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak ) AS TicketCount Here is the entire code of the resulting stored procedure: CREATE PROCEDURE rTicketCountSum @CN VARCHAR(200), @FCNT INT = NULL, -- Num of ticket counts to display from beginning of count period @LCNT INT = NULL, -- Num of ticket counts to display from end of count period @FDO TINYINT = 0 -- Display future show dates only AS DECLARE @LineBreak VARCHAR(100) SET @LineBreak = CASE WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0 THEN '' ELSE '<BR><BR><BR>----------------------------<BR><BR><BR>' END SELECT ClientName, CONVERT(VARCHAR, ShowDate, 101) AS ShowDate, VenueName, Contact, Phone, VenueCityState, Capacity, CAST((MAX(TicketCount) * 100) / Capacity AS VARCHAR(10)) + '%' AS PctSold, NULLIF( MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak + MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak ) AS TicketCount FROM vTicketCount tc -- a view joining all the relevant tables together LEFT JOIN PATC_Contacts c ON tc.ShowID = c.ShowID WHERE ClientName = @CN AND (ShowDate >= GETDATE() OR ISNULL(@FDO, 0) = 0) GROUP BY ClientName, tc.ShowID, ShowDate, VenueName, Contact, Phone, VenueCityState, Capacity Result?
The report now returns exactly the data that the users wanted (including the “neat little line break”), while still performing
efficiently! Here is a partial screenshot showing a few columns of the Crystal Report (run with the user opting to see the first 5 and last 5 count dates). Notice the far
left-hand column:
The moral of this story is: I’ve learned to not be so quick to “write off” a programming challenge as being beyond the scope of T-SQL. I’ve learned not to palm coding tasks off onto the front-end developers without thoroughly
experimenting to see if, by any possible way, the task can be performed efficiently on the database server. And in the process I’ve significantly minimized (if not eliminated altogether) those instances in which I’m tempted to raise my hands in frustration and declare those dreaded six words:
“That can’t be done in SQL.”