"SQL Spackle" is a collection of short articles written based on multiple requests for similar code. These short articles are NOT meant to be complete solutions. Rather, they are meant to "fill in the cracks".
--Phil McCracken
Prologue
Almost as soon as this article came out, the good folks in the discussion showed that the methods below really aren't worth a hoot for international short-month names. Having a look at the contents of the sys.syslanguages table (which I made the mistake of never previously looking at) showed that "short-month" names aren't always 3 letters or even letters at all in some languages. Admittedly, I've never had to write multi-language friendly code but that's still no good excuse.
To wit, I've added the "Handling Different Languages" section and made a couple of other modifications to this "SQL Spackle" article to fill a more global crack. My thanks go out to all that helped me fill in this crack in MY knowledge.
Introduction
I realize that this subject has been written about at least a dozen times just on this site alone but I keep seeing people make the same mistake over and over (including "today" at the time of this writing). A very common problem in SQL Server seems to be converting a date to display in the MON YYYY format where "MON" is the 3 character month such as "Jan", "Feb", "Mar", etc.
Before I get any hate mail, I'll be the first to agree that formatting dates in SQL Server is almost always the wrong thing to do. Still, there are times where that's where the formatting has to be done and, if you need to do it, you might as well do it with performance, ease of programming, and readability in mind.
Typical Code
This isn't actually a "requested" bit of code. Rather it's an artifact that regularly appears in many posts on this and other forums. The reason why people don't appear to request it as a "how to" is because they think they've sussed the problem of 3 character months with code like the following example:
SELECT CASE DATEPART(mm,SomeDate) WHEN 01 THEN 'Jan' WHEN 02 THEN 'Feb' WHEN 03 THEN 'Mar' WHEN 04 THEN 'Apr' WHEN 05 THEN 'May' WHEN 06 THEN 'Jun' WHEN 07 THEN 'Jul' WHEN 08 THEN 'Aug' WHEN 09 THEN 'Sep' WHEN 10 THEN 'Oct' WHEN 11 THEN 'Nov' ELSE 'Dec' END + ' ' + CAST(DATEPART(yy,SomeDate) AS CHAR(4)) FROM dbo.JBMTest ;
Does the code work? Absolutely! So, what's the problem with it? Heh... You mean besides it being too long, too slow, fairly ugly, and prone to copy'n'paste errors? How about the fact that if you ever need to change languages, it'll still appear as Jan, Feb, Mar, etc. even though it should be in, say, German?
Let's start tearing the code apart from the bottom up and see what we can do about all those problems.
Test Data
In order to demo certain aspects of the code in this article, we need some dates... a whole lot of dates. The following code will build a million random dates/times from the first instant of the year 2000 to the last instant of 2009 (ten full years in total) in about 5 seconds on my 8 year old, single CPU desktop box:
--===== Do this testing in a nice safe place that everyone has USE TempDB ; --===== Create and populate a test table with a million dates -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers -- Column "SomeDate" has a range of >=01/01/2000 and -- <01/01/2010 non-unique date/times SELECT TOP 1000000 RowNum = IDENTITY(INT,1,1), SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) INTO dbo.JBMTest FROM sys.All_Columns t1, sys.All_Columns t2 ;
Why a million dates? For two reasons: 1) I'm a "batch programmer" and million row tables are actually considered to be a fairly small table compared to what most batch programmers normally work with and 2) all of the code runs extremely fast so we need at least that many rows to start seeing the differences.
Getting the Year
Whoa! What's so hard about getting the year? What possible improvement could we make over DATEPART()?
Welllllllll, DATEPART() or even YEAR() are dandy ways to isolate the year but we need the year as a string so we can concatenate it to the 3 letter month that we'll eventually come up with. To do that with DATEPART(), we need to do an extra conversion to convert it to a CHAR or NCHAR or we'll get an error similar to the following:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'May ' to data type int.
Believe it or not, not only does the extra conversion to CHAR or NCHAR add to the clutter we see in the original code, that extra conversion also takes a bit of extra time to execute, as you'll soon see. So, how can we do the conversion without explicitly doing the conversion?
Introducing DATENAME()
Lookup DATENAME() in Books Online and you'll find that it has all the same date-part features as DATEPART() except that it returns strings instead of numeric values. For the YEAR or YY date-part, it will return the 4 digit year as a 4 character NVARCHAR and it's pretty fast. Let's compare the original method of isolating the year with DATENAME(). I'd like to point out that I use a "throw away" variable (@BitBucket) to keep display and disk latency out of the picture.
DECLARE @BitBucket CHAR(4); PRINT '========== Original Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @BitBucket = CAST(DATEPART(yy,SomeDate) AS CHAR(4)) FROM dbo.JBMTest; SET STATISTICS TIME OFF; PRINT '========== DATENAME Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @BitBucket = DATENAME(yy,SomeDate) FROM dbo.JBMTest SET STATISTICS TIME OFF;
That gives us the following statistical output:
========== Original Method of Conversion ==========
SQL Server Execution Times:
CPU time = 1203 ms, elapsed time = 1267 ms.
========== DATENAME Method of Conversion ==========
SQL Server Execution Times:
CPU time = 1156 ms, elapsed time = 1167 ms.
BWAA-HAA!!! As some of my more "dry" DBA "friends" might say, "Well whupty-freakin'-doooo! The boy genius saved 17 characters of typing and a whole 100 milliseconds duration over, what... A million rows???" My reply is usually something like, "You mean an 8% increase in performance with 44% less typing? Don't you wish you could do that everywhere?"
But that's not the whole story.
Introducing DATENAME()... Again!
DATENAME() has a rather special effect on days of the week and months. Instead of returning the numeric equivalent as a 2 character NVARCHAR, it actually spells the name out. Try it...
SELECT DATENAME(dw,GETDATE()), DATENAME(mm,GETDATE()) ;
When I ran that code on Tuesday, January 04, 2011, it correctly returned the following:
------------------------------ ------------------------------
Tuesday January
If we use SUBSTRING(1,3) (or LEFT(3) if you prefer) on those two items as in the following code:
SELECT SUBSTRING(DATENAME(dw,GETDATE()),1,3), SUBSTRING(DATENAME(mm,GETDATE()),1,3) ;
We get the following correct 3 character representations for the day of the week and the month name:
--- ---
Tue Jan
Putting It All Together
Just like the title of this section says, let's put it all together:
SELECT SUBSTRING(DATENAME(mm,SomeDate),1,3) + ' ' + DATENAME(yy,SomeDate) FROM dbo.JBMTest ;
That works very well and it's almost exactly twice as fast as the original method. The code is also a whole lot shorter and it's all actually ANSI compliant, to boot.
Heh... but, like I said, I'm a batch programmer and I don't actually believe in the myth of truly portable code. What I do believe in is squeezing every last bit of speed I can out of the code using the "Spackle Knife" known as T-SQL.
If we let T-SQL do most of the work for us, we find that "format code 106" (please do look it up in Books Online under CONVERT) used in conjunction with CONVERT returns just a little bit more than what we really need in the date format of DD MON YYYY. All we need to do is grab the right-hand 8 characters from that and we're done:
SELECT RIGHT(CONVERT(CHAR(11),SomeDate,106),8) FROM dbo.JBMTest ;
Not only are we done, but compare the code above to the original code. There's a huge difference in the number of characters and with the placement of the correct comment, no one will even need to look up what "format code 106" does:
--===== Format the date as MON YYYY SELECT RIGHT(CONVERT(CHAR(11),SomeDate,106),8) FROM dbo.JBMTest ;
Handling Different Languages
As the discussion that follows this article shows, the methods above just don't handle non-English short-month names very well. There have been suggestions that one could "split" the CSV column called "shortmonths" in the system table called "sys.syslanguages" to do the trick. Although I'll be one of the first to use a Tally Table or Tally CTE for things, neither a split nor any Tally Table tricks seem either needed or appropriate for the simple task of creating a "shortname YYYY" format for a report or whatever because SQL Server will handle all of that much like the method just before this section.
We just need to make a couple of small tweeks and the same code that handles 3 month conversions for English speaking countries will suddenly work with any language SQL Server supports even if a Slovakian language that uses Roman Numerals for short-month names is the current language.
Here's the code:
SELECT SUBSTRING(CONVERT(VARCHAR(30),GETDATE(),106),4,30)
Here's some code that tests the code above for all the languages stored in sys.syslanguages. I didn't list the results here because they're too long for this article.
--===== Declare some obviously named variables DECLARE @MyLanguage SYSNAME, @SQL NVARCHAR(MAX) ; --===== Remember the current language for this session SELECT @MyLanguage = 'SET LANGUAGE N' + QUOTENAME(@@LANGUAGE,'''') ; --===== Create the SQL to demonstrate that the code works in other languages SELECT @SQL = CAST( ( SELECT 'SET LANGUAGE N' + QUOTENAME(Alias,'''')+';' + CHAR(10) + 'SELECT Language =' + QUOTENAME(Alias,'''') + ', [MON YYYY] = SUBSTRING(CONVERT(VARCHAR(30),GETDATE(),106),4,30);' + CHAR(10) FROM sys.syslanguages FOR XML PATH(''),TYPE ) AS NVARCHAR(MAX)) ; --===== Now show the MON YYYY format for all the languages in sys.syslanguages -- and return to my original language setting EXEC (@SQL); EXEC (@MyLanguage);
The short names for several of the months come up with question marks on the box I'm running on and I suspect it's just because I don't have the correct character sets available to me.
The Final Speed Test
At this point, I know I probably don't have to do this final test, but I wanted you to see the performance results. Again, we'll use a couple of "throw away" variables for testing. The following code runs against the million date table we made at the very beginning of this article:
DECLARE @BitBucket CHAR(8), @NBitBucket NCHAR(8) ; PRINT '========== Original Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @BitBucket = CASE DATEPART(mm,SomeDate) WHEN 01 THEN 'Jan' WHEN 02 THEN 'Feb' WHEN 03 THEN 'Mar' WHEN 04 THEN 'Apr' WHEN 05 THEN 'May' WHEN 06 THEN 'Jun' WHEN 07 THEN 'Jul' WHEN 08 THEN 'Aug' WHEN 09 THEN 'Sep' WHEN 10 THEN 'Oct' WHEN 11 THEN 'Nov' ELSE 'Dec' END + ' ' + CAST(DATEPART(yy,SomeDate) AS CHAR(4)) FROM dbo.JBMTest; SET STATISTICS TIME OFF; PRINT '========== DATENAME Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @NBitBucket = SUBSTRING(DATENAME(mm,SomeDate),1,3) + N' ' + DATENAME(yy,SomeDate) FROM dbo.JBMTest; SET STATISTICS TIME OFF; PRINT '========== ENGLISH CONVERT Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @BitBucket = RIGHT(CONVERT(CHAR(11),SomeDate,106),8) FROM dbo.JBMTest; SET STATISTICS TIME OFF; PRINT '========== INTERNATIONAL CONVERT Method of Conversion =========='; SET STATISTICS TIME ON; SELECT @BitBucket = SUBSTRING(CONVERT(VARCHAR(30),SomeDate,106),4,30) FROM dbo.JBMTest; SET STATISTICS TIME OFF;
And, here are the results from one of the many runs I executed:
========== Original Method of Conversion ==========
SQL Server Execution Times:
CPU time = 4781 ms, elapsed time = 4980 ms.
========== DATENAME Method of Conversion ==========
SQL Server Execution Times:
CPU time = 2641 ms, elapsed time = 2765 ms.
========== ENGLISH CONVERT Method of Conversion ==========
SQL Server Execution Times:
CPU time = 2203 ms, elapsed time = 2199 ms.
========== INTERNATIONAL CONVERT Method of Conversion ==========
SQL Server Execution Times:
CPU time = 2250 ms, elapsed time = 2289 ms.
Like I said earlier, the DATENAME() method, which is also (recently) ANSI compliant, is almost exactly twice as fast as the original hard-coded method and has the distinct advantage of being a bit flexible when it comes to language changes.
The non-ANSI compliant method of using CONVERT is another 17% faster than that and when you're working with millions of rows of data, every bit helps. It, too, will change with language settings.
The bottom line is that the code is (even with removing all the extra leading spaces for formatting) more than 200 characters shorter than the original method and runs more than twice as fast. It's also easy to memorize, super quick to type, and avoids the use of Scalar User Defined Functions.
Crack filled!
Thanks for listening, folks.
--Jeff Moden