Introduction
First of all, I agree with most of the heavy hitters in saying that you should avoid formatting in both the “Data Layer” and the “Business Layer” of your applications and that it should only be done in the “Presentation Layer”. The two big reasons for that are…
- It’s cheaper to do such conversions in the application layer. SQL Server is expensive compared to the application layer and shouldn’t usually be bogged down with formatting tasks.
- If done properly in the “Presentation Layer”, the dates and times can follow the “local formatting” that the user has on their computer.
Still, there are occasions where the “Data Layer” and the “Presentation Layer” are the same. As just one example, if you have to send out files that contain dates as strings, then you frequently have to do the formatting in the “Data Layer” to correctly populate the file.
In such cases, you want the formatting to be as fast as possible for both duration and CPU usage, and that’s a nice segue into a rather nasty problem.
The FORMAT() function, which can be used to format dates (and other things) as strings, made its appearance in SQL Server 2012 which, at the time of this writing, was almost a decade ago. It was known even before it came out that it has a terrible performance issue and yet people keep writing about how to use it without that warning.
Others have written about the performance problem and do tests that demonstrate that it’s 3 to 4 times slower than using the CONVERT() function. That’s actually a gross understatement. If you take out display and disk times, it’s actually much worse and we’re going to prove it in this article
While the FORMAT() function is convenient, especially for those well versed in .NET, the purpose of this article is to convince you that it should NEVER be used in SQL Server… and “NEVER” is a word that I don’t use often!
I’ll also tell you that we're not going to teach you everything about formatting strings in this article. A short book could be written on the subject and that’s not the objective of this article. The objective of this article is to convince people that you shouldn’t use the FORMAT() function when you need to format strings, especially but not limited to Dates and Times.
Setting Up the Test Table
The test table consists of a million rows in a typical configuration with an ID column, a column that contains a DATETIME, and a few other columns. We’ll simulate those several other columns using a single “Fluff” column that’s always 130 bytes wide by using a fixed width CHAR(130) as its datatype.
Also, I use my fnTally() function as a high performance row source to replace various forms of RBAR1 , which includes WHILE loops, recursive CTEs, recursive functions, and “GO nnn”. You can, of course, use your own if you have one. If you don’t have one, you can get a copy of mine at the following URL: https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
Here’s the code to build the test table as a temporary table in TempDB. On my box, it takes less than a second to complete. The first 1,000 rows will be displayed once the table is created.
--===================================================================================== -- Build a test table containing 1 million rows of randomized dates and times. --===================================================================================== --===== Conditionally Drop the Test Table ============================================= DROP TABLE IF EXISTS #FormatTest; GO --===== Create the Test Table ========================================================= SET NOCOUNT OFF ; CREATE TABLE #FormatTest ( RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED ,SomeDT DATETIME --Our random DATETIME column ,Fluff CHAR(130) DEFAULT 'X' --Simulated several other columns ) ; --===== Populate the Test Table ======================================================= -- One million rows of test data spread across 10 years of dates DECLARE @LoDT DATETIME = '2020' --Inclusive Start Date ,@CoDT DATETIME = '2030' --Exlusive CutOff Date ; INSERT INTO #FormatTest WITH (TABLOCK) --Required for Minimal Logging (SomeDT) SELECT SomeDT = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@LoDT,@CoDT)+@LoDT FROM dbo.fnTally(1,1000000) ; --===== Let's see a sample of what the table contains... ============================== SELECT TOP 1000 * FROM #FormatTest ORDER BY RowNum ;
Test Code Structure
Each test will have the following nearly sections:
- A “Title” section to identify the test being conducted. This will appear in the “Messages” window in SSMS.
- That will be followed by code that clears various caches to make all tests the same. Don’t run this code on a production box.
- The next section is the actual test code. It turns on STATISTICS TIME , which will appear also appear in the “Messages” tab. The first comment in this section will also appear in the “TextData” column if you’re running SQL Profiler on whatever SPID your testing from.
- Each test will be executed 3 times using a “GO 3” to test for an "Empty Cache" condition, 2nd run when the execution plan is actually stored (if you have Optimize for Ad Hoc queries enabled, and you usually should), and a fully cached 3rd run.
- We’ll run two tests, one after the other. The first test will use the FORMAT() function and the second test will use something other than FORMAT().
The run results of each pair of tests will be included in this article as they appear in SQL Profiler. You can see similar results for CPU and Duration in the messages window if you don’t want to fire up SQL Profiler when doing your own testing.
Also note… displaying a million rows takes a certain amount of time, no matter what. I call that the "Great Equalizer" when it comes to measuring performance of code. To overcome that problem, I’ve removed all display time from the equation by dumping the output to an “@BitBucket” variable so that we can see what the real computational differences are.
Test #1 – Convert to a “MM/DD/YYYY” format.
This uses a very straight forward FORMAT() and an equally straight forward CONVERT using the “101” style.
Here’s the code…
--===================================================================================== PRINT REPLICATE('=',119); RAISERROR('===== FORMAT(MM/dd/yyyy) ==========',0,0) WITH NOWAIT; --===================================================================================== SET STATISTICS TIME,IO OFF; CHECKPOINT; DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; GO --===== FORMAT(MM/dd/yyyy) ============================================================ SET STATISTICS TIME ON; DECLARE @BitBucket VARCHAR(50) ; SELECT @BitBucket = FORMAT(SomeDT,'MM/dd/yyyy') FROM dbo.FormatTest ; SET STATISTICS TIME,IO OFF; PRINT REPLICATE('-',119); GO 3 --===================================================================================== PRINT REPLICATE('=',119); RAISERROR('========== CONVERT 101 (mm/dd/yyyy) ==========',0,0) WITH NOWAIT; --===================================================================================== SET STATISTICS TIME,IO OFF; CHECKPOINT; DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; GO --===== CONVERT 101 (mm/dd/yyyy) ====================================================== SET STATISTICS TIME ON; DECLARE @BitBucket VARCHAR(50) ; SELECT @BitBucket = CONVERT(VARCHAR(50),SomeDT,101) FROM dbo.FormatTest ; SET STATISTICS TIME,IO OFF; PRINT REPLICATE('-',119); GO 3
… and here are the SQL Profiler results:
Taking the average CPU and Duration of each group of tests, it turns out that the CONVERT 101 method is a whopping 27.8 times faster for both CPU and Duration!
Test #2
Many folks will balk at the first test and say that it was too easy and the CONVERT was built to do that. What about a more complicated format where CONVERT doesn’t have an equivalent “style”?
--===================================================================================== PRINT REPLICATE('=',119); RAISERROR('========== FORMAT(D) ==========',0,0) WITH NOWAIT; -- This is like Friday, February 25, 2028 --===================================================================================== SET STATISTICS TIME,IO OFF; CHECKPOINT; DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; GO --===== FORMAT(D) ===================================================================== SET STATISTICS TIME ON; DECLARE @BitBucket VARCHAR(50) ; SELECT @BitBucket = FORMAT(SomeDT,'D') --Adapts to language but is "unpredictable". FROM dbo.FormatTest --For example, removes commas in French. ; SET STATISTICS TIME,IO OFF; PRINT REPLICATE('-',119); GO 3 --===================================================================================== PRINT REPLICATE('=',119); RAISERROR('========== BRUTE FORCE ==========',0,0) WITH NOWAIT; --===================================================================================== SET STATISTICS TIME,IO OFF; CHECKPOINT; DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; GO --===== BRUTE FORCE =================================================================== SET STATISTICS TIME ON; DECLARE @BitBucket VARCHAR(50) ; SELECT @BitBucket = DATENAME(dw,SomeDT)+', ' + DATENAME(mm,SomeDT)+' '+DATENAME(dd,SomeDT)+', ' + DATENAME(yy,SomeDT) FROM dbo.FormatTest ; SET STATISTICS TIME,IO OFF; PRINT REPLICATE('-',119); GO 3
Here are the SQL Profiler results from that:
Again, doing the math of taking the averages of CPU and Duration for each set of runs, we find that the “Brute Force” method takes 17.3 times less CPU and runs 17.7 times faster than FORMAT() does.
SQL Death by a Thousand Cuts
I’ve heard it a thousand times… “But it doesn’t matter because I’m only doing formatting for 1 or a couple of rows at a time”.
Ladies and Gentlemen, performance ALWAYS matters and is second only to accuracy and it’s a very, very close second. Stop and think about it…
First, how many times per hour are such “low row count” queries going to run in an hour?
Second, what would it be like if everything on your server ran even just 2 to 3 times faster, never mind 17 to 27 times faster?
Third, now that you know the right way, why would you intentionally do it the wrong way?
Oh! I see… “Developers time is valuable”. If your Developers don't know how to do this type of thing and do it quickly, there's a simple solution... GET THEM SOME TRAINING!... especially in the areas of performance. You won't be sorry you did. If you're a Developer, spend some time at self-training. It's a great investment in your career.
Just stop using FORMAT()… server time is just as valuable Developer time and it’s valuable 24/7/365 even when the Developers are sleeping.
And the “Death by a Thousand Cuts” is, if you always take the attitude that you’re only working with a couple of rows and forget about performance in all those low-row applications, your server is going to be 17 to 27 times slower and, to fix it, you’re going to have to fix it in a thousand places.
Wrap Up
FORMAT() is seriously slow and CPU intensive compared to the more traditional methods of converting Dates and Times to strings in SQL Server/T-SQL. Use CONVERT or DATENAME with some concatenation instead.
Here are some links for CONVERT and DATENAME that can help you with this issue as well as many other areas of your code. And don’t stop at just date conversions for CONVERT… it’s a bit long but there’s a wealth of information in that article about many things even including thing like which conversions are NOT deterministic and much more!
- https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
- https://docs.microsoft.com/en-us/sql/t-sql/functions/datename-transact-sql
And, there’s some good utility in DATEPART, as well…
And, as a bit of a sidebar, the STR() function is also slow (about 3 times as slow as other methods) and has other issues, as well. Please see the following article on that. It's old but still very relevant.
Thanks for listening, folks!
-- Jeff Moden
1 "RBAR” is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row"
© Copyright - Jeff Moden, 03 Jul 2022, All Rights Reserved