Let’s talk about the DATEDIFF vs. DATEADD functions in SQL Server. Each one performs a different task. The former calculates the differences between two given date & time values based on some date part (day, minute, second, etc.). The latter calculates a date & time value by adding a number of date parts to a given date & time value. You can use both of them to achieve the same results, but performance will be very different.
Once in a while I encounter a query that looks like the following:
SELECT * FROM Sales.Orders WHERE DATEDIFF (DAY , OrderDateTime , SYSDATETIME ()) <= 90;
This query is supposed to retrieve orders in the last 90 days. But there is a problem with this query, and I’m not talking about the use of SELECT *, which is not a good practice for so many reasons. I’m talking about applying the predicate on an expression rather than on a column alone. In this case the expression involves the DATEDIFF function, but it can be any other expression. If we don’t apply the predicate on the column alone, then the optimizer can’t use an index on that column, even if such an index exists. And if the table is very large, and the predicate filters out most of the table, then instead of getting a quick and efficient index seek, we will get a long and heavy table scan.
So we need to rewrite the predicate to look like the following:
SELECT * FROM Sales.Orders WHERE OrderDateTime {Operator} {Constant Expression};
Where {Operator} is “<=” or “>=” or something similar, and {Constant Expression} is some expression, which does not depend on any column and therefore can be calculated once for the query. This way the optimizer will be able to estimate the number of rows based on the constant expression and the column statistics, and if the estimated number of rows is low enough, it can leverage the index on the “OrderDateTime” column.
So if you think about it, instead of asking for “the orders for which the number of days between them and now is 90 or less”, we can ask for “the orders that were created on or after a certain point in time, which is now less 90 days”. And the query can be rewritten like this:
SELECT * FROM Sales.Orders WHERE OrderDateTime >= DATEADD (DAY , -90 , SYSDATETIME ());
I tested both queries on a sample table, which has millions of rows, and only around 500 rows in the last 90 days. The first query produced a table scan, while the second query produced an index seek. Of course, the execution time of the second query was much lower than the first query.
Both queries were supposed to return the orders in the last 90 days, but the first query returned 523 rows, and the second query returned 497 rows. So what’s going on?
The answer has to do with the way DATEDIFF works. This function returns the number of date parts (days, years, seconds, etc.) between two date & time values. It does that by first rounding down each one of the date & time values to the nearest date part value, and then counting the number of date parts between them.
For example, let’s calculate the number of years between the following two date & time values: “2016-12-31 23:59:59” and “2017-01-01 00:00:00”. The time between these two values is one second, but the result of applying the DATEDIFF function on these values is one year:
DATEDIFF (YEAR , '2016-12-31 23:59:59' , '2017-01-01 00:00:00') = 1
This is because each one of the values is first rounded down to the nearest whole year, like this:
DATEDIFF (YEAR , '2016-01-01 00:00:00' , '2017-01-01 00:00:00') = 1
The DATEADD function, on the other hand, doesn’t need to round anything. It just adds (or subtracts) a given number of date parts to a date & time value. So this is why we get different results from the two queries, although both of them do the same thing conceptually – they both return the orders in the last 90 days.
In many cases, the accuracy is not important, so as long as the query answers the business question (“give me the orders in the last 90 days”), then it’s good enough. Fortunately, the second query (using DATEADD) is not only much more efficient, thanks to the ability to perform an index seek, but it’s also more accurate, because no rounding is involved.
But you need to be careful. If users are already used to a report that runs the DATEDIFF query behind the scenes, and you change it to the DATEADD query, then the users will be very grateful for the performance improvement, but as soon as they realize that the results of the report are not consistent with what they were used to, then they might get very angry at you. Sometimes consistency is more important than accuracy, and you need to make the right choice. If you need to maintain consistency, but you still want to be able to utilize the index and get good performance, then you can mimic the behavior of the DATEDIFF function by rounding down the values yourself, like this:
SELECT * FROM Sales.Orders WHERE OrderDateTime >= DATEADD (DAY , -90 , DATEADD (DAY , DATEDIFF (DAY , '2000-01-01' , SYSDATETIME ()) , '2000-01-01'));
The expression beginning with the second DATEADD rounds down the value of SYSDATETIME () to the nearest whole day, which is today at midnight. There is no need to round down the value of “OrderDateTime”, because if the rounded value is greater than the constant expression, then the value itself is also greater than the constant expression.
This query looks ugly, but it works. It uses an index seek, and it is consistent with the result of the DATEDIFF query.
The post DATEDIFF vs. DATEADD appeared first on Madeira Data Solutions.