June 15, 2010 at 1:10 pm
Hello,
In my WHERE clause, I need to compare to 2 datetime columns by ignoring the time part.
I wanted to know whether using the DateDiff function rather than the Convert function would give better query execution performance ?
Which one of the following is better:
1. ... WHERE DateDiff(d, MyDateTime1, MyDateTimeCol2) > 0
2. .. WHERE Convert(DateTime, MyDateTime1, 112) < Convert(DateTime, MyDateTime2, 112)
SQL Server experts, please reply.
Thanks.
June 15, 2010 at 1:31 pm
Although without testing it I can't quantify it, I think the first would perform better, my rationale is that it should be a fairly simple math operation where only the numbers to the left of the decimal would be evaluated.
To prove the second item won't even work:
DECLARE @test-2 datetime
SET @test-2 = GETDATE()
SELECT CONVERT( datetime, @test-2, 112 )
GO
I think you'd have to convert it to text to do the comparison and text operations are going to be more expensive..
If I may offer an alternative, can you add a calculated column that filters off the time and is set to persisted, then you could do a direct comparison and the cost of the stripping of the date would occur a single time for each record at the moment of insertion.
CEWII
June 15, 2010 at 3:10 pm
To answer the question asked, the DateDiff will be more efficient than the convert. You might want to check out the "Common Date/Time Routines" link in my signature.
However, neither of these will be very efficient in a where clause... neither is SARGable, and neither will be able to utilize an index on the column. The best you can hope for with these is an index scan, though you may also get a table scan or clustered index scan. If this is a large table, just take a lunch break while it's running.
Elliott's suggestion to have a persisted, calculated column would work out here - just add an index to that column, and you'll be able to take care of both problems.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 15, 2010 at 3:41 pm
From what I've read, the DATEDIFF() method has proven to be more efficient.
Overall, for SQL to use any index, it has to highly selective or a covering index: I suspect that's unlikey here.
Of course you can avoid any manipulation on one of the columns by doing two manipulations on the other one :-). Then at least one of the columns is SARGable.
I'd be suprised if the overhead of two persisted 8-byte columns is really worth it here, unless you do this type of thing a lot on those columns. Typically a little occasional CPU overhead is much better than permanent I/O overhead.
Scott Pletcher, SQL Server MVP 2008-2010
June 15, 2010 at 5:24 pm
scott.pletcher (6/15/2010)
Of course you can avoid any manipulation on one of the columns by doing two manipulations on the other one :-). Then at least one of the columns is SARGable.
Scott, you've got me curious about this. Just what two manipulations would you perform on one datetime column (with time info) that would prevent performing any manipulations on the other datetime column (with time info)? I don't see how you can do it with just two manipulations.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 15, 2010 at 5:49 pm
Overall, for SQL to use any index, it has to highly selective or a covering index: I suspect that's unlikey here.
In a smaller environment you'll see the optimizer frequently do without a given index unless it's really perfect for the job. On the smaller tables the cost of the scan is simply not high enough to justify the use of the index. However, as you are working in a larger environment, the optimizer may even choose to use a non-covering index with less than fantastic selectivity just because the cost of the scan is so enormously high. In this case you could see a key lookup to pick up the uncovered columns.
The big thing is not to guess ... do testing and see what the issues are with the execution plan in your environment. It very well may even differ greatly from dev to test to prod.
Of course you can avoid any manipulation on one of the columns by doing two manipulations on the other one :-). Then at least one of the columns is SARGable.
Potentially instead of doing a DATEDIFF which includes both columns, you could hypothetically leave the one date column alone and do a comparison with a DATEADD on the other column, but since the other column isn't SARGable you'll still have it looking at each record. In a huge environment this will result in a crushing performance hit unless the result set is otherwise greatly reduced by some other criteria. In a really small environment you shouldn't really notice the inefficiency of date manipulations. So again, it's just up to you to test to figure out what works for your specific environment.
June 16, 2010 at 7:44 am
However, as you are working in a larger environment, the optimizer may even choose to use a non-covering index with less than fantastic selectivity just because the cost of the scan is so enormously high. In this case you could see a key lookup to pick up the uncovered columns.
...
Potentially instead of doing a DATEDIFF which includes both columns, you could hypothetically leave the one date column alone and do a comparison with a DATEADD on the other column, but since the other column isn't SARGable you'll still have it looking at each record. In a huge environment this will result in a crushing performance hit unless the result set is otherwise greatly reduced by some other criteria.
Isn't that somewhat self-contradictory? Isn't it better to have one of them SARGable than neither? Assuming you already have an index on that one column (if neither column is in any index, don't see how you avoid a scan no matter what you do). I would think the other column would have to be in the index as well, even if just INCLUDEd .... but the optimizer gets better all the time, so I won't try to limit it 🙂 .
I believe he wanted specific code of how to do the compare leaving one column alone, not just a generic statement.
Scott Pletcher, SQL Server MVP 2008-2010
June 16, 2010 at 8:13 am
WHERE date_col1 >= DATEADD(DAY, DATEDIFF(DAY, 0, date_col2), 0)
AND date_col1 < DATEADD(DAY, DATEDIFF(DAY, 0, date_col2) + 1, 0)
Or:
WHERE date_col1 BETWEEN DATEADD(DAY, DATEDIFF(DAY, 0, date_col2), 0) AND DATEADD(MS, -3, DATEADD(DAY, DATEDIFF(DAY, 0, date_col2) + 1, 0))
Scott Pletcher, SQL Server MVP 2008-2010
June 16, 2010 at 10:08 am
I can see how that would work.
It still seems like a lot of operations.
If this is an infrequent or even marginally frequent query it shold probably be ok. If not you might look at my original suggestion.. A lot depends (in my mind) on how often this kind of comparison is done.
CEWII
June 16, 2010 at 10:11 am
Absolutely right.
If this is a commonly used query, I would probably "bite the bullet" and make sure there was an effective (i.e. time-saving) covering index and that SQL was actually using it for the query (it's easy to forget that step, but just because you know the covering index is there doesn't necessarily mean SQL will use it :-)).
If it's run once a month, say, you might just let this particular query be a little slow rather than adding an index to the table or otherwise enlarging the table.
Scott Pletcher, SQL Server MVP 2008-2010
June 16, 2010 at 10:15 am
scott.pletcher (6/16/2010)
WHERE date_col1 >= DATEADD(DAY, DATEDIFF(DAY, 0, date_col2), 0)AND date_col1 < DATEADD(DAY, DATEDIFF(DAY, 0, date_col2) + 1, 0)
Or:
WHERE date_col1 BETWEEN DATEADD(DAY, DATEDIFF(DAY, 0, date_col2), 0) AND DATEADD(MS, -3, DATEADD(DAY, DATEDIFF(DAY, 0, date_col2) + 1, 0))
I see where you're coming from, and I was thinking this, but I count 4 manipulations, not 2. Was really curious if there would in fact be a way of doing this with just 2.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 16, 2010 at 10:20 am
Not that I'm aware of, if you're counting down to that level.
The key point is to allow one of the columns to be non-manipulated in any way so that it is SARGable and thus "index-searchable".
Btw, if I were going to persist something, I would persist a smallint "day difference" column rather than persisting the entire date columns.
For example, something like this:
ALTER TABLE tablename
ADD DayDiffBtwnDate1Date2 AS CAST(DATEDIFF(DAY, date1, date2) AS smallint) PERSISTED
I could then include that column in queries and/or index(es), as apporpriate.
I think that will meet all the needs, although I haven't thought enough about it to walk thru all the scenarios in my head yet.
Scott Pletcher, SQL Server MVP 2008-2010
June 16, 2010 at 10:40 am
Just a small suggestion, if you will go with creating persisted computed column (and assuming your columns are in two diffrent tables), store the date as an INT using:
FLOOR(CAST(DateTimeValue AS FLOAT))
It will be 4 byte less per value to store and a bit faster for comparison.
Of cause, if both of your columns are in the same table, you just need to store result of DATEDIFF (which is also will be an int).
June 16, 2010 at 11:28 am
I might have misread something earlier but I wasn't thinking they were in the same table, the code presented is unclear. If they are and the difference between then is often used then by all means store just that. As far as storing as integer, I don't really have any objection if that column is used for comparisons only, if they ever want to display it I might consider smalldatetime..
Like many other things.. It depends..
CEWII
June 16, 2010 at 11:45 am
Elliott Whitlow (6/16/2010)
...I don't really have any objection if that column is used for comparisons only, if they ever want to display it I might consider smalldatetime...
You can easely convert INT back to datetime or smalldatetime (without time portion) for display purposes.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply