April 27, 2009 at 11:04 pm
Comments posted to this topic are about the item Comparison of Dates in SQL
April 28, 2009 at 1:05 am
Is SQL Server able to use indexes when using DataDiff(Day, D1, D2) = 0 ?
Best regards
Henrik Staun Poulsen
April 28, 2009 at 1:12 am
I think that this articole is very academic and it is not useful for work, because it works, it's true, but the indexes? Do the indexes works?
Bye
mandu
April 28, 2009 at 2:21 am
Hi,
The following statement is untrue: "it is always a good practice to use DATEDIFF method for date comparison instead of making comparison using logical operators (,=)".
As the previous two posters noted, using a function call on an indexed date column (or any indexed column, for that matter) prevents SQL Server from doing an efficient "seek" on the values in that column for that index - all the values need to be retrieved first, then the function applied. If the date column has the highest selectivity, then this can be a severe performance killer.
A safer way to ensure that you get all dates in a certain range, ignoring the "time" component, without applying a function to the source column, is using DateAdd as follows:
--select the correct data type here, DateTime vs SmallDateTime,
-- as you might otherwise incurr another conversion function call
-- on the source column!
DECLARE @FromDate DateTime
DECLARE @ToAndIncludingDate DateTime
SET @FromDate = '2009-04-01'
SET @ToAndIncludingDate = '2009-04-15'
SELECT *
FROM SomeTable
WHERE (SomeColumn >= @FromDate AND SomeColumn < DateAdd(Day, 1, @ToAndIncludingDate) )
HTH,
Tao
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
April 28, 2009 at 2:32 am
I am pretty sure that use of datediff won't work well with indexes, since it requires a calculation to be performed on every row that is being compared.
When I am working with dates, I always say >= the begin date, and then = @startdate and somedate < @enddate -- Note I am selecting all records PRIOR to the next day
END
[/code]
Since the @enddate is effectively the beginning of the day AFTER the date required, saying < will then get all records up to and including the end of the day requested...
Here are the date functions I use. Unfortunately I cannot remember if I wrote them myself or found them somewhere... My dbdate function used to use the convert function to get a string representation and then truncate and convert the result back into a date, but I ran into trouble with different regional settings (where dates and times are expressed differently). The method below is a bit slower and but seems to work for any regional setting.
-- Remove the time portion from a date.
CREATE FUNCTION [dbo].[DBDate] (@DATE datetime)
RETURNS datetime
AS
BEGIN
DECLARE @Date1 DateTime
SET @Date1 = @Date
SET @Date1 = DateAdd(ms, -1 * DatePart(ms, @Date1), @Date1)
SET @Date1 = DateAdd(ss, -1 * DatePart(ss, @Date1), @Date1)
SET @Date1 = DateAdd(mi, -1 * DatePart(mi, @Date1), @Date1)
SET @Date1 = DateAdd(hh, -1 * DatePart(hh, @Date1), @Date1)
RETURN @Date1
END
-- Get 12AM of the next day
CREATE FUNCTION [dbo].[DBDateNext] (@DATE datetime)
RETURNS datetime
AS
BEGIN
RETURN DateAdd(dd, 1, dbo.DBDate(@Date))
END
April 28, 2009 at 2:38 am
Interesting approach - we use Convert, relying on the fact that DateTime values are internally stored as floating-point values, with every whole number representing a day:
SET @DateWithoutTime = Convert(DateTime, Floor(Convert(Float, @OriginalDate)))
Any idea which is faster? I don't have time to test right now, but I'm quite curious 🙂
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
April 28, 2009 at 2:40 am
henrik staun poulsen (4/28/2009)
Is SQL Server able to use indexes when using DateDiff(Day, D1, D2) = 0 ?Best regards
Henrik Staun Poulsen
FTFY 🙂
The answer is yes.
But only if:
Otherwise, no 😉
Cheers,
Paul
April 28, 2009 at 2:46 am
Hi Tao - guess we could always just run both methods over a largish table and get stats out of the query analyser (like cpu usage, etc) ... In terms of finding the start of a day - the method I proposed has a lot of steps in it than yours - so its probably slower, also, your method is compact enough just to use in the original sproc rather than doing a function call.
April 28, 2009 at 3:01 am
guess we could always just run both methods over a largish table and get stats out of the query analyser (like cpu usage, etc)
I did exactly that, using the small example, you find that the datediff and dateadd are nearly identical, indexes or not. So I grabbed a large table (a few million rows) from our systems, indexed them appropriately and ran a few tests. When comparing my three tests, datediff ranked at 90%, dateadd, although a complex plan, took only 9%, and the final test took the remaining 1%. What we typically do for tables which are queried on a datetime field religiously is to actually create another column with the date portion only. It does require an extra 4 bytes (smalldatetime) per row, but disk space is cheap. Just remember to update the column after the data load in a batch update statement, don't use triggers, computed columns, or calcs in your inserts, as this will slow your data loads down drastically.
April 28, 2009 at 3:04 am
Tao Klerks (4/28/2009)
The following statement is untrue: "it is always a good practice to use DATEDIFF method for date comparison instead of making comparison using logical operators (,=)".
Agreed.
Tao Klerks (4/28/2009)
As the previous two posters noted, using a function call on an indexed date column (or any indexed column, for that matter) prevents SQL Server from doing an efficient "seek" on the values in that column for that index - all the values need to be retrieved first, then the function applied. If the date column has the highest selectivity, then this can be a severe performance killer.
Not always.
As mentioned previously, the QO will consider using a useful index containing a function on a persisted column or as part of any indexed view (enterprise only). Note that the QO may consider using an index from a view if that index covers a guaranteed superset of the target rows. See http://msdn.microsoft.com/en-us/library/ms187864(SQL.90).aspx or Books Online for details.
There are some trivial examples where a function will not prevent a seek - one is adding ISNULL around a non-nullable column. This doesn't work with COALESCE.
Tao Klerks (4/28/2009)
A safer way to ensure that you get all dates in a certain range, ignoring the "time" component, without applying a function to the source column, is using DateAdd...
True. Though the example given may not result in a good plan either, because SQL Server cannot sniff the local variable parameters. The usual solutions apply: create a nested procedure, use dynamic SQL, add OPTION (RECOMPILE) or (OPTIMIZE FOR(@var=value)).
Cheers,
Paul
April 28, 2009 at 3:14 am
From the following code, it is verified that the Index is used with or without function on the column...
Create Table #t1 (mid int,gdate datetime)
CREATE CLUSTERED INDEX [IX_ALIEN_MISSILE] ON [#t1]
(
[gdate] ASC
) ON [PRIMARY]
Insert into #t1 Values ( 1,'01-01-2009 06:00:00' )
Insert into #t1 Values ( 2,'01-01-2009 07:00:00' )
Insert into #t1 Values ( 3,'01-01-2009 04:00:00' )
Insert into #t1 Values ( 4,'01-01-2009 03:00:00' )
Insert into #t1 Values ( 5,'01-02-2009 04:00:00' )
Insert into #t1 Values ( 6,'01-02-2009 05:00:00' )
Insert into #t1 Values ( 7,'01-02-2009 06:00:00' )
Insert into #t1 Values ( 8,'01-02-2009 06:00:00' )
Insert into #t1 Values ( 9,'01-03-2009 04:00:00' )
Insert into #t1 Values ( 10,'01-03-2009 04:00:00' )
Insert into #t1 Values ( 11,'01-03-2009 05:00:00' )
Insert into #t1 Values ( 12,'01-03-2009 06:00:00' )
Insert into #t1 Values ( 13,'01-03-2009 07:00:00' )
Insert into #t1 Values ( 14,'01-04-2009 08:00:00' )
Insert into #t1 Values ( 15,'01-04-2009 09:00:00' )
Insert into #t1 Values ( 16,'01-04-2009 10:00:00' )
Insert into #t1 Values ( 17,'01-04-2009 10:00:00' )
Select * from #t1 where Datediff(day,gdate,Convert(datetime,'2009-01-01 00:00:00.000')) = 0
Select * from #t1 where Convert(datetime,Convert(varchar(20),gdate,110)) = Convert(datetime,'2009-01-01 00:00:00.000')
Select * from #t1 where gdate = Convert(datetime,'2009-01-01 00:00:00.000')
drop table #t1
On checking the execution plan, it is clear that the Inde Scan is Used on the Convert of DateDiff Function and Index Seek is Used if used without function on the column. In either case, Index is Used...
April 28, 2009 at 3:17 am
This is not something I had ever heard of:
Paul White (4/28/2009)
Though the example given may not result in a good plan either, because SQL Server cannot sniff the local variable parameters.
Are you saying that the following statement would result in a better query plan? (or more reliably result in a good query plan?)
SELECT *
FROM SomeTable
WHERE (SomeColumn >= '2009-04-01' AND SomeColumn = '2009-04-01' AND SomeColumn < DateAdd(Day, 1, '2009-04-15') )
Is there any reference to this issue / mechanism anywhere online that I can look at?
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
April 28, 2009 at 3:37 am
benr (4/28/2009)
When I am working with dates, I always say >= the begin date, and then < the next day. Eg. If you want all records up to and including the end of 31st March 2009 your query should ask for all records prior to the 1st of April. This method should then be able to leverage indexes.
This is true, and a valid approach, unless either date column can be NULL.
Adding ISNULL or COALESCE on a NULLable column will prevent index usage (subject to the caveats I already posted). Using OR to cope with the NULLs will almost always prevent index usage too. Using a number of UNION ALLs to cope with the NULLs will usually be able to use indexes, but n seeks and n - 1 UNION ALLs will not always be cheaper than one index scan.
The functions posted can be improved because they:
1. Use multiple SETs where a single SELECT would suffice
2. Does not specify SCHEMABINDING to mark it as deterministic
3. Does not specify RETURNS NULL ON NULL INPUT to avoid running for NULL inputs
The function body can be rewritten in one line as below
RETURN CONVERT(DATETIME, CONVERT(CHAR(8), @Date, 112), 112)
If the new function is marked with SCHEMABINDING, it is deterministic and can be used in a persisted column. It should also be marked with RETURNS NULL ON NULL INPUT.
Cheers,
Paul
April 28, 2009 at 3:45 am
Tao Klerks (4/28/2009)
Interesting approach - we use Convert, relying on the fact that DateTime values are internally stored as floating-point values, with every whole number representing a day
The FLOAT method is imprecise in the sense required for functions to be used in persisted columns. For that reason, I wouldn't use this code as the basis for a scalar UDF.
I don't suppose Microsoft are about to change the relationship between FLOATs and DATETIMEs, but it seems an unnecessary hidden dependency to add to your code.
Some folks also view this trick as a bit of a hack, when a better solution exists.
Paul
April 28, 2009 at 3:58 am
craig.lovegren (4/28/2009)
I did exactly that, using the small example, you find that the datediff and dateadd are nearly identical, indexes or not. So I grabbed a large table (a few million rows) from our systems, indexed them appropriately and ran a few tests. When comparing my three tests, datediff ranked at 90%, dateadd, although a complex plan, took only 9%, and the final test took the remaining 1%.
It sounds as if you are quoting percentage batch costs from the execution plan. If so, please be aware that these percentages are based on the optimizers cost estimates. To accurately report performance results, you need a reproducible test set up (with code) and output from SET STATISTICS IO, TIME ON as a minimum. It is also common to start with a cold cache. A posted actual execution plan is often helpful too.
craig.lovegren (4/28/2009)
What we typically do for tables which are queried on a datetime field religiously is to actually create another column with the date portion only. It does require an extra 4 bytes (smalldatetime) per row, but disk space is cheap. Just remember to update the column after the data load in a batch update statement, don't use triggers, computed columns, or calcs in your inserts, as this will slow your data loads down drastically.
Well that all rather depends on your priorities. Computed columns have the distinct advantage of being guaranteed to match the source data at all times. Someone could update one of your date columns and not the other.
I am confused as to how a computed column slows down your data loads 'drastically'. Also, simple scalar calculations in the SELECT part of an INSERT are typically so much cheaper than the INSERT operation that it makes no odds.
Viewing 15 posts - 1 through 15 (of 110 total)
You must be logged in to reply to this topic. Login to reply