February 8, 2016 at 5:21 pm
I’m a new member on a team and came across the following query (which I’ve obfuscated) in a stored procedure and immediately thought I would be able to add value by explaining that casting a column used in the criteria would be a performance problem. I took the original query and my re-worked version and got the estimated execution plan. Much to my dismay, both queries are doing an index seek! I have nothing to back up my suggestion now. Why on earth is it doing a seek on what I thought was the poorly written query? The table itself is very small since it’s a dev environment – approximately 800 rows. There is a nonclustered index on CreatedDate.
-- What is being used
SELECT SomeID,
CreatedDate
FROM SomeHistory
WHERE CAST(CreatedDate AS DATE) BETWEEN '2/1/2016' AND '2/8/2016'
ORDER BY CreatedDate desc
-- What I want to suggest
SELECT SomeID,
CreatedDate
FROM SomeHitory
WHERE CreatedDate BETWEEN '2/1/2016' AND '2/9/2016'
ORDER BY CreatedDate desc
I'm also attaching images of the execution plans as well as the properties of the Index Seek items from the plan.
Thank you!
February 8, 2016 at 5:33 pm
LSAdvantage (2/8/2016)
I’m a new member on a team and came across the following query (which I’ve obfuscated) in a stored procedure and immediately thought I would be able to add value by explaining that casting a column used in the criteria would be a performance problem. I took the original query and my re-worked version and got the estimated execution plan. Much to my dismay, both queries are doing an index seek! I have nothing to back up my suggestion now. Why on earth is it doing a seek on what I thought was the poorly written query? The table itself is very small since it’s a dev environment – approximately 800 rows. There is a nonclustered index on CreatedDate.
-- What is being used
SELECT SomeID,
CreatedDate
FROM SomeHistory
WHERE CAST(CreatedDate AS DATE) BETWEEN '2/1/2016' AND '2/8/2016'
ORDER BY CreatedDate desc
-- What I want to suggest
SELECT SomeID,
CreatedDate
FROM SomeHitory
WHERE CreatedDate BETWEEN '2/1/2016' AND '2/9/2016'
ORDER BY CreatedDate desc
I'm also attaching images of the execution plans as well as the properties of the Index Seek items from the plan.
Thank you!
To my recollection, the SQL Server product team knew that this exact CAST was going to be done SO MUCH that they actually baked it into the engine so that it would be "seekable". Do note that I have just returned from a short trip to Dubai so my brain is not necessarily trust-worthy right now. :w00t:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 8, 2016 at 6:09 pm
I get different plans but indeed see an index seek on both. One item to be careful of with your recommendation regarding the logic:
-- What you should suggest
SELECT SomeID,
CreatedDate
FROM SomeHistory
WHERE CreatedDate >= '2/1/2016'
AND CreatedDate < '2/9/2016' -- up to, but not including 2/9
ORDER BY CreatedDate desc;
Testbed:
USE tempdb;
DROP TABLE SomeHistory
CREATE TABLE SomeHistory (SomeID int, CreatedDate datetime);
INSERT INTO SomeHistory select top 100000 c1.object_id, DATEADD(DAY,CHECKSUM(NEWID()) % (15* 365),c2.create_date) from sys.all_columns c1 cross join sys.tables c2 cross join sys.all_columns c3;
CREATE NONCLUSTERED INDEX ix1 ON SomeHistory(CreatedDate)
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 8, 2016 at 8:50 pm
While it's quite cool that there are CASTs that still allow seeks, in this case you would still want to look into avoiding the CAST.
As discussed at http://dba.stackexchange.com/questions/34047/cast-to-date-is-sargable-but-is-it-a-good-idea, there are still things about that plan that are suboptimal.
For one thing, the estimate for the number of rows coming out of the seek can be quite a bit more inaccurate when you do the CAST and force the dynamic seek pattern.
The performance of the version with the CAST is also not quite as good, so I avoid it if at all possible.
To illustrate:
CREATE TABLE test_dates (some_datetime datetime);
WITH n1(x) AS (SELECT 1 UNION ALL SELECT 1), -- 2 rows
n2(x) AS (SELECT 1 FROM n1 a, n1 b), -- 4 rows
n3(x) AS (SELECT 1 FROM n2 a, n2 b), -- 16 rows
n4(x) AS (SELECT 1 FROM n3 a, n3 b), -- 256 rows
n5(x) AS (SELECT 1 FROM n4 a, n4 b), -- 65536 rows
tally(x) AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM n5)
INSERT INTO test_dates
SELECT DATEADD(minute,x,'20130101')
FROM tally;
CREATE CLUSTERED INDEX CI_some_datetime ON test_dates (some_datetime);
GO
CREATE PROCEDURE test_no_cast
AS
DECLARE @datetime_bucket as datetime
SELECT @datetime_bucket=some_datetime
FROM test_dates
WHERE some_datetime BETWEEN '20130101' AND '20130103';
GO
CREATE PROCEDURE test_sargable_cast
AS
DECLARE @datetime_bucket as datetime
SELECT @datetime_bucket=some_datetime
FROM test_dates
WHERE CAST(some_datetime AS DATE) BETWEEN '20130101' AND '20130102';
GO
EXEC test_no_cast
GO 1000
EXEC test_sargable_cast
GO 1000
SELECT proc_name=object_name(object_id, database_id),
total_logical_reads,
total_worker_time,
total_elapsed_time
FROM sys.dm_exec_procedure_stats ps
WHERE object_name(object_id, database_id) LIKE 'test%';
DROP PROCEDURE test_no_cast, test_sargable_cast;
DROP TABLE test_dates;
On my instance the actual number of rows returned is 2880 for the non-CAST version and 2879 for the SARGable CAST (the two are not logically the same because the non-CAST version will return midnight of the upper boundary).
The estimated rows for the non-CAST version is 2879.71, acceptably close 🙂
The estimated rows for the SARGable CAST is 1439.86, rather off, especially if that under-estimate gets propagated through the rest of a more complex plan.
Here are the performance results on my lab machine:
proc_name total_logical_reads total_worker_time total_elapsed_time
------------------------------ -------------------- -------------------- --------------------
test_sargable_cast 9000 994330 995330
test_no_cast 9000 491606 492603
While in absolute terms that's not a crushing difference for 1000 executions, it's a pretty significant percentage difference, and there's really no point conceding that performance just to use a CAST that happens to be able to use a seek, however nifty that might seem 🙂
Especially when combined with the potential cardinality estimation problems, I'd avoid utilizing these SARGable CASTs unless there's a really good reason to use them.
Cheers!
February 9, 2016 at 5:22 am
Apparently there is a compatibility matrix which lets the engine know if the cast preserves the order of the data between the 2 data types.
So Cast ( intcol as small int) and cast( datecol as datetime) will use an index, but I dont think it will log a missing index if one does not exist.(Could be mistaken).
Only cast uses it as far as I know, and not convert.
I would use it because it is more readable to do the below
cast(startdatetime as date) = cast(getdate() as date)
than the whole startdatetime between (dateadd(datepart(())) and dateadd(datepart blah blah -3 millesecond+1day) contrivance to get all events for today.
February 9, 2016 at 7:34 am
MadAdmin (2/9/2016)
Apparently there is a compatibility matrix which lets the engine know if the cast preserves the order of the data between the 2 data types.So Cast ( intcol as small int) and cast( datecol as datetime) will use an index, but I dont think it will log a missing index if one does not exist.(Could be mistaken).
Only cast uses it as far as I know, and not convert.
Yet another reason to use the ANSI-SQL CAST instead of the proprietary CONVERT by default.
I would use it because it is more readable to do the below
cast(startdatetime as date) = cast(getdate() as date)
than the whole startdatetime between (dateadd(datepart(())) and dateadd(datepart blah blah -3 millesecond+1day) contrivance to get all events for today.
For performance reasons this query form would be preferable:
startdatetime >= cast(cast(getdate() as date) as datetime)
and startdatetime < cast(cast(dateadd(day,1,getdate()) as date) as datetime)
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 9, 2016 at 8:25 am
To back up what Orlando said, here are some performance metrics from my machine using the test harness from above (just modified to have datetimes today, and with the procedures modified to use the new code patterns):
proc_name total_logical_reads total_worker_time total_elapsed_time
---------------- -------------------- -------------------- --------------------
test_MadAdmin 12000 637796 639004
test_Orlando 9000 329322 330552
We see the same cardinality estimation issue I mentioned in my previous post as well (only much worse in this particular case), with the version that doesn't CAST the column correctly estimating 1440 rows, while the version that CASTs the column estimates 1 row, and returns the same 1440 (but at least it uses a seek! :hehe:).
For me, the very slight increase in simplicity of the code is not worth all those performance worries, but if the goal is to make the code as easily readable as possible, and you know for sure that the code will never, ever be run very frequently or as part of a more complex query, then maybe I would understand.
Still, the increase in readability is oh so slight, and I have never yet felt confident of assurances that the code pattern would never be used frequently or in more complex queries 🙂
For those interested, the modified test script for this case is attached as a text file (the site's being finicky about letting me put the code in the post again).
Cheers!
February 9, 2016 at 8:49 am
Wow! Thank you or the great information everyone! I will read up on the link Jacob Wilkins provided as well.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply