April 28, 2009 at 4:06 am
Tao Klerks (4/28/2009)
Are you saying that the following statement would result in a better query plan? (or more reliably result in a good query plan?)
SELECT *
FROM SomeTable
WHERE (SomeColumn >= '2009-04-01' AND SomeColumn < '2009-04-16' )
Absolutely yes. Though you might need to add an OPTION (RECOMPILE) to prevent re-use of any previous plan, especially if forced parameterization is in effect.
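To illustrate (a sketch only - table and column names are taken from your example):

```sql
SELECT *
FROM SomeTable
WHERE SomeColumn >= '2009-04-01'
  AND SomeColumn <  '2009-04-16'
OPTION (RECOMPILE); -- compile a fresh plan each execution, so the optimizer sees these literal values
```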
Tao Klerks (4/28/2009)
If so, why would that be? Because in one case SQL Server can read the values and choose an appropriate query plan based on statistics / known value distributions, and in the other case it cannot?
Yes! See http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx
Tao Klerks (4/28/2009)
In that case, does adding a function call inline break it?
Why don't you test it and tell everyone what you find? 😉
Tao Klerks (4/28/2009)
Is there any reference to this issue / mechanism anywhere online that I can look at?
The link above is to a blog entry from the SQL Server Query Optimization Team.
Google 'SQL Server 2005 parameter sniffing' and you will turn up many good articles, including some on this site.
Paul
April 28, 2009 at 4:38 am
What if one just manually adds the times to the SQL query?
I have been doing something like this:
" WHERE DATEFIELD BETWEEN " & @DATE1 & " 00:00:00" & " AND " & @DATE2 & " 23:59:59.997"
I know this works (please ignore any typos or syntax errors - I have not checked it), and I know it uses the indexes.
Why would something like this not be advisable?
regards
Dawid
April 28, 2009 at 4:40 am
Paul White (4/28/2009)
It sounds as if you are quoting percentage batch costs from the execution plan.
The results were from batch queries - 100 each, with random dates for each iteration. I just rounded the results up because I was more interested in the severe differences between what was posted in the article and the other suggestions.
Paul White (4/28/2009)
I am confused as to how a computed column slows down your data loads 'drastically'. Also, simple scalar calculations in the SELECT part of an INSERT are typically so much cheaper than the INSERT operation that it makes no odds.
You're right, I made a gross generalization, and didn't elaborate well on what I was thinking. Computed columns have no impact on loads; they're only computed for indexing purposes or during queries - the same impact you'd see from a bulk update. In this instance, since truncating a date using floor would be deterministic, it would actually be OK to use a computed column with an index, as opposed to computing the value and sticking it in another column.
What I was really aiming for, however, is that if the principal query against your table is going to be on date ranges and you're looking to avoid the time portion, eliminate it from the equation altogether. Calling UDFs or math functions inside a query, although optimized by SQL Server itself, will still be slower than computing the date portion once and querying against it multiple times. If you have to do the calculation more than once, you're just wasting the server's time by making it repeat itself; compute it and save it - the disk cost is negligible compared to the performance gains you'll achieve.
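For example, a sketch of the indexed computed column approach (table and column names are hypothetical; the DATEADD/DATEDIFF date-floor expression is deterministic, so the column can be persisted and indexed):

```sql
-- Strip the time portion once, at write time, rather than in every query
ALTER TABLE dbo.SomeTable
    ADD SomeDateOnly AS DATEADD(dd, DATEDIFF(dd, 0, SomeColumn), 0) PERSISTED;

CREATE INDEX IX_SomeTable_SomeDateOnly
    ON dbo.SomeTable (SomeDateOnly);
```

Queries can then seek on SomeDateOnly directly, with no per-row calculation.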
April 28, 2009 at 4:51 am
Thanks Paul!
You've just added another dimension of complexity to SQL Performance tuning for me, now I have to go away and think about this. 🙂
The things that are most unclear for me now are:
1) How are things different (if at all) in SQL 2000? The article targets 2005, but mentions procs in 2000...
2) What is the relationship between query plan caching and parameter sniffing, eg in a stored procedure? Isn't parameter sniffing essentially doing the opposite of query plan caching? Are there circumstances under which "sniffed" parameters will cause a stored procedure to be recompiled? (or can multiple query plans be cached for a single proc, and be selected based on sniffed parameters?)
I don't expect you to "answer" this, I understand it's probably food for hours of investigation and/or discussion; I'll save this for my "cool stuff to look at when I have extra time on my hands (never)" pile.
http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
April 28, 2009 at 5:14 am
Tao Klerks (4/28/2009)
You've just added another dimension of complexity to SQL Performance tuning for me, now I have to go away and think about this. 🙂
The good news is that learning never stops. I start to worry if I don't come across something I didn't know every day...
Tao Klerks (4/28/2009)
1) How are things different (if at all) in SQL 2000? The article targets 2005, but mentions procs in 2000...
SS2K had the same issue, see http://blogs.msdn.com/khen1234/archive/2005/06/02/424228.aspx
There are more options for working around this behaviour in 2K5 and 2K8 (which introduces OPTIMIZE FOR...UNKNOWN)
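In 2K8 that looks something like this (a sketch - table, column, and parameter names are placeholders):

```sql
SELECT *
FROM dbo.SomeTable
WHERE SomeColumn >= @StartDate
  AND SomeColumn <  @EndDate
OPTION (OPTIMIZE FOR (@StartDate UNKNOWN, @EndDate UNKNOWN)); -- plan from average density, not sniffed values
```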
Tao Klerks (4/28/2009)
2) What is the relationship between query plan caching and parameter sniffing, eg in a stored procedure? Isn't parameter sniffing essentially doing the opposite of query plan caching? Are there circumstances under which "sniffed" parameters will cause a stored procedure to be recompiled? (or can multiple query plans be cached for a single proc, and be selected based on sniffed parameters?)
See http://sqlblog.com/blogs/kalen_delaney/archive/2007/07/30/did-you-know-estimated-vs-actual-plans.aspx for information which, with a little thought, will allow you to answer this for yourself.
Enjoy!
Paul
April 28, 2009 at 5:20 am
dawidmocke (4/28/2009)
I have been doing something like this:
" WHERE DATEFIELD BETWEEN " & @DATE1 & " 00:00:00" & " AND " & @DATE2 & " 23:59:59.997"
I know this works (please ignore any typos or syntax errors - I have not checked it), and I know it uses the indexes.
Why would something like this not be advisable?
Hey there,
I'm assuming that the code you use in real life ensures that the variables have no time component, are converted to strings for the time concatenation, and are then converted back to the same datatype as the column. I further assume that all conversions are done with an explicit deterministic CONVERT style.
If so, it is fine, aside from the issues already mentioned in my earlier posts concerning parameter sniffing and the consequent availability of useful statistics to the optimizer at execution time.
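For the record, a sketch of the sort of explicit deterministic conversion I mean (variable names as in your post; note the -3ms step back from midnight to reach the last datetime tick of the day):

```sql
DECLARE @Date1 datetime, @Date2 datetime;
SELECT  @Date1 = '20090401', @Date2 = '20090415';

-- Style 112 (yyyymmdd) is a deterministic conversion style
SELECT  @Date1 = CONVERT(datetime, CONVERT(char(8), @Date1, 112), 112),   -- 2009-04-01 00:00:00.000
        @Date2 = DATEADD(ms, -3, DATEADD(dd, 1,
                 CONVERT(datetime, CONVERT(char(8), @Date2, 112), 112))); -- 2009-04-15 23:59:59.997
```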
Paul
April 28, 2009 at 5:23 am
The best way to go, in my opinion, is to create a variable and convert the dates to minutes or hours (depending on how accurate you need the diff to be) and get it back.
Don't just take DATEDIFF(hh, one date, another date).
Taking these results in isolation can even result in wrong info:
DATEDIFF(hh, '2009-04-28 10:55:00', '2009-04-28 13:00:00')
will give 3,
and DATEDIFF(mi, '2009-04-28 10:55:00', '2009-04-28 13:00:00')
gives 125.
One might take %60 to get the minutes - 5 minutes.
But the answer will be wrong.
Instead, if we do /60 to get the hours and %60 to get the minutes, the answer will be correct.
So use DATEDIFF carefully.
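In sketch form, the correct breakdown for the example times above:

```sql
DECLARE @mi int;
SET @mi = DATEDIFF(mi, '2009-04-28 10:55:00', '2009-04-28 13:00:00');

SELECT  @mi / 60 AS elapsed_hours,   -- 2
        @mi % 60 AS elapsed_minutes; -- 5, i.e. 2h 05m elapsed
```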
April 28, 2009 at 5:27 am
To anyone listening,
Just for the record, I have to say that this is a poor article.
The author would do well to seek a review of future articles from one of the many knowledgeable and experienced members before publishing.
My personal view is that this article does more harm than good and ought to be taken down and rewritten.
I'm sorry if that sounds harsh, but SSC deserves a minimum standard.
Paul
April 28, 2009 at 5:50 am
Karteek (4/28/2009)
The best way to go, in my opinion, is to create a variable and convert the dates to minutes or hours (depending on how accurate you need the diff to be) and get it back.
Really? The best way to do what, exactly? I thought the article was about comparing dates with or without a time component! Most of the discussion has been about index usage.
Please explain what you are referring to, and what problem you are solving with the variable.
Karteek (4/28/2009)
Don't just take DATEDIFF(hh, one date, another date).
Where was this suggested? I can't find reference to it in the article or the discussion...
Karteek (4/28/2009)
Taking these results in isolation can even result in wrong info:
DATEDIFF(hh, '2009-04-28 10:55:00', '2009-04-28 13:00:00')
will give 3,
and DATEDIFF(mi, '2009-04-28 10:55:00', '2009-04-28 13:00:00')
gives 125.
One might take %60 to get the minutes - 5 minutes.
But the answer will be wrong.
Instead, if we do /60 to get the hours and %60 to get the minutes, the answer will be correct.
So use DATEDIFF carefully.
Can you point me to the 'wrong' bit please?
Books Online
DATEDIFF returns the number of date and time [highlight="#FFFF11"]boundaries[/highlight] crossed between two specified dates.
3 'hour boundaries' are crossed in your first example, and 125 'minute boundaries'.
It should be clear from that why the functions return the values they do, and why taking the modulus will not work.
Paul
April 28, 2009 at 5:55 am
Well, it's a good trick for DATETIME fields, but it's bad practice if you are looking for performance.
Instead (if you work with MSSQL 2005 or 2008), use the DATE and TIME types where they apply, and avoid calculations. Do you agree? 😉
April 28, 2009 at 6:08 am
Ric Sierra (4/28/2009)
Instead (if you work with MSSQL 2005 or 2008), use the DATE and TIME types where they apply, and avoid calculations. Do you agree? 😉
:ermm: DATE(n) and TIME(n) types are only available in SS2K8, not 2005.
DATE and TIME do have an obvious application here, but these new types are not without problems of their own.
But that is a different subject altogether.
Paul
April 28, 2009 at 6:37 am
Hmm... You may suffer a huge performance hit if you use DATEDIFF(). SQL SVR may have to perform a full table scan. Not exactly what one desires in a production system.
I would suggest creating a varchar(8) column and populating it with the YYYYMMDD format using the CONVERT() function. One could write a trigger to perform this task.
You would then index this varchar() column to maximize performance in WHERE clauses.
Using CONVERT(varchar(8), getdate(), 112) will return a date in the YYYYMMDD format.
April 28, 2009 at 7:15 am
Here is a very good article about datetime by Tibor:
April 28, 2009 at 7:15 am
My approach is quite simple. Oftentimes I am passing a date range into a sproc as selection criteria. What I do is the following:
set @SubmitDateFrom = cast(convert(varchar(20), @SubmitDateFrom, 101) + ' 00:00:00' as datetime)
set @SubmitDateTo = cast(convert(varchar(20), @SubmitDateTo, 101) + ' 23:59:59' as datetime)
Then I go ahead and do a simple comparison in my WHERE clause:
WHERE p.SubmitDate BETWEEN @SubmitDateFrom AND @SubmitDateTo
I've established this as a standard development practice and it seems to work just fine.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
April 28, 2009 at 7:17 am
Dallas Martin (4/28/2009)
Hmm... You may suffer a huge performance hit if you use DATEDIFF(). SQL SVR may have to perform a full table scan. Not exactly what one desires in a production system.
An index scan at worst, not a table scan. Remember that if a good proportion of the rows in the table (say 10-30% depending) would be touched by an index seek, the scan is a good choice.
Dallas Martin (4/28/2009)
I would suggest creating a varchar(8) column and populating it with the YYYYMMDD format using the CONVERT() function. One could write a trigger to perform this task.
One could also drive with one's feet, but that doesn't make it a good idea.
A trigger would add unnecessary overhead, would need to be carefully implemented to handle single-row and set-based INSERT and UPDATE operations, would complicate bulk loading, and there is always the danger of the trigger being disabled, or data in the CHAR(8) column not matching the datetime column.
If one were determined to add a column to solve this problem, use a (persisted, if you like) computed column as I described earlier. Seeing as YYYYMMDD must always be numeric, I would type it as an INT (4 bytes).
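In sketch form (object names are placeholders; CONVERT style 112 is deterministic, so the computed column can be persisted and indexed):

```sql
ALTER TABLE dbo.SomeTable
    ADD DateKey AS CAST(CONVERT(char(8), SomeColumn, 112) AS int) PERSISTED; -- e.g. 20090428

CREATE INDEX IX_SomeTable_DateKey
    ON dbo.SomeTable (DateKey);
```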
Paul