February 28, 2008 at 6:12 am
George Heinrich (2/28/2008)
Great article! I'm glad I read it, but now i'm having trouble writing one of my queries...I need to join 2 tables on smalldatetime columns. One of the tables stores the date and time and the other table doesn't. So table A has '2008-02-27 10:32:31' and table B has '2008-02-27 00:00:00' and I need to join them on table A date = table B date (ignoring the time info).
Normally I use:
...FROM
TableA JOIN TableB
ON Cast(Cast(TableA.DateColumn AS INT) AS Smalldatetime) =
Cast(Cast(TableB.DateColumn AS INT) AS SmallDateTime)
...
Will wrapping these columns in the JOIN clause also cause a degradation of performance like wrapping them in a where clause would?
Is there a more optimized way of writing this?
Thanks,
George
Yes, functions in the Join clause have the same effect as functions in the Where clause.
What I generally recommend, if possible, in this case, is add calculated columns to the tables, like so:
alter table TableA
add DateColumnYear as datepart(year, datecolumn),
DateColumnMonth as datepart(month, datecolumn),
DateColumnDay as datepart(day, datecolumn)
go
create index IDX_TableA_DateParts on dbo.TableA (DateColumnDay, DateColumnMonth, DateColumnYear)
Then do your join/where clauses on the calculated columns. Like:
select ...column list...
from dbo.TableA
inner join dbo.TableB
on TableA.DateColumnYear = TableB.DateColumnYear
and TableA.DateColumnMonth = TableB.DateColumnMonth
and TableA.DateColumnDay = TableB.DateColumnB
Alternately, if you don't often select by year, month, day, but do often join on the whole date, you can add a calculated column:
alter dbo.TableA
add DateColumnDate as cast(convert(varchar(25), DateColumn, 102) as datetime)
go
create index IDX_TableA_DateColumnDate on dbo.TableA (DateColumnDate)
Do that to both tables, then join on the calculated column.
In my experience, this is very fast and efficient. Of course, it only applies if you are in a position to add calculated columns to tables.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 28, 2008 at 6:13 am
If you convert your @varDate local to a smalldatetime local before using it in the SELECT statement you can do a straight comparison of the local variable and the table column.
INSTEAD OF:
select * from table_name where dateDiff(d,@varDate,Created_Date) =0
TRY THIS:
DECLARE @sdtDate smalldatetime
SELECT @sdtDate = cast(@varDate AS smalldatetime)
SELECT * FROM table_name WHERE Created_Date = @sdtDate
February 28, 2008 at 6:16 am
select * from table_name where dateDiff(d,@varDate,Created_Date) =0
It seems to me that this test simply wants to compare an element to a static variable to determine if they match (which is the case if the dateDiff result is 0). Why not convert the variable to the appropriate type and format to match Created_Date (this avoids having CONVERT called explicitly in the WHERE clause by the parser---very expensive) and compare the fields directly. No function wrapped around the real element, so no hiding of the index.
Or is it too early for the caffeine to kick in and I'm missing something very obvious?
Back to the original subject, very good article. I spent a some time myself testing the performance of function calls (intrinsic and UDF) and arrived at the same conclusions. And that is also where I ran into the explicit CONVERT function call in tracing the performance of a WHERE clause.
------------
Buy the ticket, take the ride. -- Hunter S. Thompson
February 28, 2008 at 6:20 am
GSquared (2/28/2008)
Good article.One small quibble with it. In one of the examples, you use "((FullName1 = 'Ed Jones') OR (FullName1 IS NULL))", as an example of how to get Index Seeks, instead of Scans.
In many cases, the moment you put an "or" in a Where clause, you get a scan instead of a seek. Not always, but it is something to watch out for. The same applies for "in ()" (since that's just a way to shorthand multiple "or" statements).
If you really HAVE to do the OR clause, I've found an approach like this works best, to avoid scans as much as possible. For example, if you have one of those stinking reports where the users have demanded 15 different parameters, but they don't know which one they'll use so you have to stack up NULL checks as well as column comparisons:
SELECT...
FROM....
WHERE
CASE WHEN @MyParam IS NULL THEN 1
WHEN @MyParam = table1.Column1 THEN 1
ELSE 0
END = 1
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 28, 2008 at 6:25 am
I'm curious to know if the 'ON' predicate will behave the same as the 'WHERE' statement in all cases described in the article?
February 28, 2008 at 6:45 am
Does the case not cause very similar performance problems, I remember having to rewrite an SP a little while ago to improve performance by removing the case from a where clause
February 28, 2008 at 6:45 am
SSC-Addicted,
Thanks for the good idea. There are LOTS of issues with this database. I could gripe all day about inconsistencies and poor design, but I just keep on smiling every single day!! (Anyone have any prozac?)
George
February 28, 2008 at 6:49 am
nice article - clean, simple and informative.
February 28, 2008 at 6:51 am
leea (2/28/2008)
Does the case not cause very similar performance problems, I remember having to rewrite an SP a little while ago to improve performance by removing the case from a where clause
Ah, but check out this case statement. It's actually not a function against columns. Instead it's a series of checks that result in a 1 or a 0. It uses indexes very well. I was shown this by a Microsoft consultant named Bill Sulcius. We called him the Magic Conch because we did whatever he told us to. It looks like it should be a RBAR process, but it actually performs these checks in a set-based manner.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 28, 2008 at 6:53 am
Hi Gregory,
What is the comparative performance of the ISNULL() ... against the OR columnname is NULL
Thanks for a great article!
February 28, 2008 at 6:59 am
Nicely done, Gregory Jackson. Outstanding intro to the pitfalls of doing calculations in the WHERE clause.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2008 at 7:10 am
Good Article. One comment that perhaps others want to expand upon. Null in an index should be avoided. -Thanks
February 28, 2008 at 7:14 am
David Jackson (2/28/2008)
This looks complex but will use your index.where CreatedDate between dateadd(dd, datediff(dd,0,Getdate()),0) --midnight TODAY
and dateAdd(ss,-1,dateAdd(dd,1,dateadd(dd, datediff(dd,0,Getdate()),0))) --23:59:59 TODAY
HTH
Dave J
Edit: Oops! I see w.lengenfelder posted this already! :w00t:
Even then, there is the possibility of something slipping in during that 1 second you've ignored. And, with the new date types coming out for 2k8, it becomes even more important to not ignore that 1 second. With that in mind, I recommend NOT using BETWEEN because of the inclusive nature of both endpoints... instead, allow me to suggest something like the following...
SELECT SomeColList
FROM dbo.SomeTable
WHERE SomeDateCol >= DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)
AND SomeDateCol < DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0) +1
... and it allows an index to be correctly used (SEEKs) if it is available and the rest of the query allows it.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2008 at 7:23 am
GSquared (2/28/2008)
Alternately, if you don't often select by year, month, day, but do often join on the whole date, you can add a calculated column:
alter dbo.TableA
add DateColumnDate as cast(convert(varchar(25), DateColumn, 102) as datetime)
go
create index IDX_TableA_DateColumnDate on dbo.TableA (DateColumnDate)
Do that to both tables, then join on the calculated column.
In my experience, this is very fast and efficient. Of course, it only applies if you are in a position to add calculated columns to tables.
I second the vote for that method! It works very well for whole day joins and other whole day calculations including aggregation...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2008 at 7:27 am
Why not create an index computed column where you can provide this capability for routinely used functions.
Viewing 15 posts - 16 through 30 (of 98 total)
You must be logged in to reply to this topic. Login to reply