September 24, 2008 at 11:22 am
I am curious as to whether or not I could take a performance hit for using system functions in a join. Below is a simple example:
SELECT
eventId
,startTime
,sqlText
,hash
FROM
MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
ON CAST( FLOOR( CAST( (e.starttime) AS FLOAT ) ) AS DATETIME ) = t.business_date
WHERE
t.fiscal_year = 2008
AND
t.calendar_quarter = 1
Basically I am dropping the hour and minute data on the starttime in the 'e' table to join it to my calendar reference table.
Thanks,
Scot
September 24, 2008 at 11:59 am
In a word YES. Anytime you are using functions such as that on one side of the join it will most likely disallow index usage on that side. Perhaps you might want to investigate the use of a indexed calculated column that you might join on?
Also, there's a faster way to drop the time off of a datetime, no cast or convert it just keeps it all as a datetime so it's a bit faster.
SELECT Dateadd(dd, datediff(d,0,getdate()), 0)
-Luke.
September 24, 2008 at 12:09 pm
Thanks Luke. Just to make sure I understand what you are saying, I could add a column to the table I'm joining using a function, then add a default constraint to calculate the value based on the starttime column value. Index my new column and use it in the join. Is that correct?
September 24, 2008 at 12:16 pm
sorry I didn't mean calculated column, I meant Computed column... But basiocally that's the gist of it. Technet can go through it a lot more thoroughly than I.
You can read all about there here...
http://technet.microsoft.com/en-us/library/ms191250(SQL.90).aspx
September 24, 2008 at 12:25 pm
This is how I would rewrite the query you originally posted:
SELECT
eventId
,startTime
,sqlText
,hash
FROM
MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
ON (e.starttime >= t.business_date
and e.starttime < dateadd(dd, 1, t.business_date))
WHERE
t.fiscal_year = 2008
AND
t.calendar_quarter = 1
😎
September 24, 2008 at 12:35 pm
Thanks to both of you. I really appreciate your help!
September 24, 2008 at 12:38 pm
Be sure to test both ways. I'd be interested in knowing which works better/easier for you.
😎
September 24, 2008 at 12:45 pm
September 24, 2008 at 1:30 pm
I just ran a test on this.
alter table dbo.SomeTable
add Col6Cleaned as dateadd(day, datediff(day, 0, col6), 0) -- Just the date, no time
create index IDX_Col6Cleaned on dbo.sometable(col6cleaned)
create table #Dates (
BusinessDay datetime primary key)
insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000
set statistics time on
set statistics io on
declare @Col1 int
select @col1 = col1
from dbo.sometable
inner join #dates
on col6cleaned = businessday
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1671 ms, elapsed time = 1010 ms.
*/
declare @Col1 int
select @col1 = col1
from dbo.sometable
inner join #dates
on col6 >= businessday
and col6 < dateadd(day, 1, businessday)
dbo.SomeTable is a million rows of semi-random data. Col6 is a datetime that includes random dates and times.
The first query, using the computed datetime column, took just over 1 second total, including some parallelism that made it take just under 2 seconds of CPU time (dual-core computer).
I don't have stats on the second query, because I killed it after 35 seconds. With that difference in execution, I didn't see a point to letting it finish (especially when I heard the fans and drives going into overdrive on the computer this database is on).
The reason I jumped in and tested this is because I've been using and advocating computed columns for datetime queries for several years now, and wanted to make sure it was still a good idea in this case.
In this particular case, the index I created wasn't used, because the optimizer decided a clustered index scan and a hash join, with the parallelism that allows, was better. I've found it's about 50/50 whether an index on a calculated column is better or not. Depends on what it's being used for.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 24, 2008 at 1:36 pm
Oh, and just to test the original idea:
declare @Col1 int
select @col1 = col1
from dbo.sometable
inner join #dates
on CAST( FLOOR( CAST( (col6) AS FLOAT ) ) AS DATETIME ) = businessday
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
*/
Almost as fast at the computed column, but not quite. Since the first one ignored the index on the column, that makes sense.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 24, 2008 at 1:37 pm
September 24, 2008 at 1:43 pm
GSquared,
Can you test it with the additional criteria in the original query: where t.fiscal_year = 2008?
This may (or may not) make a difference in the queries.
I'd try, but I don't have the test tables I need here at work.
😎
September 24, 2008 at 1:47 pm
While yer testin, if you wouldn't mind running the original again with the dateadd(datediff)) bit instead of the cast(floor)) I'd be interested if that might not eliminate the difference between the computed column and the original query.
thanks.
-Luke.
September 24, 2008 at 1:52 pm
And, to throw more into the mix, actual performance in the OP's environment may be different due to storage/indexing on the tables.
I guess the best way to test would be to have the OP post the DDL (create statements for tables and indexes), populate those with test data and go from there.
😎
September 25, 2008 at 7:59 am
I modified the test as follows, to limit by year:
alter table dbo.SomeTable
add Col6Year as datepart(year, col6);
drop index IDX_Col6Cleaned on dbo.SomeTable;
create index IDX_Col6 on dbo.SomeTable (col6year, col6cleaned, col1);
create table #Dates (
BusinessDay datetime primary key)
insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000
set statistics time on
set statistics io on
declare @Col1 int
select @col1 = col1
from dbo.sometable
inner join #dates
on col6cleaned = businessday
where col6year = 2008
/*
SQL Server Execution Times:
CPU time = 609 ms, elapsed time = 330 ms.
*/
Still using a clustered index scan instead of the index I created on those computed columns. I even tried persisting the computed columns, but it still insists on using a clustered index scan instead of that index. I checked, and 2008 is 10% of the table, approximately, with the other 90% being approximately evenly split between 9 other years, so it wouldn't appear to be a selectivity issue (I think).
When I use an index hint to force use of the covering index, it uses a key lookup from that index to get the Col6Cleaned value, and that makes it slower.
This part of the behavior seems odd to me. I'm not used to seeing covering indexes being ignored in favor of clustered index scans.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply