Performance impact of using functions in a join

• I am curious whether I could take a performance hit from using system functions in a join. Below is a simple example:

SELECT
    eventId
    ,startTime
    ,sqlText
    ,hash
FROM MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
    ON CAST( FLOOR( CAST( (e.starttime) AS FLOAT ) ) AS DATETIME ) = t.business_date
WHERE t.fiscal_year = 2008
    AND t.calendar_quarter = 1

    Basically I am dropping the hour and minute data on the starttime in the 'e' table to join it to my calendar reference table.

    Thanks,

    Scot

• In a word, YES. Any time you use functions like that on one side of a join, it will most likely prevent index usage on that side. Perhaps you might want to investigate an indexed calculated column that you could join on instead?

Also, there's a faster way to drop the time off of a datetime: no CAST or CONVERT, it keeps everything as a datetime, so it's a bit faster.

    SELECT Dateadd(dd, datediff(d,0,getdate()), 0)
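A minimal sketch of that indexed calculated-column idea against the OP's EventSQL table might look like the following; the column and index names here (startDate, IX_EventSQL_startDate) are made up for illustration and not from the original post:

-- Hypothetical sketch: add a computed column that strips the time portion,
-- then index it so the join can seek rather than scan.
ALTER TABLE MyDatabase1.dbo.EventSQL
    ADD startDate AS DATEADD(dd, DATEDIFF(dd, 0, startTime), 0);

CREATE INDEX IX_EventSQL_startDate
    ON MyDatabase1.dbo.EventSQL (startDate);

-- The join could then compare the computed column directly:
--   ON e.startDate = t.business_date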

    -Luke.


• Thanks Luke. Just to make sure I understand what you are saying: I could add a column to the table I'm joining, with a default constraint that uses a function to calculate the value from the starttime column, then index my new column and use it in the join. Is that correct?

• Sorry, I didn't mean calculated column, I meant computed column... But basically that's the gist of it. TechNet can go through it a lot more thoroughly than I can.

You can read all about it here...

    http://technet.microsoft.com/en-us/library/ms191250(SQL.90).aspx


  • This is how I would rewrite the query you originally posted:

SELECT
    eventId
    ,startTime
    ,sqlText
    ,hash
FROM MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
    ON (e.starttime >= t.business_date
        AND e.starttime < DATEADD(dd, 1, t.business_date))
WHERE t.fiscal_year = 2008
    AND t.calendar_quarter = 1
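For that range predicate to pay off, starttime itself needs a usable index. As a sketch only (the index name and INCLUDE list are assumptions, not from the original post), something along these lines might be worth testing:

-- Hypothetical supporting index so the range join on starttime can seek,
-- with the selected columns included to avoid lookups.
CREATE INDEX IX_EventSQL_startTime
    ON MyDatabase1.dbo.EventSQL (startTime)
    INCLUDE (eventId, sqlText, hash);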

    😎

  • Thanks to both of you. I really appreciate your help!

  • Be sure to test both ways. I'd be interested in knowing which works better/easier for you.

    😎

• Me too. The computed column will have some overhead to maintain the index and such, so Lynn's way could be faster. I've no idea, as I don't have a readily available server to test on...

    -Luke.


  • I just ran a test on this.

alter table dbo.SomeTable
    add Col6Cleaned as dateadd(day, datediff(day, 0, col6), 0) -- Just the date, no time

create index IDX_Col6Cleaned on dbo.sometable(col6cleaned)

create table #Dates (
    BusinessDay datetime primary key)

insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000

set statistics time on
set statistics io on

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 1671 ms, elapsed time = 1010 ms.
*/

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6 >= businessday
    and col6 < dateadd(day, 1, businessday)

    dbo.SomeTable is a million rows of semi-random data. Col6 is a datetime that includes random dates and times.

    The first query, using the computed datetime column, took just over 1 second total, including some parallelism that made it take just under 2 seconds of CPU time (dual-core computer).

    I don't have stats on the second query, because I killed it after 35 seconds. With that difference in execution, I didn't see a point to letting it finish (especially when I heard the fans and drives going into overdrive on the computer this database is on).

    The reason I jumped in and tested this is because I've been using and advocating computed columns for datetime queries for several years now, and wanted to make sure it was still a good idea in this case.

    In this particular case, the index I created wasn't used, because the optimizer decided a clustered index scan and a hash join, with the parallelism that allows, was better. I've found it's about 50/50 whether an index on a calculated column is better or not. Depends on what it's being used for.
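For anyone reproducing the test, one way to confirm which plan the optimizer picked (clustered index scan plus hash join versus a seek on the computed-column index) is to capture the actual plan alongside the statistics output. This is a general-purpose sketch, not part of the original test script:

-- Emit the actual execution plan as XML with the results,
-- so you can see whether IDX_Col6Cleaned was used or ignored.
set statistics xml on;

declare @Col1 int;
select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday;

set statistics xml off;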

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Oh, and just to test the original idea:

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on CAST( FLOOR( CAST( (col6) AS FLOAT ) ) AS DATETIME ) = businessday

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 6 ms.

Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
*/

Almost as fast as the computed column, but not quite. Since the first one ignored the index on the column, that makes sense.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the time it took to set up the test, and the additional info.

    Good things to keep in mind.

    -Luke.


  • GSquared,

    Can you test it with the additional criteria in the original query: where t.fiscal_year = 2008?

    This may (or may not) make a difference in the queries.

    I'd try, but I don't have the test tables I need here at work.

    😎

• While yer testin', if you wouldn't mind running the original again with the dateadd(datediff()) bit instead of the cast(floor()), I'd be interested to see whether that might eliminate the difference between the computed column and the original query.
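For reference, that variant against the same test tables would look something like this (a sketch only; GSquared's actual rerun isn't shown in the thread):

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    -- same date-stripping as the computed column, done inline in the join
    on dateadd(day, datediff(day, 0, col6), 0) = businessday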

    thanks.

    -Luke.


  • And, to throw more into the mix, actual performance in the OP's environment may be different due to storage/indexing on the tables.

    I guess the best way to test would be to have the OP post the DDL (create statements for tables and indexes), populate those with test data and go from there.
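Purely as an illustration of what that DDL post might look like (the column names come from the query in the first post, but the data types, keys, and indexes below are guesses, not the OP's actual schema):

-- Hypothetical table shapes only, to show the kind of DDL that would help.
CREATE TABLE dbo.EventSQL (
    eventId   int IDENTITY(1,1) PRIMARY KEY,
    startTime datetime NOT NULL,
    sqlText   nvarchar(max) NULL,
    hash      varbinary(64) NULL
);

CREATE TABLE dbo.time_day (
    business_date    datetime PRIMARY KEY,
    fiscal_year      int NOT NULL,
    calendar_quarter int NOT NULL
);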

    😎

  • I modified the test as follows, to limit by year:

alter table dbo.SomeTable
    add Col6Year as datepart(year, col6);

drop index IDX_Col6Cleaned on dbo.SomeTable;

create index IDX_Col6 on dbo.SomeTable (col6year, col6cleaned, col1);

create table #Dates (
    BusinessDay datetime primary key)

insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000

set statistics time on
set statistics io on

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday
where col6year = 2008

/*
SQL Server Execution Times:
   CPU time = 609 ms, elapsed time = 330 ms.
*/

    Still using a clustered index scan instead of the index I created on those computed columns. I even tried persisting the computed columns, but it still insists on using a clustered index scan instead of that index. I checked, and 2008 is 10% of the table, approximately, with the other 90% being approximately evenly split between 9 other years, so it wouldn't appear to be a selectivity issue (I think).

    When I use an index hint to force use of the covering index, it uses a key lookup from that index to get the Col6Cleaned value, and that makes it slower.
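For completeness, the hinted version referred to above would look roughly like this; the WITH (INDEX(...)) syntax is standard T-SQL, but the exact query GSquared ran isn't shown in the thread:

declare @Col1 int

select @col1 = col1
from dbo.sometable with (index (IDX_Col6))  -- force the covering index created above
inner join #dates
    on col6cleaned = businessday
where col6year = 2008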

    This part of the behavior seems odd to me. I'm not used to seeing covering indexes being ignored in favor of clustered index scans.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
