Performance impact of using functions in a join

• I am curious whether I could take a performance hit from using system functions in a join. Below is a simple example:

SELECT
    eventId
    ,startTime
    ,sqlText
    ,hash
FROM MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
    ON CAST( FLOOR( CAST( (e.starttime) AS FLOAT ) ) AS DATETIME ) = t.business_date
WHERE t.fiscal_year = 2008
    AND t.calendar_quarter = 1

    Basically I am dropping the hour and minute data on the starttime in the 'e' table to join it to my calendar reference table.

    Thanks,

    Scot

• In a word, YES. Any time you use functions like that on one side of a join, it will most likely prevent index usage on that side. Perhaps you might want to investigate an indexed calculated column that you could join on instead?

Also, there's a faster way to drop the time off of a datetime: no CAST or CONVERT, it keeps everything as a datetime, so it's a bit faster.

    SELECT Dateadd(dd, datediff(d,0,getdate()), 0)
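A minimal sketch of that indexed calculated-column idea against the OP's EventSQL table might look like the following; the column and index names here (startDate, IX_EventSQL_startDate) are made up for illustration and not from the original post:

-- Hypothetical sketch: add a computed column that strips the time portion,
-- then index it so the join can seek rather than scan.
ALTER TABLE MyDatabase1.dbo.EventSQL
    ADD startDate AS DATEADD(dd, DATEDIFF(dd, 0, startTime), 0);

CREATE INDEX IX_EventSQL_startDate
    ON MyDatabase1.dbo.EventSQL (startDate);

-- The join could then compare the computed column directly:
--   ON e.startDate = t.business_date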

    -Luke.


• Thanks Luke. Just to make sure I understand what you are saying: I could add a column to the table I'm joining, with a default constraint that uses a function to calculate the value from the starttime column, then index my new column and use it in the join. Is that correct?

• Sorry, I didn't mean calculated column, I meant computed column... But basically that's the gist of it. TechNet can go through it a lot more thoroughly than I can.

You can read all about it here...

    http://technet.microsoft.com/en-us/library/ms191250(SQL.90).aspx


  • This is how I would rewrite the query you originally posted:

SELECT
    eventId
    ,startTime
    ,sqlText
    ,hash
FROM MyDatabase1.dbo.EventSQL e
INNER JOIN MyDatabase2.dbo.time_day t
    ON (e.starttime >= t.business_date
        AND e.starttime < DATEADD(dd, 1, t.business_date))
WHERE t.fiscal_year = 2008
    AND t.calendar_quarter = 1
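For that range predicate to pay off, starttime itself needs a usable index. As a sketch only (the index name and INCLUDE list are assumptions, not from the original post), something along these lines might be worth testing:

-- Hypothetical supporting index so the range join on starttime can seek,
-- with the selected columns included to avoid lookups.
CREATE INDEX IX_EventSQL_startTime
    ON MyDatabase1.dbo.EventSQL (startTime)
    INCLUDE (eventId, sqlText, hash);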

    😎

  • Thanks to both of you. I really appreciate your help!

  • Be sure to test both ways. I'd be interested in knowing which works better/easier for you.

    😎

• Me too. The computed column will have some overhead to maintain the index and such, so Lynn's way could be faster. I've no idea, as I don't have a readily available server to test on...

    -Luke.


  • I just ran a test on this.

alter table dbo.SomeTable
    add Col6Cleaned as dateadd(day, datediff(day, 0, col6), 0) -- Just the date, no time

create index IDX_Col6Cleaned on dbo.sometable(col6cleaned)

create table #Dates (
    BusinessDay datetime primary key)

insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000

set statistics time on
set statistics io on

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 1671 ms, elapsed time = 1010 ms.
*/

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6 >= businessday
    and col6 < dateadd(day, 1, businessday)

    dbo.SomeTable is a million rows of semi-random data. Col6 is a datetime that includes random dates and times.

    The first query, using the computed datetime column, took just over 1 second total, including some parallelism that made it take just under 2 seconds of CPU time (dual-core computer).

    I don't have stats on the second query, because I killed it after 35 seconds. With that difference in execution, I didn't see a point to letting it finish (especially when I heard the fans and drives going into overdrive on the computer this database is on).

    The reason I jumped in and tested this is because I've been using and advocating computed columns for datetime queries for several years now, and wanted to make sure it was still a good idea in this case.

    In this particular case, the index I created wasn't used, because the optimizer decided a clustered index scan and a hash join, with the parallelism that allows, was better. I've found it's about 50/50 whether an index on a calculated column is better or not. Depends on what it's being used for.
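For anyone reproducing the test, one way to confirm which plan the optimizer picked (clustered index scan plus hash join versus a seek on the computed-column index) is to capture the actual plan alongside the statistics output. This is a general-purpose sketch, not part of the original test script:

-- Emit the actual execution plan as XML with the results,
-- so you can see whether IDX_Col6Cleaned was used or ignored.
set statistics xml on;

declare @Col1 int;
select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday;

set statistics xml off;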

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Oh, and just to test the original idea:

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on CAST( FLOOR( CAST( (col6) AS FLOAT ) ) AS DATETIME ) = businessday

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 6 ms.

Table '#Dates'. Scan count 3, logical reads 67, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'SomeTable'. Scan count 3, logical reads 16625, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
*/

Almost as fast as the computed column, but not quite. Since the first one ignored the index on the column, that makes sense.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the time it took to set up the test, and the additional info.

    Good things to keep in mind.

    -Luke.


  • GSquared,

    Can you test it with the additional criteria in the original query: where t.fiscal_year = 2008?

    This may (or may not) make a difference in the queries.

    I'd try, but I don't have the test tables I need here at work.

    😎

• While yer testin', if you wouldn't mind running the original again with the dateadd(datediff()) bit instead of the cast(floor()), I'd be interested to see whether that might eliminate the difference between the computed column and the original query.
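For reference, that variant against the same test tables would look something like this (a sketch only; GSquared's actual rerun isn't shown in the thread):

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    -- same date-stripping as the computed column, done inline in the join
    on dateadd(day, datediff(day, 0, col6), 0) = businessday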

    thanks.

    -Luke.


  • And, to throw more into the mix, actual performance in the OP's environment may be different due to storage/indexing on the tables.

    I guess the best way to test would be to have the OP post the DDL (create statements for tables and indexes), populate those with test data and go from there.
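Purely as an illustration of what that DDL post might look like (the column names come from the query in the first post, but the data types, keys, and indexes below are guesses, not the OP's actual schema):

-- Hypothetical table shapes only, to show the kind of DDL that would help.
CREATE TABLE dbo.EventSQL (
    eventId   int IDENTITY(1,1) PRIMARY KEY,
    startTime datetime NOT NULL,
    sqlText   nvarchar(max) NULL,
    hash      varbinary(64) NULL
);

CREATE TABLE dbo.time_day (
    business_date    datetime PRIMARY KEY,
    fiscal_year      int NOT NULL,
    calendar_quarter int NOT NULL
);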

    😎

  • I modified the test as follows, to limit by year:

alter table dbo.SomeTable
    add Col6Year as datepart(year, col6);

drop index IDX_Col6Cleaned on dbo.SomeTable;

create index IDX_Col6 on dbo.SomeTable (col6year, col6cleaned, col1);

create table #Dates (
    BusinessDay datetime primary key)

insert into #dates (businessday)
select dateadd(day, number, '1/1/2000')
from dbo.numbers -- table of numbers from 0 to 10000

set statistics time on
set statistics io on

declare @Col1 int

select @col1 = col1
from dbo.sometable
inner join #dates
    on col6cleaned = businessday
where col6year = 2008

/*
SQL Server Execution Times:
   CPU time = 609 ms, elapsed time = 330 ms.
*/

    Still using a clustered index scan instead of the index I created on those computed columns. I even tried persisting the computed columns, but it still insists on using a clustered index scan instead of that index. I checked, and 2008 is 10% of the table, approximately, with the other 90% being approximately evenly split between 9 other years, so it wouldn't appear to be a selectivity issue (I think).

    When I use an index hint to force use of the covering index, it uses a key lookup from that index to get the Col6Cleaned value, and that makes it slower.
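For completeness, the hinted version referred to above would look roughly like this; the WITH (INDEX(...)) syntax is standard T-SQL, but the exact query GSquared ran isn't shown in the thread:

declare @Col1 int

select @col1 = col1
from dbo.sometable with (index (IDX_Col6))  -- force the covering index created above
inner join #dates
    on col6cleaned = businessday
where col6year = 2008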

    This part of the behavior seems odd to me. I'm not used to seeing covering indexes being ignored in favor of clustered index scans.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
