Performance of functions on group by

Question

Performance of functions on group by

Adam Bean

One Orange Chip

Points: 26631
More actions
March 3, 2010 at 1:04 pm

#139149

Hey all,
Having some troubles here. The query I'm working on takes 34 seconds, which in itself is obviously unacceptable. The indexing is rather good, but can be tweaked a tad more.
The problem is really the SUM aggregate I'm using and then grouping on the rest of the columns. I went from 1 sec to 34 seconds by adding the SUM/GROUP BY. Now, I'm at a loss as for S&G's I removed all my system functions from the GROUP BY (ISNULL, CASE, REPLACE, DATEDIFF), and I dropped to 14 seconds. The execution plan drastically differs after this change. With the functions the majority of the execution was in a sort (79%) and the next biggest (19%) on compute scalar. Without them, most of the time (49%) is spent on hash match and the next biggest (17%) on parallelism (repartition streams).
My main question at this time is, do I need to retain my GROUP BY to match my SELECT statement in regards to those system functions mentioned above? The data actually looks the same ... but here is how I am using them:
ISNULL(REPLACE(dcio.[Name],',',''),'_Unassigned')
,CASE
WHEN asr.[Name] IS NULL OR asr.[Name] = '' THEN '_Unassigned'
ELSE REPLACE(asr.[Name], ',', '')
END
DATEDIFF(dd, ve.[StartDate], GETDATE())
So again, I'm curious to as If I'm lacking fundamental knowledge in regards to as I have to retain these functions in a GROUP BY?
Thanks!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 1

Adam Bean (3/3/2010)
Hey all,
Having some troubles here. The query I'm working on takes 34 seconds, which in itself is obviously unacceptable. The indexing is rather good, but can be tweaked a tad more.
The problem is really the SUM aggregate I'm using and then grouping on the rest of the columns. I went from 1 sec to 34 seconds by adding the SUM/GROUP BY. Now, I'm at a loss as for S&G's I removed all my system functions from the GROUP BY (ISNULL, CASE, REPLACE, DATEDIFF), and I dropped to 14 seconds. The execution plan drastically differs after this change. With the functions the majority of the execution was in a sort (79%) and the next biggest (19%) on compute scalar. Without them, most of the time (49%) is spent on hash match and the next biggest (17%) on parallelism (repartition streams).
My main question at this time is, do I need to retain my GROUP BY to match my SELECT statement in regards to those system functions mentioned above? The data actually looks the same ... but here is how I am using them:
ISNULL(REPLACE(dcio.[Name],',',''),'_Unassigned')
,CASE
WHEN asr.[Name] IS NULL OR asr.[Name] = '' THEN '_Unassigned'
ELSE REPLACE(asr.[Name], ',', '')
END
DATEDIFF(dd, ve.[StartDate], GETDATE())
So again, I'm curious to as If I'm lacking fundamental knowledge in regards to as I have to retain these functions in a GROUP BY?
Thanks!

Adam.... can you post the whole query?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Adam Bean One Orange Chip Points: 26631 More actions · Answer 2

Jeff,

I actually ended up resolving the performance problems by taking the joins to the tables which did the aggregates and put them into a sub query and it now runs in less than 2 seconds.

I am still curious to know though as by default, I've always recreated my grouping based on my select (with system functions and such). I never really questioned it until now. Do you need functions such as REPLACE/ISNULL/DATEDIFF/etc. in a grouping?

Thanks!