Grouping on CASE statements = slow query?

Question

Grouping on CASE statements = slow query?

erikpoker

Ten Centuries

Points: 1135
More actions
February 9, 2012 at 7:20 am

#256539

Hi,
I have a rather large table (2-3 mill rows) and im creating a view to put on top of this table.
This view will transform some foreign keys to a prefixed string using case statements and then grouping by these values.
Example of such a construct:
CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1) IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber AS Account
If i do this select without the grouping, then the results starts showing up almost instantly.
But once i start grouping by these constructs, then preformance is drasticly slowed down. I have 6 such constructs and grouping by them will increase processing time from 1 sec to 3,5 mins.
Do you clever folks know any way i could optimize this? From the execution plan it looks like most of the time is spent sorting the result.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

Could you please provide the full query at least, please. Grouping is relatively expensive operation as well as string concatination. It's impossible to advise on optimisation without having underlying object structure. If you could provide DDL for tables and the rest as per link in my signature, I believe you will find a lot of good advices posted.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Jack Corbett SSC Guru Points: 184397 More actions · Answer 2

It will definitely have some impact because now the the QP has to apply the case and then do the aggregation(s) which are may need ordering in addition to grouping. For example consider these 3 queries in AdventureWorks2008:

SELECT

CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1

ELSE 2

END

FROM

person.Person AS P

GROUP BY

P.LastName

SELECT

CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1

ELSE 2

END

FROM

person.Person AS P

GROUP BY

CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1

ELSE 2

END

SELECT

CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1

ELSE 2

END

FROM

person.Person AS P

The first query is faster than the second because the ordering can be pushed to the index scan and a fast stream aggregate can be used for the aggregation and the compute scalar for the CASE can be done last. In the second query the same index is scanned but it can't be ordered because of the function, the CASE must be evaluated before the grouping and a more expensive HASH MATCH must be used for the aggregation since the results aren't ordered. The last query is the least expensive because there is no aggregation just an index scan and a compute scalar for the CASE.

IF you posted DDL and the full query we might be able to offer some optimization advice. You can't really compare a query with aggregation and a simple select because the aggregation will add to the cost.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

erikpoker Ten Centuries Points: 1135 More actions · Answer 3

Thanks for both your answers and sorry for my late reply.

First of. Here is DDL of the main selected table "PostingYTD", its indexes and the cross joined functions that create prefixes. Further down you will find the complete select statement.

I have not provided the DDL of the DimensionBase tables, they are all just regular dimensions and is not the subject of my question.

Thank you for all helpful suggestions. Im running out of ideas here. I cant think of any way to speed it up, so hopefully you guys have trick hidden up your sleave.

DDL of PostingYTD, indexes and Function:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [Fact2012Base].[PostingYTD](

[Year] [int] NOT NULL,

[Month] [int] NULL,

[Period] [nvarchar](50) NULL,

[SourceSystem_FK] [int] NOT NULL,

[TransactionVersion_FK] [int] NOT NULL,

[Supplier_FK] [int] NULL,

[Customer_FK] [int] NULL,

[Vessel_FK] [int] NULL,

[SFICode_FK] [int] NULL,

[ProductGroupGeoteam_FK] [int] NULL,

[CurrencyOriginal_FK] [int] NULL,

[DynamicsDocumentType_FK] [int] NULL,

[StaffCategory_FK] [int] NULL,

[Currency_FK] [int] NOT NULL,

[ExchangeRateType_FK] [int] NOT NULL,

[Account_FK] [int] NOT NULL,

[TransactionType_FK] [int] NOT NULL,

[Department_FK] [int] NOT NULL,

[CounterPartCompany_FK] [int] NULL,

[Project_FK] [int] NULL,

[Company_FK] [int] NOT NULL,

[Amount_YTD] [float] NULL,

[Amount_EUR_YTD] [float] NULL,

[Updated] [datetime] NULL

) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_PostingYTD_Main] ON [Fact2012Base].[PostingYTD]

(

[Year] ASC,

[SourceSystem_FK] ASC,

[TransactionVersion_FK] ASC,

[TransactionType_FK] ASC,

[Amount_YTD] ASC

)

INCLUDE ( [Month],

[Period],

[Currency_FK],

[ExchangeRateType_FK],

[Account_FK],

[Department_FK],

[CounterPartCompany_FK],

[Project_FK],

[Company_FK]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_PostingYTD_Source] ON [Fact2012Base].[PostingYTD]

(

[SourceSystem_FK] ASC

)

INCLUDE ( [Year],

[Month],

[TransactionVersion_FK],

[Currency_FK],

[Account_FK],

[Department_FK],

[CounterPartCompany_FK],

[Company_FK],

[Amount_YTD]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_YTD_Currency] ON [Fact2012Base].[PostingYTD]

(

[Currency_FK] ASC,

[Company_FK] ASC,

[Year] ASC,

[SourceSystem_FK] ASC,

[TransactionType_FK] ASC

)

INCLUDE ( [Month],

[Period],

[TransactionVersion_FK],

[ExchangeRateType_FK],

[Account_FK],

[Department_FK],

[CounterPartCompany_FK],

[Project_FK],

[Amount_YTD]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE FUNCTION [Masterdata].[GetHierarchyPrefixRow]

(

-- Add the parameters for the function here

@hierarchy_FK int,

@hierarchyLevel smallint

)

RETURNS TABLE

AS

RETURN

(

Select Masterdata.GetHierarchyPrefix( @hierarchy_FK, @hierarchyLevel ) as Prefix

)

GO

CREATE FUNCTION [Masterdata].[GetHierarchyPrefix]

(

-- Add the parameters for the function here

@hierarchy_FK int,

@hierarchyLevel smallint

)

RETURNS nvarchar(50)

AS

BEGIN

DECLARE @resultPrefix nvarchar(21)

DECLARE @hierarchyPrefix nvarchar(5)

DECLARE @levelPrefix nvarchar(5)

DECLARE @dimesionPrefix nvarchar(5)

DECLARE @showHierarchyPrefix bit

DECLARE @showLevelPrefix bit

DECLARE @showDimesionPrefix bit

SELECT

@hierarchyPrefix = Masterdata.Hierarchy.Prefix,

@levelPrefix = Masterdata.HierarchyLevel.Prefix,

@dimesionPrefix = Masterdata.Dimension.Prefix,

@showHierarchyPrefix = Masterdata.HierarchyLevel.ShowHierarchyPrefix,

@showLevelPrefix = Masterdata.HierarchyLevel.ShowHierarchyLevelPrefix,

@showDimesionPrefix = Masterdata.HierarchyLevel.ShowDimensionPrefix

FROM Masterdata.Hierarchy INNER JOIN

Masterdata.HierarchyLevel ON Masterdata.Hierarchy.Hierarchy_PK = Masterdata.HierarchyLevel.Hierarchy_FK LEFT OUTER JOIN

Masterdata.Dimension ON Masterdata.HierarchyLevel.Dimension_FK = Masterdata.Dimension.Dimension_PK

WHERE (Masterdata.Hierarchy.Hierarchy_PK = @hierarchy_FK) AND (Masterdata.HierarchyLevel.[Level] = @hierarchyLevel)

SET @resultPrefix = ''

IF @showHierarchyPrefix = 1

SET @resultPrefix = @hierarchyPrefix

IF @showHierarchyPrefix = 1 AND @showLevelPrefix = 1

SET @resultPrefix = @resultPrefix + '.'

IF @showLevelPrefix = 1

SET @resultPrefix = @resultPrefix + @levelPrefix

IF @showLevelPrefix = 1 AND @showDimesionPrefix = 1

SET @resultPrefix = @resultPrefix + '.'

IF @showDimesionPrefix = 1

SET @resultPrefix = @resultPrefix + @dimesionPrefix

IF @showHierarchyPrefix = 1 OR @showLevelPrefix = 1 OR @showDimesionPrefix = 1

SET @resultPrefix = @resultPrefix + '.'

IF @resultPrefix is null

RETURN 'Error Hierarchy level undefined'

RETURN @resultPrefix

END

GO

Here is the select statement with the group by case statements

SELECT

CASE WHEN y.TransactionType_FK IN (1, 5) THEN 'Act' ELSE 'Bud' END AS Scenario,

CASE WHEN y.TransactionType_FK = 5 THEN 'YTD Journals Load' ELSE 'YTD Load' END AS [View],

Year,

Month,

Period,

pEntHier.Prefix + d.FullID AS Entity,

CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1)

IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber AS Account,

pCountPartHier.Prefix + cp.ID AS CounterPart,

pProjHier.Prefix + pc.ID + '_' + p.ID AS Project,

cu.ID AS Currency,

y.Currency_FK,

y.ExchangeRateType_FK,

SUM(Amount_YTD) AS Amount_YTD

FROM

Fact2012Base.PostingYTD AS y

LEFT OUTER JOIN

DimensionBase.Project AS p

ON p.Project_FK = y.Project_FK

INNER JOIN

DimensionBase.Company AS pc

ON p.OwnerCompany_FK = pc.Company_FK

INNER JOIN

DimensionBase.Department AS d

ON y.Department_FK = d.Department_FK

INNER JOIN

DimensionBase.Account AS a

ON y.Account_FK = a.Account_FK

INNER JOIN

DimensionBase.Company AS c

ON y.Company_FK = c.Company_FK

AND y.Currency_FK = c.Currency_FK

INNER JOIN

DimensionBase.Currency AS cu

ON y.Currency_FK = cu.Currency_FK

LEFT OUTER JOIN

DimensionBase.Company AS cp

ON y.CounterpartCompany_FK = cp.Company_FK

CROSS JOIN Masterdata.GetHierarchyPrefixRow(10, 2) AS pCountPartHier

CROSS JOIN Masterdata.GetHierarchyPrefixRow(1, 3) AS pEntHier

CROSS JOIN Masterdata.GetHierarchyPrefixRow(3, 2) AS pAccHier

CROSS JOIN Masterdata.GetHierarchyPrefixRow(9, 1) AS pProjHier

WHERE

(y.Year >= 2010)

AND (y.TransactionVersion_FK <> 192)

AND (y.SourceSystem_FK > 0)

AND (y.TransactionType_FK IN (1, 2, 5))

AND (c.Country_FK = 31)

AND (y.Amount_YTD <> 0)

GROUP BY

CASE WHEN y.TransactionType_FK IN (1, 5) THEN 'Act' ELSE 'Bud' END,

CASE WHEN y.TransactionType_FK = 5 THEN 'YTD Journals Load' ELSE 'YTD Load' END,

Year,

Month,

Period,

pEntHier.Prefix + d.FullID,

CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1)

IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber,

pCountPartHier.Prefix + cp.ID,

pProjHier.Prefix + pc.ID + '_' + p.ID,

cu.ID,

y.Currency_FK,

y.ExchangeRateType_FK

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 4

I would try to SELECT INTO #table the result of the query including the calculated columns used for grouping (without grouping itself), then the final query would select from #table and group as required

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

erikpoker Ten Centuries Points: 1135 More actions · Answer 5

Eugene Elutin (2/10/2012)
I would try to SELECT INTO #table the result of the query including the calculated columns used for grouping (without grouping itself), then the final query would select from #table and group as required

Thanks for your tip.

I just tried this. The select into #table takes about 2:30 to run, and the final group by query 0:40 sec.

So end result is 3:10 which is marginally faster than doing the select in its original form. (3:30)

It seems as the share amount of data to copy into the temp table is whats making the first step take so long. If i just select (without into and without group by) then the results start coming instantly, but the 1.8 million rows it returns takes 2:30 to load.

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 6

Ok, your "results start coming instantly" means nothing, as most likely they will take even longer to come out in full than SELECT INTO. 1.8 millions should not take 2.30 min to insert. Can you attach the query plan? Next suspect whould be cross joining to user-defined function...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

erikpoker Ten Centuries Points: 1135 More actions · Answer 7

Hmm.. Its a bit difficult to get the actual execution plan for you. When i turn on include actual execution plan the query just goes on forever. I've been waiting 16 mins now for a result but nothing. I'll let it run for another 15 then i have to include the estimated one instead...

Jack Corbett SSC Guru Points: 184397 More actions · Answer 8

Seeing your query I don' think your main problem is the CASE statement included in the GROUP BY, I think it more likely related to your CROSS JOIN's to the function and then the fact that you are grouping on data from the function calls. I'd be willing to bet you basically have created a cursor with each function call because SQL is calling the functions once for each row instead of doing it in a set based manner, that just a guess. I'd try putting the results of the function calls in table variables or temp tables since they are not being run based on data in any of the tables so that the functions are only called once each.

I'd be interested to see what your performance would be if you eliminated the function calls.

I'd also be interested in seeing what a server-side trace produced when this query is run.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

erikpoker Ten Centuries Points: 1135 More actions · Answer 9

Ok. I let the query run for an hour, no result. Seems like the actual executionplan is impossible to get.

Anyways, here is the estimated one, if that helps..

Jack Corbett SSC Guru Points: 184397 More actions · Answer 10

erikpoker (2/10/2012)
Ok. I let the query run for an hour, no result. Seems like the actual executionplan is impossible to get.
Anyways, here is the estimated one, if that helps..

I don't see the grouping happening in this query. I do see that it looks like the optimizer was smart enough to only call the cross joined functions once which is a good thing.

You do have one other error in the query which I believe means you aren't getting the results you really want from the query. Although you have specified a LEFT OUTER JOIN to DimensionBase.Project the fact that you have that table INNER JOIN'd to DimensionBase.Company really converts the OUTER JOIN to an INNER JOIN. This is confirmed in the execution plan as you can see that optimizer has only implemented one OUTER JOIN and that is on the last LEFT OUTER JOIN to DimensionBase.Company AS cp ON y.CounterpartCompany_FK = cp.Company_FK. To actually get the first OUTER JOIN I believe you need to convert the INNER JOIN to DimensionBase.Company AS pc ON p.OwnerCompany_FK = pc.Company_FK to a LEFT OUTER JOIN and I think that will get you what you really want.

I'm actually a little surprised you are getting an OUTER JOIN on the second one since you have a filter on that table in the WHERE clause, but I'm not smart enough to understand why that would happen. I may ask someone who is though.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

erikpoker Ten Centuries Points: 1135 More actions · Answer 11

The grouping is not happening in that queryplan. I attached the queryplan for the previous suggestion to first put the results in a temp table and then group the results. So the attached queryplan only applies to the first select into query, without any grouping.

As for your suggestions on the outer joins, you are right, a bit of a mixup there, but not affecting things much performance wise.

Jack Corbett SSC Guru Points: 184397 More actions · Answer 12

erikpoker (2/10/2012)
As for your suggestions on the outer joins, you are right, a bit of a mixup there, but not affecting things much performance wise.

Yeah it's not a performance issue, but if you aren't getting the right data then does performance really matter? I'm normally concerned with getting the right results first and then tuning to get the right results as quickly as possible.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

erikpoker Ten Centuries Points: 1135 More actions · Answer 13

Jack Corbett (2/10/2012)
Yeah it's not a performance issue, but if you aren't getting the right data then does performance really matter? I'm normally concerned with getting the right results first and then tuning to get the right results as quickly as possible.

Lol. Actually it does belive it or not 🙂 I have a team of guys armed with excel and god-knows-what, whos job is to verify the numbers returned by this view. So my job is just to get the thing running as fast as humanly (or computerly) possible.

Right now i cant see many more possibilities to speed this thingy up. Maybe an indexed view, or a prepopulated table, but i dont really wanna go there...

Jack Corbett SSC Guru Points: 184397 More actions · Answer 14

Can you post the execution plan for the group by?

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question