February 9, 2012 at 7:20 am
Hi,
I have a rather large table (2-3 mill rows) and im creating a view to put on top of this table.
This view will transform some foreign keys to a prefixed string using case statements and then grouping by these values.
Example of such a construct:
CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1) IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber AS Account
If i do this select without the grouping, then the results starts showing up almost instantly.
But once i start grouping by these constructs, then preformance is drasticly slowed down. I have 6 such constructs and grouping by them will increase processing time from 1 sec to 3,5 mins.
Do you clever folks know any way i could optimize this? From the execution plan it looks like most of the time is spent sorting the result.
February 9, 2012 at 8:00 am
Could you please provide the full query at least, please. Grouping is relatively expensive operation as well as string concatination. It's impossible to advise on optimisation without having underlying object structure. If you could provide DDL for tables and the rest as per link in my signature, I believe you will find a lot of good advices posted.
February 9, 2012 at 2:51 pm
It will definitely have some impact because now the the QP has to apply the case and then do the aggregation(s) which are may need ordering in addition to grouping. For example consider these 3 queries in AdventureWorks2008:
SELECT
CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1
ELSE 2
END
FROM
person.Person AS P
GROUP BY
P.LastName
SELECT
CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1
ELSE 2
END
FROM
person.Person AS P
GROUP BY
CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1
ELSE 2
END
SELECT
CASE WHEN LEFT(P.LastName, 1) = 'A' THEN 1
ELSE 2
END
FROM
person.Person AS P
The first query is faster than the second because the ordering can be pushed to the index scan and a fast stream aggregate can be used for the aggregation and the compute scalar for the CASE can be done last. In the second query the same index is scanned but it can't be ordered because of the function, the CASE must be evaluated before the grouping and a more expensive HASH MATCH must be used for the aggregation since the results aren't ordered. The last query is the least expensive because there is no aggregation just an index scan and a compute scalar for the CASE.
IF you posted DDL and the full query we might be able to offer some optimization advice. You can't really compare a query with aggregation and a simple select because the aggregation will add to the cost.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 10, 2012 at 2:04 am
Thanks for both your answers and sorry for my late reply.
First of. Here is DDL of the main selected table "PostingYTD", its indexes and the cross joined functions that create prefixes. Further down you will find the complete select statement.
I have not provided the DDL of the DimensionBase tables, they are all just regular dimensions and is not the subject of my question.
Thank you for all helpful suggestions. Im running out of ideas here. I cant think of any way to speed it up, so hopefully you guys have trick hidden up your sleave.
DDL of PostingYTD, indexes and Function:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Fact2012Base].[PostingYTD](
[Year] [int] NOT NULL,
[Month] [int] NULL,
[Period] [nvarchar](50) NULL,
[SourceSystem_FK] [int] NOT NULL,
[TransactionVersion_FK] [int] NOT NULL,
[Supplier_FK] [int] NULL,
[Customer_FK] [int] NULL,
[Vessel_FK] [int] NULL,
[SFICode_FK] [int] NULL,
[ProductGroupGeoteam_FK] [int] NULL,
[CurrencyOriginal_FK] [int] NULL,
[DynamicsDocumentType_FK] [int] NULL,
[StaffCategory_FK] [int] NULL,
[Currency_FK] [int] NOT NULL,
[ExchangeRateType_FK] [int] NOT NULL,
[Account_FK] [int] NOT NULL,
[TransactionType_FK] [int] NOT NULL,
[Department_FK] [int] NOT NULL,
[CounterPartCompany_FK] [int] NULL,
[Project_FK] [int] NULL,
[Company_FK] [int] NOT NULL,
[Amount_YTD] [float] NULL,
[Amount_EUR_YTD] [float] NULL,
[Updated] [datetime] NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_PostingYTD_Main] ON [Fact2012Base].[PostingYTD]
(
[Year] ASC,
[SourceSystem_FK] ASC,
[TransactionVersion_FK] ASC,
[TransactionType_FK] ASC,
[Amount_YTD] ASC
)
INCLUDE ( [Month],
[Period],
[Currency_FK],
[ExchangeRateType_FK],
[Account_FK],
[Department_FK],
[CounterPartCompany_FK],
[Project_FK],
[Company_FK]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_PostingYTD_Source] ON [Fact2012Base].[PostingYTD]
(
[SourceSystem_FK] ASC
)
INCLUDE ( [Year],
[Month],
[TransactionVersion_FK],
[Currency_FK],
[Account_FK],
[Department_FK],
[CounterPartCompany_FK],
[Company_FK],
[Amount_YTD]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_YTD_Currency] ON [Fact2012Base].[PostingYTD]
(
[Currency_FK] ASC,
[Company_FK] ASC,
[Year] ASC,
[SourceSystem_FK] ASC,
[TransactionType_FK] ASC
)
INCLUDE ( [Month],
[Period],
[TransactionVersion_FK],
[ExchangeRateType_FK],
[Account_FK],
[Department_FK],
[CounterPartCompany_FK],
[Project_FK],
[Amount_YTD]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE FUNCTION [Masterdata].[GetHierarchyPrefixRow]
(
-- Add the parameters for the function here
@hierarchy_FK int,
@hierarchyLevel smallint
)
RETURNS TABLE
AS
RETURN
(
Select Masterdata.GetHierarchyPrefix( @hierarchy_FK, @hierarchyLevel ) as Prefix
)
GO
CREATE FUNCTION [Masterdata].[GetHierarchyPrefix]
(
-- Add the parameters for the function here
@hierarchy_FK int,
@hierarchyLevel smallint
)
RETURNS nvarchar(50)
AS
BEGIN
DECLARE @resultPrefix nvarchar(21)
DECLARE @hierarchyPrefix nvarchar(5)
DECLARE @levelPrefix nvarchar(5)
DECLARE @dimesionPrefix nvarchar(5)
DECLARE @showHierarchyPrefix bit
DECLARE @showLevelPrefix bit
DECLARE @showDimesionPrefix bit
SELECT
@hierarchyPrefix = Masterdata.Hierarchy.Prefix,
@levelPrefix = Masterdata.HierarchyLevel.Prefix,
@dimesionPrefix = Masterdata.Dimension.Prefix,
@showHierarchyPrefix = Masterdata.HierarchyLevel.ShowHierarchyPrefix,
@showLevelPrefix = Masterdata.HierarchyLevel.ShowHierarchyLevelPrefix,
@showDimesionPrefix = Masterdata.HierarchyLevel.ShowDimensionPrefix
FROM Masterdata.Hierarchy INNER JOIN
Masterdata.HierarchyLevel ON Masterdata.Hierarchy.Hierarchy_PK = Masterdata.HierarchyLevel.Hierarchy_FK LEFT OUTER JOIN
Masterdata.Dimension ON Masterdata.HierarchyLevel.Dimension_FK = Masterdata.Dimension.Dimension_PK
WHERE (Masterdata.Hierarchy.Hierarchy_PK = @hierarchy_FK) AND (Masterdata.HierarchyLevel.[Level] = @hierarchyLevel)
SET @resultPrefix = ''
IF @showHierarchyPrefix = 1
SET @resultPrefix = @hierarchyPrefix
IF @showHierarchyPrefix = 1 AND @showLevelPrefix = 1
SET @resultPrefix = @resultPrefix + '.'
IF @showLevelPrefix = 1
SET @resultPrefix = @resultPrefix + @levelPrefix
IF @showLevelPrefix = 1 AND @showDimesionPrefix = 1
SET @resultPrefix = @resultPrefix + '.'
IF @showDimesionPrefix = 1
SET @resultPrefix = @resultPrefix + @dimesionPrefix
IF @showHierarchyPrefix = 1 OR @showLevelPrefix = 1 OR @showDimesionPrefix = 1
SET @resultPrefix = @resultPrefix + '.'
IF @resultPrefix is null
RETURN 'Error Hierarchy level undefined'
RETURN @resultPrefix
END
GO
Here is the select statement with the group by case statements
SELECT
CASE WHEN y.TransactionType_FK IN (1, 5) THEN 'Act' ELSE 'Bud' END AS Scenario,
CASE WHEN y.TransactionType_FK = 5 THEN 'YTD Journals Load' ELSE 'YTD Load' END AS [View],
Year,
Month,
Period,
pEntHier.Prefix + d.FullID AS Entity,
CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1)
IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber AS Account,
pCountPartHier.Prefix + cp.ID AS CounterPart,
pProjHier.Prefix + pc.ID + '_' + p.ID AS Project,
cu.ID AS Currency,
y.Currency_FK,
y.ExchangeRateType_FK,
SUM(Amount_YTD) AS Amount_YTD
FROM
Fact2012Base.PostingYTD AS y
LEFT OUTER JOIN
DimensionBase.Project AS p
ON p.Project_FK = y.Project_FK
INNER JOIN
DimensionBase.Company AS pc
ON p.OwnerCompany_FK = pc.Company_FK
INNER JOIN
DimensionBase.Department AS d
ON y.Department_FK = d.Department_FK
INNER JOIN
DimensionBase.Account AS a
ON y.Account_FK = a.Account_FK
INNER JOIN
DimensionBase.Company AS c
ON y.Company_FK = c.Company_FK
AND y.Currency_FK = c.Currency_FK
INNER JOIN
DimensionBase.Currency AS cu
ON y.Currency_FK = cu.Currency_FK
LEFT OUTER JOIN
DimensionBase.Company AS cp
ON y.CounterpartCompany_FK = cp.Company_FK
CROSS JOIN Masterdata.GetHierarchyPrefixRow(10, 2) AS pCountPartHier
CROSS JOIN Masterdata.GetHierarchyPrefixRow(1, 3) AS pEntHier
CROSS JOIN Masterdata.GetHierarchyPrefixRow(3, 2) AS pAccHier
CROSS JOIN Masterdata.GetHierarchyPrefixRow(9, 1) AS pProjHier
WHERE
(y.Year >= 2010)
AND (y.TransactionVersion_FK <> 192)
AND (y.SourceSystem_FK > 0)
AND (y.TransactionType_FK IN (1, 2, 5))
AND (c.Country_FK = 31)
AND (y.Amount_YTD <> 0)
GROUP BY
CASE WHEN y.TransactionType_FK IN (1, 5) THEN 'Act' ELSE 'Bud' END,
CASE WHEN y.TransactionType_FK = 5 THEN 'YTD Journals Load' ELSE 'YTD Load' END,
Year,
Month,
Period,
pEntHier.Prefix + d.FullID,
CASE WHEN a.SourceSystem_FK = 13 THEN 'i' + pAccHier.Prefix ELSE pAccHier.Prefix + CASE WHEN LEFT(a.AccountNumber, 1)
IN ('3', '4') THEN c.ID + '_' ELSE '' END END + a.AccountNumber,
pCountPartHier.Prefix + cp.ID,
pProjHier.Prefix + pc.ID + '_' + p.ID,
cu.ID,
y.Currency_FK,
y.ExchangeRateType_FK
February 10, 2012 at 3:06 am
I would try to SELECT INTO #table the result of the query including the calculated columns used for grouping (without grouping itself), then the final query would select from #table and group as required
February 10, 2012 at 4:14 am
Eugene Elutin (2/10/2012)
I would try to SELECT INTO #table the result of the query including the calculated columns used for grouping (without grouping itself), then the final query would select from #table and group as required
Thanks for your tip.
I just tried this. The select into #table takes about 2:30 to run, and the final group by query 0:40 sec.
So end result is 3:10 which is marginally faster than doing the select in its original form. (3:30)
It seems as the share amount of data to copy into the temp table is whats making the first step take so long. If i just select (without into and without group by) then the results start coming instantly, but the 1.8 million rows it returns takes 2:30 to load.
February 10, 2012 at 6:08 am
Ok, your "results start coming instantly" means nothing, as most likely they will take even longer to come out in full than SELECT INTO. 1.8 millions should not take 2.30 min to insert. Can you attach the query plan? Next suspect whould be cross joining to user-defined function...
February 10, 2012 at 6:38 am
Hmm.. Its a bit difficult to get the actual execution plan for you. When i turn on include actual execution plan the query just goes on forever. I've been waiting 16 mins now for a result but nothing. I'll let it run for another 15 then i have to include the estimated one instead...
February 10, 2012 at 7:22 am
Seeing your query I don' think your main problem is the CASE statement included in the GROUP BY, I think it more likely related to your CROSS JOIN's to the function and then the fact that you are grouping on data from the function calls. I'd be willing to bet you basically have created a cursor with each function call because SQL is calling the functions once for each row instead of doing it in a set based manner, that just a guess. I'd try putting the results of the function calls in table variables or temp tables since they are not being run based on data in any of the tables so that the functions are only called once each.
I'd be interested to see what your performance would be if you eliminated the function calls.
I'd also be interested in seeing what a server-side trace produced when this query is run.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 10, 2012 at 7:24 am
Ok. I let the query run for an hour, no result. Seems like the actual executionplan is impossible to get.
Anyways, here is the estimated one, if that helps..
February 10, 2012 at 7:56 am
erikpoker (2/10/2012)
Ok. I let the query run for an hour, no result. Seems like the actual executionplan is impossible to get.Anyways, here is the estimated one, if that helps..
I don't see the grouping happening in this query. I do see that it looks like the optimizer was smart enough to only call the cross joined functions once which is a good thing.
You do have one other error in the query which I believe means you aren't getting the results you really want from the query. Although you have specified a LEFT OUTER JOIN to DimensionBase.Project the fact that you have that table INNER JOIN'd to DimensionBase.Company really converts the OUTER JOIN to an INNER JOIN. This is confirmed in the execution plan as you can see that optimizer has only implemented one OUTER JOIN and that is on the last LEFT OUTER JOIN to DimensionBase.Company AS cp ON y.CounterpartCompany_FK = cp.Company_FK. To actually get the first OUTER JOIN I believe you need to convert the INNER JOIN to DimensionBase.Company AS pc ON p.OwnerCompany_FK = pc.Company_FK to a LEFT OUTER JOIN and I think that will get you what you really want.
I'm actually a little surprised you are getting an OUTER JOIN on the second one since you have a filter on that table in the WHERE clause, but I'm not smart enough to understand why that would happen. I may ask someone who is though.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 10, 2012 at 8:12 am
The grouping is not happening in that queryplan. I attached the queryplan for the previous suggestion to first put the results in a temp table and then group the results. So the attached queryplan only applies to the first select into query, without any grouping.
As for your suggestions on the outer joins, you are right, a bit of a mixup there, but not affecting things much performance wise.
February 10, 2012 at 8:18 am
erikpoker (2/10/2012)
As for your suggestions on the outer joins, you are right, a bit of a mixup there, but not affecting things much performance wise.
Yeah it's not a performance issue, but if you aren't getting the right data then does performance really matter? I'm normally concerned with getting the right results first and then tuning to get the right results as quickly as possible.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 10, 2012 at 8:31 am
Jack Corbett (2/10/2012)
Yeah it's not a performance issue, but if you aren't getting the right data then does performance really matter? I'm normally concerned with getting the right results first and then tuning to get the right results as quickly as possible.
Lol. Actually it does belive it or not 🙂 I have a team of guys armed with excel and god-knows-what, whos job is to verify the numbers returned by this view. So my job is just to get the thing running as fast as humanly (or computerly) possible.
Right now i cant see many more possibilities to speed this thingy up. Maybe an indexed view, or a prepopulated table, but i dont really wanna go there...
February 10, 2012 at 8:47 am
Can you post the execution plan for the group by?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply