August 1, 2017 at 10:02 am
Hi,
I'm a DW/ETL developer and I've created a data load process to load an aggregated table in my data warehouse. One of the joins in the query contains a 'BETWEEN' operator on DateKey (an integer representation of date in format YYYYMMDD) and according to my execution plan, the clustered index seek takes up 43% of the query cost. I don't usually use a BETWEEN in a join if I can avoid it, so I'm not sure if this is actually the issue or not, but I'm keen to improve the performance of this query as it's causing my aggregated table load to run very slowly.
I've attached the execution plan.
The table with the slow index is defined as below.CREATE TABLE [dbo].[tblCostPrices](
[StoreKey] [INT] NOT NULL,
[ProductKey] [INT] NOT NULL,
[FromDateKey] [INT] NOT NULL,
[ToDateKey] [INT] NOT NULL,
[PriceLevel] [INT] NOT NULL,
[CostPrice] [FLOAT] NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CL_CostPrices] ON [dbo].[tblCostPrices]
(
[StoreKey] ASC,
[ProductKey] ASC,
[PriceLevel] ASC,
[FromDateKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Anybody got any advice?
Thanks.
August 1, 2017 at 10:22 am
The issue is that you're selecting all the rows. To be able to take full advantage of an index, you need to limit the results.
You're obfuscating the query by using a view and you attached an estimated plan instead of an actual plan.
The BETWEEN is usually not a performance problem.
August 1, 2017 at 10:26 am
What is the query? In your execution plan all I see is this:
select vss.* from aggr.vfactSaleSummary as vss
August 1, 2017 at 10:44 am
Sorry I'd forgotten I'd used views. The vfactSalesSummary view is:SELECT StoreKey
, EmployeeKey
, EmployeeClassKey
, TillKey
, DateKey
, Hour
, Quarter
, SUM(TransactionCount) AS TransactionCount
, SUM(Quantity) AS Quantity
, SUM(SalesInclVAT) AS SalesInclVAT
, SUM(SalesVAT) AS SalesVAT
, SUM(SalesExclVAT) AS SalesExclVAT
, SUM(Cost) AS Cost
, SUM(DiscountInclVAT) AS DiscountInclVAT
, SUM(DiscountExclVAT) AS DiscountExclVAT
, SUM(LoyaltyCardQuantity) AS LoyaltyCardQuantity
, SUM(LoyaltyCardAmount) AS LoyaltyCardAmount
, SUM(DrinksLoyaltyAmount) AS DrinksLoyaltyAmount
, SUM(NewspapersLoyaltyAmount) AS NewspapersLoyaltyAmount
, SUM(ExtrasLoyaltyAmount) AS ExtrasLoyaltyAmount
, SUM(NegativeTransactionCount) AS NegativeTransactionCount
, SUM(ZeroValueTransactionCount) AS ZeroValueTransactionCount
, SUM(UnderFiftyPenceTransactionCount) AS UnderFiftyPenceTransactionCount
, SUM(UnderOnePoundTransactionCount) AS UnderOnePoundTransactionCount
, SUM(UnderFivePoundsTransactionCount) AS UnderFivePoundsTransactionCount
, SUM(UnderTenPoundsTransactionCount) AS UnderTenPoundsTransactionCount
, SUM(UnderFifteenPoundsTransactionCount) AS UnderFifteenPoundsTransactionCount
, SUM(UnderTwentyPoundsTransactionCount) AS UnderTwentyPoundsTransactionCount
, SUM(OverTwentyPoundsTransactionCount) AS OverTwentyPoundsTransactionCount
, SUM(UnderOneDollarFiftyTransactionCount) AS UnderOneDollarFiftyTransactionCount
, SUM(UnderTwoTransactionCount) AS UnderTwoTransactionCount
, SUM(ReturnsQuantity) AS ReturnsQuantity
, SUM(ReturnSalesInclVAT) AS ReturnSalesInclVAT
, SUM(ReturnSaleExclVAT) AS ReturnSaleExclVAT
, SUM(FoodOnlyTransaction) AS FoodOnlyTransaction
, SUM(DrinksAmount) AS DrinksAmount
, SUM(HotAndIcedDrinksAmount) AS HotAndIcedDrinksAmount
, SUM(NewspapersAmount) AS NewspapersAmount
, SUM(FoodAmount) AS FoodAmount
, SUM(FoodQuantity) AS FoodQuantity
, SUM(CustomerEatIn) AS CustomerEatIn
, SUM(CustomerTakeaway) AS CustomerTakeaway
, SUM(CustomerCount) AS CustomerCount
FROM aggr.vfactSalesSummaryStage
GROUP BY StoreKey
, EmployeeKey
, EmployeeClassKey
, TillKey
, DateKey
, Hour
, Quarter;
and that refers to another view where the meat happens:
SELECT fsd.TransactionReference
, fsd.StoreKey
, fsd.EmployeeKey
, fsd.EmployeeClassKey
, fsd.TillKey
, fsd.DateKey
, DATEPART(hh, fsd.TransactionTime) AS Hour
, FLOOR(DATEPART(mi, fsd.TransactionTime) / 15) * 15 AS Quarter
, 1 AS TransactionCount
, SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
ELSE 1
END * CASE WHEN fsd.TransactionTypeKey = 1
AND fsd.ReturnFlag = 0 THEN 1
ELSE 0
END) AS Quantity
, SUM(ISNULL(fsd.SalesInclVAT, 0)) AS SalesInclVAT
, SUM(ISNULL(fsd.SalesVAT, 0)) AS SalesVAT
, SUM(ISNULL(fsd.SalesInclVAT, 0) - ISNULL(fsd.SalesVAT, 0)) AS SalesExclVAT
, SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
ELSE 1
END
* CASE WHEN fsd.TransactionTypeKey = 1
AND fsd.ReturnFlag = 0
THEN ISNULL(tcp.CostPrice, 0)
ELSE 0
END) AS Cost
, SUM(CASE WHEN fsd.TransactionTypeKey = 2
THEN ISNULL(fsd.SalesInclVAT, 0) * -1
ELSE 0
END) AS DiscountInclVAT
, SUM(CASE WHEN fsd.TransactionTypeKey = 2
THEN (ISNULL(fsd.SalesInclVAT, 0)
- ISNULL(fsd.SalesVAT, 0)) * -1
ELSE 0
END) AS DiscountExclVAT
-- Loyalty Cards
, SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
AND dp.MajorGroupName <> 'Drink Extra' THEN 1
ELSE 0
END) AS LoyaltyCardQuantity
, SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
THEN ISNULL(fsd.SalesInclVAT, 0) * -1
ELSE 0
END) AS LoyaltyCardAmount
-- Loyalty amounts
, SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
AND LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName <> 'Drink Extra'
THEN ISNULL(fsd.SalesInclVAT, 0) * -1
ELSE 0
END) AS DrinksLoyaltyAmount
, SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
AND dp.FamilyGroupName = 'Newspapers'
THEN ISNULL(fsd.SalesInclVAT, 0) * -1
ELSE 0
END) AS NewspapersLoyaltyAmount
, SUM(CASE WHEN dd.DiscountDesc = 'Loyalty Card'
AND LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName = 'Drink Extra'
THEN ISNULL(fsd.SalesInclVAT, 0) * -1
ELSE 0
END) AS ExtrasLoyaltyAmount
-- Transaction Flags
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) < 0 THEN 1
ELSE 0
END AS NegativeTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) = 0 THEN 1
ELSE 0
END AS ZeroValueTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 0.5
THEN 1
ELSE 0
END AS UnderFiftyPenceTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 1
THEN 1
ELSE 0
END AS UnderOnePoundTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0 AND 4.9999
THEN 1
ELSE 0
END AS UnderFivePoundsTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 5 AND 9.9999
THEN 1
ELSE 0
END AS UnderTenPoundsTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 10 AND 14.9999
THEN 1
ELSE 0
END AS UnderFifteenPoundsTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 15 AND 19.9999
THEN 1
ELSE 0
END AS UnderTwentyPoundsTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) >= 20 THEN 1
ELSE 0
END AS OverTwentyPoundsTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)
- ISNULL(fsd.SalesVAT, 0)) < 1.5 THEN 1
ELSE 0
END AS UnderOneDollarFiftyTransactionCount
, CASE WHEN SUM(ISNULL(fsd.SalesInclVAT, 0)) BETWEEN 0.0001 AND 2
THEN 1
ELSE 0
END AS UnderTwoTransactionCount
-- Returns (Refunds)
, SUM(CASE WHEN fsd.TransactionTypeKey = 1
AND fsd.ReturnFlag = 1 THEN 1
ELSE 0
END) AS ReturnsQuantity
, SUM(CASE WHEN fsd.TransactionTypeKey = 1
AND fsd.ReturnFlag = 1
THEN ISNULL(fsd.SalesInclVAT, 0)
ELSE 0
END) AS ReturnSalesInclVAT
, SUM(CASE WHEN fsd.TransactionTypeKey = 1
AND fsd.ReturnFlag = 1
THEN ISNULL(fsd.SalesInclVAT, 0)
- ISNULL(fsd.SalesVAT, 0)
ELSE 0
END) AS ReturnSaleExclVAT
, CASE WHEN SUM(CASE WHEN fsd.TransactionTypeKey = 1
AND LEFT(MajorGroupName, 4) = 'Food'
THEN 1
ELSE 0
END) = SUM(CASE WHEN fsd.TransactionTypeKey = 1
THEN 1
ELSE 0
END) THEN 1
ELSE 0
END AS FoodOnlyTransaction
-- Product Amounts
, SUM(CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName <> 'Drink Extra'
THEN ISNULL(fsd.SalesInclVAT, 0)
ELSE 0
END) AS DrinksAmount
, SUM(CASE WHEN dp.MajorGroupName IN ('Drink Hot', 'Drink Iced')
THEN ISNULL(fsd.SalesInclVAT, 0)
ELSE 0
END) AS HotAndIcedDrinksAmount
, SUM(CASE WHEN dp.FamilyGroupName = 'Newspapers'
THEN ISNULL(fsd.SalesInclVAT, 0)
ELSE 0
END) AS NewspapersAmount
, SUM(CASE WHEN LEFT(dp.MajorGroupName, 4) = 'Food'
THEN ISNULL(fsd.SalesInclVAT, 0)
ELSE 0
END) AS FoodAmount
, SUM(CASE WHEN LEFT(dp.MajorGroupName, 4) = 'Food' THEN 1
ELSE 0
END) AS FoodQuantity
-- Customers
, SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
ELSE 1
END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName <> 'Drink Extra'
AND fsd.OrderTypeKey IN (1, 3)
AND fsd.TransactionTypeKey = 1
AND fsd.SalesInclVAT <> 0
AND ISNULL(tcp.CostPrice, 0) <> 0
AND fsd.ReturnFlag = 0 THEN 1
ELSE 0
END) AS CustomerEatIn
, SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
ELSE 1
END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName <> 'Drink Extra'
AND fsd.OrderTypeKey IN (2, 4)
AND fsd.TransactionTypeKey = 1
AND fsd.SalesInclVAT <> 0
AND ISNULL(tcp.CostPrice, 0) <> 0
AND fsd.ReturnFlag = 0 THEN 1
ELSE 0
END) AS CustomerTakeaway
, SUM(CASE WHEN fsd.VoidFlag = 1 THEN -1
ELSE 1
END * CASE WHEN LEFT(dp.MajorGroupName, 5) = 'Drink'
AND dp.MajorGroupName <> 'Drink Extra'
AND fsd.OrderTypeKey IN (1, 2, 3, 4)
AND fsd.TransactionTypeKey = 1
AND fsd.SalesInclVAT <> 0
AND ISNULL(tcp.CostPrice, 0) <> 0
AND fsd.ReturnFlag = 0 THEN 1
ELSE 0
END) AS CustomerCount
FROM syn.factSalesDetail AS fsd
JOIN syn.dimProduct AS dp
ON fsd.ProductKey = dp.ProductKey
JOIN syn.dimDiscount AS dd
ON dd.DiscountKey = fsd.DiscountKey
LEFT JOIN syn.tblCostPrices AS tcp
ON fsd.PriceLevel = tcp.PriceLevel
AND fsd.ProductKey = tcp.ProductKey
AND fsd.StoreKey = tcp.StoreKey
AND fsd.DateKey BETWEEN tcp.FromDateKey
AND tcp.ToDateKey
JOIN control.tblTransactionSummaryReloadDates AS ttsrd
ON ttsrd.dateKey = fsd.DateKey
GROUP BY fsd.TransactionReference
, fsd.StoreKey
, fsd.EmployeeKey
, fsd.EmployeeClassKey
, fsd.TillKey
, fsd.DateKey
, DATEPART(hh, fsd.TransactionTime)
, FLOOR(DATEPART(mi, fsd.TransactionTime) / 15) * 15;
I just thought - would adding the ToDateKey into the clustered index help do you think?
August 1, 2017 at 11:03 am
I'll repeat it.
You're querying too many rows. The estimates show about 12M rows.
That said, I wonder what would happen if you add a JOIN hint. Try adding the missing words to leave it like this:LEFT OUTER HASH JOIN syn.tblCostPrices AS tcp
August 1, 2017 at 11:19 am
Unfortunately everything I do for this client is 'too many rows' - they are a large international retail chain and have enormous data quantities coming into their DW on a daily basis. The query I've posted here was 30 days of sales transactions which I'm attempting to aggregate into a table to use for reporting purposes. The 30 day window can possibly be narrowed to reduce the amount of data, but business practices dictate that sometimes we will need to refresh the data that far back on a daily basis so I'd rather not change that.
I'll give the HASH a try and see if that makes any difference. I've currently got the query running in around 20 mins, but I'd love to shave some more time off that if I can.
August 1, 2017 at 11:37 am
Looking at the queries I see nothing to filter the data to 30 days worth of data unless the tables you are running the queries against only have 30 days worth of data. There is no where clause in either query that you have posted.
August 1, 2017 at 12:50 pm
IMHO I would try removing the SUM and group by from the 'meat' SQL, since you are summing the data in the other SQL. I'm not saying it will help but that is what I would try..I know I've had SQL perform poorly when I have a lot of case statements and I'm trying to sum the data at the same time.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
August 2, 2017 at 1:58 am
The 30 day filtering is done using the control.tblTransactionSummaryReloadDates table. This has 30 date key values loaded into it in a previous step of the ETL and so the inner join to this table effectively filters the query. This allows us to control how many days are loaded without altering the SQL code.
The complexity of the meat query comes from the fact I'm calculating a lot of KPIs at a transaction level. The raw data in factSalesDetail is at line level so I effectively have to group twice - once to calculate many of the measures by transaction and then again to 'count' those up to the level of aggregation I actually want.
I'm starting to think 20 mins might not be too bad based on how complex this is and how much data....
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply