September 24, 2014 at 4:02 am
I have a query i have been optimizing. Now runs in about 15 minutes but was wondering if there is any way tr educe the SORT cost.
Currently the high costs left are the Table insert which is 58% and the Sort cost of 36%
The inner query below is around 400million rows and aggregates to around 15,000,000 rows - all feed back welcome π
SELECT@1 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -1, GETDATE() -1) as DATE)
SELECT@2 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -2, GETDATE() -1) as DATE)
SELECT@3 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -3, GETDATE() -1) as DATE)
SELECT@4 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -4, GETDATE() -1) as DATE)
SELECT@5 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -5, GETDATE() -1) as DATE)
SELECT@6 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -6, GETDATE() -1) as DATE)
SELECT@7 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -7, GETDATE() -1) as DATE)
SELECT@8 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -8, GETDATE() -1) as DATE)
INSERTdbo.MyTable(CustomerKey, StoreKey, LastWeek, LastTwoWeeks, LastThreeWeeks, LastFourWeeks, LastFiveWeeks, LastSixWeeks, LastSevenWeeks, LastEightWeeks)
SELECTCustomerKey
, StoreKey
, SUM(One) ASLastWeek
, SUM(Two) ASLastTwoWeeks
, SUM(Three) ASLastThreeWeeks
, SUM(Four) ASLastFourWeeks
, SUM(Five) ASLastFiveWeeks
, SUM(Six) ASLastSixWeeks
, SUM(Seven) ASLastSevenWeeks
, SUM(Eight) ASLastEightWeeks
FROM(
SELECTCustomerKey
, StoreKey
, CASE WHEN StorePurchaseDate > @1 THEN 1 END as One
, CASE WHEN StorePurchaseDate > @2 THEN 1 END as Two
, CASE WHEN StorePurchaseDate > @3 THEN 1 END as Three
, CASE WHEN StorePurchaseDate > @4 THEN 1 END as Four
, CASE WHEN StorePurchaseDate > @5 THEN 1 END as Five
, CASE WHEN StorePurchaseDate > @6 THEN 1 END as Six
, CASE WHEN StorePurchaseDate > @7 THEN 1 END as Seven
, CASE WHEN StorePurchaseDate > @8 THEN 1 END as Eight
FROMdbo.FactStore
UNION ALL
SELECTCustomerKey
, StoreKey
, CASE WHEN OnlinePurchaseDate > @1 THEN 1 END as One
, CASE WHEN OnlinePurchaseDate > @2 THEN 1 END as Two
, CASE WHEN OnlinePurchaseDate > @3 THEN 1 END as Three
, CASE WHEN OnlinePurchaseDate > @4 THEN 1 END as Four
, CASE WHEN OnlinePurchaseDate > @5 THEN 1 END as Five
, CASE WHEN OnlinePurchaseDate > @6 THEN 1 END as Six
, CASE WHEN OnlinePurchaseDate > @7 THEN 1 END as Seven
, CASE WHEN OnlinePurchaseDate > @8 THEN 1 END as Eight
FROMdbo.FactOnline
) as x
GROUP BY CustomerKey
, StoreKey;
GO
September 24, 2014 at 5:14 am
There doesn't seem to be any tuning opportunities in the code itself. You're retrieving all the data from two tables in a UNION ALL with no filtering. The optimizer is going to have very few choices in how to deal with that. Then you have the aggregation part of the query. The only way to improve this, based on the limited knowledge I have, would be to restructure your tables to avoid the UNION, possibly, and add a columnstore index to the restructured table for the aggregation. But as long as you don't have any filtering in place, there just isn't much that can be done here.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 24, 2014 at 5:24 am
Thanks for the reply.
the table will be dropped and recreated form scratch once a week our of BAU and the used only to be absorbed into QlikView.
There is some conditions
SELECTCustomerKey
, StoreKey
, CASE WHEN StorePurchaseDate > @1 THEN 1 END as One
, CASE WHEN StorePurchaseDate > @2 THEN 1 END as Two
, CASE WHEN StorePurchaseDate > @3 THEN 1 END as Three
, CASE WHEN StorePurchaseDate > @4 THEN 1 END as Four
, CASE WHEN StorePurchaseDate > @5 THEN 1 END as Five
, CASE WHEN StorePurchaseDate > @6 THEN 1 END as Six
, CASE WHEN StorePurchaseDate > @7 THEN 1 END as Seven
, CASE WHEN StorePurchaseDate > @8 THEN 1 END as Eight
FROMdbo.FactStore
WHERE IsReturned = 'N'
UNION ALL
SELECTCustomerKey
, StoreKey
, CASE WHEN OnlinePurchaseDate > @1 THEN 1 END as One
, CASE WHEN OnlinePurchaseDate > @2 THEN 1 END as Two
, CASE WHEN OnlinePurchaseDate > @3 THEN 1 END as Three
, CASE WHEN OnlinePurchaseDate > @4 THEN 1 END as Four
, CASE WHEN OnlinePurchaseDate > @5 THEN 1 END as Five
, CASE WHEN OnlinePurchaseDate > @6 THEN 1 END as Six
, CASE WHEN OnlinePurchaseDate > @7 THEN 1 END as Seven
, CASE WHEN OnlinePurchaseDate > @8 THEN 1 END as Eight
FROMdbo.FactOnline
didn't think the where clause for the first part of the union all was relevant, sorry for missing it out.
thnaks
September 24, 2014 at 6:49 am
Have you tried using appropriate date filters on each table?
Something like StorePurchaseDate >= @8.
Posting the actual execution plan of the query with the table insert part commented out might be useful.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 24, 2014 at 7:19 am
Lots of possible issues here:
1) The SORT isn't because of UNION. You are doing a UNION ALL, which does NOT do the SORT/DISTINCT that a UNION alone does. The SORT is for the GROUP BY. That cannot be avoided unless you make some form of index that can be used to avoid it.
2) My belief it is is something related to the INSERT itself OR possibly a spill to tempdb for the GROUP BY SORT. Look at query plan details to find out about the spill. Monitor wait stats and especially file IO stalls during the run to find the pain points. Also use sp_whoisactive to see real-time performance issues.
3) How much memory does server have? How many cores? What is MAXDOP setting and are that many cores being used? What is IO subsystem picture?
4) You say this is a complete rebuild process? Now another main issue becomes are you getting minimally logged INSERTs? Database recovery model, TABLOCKX, indexing on table being inserted to, etc. all come into play for that.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 24, 2014 at 8:05 am
Some filtering then, but, assuming that's a CHAR(1) field, not much filtering at all, so you're still going to be looking at scans and few options for the optimizer to help you out. Kevin is probably on the right track, with no changes to processing or structure, you're down to the hardware to carry the load.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 24, 2014 at 8:08 am
thanks, there is a message about spilling into tempdb. This is currently being run on a dev server though which is shared and much different from the production one for the client.
September 24, 2014 at 8:13 am
There is actually an optimization that can be done to this code. It's called "pre-aggregation" (a term that I go from SQL Server MVP Peter Larsson). If you were to do SUM(CASE) in each of the queries in the FROM clause, less time would be spent in the UNION ALL and a couple of other things. I've used the method quite successfully in other things of this nature.
And, yeah... it seems counter intuitive but give it a try. Don't forget that missing WHERE clause in that one query. π
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2014 at 9:05 am
Thanks for that, the sort cost increases when doing the pre-aggregation, however the overall run time does down as well estimate IO etc
September 25, 2014 at 8:06 am
When you break this down you are creating weekly bandings, the following query reduces the need for the parameters, as this could form either a CTE, Subquery or Table valued function, with a parameter for number of weeks.
SELECT
DateKey
,(DATEPART(WEEK,CONVERT(DATE,GetDATE()))-DatePart(WEEK,TheDate))+1 Band
FROM
DimDate
where
TheDate
Between CONVERT(DATE,DATEAdd(week,-7,GetDATE()))
AND GetDATE()-1
This would then form part of a join on the Store/Online PurchaseDate Key, to limit the rows, and in that case an index on that column would possibly be beneficial.
To replicate the Logic replacing the
CASE WHEN StorePurchaseDate > @1 THEN 1 END AS ONE
With
SUM (CASE Band WHEN 1 THEN 1 ELSE 0 END) AS ONE
,SUM (CASE Band WHEN 2 THEN 1 ELSE 0 END) AS TWO
should replicate it.
I'm not certain what it is you are counting, but there a possible issue with the numbers, as the current logic counts number of items, assuming each row is a single sale product, what if a customer buys 10 items in one transaction, is that 10 rows or a single row?
_________________________________________________________________________
SSC Guide to Posting and Best Practices
September 25, 2014 at 12:00 pm
Try this:
SELECT COALESCE(fs.CustomerKey, fo.CustomerKey) AS CustomerKey
, COALESCE(fs.StoreKey, fo.StoreKey) AS StoreKey
, ISNULL(fs.LastWeek, 0) + ISNULL(fo.LastWeek, 0) AS LastWeek
, ISNULL(fs.LastTwoWeeks, 0) + ISNULL(fo.LastTwoWeeks, 0) AS LastTwoWeeks
, ISNULL(fs.LastThreeWeeks, 0) + ISNULL(fo.LastThreeWeeks, 0) AS LastThreeWeeks
, ISNULL(fs.LastFourWeeks, 0) + ISNULL(fo.LastFourWeeks, 0) AS LastFourWeeks
, ISNULL(fs.LastFiveWeeks, 0) + ISNULL(fo.LastFiveWeeks, 0) AS LastFiveWeeks
, ISNULL(fs.LastSixWeeks, 0) + ISNULL(fo.LastSixWeeks, 0) AS LastSixWeeks
, ISNULL(fs.LastSevenWeeks, 0) + ISNULL(fo.LastSevenWeeks, 0) AS LastSevenWeeks
, ISNULL(fs.LastEightWeeks, 0) + ISNULL(fo.LastEightWeeks, 0) AS LastEightWeeks
FROM (
SELECT CustomerKey
, StoreKey
, SUM(CASE WHEN StorePurchaseDate > @1 THEN 1 ELSE 0 END) as LastWeek
, SUM(CASE WHEN StorePurchaseDate > @2 THEN 1 ELSE 0 END) as LastTwoWeeks
, SUM(CASE WHEN StorePurchaseDate > @3 THEN 1 ELSE 0 END) as LastThreeWeeks
, SUM(CASE WHEN StorePurchaseDate > @4 THEN 1 ELSE 0 END) as LastFourWeeks
, SUM(CASE WHEN StorePurchaseDate > @5 THEN 1 ELSE 0 END) as LastFiveWeeks
, SUM(CASE WHEN StorePurchaseDate > @6 THEN 1 ELSE 0 END) as LastSixWeeks
, SUM(CASE WHEN StorePurchaseDate > @7 THEN 1 ELSE 0 END) as LastSevenWeeks
, SUM(CASE WHEN StorePurchaseDate > @8 THEN 1 ELSE 0 END) as LastEightWeeks
FROM dbo.FactStore
WHERE
StorePurchaseDate > @8
GROUP BY CustomerKey
, StoreKey
) AS fs
FULL OUTER JOIN (
SELECT CustomerKey
, StoreKey
, SUM(CASE WHEN OnlinePurchaseDate > @1 THEN 1 ELSE 0 END) as LastWeek
, SUM(CASE WHEN OnlinePurchaseDate > @2 THEN 1 ELSE 0 END) as LastTwoWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @3 THEN 1 ELSE 0 END) as LastThreeWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @4 THEN 1 ELSE 0 END) as LastFourWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @5 THEN 1 ELSE 0 END) as LastFiveWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @6 THEN 1 ELSE 0 END) as LastSixWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @7 THEN 1 ELSE 0 END) as LastSevenWeeks
, SUM(CASE WHEN OnlinePurchaseDate > @8 THEN 1 ELSE 0 END) as LastEightWeeks
FROM dbo.FactOnline
WHERE
OnlinePurchaseDate > @8
GROUP BY CustomerKey
, StoreKey
) as fo ON fo.CustomerKey = fs.CustomerKey AND fo.StoreKey = fs.StoreKey
GROUP BY COALESCE(fs.CustomerKey, fo.CustomerKey)
, COALESCE(fs.StoreKey, fo.StoreKey)
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 25, 2014 at 12:17 pm
Quick question if StorePurchaseDate is not null is the OnlinePurchaseDate null and vice versa? Or do you have an OnlinePurchase flag?
π
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply