December 21, 2009 at 12:43 pm
Hello All..
Let me know if anyone feels this should be expected. The performance doesn't feel right to me.
Here is my actual SQL. (BTW, i'm testing so don't yell at me for the hard dates or inputs!!!!)
I hope i didn't make it difficult to read......i inserted **'s at the problem area.
When i run this without the sums, the query runs in 4-10 seconds.
With the sums it goes to 4 1/2 to 5 minutes. Both queries return the same number of rows and would be summing FROM the same number of records.
Does this seem right to all the smart people out there?? (I haven't made it to that list yet!)
Thanks for looking - Peter
======================================================
declare @FromDt as datetime
declare @ToDt as datetime
set @FromDt = '12/19/09'
set @ToDt = '12/19/09';
WITH ctTrans (TRANS_ID) AS
(
SELECT DISTINCT TRANS_ID FROM TRN.TRNDTL_SPR09 AS TD INNER JOIN
MSDM.MERCH.PDT_USRTL AS P ON TD.SKU_NUM = P.SKU_NUMBER WHERE P.PDT_CLASS_NUMBER = 965
AND TD.WK_END_DT BETWEEN @FromDt AND @ToDt
),
ctClass (TRANS_ID, PDT_ID, PDT_DSCR, PDT_UNITS) AS
(
SELECT
TD.TRANS_ID, P.PDT_CLASS_NUMBER AS PDT_ID, P.PDT_CLASS_DSCR AS PDT_DSCR, ****SUM(SLS_UNITS) AS PDT_UNITS****
FROM
TRN.TRNDTL_SPR09 AS TD INNER JOIN
MSDM.MERCH.PDT_USRTL AS P ON
TD.SKU_NUM = P.SKU_NUMBER
WHERE
TD.WK_END_DT BETWEEN @FromDt AND @ToDt AND PDT_DEPT_NUMBER NOT IN (46, 49)
GROUP BY
TD.TRANS_ID, P.PDT_CLASS_NUMBER, P.PDT_CLASS_DSCR
)
SELECT ctClass.PDT_ID, COUNT(ctClass.TRANS_ID) as ClassCount, ****SUM(ctClass.PDT_UNITS) as PDT_UNITS****
FROM ctClass INNER JOIN ctTrans ON
ctClass.Trans_ID = ctTrans.Trans_Id
GROUP BY ctClass.PDT_ID
ORDER BY COUNT(ctClass.TRANS_ID) DESC
December 21, 2009 at 1:06 pm
Looks like you are using a linked server. Are the summed columns on the linked server table? If so, I;d recommend doing this a bit differently so that as much processing is down remotely as possible. If you can make 1 linked server call that returns the data already summed (as close as you can get to the final results) and inserts into a temp table and then use the temp table to join to your local table. I'd be willing to bet you see a vast improvement.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 21, 2009 at 1:08 pm
Another comment from me:
I am just starting to use CTE's. I find they are easier to manage and perform better. This query started as a table joinded to a query with another query in my where clause.
When i run the query in that format, it runs closer to 2 minutes vs. the 4-5.
I guess my real question is - From experience, CTE's -OR- subqueries and joined queries??
And I'm sure the answer - It depends!!!!
Thanks again.
December 21, 2009 at 1:14 pm
Hi Jack
No, not a linked server. Tables live directly in a database on the SQL Server.
Check out my comment. Looks like we posted about the same time.
I'm migrating to common table expressions and away from temp tables and subqueries. This just might be one of those instances where the CTE is less effecient.
December 21, 2009 at 1:41 pm
Oops my bad, I saw interpreted the underscore in the second table name as a period and thought I saw 4-part naming.
Can you post execution plans (graphical)? That will help.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 21, 2009 at 1:50 pm
Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 21, 2009 at 1:52 pm
i have not posted a plan before.....what's the best way to do that??
December 21, 2009 at 1:54 pm
Just saw Gila's
I'll follow the instructions and post.
December 21, 2009 at 2:14 pm
table: TRN.TRNDTL_SPR09
trans_id
loc_num
sku_num
sls_units
sls_amt
mgn_amt
cld_dt
wk_end_dt
PK- trans_id, sku_num
IDX - one each on: trans_id; cld_dt; loc_num; sku_num; wk_end_dt;
execution plan attached.
December 21, 2009 at 2:36 pm
Thanks, but you have still left us with incomplete information as you only included the schema for one of the tables involved in the query. It would also be helpful to have the query plan form the fast query as well. Here are a few comments on what you provided thus far:
1. Does this
IDX - one each on: trans_id; cld_dt; loc_num; sku_num; wk_end_dt;
mean that you 5 single column indexes? Those are typically not the most useful indexes.
2. From the query plan I see that the actual rows vs. estimated rows for the first Index Seek on TRDTL_SPR09 is off by tens of millions which means you probably should update statistics on that table.
3. From the query plan I see a lot of parallelism so you may want to try OPTION (MaxDop 1) to force it to not be parallelized. Comment 2 may help with this as well.
4. The optimizer is suggesting an index on WK_END_DT with included columns TRANS_ID, SKU_NUM, SLS_UNITS to make an index that will cover that query. I don't know that you should definitely create it, but I do think you need to re-examine your indexes.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 21, 2009 at 2:47 pm
I second what Jack said on the updating of the statistics, that would certainly be your first step.
I would also look how the join is made on the table : TRDTL_SPR09, it seems most of the execution plan is wasted there, and maybe if you could aim for a covering index, or change your join to do a clustered seek, that would certainly be your best bet.
Most of the tables you are playing with are pretty big, but this 1 is simply huge, so it would help not having to completely scan it.
Cheers,
J-F
December 21, 2009 at 6:02 pm
Hi
#1 - thanks for the input so far guys.
#2 - as i'm sure you figured out, i am far from a dba, and i don't even play a good one on tv. I have enough knowledge to be dangerous.
Now for some answers. Yes, 5 single field indexes. I did that because that table is typically searched on any 5 of those fields frequently. Which brings a question from me - If there is a more frequent search including 2 of the fields say sku_num & wk_end_dt, should i create that index??
The table is HUGE. Sales transaction table. I append once a week to it on a sunday night. I then rebuild all the indexes (which takes about 5 hours over night). ANother question - does rebuilding the index rebuild statistics?? I use ALTER INDEX etc etc etc.
I didnt include the data on the product table for the reason its pretty simple.....about 200k rows, sku_num (PK), then some item descriptive data. No other indexes.
I don't have the execution plan available to me at this moment for the faster running query as i'm checking my emails from home now.
I'm not familiar with the term covering index, but i will certainly read up on it.
thanks again.
December 22, 2009 at 1:41 am
You can start with this series on indexes, it may help
http://www.sqlservercentral.com/articles/Indexing/68439/
If you have queries often filtering on two columns then you really should add an index on those two columns. It doesn't have to be a new one, you could widen an existing index by adding another column to it.
Rebuilding an index updates the stats on that index. Reorganising does not. It's only the stats on that index, if there are column stats they may need manual updates (or they may not, hard to tell sometimes)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 22, 2009 at 6:50 am
There are a couple nasty table scans in there. Can you add this index and then post the revised execution plan?
CREATE NONCLUSTERED INDEX idx_TRNDTLSPR09_WKENDDT
ON [TRN].[TRNDTL_SPR09] ([WK_END_DT])
INCLUDE ([TRANS_ID],[SKU_NUM],[SLS_UNITS])
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 22, 2009 at 9:20 am
Now that i understand the communication better, let me give the scenario the way i should have started.
TABLE: DETAIL SALES TRANSACTIONS
SKU_NUM - PK & single column index
TRANS_ID - PK & single column index
LOC_NUM - single idx
CLD_DT - single idx
WK_END_DT - single idx
UNITS
SALES
MARGIN
I will attach a text file that is one of the most run queries against this table.
My thinking (right or wrong) on the individual indexes was that those are the most searched columns. Now i'm thinking i should have something different.
You should see from the query that SKU, WK_END & TRANS_ID are in the WHERE & JOINS most of the time. So if i search for transactions that contain a certain SKU should one of my indexes be SKU with an INCLUDE on WK_END & TRANS_ID or should the index be all 3 or both.
I know an execution plan would probably give you the answer, but i guess from experience first what would you think.
Also, what started all of this for me was that i started using CTE's over subqueries because i started getting better performance. This particular query runs faster in the attached format.
THANK YOU ALL FOR THE FEEDBACK
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply