July 30, 2012 at 4:52 pm
I have a script that uses the GROUP BY clause and returns a SUM much greater than expected. Below are the results and then the scripts, in each case with GROUP BY first and without GROUP BY second. The value returned by SUM(RETAIL_SALES.[Sales Units LW]) as [Sales Units LW] is 18426, but the correct value should be 6142. I appreciate your help.
Result with GROUP BY:
STYLE    COLOR  SEASON  YR    MO  WK  Sales Units LW
HK87202  FWHG   2012    2012  7   4   18426.00

Result without GROUP BY:
STYLE    COLOR  SEASON  YR    MO  WK  Sales Units LW
HK87202  FWHG   2012    2012  7   4   366
HK87202  FWHG   2012    2012  7   4   796
HK87202  FWHG   2012    2012  7   4   1189
HK87202  FWHG   2012    2012  7   4   1814
HK87202  FWHG   2012    2012  7   4   1977
TOTAL                                 6142
SELECT DISTINCT
ITEMMAST.STYLE as STYLE
,ITEMMAST.COLOR as COLOR
,Max(ITEMMAST.SEASON) as SEASON
,RETAIL_SALES.YR
,RETAIL_SALES.MO
,RETAIL_SALES.WK
,SUM(RETAIL_SALES.[Sales Units LW]) as [Sales Units LW]
FROM Evy_RH_Objects.dbo.RETAIL_SALES RETAIL_SALES
LEFT OUTER JOIN RH2007_EvyLive.dbo.ITEMMAST ITEMMAST on ITEMMAST.CUSTNO='WALM01' and (ITEMMAST.SKU=RETAIL_SALES.SKU or ITEMMAST.ITEMUPC=RETAIL_SALES.SKU)
WHERE
RETAIL_SALES.CUST_NO='WALM01'
and RETAIL_SALES.WK=4
and ITEMMAST.STYLE='HK87202'
GROUP BY ITEMMAST.STYLE, ITEMMAST.COLOR, RETAIL_SALES.YR, RETAIL_SALES.MO, RETAIL_SALES.WK
=================================================================================
SELECT DISTINCT
ITEMMAST.STYLE as STYLE
,ITEMMAST.COLOR as COLOR
,ITEMMAST.SEASON as SEASON
,RETAIL_SALES.YR
,RETAIL_SALES.MO
,RETAIL_SALES.WK
,RETAIL_SALES.[Sales Units LW] as [Sales Units LW]
FROM Evy_RH_Objects.dbo.RETAIL_SALES RETAIL_SALES
LEFT OUTER JOIN RH2007_EvyLive.dbo.ITEMMAST ITEMMAST on ITEMMAST.CUSTNO='WALM01' and (ITEMMAST.SKU=RETAIL_SALES.SKU or ITEMMAST.ITEMUPC=RETAIL_SALES.SKU)
WHERE
RETAIL_SALES.CUST_NO='WALM01'
and RETAIL_SALES.WK=4
and ITEMMAST.STYLE='HK87202'
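One detail in the numbers above is worth noting: the five detail rows add up to 6142, and 18426 is exactly three times that. An exact whole-number multiple like this usually means every sales row is being counted the same number of times, i.e. each one matches three rows on the other side of the join. A quick check of the arithmetic:

```python
# Detail values from the query without GROUP BY (copied from the post above).
detail_units = [366, 796, 1189, 1814, 1977]
grouped_total = 18426  # value returned by the GROUP BY query

expected_total = sum(detail_units)
inflation = grouped_total / expected_total

print(expected_total)  # 6142
print(inflation)       # 3.0
```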
July 30, 2012 at 10:53 pm
You may have a many-to-many join going on. You should probably also have things like "ITEMMAST.CUSTNO='WALM01'" in a WHERE clause instead of an ON especially when outer joins are involved.
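A minimal sketch of how a join fan-out inflates SUM. It uses Python's built-in sqlite3 module so it runs without SQL Server; the tables retail_sales and itemmast are simplified, hypothetical stand-ins for RETAIL_SALES and ITEMMAST, and the assumption that each SKU appears three times in the item table is chosen only to mirror the 3x ratio in the question.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical, simplified stand-ins for RETAIL_SALES and ITEMMAST."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE retail_sales (sku TEXT, units INTEGER)")
    conn.execute("CREATE TABLE itemmast (sku TEXT, style TEXT, color TEXT)")
    units = [366, 796, 1189, 1814, 1977]
    conn.executemany(
        "INSERT INTO retail_sales VALUES (?, ?)",
        [(f"S{n}", u) for n, u in enumerate(units, start=1)],
    )
    # Assumption: every SKU has three matching rows in the item table.
    conn.executemany(
        "INSERT INTO itemmast VALUES (?, 'HK87202', 'FWHG')",
        [(f"S{n}",) for n in range(1, 6) for _ in range(3)],
    )
    return conn


conn = build_demo_db()

# Same shape as the GROUP BY query in the question: join, filter, aggregate.
joined = conn.execute(
    """
    SELECT i.style, i.color, SUM(s.units)
    FROM retail_sales AS s
    LEFT OUTER JOIN itemmast AS i ON i.sku = s.sku
    WHERE i.style = 'HK87202'
    GROUP BY i.style, i.color
    """
).fetchall()

# Summing the sales table on its own gives the true total.
actual = conn.execute("SELECT SUM(units) FROM retail_sales").fetchone()[0]

print(joined)  # [('HK87202', 'FWHG', 18426)]
print(actual)  # 6142
```

Each of the five sales rows is joined to three item rows, so the aggregate sees fifteen rows and every unit value is added three times.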
--Jeff Moden
Change is inevitable... Change for the better is not.
July 31, 2012 at 1:29 am
Following on from Jeff's comment - if you remove the GROUP BY, you should be able to check whether more rows are being returned than you expect/want.
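This check can be made concrete by comparing row counts before and after the join, then listing the sales rows that match more than one item row. The sketch below uses the same kind of hypothetical, simplified tables in Python's sqlite3 (three item rows per SKU is an assumption). Against the real tables you would group by whatever uniquely identifies a RETAIL_SALES row rather than by SKU alone, since SKU may not be unique there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail_sales (sku TEXT, units INTEGER)")
conn.execute("CREATE TABLE itemmast (sku TEXT, style TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO retail_sales VALUES (?, ?)",
    [("S1", 366), ("S2", 796), ("S3", 1189), ("S4", 1814), ("S5", 1977)],
)
# Assumption: three item rows per SKU, to mirror the 3x inflation.
conn.executemany(
    "INSERT INTO itemmast VALUES (?, 'HK87202', 'FWHG')",
    [(f"S{n}",) for n in range(1, 6) for _ in range(3)],
)

# Step 1: row counts with no GROUP BY. If each sales row matched at most
# one item row, these two counts would be equal.
sales_rows = conn.execute("SELECT COUNT(*) FROM retail_sales").fetchone()[0]
joined_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM retail_sales AS s
    LEFT OUTER JOIN itemmast AS i ON i.sku = s.sku
    """
).fetchone()[0]

# Step 2: which sales rows match more than one item row?
fan_out = conn.execute(
    """
    SELECT s.sku, COUNT(i.sku) AS matches
    FROM retail_sales AS s
    LEFT OUTER JOIN itemmast AS i ON i.sku = s.sku
    GROUP BY s.sku
    HAVING COUNT(i.sku) > 1
    ORDER BY s.sku
    """
).fetchall()

print(sales_rows, joined_rows)  # 5 15
print(fan_out)  # [('S1', 3), ('S2', 3), ('S3', 3), ('S4', 3), ('S5', 3)]
```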
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 31, 2012 at 10:17 am
I did remove the GROUP BY and got the correct result. My original post includes a second script without GROUP BY.
July 31, 2012 at 12:58 pm
JayWinter (7/31/2012)
I did remove the GROUP BY and got the correct result. My original post includes a second script without GROUP BY.
Your DISTINCT clause is hiding the problem. DISTINCT is processed after the GROUP BY, so any duplicate rows are included in the GROUP BY totals, but they are removed from the output of your QA query (the one without GROUP BY).
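To see this in isolation, the sketch below runs a detail query with and without DISTINCT on hypothetical, simplified tables in Python's sqlite3 (three identical item rows per SKU is an assumption). Without DISTINCT the duplicates are visible and add up to the inflated figure; with DISTINCT they collapse back to five rows, so the detail query looks correct even though the join is still producing duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail_sales (sku TEXT, units INTEGER)")
conn.execute("CREATE TABLE itemmast (sku TEXT, style TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO retail_sales VALUES (?, ?)",
    [("S1", 366), ("S2", 796), ("S3", 1189), ("S4", 1814), ("S5", 1977)],
)
# Assumption: three identical item rows per SKU.
conn.executemany(
    "INSERT INTO itemmast VALUES (?, 'HK87202', 'FWHG')",
    [(f"S{n}",) for n in range(1, 6) for _ in range(3)],
)

# Detail query template; {distinct} is either empty or the DISTINCT keyword.
detail_sql = """
    SELECT {distinct} i.style, i.color, s.units
    FROM retail_sales AS s
    LEFT OUTER JOIN itemmast AS i ON i.sku = s.sku
    WHERE i.style = 'HK87202'
"""

raw_rows = conn.execute(detail_sql.format(distinct="")).fetchall()
distinct_rows = conn.execute(detail_sql.format(distinct="DISTINCT")).fetchall()

print(len(raw_rows), sum(r[2] for r in raw_rows))            # 15 18426
print(len(distinct_rows), sum(r[2] for r in distinct_rows))  # 5 6142
```

DISTINCT would also merge two genuine sales rows that happen to share every selected value, so it is not a safe way to remove join duplicates either.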
DISTINCT is also superfluous in conjunction with a GROUP BY anyhow. The results of a simple GROUP BY statement are necessarily distinct. (That may not be the case if you have multiple grouping sets.)
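The sketch below, again on hypothetical, simplified tables in Python's sqlite3, checks both points: adding DISTINCT to the GROUP BY query changes nothing, and removing duplicates only helps if it happens before the SUM. Joining to a deduplicated derived table is one possible fix and assumes the extra item rows are exact copies in the columns being joined and reported; if they differ (for example, one sale matching one item row through SKU and another through ITEMUPC in the original ON clause), the real fix is to decide which single item row each sale should use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail_sales (sku TEXT, units INTEGER)")
conn.execute("CREATE TABLE itemmast (sku TEXT, style TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO retail_sales VALUES (?, ?)",
    [("S1", 366), ("S2", 796), ("S3", 1189), ("S4", 1814), ("S5", 1977)],
)
# Assumption: three identical item rows per SKU.
conn.executemany(
    "INSERT INTO itemmast VALUES (?, 'HK87202', 'FWHG')",
    [(f"S{n}",) for n in range(1, 6) for _ in range(3)],
)

# {distinct} toggles DISTINCT; {items} is the table (or derived table) joined.
grouped_sql = """
    SELECT {distinct} i.style, i.color, SUM(s.units)
    FROM retail_sales AS s
    LEFT OUTER JOIN {items} AS i ON i.sku = s.sku
    WHERE i.style = 'HK87202'
    GROUP BY i.style, i.color
"""

plain = conn.execute(
    grouped_sql.format(distinct="", items="itemmast")
).fetchall()

# DISTINCT runs after GROUP BY, whose output is already one row per group.
with_distinct = conn.execute(
    grouped_sql.format(distinct="DISTINCT", items="itemmast")
).fetchall()

# Deduplicate the item rows first, then aggregate.
deduped = conn.execute(
    grouped_sql.format(
        distinct="",
        items="(SELECT DISTINCT sku, style, color FROM itemmast)",
    )
).fetchall()

print(plain)          # [('HK87202', 'FWHG', 18426)]
print(with_distinct)  # [('HK87202', 'FWHG', 18426)]
print(deduped)        # [('HK87202', 'FWHG', 6142)]
```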
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA