Group By Help

  • I have a script that uses a GROUP BY clause and returns a SUM much greater than expected. Below are the results of the script with GROUP BY, followed by the results of the script without GROUP BY, and then the two scripts in the same order. The value returned by SUM(RETAIL_SALES.[Sales Units LW]) as [Sales Units LW] is 18426. The correct value should be 6142. I appreciate your help.

    STYLE COLOR SEASON YR MO WK Sales Units LW
    HK87202 FWHG 2012 2012 7 4 18426.00

    STYLE COLOR SEASON YR MO WK Sales Units LW
    HK87202 FWHG 2012 2012 7 4 366
    HK87202 FWHG 2012 2012 7 4 796
    HK87202 FWHG 2012 2012 7 4 1189
    HK87202 FWHG 2012 2012 7 4 1814
    HK87202 FWHG 2012 2012 7 4 1977
    TOTAL 6142

    SELECT DISTINCT
        ITEMMAST.STYLE as STYLE
        ,ITEMMAST.COLOR as COLOR
        ,Max(ITEMMAST.SEASON) as SEASON
        ,RETAIL_SALES.YR
        ,RETAIL_SALES.MO
        ,RETAIL_SALES.WK
        ,SUM(RETAIL_SALES.[Sales Units LW]) as [Sales Units LW]
    FROM Evy_RH_Objects.dbo.RETAIL_SALES RETAIL_SALES
    LEFT OUTER JOIN RH2007_EvyLive.dbo.ITEMMAST ITEMMAST
        on ITEMMAST.CUSTNO='WALM01'
        and (ITEMMAST.SKU=RETAIL_SALES.SKU or ITEMMAST.ITEMUPC=RETAIL_SALES.SKU)
    WHERE
        RETAIL_SALES.CUST_NO='WALM01'
        and RETAIL_SALES.WK=4
        and ITEMMAST.STYLE='HK87202'
    GROUP BY ITEMMAST.STYLE, ITEMMAST.COLOR, RETAIL_SALES.YR, RETAIL_SALES.MO, RETAIL_SALES.WK

    =================================================================================

    SELECT DISTINCT
        ITEMMAST.STYLE as STYLE
        ,ITEMMAST.COLOR as COLOR
        ,ITEMMAST.SEASON as SEASON
        ,RETAIL_SALES.YR
        ,RETAIL_SALES.MO
        ,RETAIL_SALES.WK
        ,RETAIL_SALES.[Sales Units LW] as [Sales Units LW]
    FROM Evy_RH_Objects.dbo.RETAIL_SALES RETAIL_SALES
    LEFT OUTER JOIN RH2007_EvyLive.dbo.ITEMMAST ITEMMAST
        on ITEMMAST.CUSTNO='WALM01'
        and (ITEMMAST.SKU=RETAIL_SALES.SKU or ITEMMAST.ITEMUPC=RETAIL_SALES.SKU)
    WHERE
        RETAIL_SALES.CUST_NO='WALM01'
        and RETAIL_SALES.WK=4
        and ITEMMAST.STYLE='HK87202'

  • You may have a many-to-many join going on. You should probably also have things like "ITEMMAST.CUSTNO='WALM01'" in a WHERE clause instead of an ON clause, especially when outer joins are involved.
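
    One way to test for a many-to-many match is to count how many ITEMMAST rows each sales row joins to. The sketch below does not use the poster's data: the table variables @RETAIL_SALES and @ITEMMAST and all SKU/UPC values are hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later. Because the join accepts a match on either SKU or ITEMUPC, a single sales row can pick up more than one item row.

    ```sql
    -- Hypothetical sample data: column names follow the thread, values are invented.
    DECLARE @RETAIL_SALES TABLE (CUST_NO varchar(10), SKU varchar(20), WK int, [Sales Units LW] int);
    DECLARE @ITEMMAST TABLE (CUSTNO varchar(10), SKU varchar(20), ITEMUPC varchar(20), STYLE varchar(20));

    INSERT INTO @RETAIL_SALES VALUES
        ('WALM01', 'A1', 4, 366),
        ('WALM01', 'B2', 4, 796);

    -- Sales SKU 'A1' matches two item rows: one through SKU, one through ITEMUPC.
    INSERT INTO @ITEMMAST VALUES
        ('WALM01', 'A1', 'U1', 'HK87202'),
        ('WALM01', 'X9', 'A1', 'HK87202'),
        ('WALM01', 'B2', 'U2', 'HK87202');

    -- List sales rows that join to more than one item row.
    -- Each extra match repeats that row's units inside a SUM.
    SELECT rs.SKU, rs.[Sales Units LW], COUNT(*) AS ItemMatches
    FROM @RETAIL_SALES rs
    JOIN @ITEMMAST im
        ON im.CUSTNO = 'WALM01'
       AND (im.SKU = rs.SKU OR im.ITEMUPC = rs.SKU)
    WHERE rs.CUST_NO = 'WALM01'
      AND rs.WK = 4
    GROUP BY rs.SKU, rs.[Sales Units LW]
    HAVING COUNT(*) > 1;
    ```

    With this sample data the query returns only SKU 'A1', with ItemMatches = 2. Against the real tables, any row returned would point to the item rows responsible for the inflated total.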

    --Jeff Moden





  • Following on from Jeff's comment - if you remove the GROUP BY, you should be able to check whether more rows are being returned than you expect/want.
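
    A minimal sketch of that check, assuming a hypothetical table variable @Joined that stands in for the output of the join (the rows are invented for illustration): compare the raw row count with the DISTINCT row count. The check only works if DISTINCT is left out of the raw count, since DISTINCT collapses exactly the repeated rows being looked for.

    ```sql
    -- Hypothetical stand-in for the joined result: each sales row appears three times.
    DECLARE @Joined TABLE (STYLE varchar(20), WK int, [Sales Units LW] int);

    INSERT INTO @Joined VALUES
        ('HK87202', 4, 366), ('HK87202', 4, 366), ('HK87202', 4, 366),
        ('HK87202', 4, 796), ('HK87202', 4, 796), ('HK87202', 4, 796);

    -- If RawRows is larger than DistinctRows, the join is producing repeated rows.
    SELECT
        (SELECT COUNT(*) FROM @Joined) AS RawRows,
        (SELECT COUNT(*)
         FROM (SELECT DISTINCT STYLE, WK, [Sales Units LW] FROM @Joined) AS d) AS DistinctRows;
    ```

    Here RawRows is 6 and DistinctRows is 2, so every row is being returned three times.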

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I did remove the GROUP BY and achieved the correct result. My original post shows a 2nd script without GROUP BY.

  • JayWinter (7/31/2012)


    I did remove the GROUP BY and achieved the correct result. My original post shows a 2nd script without GROUP BY.

    Your DISTINCT clause is hiding the problem. DISTINCT is processed after the GROUP BY, so any duplicates will be included in your totals for the GROUP BY, but will be excluded in your QA query.

    DISTINCT is also superfluous in conjunction with a GROUP BY anyhow. The results of a simple GROUP BY statement are necessarily distinct. (That may not be the case if you have multiple grouping sets.)
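
    The effect can be reproduced with a small sketch. The five unit values are the ones posted in the thread; the table variable @Joined is hypothetical, and repeating each value three times is an assumption, chosen because 18426 is exactly 3 * 6142. The table value constructors assume SQL Server 2008 or later.

    ```sql
    -- Hypothetical stand-in for the joined rows before DISTINCT or GROUP BY is applied.
    DECLARE @Joined TABLE (STYLE varchar(20), [Sales Units LW] int);

    -- Five values from the thread, each repeated three times (assumed duplication factor).
    INSERT INTO @Joined
    SELECT 'HK87202', v.Units
    FROM (VALUES (366), (796), (1189), (1814), (1977)) AS v(Units)
    CROSS JOIN (VALUES (1), (2), (3)) AS dup(n);

    -- SUM runs over all 15 rows; DISTINCT is applied afterwards to the single
    -- grouped row, so it removes nothing.
    SELECT DISTINCT STYLE, SUM([Sales Units LW]) AS [Sales Units LW]
    FROM @Joined
    GROUP BY STYLE;   -- 18426

    -- De-duplicating before aggregating gives the figure the poster expected.
    SELECT STYLE, SUM([Sales Units LW]) AS [Sales Units LW]
    FROM (SELECT DISTINCT STYLE, [Sales Units LW] FROM @Joined) AS d
    GROUP BY STYLE;   -- 6142
    ```

    The second query only illustrates the order of operations. De-duplicating this way would also drop two genuinely separate sales rows that happen to have the same values, so correcting the join is likely the safer fix.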

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
