Aggregating at a higher level?

  • Hi gang,

    I have a solution to this but it seems that there must be a better way. The problem is pretty routine, I would imagine...

    I have a DW table that is flattened out nicely. It contains order information, one row per order/item pair. The table also contains promo_code and promo_discount_amt columns. Let's say the data looks something like this:

    ID ORDER_KEY ITEM_ID PROMO_DISC_CODE PROMO_DISC_AMT

    1 4 100 ABC 25.00

    2 4 101 ABC 25.00

    3 4 102 ABC 25.00

    4 5 200 XYZ 100.00

    5 5 201 XYZ 100.00

    The promo info applies to the order not the line item. What I'd like to do is get the total promo discount amount (which should be 125.00).

    The way I solved this is using a CTE like this:

    WITH Promos (OrderKey, DiscAmt) AS (

    SELECT DISTINCT [order_key], promo_disc_amt FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL

    )

    SELECT SUM(DiscAmt) FROM Promos

    It just seems that there must be a simple way to aggregate this without resorting to temp table... am I missing something or is this the SOP for doing this sort of thing?

    Cheers,

    Chris

  • Wow, total flub on my part! :blush:

    Thanks for the catch KB.

    Edit: Removed my first short sighted response.

  • SUM(DISTINCT PROMO_DISC_AMT) is not the right solution if PROMO_DISC_CODE can be repeated in the data (which I guess must be a possibility).

    If your data looks like :

    INSERT INTO @FACTORDER

    SELECT 4, 100, 'ABC', 25.00 UNION ALL

    SELECT 4, 101, 'ABC', 25.00 UNION ALL

    SELECT 4, 102, 'ABC', 25.00 UNION ALL

    SELECT 5, 200, 'XYZ', 100.00 UNION ALL

    SELECT 5, 201, 'XYZ', 100.00 UNION ALL

    SELECT 6, 300, 'ABC', 25.00 UNION ALL

    SELECT 6, 301, 'ABC', 25.00

    SELECT SUM(DISTINCT PROMO_DISC_AMT) FROM @FACTORDER

    WHERE PROMO_DISC_CODE IS NOT NULL

    This will still give you 125 instead of 150 which I believe is the required solution here.

  • You could also do this with the windowed version of an aggregate.

    SELECT [order_key], promo_disc_amt,

    Sum(promo_disc_amt) over (partition by NULL) totalPromo

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL

    group by orderkey

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi all,

    Thanks for the responses.

    Matt, that looks like what I was looking for! What is "partition" in this case? Or is it a keyword used with OVER?

    Cheers,

    Chris

  • When you're "windowing" aggregate functions - PARTITION BY is concaptually the same as GROUP BY. It defines the window you're operating on.

    An example might work better:

    Select customerID,

    invoiceID,

    invoiceamt,

    sum(invoiceAmt) over (PARTITION BY CUSTOMERID) customerTotal

    from

    customerInvoices

    The above example would allow you to have the customer's total of all invoices on each line (so you could things like "percentage of group total" calcs without a separate sub-query and without a group by. Or rather an EXPLICIT separate sub-query (if you look at the exec plan - there's still some extra steps essentially boiling down to a worktable being created with the summary info, etc...).

    As of right now - these tend to work well on SMALL datasets. They don't do so well on large sets, where the derived tables tend to do better.

    The interesting behavior here is that the windowed aggregates operate on the recordset that exists after the GROUP BY, WHERE and JOIN have been applied, so in your case it's working on the distinct values, counting each just once....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi Matt,

    Maybe I am doing something wrong, but I am getting the total of all promo_disc_amt for every line in the criteria, including the dupes.

    If I do:

    SELECT DISTINCT order_key, promo_disc_amt

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    I get 32 rows, each with a distinct order_key and 25.00 as the value, save one which is 50. So the total discount should be 825.

    When I run:

    SELECT DISTINCT [order_key], Sum(promo_disc_amt) over (partition by NULL) totalPromo

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    I get 32 rows, each with a distinct order_key and 1825.00 as the value, which is the same value as:

    SELECT SUM(promo_disc_amt)

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    What am I missing?

    Thanks for the help!

    Chris

  • Raconteur (6/6/2008)


    Hi Matt,

    Maybe I am doing something wrong, but I am getting the total of all promo_disc_amt for every line in the criteria, including the dupes.

    If I do:

    SELECT DISTINCT order_key, promo_disc_amt

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    I get 32 rows, each with a distinct order_key and 25.00 as the value, save one which is 50. So the total discount should be 825.

    When I run:

    SELECT DISTINCT [order_key], Sum(promo_disc_amt) over (partition by NULL) totalPromo

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    I get 32 rows, each with a distinct order_key and 1825.00 as the value, which is the same value as:

    SELECT SUM(promo_disc_amt)

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    What am I missing?

    Thanks for the help!

    Chris

    Use the group by instead of the distinct - you should get different results. The group by throws out the duplicates FIRST, then the windowed function fires. DISTINCT happens after the windowed function fires, so it still evaluates the dupes. One of the interesting things aobut windowing - you have to be really careful about the "sequence of events".

    I think you're looking for (this looks strange but should work):

    SELECT [order_key], Sum(promo_disc_amt) over (partition by NULL) totalPromo

    FROM FactOrder

    WHERE [promo_disc_code] IS NOT NULL AND [date_purchased_key] = 860

    group by orderkey,promo_disc_amt

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Mr. Matt... I bow to your expertise and the depth of your knowledge!

    Works perfectly.

    Thank you!!

  • Thanks! Been dabbling with the windowed stuff for a little bit - trying to figure them out.

    And - I do like the handle - story teller is always an interesting choice for this board...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Cheers!

    It's more of a job-title than anything... I am actually a writer/director. Software consulting is just my hobby. 🙂

    C

  • Nice one! At the risk of stating the obvious - I wasn't nearly that inspired when I picked mine....:D Good to see someone who was!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • It is nice to see examples of various features (like OVER (PARTITION BY NULL)), but wouldn't a simple MIN() with group by do the same? I may have missed something in the examples, but as I understand it, this should also give correct result - that is, if what is required is a sum of all discounts (for a given condition):

    SELECT SUM(Q.disc) as disc_total

    FROM

    (select order_key, min(promo_disc_amt) as disc

    from FactOrder

    where promo_disc_code IS NOT NULL AND date_purchased_key = 860

    group by order_key) as Q

    If the discount applies to entire order, not to particular orderlines, there should be no problems with it (although I didn't have a chance to test it, since you didn't post table structure and data sample).

  • Hi Vladan,

    That looks like it will work as well, but, and I know I wasn't very clear in my original post, I was trying to avoid the temp table/subquery, which is what I was doing in the CTE.

    Cheers,

    Chris

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply