Struggling with query. Number of times per group a record has value for certain field

  • EDIT: This has been resolved.

    Hi,

    This is probably a simple query, but I'm having trouble with it. Here is a very simplified version of my table:

    OrderNumber-----OrderDetail-----ItemCategory

    -------1--------------------1------------------A

    -------1--------------------2------------------G

    -------1--------------------3------------------L

    -------2--------------------1------------------A

    -------3--------------------1------------------N

    -------3--------------------2------------------R

    -------3--------------------3------------------Y

    I've linked many tables together to get my query so far, and there are many more fields on it than this, but this is just an example.

    What I'm trying to do is add a column that will indicate the number of OrderDetail records with an ItemCategory of "A" per OrderNumber. So I would be looking for output something like this:

    OrderNumber-----OrderDetail-----ItemCategory-----NumberOfA

    -------1--------------------1------------------A-------------------2

    -------1--------------------2------------------A-------------------2

    -------1--------------------3------------------L-------------------2

    -------2--------------------1------------------A-------------------1

    -------3--------------------1------------------N-------------------0

    -------3--------------------2------------------R-------------------0

    -------3--------------------3------------------Y-------------------0

    Any suggestions on a query that could produce such a column?

    Thanks.

  • Nevermind, I did find a solution for this.

  • tarr94 (4/20/2012)


    Nevermind, I did find a solution for this.

    Forum etiquette would have you post your solution. It is possible others may benefit.

  • This?

    ; WITH sampledata(OrderNumber,OrderDetail,ItemCategory) AS

    (

    SELECT 1,1,'A'

    UNION ALL SELECT 1,2,'A'

    UNION ALL SELECT 1,3,'L'

    UNION ALL SELECT 2,1,'A'

    UNION ALL SELECT 3,1,'N'

    UNION ALL SELECT 3,2,'R'

    UNION ALL SELECT 3,3,'Y'

    ),

    GroupsWithA (OrderNumber ,ItemCategoryCt ) AS

    (

    SELECT OrderNumber , ItemCategoryCt = COUNT(*)

    FROM sampledata

    WHERE ItemCategory = 'A'

    GROUP BY OrderNumber

    )

    SELECT outr.* ,ItemCategoryCt = ISNULL(OtrAp.ItemCategoryCt,0)

    FROM sampledata Outr

    OUTER APPLY (

    SELECT ItemCategoryCt

    FROM GroupsWithA inr

    WHERE Outr.OrderNumber = inr.OrderNumber

    ) OtrAp

  • ColdCoffee - When first I looked at your query, my instincts told me there has to be an easier way.

    So I came up with this:

    DECLARE @sampledata TABLE (OrderNumber INT,OrderDetail INT,ItemCategory VARCHAR(5))

    INSERT INTO @sampledata (OrderNumber,OrderDetail,ItemCategory)

    SELECT 1,1,'A'

    UNION ALL SELECT 1,2,'A'

    UNION ALL SELECT 1,3,'L'

    UNION ALL SELECT 2,1,'A'

    UNION ALL SELECT 3,1,'N'

    UNION ALL SELECT 3,2,'R'

    UNION ALL SELECT 3,3,'Y'

    SELECT OrderNumber, OrderDetail, ItemCategory

    ,SUM(CASE ItemCategory WHEN 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY OrderNumber) As ItemCatagoryCt

    FROM @sampledata

    Then I grabbed your SQL, removed your sampledata CTE and used my temporary table instead, comparing the query plan costs. Yours won by a significant margin!

    Say it ain't so! I'd love to understand why this is and I'd also like to know who would win in an actual timing test.

    I'm going to need to study carefully what you did to see if I can learn something from it.

    Good on you!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Ah wait! The culprit was the sort caused by using the PARTITION, which I missed on initial scrutiny of the query plan. Apparently it is much less efficient than the second table scan your query performs.

    Equivalant in query plan cost to yours and possibly a bit more understandable is this:

    SELECT OrderNumber, OrderDetail, ItemCategory

    ,(SELECT COUNT(*)

    FROM @sampledata s2

    WHERE s1.OrderNumber = s2.OrderNumber and ItemCategory = 'A') As ItemCatagoryCt

    FROM @sampledata s1

    If somehow I could remove the second table scan, this could be improved.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Not being one to leave anything to chance, especially when I don't understand, I created a test harness on original input data expanding it to 7000 rows. Try it yourself to see the results.

    DECLARE @sampledata TABLE (OrderNumber INT,OrderDetail INT,ItemCategory VARCHAR(5))

    ;WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2)

    INSERT INTO @sampledata (OrderNumber,OrderDetail,ItemCategory)

    SELECT n+OrderNumber, OrderDetail, ItemCategory

    FROM (

    SELECT 1,1,'A'

    UNION ALL SELECT 1,2,'A'

    UNION ALL SELECT 1,3,'L'

    UNION ALL SELECT 2,1,'A'

    UNION ALL SELECT 3,1,'N'

    UNION ALL SELECT 3,2,'R'

    UNION ALL SELECT 3,3,'Y') y(OrderNumber,OrderDetail,ItemCategory)

    CROSS APPLY (SELECT n FROM Tally) x

    SET STATISTICS TIME ON

    ;WITH GroupsWithA (OrderNumber ,ItemCategoryCt ) AS

    (

    SELECT OrderNumber , ItemCategoryCt = COUNT(*)

    FROM @sampledata

    WHERE ItemCategory = 'A'

    GROUP BY OrderNumber

    )

    SELECT outr.* ,ItemCategoryCt = ISNULL(OtrAp.ItemCategoryCt,0)

    FROM @sampledata Outr

    OUTER APPLY (

    SELECT ItemCategoryCt

    FROM GroupsWithA inr

    WHERE Outr.OrderNumber = inr.OrderNumber

    ) OtrAp

    PRINT 'ColdCoffee''s query complete'

    SELECT OrderNumber, OrderDetail, ItemCategory

    ,SUM(CASE ItemCategory WHEN 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY OrderNumber) As ItemCatagoryCt

    FROM @sampledata

    PRINT 'Dwain''s query complete - simplfied'

    SELECT OrderNumber, OrderDetail, ItemCategory

    ,(SELECT COUNT(*)

    FROM @sampledata s2

    WHERE s1.OrderNumber = s2.OrderNumber and ItemCategory = 'A') As ItemCategoryCt

    FROM @sampledata s1

    PRINT 'Dwain''s query complete - redux'

    Here are my results (eliminating extraneous details):

    (7000 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2886 ms, elapsed time = 2902 ms.

    ColdCoffee's query complete

    (7000 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 62 ms, elapsed time = 157 ms.

    Dwain's query complete - simplfied

    (7000 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2824 ms, elapsed time = 2855 ms.

    Dwain's query complete - redux

    It appears that the PARTITION solution runs better than its query plan would indicate unless I'm misinterpreting the results.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply