Struggling with query. Number of times per group a record has value for certain field

Question

Post reply

Struggling with query. Number of times per group a record has value for certain field

tarr94

SSCommitted

Points: 1684
More actions
April 20, 2012 at 11:35 am

#260293

EDIT: This has been resolved.
Hi,
This is probably a simple query, but I'm having trouble with it. Here is a very simplified version of my table:
OrderNumber-----OrderDetail-----ItemCategory
-------1--------------------1------------------A
-------1--------------------2------------------G
-------1--------------------3------------------L
-------2--------------------1------------------A
-------3--------------------1------------------N
-------3--------------------2------------------R
-------3--------------------3------------------Y
I've linked many tables together to get my query so far, and there are many more fields on it than this, but this is just an example.
What I'm trying to do is add a column that will indicate the number of OrderDetail records with an ItemCategory of "A" per OrderNumber. So I would be looking for output something like this:
OrderNumber-----OrderDetail-----ItemCategory-----NumberOfA
-------1--------------------1------------------A-------------------2
-------1--------------------2------------------A-------------------2
-------1--------------------3------------------L-------------------2
-------2--------------------1------------------A-------------------1
-------3--------------------1------------------N-------------------0
-------3--------------------2------------------R-------------------0
-------3--------------------3------------------Y-------------------0
Any suggestions on a query that could produce such a column?
Thanks.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

tarr94 SSCommitted Points: 1684 More actions · Answer 1

tarr94

SSCommitted

Points: 1684

April 20, 2012 at 12:04 pm

#1476861

Nevermind, I did find a solution for this.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

tarr94 (4/20/2012)
Nevermind, I did find a solution for this.

Forum etiquette would have you post your solution. It is possible others may benefit.

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 3

This?

; WITH sampledata(OrderNumber,OrderDetail,ItemCategory) AS

(

SELECT 1,1,'A'

UNION ALL SELECT 1,2,'A'

UNION ALL SELECT 1,3,'L'

UNION ALL SELECT 2,1,'A'

UNION ALL SELECT 3,1,'N'

UNION ALL SELECT 3,2,'R'

UNION ALL SELECT 3,3,'Y'

),

GroupsWithA (OrderNumber ,ItemCategoryCt ) AS

(

SELECT OrderNumber , ItemCategoryCt = COUNT(*)

FROM sampledata

WHERE ItemCategory = 'A'

GROUP BY OrderNumber

)

SELECT outr.* ,ItemCategoryCt = ISNULL(OtrAp.ItemCategoryCt,0)

FROM sampledata Outr

OUTER APPLY (

SELECT ItemCategoryCt

FROM GroupsWithA inr

WHERE Outr.OrderNumber = inr.OrderNumber

) OtrAp

Dwain Camps SSC Guru Points: 86948 More actions · Answer 4

ColdCoffee - When first I looked at your query, my instincts told me there has to be an easier way.

So I came up with this:

DECLARE @sampledata TABLE (OrderNumber INT,OrderDetail INT,ItemCategory VARCHAR(5))

INSERT INTO @sampledata (OrderNumber,OrderDetail,ItemCategory)

SELECT 1,1,'A'

UNION ALL SELECT 1,2,'A'

UNION ALL SELECT 1,3,'L'

UNION ALL SELECT 2,1,'A'

UNION ALL SELECT 3,1,'N'

UNION ALL SELECT 3,2,'R'

UNION ALL SELECT 3,3,'Y'

SELECT OrderNumber, OrderDetail, ItemCategory

,SUM(CASE ItemCategory WHEN 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY OrderNumber) As ItemCatagoryCt

FROM @sampledata

Then I grabbed your SQL, removed your sampledata CTE and used my temporary table instead, comparing the query plan costs. Yours won by a significant margin!

Say it ain't so! I'd love to understand why this is and I'd also like to know who would win in an actual timing test.

I'm going to need to study carefully what you did to see if I can learn something from it.

Good on you!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86948 More actions · Answer 5

Ah wait! The culprit was the sort caused by using the PARTITION, which I missed on initial scrutiny of the query plan. Apparently it is much less efficient than the second table scan your query performs.

Equivalant in query plan cost to yours and possibly a bit more understandable is this:

SELECT OrderNumber, OrderDetail, ItemCategory

,(SELECT COUNT(*)

FROM @sampledata s2

WHERE s1.OrderNumber = s2.OrderNumber and ItemCategory = 'A') As ItemCatagoryCt

FROM @sampledata s1

If somehow I could remove the second table scan, this could be improved.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86948 More actions · Answer 6

Not being one to leave anything to chance, especially when I don't understand, I created a test harness on original input data expanding it to 7000 rows. Try it yourself to see the results.

DECLARE @sampledata TABLE (OrderNumber INT,OrderDetail INT,ItemCategory VARCHAR(5))

;WITH Tally (n) AS (

SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2)

INSERT INTO @sampledata (OrderNumber,OrderDetail,ItemCategory)

SELECT n+OrderNumber, OrderDetail, ItemCategory

FROM (

SELECT 1,1,'A'

UNION ALL SELECT 1,2,'A'

UNION ALL SELECT 1,3,'L'

UNION ALL SELECT 2,1,'A'

UNION ALL SELECT 3,1,'N'

UNION ALL SELECT 3,2,'R'

UNION ALL SELECT 3,3,'Y') y(OrderNumber,OrderDetail,ItemCategory)

CROSS APPLY (SELECT n FROM Tally) x

SET STATISTICS TIME ON

;WITH GroupsWithA (OrderNumber ,ItemCategoryCt ) AS

(

SELECT OrderNumber , ItemCategoryCt = COUNT(*)

FROM @sampledata

WHERE ItemCategory = 'A'

GROUP BY OrderNumber

)

SELECT outr.* ,ItemCategoryCt = ISNULL(OtrAp.ItemCategoryCt,0)

FROM @sampledata Outr

OUTER APPLY (

SELECT ItemCategoryCt

FROM GroupsWithA inr

WHERE Outr.OrderNumber = inr.OrderNumber

) OtrAp

PRINT 'ColdCoffee''s query complete'

SELECT OrderNumber, OrderDetail, ItemCategory

,SUM(CASE ItemCategory WHEN 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY OrderNumber) As ItemCatagoryCt

FROM @sampledata

PRINT 'Dwain''s query complete - simplfied'

SELECT OrderNumber, OrderDetail, ItemCategory

,(SELECT COUNT(*)

FROM @sampledata s2

WHERE s1.OrderNumber = s2.OrderNumber and ItemCategory = 'A') As ItemCategoryCt

FROM @sampledata s1

PRINT 'Dwain''s query complete - redux'

Here are my results (eliminating extraneous details):

(7000 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 2886 ms, elapsed time = 2902 ms.

ColdCoffee's query complete

(7000 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 62 ms, elapsed time = 157 ms.

Dwain's query complete - simplfied

(7000 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 2824 ms, elapsed time = 2855 ms.

Dwain's query complete - redux

It appears that the PARTITION solution runs better than its query plan would indicate unless I'm misinterpreting the results.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St