Find all orders that have EXACTLY the same items

Question

Find all orders that have EXACTLY the same items

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 1

SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...

BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

I know you're good, but no point in turning into the CELKO of performance...

I know you're nice, but just keeping ego in check ;-).

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 3

Ninja's_RGR'us (5/3/2011)
Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
I know you're good, but no point in turning into the CELKO of performance...
I know you're nice, but just keeping ego in check ;-).

You're kidding, right?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 4

Peter,

What does the following snippet from your code do?

,1 S SeqID FROM #Stage AS s

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Jeff Moden (5/3/2011)
Ninja's_RGR'us (5/3/2011)
Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
I know you're good, but no point in turning into the CELKO of performance...
I know you're nice, but just keeping ego in check ;-).
You're kidding, right?

Of course... I was testing if the smiley face actually worked to convey the message... apparently not... even on someone as used to forums as you are.

Interesting... :hehe:

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 6

Heh... you did relate me to Celko and there was no smiley face on that line.

Sorry, Remi... I'm a bit on edge and I took your humor the wrong way.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 7

... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.

I just wasn't prepared for the depth of humor you intended especially since the word "Celko" was involved.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

Jeff Moden (5/3/2011)
... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.
I just wasn't prepared for the depth of humor you intended.

No worries, I'll survive this :w00t:.

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 9

Thanks, Remi... sorry for the snap...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 10

Jeff Moden (5/3/2011)
ChrisM@home (5/3/2011)
Here's another version using APPLY without any sneaky tricks.
Chris, run an estimated execution plan of your code against the following 10,000 rows of data and look at the row counts for the arrows...

It's awful isn't it! I spent some time yesterday working on it after posting. Running against your 1M rows, I had to cancel. Running against 10k rows, a few changes resulted in this:

;WITH ItemsPerOrder AS (SELECT oid, ItemRows = COUNT(*) FROM dbo.JBMTest GROUP BY oid)

SELECT i1.oid, MatchingOid = i2.oid

FROM dbo.JBMTest o1

INNER JOIN ItemsPerOrder i1 ON i1.oid = o1.oid

INNER JOIN ItemsPerOrder i2 ON i1.ItemRows = i2.ItemRows AND i1.oid < i2.oid

CROSS APPLY (SELECT ItemMatch = COUNT(*) FROM JBMTest WHERE oid = i2.oid AND itemid = o1.itemid) o2

WHERE o2.ItemMatch > 0

GROUP BY i1.oid, i1.ItemRows, i2.oid

HAVING i1.ItemRows = SUM(o2.ItemMatch)

which still takes 20s to run. Preaggregating, even into a temp table and chucking on some indexes, didn't help much. Why? Because this join: i1.oid <> i2.oid is a (partial) cross join and this 'improvement': i1.oid < i2.oid is a (partial) triangular join. So even though it's "proper set-based" and works in a way that even the blind can see, the performance will always suck.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 11

Jeff Moden (5/3/2011)
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

This is the results I get when using your three sample data scripts. The solution below scales linear.

10 times the record, 7 times the duration.

/* Total time

10k 1 sec

100k 7 sec

1000k50 sec

*/

-- 0 sec (10k), 4 sec (100k), 39 sec (1000k)

SELECT x.OID AS LowID,

y.OID AS HighID,

MIN(x.ProductItems) AS ProductItems

INTO#Stage

FROM(

SELECTOID,

COUNT(*) AS ProductItems

FROMdbo.Orders

GROUP BYOID

) AS x

INNER JOIN(

SELECTOID,

COUNT(*) AS ProductItems

FROMdbo.Orders

GROUP BYOID

) AS y ON y.ProductItems = x.ProductItems

INNER JOINdbo.Orders AS x1 ON x1.OID = x.OID

INNER JOINdbo.Orders AS y1 ON y1.OID = y.OID

WHEREx1.ItemID = y1.ItemID

AND x.OID < y.OID

GROUP BYx.OID,

y.OID

HAVINGCOUNT(*) = MIN(x.ProductItems)

-- 0 sec (10k), 1 sec (100k), 2 sec (1000k)

CREATE INDEX IX_Low ON #Stage (LowID) INCLUDE (HighID, ProductItems)

CREATE INDEX IX_High ON #Stage (HighID) INCLUDE (LowID, ProductItems)

-- 0 sec (10k), 1 sec (100k), 8 sec (1000k)

;WITH cteDuplicate

AS (

SELECTs.LowID AS theGrp,

s.LowID AS OID,

MIN(s.ProductItems) AS ProductItems,

1 AS SeqID

FROM #Stage AS s

WHERE NOT EXISTS (SELECT * FROM #Stage AS x WHERE x.HighID = s.LowID)

GROUP BY s.LowID

UNION ALL

SELECTd.theGrp,

f.ID AS OID,

d.ProductItems,

d.SeqID + 1 AS SeqID

FROMcteDuplicate AS d

CROSS APPLY(

SELECTs.HighID,

ROW_NUMBER() OVER (ORDER BY s.HighID) AS SeqID

FROM#Stage AS s

WHEREs.LowID = d.OID

) AS f(ID, SeqID)

WHERE f.SeqID = 1

)

SELECTDENSE_RANK() OVER (ORDER BY theGrp) AS theGroup,

ProductItems,

SeqID,

COUNT(*) OVER (PARTITION BY theGrp) AS OrderItems,

OID

INTO #Temp

FROMcteDuplicate

OPTION (MAXRECURSION 0)

-- 0 sec

DROP TABLE #Stage

-- 0 sec (10k), 0 sec (100k), 1 sec (1000k)

SELECTw.ProductItems,

STUFF(w.Data, 1, 2, '') AS Items,

w.OrderItems,

STUFF(f.Data, 1, 2, '') AS Orders

FROM(

SELECTt.theGroup,

t.ProductItems,

(

SELECT', ' + CAST(pod.ItemID AS VARCHAR(12))

FROM dbo.Orders AS pod

WHEREpod.OID = t.OID

ORDER BYpod.ItemID

FOR XMLPATH('')

) AS Data,

t.OrderItems

FROM#Temp AS t

WHEREt.SeqID = 1

) AS w

CROSS APPLY(

SELECT', ' + CAST(t.OID AS VARCHAR(12))

FROM#Temp AS t

WHERE t.theGroup = w.theGroup

ORDER BY t.OID

FOR XMLPATH('')

) AS f(Data)

-- 0 sec

DROP TABLE #Temp

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 12

Jeff Moden (5/3/2011)
What does the following snippet from your code do?
,1 S SeqID FROM #Stage AS s

Spelling error. It should read

,1 AS SeqID

N 56°04'39.16"
E 12°55'05.25"

tfifield SSCrazy Eights Points: 9655 More actions · Answer 13

Craig Farrell (5/2/2011)
tfifield (5/2/2011)
Todd,
This should give you what you want. It gives each order and its items for only orders that have the same items.
Todd, that can get fouled up by slightly different item lists.
IE: oid 1 with 111, 222, 333 and oid 2 with 111, 222, and 444.
You'll have the same counts, and your join will still return 2 of the 3 rows, possibly showing poor data results.
To do it the way you're describing, you'd need to nearly crossjoin the table to itself on itemids, take the count of THAT result per oid pairing, then return to the table and compare the counts of the pair to each independent piece, confirming they all match still. It gets nasty fast.
It would end up looking something like this:
;WITH cte AS (
SELECT
o1.oid AS oid1,
o2.oid AS oid2,
COUNT(*) AS cntForPair
FROM
#orders AS o1
JOIN
#orders AS o2
ONo1.oid <> o2.oid
AND o1.itemID = o2.itemID
GROUP BY
o1.oid,
o2.oid
),
OrderAgg AS (
SELECT
oid,
COUNT(*) AS oCount
FROM
#orders
GROUP BY
oid
)
--select * from orderagg
SELECT
cte.*
FROM
cte
JOIN
orderAgg AS oa1
ONcte.oid1 = oa1.oid
JOIN
orderAgg AS oa2
ONcte.oid2 = oa2.oid
WHERE
cte.cntForPair = oa1.oCount
AND cte.cntForPair = oa2.oCount
AND cte.oid1 < cte.oid2
EDIT: This has been going longer then I expected it, so I edited my original answer post to include the ORDER BY to avoid issues.

Craig,

I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.

Todd

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 14

tfifield (5/5/2011)
Craig,
I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.
Todd

Never done that. Nope. Not me. No idea what you're talking about. You're all alone in doing that. Yep. Not meeeee..... :Whistling:

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA