May 3, 2011 at 6:58 pm
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:01 pm
Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
I know you're good, but no point in turning into the CELKO of performance...
I know you're nice, but just keeping ego in check ;-).
May 3, 2011 at 7:25 pm
Ninja's_RGR'us (5/3/2011)
Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
I know you're good, but no point in turning into the CELKO of performance...
I know you're nice, but just keeping ego in check ;-).
You're kidding, right?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:30 pm
Peter,
What does the following snippet from your code do?
,1 S SeqID FROM #Stage AS s
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:30 pm
Jeff Moden (5/3/2011)
Ninja's_RGR'us (5/3/2011)
Jeff Moden (5/3/2011)
SwePeso (5/3/2011)
See this topic. We hammered the subjet to death...BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
I know you're good, but no point in turning into the CELKO of performance...
I know you're nice, but just keeping ego in check ;-).
You're kidding, right?
Of course... I was testing if the smiley face actually worked to convey the message... apparently not... even on someone as used to forums as you are.
Interesting... :hehe:
May 3, 2011 at 7:39 pm
Heh... you did relate me to Celko and there was no smiley face on that line.
Sorry, Remi... I'm a bit on edge and I took your humor the wrong way.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:43 pm
... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.
I just wasn't prepared for the depth of humor you intended especially since the word "Celko" was involved.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 8:08 pm
Jeff Moden (5/3/2011)
... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.I just wasn't prepared for the depth of humor you intended.
No worries, I'll survive this :w00t:.
May 3, 2011 at 8:10 pm
Thanks, Remi... sorry for the snap...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 4, 2011 at 2:08 am
Jeff Moden (5/3/2011)
ChrisM@home (5/3/2011)
Here's another version using APPLY without any sneaky tricks.Chris, run an estimated execution plan of your code against the following 10,000 rows of data and look at the row counts for the arrows...
It's awful isn't it! I spent some time yesterday working on it after posting. Running against your 1M rows, I had to cancel. Running against 10k rows, a few changes resulted in this:
;WITH ItemsPerOrder AS (SELECT oid, ItemRows = COUNT(*) FROM dbo.JBMTest GROUP BY oid)
SELECT i1.oid, MatchingOid = i2.oid
FROM dbo.JBMTest o1
INNER JOIN ItemsPerOrder i1 ON i1.oid = o1.oid
INNER JOIN ItemsPerOrder i2 ON i1.ItemRows = i2.ItemRows AND i1.oid < i2.oid
CROSS APPLY (SELECT ItemMatch = COUNT(*) FROM JBMTest WHERE oid = i2.oid AND itemid = o1.itemid) o2
WHERE o2.ItemMatch > 0
GROUP BY i1.oid, i1.ItemRows, i2.oid
HAVING i1.ItemRows = SUM(o2.ItemMatch)
which still takes 20s to run. Preaggregating, even into a temp table and chucking on some indexes, didn't help much. Why? Because this join: i1.oid <> i2.oid is a (partial) cross join and this 'improvement': i1.oid < i2.oid is a (partial) triangular join. So even though it's "proper set-based" and works in a way that even the blind can see, the performance will always suck.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
May 4, 2011 at 3:10 am
Jeff Moden (5/3/2011)
BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉
This is the results I get when using your three sample data scripts. The solution below scales linear.
10 times the record, 7 times the duration.
/* Total time
10k 1 sec
100k 7 sec
1000k50 sec
*/
-- 0 sec (10k), 4 sec (100k), 39 sec (1000k)
SELECT x.OID AS LowID,
y.OID AS HighID,
MIN(x.ProductItems) AS ProductItems
INTO#Stage
FROM(
SELECTOID,
COUNT(*) AS ProductItems
FROMdbo.Orders
GROUP BYOID
) AS x
INNER JOIN(
SELECTOID,
COUNT(*) AS ProductItems
FROMdbo.Orders
GROUP BYOID
) AS y ON y.ProductItems = x.ProductItems
INNER JOINdbo.Orders AS x1 ON x1.OID = x.OID
INNER JOINdbo.Orders AS y1 ON y1.OID = y.OID
WHEREx1.ItemID = y1.ItemID
AND x.OID < y.OID
GROUP BYx.OID,
y.OID
HAVINGCOUNT(*) = MIN(x.ProductItems)
-- 0 sec (10k), 1 sec (100k), 2 sec (1000k)
CREATE INDEX IX_Low ON #Stage (LowID) INCLUDE (HighID, ProductItems)
CREATE INDEX IX_High ON #Stage (HighID) INCLUDE (LowID, ProductItems)
-- 0 sec (10k), 1 sec (100k), 8 sec (1000k)
;WITH cteDuplicate
AS (
SELECTs.LowID AS theGrp,
s.LowID AS OID,
MIN(s.ProductItems) AS ProductItems,
1 AS SeqID
FROM #Stage AS s
WHERE NOT EXISTS (SELECT * FROM #Stage AS x WHERE x.HighID = s.LowID)
GROUP BY s.LowID
UNION ALL
SELECTd.theGrp,
f.ID AS OID,
d.ProductItems,
d.SeqID + 1 AS SeqID
FROMcteDuplicate AS d
CROSS APPLY(
SELECTs.HighID,
ROW_NUMBER() OVER (ORDER BY s.HighID) AS SeqID
FROM#Stage AS s
WHEREs.LowID = d.OID
) AS f(ID, SeqID)
WHERE f.SeqID = 1
)
SELECTDENSE_RANK() OVER (ORDER BY theGrp) AS theGroup,
ProductItems,
SeqID,
COUNT(*) OVER (PARTITION BY theGrp) AS OrderItems,
OID
INTO #Temp
FROMcteDuplicate
OPTION (MAXRECURSION 0)
-- 0 sec
DROP TABLE #Stage
-- 0 sec (10k), 0 sec (100k), 1 sec (1000k)
SELECTw.ProductItems,
STUFF(w.Data, 1, 2, '') AS Items,
w.OrderItems,
STUFF(f.Data, 1, 2, '') AS Orders
FROM(
SELECTt.theGroup,
t.ProductItems,
(
SELECT', ' + CAST(pod.ItemID AS VARCHAR(12))
FROM dbo.Orders AS pod
WHEREpod.OID = t.OID
ORDER BYpod.ItemID
FOR XMLPATH('')
) AS Data,
t.OrderItems
FROM#Temp AS t
WHEREt.SeqID = 1
) AS w
CROSS APPLY(
SELECT', ' + CAST(t.OID AS VARCHAR(12))
FROM#Temp AS t
WHERE t.theGroup = w.theGroup
ORDER BY t.OID
FOR XMLPATH('')
) AS f(Data)
-- 0 sec
DROP TABLE #Temp
N 56°04'39.16"
E 12°55'05.25"
May 4, 2011 at 3:11 am
Jeff Moden (5/3/2011)
What does the following snippet from your code do?
,1 S SeqID FROM #Stage AS s
Spelling error. It should read
,1 AS SeqID
N 56°04'39.16"
E 12°55'05.25"
May 5, 2011 at 1:09 pm
Craig Farrell (5/2/2011)
tfifield (5/2/2011)
Todd,This should give you what you want. It gives each order and its items for only orders that have the same items.
Todd, that can get fouled up by slightly different item lists.
IE: oid 1 with 111, 222, 333 and oid 2 with 111, 222, and 444.
You'll have the same counts, and your join will still return 2 of the 3 rows, possibly showing poor data results.
To do it the way you're describing, you'd need to nearly crossjoin the table to itself on itemids, take the count of THAT result per oid pairing, then return to the table and compare the counts of the pair to each independent piece, confirming they all match still. It gets nasty fast.
It would end up looking something like this:
;WITH cte AS (
SELECT
o1.oid AS oid1,
o2.oid AS oid2,
COUNT(*) AS cntForPair
FROM
#orders AS o1
JOIN
#orders AS o2
ONo1.oid <> o2.oid
AND o1.itemID = o2.itemID
GROUP BY
o1.oid,
o2.oid
),
OrderAgg AS (
SELECT
oid,
COUNT(*) AS oCount
FROM
#orders
GROUP BY
oid
)
--select * from orderagg
SELECT
cte.*
FROM
cte
JOIN
orderAgg AS oa1
ONcte.oid1 = oa1.oid
JOIN
orderAgg AS oa2
ONcte.oid2 = oa2.oid
WHERE
cte.cntForPair = oa1.oCount
AND cte.cntForPair = oa2.oCount
AND cte.oid1 < cte.oid2
EDIT: This has been going longer then I expected it, so I edited my original answer post to include the ORDER BY to avoid issues.
Craig,
I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.
Todd
May 5, 2011 at 2:21 pm
tfifield (5/5/2011)
Craig,I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.
Todd
Never done that. Nope. Not me. No idea what you're talking about. You're all alone in doing that. Yep. Not meeeee..... :Whistling:
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 14 posts - 16 through 28 (of 28 total)
You must be logged in to reply to this topic. Login to reply