Find all orders that have EXACTLY the same items

  • SwePeso (5/3/2011)


    See this topic. We hammered the subjet to death...

    BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/3/2011)


    SwePeso (5/3/2011)


    See this topic. We hammered the subjet to death...

    BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

    I know you're good, but no point in turning into the CELKO of performance...

    I know you're nice, but just keeping ego in check ;-).

  • Ninja's_RGR'us (5/3/2011)


    Jeff Moden (5/3/2011)


    SwePeso (5/3/2011)


    See this topic. We hammered the subjet to death...

    BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

    I know you're good, but no point in turning into the CELKO of performance...

    I know you're nice, but just keeping ego in check ;-).

    You're kidding, right?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Peter,

    What does the following snippet from your code do?

    ,1 S SeqID FROM #Stage AS s

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/3/2011)


    Ninja's_RGR'us (5/3/2011)


    Jeff Moden (5/3/2011)


    SwePeso (5/3/2011)


    See this topic. We hammered the subjet to death...

    BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

    I know you're good, but no point in turning into the CELKO of performance...

    I know you're nice, but just keeping ego in check ;-).

    You're kidding, right?

    Of course... I was testing if the smiley face actually worked to convey the message... apparently not... even on someone as used to forums as you are.

    Interesting... :hehe:

  • Heh... you did relate me to Celko and there was no smiley face on that line.

    Sorry, Remi... I'm a bit on edge and I took your humor the wrong way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.

    I just wasn't prepared for the depth of humor you intended especially since the word "Celko" was involved.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/3/2011)


    ... and, no... smiley faces don't usually work for me. I don't even see them most of the time. I've been burned pretty badly by some folks who tried to use smiley faces to cover up some pretty bad things.

    I just wasn't prepared for the depth of humor you intended.

    No worries, I'll survive this :w00t:.

  • Thanks, Remi... sorry for the snap...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/3/2011)


    ChrisM@home (5/3/2011)


    Here's another version using APPLY without any sneaky tricks.

    Chris, run an estimated execution plan of your code against the following 10,000 rows of data and look at the row counts for the arrows...

    It's awful isn't it! I spent some time yesterday working on it after posting. Running against your 1M rows, I had to cancel. Running against 10k rows, a few changes resulted in this:

    ;WITH ItemsPerOrder AS (SELECT oid, ItemRows = COUNT(*) FROM dbo.JBMTest GROUP BY oid)

    SELECT i1.oid, MatchingOid = i2.oid

    FROM dbo.JBMTest o1

    INNER JOIN ItemsPerOrder i1 ON i1.oid = o1.oid

    INNER JOIN ItemsPerOrder i2 ON i1.ItemRows = i2.ItemRows AND i1.oid < i2.oid

    CROSS APPLY (SELECT ItemMatch = COUNT(*) FROM JBMTest WHERE oid = i2.oid AND itemid = o1.itemid) o2

    WHERE o2.ItemMatch > 0

    GROUP BY i1.oid, i1.ItemRows, i2.oid

    HAVING i1.ItemRows = SUM(o2.ItemMatch)

    which still takes 20s to run. Preaggregating, even into a temp table and chucking on some indexes, didn't help much. Why? Because this join: i1.oid <> i2.oid is a (partial) cross join and this 'improvement': i1.oid < i2.oid is a (partial) triangular join. So even though it's "proper set-based" and works in a way that even the blind can see, the performance will always suck.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Jeff Moden (5/3/2011)


    BWAAA-HAAAA!!! With only 55 rows of test data on one hand and only 8788 from the "Adventureworks" hand, I'd have to say no one hammered anything to death. 😉

    This is the results I get when using your three sample data scripts. The solution below scales linear.

    10 times the record, 7 times the duration.

    /* Total time

    10k 1 sec

    100k 7 sec

    1000k50 sec

    */

    -- 0 sec (10k), 4 sec (100k), 39 sec (1000k)

    SELECT x.OID AS LowID,

    y.OID AS HighID,

    MIN(x.ProductItems) AS ProductItems

    INTO#Stage

    FROM(

    SELECTOID,

    COUNT(*) AS ProductItems

    FROMdbo.Orders

    GROUP BYOID

    ) AS x

    INNER JOIN(

    SELECTOID,

    COUNT(*) AS ProductItems

    FROMdbo.Orders

    GROUP BYOID

    ) AS y ON y.ProductItems = x.ProductItems

    INNER JOINdbo.Orders AS x1 ON x1.OID = x.OID

    INNER JOINdbo.Orders AS y1 ON y1.OID = y.OID

    WHEREx1.ItemID = y1.ItemID

    AND x.OID < y.OID

    GROUP BYx.OID,

    y.OID

    HAVINGCOUNT(*) = MIN(x.ProductItems)

    -- 0 sec (10k), 1 sec (100k), 2 sec (1000k)

    CREATE INDEX IX_Low ON #Stage (LowID) INCLUDE (HighID, ProductItems)

    CREATE INDEX IX_High ON #Stage (HighID) INCLUDE (LowID, ProductItems)

    -- 0 sec (10k), 1 sec (100k), 8 sec (1000k)

    ;WITH cteDuplicate

    AS (

    SELECTs.LowID AS theGrp,

    s.LowID AS OID,

    MIN(s.ProductItems) AS ProductItems,

    1 AS SeqID

    FROM #Stage AS s

    WHERE NOT EXISTS (SELECT * FROM #Stage AS x WHERE x.HighID = s.LowID)

    GROUP BY s.LowID

    UNION ALL

    SELECTd.theGrp,

    f.ID AS OID,

    d.ProductItems,

    d.SeqID + 1 AS SeqID

    FROMcteDuplicate AS d

    CROSS APPLY(

    SELECTs.HighID,

    ROW_NUMBER() OVER (ORDER BY s.HighID) AS SeqID

    FROM#Stage AS s

    WHEREs.LowID = d.OID

    ) AS f(ID, SeqID)

    WHERE f.SeqID = 1

    )

    SELECTDENSE_RANK() OVER (ORDER BY theGrp) AS theGroup,

    ProductItems,

    SeqID,

    COUNT(*) OVER (PARTITION BY theGrp) AS OrderItems,

    OID

    INTO #Temp

    FROMcteDuplicate

    OPTION (MAXRECURSION 0)

    -- 0 sec

    DROP TABLE #Stage

    -- 0 sec (10k), 0 sec (100k), 1 sec (1000k)

    SELECTw.ProductItems,

    STUFF(w.Data, 1, 2, '') AS Items,

    w.OrderItems,

    STUFF(f.Data, 1, 2, '') AS Orders

    FROM(

    SELECTt.theGroup,

    t.ProductItems,

    (

    SELECT', ' + CAST(pod.ItemID AS VARCHAR(12))

    FROM dbo.Orders AS pod

    WHEREpod.OID = t.OID

    ORDER BYpod.ItemID

    FOR XMLPATH('')

    ) AS Data,

    t.OrderItems

    FROM#Temp AS t

    WHEREt.SeqID = 1

    ) AS w

    CROSS APPLY(

    SELECT', ' + CAST(t.OID AS VARCHAR(12))

    FROM#Temp AS t

    WHERE t.theGroup = w.theGroup

    ORDER BY t.OID

    FOR XMLPATH('')

    ) AS f(Data)

    -- 0 sec

    DROP TABLE #Temp


    N 56°04'39.16"
    E 12°55'05.25"

  • Jeff Moden (5/3/2011)


    What does the following snippet from your code do?

    ,1 S SeqID FROM #Stage AS s

    Spelling error. It should read

    ,1 AS SeqID


    N 56°04'39.16"
    E 12°55'05.25"

  • Craig Farrell (5/2/2011)


    tfifield (5/2/2011)


    Todd,

    This should give you what you want. It gives each order and its items for only orders that have the same items.

    Todd, that can get fouled up by slightly different item lists.

    IE: oid 1 with 111, 222, 333 and oid 2 with 111, 222, and 444.

    You'll have the same counts, and your join will still return 2 of the 3 rows, possibly showing poor data results.

    To do it the way you're describing, you'd need to nearly crossjoin the table to itself on itemids, take the count of THAT result per oid pairing, then return to the table and compare the counts of the pair to each independent piece, confirming they all match still. It gets nasty fast.

    It would end up looking something like this:

    ;WITH cte AS (

    SELECT

    o1.oid AS oid1,

    o2.oid AS oid2,

    COUNT(*) AS cntForPair

    FROM

    #orders AS o1

    JOIN

    #orders AS o2

    ONo1.oid <> o2.oid

    AND o1.itemID = o2.itemID

    GROUP BY

    o1.oid,

    o2.oid

    ),

    OrderAgg AS (

    SELECT

    oid,

    COUNT(*) AS oCount

    FROM

    #orders

    GROUP BY

    oid

    )

    --select * from orderagg

    SELECT

    cte.*

    FROM

    cte

    JOIN

    orderAgg AS oa1

    ONcte.oid1 = oa1.oid

    JOIN

    orderAgg AS oa2

    ONcte.oid2 = oa2.oid

    WHERE

    cte.cntForPair = oa1.oCount

    AND cte.cntForPair = oa2.oCount

    AND cte.oid1 < cte.oid2

    EDIT: This has been going longer then I expected it, so I edited my original answer post to include the ORDER BY to avoid issues.

    Craig,

    I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.

    Todd

  • tfifield (5/5/2011)


    Craig,

    I thought about that about 1 minute after I posted it and did the old Homer Simpson DOH! I then got a bunch of phone calls from clients and haven't been back to SSC until now. I'm a bit red faced on it. Comes from going over the forum before coffee.

    Todd

    Never done that. Nope. Not me. No idea what you're talking about. You're all alone in doing that. Yep. Not meeeee..... :Whistling:


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply