Return all rows where one is a match

  • Simple scenario.

    We have order header, order detail.

I want to return all rows from order detail for any order where at least one line has a specified product, i.e. I want to see what else was ordered when the product was ordered.

    So I need the following modified to return all the order lines for an order and not just the product match.

    Select oph.order_no, opd.order_line_no, opd.product, opd.quantity
    From OrderDetail opd
    inner join OrderHeader oph on opd.order_no = oph.order_no
    Where opd.product = 'Bike'

    Hope someone understands what I'm trying to do. I know I could use a subquery to return every order_no where the product has been ordered, but I don't want to do it that way.

  • looks like a simple IN() statement would do the job... you want the entire order if one of the products is 'Bike':

    Select oph.order_no, opd.order_line_no, opd.product, opd.quantity
    From OrderDetail opd
    inner join OrderHeader oph on opd.order_no = oph.order_no
    Where opd.order_no in (SELECT order_no FROM OrderDetail WHERE product = 'Bike')

    Lowell


    --help us help you! If you post a question, include a CREATE TABLE... statement and INSERT INTO... statements to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!
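    In the spirit of that signature, here is a minimal, self-contained sketch of the IN() approach using Python's sqlite3 module. The schema mirrors the thread's OrderHeader/OrderDetail examples, but every row of data is invented for illustration:

    ```python
    import sqlite3

    # Hypothetical sample data; the schema follows the thread's examples
    # and the rows are made up so the query has something to chew on.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE OrderHeader (order_no INTEGER PRIMARY KEY);
    CREATE TABLE OrderDetail (order_no INTEGER, order_line_no INTEGER,
                              product TEXT, quantity INTEGER);
    INSERT INTO OrderHeader VALUES (1), (2);
    INSERT INTO OrderDetail VALUES
        (1, 1, 'Bike',   1),
        (1, 2, 'Helmet', 1),
        (2, 1, 'Pump',   2);
    """)

    # The IN() filter: keep every line of any order that contains a Bike.
    rows = con.execute("""
    SELECT oph.order_no, opd.order_line_no, opd.product, opd.quantity
    FROM OrderDetail opd
    INNER JOIN OrderHeader oph ON opd.order_no = oph.order_no
    WHERE opd.order_no IN (SELECT order_no FROM OrderDetail WHERE product = 'Bike')
    ORDER BY oph.order_no, opd.order_line_no
    """).fetchall()
    print(rows)  # both lines of order 1 come back, Bike and Helmet alike
    ```

    Order 2 (no Bike) is filtered out entirely, which is exactly the "show me what else was ordered" behaviour the original post asked for.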

  • Another solution would be to simply join to the detail again.

    Select
        oph.order_no
        , opd.order_line_no
        , opd.product
        , opd.quantity
        , opd2.product as OtherProduct
        , opd2.quantity as OtherQuantity
    From OrderDetail opd
        inner join OrderHeader oph on opd.order_no = oph.order_no
        inner join OrderDetail opd2 on opd2.order_no = opd.order_no
            and opd2.product <> opd.product
    Where opd.product = 'Bike'

  • Actually, the subquery will probably be more logically correct. I don't know whether it is possible for there to be more than one OrderDetail record for the same order with the same product, but in Michael's query, all of the detail records for the same order would be returned twice if there were two detail records with a product of 'Bike'.
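  • That duplication is easy to demonstrate. A hypothetical sketch in Python's sqlite3, with invented data that puts 'Bike' on two separate lines of the same order, shows the self-join returning the other products once per matching Bike line:

    ```python
    import sqlite3

    # Invented data: order 1 carries 'Bike' on two separate detail lines.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE OrderDetail (order_no INTEGER, order_line_no INTEGER,
                              product TEXT, quantity INTEGER);
    INSERT INTO OrderDetail VALUES
        (1, 1, 'Bike',   1),
        (1, 2, 'Bike',   1),
        (1, 3, 'Helmet', 1);
    """)

    # The self-join from Michael's query, reduced to OrderDetail only.
    rows = con.execute("""
    SELECT opd.order_no, opd.order_line_no, opd2.product AS OtherProduct
    FROM OrderDetail opd
    INNER JOIN OrderDetail opd2
        ON opd2.order_no = opd.order_no AND opd2.product <> opd.product
    WHERE opd.product = 'Bike'
    ORDER BY opd.order_line_no
    """).fetchall()
    print(rows)  # Helmet appears twice: once for each Bike line on the order
    ```

    With a single Bike line per order the self-join and the subquery agree; the difference only surfaces when the filtering product repeats within an order.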

  • I would probably write this as follows. No subquery needed... 🙂

    SELECT
        oh.order_no
        , od.order_line_no
        , od.product
        , od.quantity
    FROM dbo.OrderHeader oh
        JOIN dbo.OrderDetail od  -- Use for full list of products on the order
            ON oh.order_no = od.order_no
        JOIN dbo.OrderDetail odf -- Use to filter data by product
            ON oh.order_no = odf.order_no
    WHERE odf.product = @product

    Gary Johnson
    Sr Database Engineer

  • The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) like so:

    SELECT
        oh.orderid
        , od.productid
        , od.quantity
    FROM dbo.Orders oh
        JOIN dbo.[Order Details] od  -- Use for full list of products on the order
            ON oh.orderid = od.orderid
        JOIN dbo.[Order Details] odf -- Use to filter data by product
            ON oh.orderid = odf.orderid
    WHERE odf.productid = 51
    ORDER BY oh.orderid, od.productid

    ---------------- VS ----------------------

    SELECT oh.orderid, od.productid, od.quantity
    FROM orders oh
        INNER JOIN [order details] od ON oh.orderid = od.orderid
    WHERE od.orderid IN (SELECT orderid
                         FROM [order details]
                         WHERE productid = 51)

    Pretty marked difference on a small dataset. This could potentially change with a large enough dataset and a highly populated IN clause (i.e., instead of 39 orders containing the targeted product, there are 30,000). Depending on how large and densely populated the OP's dataset is, maybe he could test the two for us and let us know.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • If using a correlated subquery for checking a condition/item, I prefer EXISTS. I've always been under the impression this should be faster than IN. Am I way off base here?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • It was my understanding that these subqueries are basically the equivalent of derived tables, as they don't reference the outer rows and thus should not repeat for every row. If I'm mistaken in that regard, then it makes sense that they would significantly degrade in performance as the dataset got larger.

    As far as IN and EXISTS go, to the best of my knowledge and from everything I've ever read, they are the same as far as performance goes, assuming they are written the same way. I checked, and the execution plan and performance are exactly the same for both in this scenario. That actually makes me doubt my belief that IN is not correlated, as EXISTS does reference the outer table. Maybe they're just normally highly efficient correlated subqueries.

    Seth Phelabaum



  • Garadin (10/10/2008)


    The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:

    I ran your test code and, as expected, no difference. Both execution plans are exactly identical because IN resolves to an inner join.

    What kind of difference did you actually come up with?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Subqueries with IN and EXISTS serve slightly different purposes. For this situation, an EXISTS subquery would be a correlated subquery, and the IN subquery would not have to be.
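  • That distinction can be sketched quickly. In this hypothetical Python/sqlite3 example (data invented for illustration), the non-correlated IN subquery and the correlated EXISTS subquery return the same rows; only the EXISTS version references the outer table:

    ```python
    import sqlite3

    # Invented sample data following the thread's OrderDetail shape.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE OrderDetail (order_no INTEGER, order_line_no INTEGER,
                              product TEXT, quantity INTEGER);
    INSERT INTO OrderDetail VALUES
        (1, 1, 'Bike', 1), (1, 2, 'Helmet', 1), (2, 1, 'Pump', 2);
    """)

    # Non-correlated: the subquery never mentions the outer alias od.
    in_rows = con.execute("""
    SELECT order_no, order_line_no, product FROM OrderDetail od
    WHERE od.order_no IN (SELECT order_no FROM OrderDetail WHERE product = 'Bike')
    ORDER BY order_no, order_line_no
    """).fetchall()

    # Correlated: the subquery references od.order_no from the outer query.
    exists_rows = con.execute("""
    SELECT order_no, order_line_no, product FROM OrderDetail od
    WHERE EXISTS (SELECT 1 FROM OrderDetail f
                  WHERE f.order_no = od.order_no AND f.product = 'Bike')
    ORDER BY order_no, order_line_no
    """).fetchall()

    print(in_rows == exists_rows)  # True: identical result sets
    ```

    Whether an optimizer treats the two identically is engine-specific; the sketch only shows that the result sets agree, which matches what the thread observed in the execution plans.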

  • Jeff Moden (10/11/2008)


    Garadin (10/10/2008)


    The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:

    I ran your test code and, as expected, no difference. Both execution plans are exactly identical because IN resolves to an inner join.

    What kind of difference did you actually come up with?

    The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

    Seth Phelabaum



  • Here is what I get when running the three methods. Apologies for the somewhat sloppy code; I wrote this on my laptop while lying on the couch.

    Seth Phelabaum



  • Garadin (10/12/2008)


    The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

    You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?

    --Jeff Moden



  • Jeff Moden (10/13/2008)


    Garadin (10/12/2008)


    The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

    You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?

    Yes.

    Seth Phelabaum



  • Not a good thing to do... the Percentage of Batch is highly unreliable and frequently absolutely incorrect...

    Look at the Percentage of Batch on the following code... then run it... SURPRISE! 🙂

    SET NOCOUNT ON
    --=======================================================================================
    --      Recursive method shown by (Name with-held)
    --=======================================================================================
    PRINT '========== Recursive method =========='
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @DateVal DATETIME
    SET @DateVal = '2008-01-01'
    ;with mycte as
    (
        select @DateVal AS DateVal
        union all
        select DateVal + 1
        from mycte
        where DateVal + 1 < DATEADD(yy, 5, @DateVal)
    )
    select @Bitbucket = d.dateval
    from mycte d
    OPTION (MAXRECURSION 0)
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',90)
    GO
    --=======================================================================================
    --      Tally table method by Jeff Moden
    --=======================================================================================
    PRINT '========== Tally table method =========='
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @StartDate AS DATETIME
    SET @StartDate = '2008-01-01'
    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
           @Bitbucket = @StartDate-1+t.N
    FROM Tally t
    ORDER BY N
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    PRINT REPLICATE('=',90)

    Here's another one... both have identical execution plans and percentages...

    SET STATISTICS TIME ON

    SELECT TOP 10 *
    FROM dbo.JBMTest
    ORDER BY SomeLetters2

    SET ROWCOUNT 10
    SELECT *
    FROM dbo.JBMTest
    ORDER BY SomeLetters2
    SET ROWCOUNT 0

    Oh yeah... here's the test table I used for that one...

    --===== Create and populate a 1,000,000 row test table.
         -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
         -- for all rows.
         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
         -- Jeff Moden
    SELECT TOP 1000000
           RowNum       = IDENTITY(INT,1,1),
           SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
           SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                        + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
           SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
           SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
           SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
           SomeHex12    = RIGHT(NEWID(),12)
      INTO dbo.JBMTest
      FROM Master.dbo.SysColumns t1
     CROSS JOIN Master.dbo.SysColumns t2

    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.JBMTest
      ADD PRIMARY KEY CLUSTERED (RowNum)

    --Jeff Moden


