Return all rows where one is a match

  • I wasn't really relying on the percentage of batch in this case, merely noting the differences. The thing that really makes no sense to me, and seems to go against everything stated as far as optimization goes for the three types are the reads. If the IN is treated the same as an inner join, and the two queries are exactly the same, they should have the same number of reads. The fact that they don't doesn't make a lot of sense to me. Having more reads doesn't necessarily make the inner join slower... but it does seem to imply that it is being handled differently.

    Also, the exists vs. the IN. The exists seems like it would require correlation due to the outer query reference, and thus be slower than the other two methods. However, it performs exactly the same as the IN, regardless of the lack of outer query reference in the IN method.

    I think we're gonna have to run this on significantly more data, and with varying population of the records returned by the subquery to really examine this... although I have to believe that someone has done that before.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (10/13/2008)


    I think we're gonna have to run this on significantly more data, and with varying population of the records returned by the subquery to really examine this... although I have to believe that someone has done that before.

    Heh... you now have a million row test data generator... 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've done some testing, not quite happy that I've covered it all yet, but I'll finish it up tonight when I get home. So far, a couple things of note:

    1. Don't trust the built in Trace in QA. I switched to using profiler and got profound differences(between that and the QA trace, not necessarily between the different methods).

    2. The INNER JOIN query is not technically the same as the other two. I knew there was something that was bothering me about that method that I couldn't quite put my finger on. It will yield identical results only if productID is unique to an order. While it's logical that it would be(and pretty bad design if it's not), it doesn't necessarily have to be, and would change the results of the queries. Adding DISTINCT to it does not necessarily make it match either.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • It has been my experience that IN causes more problems with optimization than EXISTS does, so I recommend that mechanism.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I've prettymuch gotten nowhere with testing these different methods. They all show negligible differences

    in times, regardless of both the amount of data being used and the density of the subquery item. Here is

    what I used to test the three methods. I decided to make my own test table(although I appreciate yours Jeff, and

    I'm sure I'll use that for other stuff) so that I could mimic the original tests exactly and easily control

    the density of the subquery item. I believe that my original confusion came from using the built in Trace

    in QA, which I normally never do, so I didn't realize that it wasn't the same as profiler.

    [font="Courier New"]------------------ SETUP ----------------------------------

    USE SSC

    CREATE TABLE Orders(

    OrderID       INT)

    CREATE TABLE OrderDetails(

    OrderID       INT,

    ProductID     INT)

    DECLARE @Counter    INT,

        @Increment     INT

            

    SET @Counter = 0

    SET @Increment = 11077 -- Max OrderID from Northwind DB is 11077, will use this for increment.

    WHILE @Counter <=1000 -- Took about 54 seconds to execute.

    BEGIN

        

        INSERT INTO Orders(OrderID)

        SELECT DISTINCT OrderID + @Increment

        FROM northwind.dbo.Orders

        

        INSERT INTO OrderDetails(OrderID, ProductID)

        SELECT DISTINCT OrderID + @Increment, ProductID

        FROM northwind.dbo.[Order Details]

        

        SET @Counter = @Counter + 1

        SET @Increment = @Increment + 11077

        CONTINUE

    END

    UPDATE OrderDetails -- Lower density of ProductID 50.

    SET ProductID = 51

    WHERE ProductID = 50 AND OrderID > 130000    

    -----------------------------------------------------------

    ------------------- RESULTS -------------------------------

    -- Orders:  830830ProductID 50 Count: 100

    -- Order Lines:2157155ProductID 51 Count: 48,949

    -----------------------------------------------------------

    [/font]

    [font="Courier New"]

    ---------------- ProductID 50 Tests -----------------------

    -------------- Method 1 (INNER JOINS) ---------------------

    DBCC FREEPROCCACHE

    SELECT   OH.OrderID, OD.ProductID

    FROM Orders oh

        JOIN OrderDetails OD ON oh.orderid = od.orderid

        JOIN OrderDetails ODF ON oh.orderid = odf.orderid

    WHERE ODF.ProductID = 50  

    -- 300 Rows  Duration 2156  CPU 1656  Reads 11028  Writes 0

    -----------------------------------------------------------

    -------------- Method 2 (IN) ------------------------------

    DBCC FREEPROCCACHE

    SELECT O.OrderID, OD.ProductID

    FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

    WHERE od.orderid IN (

        SELECT OrderID

        FROM OrderDetails

        WHERE productid = 50)

    -- 300 Rows  Duration 2033 CPU 1688 Reads 11030 Writes 0

    -----------------------------------------------------------

    -------------- Method 3 (EXISTS) --------------------------

    DBCC FREEPROCCACHE

    SELECT O.OrderID, OD.ProductID

    FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

    WHERE EXISTS (SELECT OrderID FROM OrderDetails OD2 WHERE OD2.ProductID = 50 AND OD2.OrderID = OD.OrderID)

    -- 300 Rows  Duration 2063  CPU 1766  Reads 11032  Writes 0

    -----------------------------------------------------------[/font]

    [font="Courier New"]

    ----------------- ProductID 51 Tests ----------------------

    -------------- Method 1 (INNER JOINS) ---------------------

    DBCC FREEPROCCACHE

    SELECT OH.OrderID, OD.ProductID

    FROM Orders oh

        JOIN OrderDetails OD ON oh.orderid = od.orderid

        JOIN OrderDetails ODF ON oh.orderid = odf.orderid

    WHERE ODF.ProductID = 51  

    -- Rows 146847  Duration 4296  CPU 2562  Reads  11028  Writes 0

    -----------------------------------------------------------

    -------------- Method 2 (IN) ------------------------------

    DBCC FREEPROCCACHE

    SELECT O.OrderID, OD.ProductID

    FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

    WHERE O.orderid IN (

        SELECT OrderID

        FROM OrderDetails

        WHERE productid = 51)

    -- Rows 143874  Duration 3970  CPU 2407  Reads 11030  Writes 0

    -----------------------------------------------------------

    -------------- Method 3 (EXISTS) --------------------------

    DBCC FREEPROCCACHE

    SELECT O.OrderID, OD.ProductID

    FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

    WHERE EXISTS (SELECT OrderID FROM OrderDetails OD2 WHERE OD2.ProductID = 51 AND OD2.OrderID = OD.OrderID)

    -- Rows 143874  Duration 4016  CPU 2625  Reads 11032  Writes 0

    -----------------------------------------------------------

    [/font]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply