October 13, 2008 at 9:24 am
I wasn't really relying on the percentage of batch in this case, merely noting the differences. The thing that really makes no sense to me, and seems to go against everything stated as far as optimization goes for the three types are the reads. If the IN is treated the same as an inner join, and the two queries are exactly the same, they should have the same number of reads. The fact that they don't doesn't make a lot of sense to me. Having more reads doesn't necessarily make the inner join slower... but it does seem to imply that it is being handled differently.
Also, the exists vs. the IN. The exists seems like it would require correlation due to the outer query reference, and thus be slower than the other two methods. However, it performs exactly the same as the IN, regardless of the lack of outer query reference in the IN method.
I think we're gonna have to run this on significantly more data, and with varying population of the records returned by the subquery to really examine this... although I have to believe that someone has done that before.
October 13, 2008 at 9:36 am
Garadin (10/13/2008)
I think we're gonna have to run this on significantly more data, and with varying population of the records returned by the subquery to really examine this... although I have to believe that someone has done that before.
Heh... you now have a million row test data generator... 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 13, 2008 at 12:09 pm
I've done some testing, not quite happy that I've covered it all yet, but I'll finish it up tonight when I get home. So far, a couple things of note:
1. Don't trust the built in Trace in QA. I switched to using profiler and got profound differences(between that and the QA trace, not necessarily between the different methods).
2. The INNER JOIN query is not technically the same as the other two. I knew there was something that was bothering me about that method that I couldn't quite put my finger on. It will yield identical results only if productID is unique to an order. While it's logical that it would be(and pretty bad design if it's not), it doesn't necessarily have to be, and would change the results of the queries. Adding DISTINCT to it does not necessarily make it match either.
October 14, 2008 at 7:28 am
It has been my experience that IN causes more problems with optimization than EXISTS does, so I recommend that mechanism.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 14, 2008 at 9:26 am
I've prettymuch gotten nowhere with testing these different methods. They all show negligible differences
in times, regardless of both the amount of data being used and the density of the subquery item. Here is
what I used to test the three methods. I decided to make my own test table(although I appreciate yours Jeff, and
I'm sure I'll use that for other stuff) so that I could mimic the original tests exactly and easily control
the density of the subquery item. I believe that my original confusion came from using the built in Trace
in QA, which I normally never do, so I didn't realize that it wasn't the same as profiler.
[font="Courier New"]------------------ SETUP ----------------------------------
USE SSC
CREATE TABLE Orders(
OrderID INT)
CREATE TABLE OrderDetails(
OrderID INT,
ProductID INT)
DECLARE @Counter INT,
@Increment INT
SET @Counter = 0
SET @Increment = 11077 -- Max OrderID from Northwind DB is 11077, will use this for increment.
WHILE @Counter <=1000 -- Took about 54 seconds to execute.
BEGIN
INSERT INTO Orders(OrderID)
SELECT DISTINCT OrderID + @Increment
FROM northwind.dbo.Orders
INSERT INTO OrderDetails(OrderID, ProductID)
SELECT DISTINCT OrderID + @Increment, ProductID
FROM northwind.dbo.[Order Details]
SET @Counter = @Counter + 1
SET @Increment = @Increment + 11077
CONTINUE
END
UPDATE OrderDetails -- Lower density of ProductID 50.
SET ProductID = 51
WHERE ProductID = 50 AND OrderID > 130000
-----------------------------------------------------------
------------------- RESULTS -------------------------------
-- Orders: 830830ProductID 50 Count: 100
-- Order Lines:2157155ProductID 51 Count: 48,949
-----------------------------------------------------------
[/font]
[font="Courier New"]
---------------- ProductID 50 Tests -----------------------
-------------- Method 1 (INNER JOINS) ---------------------
DBCC FREEPROCCACHE
SELECT OH.OrderID, OD.ProductID
FROM Orders oh
JOIN OrderDetails OD ON oh.orderid = od.orderid
JOIN OrderDetails ODF ON oh.orderid = odf.orderid
WHERE ODF.ProductID = 50
-- 300 Rows Duration 2156 CPU 1656 Reads 11028 Writes 0
-----------------------------------------------------------
-------------- Method 2 (IN) ------------------------------
DBCC FREEPROCCACHE
SELECT O.OrderID, OD.ProductID
FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID
WHERE od.orderid IN (
SELECT OrderID
FROM OrderDetails
WHERE productid = 50)
-- 300 Rows Duration 2033 CPU 1688 Reads 11030 Writes 0
-----------------------------------------------------------
-------------- Method 3 (EXISTS) --------------------------
DBCC FREEPROCCACHE
SELECT O.OrderID, OD.ProductID
FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID
WHERE EXISTS (SELECT OrderID FROM OrderDetails OD2 WHERE OD2.ProductID = 50 AND OD2.OrderID = OD.OrderID)
-- 300 Rows Duration 2063 CPU 1766 Reads 11032 Writes 0
-----------------------------------------------------------[/font]
[font="Courier New"]
----------------- ProductID 51 Tests ----------------------
-------------- Method 1 (INNER JOINS) ---------------------
DBCC FREEPROCCACHE
SELECT OH.OrderID, OD.ProductID
FROM Orders oh
JOIN OrderDetails OD ON oh.orderid = od.orderid
JOIN OrderDetails ODF ON oh.orderid = odf.orderid
WHERE ODF.ProductID = 51
-- Rows 146847 Duration 4296 CPU 2562 Reads 11028 Writes 0
-----------------------------------------------------------
-------------- Method 2 (IN) ------------------------------
DBCC FREEPROCCACHE
SELECT O.OrderID, OD.ProductID
FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID
WHERE O.orderid IN (
SELECT OrderID
FROM OrderDetails
WHERE productid = 51)
-- Rows 143874 Duration 3970 CPU 2407 Reads 11030 Writes 0
-----------------------------------------------------------
-------------- Method 3 (EXISTS) --------------------------
DBCC FREEPROCCACHE
SELECT O.OrderID, OD.ProductID
FROM Orders O INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID
WHERE EXISTS (SELECT OrderID FROM OrderDetails OD2 WHERE OD2.ProductID = 51 AND OD2.OrderID = OD.OrderID)
-- Rows 143874 Duration 4016 CPU 2625 Reads 11032 Writes 0
-----------------------------------------------------------
[/font]
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply