October 10, 2008 at 6:03 am
Simple scenario.
We have order header, order detail.
I want to return all rows from order detail for any order where at least one line has a specified product, i.e. I want to see what else was ordered whenever that product was ordered.
So I need the following modified to return all the order lines for an order and not just the product match.
Select oph.order_no, opd.order_line_no, opd.product, opd.quantity
From OrderDetail opd inner join OrderHeader oph on opd.order_no = oph.order_no
Where opd.product = 'Bike'
Hope someone understands what I'm trying to do. I know I could do a subquery to return all order_no values where the product has been ordered, but I don't want to do it that way.
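For clarity, here is a minimal sketch of the two tables as I picture them (column types are my own assumption):
CREATE TABLE OrderHeader
(
    order_no INT NOT NULL PRIMARY KEY
)
CREATE TABLE OrderDetail
(
    order_no      INT NOT NULL REFERENCES OrderHeader (order_no),
    order_line_no INT NOT NULL,
    product       VARCHAR(50) NOT NULL,
    quantity      INT NOT NULL,
    PRIMARY KEY (order_no, order_line_no)
)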
October 10, 2008 at 6:27 am
Looks like a simple IN() statement would do the job... you want the entire order if one of the products is 'Bike':
Select
oph.order_no,
opd.order_line_no,
opd.product,
opd.quantity
From OrderDetail opd
inner join OrderHeader oph
on opd.order_no = oph.order_no
Where opd.order_no IN (SELECT order_no FROM OrderDetail WHERE product = 'Bike')
Lowell
October 10, 2008 at 6:30 am
Another solution would be to simply join to the detail again.
[font="Courier New"]Select
oph.order_no
, opd.order_line_no
, opd.product
, opd.quantity
, opd2.product as OtherProduct
, opd2.quantity as OtherQuantity
From
OrderDetail opd
inner join OrderHeader oph on opd.order_no = oph.order_no
inner join OrderDetail opd2 on opd2.order_no = opd.order_no
and opd2.product <> opd.product
Where
opd.product = 'Bike'
October 10, 2008 at 8:04 am
Actually, the subquery will probably be more logically correct. I don't know if it is possible for there to be more than one OrderDetail record for the same order with the same product, but in Michael's query, if there were two detail records with a product of 'Bike', it would return all of the detail records for that order twice.
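If duplicates are possible, one way around it (a sketch using the same assumed table names) is to join to the distinct order numbers instead of the raw detail rows:
-- Joining to distinct order numbers returns each detail row once,
-- even if an order has two 'Bike' lines (schema assumed from above).
SELECT oph.order_no, opd.order_line_no, opd.product, opd.quantity
FROM OrderHeader oph
INNER JOIN OrderDetail opd
    ON opd.order_no = oph.order_no
INNER JOIN (SELECT DISTINCT order_no
            FROM OrderDetail
            WHERE product = 'Bike') f
    ON f.order_no = oph.order_no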
October 10, 2008 at 4:17 pm
I would probably write this as follows. No subquery needed... 🙂
SELECT
oh.order_no
, od.order_line_no
, od.product
, od.quantity
FROM dbo.OrderHeader oh
JOIN dbo.OrderDetail od -- Use for full list of products on the order
ON oh.order_no = od.order_no
JOIN dbo.OrderDetail odf -- Use to filter data by product
ON oh.order_no = odf.order_no
WHERE odf.product = @product
Gary Johnson
Sr Database Engineer
October 10, 2008 at 7:57 pm
The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) like so:
[font="Courier New"]
SELECT
oh.orderid
, od.productid
, od.quantity
FROM dbo.Orders oh
JOIN dbo.[Order Details] od -- Use for full list of products on the order
ON oh.orderid = od.orderid
JOIN dbo.[Order Details] odf -- Use to filter data by product
ON oh.orderid = odf.orderid
WHERE odf.productid = 51
ORDER BY oh.orderid, od.productid
---------------- VS ----------------------
SELECT oh.orderid, od.productid, od.quantity
FROM
orders oh INNER JOIN [order details] od ON oh.orderid = od.orderid
WHERE od.orderid IN (
SELECT orderid
FROM [order details]
WHERE productid = 51)
Pretty marked difference on a small dataset. This could potentially change with a large enough dataset and a highly populated IN clause (i.e. instead of 39 orders containing the targeted product, there are 30,000). Depending on how large/densely populated the OP's dataset is, maybe he could test the two for us and let us know.
October 11, 2008 at 3:18 pm
If using a correlated subquery to check a condition/item, I prefer EXISTS. I've always been under the impression this should be faster than IN. Am I way off base here?
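For the Northwind test above, I mean something like this (just a sketch, not benchmarked):
SELECT oh.orderid, od.productid, od.quantity
FROM dbo.Orders oh
INNER JOIN dbo.[Order Details] od
    ON oh.orderid = od.orderid
WHERE EXISTS (SELECT 1
              FROM dbo.[Order Details] f
              WHERE f.orderid = od.orderid
                AND f.productid = 51)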
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
October 11, 2008 at 4:07 pm
It was my understanding that these subqueries are basically the equivalent of derived tables, as they don't reference the outer rows and thus should not repeat for every row. If I'm mistaken in that regard, then it makes sense that they would significantly degrade in performance as the dataset got larger.
As far as IN and EXISTS go, to the best of my knowledge and from everything I've ever read, they are the same as far as performance goes, assuming they are written the same way. I checked, and the execution plan and performance are exactly the same for both in this scenario. That actually makes me doubt my belief that IN is not correlated, as EXISTS does reference the outer table. Maybe they're just normally highly efficient correlated subqueries.
October 11, 2008 at 8:07 pm
Garadin (10/10/2008)
The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:
I ran your test code and, as expected, no difference. Both execution plans are identical because the IN resolves to an inner join.
What kind of difference did you actually come up with?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 12, 2008 at 9:14 am
Subqueries with IN and EXISTS serve slightly different purposes. For this situation, an EXISTS subquery would be a correlated subquery, and the IN subquery would not have to be.
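To illustrate with the Northwind example (a sketch): the IN subquery stands on its own, while the EXISTS test references the outer row:
-- The IN subquery is self-contained and can be run by itself:
SELECT orderid
FROM dbo.[Order Details]
WHERE productid = 51
-- The EXISTS version cannot, because it references the outer alias od:
-- WHERE EXISTS (SELECT 1 FROM dbo.[Order Details] f
--               WHERE f.orderid = od.orderid AND f.productid = 51)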
October 12, 2008 at 9:58 am
Jeff Moden (10/11/2008)
Garadin (10/10/2008)
The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) like so:
I ran your test code and, as expected, no difference. Both execution plans are identical because the IN resolves to an inner join.
What kind of difference did you actually come up with?
The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads were all I could compare on. That said, I have seen queries with more reads actually perform significantly faster, so I suppose using this as the sole measurement criterion isn't really accurate... but doesn't it at least show that they aren't exactly the same?
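For anyone who wants to reproduce the numbers, a wrapper like the following (just a sketch) will show the read counts for both versions:
SET STATISTICS IO ON
-- Join version
SELECT oh.orderid, od.productid, od.quantity
FROM dbo.Orders oh
INNER JOIN dbo.[Order Details] od ON oh.orderid = od.orderid
INNER JOIN dbo.[Order Details] odf ON oh.orderid = odf.orderid
WHERE odf.productid = 51
-- IN version
SELECT oh.orderid, od.productid, od.quantity
FROM dbo.Orders oh
INNER JOIN dbo.[Order Details] od ON oh.orderid = od.orderid
WHERE od.orderid IN (SELECT orderid FROM dbo.[Order Details] WHERE productid = 51)
SET STATISTICS IO OFF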
October 12, 2008 at 3:02 pm
Here is what I get when running the three methods. Apologies for the somewhat sloppy code; I wrote this on my laptop while lying on the couch.
October 13, 2008 at 8:26 am
Garadin (10/12/2008)
The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads were all I could compare on. That said, I have seen queries with more reads actually perform significantly faster, so I suppose using this as the sole measurement criterion isn't really accurate... but doesn't it at least show that they aren't exactly the same?
You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 13, 2008 at 8:28 am
Jeff Moden (10/13/2008)
Garadin (10/12/2008)
The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads were all I could compare on. That said, I have seen queries with more reads actually perform significantly faster, so I suppose using this as the sole measurement criterion isn't really accurate... but doesn't it at least show that they aren't exactly the same?
You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?
Yes.
October 13, 2008 at 9:09 am
Not a good thing to do... the Percentage of Batch is highly unreliable and frequently absolutely incorrect...
Look at the Percentage of Batch on the following code... then run it... SURPRISE! 🙂
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2008-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 5, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
Here's another one... both have identical execution plans and percentages...
SET STATISTICS TIME ON
SELECT TOP 10 *
FROM dbo.JBMTest
ORDER BY SomeLetters2
SET ROWCOUNT 10
SELECT *
FROM dbo.JBMTest
ORDER BY SomeLetters2
SET ROWCOUNT 0
Oh yeah... here's the test table I used for that one...
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--Jeff Moden
Change is inevitable... Change for the better is not.