May 21, 2010 at 10:09 am
Nice article. Thanks for sharing.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 21, 2010 at 10:22 am
traughberk (5/21/2010)
I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.SELECT C.CustomerID
FROM Sales.Customer AS C
LEFT JOIN Sales.SalesOrderHeader AS OH
ON C.CustomerID = OH.CustomerID
AND OrderDate>='2004-07-01'
WHERE OH.CustomerID IS NULL
AND C.TerritoryID=10
Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.
Thanks.
Kris
Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.
May 21, 2010 at 10:37 am
Never mind folks. I figured it out myself which is what I should have done in the first place. I guess I'm just lazy this morning. Sorry for the wasted post.
May 21, 2010 at 12:26 pm
Thank you for a great article, Dave. I never really thought of using any of these commands as I normally would place data into temp tables to run JOIN queries that produce similar results, and I never fully understood their use until seeing it graphically in your article. I can now do a:
SELECT no_match_id FROM [production].[dbo].[ytd_table]
EXCEPT
SELECT no_match_id FROM [production].[dbo].[current_month_table]
To get a list of ID's in my main table that are no longer in the new month's data I load. There's more complexity to it than what I wrote here, but the EXCEPT command will achieve the same results for my table updates than the lengthy queries I'm using now.
And I'm sure I can find many other uses for utilizing both INTERSECT and EXCEPT to replace and simplify my current methodologies in which I use JOINs to match on as many as 60+ columns.
Thanks again for a great article!
Danny Sheridan
Comtekh, Inc.
May 21, 2010 at 2:17 pm
I think a good use of INTERSECT and EXCEPT is when you need to compare many fields. It is much quicker than writing a join or where exists on 5 columns.
May 21, 2010 at 2:39 pm
The reason for the AND OrderDate > '2004-07-01' in the JOIN Clause is so that ALL Customers get returned by the left join. If you moved it to the WHERE Clause then any customers who had not had an order after that date would not show up in the result set.
--
JimFive
May 21, 2010 at 2:48 pm
Robert Dudley (5/21/2010)
traughberk (5/21/2010)
I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.SELECT C.CustomerID
FROM Sales.Customer AS C
LEFT JOIN Sales.SalesOrderHeader AS OH
ON C.CustomerID = OH.CustomerID
AND OrderDate>='2004-07-01'
WHERE OH.CustomerID IS NULL
AND C.TerritoryID=10
Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.
Thanks.
Kris
Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.
Thanks for the reply. I played around with it for a bit and it seems that it does make a difference where the date filter is placed. If it is placed in the FROM clause you are basically joining table C to a subset of table OH. If you put it in the WHERE clause, you are joining table C to all of table OH. It's similar to the example he wrote following that using the IN statement. (Obviously its similar. That's why he wrote it.)
May 21, 2010 at 2:54 pm
The other, and maybe the main, reason for putting conditionals into the join is to make the query more efficient. If you blind join a table then all matching rows get included in intermediate results (within the server) only to be filtered out later on during the processing of the WHERE clause. I never understood that until I read some very excellent articles on how JOINS work right here on SQLServerCentral.com We were working on a query that just sucked and often timed out. We joined millions of transactions to thousands of customers only to filer that down to hundreds of results. Moving the conditional into the join then joins only those few hundred transactions to the customer set. One heck of performance boost. Check them out.
ATBCharles Kincaid
May 21, 2010 at 3:45 pm
Thanks for the replies everyone. I've learned something new today. That alone makes it a good day. 🙂
May 23, 2010 at 6:14 pm
I'm sure I once read, here, that WHERE IN tends to be expensive. Your test results challenge this assertion. I wonder if SQL Server 2008 is optimising the execution plan a lot more than it used to.
May 23, 2010 at 11:06 pm
When INNER JOIN returns duplicate rows then INTERSECT returns distinct values only. Thats why the difference in execution time occurs.
May 24, 2010 at 1:48 am
Good comparison, David, thanks.
From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries
-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.
/Johan
May 24, 2010 at 7:44 am
I like using the EXCEPT clause because I find it makes the query easier to read. If you have any familiarity with Venn diagrams you can simply focus on understanding each of the two select statements independently and then apply set theory to understand the final result. Nice to know your experiment shows that performance is equivalent
May 24, 2010 at 8:31 am
johan.lindell (5/24/2010)
Good comparison, David, thanks.From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries
-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.
/Johan
I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!
* Noel
May 25, 2010 at 7:39 am
noeld (5/24/2010)
johan.lindell (5/24/2010)
Good comparison, David, thanks.From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries
-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.
/Johan
I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!
+1
Viewing 15 posts - 61 through 75 (of 86 total)
You must be logged in to reply to this topic. Login to reply