Intersect, Except, Union, All and Any

Question

Intersect, Except, Union, All and Any

Viewing 15 posts - 61 through 75 (of 86 total)

You must be logged in to reply to this topic. Login to reply

SQLRNNR SSC Guru Points: 281334 More actions · Answer 1

Nice article. Thanks for sharing.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Robert Dudley SSC Rookie Points: 29 More actions · Answer 2

traughberk (5/21/2010)
I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.
SELECT C.CustomerID
FROM Sales.Customer AS C
LEFT JOIN Sales.SalesOrderHeader AS OH
ON C.CustomerID = OH.CustomerID
AND OrderDate>='2004-07-01'
WHERE OH.CustomerID IS NULL
AND C.TerritoryID=10
Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.
Thanks.
Kris

Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.

traughber SSC-Addicted Points: 419 More actions · Answer 3

Never mind folks. I figured it out myself which is what I should have done in the first place. I guess I'm just lazy this morning. Sorry for the wasted post.

Danny Sheridan SSChasing Mays Points: 642 More actions · Answer 4

Thank you for a great article, Dave. I never really thought of using any of these commands as I normally would place data into temp tables to run JOIN queries that produce similar results, and I never fully understood their use until seeing it graphically in your article. I can now do a:

SELECT no_match_id FROM [production].[dbo].[ytd_table]

EXCEPT

SELECT no_match_id FROM [production].[dbo].[current_month_table]

To get a list of ID's in my main table that are no longer in the new month's data I load. There's more complexity to it than what I wrote here, but the EXCEPT command will achieve the same results for my table updates than the lengthy queries I'm using now.

And I'm sure I can find many other uses for utilizing both INTERSECT and EXCEPT to replace and simplify my current methodologies in which I use JOINs to match on as many as 60+ columns.

Thanks again for a great article!

Danny Sheridan
Comtekh, Inc.

The Dr. SSC Veteran Points: 234 More actions · Answer 5

I think a good use of INTERSECT and EXCEPT is when you need to compare many fields. It is much quicker than writing a join or where exists on 5 columns.

James Goodwin Hall of Fame Points: 3691 More actions · Answer 6

The reason for the AND OrderDate > '2004-07-01' in the JOIN Clause is so that ALL Customers get returned by the left join. If you moved it to the WHERE Clause then any customers who had not had an order after that date would not show up in the result set.

--

JimFive

traughber SSC-Addicted Points: 419 More actions · Answer 7

Robert Dudley (5/21/2010)
traughberk (5/21/2010)
I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.
SELECT C.CustomerID
FROM Sales.Customer AS C
LEFT JOIN Sales.SalesOrderHeader AS OH
ON C.CustomerID = OH.CustomerID
AND OrderDate>='2004-07-01'
WHERE OH.CustomerID IS NULL
AND C.TerritoryID=10
Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.
Thanks.
Kris
Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.

Thanks for the reply. I played around with it for a bit and it seems that it does make a difference where the date filter is placed. If it is placed in the FROM clause you are basically joining table C to a subset of table OH. If you put it in the WHERE clause, you are joining table C to all of table OH. It's similar to the example he wrote following that using the IN statement. (Obviously its similar. That's why he wrote it.)

Charles Kincaid SSChampion Points: 13593 More actions · Answer 8

The other, and maybe the main, reason for putting conditionals into the join is to make the query more efficient. If you blind join a table then all matching rows get included in intermediate results (within the server) only to be filtered out later on during the processing of the WHERE clause. I never understood that until I read some very excellent articles on how JOINS work right here on SQLServerCentral.com We were working on a query that just sucked and often timed out. We joined millions of transactions to thousands of customers only to filer that down to hundreds of results. Moving the conditional into the join then joins only those few hundred transactions to the customer set. One heck of performance boost. Check them out.

ATBCharles Kincaid

traughber SSC-Addicted Points: 419 More actions · Answer 9

Thanks for the replies everyone. I've learned something new today. That alone makes it a good day. 🙂

Allen Nugent Right there with Babe Points: 734 More actions · Answer 10

I'm sure I once read, here, that WHERE IN tends to be expensive. Your test results challenge this assertion. I wonder if SQL Server 2008 is optimising the execution plan a lot more than it used to.

pvnidheesh Grasshopper Points: 22 More actions · Answer 11

When INNER JOIN returns duplicate rows then INTERSECT returns distinct values only. Thats why the difference in execution time occurs.

johan.lindell SSC Enthusiast Points: 162 More actions · Answer 12

Good comparison, David, thanks.

From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

-- INTERSECT

SELECT CustomerID

FROM Sales.Customer

WHERE TerritoryID=10

AND NOT EXISTS (

SELECT CustomerID

FROM Sales.SalesOrderHeader

WHERE OrderDate>='2004-07-01'

AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

)

-- EXCEPT

SELECT CustomerID

FROM Sales.Customer

WHERE TerritoryID=10

AND NOT EXISTS (

SELECT CustomerID

FROM Sales.SalesOrderHeader

WHERE OrderDate>='2004-07-01'

AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

)

instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

/Johan

jim.sinclair Newbie Points: 9 More actions · Answer 13

I like using the EXCEPT clause because I find it makes the query easier to read. If you have any familiarity with Venn diagrams you can simply focus on understanding each of the two select statements independently and then apply set theory to understand the final result. Nice to know your experiment shows that performance is equivalent

noeld SSC Guru Points: 96590 More actions · Answer 14

johan.lindell (5/24/2010)
Good comparison, David, thanks.
From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries
-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.
/Johan

I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!

* Noel

Adam Haines SSC-Insane Points: 23217 More actions · Answer 15

noeld (5/24/2010)
johan.lindell (5/24/2010)
Good comparison, David, thanks.
From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries
-- INTERSECT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
-- EXCEPT
SELECT CustomerID
FROM Sales.Customer
WHERE TerritoryID=10
AND NOT EXISTS (
SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE OrderDate>='2004-07-01'
AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
)
instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.
/Johan
I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!

+1

My blog: http://jahaines.blogspot.com