Intersect, Except, Union, All and Any

  • Nice article. Thanks for sharing.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • traughberk (5/21/2010)


    I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.

    SELECT C.CustomerID

    FROM Sales.Customer AS C

    LEFT JOIN Sales.SalesOrderHeader AS OH

    ON C.CustomerID = OH.CustomerID

    AND OrderDate>='2004-07-01'

    WHERE OH.CustomerID IS NULL

    AND C.TerritoryID=10

    Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.

    Thanks.

    Kris

    Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.

  • Never mind folks. I figured it out myself which is what I should have done in the first place. I guess I'm just lazy this morning. Sorry for the wasted post.

  • Thank you for a great article, Dave. I never really thought of using any of these commands as I normally would place data into temp tables to run JOIN queries that produce similar results, and I never fully understood their use until seeing it graphically in your article. I can now do a:

    SELECT no_match_id FROM [production].[dbo].[ytd_table]

    EXCEPT

    SELECT no_match_id FROM [production].[dbo].[current_month_table]

    To get a list of ID's in my main table that are no longer in the new month's data I load. There's more complexity to it than what I wrote here, but the EXCEPT command will achieve the same results for my table updates than the lengthy queries I'm using now.

    And I'm sure I can find many other uses for utilizing both INTERSECT and EXCEPT to replace and simplify my current methodologies in which I use JOINs to match on as many as 60+ columns.

    Thanks again for a great article!

    Danny Sheridan
    Comtekh, Inc.

  • I think a good use of INTERSECT and EXCEPT is when you need to compare many fields. It is much quicker than writing a join or where exists on 5 columns.

  • The reason for the AND OrderDate > '2004-07-01' in the JOIN Clause is so that ALL Customers get returned by the left join. If you moved it to the WHERE Clause then any customers who had not had an order after that date would not show up in the result set.

    --

    JimFive

  • Robert Dudley (5/21/2010)


    traughberk (5/21/2010)


    I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.

    SELECT C.CustomerID

    FROM Sales.Customer AS C

    LEFT JOIN Sales.SalesOrderHeader AS OH

    ON C.CustomerID = OH.CustomerID

    AND OrderDate>='2004-07-01'

    WHERE OH.CustomerID IS NULL

    AND C.TerritoryID=10

    Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.

    Thanks.

    Kris

    Kris - the "AND OrderDate >= '2004-07-01'" is filtering under the JOIN function. When you JOIN two tables you define how the two relate to each other. In the example, the "C" table links to the "OH" table through the CustomerID field in each table being equal. You could state that they be different, in which case you link each record in table "C" to all other recrods in table "OH" where the CustomerID's are different. Why you would want to do this is beyond me. I am just illustrating some functionality. So the second parameter in the JOIN linking definition is that on top of the fact that the CustomerID's in table "C" and table "OH" have to be the same, the Order Date must also be greater than or equal to July 1st, 2004. An alternative is to drop the AND statement from the JOIN function, and apply it in the WHERE clause. One way may be more efficient than the other.

    Thanks for the reply. I played around with it for a bit and it seems that it does make a difference where the date filter is placed. If it is placed in the FROM clause you are basically joining table C to a subset of table OH. If you put it in the WHERE clause, you are joining table C to all of table OH. It's similar to the example he wrote following that using the IN statement. (Obviously its similar. That's why he wrote it.)

  • The other, and maybe the main, reason for putting conditionals into the join is to make the query more efficient. If you blind join a table then all matching rows get included in intermediate results (within the server) only to be filtered out later on during the processing of the WHERE clause. I never understood that until I read some very excellent articles on how JOINS work right here on SQLServerCentral.com We were working on a query that just sucked and often timed out. We joined millions of transactions to thousands of customers only to filer that down to hundreds of results. Moving the conditional into the join then joins only those few hundred transactions to the customer set. One heck of performance boost. Check them out.

    ATBCharles Kincaid

  • Thanks for the replies everyone. I've learned something new today. That alone makes it a good day. 🙂

  • I'm sure I once read, here, that WHERE IN tends to be expensive. Your test results challenge this assertion. I wonder if SQL Server 2008 is optimising the execution plan a lot more than it used to.

  • When INNER JOIN returns duplicate rows then INTERSECT returns distinct values only. Thats why the difference in execution time occurs.

  • Good comparison, David, thanks.

    From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

    -- INTERSECT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    -- EXCEPT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

    /Johan

  • I like using the EXCEPT clause because I find it makes the query easier to read. If you have any familiarity with Venn diagrams you can simply focus on understanding each of the two select statements independently and then apply set theory to understand the final result. Nice to know your experiment shows that performance is equivalent

  • johan.lindell (5/24/2010)


    Good comparison, David, thanks.

    From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

    -- INTERSECT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    -- EXCEPT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

    /Johan

    I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!


    * Noel

  • noeld (5/24/2010)


    johan.lindell (5/24/2010)


    Good comparison, David, thanks.

    From a performance perspective, using NOT IN is "always" slower than using EXISTS. If you try running the below queries

    -- INTERSECT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    -- EXCEPT

    SELECT CustomerID

    FROM Sales.Customer

    WHERE TerritoryID=10

    AND NOT EXISTS (

    SELECT CustomerID

    FROM Sales.SalesOrderHeader

    WHERE OrderDate>='2004-07-01'

    AND Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID

    )

    instead of the NOT IN, you'll get a reduction of query time by 75 % (on my computer). The bigger the tables and the "result" in the NOT IN-query is the bigger the difference. I always try to avoid NOT IN in a case like this.

    /Johan

    I disagree. NOT IN *could* give you the exact same execution plan given the right indexes and stats. The optimizer in many cases knows best and chooses one plan or another depending on the data!

    +1

Viewing 15 posts - 61 through 75 (of 86 total)

You must be logged in to reply to this topic. Login to reply