Intersect, Except, Union, All and Any

  • Kim Vermeij (7/20/2009)


    Hey David,

    Nice article, I think you'll find the advantage with EXCEPT when you have to compare multiple columns with possible null values on both sides. When you compare null values with a JOIN or a WHERE statement you will get a "wrong" result, when using the EXCEPT statement you will get the "right" result.

    see also:http://en.wikipedia.org/wiki/Null_(SQL)#Grouping_and_sorting

    Kim

    Wow! The talk page for that wikipedia article is rather interesting....

    And excellent article, David! I just wish I'd had time to read it when it was initially published. Still, better late than never.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Very Good

  • I concur with the multiple columns logic and query performance improvement with EXCEPT.

    I currently use Oracle's MINUS SET Operation (similar to EXCEPT) to compare all columns of ETL source files/tables to production tables to identify changes -very fast and simple to implement when dealing with 100s of sources (CSV,BCP files) via External Tables (similar to OPENROWSET(BULK,...).

    I will eventually get around to building similar code in MSSQL 2008 t-sql and will update this thread with the performance results.

  • I found EXCEPT and INTERSECT very useful many times, although ,as is written in article, it makes the same like NOT IN (resp. IN). For me, they are much more elegant and straightforward in use.

    Thx for article

  • UNION, INTERSECT, and EXCEPT round out basic set operations at the domain level (with tables) instead of the tuple level (with joins). Domain level operations can make code in some instances simpler to maintain. I use them extensively in ETL operations and recovery procedures. I also use UNION ALL to keep duplicates.

  • Just to concur with some of the other posters, for ETL processes the Except function is very useful for comparing 2 tables with the same structure in different databases and picking up the differences.

    The other method I sometimes use is a calculated checksum column. However this has the disadvantage that different input values are not guaranteed to produce different checksum values, and it also means adding an ETL layer into the production database which I want to avoid.

  • I can't help but ask about your final example using ALL. Perhaps I'm not reading it correctly (likely the case), but using the following two queries:

    #1

    SELECT *

    FROM Sales.SalesOrderHeader

    WHERE TotalDue > ALL(SELECT TotalDue FROM Sales.TopSales)

    ORDER BY Sales.TotalDue DESC

    #2

    SELECT *

    FROM Sales.SalesOrderHeader

    WHERE TotalDue > (SELECT MAX(TotalDue) FROM Sales.TopSales)

    ORDER BY Sales.TotalDue DESC

    it seems as though you are not doing the same comparison. In #1 you are comparing Sales.SalesOrderHeader.TotalDue to every possible value that can be returned from Sales.TopSales. In query #2 you are only comparing against the MAX(TotalDue), which is going to be a single value. It seems to me that query #1 would return more results since you are getting every value that is greater than Sales.TopSales.TotalDue whereas query #2 is only going to return the values that are greater than the biggest Sales.TopSales.TotalDue value. These are not the same result sets.

    Somebody please tell me I'm reading this wrong.

  • Nice article!

  • David - Thanks for taking the time to dig a bit. Same question came up last week at our office - confirmed my gut. Thanks to the other post about the NULL case - good point.

    Sean

    Regards,Sean

  • Jeff Moden (7/27/2009)


    phickey (7/27/2009)


    EXCEPT and INTERSECT are especially useful for comparing multiple columns of data where (a) there is no primary key or (b) having the same primary key does not guarantee that the associated data is the same.

    Typical scenario is regression testing:

    (edited because somewhere between me and sqlservercentral.com is a stupid proxy that blocks EVERYTHING containing that word beginning with s and ending with elect...)

    If this gives you the same number of rows as OldTable and NewTable, you know your new process is producing the same data as the old process. Use EXCEPT to examine the rows with differences. If you list the columns, you can narrow down the location of your differences by commenting out various columns.

    The alternatives for this scenario are (a) joining OldTable to NewTable on every single column, or (b) dumping the data to text files and comparing them.

    Cool trick but if you have two identical tables with identical data, you have much bigger problems. 😉

    Jeff, you may want to re-read phickey's post. He's talking about regression testing -- making sure your new code doesn't break existing systems. Generally in regression testing, your problems start when you're new code isn't producing identical data to that from your old code.

  • Dennis O'Connor (7/21/2009)


    My default browser has to be IE here at work, so when I selected this report it was opened in IE. As mentioned by others, the examples were very small blocks with no way to expand them.

    So I copied the URL into my favorite browser - Safari - and everything displayed correctly. I highly recommend Safari on Windows. It is a rare site that is so oriented to IE that I have to switch to IE.

    You apparently don't do much with SSRS.

  • I agree with WayneS. I have found many occasions where the INTERSECT and EXEPT functions allow for more efficient comparisons between data sets. In the event that I want to compare entire records with multiple fields between two tables or views, the INTERSECT and EXEPT functions come in handy. The alternative would be to use a cursor and evaluate row by row while housing each field in a variable for the comparison, and having to store the differences in memory or a temp table until the cursor completed....

  • I would agree, the advantage of except is in comparing multiple columns. I use it routinely to check view results between development and production where there has been schema changes to make sure I have the view dialed in to result parity.

    Pretty reliable as long as I am explicit about column (no wild cards), and accommodate an difference in null handling.

  • With all respect, the comparisons made for EXCEPT and INTERSECT in the article are kind of like driving a Ferrari 458 to your corner grocery, and from that concluding that it really doesn't perform any better than a Honda Civic. You aren't really using it to it's full potential 😉

    INTERSECT and EXCEPT are intended to be used to compare sets; not as alternatives to joins. You don't really see the benefits until you use them for that purpose. They're great for finding changes or matches in wide reporting tables, for example.

  • I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.

    SELECT C.CustomerID

    FROM Sales.Customer AS C

    LEFT JOIN Sales.SalesOrderHeader AS OH

    ON C.CustomerID = OH.CustomerID

    AND OrderDate>='2004-07-01'

    WHERE OH.CustomerID IS NULL

    AND C.TerritoryID=10

    Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.

    Thanks.

    Kris

Viewing 15 posts - 46 through 60 (of 86 total)

You must be logged in to reply to this topic. Login to reply