August 12, 2009 at 12:56 pm
Kim Vermeij (7/20/2009)
Hey David,Nice article, I think you'll find the advantage with EXCEPT when you have to compare multiple columns with possible null values on both sides. When you compare null values with a JOIN or a WHERE statement you will get a "wrong" result, when using the EXCEPT statement you will get the "right" result.
see also:http://en.wikipedia.org/wiki/Null_(SQL)#Grouping_and_sorting
Kim
Wow! The talk page for that wikipedia article is rather interesting....
And excellent article, David! I just wish I'd had time to read it when it was initially published. Still, better late than never.
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
September 2, 2009 at 6:21 am
Very Good
September 11, 2009 at 9:59 am
I concur with the multiple columns logic and query performance improvement with EXCEPT.
I currently use Oracle's MINUS SET Operation (similar to EXCEPT) to compare all columns of ETL source files/tables to production tables to identify changes -very fast and simple to implement when dealing with 100s of sources (CSV,BCP files) via External Tables (similar to OPENROWSET(BULK,...).
I will eventually get around to building similar code in MSSQL 2008 t-sql and will update this thread with the performance results.
May 14, 2010 at 4:37 am
I found EXCEPT and INTERSECT very useful many times, although ,as is written in article, it makes the same like NOT IN (resp. IN). For me, they are much more elegant and straightforward in use.
Thx for article
May 20, 2010 at 11:35 pm
UNION, INTERSECT, and EXCEPT round out basic set operations at the domain level (with tables) instead of the tuple level (with joins). Domain level operations can make code in some instances simpler to maintain. I use them extensively in ETL operations and recovery procedures. I also use UNION ALL to keep duplicates.
May 21, 2010 at 4:40 am
Just to concur with some of the other posters, for ETL processes the Except function is very useful for comparing 2 tables with the same structure in different databases and picking up the differences.
The other method I sometimes use is a calculated checksum column. However this has the disadvantage that different input values are not guaranteed to produce different checksum values, and it also means adding an ETL layer into the production database which I want to avoid.
May 21, 2010 at 7:02 am
I can't help but ask about your final example using ALL. Perhaps I'm not reading it correctly (likely the case), but using the following two queries:
#1
SELECT *
FROM Sales.SalesOrderHeader
WHERE TotalDue > ALL(SELECT TotalDue FROM Sales.TopSales)
ORDER BY Sales.TotalDue DESC
#2
SELECT *
FROM Sales.SalesOrderHeader
WHERE TotalDue > (SELECT MAX(TotalDue) FROM Sales.TopSales)
ORDER BY Sales.TotalDue DESC
it seems as though you are not doing the same comparison. In #1 you are comparing Sales.SalesOrderHeader.TotalDue to every possible value that can be returned from Sales.TopSales. In query #2 you are only comparing against the MAX(TotalDue), which is going to be a single value. It seems to me that query #1 would return more results since you are getting every value that is greater than Sales.TopSales.TotalDue whereas query #2 is only going to return the values that are greater than the biggest Sales.TopSales.TotalDue value. These are not the same result sets.
Somebody please tell me I'm reading this wrong.
May 21, 2010 at 7:12 am
Nice article!
May 21, 2010 at 7:16 am
David - Thanks for taking the time to dig a bit. Same question came up last week at our office - confirmed my gut. Thanks to the other post about the NULL case - good point.
Sean
Regards,Sean
May 21, 2010 at 8:06 am
Jeff Moden (7/27/2009)
phickey (7/27/2009)
EXCEPT and INTERSECT are especially useful for comparing multiple columns of data where (a) there is no primary key or (b) having the same primary key does not guarantee that the associated data is the same.Typical scenario is regression testing:
(edited because somewhere between me and sqlservercentral.com is a stupid proxy that blocks EVERYTHING containing that word beginning with s and ending with elect...)
If this gives you the same number of rows as OldTable and NewTable, you know your new process is producing the same data as the old process. Use EXCEPT to examine the rows with differences. If you list the columns, you can narrow down the location of your differences by commenting out various columns.
The alternatives for this scenario are (a) joining OldTable to NewTable on every single column, or (b) dumping the data to text files and comparing them.
Cool trick but if you have two identical tables with identical data, you have much bigger problems. 😉
Jeff, you may want to re-read phickey's post. He's talking about regression testing -- making sure your new code doesn't break existing systems. Generally in regression testing, your problems start when you're new code isn't producing identical data to that from your old code.
May 21, 2010 at 8:25 am
Dennis O'Connor (7/21/2009)
My default browser has to be IE here at work, so when I selected this report it was opened in IE. As mentioned by others, the examples were very small blocks with no way to expand them.So I copied the URL into my favorite browser - Safari - and everything displayed correctly. I highly recommend Safari on Windows. It is a rare site that is so oriented to IE that I have to switch to IE.
You apparently don't do much with SSRS.
May 21, 2010 at 9:00 am
I agree with WayneS. I have found many occasions where the INTERSECT and EXEPT functions allow for more efficient comparisons between data sets. In the event that I want to compare entire records with multiple fields between two tables or views, the INTERSECT and EXEPT functions come in handy. The alternative would be to use a cursor and evaluate row by row while housing each field in a variable for the comparison, and having to store the differences in memory or a temp table until the cursor completed....
May 21, 2010 at 9:29 am
I would agree, the advantage of except is in comparing multiple columns. I use it routinely to check view results between development and production where there has been schema changes to make sure I have the view dialed in to result parity.
Pretty reliable as long as I am explicit about column (no wild cards), and accommodate an difference in null handling.
May 21, 2010 at 9:30 am
With all respect, the comparisons made for EXCEPT and INTERSECT in the article are kind of like driving a Ferrari 458 to your corner grocery, and from that concluding that it really doesn't perform any better than a Honda Civic. You aren't really using it to it's full potential 😉
INTERSECT and EXCEPT are intended to be used to compare sets; not as alternatives to joins. You don't really see the benefits until you use them for that purpose. They're great for finding changes or matches in wide reporting tables, for example.
May 21, 2010 at 9:57 am
I have a question about LEFT JOIN example near the beginning where he starts talking about EXCEPT where he wrote the following SQL.
SELECT C.CustomerID
FROM Sales.Customer AS C
LEFT JOIN Sales.SalesOrderHeader AS OH
ON C.CustomerID = OH.CustomerID
AND OrderDate>='2004-07-01'
WHERE OH.CustomerID IS NULL
AND C.TerritoryID=10
Can someone please explain the "AND OrderDate>='2004-07-01' " in the FROM clause? I've never seen that before.
Thanks.
Kris
Viewing 15 posts - 46 through 60 (of 86 total)
You must be logged in to reply to this topic. Login to reply