January 9, 2008 at 12:57 pm
While adding to a table of unique (nonsignificant) IDs to use in place of social security numbers, I did the following, expecting to avoid adding any SSNs for a second time:
INSERT [HR_Joined].[dbo].[UID] (SSN)
SELECT DISTINCT
TIN.SSNO
FROM [dbo].[Joined All Employees]AS TIN
EXCEPT
SELECT TUID.SSN
FROM [HR_Joined].[dbo].[UID] AS TUID
WHERE TUID.TYPE = ''
Instead, I got everything added a second time! (FWIW, TIN.SSNO is varchar(9), while TUID.SSN is int, but I expected that the varchar(9) would be promoted to int.
Also, I meant the "DISTINCT" to apply just to the TIN.SSNO; should I have used parenthesis?
Any clue how I screwed up?
(Background - the actual UID -- not referenced here -- is based on the primary key ID which is an Identity specification. For other reasons, TUID.SSN allows duplicates.)
January 9, 2008 at 2:06 pm
I just double-checked - and the promote does work, no matter what the order is (meaning - int compared to varchar converts all values to INT, whether the varchar is in the first or the second query).
The issue lies elsewhere.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 9, 2008 at 2:15 pm
Thanks Matt. And do come back if you get any inspiration about where elsewhere might be.
January 9, 2008 at 5:58 pm
This is quoted from the BOL entry about EXCEPT
EXCEPT
Returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query.
First off you do not have to specify distinct, as the left side of the except will already be grabbing everthing that is distinct to the left and not present to the right query.
You "reinserted" the data because your left side of the query had 0 matches on the right, thus everything from the left query was inserted. Your where clause on the right query is messing you up.
If you want to insert all SSNO values that do not have matching SSN number, you should have your query look like this:
INSERT [HR_Joined].[dbo].[UID] (SSN)
SELECT
TIN.SSNO
FROM [dbo].[Joined All Employees] AS TIN
EXCEPT
SELECT
TUID.SSN
FROM [HR_Joined].[dbo].[UID] AS TUID
Otherwise, you just need to reverse.
-Adam
January 10, 2008 at 4:50 am
Wonderful Adam, thanks very much!
1. I went back to BOL, and sure enough the word "distinct" occurs in each of the first three lines of text. It is not that I hadn't checked the documentation, next time perhaps I'll read it.
2. It was indeed my where clause, and after you pointed it out (and I assumed you were mistaken) it finally dawned on me why this failed when the last time I tested this it worked fine.
The field "type" was a (not very well thought out) attempt to accommodate some future possibility that a UID would have to be deactivated and reassigned. Initially, I left it null and thought I could add some notation later to define why a UID had been taken out of service. But, because I allowed nulls in the "type" field, I was unable to make type part of the primary key. So I decided to waste the space and initialize it to an empty string (''). About then I must have been interrupted, because, while I changed the query to look for " ='' " instead of "is null", I never got to update the field and redefine my primary key.
I'll bet that was an example of (foolishly) leaving a task half done, locking my machine for the day, and returning the next morning to find that the IT Gods had rebooted my system from on high losing the record of my incomplete work in progress. If it is not on my screen, I have little chance of remembering to pick up where I left off. Someday I'll learn.
Thanks again!
January 10, 2008 at 7:40 am
Keep in mind that the EXCEPT operator will frequently be slower than other potential options. Here is a thread at MSDN that I recall from about a year ago:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1021998&SiteID=1
I have seen at least a couple of instances in which Uma has indicated that EXCEPT will be slower than other options.
January 10, 2008 at 7:53 am
Thanks Kent. Based on Adam's advice, I yanked the "Distinct" from my original query and let the "EXCEPT" take care of that. (With no ill effects.) I would probably have to add it back in to avoid duplicates if I used a WHERE EXISTS sub query, and quite possibly lose any time advantage.
It would probably be smart for me to time the EXCEPT v WHERE EXISTS and a sub query to see if it makes a difference. Are you aware any timing comparisons?
January 10, 2008 at 7:57 am
Yes, I think that specifically WHERE EXISTS will beat EXCEPT; but as you said, it is always best to run some timings and compare the results.
January 10, 2008 at 8:08 am
Except, Where not exists, and Not In all use the same execution plan. The difference here is going to be syntax.
Here is an blog about some performance tests: regardless which you choose you are going to be performing the same execution.
More Info:
http://blog.sqlauthority.com/2007/05/22/sql-server-2005-comparison-except-operator-vs-not-in/
my test queries yielded the same execution plan
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
EXCEPT
SELECT ProductID
FROM Production.WorkOrder ;
GO
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product
WHERE ProductID
NOT IN (
SELECT ProductID
FROM Production.WorkOrder);
GO
USE AdventureWorks;
GO
SELECT ProductID
FROM Production.Product a
WHERE ProductID
NOT EXISTS (
SELECT ProductID
FROM Production.WorkOrder b
where a.ProductID = b.ProductID);
January 10, 2008 at 8:10 am
Very good, thank you Adam
January 10, 2008 at 8:13 am
Thanks for the good link, Adam.
I take from it that the bottom line, as in so much in life, is simply:
"It just doesn't matter."
(But another takeaway, which does matter, is don't bother timing that which can be revealed by a query plan.)
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply