Identity field usage

  • Ok... I know it's one of the hottest subject, but, even with all the posts on the subject, I didn't find an answer for this question:

    why couldn't I use an identity column in this situation, what would be the downside?

    I have 2 tables (TBL_TABLE and TBL_TYPE_TABLE ) :

    TBL_TABLE : id_table int PK, ..., id_type_table int FK 

    TBL_TYPE_TABLE : id_type_table FK identity, description varchar(20) unique constraint

    TBL_TYPE_TABLE contains a few rows and data is inserted by users on the client side. They can't delete, only insert and update.

    According to some of you, identity property should never be used. I this situation, should I use the description field as the primary key for this table, since it is the natural primary key? In that case I would have a bigger TBL_TABLE because the int would be replaced by a varchar(20) plus every time this field is updated, I would have to update the field in TBL_TABLE...which would put an extra load on the server.

    We all the posts I read about the subject, I can't find an argument that will go against the use of identity... could someone please clear that up for me?

    Thanks,

    Erik

  • It's like trying to tell whether it's better to be jewish or christian. Just go with what you believe is the best and experience will show you the rest .

  • Or trying to tell if it's better to be a separatist or not... (I do hope someone understands this one

    I would probably use identity anyway ... but I like to understand the opinion of others on a subject. And I understood *almost* everything on the cons of using identity (mainly Celko's opinion!), but I couldn't find a good explanation on the cons of using identity on a type table.

    Thanks,

    Erik

  • Comming from Quebec... I sure do .

  • This question illicits the same type of furor as Windows vs. Linux, Wintel vs. Apple, and 'Tastes Great' vs. 'Less Filling'

    Personally, I don't see a down side to using a surrogate key in your example.  Do what you think is right.

    I assume from your post, you have been reading forums or blogs extolling the virtue of natural keys and the evil of surrogate keys.  I have seen a lot of that.  I respect everyone's right to an opinion but I think most of the 'natural key' advocates react with intuition or emotion rather than logic.  The loudest of them usually bring in some arcane or abstract idea of the need for the primary key to have relevance (or poetic beauty) to the record.  I think they miss the point.

    The following is my opinion and should be taken for what it's worth.  In my defense, these ideas are the result of study, research, and practical experience.  I hope they stand on their own, but it may help to know that I have Masters in CIS, teach database design at the college level, and have read many books and articles on the subject.  I have had discussions with students, teachers and practitioners.  I have used natural keys in many applications; some without problems, many that had to be redesigned.

    Again, this is my opinion and to paraphrase a famous quote "...I ardently defend the right of anyone to disagree.."

    "You will eventually run out of unique values in surrogate keys."  Same can be said for natural keys, and probably a lot sooner.

    "Identity is a property, not true data." As soon as you save a value to a field, it is data.  The fact that the system assigned it rather than a user is irrelevant. Which leads to the question of why you would want users creating the primary key values.

    "A surrogate key based on the identity property takes us back to the bad old days when the key pointed to the physical location of the record.  This violates Dr. Codd's Rule #8."  This one is my favorite.  The value has no relationship to the physical location of the record. To understand this ask yourself what happens when the database is shrunk or restored to a new location, especially after data has been deleted or archived. An argument could be made that a clustered index voids this statement.  However, in the case of a clustered index, the key will always relate to the location of the record regardless of whether the key is surrogate or natural. "Smith" will be stored after "Rogers."

    "A surrogate key has no meaning to the user" The purpose of a primary key is to uniquely identify the record, not describe it. It is part of the plumbing, not the artwork.  The user should never change it, why do they ever have to see it?

    The primary key must be unique.  Will every record have a peice of unique data? In a normalized table, yes it will; however, it is entirely possible (even likely) that the only unique natural key would be a composite key encompassing every field in the record.  That's efficient!! 

    (Example a customer table with fields for name, address, phone number.  You can't use the name fields, too many John Smiths. Can't use address fields or phone number, these will change over time. What data is available that won't violate the rules of normalization?)

    Some would say that there is always a valid candidate key based on data in the record.  For years, this has been the Social Security Number for individuals.  This fails when you hire a non-US Citizen that doesn't have a SSN.  What then, make up a 9 digit number?  Who comes up with the number? After a couple hundred of these how do you find an unused number? And more importantly - How is this different from using a system generated number (i.e. Identity)? These made-up numbers have no natural relationship to the employee. One response is "You could assign a unique EmployeeID number to every employee."  Unless this number is based on some attribute of the person such as name, height or Date Of Birth, it is a surrogate key anyway.

    The primary key must be unique.  Again with the SSN, there are cases of duplicate SSNs.  Ask anyone who has worked on a database at the statewide level.  My own experience was at the county level.

    There is a reason we use surrogate identifiers such as UPC, ISBN, and NDC.  The idea that the primary key should be natural rather than surrogate presents a problem here.  A can of beans at the grocery store will have the same UPC whether the label reads "Beans" or "Frijoles." The UPC is simply used to look up a record in the store's database.  It is not a cryptographic or hashed value describing the contents of the can. 

    The primary key must be immutable (i.e. never change).  How many values never change in the real world? Especially in these days of international business and corporate mergers.  Like SSN, you might select the Federal ID number of a corporate client.  Works great until it merges or splits.

    One more, overriding consideration.  The only purpose of a primary key is to uniquely identify records in a relation/table.  This includes their use in foreign key relationships. Every RDBMS I know of processes numeric values faster than string values. Why not use the most efficient primary key?  Which will the engine find faster, 123456789 or "Chair"+"Office"+"Steel"+"Cloth"+"Blue"?  And if you reply "we will use the model number or SKU of the chair" you have just selected a surrogate key generated by a system other than the RDBMS.

     

  • Nice resume there... Would like to see Joe Celko answer this one .

  • Thanks for the answer... it's good to see a nice developped opinion on the pros of identity!

    Like Remi, I would like Celko's opinion on this one!

    Erik

  • Actually, it's see his opinion again... but it'd be nice to see his answers on those points.

  • And to keep adding wood to fire ALL DW implementation (That I have dealt with) use surrogate Identity Keys, can you guess why?

     


    * Noel

  • Let's hear it .

  • Having read the recent posts, I am very curious about Joe Celko.  I assume he is a strong supporter of natural keys? 

    Does anyone have a link to something he wrote regarding the selection of a primary key?

     

  • http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=184349#bm184550

    I think he uses Copy & Paste A LOT

     


    * Noel

  • Why waste time when a good rant is already out...

    So what about your opinion Noeld?

  • In datawarehousing is the ONLY way to simulate slowly changing dimentions because of the REAL independence that a primary key needs from the Data. What a Natural Primary key may be in one scenario may not be in a different one therefore a surrogate like the Identities one helps tremendously in implementing one.

    I have worked in systems where the natural primary key expands several (10+) fields and the surrogate key was used effectively to speed up joins. I can see very usefulthings coming out of the identity columns for those of us that have to deliver in production systems where performance is needed.

    Just my $0.02

     


    * Noel

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply