Identity field as a key

  • General design question here:

    Say you have Table A, B, & C

    Table Author

    -------------

    AuthorID (identity Field, key)

    Table Book

    -----------

    BookID (identity Field, key)

    AuthorID (foreign key)

    Edition

    -------

    EditionID (Identity Field, key)

    BookID  (foreign key)

    An author has one or more books.  A book has one or more editions.

    I would like hear what some think regarding design philosophies.

    Question:

    Would you make the keys to table Book both BookID and AuthorID?  Doing so has no added value for ensuring a unique key since BookID is an identity field. 

    If not, then BookID is unique enough to represent each row.  That brings me to table Edition.  Edition would not need AuthorID to follow through as a foreign key.  For an edition, I would always have to join with Book to get the authorID.  But, it seems proper heirarchy to have keys flow down through the relationships.

    See this alternate design below.  In this case, the keys flow through each relationship in the heirarchy. 

    Table Author

    -------------

    AuthorID (identity Field, key)

    Table Book

    -----------

    BookID (identity Field, key)

    AuthorID (key)

    Edition

    -------

    EditionID (Identity Field, key)

    BookID  (key)

    AuthorID (key)

    Appreciate some thoughts here.

    thanks

  • Joe Celko.  I already know what you are going to say (about identities), so please don't bother.  I would like to hear from others as I rarely care for your input.  It's just too much self justification than informative information.  You're too busy telling everyone what they don't know rather than what your think and why.  Sorry, just the truth.

     

  • I follow a similar structure throughout my database designs (Identity fields as primary keys) for the following reasons:

    1) It allows me the assumption that a primary key is always an integer or big integer so any find feature within the front end can always search for a key number (very fast). 

    2) It also allows me the ability to create views that turn every primary key field into a standard name so I can hide the key value from the lists but still store the value with each row.

    3) Data bound controls all work very well with a single primary key value and not very well with multiple field values.

    With all of that said, your structures above would follow what I normally do, however are the authors always the same between editions?  I'm thinking of text books or reference books.  I'm not familiar enough with the specifics of authors / books / editions to answer the question.  Also, what I actually see you needing is a one to many relationship between a book and authors, in the case of a collaboration of multiple authors.  So you would have an AuthoredBy table showing BookID and AuthorID.  Now, this table could have an identity field AuthoredByID for consistency sake, but this would possibly be the only type of table where I might use mulitple fields for my primary key. 

    I hope this provides some value and it is only my opinion and preferred way to design.

     

    If the phone doesn't ring...It's me.

  • Charles, thanks for the input.

    I apologize.  i should have mentioned the above is a hypothetical example.  It's not real but just set up to represent my question at hand.  I just used the author-book relationships because MS does so in the samples in the BOLs.

    However, regarding your comments.  Yes, I agree it is easier for interface controls.

  • In my experience using an identity field as your primary key is only a good idea if your entity doesn't already have an attribute that's unique.

    Using your example above, I would suggest that using an ISBN as the primary key in the table Books is better for ensuring data integrity by ensuring that the same book cannot be listed twice.  By using an identity column as your key, you're effectively opening the possibility of having many books with the same ISBN.

    This will obviously mean reduced performance, but at least you can be confident of your data.  Data integrity should come before performance considerations.

    Regarding your earlier point about having a composite key in the Books table, if you do that then you open the possibility of having the same book being recorded as written by different authors.  This may or may not violate your business rules. 

     

     

  • Use Int Identities where you can. Make them a primary key, and build a proper relational database with enforced foreign keys and then you wont have data integrity issues.

    If your entity has an attribute that is unique: create a Natural Key.

    Using ISBN as the key wouldn't be a good idea if its repeated multiple times through a database (storage cost reasons). Also, SQL Server is able to sort through int columns extremely quickly to find the values it needs, whereas it will take longer for SQL Server to do sorting of ISBN columns. So a string would be more costly and less effecient.

    As far as putting AuthorID into Edition, its redundant: UNLESS your BookID Foreign Key allows NULLS. If nulls are not allowed, then there would always be a BookID: then can always find editions by an author by joining book.

    Try to think of an even bigger chain. If you were to follow that you should have all ID's in a heirarchy and it was a book store, would you store a customers order for a book with a link back to the order, customer, address, store, salesperson, manager, author, edition, publisher, distributor, etc in the OrderItem table? Not likely. You would just store the book (and possibly the edition) in the OrderItem, and the rest you would join through other entities when you needed.

     


    Julian Kuiters
    juliankuiters.id.au

  • So, you apparently know Joe Celko's opinion on this topic.

    Well, personally I think there is much truth in it. IMO, even better, is a phrase by Ron Soukup, who was formerly the SQL Server project manager: “Identity primary keys are for people who believe there’s never time to design a table right but there’s always time to do it over.”

    I redesigned my DB's to get rid of the IDENTITY keys. And guess what? The longer you do this, the easier it is to avoid them.

    But you know this is an almost religious discussion. Browse any online community and you'll find long and heated discussions about pro's and con's. Personally I think in this regard bottomline is: Do what you think is best in your situation. If you can live with IDENTITY's, use them. But note, that there *is* a life without them

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Jon,

    I'm working in a company that publishes and distributes books and I can assure you that although it would seem so, ISBN is not always unique. There are rare cases when the publisher assigns the same ISBN to two different titles by mistake, and it happens quite often that a new edition of the book gets a new ISBN. And sometimes you need to enter the title into DB before any ISBN is assigned to it - so it wouldn't be advisable to use ISBN as a primary key. Duplicity check on this column, which will alert you immediately when you try to input ISBN that already exists in the DB, is a must - but that's about all you can require from the database.

    Anyway, the original question used books for illustration only... IMHO it is better to use single identity field as a primary key in a standard production database. On the other hand, in a reporting table that is more or less static (e.g. table of weekly sales sums, into which new records are inserted once in a week by program, and users can only view it) I prefer to use multiple columns as a primary key and I never had any problems with it. And I've seen a few databases that used other than identity fields as a PK and worked fine, so this is probably more or less a matter of personal preference.

    Regarding the keys flowing down through relationships, I don't like that idea... it complicates everything, because then you can have different values at different levels - entered either by mistake, or on purpose. Of course, you can make the column inaccessible to user at lower level, and fill it with a trigger or by program etc. - but all the same, if there are two different columns, you can be almost sure that somehow, sometimes, someone will manage to create confusion :-)). If it can go wrong, it will go wrong...

    Just my opinion though - HTH, Vladan

  • Frank,

    It's not that I don't value Joe's points.  The delivery of his points is packaged with a lot of vanity.  If you tear away the wrapping, yes, there are good points made.  But, since I already know his thoughts on this, I would rather see other opinion.

    The problem sometimes seen in data modeling with using data is that you rely on the client's word that the data is unique.  Look a Vladan's example of ISBN.  It's not supposed to happen, but it does.  Same with SS# and other "thought to be unique" data items.  And, yes, I agree doing the proper due-dilligence as a data modeler is crucial and may address this type of situation.

    I have always tended to use IDENTITY because I can guarantee uniqueness.  I can model just as easy the other way, mind you.  So, it's not a question of habit than it is whether there are any advantages.  And, that is really what I am trying to assess.

    I develop software and my primary goal is to keep it simple.  Using mutilple keys does make the UI more complex because, as stated by Charles, controls work best with a single key.

  • I have always tended to use IDENTITY because I can guarantee uniqueness

    I'm sure you meant this implicitely, but, being maybe picky, the IDENTITY property alone does *not* guarantee uniqueness.

    You additionally need to add a PRIMARY KEY constraint or a UNIQUE constraint to enforce uniqueness.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Well, Frank, I've heard of sitting on the fence but it looks like you've jumped it and half way across the field.

    Frank is right in that this being a 'religious discussion' as people will defend their corner with fanatical determination.

    If a table has an identity column then surely it must be the key, what purpose would it serve otherwise. If there is a logical key for a table then use it. The only exception I would make is that if the key is widely used over several tables then an identity column would save space (eg would you want a 24 char key populated throughout your database !!  )

    I sit on the fence. I use what works and enforce integrity at database level or business level depending how best to handle it.

    I only put keys on other tables for foreign key constraint and navigate relationships to retrieve data, the only exception (there are always exceptions) being if I have high volumn queries between two tables to reduce the number of reads.

    As for the design, since an Author can write many books and a Book can have many Authors (many-to-many) then this would necessitate a 'link' table to join them. If an Edition belongs to one Book only then putting the BookID on the Edition record seems OK it all depends on the type and uniqueness of the relationship.

    The type and nature of keys is as dependant on types and speed of retrieval as it is on the fluidic and volatility of the data.

    Business rules OK.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • quotethe IDENTITY property alone does *not* guarantee uniqueness.

    ** Only if you fiddle while Rome burns ** 

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Well, Frank, I've heard of sitting on the fence but it looks like you've jumped it and half way across the field.

    Can you put this in simple words for me, please?

    Hey, I just found again an interesting quote from Steve Kass in one of these discussions. Here it goes:

    How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity column and putting the PRIMARY KEY NONCLUSTERED constraint on the multi-column primary key?

    That might confuse the anti-identity fanatics enough so they'll stop

    complaining.  You will have a natural primary key, so they won't think

    the world is coming to an end, but you will go on as you always have,

    using the identity column for its convenience in queries, FK

    constraints, etc. 

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Hehe.. I have to write that down somewhere

  • Joe,

    yes I know how rare. Three years ago we had about 50 titles (out of 25.000) that either had a duplicity in EAN or didn't have EAN at all. Since EAN and ISBN are closely related, the same goes for ISBN. At the moment there are only 10 of them still "active" and no new cases during the last year. I'm not counting duplicities of music titles, that we are selling as well, and that sometimes tend to have the same barcode for both CD nad MC version; I only counted books.

    The situation is getting better, and we already succeded with returning the book to publisher because of EAN/ISBN duplicity - they agreed to place stickers with a correct info on the books - but I still wouldn't feel confident enough to use ISBN or EAN as a primary key. Another point is, if such column is a primary key, how do you enter a book into the database before you know the ISBN (or EAN)? We need that very often.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply