Give Up on Natural Primary Keys

  • nova wrote:

    Jeff Moden wrote:

    I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    For a customer table, probably a login name or an account number. Obviously the answer depends on the nature of the business domain (which is what a "business key" AKA natural key is all about).

    The login name can change.  It's not an immutable item.  And, I disagree... a "Business Key" is not the same as a "Natural Key".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ZZartin wrote:

    Jeff Moden wrote:

    I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    Really for both of those I would probably use a key I've created as the new natural key.  For an employee that might be employee ID.

    The argument about natural keys vs surrogate keys only really becomes a debate when you are taking in data that has a logical primary key itself and you have to decide whether you want to maintain that or create your own.

    I guess that "depends" on what the EmployeeID contains.  To me, if it's just a number with no other meaning, then it's not a "natural" key.  It's a "surrogate" key and it contains absolutely no way to identify who the employee actually is without looking it up in a table... as it should be.  It should also NOT contain things like the date of hire and a sequence number.  By itself, it should be totally meaningless.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    The login name can change.  It's not an immutable item.  And, I disagree... a "Business Key" is not the same as a "Natural Key".

    Agreed that login name isn't immutable but keys don't have to be immutable. It is usually desirable for a key to be as stable as possible but immutability is not always essential and in many cases not likely to be achievable. Login names should be unique however and if you make them non-nullable and put a uniqueness constraint on them then certainly they are keys in the relational sense (I'm assuming a login name is automatically "irreducible").

    Returning to the topic of this thread, GDPR makes no distinction as to what type of identifier is used, whether it can change or whether it is "natural" or not.

  • I like the distinction between business key and natural key.  Usually I have used the term business key.  Sometimes the business key has to be an identity.  Employee number seems an obvious example.  It's usually the next number on the list.

  • nova wrote:

    Jeff Moden wrote:

    The login name can change.  It's not an immutable item.  And, I disagree... a "Business Key" is not the same as a "Natural Key".

    Agreed that login name isn't immutable but keys don't have to be immutable. It is usually desirable for a key to be as stable as possible but immutability is not always essential and in many cases not likely to be achievable. Login names should be unique however and if you make them non-nullable and put a uniqueness constraint on them then certainly they are keys in the relational sense (I'm assuming a login name is automatically "irreducible").

    Returning to the topic of this thread, GDPR makes no distinction as to what type of identifier is used, whether it can change or whether it is "natural" or not.

    I have to say that I totally disagree with the idea that the Primary Key of any table can be mutable.  If it is and it changes, then everything that is made to reference it must also be found and changed and that falls way outside the definition of what a Primary key is or should be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Agree that the primary key cannot be subject to change.  Any change must be due to some extraordinary circumstances and not a matter of routine or reasonable possibility.  There are too many that always use a GUID or identity even when there is another viable alternative.  But that doesn't mean they are always bad.  Well, GUIDs are always bad if the database is self-contained, but that's another story.

  • Jeff Moden wrote:

    nova wrote:

    Jeff Moden wrote:

    The login name can change.  It's not an immutable item.  And, I disagree... a "Business Key" is not the same as a "Natural Key".

    Agreed that login name isn't immutable but keys don't have to be immutable. It is usually desirable for a key to be as stable as possible but immutability is not always essential and in many cases not likely to be achievable. Login names should be unique however and if you make them non-nullable and put a uniqueness constraint on them then certainly they are keys in the relational sense (I'm assuming a login name is automatically "irreducible").

    Returning to the topic of this thread, GDPR makes no distinction as to what type of identifier is used, whether it can change or whether it is "natural" or not.

    I have to say that I totally disagree with the idea that the Primary Key of any table can be mutable.  If it is and it changes, then everything that is made to reference it must also be found and changed and that falls way outside the definition of what a Primary key is or should be.

    And if you ever have to go through a wholesale change of your primary keys at some point, you will remember it for years to come.  That can be a real nightmare to go through, especially when your keys are referenced on other (sometimes disconnected) systems.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden wrote:

    ZZartin wrote:

    Jeff Moden wrote:

    I have a question for all you folks claiming that every table must have a natural key.  What would you use for a natural key for either an Employee table or a Customer table?

    Really for both of those I would probably use a key I've created as the new natural key.  For an employee that might be employee ID.

    The argument about natural keys vs surrogate keys only really becomes a debate when you are taking in data that has a logical primary key itself and you have to decide whether you want to maintain that or create your own.

    I guess that "depends" on what the EmployeeID contains.  To me, if it's just a number with no other meaning, then it's not a "natural" key.  It's a "surrogate" key and it contains absolutely no way to identify who the employee actually is without looking it up in a table... as it should be.  It should also NOT contain things like the date of hire and a sequence number.  By itself, it should be totally meaningless.

    Almost all keys we use require some form of context to be meaningful.  For example both employees and customers might link back to a person but really at that point their only natural key would be what their dna sequence?  Even something like an SSN was originally just an arbitrarily assigned number and also might not be applicable in all cases.

  • Jeff Moden wrote:

    I have to say that I totally disagree with the idea that the Primary Key of any table can be mutable.

    My point was about keys in general, not "primary" keys specifically. A key must be unique and irreducible but demonstrably need not be immutable. In the real world identifiers sometimes change and so their values in databases must change too, however inconvenient that may be to software developers.

  • nova wrote:

    Jeff Moden wrote:

    I have to say that I totally disagree with the idea that the Primary Key of any table can be mutable.

    My point was about keys in general, not "primary" keys specifically. A key must be unique and irreducible but demonstrably need not be immutable. In the real world identifiers sometimes change and so their values in databases must change too, however inconvenient that may be to software developers.

    As a developer I vehemently disagree. You need a primary key to be mechanically useful. That means it must be immutable and have other desired characteristics, like uniqueness and small size. Thus surrogate keys comprised of int identities (or bigint if necessary) are both convenient and essential. Natural keys are by their nature seldom immutable or unique, for all they're supposed to be.

     

  • roger.plowman wrote:

    As a developer I vehemently disagree. You need a primary key to be mechanically useful. That means it must be immutable and have other desired characteristics, like uniqueness and small size. Thus surrogate keys comprised of int identities (or bigint if necessary) are both convenient and essential. Natural keys are by their nature seldom immutable or unique, for all they're supposed to be.

    But are you saying you would allow duplicate login names for your web site? I suspect your customers might have some problems with that. For obvious practical (and legal) reasons identifiers that work in the real world are essential for effective use of a system. The way we ensure our identifiers are reliable in a database is by putting uniqueness constraints on stable, but potentially mutable values such as login names and account numbers. I can't imagine responsible developers, say at a bank or anywhere else for that matter, allowing account numbers to be duplicated just for their own convenience. In the present regulatory environment, GDPR and its ilk, that kind of laxity could probably end with company execs taking the rap in court.

  • No.  No duplicate login names or account numbers and, in many cases, no reuse of such things either because that would actually entail a dupe even if all but one were marked as "disabled".  As you say, there can be some rather large and scary legal ramifications to allowing such a thing to happen.  But avoiding duplicates is just one of the many requirements of a PK.  Uniqueness is not enough.  They must be immutable (at the very least).

    A lot of people end up changing things like login names because of things like marriage, etc, or just personal whim when the realize (on sites like this one) that their login name is used as a part of the "handle" that is displayed next to their posts.  The uniqueness of these "avatar" names are still enforced but they can be changed (not immutable).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As I said previously however, I am talking about keys in general, not just "primary" keys. A table can have several keys enforced as irreducibly unique and non-nullable but as a matter of convention only one key per table is deemed to be "the primary one". Keys are fundamentally important but whether you designate one of them to be primary or not is of no great significance because in fundamental and practical terms one key does not have to have different features to any another. Foreign keys for example can reference any key, not just "primary" ones. GDPR of course makes no distinction between a "primary key" identifier and other possible identifiers.

    Immutability is illusory in my experience. Unless you have godlike authority over database systems it's impossible to predict whether someone in a year or ten years time will update a key value. It happens. I have updated surrogate key values sometimes for good practical reasons as I'm sure many people have. Stability, i.e. very stable relative to everything else, is usually a desirable characteristic for any key, not just a primary one. With good design it's possible to make keys that will be very, very stable, but "immutability" is not much more than wishful thinking.

  • That's all good but you made it sound like you were suggesting that keys (in general) were allowed to be mutable.  There are some rare cases where AKs (Alternate Keys) can be mutable but, generally not and never a PK.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You already said, and I agree, that login names can change. Similarly customer account numbers, so those are two relatively common cases of keys that may need to be updated. There are plenty of other examples around. As evidenced by some of the comments in this thread, the expectation that other keys might need to change is one of the major reasons why people use surrogate keys at all. If keys only rarely needed updating then there would be fewer reasons to add a surrogate key to a table.

    never a PK

    If I update a surrogate primary key does it stop being a PK? How do I make it a PK again? I'm not convinced that it matters in any practical sense.

    • This reply was modified 5 years, 3 months ago by  nova.

Viewing 15 posts - 61 through 75 (of 108 total)

You must be logged in to reply to this topic. Login to reply