January 9, 2008 at 3:20 pm
In reading articles on choosing a primary key, it is often argued that a primary key should be immutable, which is why identity columns and uniqueidentifer types are often used for them, as opposed to 'real world' constructs. What exactly is the rationale for this? It seems to apply to ORM, but I'm having trouble finding a clear explanation.
January 9, 2008 at 7:10 pm
A Primary key is the field used as the "anchor" for relations to other tables. In other words, it's the value that is put into a field in another table to create a relationship between the two tables.
If the primary key changes, then it has to be changed in the primary table AND all of the "child" tables (and all of the records holding that value in the child tables). If that key is cascaded on to yet other tables, they too would have to change.
In a sense - the PK is an exception, since it's the one piece of data in a given table that should be expected to be found in many places/duplicated (as foreign keys). Every other piece of data (if you're normalized) is held in one place, so changing it should NOT entail propagate that data to other locations.
One of our missions in life as DBA's/guardians of the sacred data/insert your favorite title or insult here is to maintain the relational integrity, so the PK changing in a given record tends to set up a very expensive/difficult process to propagate that change to all places.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 9, 2008 at 9:53 pm
I'm not sure a primary key is ummutable. If you had a taxID that changed (corporate sale, etc), then it could change, but it must remain unique.
January 11, 2008 at 10:02 am
The choice of a primary key should be based on the following attributes:
- uniqueness
- simplicity
- irreducibility
- stability (related to what you're calling immutability)
- familiarity
In some cases, the above desired attributes are in conflict with each other. For instance, an attribute set that is familiar may not be stable or unique (like a person's name). This is where design experience comes into play -- you must weigh the tradeoffs and make the decision accordingly.
If you do choose to use an arbitrarily generated value as your PK (e.g., identity or guid), be sure and declare your uniqueness constraints on the actual business keys within the table. Also, be prepared for the arbitrary identifier to "escape into the wild" and take on business meaning to the users.
HTH,
TroyK
January 11, 2008 at 10:24 am
Agreed - it can be a tradeoff like Troy mentioned. With one small caveat: uniqueness is not part of the tradeoff, meaning - it's the one attribute he mentioned which isn't "optional". Primary Keys by definition MUST be both unique and non-null (since it has to be an addressable/comparable key).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 11, 2008 at 11:48 am
Good point, Matt. The tradeoff involved in uniqueness is a function of accurately analyzing the universe of discourse. What is unique for the stored information may not be unique in a wider context.
In other words, we may choose a PK that is unique given the current understanding of requirements, but which may lose its uniqueness if the requirements change to expand the expected stored data.
As an example, let's say we are storing a table of States (as in United States), and choose the 2-letter postal codes as the unique identifier. If we later decide that the table should expand to represent propositions about North American regions, American armed forces "states", etc., we could lose the uniqueness attribute of the key.
TroyK
January 11, 2008 at 12:37 pm
For my recent designs, I've been using an identity column as the PK (which, if my understanding is correct, is called a surrogate key), and at least one candidate key, which can be modified by the end-user. This approach avoids having multi-column primary keys (except for junction tables, where I usually just use the primary keys of the parent tables), though it can get somewhat cumbersome since it requires queries have additional joins to retrieve the natural key.
As far as key values 'escaping into the wild', I am well aware of this because it has already happened to us That sounds like an argument for immutability, because if the system doesn't allow PK values to change, then you don't have to worry about the impact of those changes outside the DBMS, such as exporting to spreadsheets, XML Documents, etc.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply