Overusing Identities

  • Steve said:

    But is overusing of identity a big problem? I tend to see most people building databases as beginning to intermediate data modelers. I'd probably include myself in that group as well, and rather than have them get confused about what a good natural key is, or building a complicated complex key, I think the default practice of just sticking a surrogate identity column on the table as the PK is a good idea. It's simple, and in the absence of knowing better, or having a reason not to do it, I think it's a good practice.

    Are we talking about a profession or a hobby?

    Steve, I'm a bit surprised at this statement from you, since you're not just another procedural developer lost in a declarative world. You've been writing this excellent newsletter for years, why are your modeling skills "beginning to intermediate"?

    Are we talking about a profession or a hobby? For some reason, DDL and DML are the two places that hobby-level skills are tolerated (even encouraged) in information systems profession.

    If a person is confused about what the natural key is (or about when there is no useful natural key, such as with individuals [assuming we can't sequence their DNA and rule out identical twins], why are they using a relational database in the first place? Put it in a file for pity's sake.

    Have you ever maintained a "database" full of meaningless pointers? The most terrifying words are "it's just a simple little database". That means "I'm ignoring the true nature of the data and trying to build a network of pointers than has no inherent meaning". Easy to write, impossible to use long term.

    Now, I write lousy Java code. I know it, and that's why I don't do a lot of it, and when I do I ask a good coder to critique it. Java folks are pretty serious fundamentalists - if you don't do it right, they rake you over the coals.

    We don't have to be nasty about it. But yes, we can insist that people learn thier profession and understand how to determine the natural keys (or determine there is no "useful" natural key or another reason a surrogate makes senses) before letting them out of the sandbox.

    Roger Reid

    Roger L Reid

  • Steve,

    Your comment about "when in doubt . . . " struck a nerve on me. I will agree with the "when in doubt, add the INT Identity colummn"; however, if you add a GUID as your identity column and make it a PK, then you have, essentially, made a random number your PK. Think that one through for a minute. A PK is, by default in the SQL Server world, a clustered index. Do you really want to use a random number to sort your table? Doesn't that sort of guarantee that the next entry that is made to the table is going to start causing fragmentation and/or resorting of the table?

    IMHO, if you must use a GUID for your Identity column (and I will concede that there are times when that is the proper choice 😉 ) then you should make it a Unique Key/Index instead of a PK.

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • I favor using identities in *almost* every table, even those that have one or more natural unique keys. For one thing it makes logging much simpler, a single quad of log tables (one for each CRUD type) with a table ID, a RID (my term for a 4 byte identity), a date/time and some other fluff, and you're good to go. A single quad of triggers to write too. Gotta love the simplicity...

    They also have the advantage of compactness, serialism, and thus speed. As clustered primary keys they're pretty hard to beat.

    At the same time you should always index your unique natural keys, you're going to sort by them if nothing else. RIDs can't be the end-all of indexing by themselves.

    The only time I don't put a RID in a table is if the table is a many-to-many table consisting of a pair of RIDs. Although even here if logging is critical for auditing purposes this table too will get its own RID PK.

    I don't think it's a matter of using *too many* identities, it's a matter of not using them *properly*. Unless your table is over 2 billion rows (or likely to exceed that limit over time) identities are the clear winners from any number of perspectives.

  • GilaMonster (3/1/2010)


    I don't have a problem with using identities. What I do have a problem with is using identities as the only unique identifier of the entire table.

    When the primary key is an identity (or a GUID) and there is no other unique column or set of columns in the table, then it's easy to get into the situation where the only thing that differs between two rows is a meaningless artificial key. That easily leads to unamusing data integrity issues at a later date.

    It's important to identify the candidate keys during modelling. Maybe one gets selected as the primary, maybe an artificial key gets used instead, doesn't really matter, but then the candidate keys should get unique constraints defined on them so that the meaningless artificial key is not the sole measure of uniqueness in the table.

    Quite right. I, on the other hand, have a serious problem with using a composite key and not using identities. Case in point (battle I lost), a customer database that uses a composite key of Customer ID and Customer Set ID (to match our enterprise database). Fine, I said - UK this composite, but put an identity on it. Nope. So - 20 bytes on each and every table that has a link into the master customer table (just about everything). And 200+ developer hours wasted dealing with the composite everywhere, in UDFs, stored procedures, UIs, and most recently in an XML based app. More coding work, more chance for developer error, more debugging, more of everything that wouldn't have been necessary by adding about 4Mb to the database (~0.7%).

  • RalphWilson (3/1/2010)


    IMHO, if you must use a GUID for your Identity column (and I will concede that there are times when that is the proper choice 😉 ) then you should make it a Unique Key/Index instead of a PK.

    There's nothing wrong with a GUID primary key, as long as it is PRIMARY KEY NONCLUSTERED

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Roger L Reid (3/1/2010)


    Steve, I'm a bit surprised at this statement from you, since you're not just another procedural developer lost in a declarative world. You've been writing this excellent newsletter for years, why are your modeling skills "beginning to intermediate"?

    /quote]

    I've designed databases, and worked in the database field a long time, but that hasn't been my primary focus. I'd say that I likely have 3-4 years of modeling experience, repeated quite a few times rather than decades of modeling. I've investigated it, and would consider myself an intermediate modeler, but by no means an expert.

    Asking someone to know their profession is fine, as is looking to improve skills, but I don't know that I think someone using identities means they don't understand what a natural key is. I've seen many systems where there is no natural key. In fact, because business can be fluid, defining one would have resulted on a regular redesign and rebuilding of both schema and keys, which IMHO, often causes more problems than it's worth in code.

  • RalphWilson (3/1/2010)


    ... however, if you add a GUID as your identity column and make it a PK, then you have, essentially, made a random number your PK.

    Good point. I was thinking more of NEWSEQUENTIALID() as the seed value if you use this as the clustered index. I typically don't use a clustered index for the PK, but that is the default, and it is worth pointing out.

    If in doubt, and you add a GUID, make it a sequential GUID, or move the clustered index.

  • [quote-0There's nothing wrong with a GUID primary key, as long as it is PRIMARY KEY NONCLUSTERED[/quote-0]

    True . . . unfortunately, far too often, those who do not realize the true nature of GUID's do not think through the decision to use one as a PK and, as a result, they also do not make the necessary effort to make sure that the PK is non-clustered.

    As far as I am concerned, a non-clustered PK is, in reality, a non-clustered Unique Key.

    Another, implicit point, is that GUID's should only be used if absolutely necessary. They are actually fairly long strings that eat space quickly. 😉

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • Using identity columns as a PK key and putting unique constraints on the candidate keys is the way to go. In many cases, there is no way to know if a natural key will have to be updated in the future. With an identity PK the value will only have to be updated in one row in one table. With a natural PK, it will have to be updated every place it is referenced. I recently had to deal with an HR system that used employee number as the primary key. When that company was purchased by another company and they had to change the employee number they were getting about 700 deadlocks per day when they were trying to update the employee number in 25 tables.

    Also, you may find in the future that your natural key is not really unique. "Yes, that 10 digit product SKU really is unique. At any point in time. Did we fail to mention we reuse them after 3 years, so your history table has to deal with that."

    Really, I look at it this way: I have had a number of occasions to regret that a natural key was used, but have never regretted using an identity primary key.

  • RalphWilson (3/1/2010)


    As far as I am concerned, a non-clustered PK is, in reality, a non-clustered Unique Key.

    There are some fundamental differences between a primary key and a unique constraint, not the least being to do with replication.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Re PRIMARY KEY CLUSTERED vs NONCLUSTERED:

    IMO there is a reason why primary keys default to CLUSTERED: a well thought-through primary key ought to organize rows in the order in which they are processed most often. That speeds up SELECTs.

  • Quite right. I, on the other hand, have a serious problem with using a composite key and not using identities. Case in point (battle I lost), a customer database that uses a composite key of Customer ID and Customer Set ID (to match our enterprise database). Fine, I said - UK this composite, but put an identity on it. Nope. So - 20 bytes on each and every table that has a link into the master customer table (just about everything). And 200+ developer hours wasted dealing with the composite everywhere, in UDFs, stored procedures, UIs, and most recently in an XML based app. More coding work, more chance for developer error, more debugging, more of everything that wouldn't have been necessary by adding about 4Mb to the database (~0.7%).

    Unfortunately, that's because of an incomplete implementation of SQL vs. the relational model. SQL demands an ad-hoc restatement of the relationship vs. an implicit enforcement of the join already defined. Other products in the past have addressed this implicitly (DataEase 'DQL' was one), but unfortunately this is as yet unaddressed in SQL without using other tools to facilitate the DML creation.

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Steve,

    I'm shocked! :w00t: The only purist rationalization for a surrogate PK is when no 'natural key' exists. Otherwise, it's just a throwback to mainframe days when files used data keys to reference (point) records, and at the least, a corruption/minimalization of a relational model. So much for unordered relations :crazy:

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • 1) No GUIDs as PKs for me please, sequential or otherwise

    2) I am with another poster on OrderDetails: OrderID + Sequential Line Number as PK...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The toughest and most confusing databases I have worked on are those that do not use Identity, and instead relate to tables based on natural keys. Not only does this typically make joining in tables more work and harder to create an index for, its even worse when these systems are not documented so you don't know what combination makes up the primary key.

Viewing 15 posts - 16 through 30 (of 70 total)

You must be logged in to reply to this topic. Login to reply