Overusing Identities

Question

Overusing Identities

Steve Jones - SSC Editor

SSC Guru

Points: 741310
More actions
February 27, 2010 at 11:39 am

#156722

Comments posted to this topic are about the item Overusing Identities

Viewing 15 posts - 1 through 15 (of 71 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

I use identities for the same reason that people are issued SSN's... It makes life real simple as a unique identifier in the world of large batch programming. As I prepare to install handrails for the upcoming ride :-P, I'll also tell you that if a natural key exists, I'll still usually use an IDENTITY column as at least a Non-null Unique Key and make the natural key the PK unless, maybe, if it's a totally static table.

I'll usually put the clustered index on an IDENTITY column especially for large tables that are also highly transactional just to keep page splits at the lowest level possible. Sure, a date column would probably suffice but there isn't always a date column and, when there is, data is not always inserted in the same order as those dates. To me, it's worth the 4 bytes per row to keep from having nearly 8k bytes suddenly open up because a single row insert caused a page split. The extra 4 byte column is usually worth it to me in the savings it causes during maintenance.

I also tend to use IDENTITY columns on everything I import. I always import to staging tables and never to the final table so that if someone does send "dupes", I can easily find them and tell the vendor of their problem. Yes, there are several ways to avoid the dupes but I always want to capture them for one reason or another and the IDENTITY column provides uniqueness when there may otherwise be none. The use of IDENTITY columns also makes it a snap to archive data because I can usually archive rows on one logical end of the table while the users continue to use the more recent other "end" of the table without mutual interference.

The bottom line is that as with everything else, "It Depends". I can see arguments for using IDENTITIES for all tables, no tables, or just some tables. I happen to be one of those that favors the use of IDENTITY columns.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

george sibbald SSC Guru Points: 104210 More actions · Answer 2

Just to note that using a GUID as a primary key might not be a good idea if the index is created as a clustered index due to the random values given to GUIDs, and their length at 16 bytes.

NEWSEQUENTIALID() can be used in conjunction with GUIDs if they are to be the clustered index.

---------------------------------------------------------------------

SQLRNNR SSC Guru Points: 281344 More actions · Answer 3

Jeff Moden (2/27/2010)
I use identities for the same reason that people are issued SSN's... It makes life real simple as a unique identifier in the world of large batch programming. ...
I'll usually put the clustered index on an IDENTITY column especially for large tables that are also highly transactional just to keep page splits at the lowest level possible. ...
I also tend to use IDENTITY columns on everything I import. ...
The bottom line is that as with everything else, "It Depends". I can see arguments for using IDENTITIES for all tables, no tables, or just some tables. I happen to be one of those that favors the use of IDENTITY columns.

I favor the use of identity columns for much the same reasons. I don't like clustered indexes on GUID columns, and prefer to use a better natural key or an identity column. Jeff pretty much summed it up for me though.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Steve Cullen SSCertifiable Points: 5598 More actions · Answer 4

We create identity columns for just about every table. It tends to make things much easier. One such area this makes things easier in is transactional replication.

One area we don't do this with is in staging tables for ETL import. I tend to go back and forth on the whole identity idea there because there are occasionally dups to deal with, as Jeff has pointed out. It's never been a major issue though.

Everywhere else we use identity, whether we have a natural key or not.

Converting oxygen into carbon dioxide, since 1955.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

I don't have a problem with using identities. What I do have a problem with is using identities as the only unique identifier of the entire table.

When the primary key is an identity (or a GUID) and there is no other unique column or set of columns in the table, then it's easy to get into the situation where the only thing that differs between two rows is a meaningless artificial key. That easily leads to unamusing data integrity issues at a later date.

It's important to identify the candidate keys during modelling. Maybe one gets selected as the primary, maybe an artificial key gets used instead, doesn't really matter, but then the candidate keys should get unique constraints defined on them so that the meaningless artificial key is not the sole measure of uniqueness in the table.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Michael Lysons SSCertifiable Points: 6497 More actions · Answer 6

I've used Identity columns a lot in my data warehouse work, and I love the simplicity they bring. I've also used them extensively when working on transactional systems, and again found that the simplicity made for easier modelling.

However, more recently I created a new system and decided to go for entirely natural keys. For no other reason than I fancied a change to what it was like, and so I ended up with natural, composite keys that brought a certain elegance to the design.

I'd say it was more work with the natural keys though. And if I was building the same system again, I'm not sure which way I'd go.

bill.sugden Old Hand Points: 349 More actions · Answer 7

When I was a newbie to DB development (from a programmeer background) I liberally spread IDENTITY columns as PK. Simpler for me I thought and easier for the other developers.

Now I'm older and wiser. As Gail says, this can bite you if you are not careful. Now I try very hard to identify natural keys, and if they are there, even if composite keys are required, they are at least given a unique index. but the convenience of using IDENTITY for internal work and easier joining is very great, and I would still recommend their usage.

By the way, I hated Aaron's example of an order detail table. Of course it won't work with a simple Order_ID + Product_ID as PK. I would hope that any order detail table would have a unique ID of Order_ID + LineNo. And this is one case where an identity column IS redundant.

Bill

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 8

I'm with Gail on this. What you choose as a primary key is not important. Candidate keys are the important issue. Just be sure you are enforcing the candidate keys required for the integrity of the data. Don't fall into the trap of thinking that "primary key" means a table has only one key or that one key is more important than others.

Filipe SSC-Addicted Points: 499 More actions · Answer 9

Actually I must disagree and say that Identity columns are nice and convinient but can be very bad as well. Why would someone define an identity column on a table that already has another unique key, specially if that column is an int or bigint?

Recently I had living proof that is not a good practice.

I have one database that was developed with the maxim that identities are never enough, and this one table had a natural key (unique, int) that was theonly field used in the application, and had the identity, that was the clustered PK.

I started noticing dead-locks on that table, and the dead-lock was between updates on the PK and the unique key. Since all access to the table was via the UK, when deleting from that table the order of the locks was the reverse of the selects, and it resulted in dead-locks. Getting rid of the PK (and the identity all together) got rid of the problem as well.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

Filipe (3/1/2010)
Actually I must disagree and say that Identity columns are nice and convinient but can be very bad as well. Why would someone define an identity column on a table that already has another unique key, specially if that column is an int or bigint?
Recently I had living proof that is not a good practice.
I have one database that was developed with the maxim that identities are never enough, and this one table had a natural key (unique, int) that was theonly field used in the application, and had the identity, that was the clustered PK.
I started noticing dead-locks on that table, and the dead-lock was between updates on the PK and the unique key. Since all access to the table was via the UK, when deleting from that table the order of the locks was the reverse of the selects, and it resulted in dead-locks. Getting rid of the PK (and the identity all together) got rid of the problem as well.

Did you move the PK to the remaining column? Same question for the Clustered Index. Also, did this just happen to be a sequence table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

WI-DBA SSCrazy Points: 2110 More actions · Answer 11

I agree with Gail. I think the overuse of identities is a problem. Too often I come across data integrity issues caused by the theory of "identity on every table" rather than utilizing the keys from the parent table as part of the child table. Even if the parent table PK is an identity, the child table can benefit (and subsequent grandchildren) from carrying the parent PK. Integrity improves and simplifies queries - and we all know developers need all the help they can get.

Cheers
http://twitter.com/widba
http://widba.blogspot.com/

Filipe SSC-Addicted Points: 499 More actions · Answer 12

Yes, moved PK and cluster to natural key column, and no, this is not a sequence table.

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 13

I'm going to go with Gail & David, there's nothing all that wrong with using Identities as your PK as long as you've also defined the natural key through a unique constraint on the table. Otherwise, it's a recipe for data disaster.

I actually lean towards a design model these days that uses the Identity on all the key tables, and then uses compound keys on all the child and relationship tables. Those tables don't get an identity. I've found, after lots of testing, that most of the data access at our company is pretty hiearchical and building the tables in this way provides for the clustered index being the best access path to the data.

We've also been growing quite a few systems with code that writes code (ORM flavors, nHibernate, CRM, SharePoint,etc.) which frequently leads to GUIDs. The sequential ones work very well. They're a bit fatter than your basic integer, so you suffer in the number of pages used, but other than that, the sequential GUIDs work very well. The random GUIDs on the other hand... they're a royal pain.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jboc SSC Rookie Points: 46 More actions · Answer 14

I tend to focus on High Performance databases where performance is critical. Aaron Bertrand suggests skipping a PK even while having a cluster in his ActivityLog example. From a performance aspect not having a PK strikes this writer as foolish. I tend to like the "ID INT IDENTITY(1,1) PRIMARY KEY," style since it takes half the index space as making the Datetime a PK. It tends to be a trade off, I like to use more storage space for faster performance, some people will trade performance for a little cost savings.

Best wishes,

Barry O’Connell