Index on IDENTITY column

Question

Index on IDENTITY column

karthik M

One Orange Chip

Points: 29139
More actions
May 25, 2010 at 5:50 am

#221907

All,
I have seen lot of tables with IDENTITY column created with PRIMARY KEY. I am sure we shouldn't create Clustered index/primary key on an IDENTITY column. This is not a natural key. It will be considered as a fake/surrogate key. So we need to create a clustered index on some real column(s)/data.
Am i correct?
Suggestions are welcome!
karthik

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399377 More actions · Answer 1

It really depends on the needs of the system and, which school of thought you subscribe to. I'm not a fan of using natural keys as the primary key of the table. I do use them to create a unique constraint or alternate key, but in general I like to use artificial keys. The reason being, frequently, data in natural keys changes. This creates hefty problems within the database to enforce referential constraints and even larger problems for down stream systems such as a data warehouse or datamarts. Artificial keys don't seem to cause nearly as many issues.

But again, there are those that swear by using natural keys and only natural keys. Joe Celko, who writes wonderful books, is one such adherent. Track down articles by him for more information on natural keys.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

karthik M One Orange Chip Points: 29139 More actions · Answer 2

Track down articles by him for more information on natural keys.

can you give me the URL?

karthik

Grant Fritchey SSC Guru Points: 399377 More actions · Answer 3

karthikeyan-444867 (5/25/2010)
Track down articles by him for more information on natural keys.
can you give me the URL?

Come on Karthik. You've got the name, Joe Celko, and you have the topic, natural keys. I'm positive you can run Google all on your own.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Steve Jones - SSC Editor SSC Guru Points: 741102 More actions · Answer 4

Here you go: http://lmgtfy.com/?q=Joe+celko+natural+keys

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

karthikeyan-444867 (5/25/2010)
I am sure we shouldn't create Clustered index/primary key on an IDENTITY column.

There's a big, massive, huge difference between not using an int identity for a primary key and not using an int identity for a clustered index. Totally different reasons, totally different principals, totally different object types.

See Joe Celko's stuff about int identity primary keys.

As for int, identity, clustered index - http://www.sqlservercentral.com/articles/Indexing/68563/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

tonyrogerson-498793 SSC Enthusiast Points: 189 More actions · Answer 6

I have seen lot of tables with IDENTITY column created with PRIMARY KEY. I am sure we shouldn't create Clustered index/primary key on an IDENTITY column. This is not a natural key. It will be considered as a fake/surrogate key. So we need to create a clustered index on some real column(s)/data.

Be wary on celko's definition of surrogate keys of which is what a lot of folk use the IDENTITY to supply a machine generated number for the column - he differs from both Codd and Date who are widely regarded as the guys who are the definitive source on the Relational Model.

I've started defining surrogates, how to use them etc.. in a blog (reminds me I need to do the other parts I've said I'll do) -> http://sqlblogcasts.com/blogs/tonyrogerson/archive/2010/03/13/database-design-surrogate-keys-part-1-of-many-rules-for-surrogate-keys-e-f-codd-and-c-j-date-research-and-problems-they-solve.aspx

Basically to answer your question; think of a table having one or many "candidate" keys, you simply choose one as the "primary", that one is usually the one you use when accessing the table - restriction and joins.

Hope that helps.

Tony

tosscrosby SSCoach Points: 18044 More actions · Answer 7

Welcome Tony! Sure took a while for you to finally become a newbie (or is it a new user name?). Still got your techy bits going? I'm ashamed to say I have been on the website lately but good to see you again.

-- You can't be late until you show up.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 8

karthikeyan-444867 (5/25/2010)
I am sure we shouldn't create Clustered index/primary key on an IDENTITY column.

First, what actually makes you say that and why are you "sure"?

This is not a natural key.

I have to agree with Gail. What does a natural key have to do with a Clustered Index? Nothing really. In fact, having a Clustered Index on a natural key may be the worst thing you can do insofar as fragmentation, etc.

This is not a natural key.

Heh... again, on that same note and despite the heat we're both liable to take for saying so, I have to agree with Grant. I usually use an IDENTITY column as a PK even if a good, strong, not-likely-to-change candidate key is available. Most candidate keys aren't as strong as most people think. People change names, they change social security numbers, and (IIRC) a province in Canada changed it's name a little more than a decade ago. Since I was in the telephone business when that happened, life would have been a wee bit difficult if I had relied on the province abbreviation as the PK.

One of the major advantages of using a Clustered Index on an IDENTITY column is it does prevent page splits on heavy OLTP tables or tables that take a lot of batch inserts.

Yes, I may also use a unique constraint/index (Alternate Key) in addition to a PK especially when humans are making entries one way or another into a table.

To summarize, for all the reasons stated and more, I'll usually (there have been a few exceptions but they were a long time ago and can't remember what they were) use an IDENTITY column with a Clustered Index as the PK of a table and I wonder why you are so sure that shouldn't be done.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

tonyrogerson-498793 SSC Enthusiast Points: 189 More actions · Answer 9

I fleet about a bit - so little time; spare time is going on research around my MSc in BI which is seeing me reading way too much around the Relational Model 🙂

Tony.

tonyrogerson-498793 SSC Enthusiast Points: 189 More actions · Answer 10

Putting implementation issues aside, its good practice to use surrogate keys because there is then one copy of the real data - you aren't duplicating it (as the foreign key columns) into other tables.

Natural keys do change and when they change they can cause all manner of concurrency problems.

Whether to cluster or not really depends what you are doing.

If creating a fact table for instance you wouldn't cluster on the surrogate (the column with the IDENTITY property), you'd cluster on whichever made the most sense in terms of access patterns.

The same is true for OLTP, you need to draw a balance between access patterns and reduced performance because of IO, for instance I have a transactions table that is 30GB and has a significant number of rows in it, I have a surrogate key that I get its value from the IDENTITY property but I cluster on the account number, which so happens to be int but that is imaterial - its to do with the access pattern, the majority of queries sum or access by account so clustering on account number makes perfect sense.

Tony.

karthik M One Orange Chip Points: 29139 More actions · Answer 11

what will happen If I delete some rows from the table. wouldn't cause IDENTITY GAP issue? wouldn't waste the datapage space?

Also, let us consider this scenario. Assume the table capture performance return for one thousand asset for every month.

Table has data as below

ID - IDENTITY PRIMARY KEY

ASSETID - REAL DATA

ID AssetID PerfDate return

1 1 31/JAN/2000 2.34567 --> performance return

2 2

3 3

4 4

5 5

6 6

7 7

8 8

.

1000 1000 31/JAN/2000 4.5675

Now...

1001 1 28/feb/2000 2.3968

1002 2

.

2000 1000 28/feb/2000 6.6754

Again...

2001 1 31/mar/2000 5.7867

what will happen if i write a query like below

select AssetID,PerfDate,Return

from <table>

where AssetID = 1

AFAIK, optimizer has to scan all the page to find the respective assetid. Am I correct?

If I create unique clustered index on AssetID column along with PerfDate, i think the optimizer will use 'Clustered Index Seek' instead 'Clustered index scan'. I tried the same. Can you visit the below thread?

http://www.sqlservercentral.com/Forums/Topic927393-338-1.aspx?Update=1

karthik

karthik M One Orange Chip Points: 29139 More actions · Answer 12

Grant Fritchey (5/25/2010)
karthikeyan-444867 (5/25/2010)
Track down articles by him for more information on natural keys.
can you give me the URL?
Come on Karthik. You've got the name, Joe Celko, and you have the topic, natural keys. I'm positive you can run Google all on your own.

oops...I thought you are referring some pdf format books to download and read...

karthik

karthik M One Orange Chip Points: 29139 More actions · Answer 13

GilaMonster (5/25/2010)
karthikeyan-444867 (5/25/2010)
I am sure we shouldn't create Clustered index/primary key on an IDENTITY column.
There's a big, massive, huge difference between not using an int identity for a primary key and not using an int identity for a clustered index. Totally different reasons, totally different principals, totally different object types.
See Joe Celko's stuff about int identity primary keys.
As for int, identity, clustered index - http://www.sqlservercentral.com/articles/Indexing/68563/

int identity for a primary key is nothing but unique clustered index. I am confused...

not using an int identity for a clustered index.

???

karthik

karthik M One Orange Chip Points: 29139 More actions · Answer 14

tonyrogerson-498793 (5/25/2010)
I have seen lot of tables with IDENTITY column created with PRIMARY KEY. I am sure we shouldn't create Clustered index/primary key on an IDENTITY column. This is not a natural key. It will be considered as a fake/surrogate key. So we need to create a clustered index on some real column(s)/data.
Be wary on celko's definition of surrogate keys of which is what a lot of folk use the IDENTITY to supply a machine generated number for the column - he differs from both Codd and Date who are widely regarded as the guys who are the definitive source on the Relational Model.
I've started defining surrogates, how to use them etc.. in a blog (reminds me I need to do the other parts I've said I'll do) -> http://sqlblogcasts.com/blogs/tonyrogerson/archive/2010/03/13/database-design-surrogate-keys-part-1-of-many-rules-for-surrogate-keys-e-f-codd-and-c-j-date-research-and-problems-they-solve.aspx
Basically to answer your question; think of a table having one or many "candidate" keys, you simply choose one as the "primary", that one is usually the one you use when accessing the table - restriction and joins.
Hope that helps.
Tony

Tony, Let me read your blog and get back here...

karthik