difference between primary key and IDENTITY

Question

difference between primary key and IDENTITY

Brett Burch

Ten Centuries

Points: 1125
More actions
February 26, 2004 at 9:26 am

#82229

Can somebody give me an explanation (or point me to a link which explains) of the difference between IDENTITY and PRIMARY KEYs? I am fairly new to SQL Server, and have set my id columns as PRIMARY KEY and IDENTITY (1,1) YES. I am hoping to implement transactional replication eventually and am not sure this is the best option.
Thanks,
Brett

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 741356 More actions · Answer 1

A PK is a unique identifier for a row of data. An identity column will auto fill based on some increment each new row. Identities are not necessarily unique or contiguous. You can make the identity column your PK, but it doesnot have to be.

Brett Burch Ten Centuries Points: 1125 More actions · Answer 2

ok, so a Primary Key could be something like 'bmw' or 'ford' in a 'mfr_name' column in a table called 'vehicle_mfr' as long as the manufacturer was only in the table once. I suppose my question is then, why wouldn't there be a 'mfr_id' int column, starting at 1 and auto-incrementing, which was set as the primary key. I suppose then for Primary Keys, a default formula rather than a default value should be specified. Is this on the right track? Also, would there be any reason to have an IDENTITY column in a table that has a separate Primary Key?

Many thanks.

noeld SSC Guru Points: 96590 More actions · Answer 3

Ok a primary key(pk) should be INVARIABLE with time. Sometimes it happens that what you thought it could be a good pk candidate is not anymore and you may ended redesigning a lot but if you use a surrogate key (something to identify rows on your table ) you are almost sure to have something stable.

I know there is a lot of debate between to use surrogates or not to use them but that's another topic

About the IDENTITY column where there is a pk is you said that the pk constraint is ENFORCED then there is no apparent need for it but some times it helps to accomplish certain business needs(queries) with ease

* Noel

Frank Kalis SSC Guru Points: 111183 More actions · Answer 4

Yes, a PK shouldn't be volatile. A PK is one of the fundamental concept of relational databases. It is used to uniquely identify a row in a table. The IDENTITY property you can define on a numeric column offers simply a mechanism for providing a unique number. I would look at BOL for explanation of both. And if you have some money to spent, maybe buy a book on designing relational databases. For beginners I think Database Design for Mere Mortals (Second Edition) von Michael J. Hernandez, Addison-Wesley; ISBN: 0-201-75284-0 is quite worth reading because it explains things in a non academic fashion. Another option is to ask questions here. It's a real time saver.

HTH

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Frank Kalis SSC Guru Points: 111183 More actions · Answer 5

ok, so a Primary Key could be something like 'bmw' or 'ford' in a 'mfr_name' column in a table called 'vehicle_mfr' as long as the manufacturer was only in the table once. I suppose my question is then, why wouldn't there be a 'mfr_id' int column, starting at 1 and auto-incrementing, which was set as the primary key. I suppose then for Primary Keys, a default formula rather than a default value should be specified. Is this on the right track?

There is no law that state that you must have a numeric primary key. However, numbers are easy to sort and search, therefore offer performance advantages over strings. If you have a natural PK, of course you can use this. The concept of IDENTITY sometimes seem to lead to rather religious discussions. Difficult to judge who is right and who is wrong. I guess everyone has to find out for himself which way to go.

Also, would there be any reason to have an IDENTITY column in a table that has a separate Primary Key?

No, I would say.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

noeld SSC Guru Points: 96590 More actions · Answer 6

Some systems I have "inherited" had multiple Colums primary keys (4-5) and I was task to speed some of the code! One of the shortest solutions to get it going was to ADD an identity column and a index on it !!!

You see there is not a necesity on the model but in practice sometimes it helps.

This is why some times you have a perfect 3NF design and you DENORMALIZE to get speed!!

* Noel

Frank Kalis SSC Guru Points: 111183 More actions · Answer 7

Brian Kelley has a nice article on normalization http://www.sqlservercentral.com/columnists/bkelley/normalization.asp

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Brett Burch Ten Centuries Points: 1125 More actions · Answer 8

Thanks guys, this will certainly help me!

I think I agree with Frank on the numeric PK performance enhancements. I think it just seems most obvious to visualize the rows in a numeric order, and then sort them in that way. I have seen tables before that do have other unique values, but numeric seems easiest to me.

noeld, if I am ever in your situation I will certainly remember your solution!

Frank, thanks for the article on normalization.

Frank Kalis SSC Guru Points: 111183 More actions · Answer 9

I think I agree with Frank on the numeric PK performance enhancements. I think it just seems most obvious to visualize the rows in a numeric order, and then sort them in that way. I have seen tables before that do have other unique values, but numeric seems easiest to me.

Purists and hardliners might now argue, you've successfully turned a RDBMS into some kind of something else. I know I will get crucified for quoting Joe Celko again, but here it goes. Take it purely as some entertaining evening fun from me.

>> I'm trying to change an existing field [sic] to an

IDENTITY field [sic], but I'm getting syntax errors. It's got to be

simple, but somewhere I'm missing something. <<

What you are missing is the basic concepts of the relational model.

Columns are not fields; rows are not records; tables are not files. An

IDENTITY property cannot be a key by definition. A key is a subset of

attributes that uniquely define an entity in your data model.

The IDENTITY column is a holdover from the early programming language

which were very close to the hardware. For example, the fields in a

COBOL or FORTRAN program were assumed to be physically located in main

storage in the order they were declared in the program.

The early SQLs were based on existing file systems. The data was kept

in physically contiguous disk pages, in physically contiguous rows, made

up of physically contiguous columns. In short, just like a deck of

punch cards or a magnetic tape.

But physically contiguous storage is only one way of building a

relational database and it is not always the best one. But aside from

that, the whole idea of a relational database is that user is not

supposed to know how things are stored at all, much less write code that

depends on the particular physical representation in a particular

release of a particular product.

One of the biggest errors is the IDENTITY column (actually property) in

the Sybase family (SQL Server and Sybase). People actually program with

this "feature" and even use it as the primary key for the table! Now,

let's go into painful details as to why this thing is bad.

The practical considerations are that IDENTITY is proprietary and

non-portable, so you know that you will have maintenance problems when

you change releases or port your system to other products.

But let's look at the logical problems. First try to create a table

with two columns and try to make them both IDENTITY. If you cannot

declare more than one column to be of a certain datatype, then that

thing is not a datatype at all, by definition. It is a property which

belongs to the PHYSICAL table, not the data in the table.

Next, create a table with one column and make it an IDENTITY. Now try

to insert, update and delete different numbers from it. If you cannot

insert, update and delete rows from a table, then it is not a table by

definition.

Finally create a simple table with one IDENTITY and a few other columns.

Use a few statements like

INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');

INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');

INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

to put a few rows into the table and notice that the IDENTITY

sequentially numbered them in the order they were presented. If you

delete a row, the gap in the sequence is not filled in and the sequence

continues from the highest number that has ever been used in that column

in that particular table.

But now use a statement with a query expression in it, like this:

INSERT INTO Foobar (a, b, c)

SELECT x, y, z

FROM Floob;

Since a query result is a table, and a table is a set which has no

ordering, what should the IDENTITY numbers be? The entire, whole,

completed set is presented to Foobar all at once, not a row at a time.

There are (n!) ways to number (n) rows, so which one do you pick? The

answer has been to use whatever the physical order of the result set

happened to be. That non-relational phrase "physical order" again.

But it is actually worse than that. If the same query is executed

again, but with new statistics or after an index has been dropped or

added, the new execution plan could bring the result set back in a

different physical order. Can you explain from a logical model why the

same rows in the second query get different IDENTITY numbers? In the

relational model, they should be treated the same if all the values of

all the attributes are identical.

Using IDENTITY as a primary key is a sign that there is no data model,

only an imitation of a sequential file system. Since this number exists

only as a result of the state of particular piece of hardware at a

particular time, how do you verify that an entity has such a number in

the reality you are modeling?

To quote from Dr. Codd: "..Database users may cause the system to

generate or delete a surrogate, but they have no control over its value,

nor is its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp

409-410) and Codd, E. (1979), Extending the database relational model to

capture more meaning. ACM Transactions on Database Systems, 4(4). pp.

397-434. This means that a surogate ought ot act like an index; created

by the user, managed by the system and NEVER seen by a user. That means

never used in queries.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as

permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users

and must therefore be subject to change by them (e.g. if two companies

merge, the two employee databases might be combined with the result that

some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct

domains (e.g. one uses social security, while the other uses employee

serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either

before it has been assigned a user-controlled key value or after it has

ceased tohave one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on

common key values may not yield the same result as a join on common

entities. A solution - proposed in part [4] and more fully in [14] - is

to introduce entity domains which contain system-assigned surrogates.

Database users may cause the system to generate or delete a surrogate,

but they have no control over its value, nor is its value ever displayed

to them....." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture more

meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434

The most common use that a Newbie makes of IDENTITY is to use it as a

record number (under the error that a record nubmer is a key!), so that

he does not have to think about keys, DRI, check digits, proper data

types, international standards and all that hard stuff.

While this was meant as an abstract example, I also fear that you have

not read ISO-11179 because of the silly, redundant, dangerous prefixes

on your code.

--CELKO--

Steve, no diatribe from me!

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

noeld SSC Guru Points: 96590 More actions · Answer 10

This type of scenario is also very used in Datawarehousing, so keep that in mind

* Noel

colin.Leversuch-Roberts SSC Guru Points: 52643 More actions · Answer 11

I have to say that I have watched with some dismay the tendency of programmers/developers to automatically make the first column of their tables an indentity column without any real thought as to the data contained in the table. Worse, in my opinion, is the fact thes often get set as the PK ( not always clustered to be fair ) but that the column is never ( or rarely ) used in the queries against the database.

[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/

Frank Kalis SSC Guru Points: 111183 More actions · Answer 12

Although I agree with you, this seems like some kind of religious discussion which we won't solve here. Using the IDENTITY property is not bad per se, as long as no natural PK exists. But why being a hardliner on this? There are more severe issuer to watch out for when it comes to bad design.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]