February 26, 2004 at 9:26 am
Can somebody give me an explanation (or point me to a link which explains) of the difference between IDENTITY and PRIMARY KEYs? I am fairly new to SQL Server, and have set my id columns as PRIMARY KEY and IDENTITY (1,1) YES. I am hoping to implement transactional replication eventually and am not sure this is the best option.
Thanks,
Brett
February 26, 2004 at 10:36 am
A PK is a unique identifier for a row of data. An identity column will auto fill based on some increment each new row. Identities are not necessarily unique or contiguous. You can make the identity column your PK, but it doesnot have to be.
February 26, 2004 at 12:14 pm
ok, so a Primary Key could be something like 'bmw' or 'ford' in a 'mfr_name' column in a table called 'vehicle_mfr' as long as the manufacturer was only in the table once. I suppose my question is then, why wouldn't there be a 'mfr_id' int column, starting at 1 and auto-incrementing, which was set as the primary key. I suppose then for Primary Keys, a default formula rather than a default value should be specified. Is this on the right track? Also, would there be any reason to have an IDENTITY column in a table that has a separate Primary Key?
Many thanks.
February 26, 2004 at 12:39 pm
Ok a primary key(pk) should be INVARIABLE with time. Sometimes it happens that what you thought it could be a good pk candidate is not anymore and you may ended redesigning a lot but if you use a surrogate key (something to identify rows on your table ) you are almost sure to have something stable.
I know there is a lot of debate between to use surrogates or not to use them but that's another topic
About the IDENTITY column where there is a pk is you said that the pk constraint is ENFORCED then there is no apparent need for it but some times it helps to accomplish certain business needs(queries) with ease
* Noel
February 26, 2004 at 12:52 pm
Yes, a PK shouldn't be volatile. A PK is one of the fundamental concept of relational databases. It is used to uniquely identify a row in a table. The IDENTITY property you can define on a numeric column offers simply a mechanism for providing a unique number. I would look at BOL for explanation of both. And if you have some money to spent, maybe buy a book on designing relational databases. For beginners I think Database Design for Mere Mortals (Second Edition) von Michael J. Hernandez, Addison-Wesley; ISBN: 0-201-75284-0 is quite worth reading because it explains things in a non academic fashion. Another option is to ask questions here. It's a real time saver.
HTH
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 26, 2004 at 12:59 pm
ok, so a Primary Key could be something like 'bmw' or 'ford' in a 'mfr_name' column in a table called 'vehicle_mfr' as long as the manufacturer was only in the table once. I suppose my question is then, why wouldn't there be a 'mfr_id' int column, starting at 1 and auto-incrementing, which was set as the primary key. I suppose then for Primary Keys, a default formula rather than a default value should be specified. Is this on the right track?
There is no law that state that you must have a numeric primary key. However, numbers are easy to sort and search, therefore offer performance advantages over strings. If you have a natural PK, of course you can use this. The concept of IDENTITY sometimes seem to lead to rather religious discussions. Difficult to judge who is right and who is wrong. I guess everyone has to find out for himself which way to go.
Also, would there be any reason to have an IDENTITY column in a table that has a separate Primary Key?
No, I would say.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 26, 2004 at 1:10 pm
Some systems I have "inherited" had multiple Colums primary keys (4-5) and I was task to speed some of the code! One of the shortest solutions to get it going was to ADD an identity column and a index on it !!!
You see there is not a necesity on the model but in practice sometimes it helps.
This is why some times you have a perfect 3NF design and you DENORMALIZE to get speed!!
* Noel
February 26, 2004 at 1:18 pm
Brian Kelley has a nice article on normalization http://www.sqlservercentral.com/columnists/bkelley/normalization.asp
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 26, 2004 at 2:19 pm
Thanks guys, this will certainly help me!
I think I agree with Frank on the numeric PK performance enhancements. I think it just seems most obvious to visualize the rows in a numeric order, and then sort them in that way. I have seen tables before that do have other unique values, but numeric seems easiest to me.
noeld, if I am ever in your situation I will certainly remember your solution!
Frank, thanks for the article on normalization.
February 26, 2004 at 2:30 pm
I think I agree with Frank on the numeric PK performance enhancements. I think it just seems most obvious to visualize the rows in a numeric order, and then sort them in that way. I have seen tables before that do have other unique values, but numeric seems easiest to me.
Purists and hardliners might now argue, you've successfully turned a RDBMS into some kind of something else. I know I will get crucified for quoting Joe Celko again, but here it goes. Take it purely as some entertaining evening fun from me.
>> I'm trying to change an existing field [sic] to an
IDENTITY field [sic], but I'm getting syntax errors. It's got to be
simple, but somewhere I'm missing something. <<
What you are missing is the basic concepts of the relational model.
Columns are not fields; rows are not records; tables are not files. An
IDENTITY property cannot be a key by definition. A key is a subset of
attributes that uniquely define an entity in your data model.
The IDENTITY column is a holdover from the early programming language
which were very close to the hardware. For example, the fields in a
COBOL or FORTRAN program were assumed to be physically located in main
storage in the order they were declared in the program.
The early SQLs were based on existing file systems. The data was kept
in physically contiguous disk pages, in physically contiguous rows, made
up of physically contiguous columns. In short, just like a deck of
punch cards or a magnetic tape.
But physically contiguous storage is only one way of building a
relational database and it is not always the best one. But aside from
that, the whole idea of a relational database is that user is not
supposed to know how things are stored at all, much less write code that
depends on the particular physical representation in a particular
release of a particular product.
One of the biggest errors is the IDENTITY column (actually property) in
the Sybase family (SQL Server and Sybase). People actually program with
this "feature" and even use it as the primary key for the table! Now,
let's go into painful details as to why this thing is bad.
The practical considerations are that IDENTITY is proprietary and
non-portable, so you know that you will have maintenance problems when
you change releases or port your system to other products.
But let's look at the logical problems. First try to create a table
with two columns and try to make them both IDENTITY. If you cannot
declare more than one column to be of a certain datatype, then that
thing is not a datatype at all, by definition. It is a property which
belongs to the PHYSICAL table, not the data in the table.
Next, create a table with one column and make it an IDENTITY. Now try
to insert, update and delete different numbers from it. If you cannot
insert, update and delete rows from a table, then it is not a table by
definition.
Finally create a simple table with one IDENTITY and a few other columns.
Use a few statements like
INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
to put a few rows into the table and notice that the IDENTITY
sequentially numbered them in the order they were presented. If you
delete a row, the gap in the sequence is not filled in and the sequence
continues from the highest number that has ever been used in that column
in that particular table.
But now use a statement with a query expression in it, like this:
INSERT INTO Foobar (a, b, c)
SELECT x, y, z
FROM Floob;
Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one do you pick? The
answer has been to use whatever the physical order of the result set
happened to be. That non-relational phrase "physical order" again.
But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order. Can you explain from a logical model why the
same rows in the second query get different IDENTITY numbers? In the
relational model, they should be treated the same if all the values of
all the attributes are identical.
Using IDENTITY as a primary key is a sign that there is no data model,
only an imitation of a sequential file system. Since this number exists
only as a result of the state of particular piece of hardware at a
particular time, how do you verify that an entity has such a number in
the reality you are modeling?
To quote from Dr. Codd: "..Database users may cause the system to
generate or delete a surrogate, but they have no control over its value,
nor is its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp
409-410) and Codd, E. (1979), Extending the database relational model to
capture more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434. This means that a surogate ought ot act like an index; created
by the user, managed by the system and NEVER seen by a user. That means
never used in queries.
Codd also wrote the following:
"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.
(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result that
some or all of the serial numbers might be changed.).
(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.
(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased tohave one (e.g. and applicant for a job and a retiree).
These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] - is
to introduce entity domains which contain system-assigned surrogates.
Database users may cause the system to generate or delete a surrogate,
but they have no control over its value, nor is its value ever displayed
to them....." (Codd in ACM TODS, pp 409-410).
References
Codd, E. (1979), Extending the database relational model to capture more
meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434
The most common use that a Newbie makes of IDENTITY is to use it as a
record number (under the error that a record nubmer is a key!), so that
he does not have to think about keys, DRI, check digits, proper data
types, international standards and all that hard stuff.
While this was meant as an abstract example, I also fear that you have
not read ISO-11179 because of the silly, redundant, dangerous prefixes
on your code.
--CELKO--
Steve, no diatribe from me!
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 26, 2004 at 2:47 pm
This type of scenario is also very used in Datawarehousing, so keep that in mind
* Noel
February 28, 2004 at 11:51 am
I have to say that I have watched with some dismay the tendency of programmers/developers to automatically make the first column of their tables an indentity column without any real thought as to the data contained in the table. Worse, in my opinion, is the fact thes often get set as the PK ( not always clustered to be fair ) but that the column is never ( or rarely ) used in the queries against the database.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
February 28, 2004 at 2:12 pm
Although I agree with you, this seems like some kind of religious discussion which we won't solve here. Using the IDENTITY property is not bad per se, as long as no natural PK exists. But why being a hardliner on this? There are more severe issuer to watch out for when it comes to bad design.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply