The Identity Debate

Question

The Identity Debate

Viewing 15 posts - 61 through 75 (of 129 total)

You must be logged in to reply to this topic. Login to reply

BethS SSC Rookie Points: 32 More actions · Answer 1

I use surrogate keys, Identity keys only on the primary table and (sometimes) unique for the natural key that is not really a key but uniquely identifies the row. The child tables are then keyed with the parent's identities. That way I can use relational integrity and cascading deletes and updates so no child records are orphaned. An identity key on every table means you have to build integrity into your application. That's more work and more potential problems. If there is a many level down child table, then it's key is complex, but if you are using identities for the primary parents then the key is all integers so it's not bad. I have duplicate trapping on the primary parent tables during insert from my apps for the natural keys that aren't keys.

This is the way my workplace does it for OLTP and it works well.

Cary Hower-563110 Say Hey Kid Points: 662 More actions · Answer 2

There are a number of factors to consider in this debate. My original training in relational design dictated 3rd normal form for databases. The easiest way that I could remember those rules was the following: “The key, the whole key, and nothing but the key, so help me Codd.” This suggests natural keys. I have also learned over the course of many years of doing data design that natural keys are often multi-columnar and, like it or not, changing business needs require that values of these keys columns may change with time. Occasionally, though seldom, one of the columns in a multi column natural primary key may have to be null. I know, I know, no part of a primary key can be null, and this will likely open a can of worms. Consider the possibility that one of the columns is not known for some rows at one time, and the value becomes know later. Again, this happens seldom, but I have seen it happen.

With multi column natural primary keys, referencing tables must repeat all of the columns of the primary key. This makes the database bigger, therefore, slower. It also introduces update anomalies if the primary key values change. Over the course of time, I have slightly modified my stand on database design from what I originally learned. I have simplified my view to the notion that a single fact should be recorded in a single place (except in the case of primary/foreign key situations where there is no choice but to record in more than one place). This virtually eliminates data anomalies. To this end, I use identities as primary keys and the natural key as unique keys. These unique keys allow nulls and easier key value changes. The single column numeric primary key also allows for simpler join predicates. My original training was on mainframes where I learned that numeric, especially integer compares are faster that character compares (especially variable length compares). I have no reason to believe that this is not true on the platforms that I perceive most of us use today.

Bryant McClellan SSCarpal Tunnel Points: 4397 More actions · Answer 3

Matt,

Actually I am fully aware of the differences between primary keys and clustered indexes. Having worked with SQL Server since v4.2.1, I've had ample time to explore many of the things available and changed over time in SQL Server. I agree that SQL Server does a mis-service in ASSUMING that the primary key is the clustered index for the table, rather than imploring the designer to make the decision.

That being said, what I was trying to reinforce was a comment made earlier. And that had to do with design. Seriously considering logical, physical and business use design lead to a better mix of natural and surrogate keys, as well as consideration for indexes that might need to exist beyond the keys to accomplish specific tasks, such as optimizing physical storage to best handle the most common business use cases.

------------
Buy the ticket, take the ride. -- Hunter S. Thompson

Steve Jones - SSC Editor SSC Guru Points: 736445 More actions · Answer 4

What should the default be? Should all PKs be non-clustered? It has to be one way or the other in the system and I'd like to think that whoever build the UI had some reason for doing it this way. At least something better than flipping a coin.

Personally I rarely have the PK clustered. So it annoys me to uncheck the box, but I'm guessing that since so many tables have one index, the PK only, that MS was trying to avoid heaps and so forced you to consciously change the PK to non-clustered when you know that you have another index you're create as clustered.

Cary Hower-563110 Say Hey Kid Points: 662 More actions · Answer 5

This is really a separate topic from identities, but I feel that PKs should be clustered because they are so frequently used in joins. It seems that join performance is enhanced by clustering.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

G Bryant McClellan (2/12/2008)
Matt,
Actually I am fully aware of the differences between primary keys and clustered indexes. Having worked with SQL Server since v4.2.1, I've had ample time to explore many of the things available and changed over time in SQL Server. I agree that SQL Server does a mis-service in ASSUMING that the primary key is the clustered index for the table, rather than imploring the designer to make the decision.
That being said, what I was trying to reinforce was a comment made earlier. And that had to do with design. Seriously considering logical, physical and business use design lead to a better mix of natural and surrogate keys, as well as consideration for indexes that might need to exist beyond the keys to accomplish specific tasks, such as optimizing physical storage to best handle the most common business use cases.

Well said. Good database design to support good data code to support solid application code, and NOT the garbage so often being pushed by "Agile" advocates (screw the database design, we'll figure that out later). Never mind that it's an actual perversion of the technique...

It's really hard to have every comment come out right every single time they get expressed, and then there's the "no guarantee" the reader will interpret it the way I meant. No disrespect was meant, and hopefully none taken.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Mike C SSC-Insane Points: 23224 More actions · Answer 7

Cary Hower (2/12/2008)
This is really a separate topic from identities, but I feel that PKs should be clustered because they are so frequently used in joins. It seems that join performance is enhanced by clustering.

I think clustered PK is not always the best option. You also need to consider your query and update patterns.

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 8

I won’t hash over the reasons for using identity surrogate keys; that has already been well expressed in this thread.

The real reason I don’t use natural primary keys is simple: I have had numerous occasions to regret the use of natural primary keys, but have yet to have an occasion to regret using an identity primary key. So it’s a choice between something that usually works but is occasionally a huge problem and something that has yet to cause me a problem.

chris.compton SSC Veteran Points: 270 More actions · Answer 9

Okay, with the risk of sounding more ignorant than I am, I'll admit that I get a little lost.

I've been programming for almost two decades, and the terminology I have encountered has varied based on where I worked and whether I was talking to a DBA or not (mostly not).

Could someone (briefly!!!) explain the difference between

·Natural key

·Surrogate key

·Candidate key

I though I knew... but reading some sub-discussions in the 68 previous posts leads me to believe that either (1) I don't know as much as I thought or (2) I was right but there are some subtleties I don't appreciate or (3) some other posters were loose or wrong with their examples and/or labeling.

A simple example (or two) would be sufficient; thanks!

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 10

chris.compton (2/12/2008)
Okay, with the risk of sounding more ignorant than I am, I'll admit that I get a little lost.
I've been programming for almost two decades, and the terminology I have encountered has varied based on where I worked and whether I was talking to a DBA or not (mostly not).
Could someone (briefly!!!) explain the difference between
·Natural key
·Surrogate key
·Candidate key
I though I knew... but reading some sub-discussions in the 68 previous posts leads me to believe that either (1) I don't know as much as I thought or (2) I was right but there are some subtleties I don't appreciate or (3) some other posters were loose or wrong with their examples and/or labeling.
A simple example (or two) would be sufficient; thanks!

See if this example helps:

CREATE TABLE [dbo].[LANGUAGE] (

[LANGUAGE_ID] [int] IDENTITY (1, 1) NOT NULL ,

[LANGUAGE_CODE] [varchar] (3) NOT NULL ,

[LANGUAGE_NAME] [varchar] (100) NOT NULL

)

GO

-- Create Surrogate Identity Primary Key

ALTER TABLE [dbo].[LANGUAGE] ADD CONSTRAINT

[PK__LANGUAGE] PRIMARY KEY CLUSTERED ( [LANGUAGE_ID])

-- Create a unique constraint on natural key

ALTER TABLE [dbo].[LANGUAGE] ADD CONSTRAINT

[AK__LANGUAGE__LANGUAGE_CODE] UNIQUE NONCLUSTERED ( [LANGUAGE_CODE] )

-- Create a unique constraint on alternate key

ALTER TABLE [dbo].[LANGUAGE] ADD CONSTRAINT

[AK__LANGUAGE__LANGUAGE_NAME] UNIQUE NONCLUSTERED ( [LANGUAGE_NAME] )

Anders Pedersen SSChampion Points: 11410 More actions · Answer 11

Don't get me started on "agile", or any of the dozen other fancy words used for development techniques since I started working with SQL in 1995.... seems like database design has gotten worse with each one of these fad development techniques.

I'm a fence sitter on this one. Each way has it's place. At a previous job I had a table with 13 columns as the natural PK. Including a variety of datatypes such as int, datetime (multiple columns), varchars, chars. The FK table had 4 more columns. Was a joy to write queries against.... NOT. Did it work? Yes. Pain? Yes. I tried to get my manager to agree to adding a surrogate identity value as the PK in the main table, drop the 13 columns in the FK table.... Why when it worked the way it did? Because eventually some developer would forget to join on one of the 13 columns and make a mess....

chris.compton SSC Veteran Points: 270 More actions · Answer 12

Michael Valentine Jones (2/12/2008)
chris.compton (2/12/2008)
...
Could someone (briefly!!!) explain the difference between
· Natural key
· Surrogate key
· Candidate key
...
See if this example helps:
[simplified by Chris C.]
CREATE TABLE [dbo].[LANGUAGE] (
[LANGUAGE_ID] [int] IDENTITY (1, 1) NOT NULL , -- surrogate key
[LANGUAGE_CODE] [varchar] (3) NOT NULL ,-- natural key
[LANGUAGE_NAME] [varchar] (100) NOT NULL -- alternate key
)

Yes it does, thanks!

Is "candidate" a type of key too?

(or maybe the poster meant 'candidate' in a non db sense

i.e. "candidate to be a key"... can't find the post)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 13

chris.compton (2/12/2008)
Yes it does, thanks!
Is "candidate" a type of key too?
(or maybe the poster meant 'candidate' in a non db sense
i.e. "candidate to be a key"... can't find the post)

Any key that is unique is a "candidate key". In Michael's example above, BOTH the primary and the alternates are candidate keys.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

chris.compton SSC Veteran Points: 270 More actions · Answer 14

Matt Miller (2/12/2008)
Any key that is unique is a "candidate key"...

Oh, that makes sense. I've heard the term "candidate field" (with the meaning being "one of the fields in the table which we could use as a key") but don't recall "candidate key".

Thanks!

P.S. Love your sig; if I could use it here, I would

use it with the added phrase "or is a political appointee..."

Paul Stafford SSChasing Mays Points: 600 More actions · Answer 15

chris.compton (2/12/2008)
Oh, that makes sense. I've heard the term "candidate field" (with the meaning being "one of the fields in the table which we could use as a key") but don't recall "candidate key".

The reason candidate key is a better term is because a key can use more than one field/column. You could have a table with say 3 different multi-column unique keys, each of these would be a candidate key, and you would pick one of them to be the primary key.