Database Design - Need help in Choosing a primary key

Question

Post reply

Database Design - Need help in Choosing a primary key

shafirmubarak

SSC-Addicted

Points: 493
More actions
April 30, 2014 at 9:30 am

#307434

Hi All,
I am kind of confused. I need some help in choosing a primary key for the database which i am designing.
I have few tables which contains 5 -15 fields out of it 3 - 9 columns combined to form the uniqueness of the row.
All are un-related tables. Three parent tables connect with 20 child non-related child tables.
I believe it would not be a wise choice to choose 3 to 9 fields for primary key.
But if i use an auto increment as a key will there be of any use as it might never be used to fetch rows. Then why do i still have to go with that?
Or Is it ok to create a primary key of upto 5 attributes?
Can someone help me on this?
Thanks.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

AlexSQLForums SSChampion Points: 14277 More actions · Answer 1

It all depends on your business rules.

If the rule is that you must identify each row uniquely within a table then you need the default primary key.

How will you join 3 tables with 20 child tables?

Alex S

shafirmubarak SSC-Addicted Points: 493 More actions · Answer 2

Thanks,

For example, user id is used as a linkage across multiple tables.

I am concerned about defining a key for the child tables as it requires 3- 9 columns to make the row unique. Will that be optimal?

Or defining a surrogate key will be of any use?

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 3

shafirmubarak (4/30/2014)
Thanks,
For example, user id is used as a linkage across multiple tables.
I am concerned about defining a key for the child tables as it requires 3- 9 columns to make the row unique. Will that be optimal?
Or defining a surrogate key will be of any use?

If 3-9 columns make the row unique, even if you create a surrogate key, you'll still need a unique constraint to ensure the business model is accurately reflected in your data.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

shafirmubarak SSC-Addicted Points: 493 More actions · Answer 4

Yes, the unique enforcement would be absolute necessary.

But i was not sure how surrogate key will be of use?

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 5

shafirmubarak (4/30/2014)
Yes, the unique enforcement would be absolute necessary.
But i was not sure how surrogate key will be of use?

In some situations it will help performance to have a narrower index. But that's just a general observation. It's hard to be more specific with no details.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Sean Lange SSC Guru Points: 286609 More actions · Answer 6

Grant Fritchey (4/30/2014)
shafirmubarak (4/30/2014)
Yes, the unique enforcement would be absolute necessary.
But i was not sure how surrogate key will be of use?
In some situations it will help performance to have a narrower index. But that's just a general observation. It's hard to be more specific with no details.

It can also make sql development a little easier. You would have only 1 parameter for identification of the row instead of 5. Also your where clause is only 1 predicate instead of 5. Now I wouldn't recommend designing your database around ease of coding but that is a difference.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Bill Talada SSChampion Points: 11954 More actions · Answer 7

The number one rule of a primary key is that it must be "meaningless" yet unique. To be meaningless it must not be one of your four columns that are under user control. Your four columns can and should be an Alternate Key (unique constraint/index).

I've gotten away from Identities as primary keys and I feel liberated. Identities are non-ANSI and prevent batch insertion. Personally, I prefer uniqueidentifiers for the PK.

All non-clustered indexes automatically include the primary key so if you want skinny indexes use a single column for your primary key.

Sean Lange SSC Guru Points: 286609 More actions · Answer 8

Bill Talada (4/30/2014)
The number one rule of a primary key is that it must be "meaningless" yet unique. To be meaningless it must not be one of your four columns that are under user control. Your four columns can and should be an Alternate Key (unique constraint/index).
I've gotten away from Identities as primary keys and I feel liberated. Identities are non-ANSI and prevent batch insertion. Personally, I prefer uniqueidentifiers for the PK.
All non-clustered indexes automatically include the primary key so if you want skinny indexes use a single column for your primary key.

Why a guid over an identity? They take up a lot more storage and are a complete PITA to debug. Non-ANSI honestly doesn't really matter because database agnostic is a farce.

I am not sure how switching from one type of generated value to a different (and more painful) one makes you feel liberated.

What do you mean that an identity prevents batch insertion? That is simply not true.

Also, if you are using a super big column like a guid for a primary key you need to make sure your primary key is NOT clustered. You need to find something else to use as your clustered index.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286609 More actions · Answer 9

The number one rule of a primary key is that it must be "meaningless" yet unique.

This is a matter of opinion. There is no reason a primary key MUST be a magic value. That is what surrogate keys do. By your definition you have completely ruled out any natural keys.

It should however be a value that is constant. A good example is an automobile. They all have a VIN. It is constant, it is presented to the user AND it makes a perfect primary key for a table that holds information about cars.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Bill Talada SSChampion Points: 11954 More actions · Answer 10

Guids are my preference; I never said they were gospel truth. The difference between guids and identities is that identities get generated on insert whereas guids can be generated on clients so clients can generate child table foreign keys already linked.

Identities can be batch inserted if you use the "SET IDENTITY_INSERT <table> ON" but then you've got to do the locking to prevent others inserting at the same time. If you don't care which identity value is assigned, you're OK but most apps insert child rows after parent rows so you've got to have an alternate key to fetch the rows back and match on to see which identity was assigned to what row, a real PITA.

I generate helpful debugging views for each table so my developers never suffer from the PITA debugging you speak of for guids.

I switched 450 tables from int identities to guids and my app ran noticeable faster on a 100 GB database with a 10% increase in size due to the datatype change. And we are set to batch insert rows now for an estimated 10x improvement in the near future from not network round-tripping identity inserts.

Standardizing development to guid PKs can greatly simplify application development over using different natural keys like VINs for each table and I'm not sure a VIN is any shorter than a guid. You are correct that a PK should definitely be considered read-only and not updated.

Sean Lange SSC Guru Points: 286609 More actions · Answer 11

Bill Talada (4/30/2014)
Guids are my preference; I never said they were gospel truth. The difference between guids and identities is that identities get generated on insert whereas guids can be generated on clients so clients can generate child table foreign keys already linked.
Identities can be batch inserted if you use the "SET IDENTITY_INSERT <table> ON" but then you've got to do the locking to prevent others inserting at the same time. If you don't care which identity value is assigned, you're OK but most apps insert child rows after parent rows so you've got to have an alternate key to fetch the rows back and match on to see which identity was assigned to what row, a real PITA.
I generate helpful debugging views for each table so my developers never suffer from the PITA debugging you speak of for guids.
I switched 450 tables from int identities to guids and my app ran noticeable faster on a 100 GB database with a 10% increase in size due to the datatype change. And we are set to batch insert rows now for an estimated 10x improvement in the near future from not network round-tripping identity inserts.
Standardizing development to guid PKs can greatly simplify application development over using different natural keys like VINs for each table and I'm not sure a VIN is any shorter than a guid. You are correct that a PK should definitely be considered read-only and not updated.

Sounds like you guys do lots of batch inserts. A guid is probably a much better choice in that situation. Just make sure that if you switched your PKs to guids that you either add another index to be the clustered index or make sure you regularly defrag your clustered index on a guid column. The fragmentation of a clustered index on a guid column is staggering. It can and will hit 99% + at as few as 10k rows. If you routinely defrag them then it shouldn't be an issue at all.

As far as VINs my intention was not to suggest length of key but to suggest that is an excellent choice for a natural key over a surrogate key.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

shafirmubarak SSC-Addicted Points: 493 More actions · Answer 12

Thanks everyone for your valuable suggestion.

I have considered all your suggestion. Will go with surrogate key as primary key and have an alternate key to enforce uniqueness and business logic.

sqlvogel SSCrazy Eights Points: 9526 More actions · Answer 13

Bill Talada (4/30/2014)
The number one rule of a primary key is that it must be "meaningless" yet unique.

That's your number one rule but definitely not the number one rule. In principle a primary key is any candidate key but by convention it is usually either the "preferred" identifier in the table or the key referenced by foreign keys in other tables.

Some sensible, commonly applied criteria for choosing a key are: Simplicity, Stability and Familiarity. Lack of meaning is not a requirement.

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 14

sqlvogel (5/2/2014)
Bill Talada (4/30/2014)
The number one rule of a primary key is that it must be "meaningless" yet unique.
That's your number one rule but definitely not the number one rule. In principle a primary key is any candidate key but by convention it is usually either the "preferred" identifier in the table or the key referenced by foreign keys in other tables.
Some sensible, commonly applied criteria for choosing a key are: Simplicity, Stability and Familiarity. Lack of meaning is not a requirement.

I don't disagree with your statement, but the one caveat that I would add is that you don't want the PK value to be changeable. It just adds way too much headache to the entire design. Meaningful or not, it should be immutable (hopefully).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning