primary keys

  • The vast majority of primary keys I see in all databases are on the id identity fields, I suppose because they are sequbtial and used in joins more often than not

    But is it more benficial as far as performance is concerned to put a primary key on a combination of fields in a table that are unique and perhaps a unique index on the identity field?

    I know this is a very open question but a general opinion of where primary keys should sit would be great

    Thanks in advance

  • a primary key may not be null so if you are using a concatenated key as your primary key, all fields must contain values. I realize that's more of a comment than an answer to your question; but, saw there were 58 views and no responses, so thought I'd get the ball rolling 🙂

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • erics44 (5/12/2012)


    The vast majority of primary keys I see in all databases are on the id identity fields, I suppose because they are sequbtial and used in joins more often than not

    But is it more benficial as far as performance is concerned to put a primary key on a combination of fields in a table that are unique and perhaps a unique index on the identity field?

    I know this is a very open question but a general opinion of where primary keys should sit would be great

    Thanks in advance

    It generally depends on the nature of the queries you intend against the data. I will typically use two sets of keys. An IDENTITY column as a surrogate key which typically becomes my "ever increasing, narrow, unique" clustered index and an alternate "business" key that may face the users. Which one I make the primary key depends on which is key is more logical to be the PK. For something like a list of U.S. States, I may forgo the IDENTITY column altogether because of the nature of the better alternate key (the 2 character State abbreviation).

    A lot of "blood" has been shed over whether or not an IDENTITY column should be used as a PK or even a Clustered Index (and, yes, they are different). The best idea in this area is to keep an open mind and to remember that "It Depends". Take a little time to study the tables you want to build and then make sure that your PK's and Clustered Indexes can stand the tests of time and scale.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • erics44 (5/12/2012)


    The vast majority of primary keys I see in all databases are on the id identity fields, I suppose because they are sequbtial and used in joins more often than not

    But is it more benficial as far as performance is concerned to put a primary key on a combination of fields in a table that are unique and perhaps a unique index on the identity field?

    I know this is a very open question but a general opinion of where primary keys should sit would be great

    Thanks in advance

    A reasonable question and fortunately one with a simple answer. It makes no difference which key you make your "primary" key. What's important is that you correctly enforce the uniqueness of all the keys you need in each table and you have appropriate indexes on them. Performance will be determined by indexing, partitioning and other optimisation features and by how the table is used. The choice of primary key is irrelevant as far as performance is concerned.

    As a matter of convention it is usual to designate one "preferred" key per table as a primary key and very often (not always) that is the key used for foreign key references in other tables. That is purely a matter of convention however. Unfortunately you will hear and read an awful lot of nonsense talked about what is or isn't a good choice for a primary key based on people's preconceived assumptions about how such a key might be used. I recommend you don't worry too much about it. Treat the choice of all your keys with equal care and attention, choose the right indexing strategy and you'll find that the "primaryness" or otherwise of any one key makes no difference at all.

  • There is a LOT of confusion about primary keys and clustered indexes.

    Primary keys are the minimum number of columns to identify a unique row. If you are using the Identity columns or UniqueIdentifiers, then they are usually the primary key. Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    The 2 are used for entirely different things, unfortunately Microsoft has confused everyone by their defaults.

  • ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (5/15/2012)


    ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    +1 I will also say that 95% of statistics are made up on the spot 🙂 I find that a lot of data may not necessarily have an easily identifiable natural primary key. For example, take a company that has John Smith and John Smith. These days, we have fixed the issue of having to identify these as John Smith from Accounting and Jon Smith from IT in an information system my using employee IDs. A previous company I worked for began their system using employee initials. Of course, over time you begin to run out of representations for similar employees (John A Smith becomes JAS and Jason A Sonderson should also be JAS). So then they started replacing middle initials with uncommon letters: JXS, JZS. Finally, the company had become so large that they were running out of permutations. So 3 numbers were added to the initials: JAS123, and JAS145 let's say.

    Ok... My point? A natural primary key for the simple data before the days of employee ids may have had to include not just the name, but an address. Even that could realistically be violated if there is a Junoir and Senior working at the same place (assuming that they are related in that fashion, but do not use the Sr. or Jr. in their actual names). In the case of JAS123, this makes a great primary key. In fact, it was designed and is enforced to act as such. However, this makes a very bad clustered index in the sense that it is not sequential on insertion. It is also a pain to store this in each and every table that references an employee, such as address table, phone table, and several other application related tables. So we implemented an identity column on that first employee table and used that as the foreign key.

    Finally, in that case... Should the primary key be defined as the identity column or as the employee id (assuming the employee id has a unique constraint)? Does it matter? My first thought is that it does not matter as they are both a unique identifier for an employee. We chose the identity column to be the primary key because it is very possible that the employee id could be changed or done away with altogether in the future.

    Jared
    CE - Microsoft

  • Eugene Elutin (5/15/2012)


    ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    I am going to have disagree. The primary key isn't always the best choice for the clustered index. As there can only be one clustered index, it is actually better to look at the different access paths on the data of the table. If you find that you have a number of queries that are used extensively that perform a range search on a specific column or combination of columns (multicolumn index) that logically orders data in the appropriate sequence, this could be a better choice for the clustered index.

    Also, if you primary key happens to be a GUID, or a very wide multicolumn natural key, these may be good reasons for this to not be your clustered index.

    It really comes down to the consultants initial answer to everything, "It depends."

  • Lynn Pettis (5/15/2012)


    Eugene Elutin (5/15/2012)


    ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    I am going to have disagree. The primary key isn't always the best choice for the clustered index. As there can only be one clustered index, it is actually better to look at the different access paths on the data of the table. If you find that you have a number of queries that are used extensively that perform a range search on a specific column or combination of columns (multicolumn index) that logically orders data in the appropriate sequence, this could be a better choice for the clustered index.

    Also, if you primary key happens to be a GUID, or a very wide multicolumn natural key, these may be good reasons for this to not be your clustered index.

    It really comes down to the consultants initial answer to everything, "It depends."

    I see your point, Lynn, but I still agree that a well thought out database will have most tables with a proper primary key and proper clustered index on the same column(s) "more of the time than not." Certainly not "almost never." Definitely not "always" or "never."

    Jared
    CE - Microsoft

  • Lynn Pettis (5/15/2012)


    Eugene Elutin (5/15/2012)


    ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    I am going to have disagree. The primary key isn't always the best choice for the clustered index. As there can only be one clustered index, it is actually better to look at the different access paths on the data of the table. If you find that you have a number of queries that are used extensively that perform a range search on a specific column or combination of columns (multicolumn index) that logically orders data in the appropriate sequence, this could be a better choice for the clustered index.

    Also, if you primary key happens to be a GUID, or a very wide multicolumn natural key, these may be good reasons for this to not be your clustered index.

    It really comes down to the consultants initial answer to everything, "It depends."

    In no place I've said "always". I've said in "most of the time".

    GUID PK is one of the cases where I wouldn't make it clustered.

    For "multicolumn" (composite) PK's - it's really depends. Is some cases they're fine to be clustered and in another - not.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • As mentioned by others already, the use of surrogate keys (like ID or GUID), clustered indexes on them, primary keys vs clustered indexes, etc., etc., etc., really depends on what the data is for and how it will be used.

    IDs as primary keys with clustered indexes on them is often pretty lazy, but it also is often "good enough". It makes for an easy to design, easy to use system, but not necessarily an optimal system in terms of performance, reliability, and a few other factors.

    It's one of those subjects where a lot of blood can be shed in the arguments over what is "best", when "good enough" is ignored.

    So, without knowing more about your system, your data, your business needs, and a few related questions, I really can't say whether ID columns should even exist in your data, on a theoretical/academic basis. But I can say that they are usually good enough for whatever your business actually needs, both as clustered index and primary key. Usually good enough.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I am going to just say we are going to have to agree to disagree. I am not going to say that the Primary Key should most of the time be the clustered index. I am going to say that you need to fully investigate each index and the data access paths to each table to make the determination as to which index should be the clustered index. Do not blindly select the primary key simply because that is what most people say should be the clustered index or because Microsoft just happens to make the primary key the clustered index by default.

  • Lynn Pettis (5/15/2012)


    I am going to just say we are going to have to agree to disagree. I am not going to say that the Primary Key should most of the time be the clustered index. I am going to say that you need to fully investigate each index and the data access paths to each table to make the determination as to which index should be the clustered index. Do not blindly select the primary key simply because that is what most people say should be the clustered index or because Microsoft just happens to make the primary key the clustered index by default.

    Totally agree with the above!

    😉

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Lynn Pettis (5/15/2012)


    I am going to just say we are going to have to agree to disagree. I am not going to say that the Primary Key should most of the time be the clustered index. I am going to say that you need to fully investigate each index and the data access paths to each table to make the determination as to which index should be the clustered index. Do not blindly select the primary key simply because that is what most people say should be the clustered index or because Microsoft just happens to make the primary key the clustered index by default.

    Agreed. In a fully optimized database, you'd need to know data access patterns in order to define optimum indexing, including clustered and non-clustered indexes (and XML and CLR indexes, for that matter). In initial design/architecture stages, you may have to work with a best-estimate or even a WAG (Wild A** Guess) of what those patterns will be, so you may be stuck with best-effort indexing to start out and have to refactor those as actual usage patterns begin to emerge.

    The primary key should also be defined as what really identifies the tuple in the set, not as a surrogate key, in most cases. However, that's an artificial/theoretical constraint and not always realistically possible. Even Codd later stated that calling a key "primary" was a mistake on his part. So define your best "it tells us what the row describes" key, and use surrogate keys (ID/GUID/whatever) where needed.

    In some cases, your "primary key" will match your "clustered index", but not necessarily.

    Again, it really depends on the data and the business needs the data supports.

    But I do agree with Lynn on the debate over primary+clustered vs primary separate from clustered. Most often, in a fully optimized database, that won't be the case. Of course, most databases don't need to be optimized to that level.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lynn Pettis (5/15/2012)


    Eugene Elutin (5/15/2012)


    ...

    Now, for whatever reason, Microsoft by default makes it the clustered index. This is 99% of the time wrong.

    The clustered index should be on a better choice, and should almost never be on the primary key.

    ...

    Completely disagree with both.

    All is other way around:

    Most of the time the primary key is the right choice for clustered index, Microsoft SQL Server uses it as default option.

    I cannot say is it 99% or 98% or whatever, but it's for sure: "most of the time" and for "most of the situations".

    Of cause, there are cases where PK is not the best choice for clustered index.

    I am going to have disagree. The primary key isn't always the best choice for the clustered index. As there can only be one clustered index, it is actually better to look at the different access paths on the data of the table. If you find that you have a number of queries that are used extensively that perform a range search on a specific column or combination of columns (multicolumn index) that logically orders data in the appropriate sequence, this could be a better choice for the clustered index.

    Also, if you primary key happens to be a GUID, or a very wide multicolumn natural key, these may be good reasons for this to not be your clustered index.

    It really comes down to the consultants initial answer to everything, "It depends."

    Well stated. There are no hard and fast rules here... every case should be analyzed on its own criteria based upon the design... which is based on the requirements spec. Is the database design a good one? Does it meet the requirements and performance specifications? Is it scalable and extensible? Once those questions have been answered and you have a good design, the decision about primary keys will become self evident.

    And not every table even needs a primary key... for example log tables. A clustered index is typically all they need, a primary key here is useless overhead.

    The probability of survival is inversely proportional to the angle of arrival.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply