Compound Primary Keys

  • I'm pretty desperate for some advice. I've generally designed databases off of Identity or Unique Identifiers in a sort of grid so that tables may look like this (psuedo code):

    Organization

    OrgId int identity primary key

    ...

    OrgAddress

    OrgAddressID int identity primary key

    OrgId int FK to Org

    AddressTypeId

    ...

    AddressUsage (many to many table with no children)

    OrgAddressId int primary key

    AddressUsageTypeId int primary key

    At my company is an MS consultant, who is quite smart, but he's got a serious phobia about bookmark lookups, so he generally designs tables like this:

    Organization

    OrgId int identity primary key...

    OrgAddress (no identities on child tables, only compound keys, like natural keys, but not)

    OrgId PK

    AddressType PK

    AddressUsage

    OrgId PK

    AddressTypeId PK

    AddressUsageTypeID PK

    You get the idea. While walking through this I found one instance where we'd be looking at an 8 column (all integers, but none the less) primary key. The main problem is, he's got math on his side. When you eliminate bookmark lookups, you get about 20% increases in speed. Problem is, you need to be dealing with hundreds of thousands of rows, on every frigging query, to really reap any benefit.

    Help me out here, am I insane for resisting this? Should I simply go to the dark side with quazi-natural keys? I'm not opposed to compound keys where appropriate (that's why I included the many-to-many table above), but this approach seems excessive.

    Feedback would be appreciated.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I am and always will be dead against compound keys.  I have been thrown into environments where they existed and have been faced with nightmare situations in regards to maintenance and data updates and conversions.  The 20% increase in performance can be argued as well.  What I did was I found out what the 20% actually worked out to be and then compared that with the extra maintenance required and the extra space required by having a unique index of like 8 columns.  In one case I had someone telling me that they could save 2.5 seconds by adding a huge index that took up both space and time to program against.  In the end I won out simply because nobody could confirm how much work would be needed to be done on this specific database over say the next couple of years. 

  • Thanks for the info. What types of maintenance headaches did you run into specifically? Disk space is kind of hard to argue, it being so cheap these days, but other information would be useful. I just can't stand the fact that I'm going to be spending X amount more time just writing the 'ON' clause of the joins in TSQL, let alone other work.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • <second attempt: original posting got eaten>

    I haven't (yet) had to deal with this issue, but I always figure I'd work things out as follows.

    First, always identify the natural primary key. (If it's an "Int Identity", good, but never force it to be one!) Two and maybe three columns shouldn't be a serious problem, but anything more and certain issues come up: How do you determine the order that the columns are indexed on (to support query plans)? You probably wouldn't want to cluster the primary key, as that would increase the sizes (and performance times) of all the other indexes. If any of the PK columns change over time, or if the are time (such as a datetime column where milliseconds count), you will probably have issues maintaining uniqueness over time.

    Perhaps more importantly, consider foreign keys. Would all 8 of those columns be used in the child table for anything besides relational integrity? Performance takes a hit, as ever insert (and certain updates) on a child table will require an 8-column lookup on the parent, and a delete on the parent will call for a similar lookup on every child table. Under such circumstances, I'd contemplate adding an Int Identity column to the parent that might only get used to support relational integrity.

    A last point, the use of compound primary keys leads to the issues underlying 4th and 5th order normalization. I don't know about you, but I have trouble wrapping my mind around these concepts, and I find I tend to "design away" from situations where these issues crop up.

    There's a whole world of "it depends" behind these issues, and no one answer is going to fit every possible situation. As this would seem to be a classic "purists vs. practicalists" debate topic, I look forward to seeing what anyone else has to say on the subject.

    <saving my text before hitting "Post Reply" this time>

       Philip

     

  • Sorry, I have to disagree. I'm not going to cite references or argue performance because I have none at hand. I've been doing database development in a number of platforms for 20 years and if there's one thing that I hate it's when a child table someone just slaps an identity field on it as part of the key. I'm just discussing it from the standpoint of experience and training.

    Compound keys are a natural expression of the data structure, but if you're using more than two or three fields to produce that, you might not be properly/sufficiently normalized (recognizing that you can over-normalize a table). I think there would be definite design problems if a table required eight fields to identify a record.

    I think it is weak programming and poor understanding of the data, IMO an identity field should be used to qualify a PK only as a last resort, that being all other data would still be dupes, which suggests a potential design flaw to me.

    I had someone hand me a database (Access) that had to go to SQL. One of their tables did not have a PK, I told him that it needed it. He went and added an identity field to each table. I didn't kill him, but only just.

    To me, again, identity fields in keys are a crutch. Yes, sometimes data does need it. Yes, it can be useful to create identifiers on the fly. Still, I think it should be an exception rather than a rule. I think they would easily lead to line-oriented thinking, rather than set-oriented thinking, which is where we as SQL relational programmers should be.

    But that's just my $0.002 worth.

    It'd be interesting to see if Joe Celko jumps in on this one, I'd love to hear his opinion. 🙂

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • saving my text before hitting "Post Reply" this time

    Philip, Philip, Philip! 144 posts and you haven't learned to compose big replies in Word yet? 🙂

    I've lost more than one good post myself, sucks, doesn't it? 🙂

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • <This is an "aside" post, as it does not directly address the main topic>

    Wayne,

    I could do that, but then I'd have to deal with formatting issues. These days it's:

     - Enter text

     - Ctrl+A

     - Ctrl+C

    and post. Of course, my text only gets lost when I forget steps 2 and 3...

       Philip

     

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply