Can anyone think of a good reason for NOT having a PK?

  • I know this will shock a lot of people, I pull about 90% of my data from other sources and don't ever use PK.

    I can't guarantee that all or any of the data from the other system won't be unique or that my inter-table referential integrity will be correct. You would be suprised to see the number of problems with data that we have seen from other systems. I do have clustered indexes on most of my tables, but having a unique index on my tables is a luxury that I just can't afford. If I controlled the source of my data, then it would be a different story.

    I just load what's from the other system and then clean it up on my end after the load. It's much easier than having jobs fail and trying to find which row out of 1,000,000 that caused the problem and then reloading at 2am.

    I must be honest and say that I haven't looked into having a primary key that isn't unique, but I assume that you can't. What would be the difference, speed wise, if I have a nonunique-clustered index as opposed to a nonunique-primary key?

    Just for the record, my system is mostly reporting and read-only to the users. I pull from other systems that can't handle the load that other people want to put on them. I try to clean up the data as best we can before presenting it to the users. I try to simulate a PK, but I guess that I don't see the difference speed wise between a nonunique PK and a nonunique clustered index.


    Live to Throw
    Throw to Live
    Will Summers

  • your best bet is to have an identity field as your PK and a non-clustered index that's imported in. The advantage here, is that you can import the data, ignore the query order, and resort the index outside the data file (so the physical data doesn't need resorting with the index) after the data import using minimal resources.

    Also, are you dumping the data and reimporting the table every time or just new records?

  • If my system is purely reporting, why would I want to have an identity field as a PK? I just add rows to the table daily and never remove data.

    Wouldn't the identity column just be a waste of space?

    I would never be quering the data on the identity column, so I don't see the need for it. My clustered index would be the date field (char(10)) to insure that my data would always be inserted at the end of the table.

    I know that putting a clustered index on a char(10) field isn't the optimal column type, but that field is in all of the queries that anyone would run on that table and it helps speed up my inserts due to the minimal page splits(only new pages should be created, in theory).

    I do try to simulate a primary key, but I just can't afford the unique option.


    Live to Throw
    Throw to Live
    Will Summers

  • It gets inserted at the bottom of the table regardless. The clustered key is what would cause the insert to "insert" in the middle of the file (resorting the existing fields). The identity field is just a suggestion, and I have found using them have saved me buttloads in the past when fixing problems. You can add that field at any time. My point is that you don't want to be worrying about clustered indexes on that type of table. If you have a clustered date index and another index you would like to do, I suggest you make them both nonclustered indexes. Your inserts and queries will move along alot faster.

    There is probably a good hand full of threads/articles on this board about how to determine your indexes. SQL Server even comes with a tool to help you.

  • I have used the Index Tuning wizard on many occasions. All it all, it's not a lot of help to me. It mostly just gives me a warm feeling when it says that it has no recommendations. It also helps to make sure I haven't missed any obvious columns.

    My point that I was trying to make about an identity column is that it's not always needed. I have worked with people who want to put identity columns on all tables. If it's needed, I'm all for it. If I have a 100 million row frontend-read-only table, I'm not putting a identity column on it to make the developer happy.


    Live to Throw
    Throw to Live
    Will Summers

  • Elementary view I learned in college:

    Input > Process > Output

    It sounds like you cannot control the Input due to it being external... You cannot control the Process due to political reasons, and the output is what the process does to the input... seems like this system is working flawlessly.

    Note: that this is not what I believe, but this is the analysis of the situation to show the obvious problem.  The process is jacked up...  Fix the process politically or else you will be the input for the "fire individual" when the "sh!tHitFan" method gets "fired" (do not excuse the pun).

     

  • Warm Feeling-> writing code that compiles and outputs correctly the first time...ahhhh

    Anyhow, I feel your pain... been there done that kind of deal... Putting the Identity field wasn't my point. I was just pointing out that you don't need to have a clustered key for your situation.

  • Maybe your DBA wanted to know if you'd notice?

  • Heh... guess he got THAT answer

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting thread! A couple posters have good justifications for not putting a PK on a table. In my databases I would only allow that if the table records were guaranteed NEVER to be updated, for example for a use-once-and-then-discard type of table.

    I would still put some sort of index on the table, perhaps even a clustered index, depending on the situation.

    I would allow the table to have no primary key and no index only if:

    1. The table records were guaranteed NEVER to be updated.

    AND

    2. The table was guaranteed to be read only sequentially.

    A staging table might meet this criteria. That is, data is inserted sequentially into the table, then at some later time the table is read sequentially to create other table(s) or file(s) or web page(s) or ... There are a couple of good examples of this type of table in the posts above.

    Better hope your DBAs go to some classes. Sounds like they're still working with 1990s-era databases.

  • Just a thought... How do you read sequentially from a table without a cursor or loop?  Except for the occasional control loop on processes that handle thousands or millions of rows, I've found that most folks try to avoid loops, even on staging tables.  What would you be doing that requires a table to be read "sequentially"?  I ask because it goes against the basic grain of what an RDBMS is...

    Not trying to be a smart guy or provide you with a challenge... just interested in your statement and wanted to know more...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • >> I suggest you make them both nonclustered indexes. Your inserts and queries will move along alot faster. <<

    That depends on the queries.  For example, if you always specify a certain column value/range as part of the query, a clus index should always improve the performance of that query, more as the table grows larger.

    Inserts can be an issue, so it's best if at all possible to use a clus key that always increases (for example, insert date). 

    If that is not possible, as a last resort, you can always lower FILLFACTOR and reorg as frequently as necessary to maintain proper insert performance.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I have indexes on 99.9% of my tables. I just don't have a PK on any of them. I can't control the data from my sources.

    Here's a good example of some of the data problems that I have to 'massage out' in my process:

    the SSN field from one system doesn't have integer constraints on it, stored as a char(9)!!! Meaning that I have to load that field into a char(9) and then strip out anything that is not an integer. I have seen too many times people inputting the letter O for the number zero.

    Just an example.


    Live to Throw
    Throw to Live
    Will Summers

  • Will, when massaging out issues, are you working the entire table at a time? I could still see the need for a PK, even a surrogate one to allow you to get to any particular row if necessary.

  • We already established that he doesn't care to have an identity field. worse case scenerio, he can add the field before he runs whichever operation he needs.

Viewing 15 posts - 31 through 45 (of 71 total)

You must be logged in to reply to this topic. Login to reply