Worst Practices - Not Using Primary Keys and Clustered Indexes

  • 1) Article is good but required some more information how integrity is at risk without having primary key/clustered index in controlled application....

    2) more relevant to primary key than clustered index.....

  • Not having primary key/cluster index is a bad practice, but not having FK/relationships is disaster. It is the main ingredient to maintain data integrity and Data integrity is vital part of any database.

    I am totally agreed with Andy; no way can we ignore PK’s, but intelligent primary key can play better role to maintain data integrity during data migration process. It can impose data duplication check and is used for reference to build relationships. PK on identity col is a dump pkey, which generates IDs during inserting record in the DB and one has to use unique key as well to avoid data duplication.

    I have gone thru implementing PK/FK throughout a big application which has more than 200 tables but without relationships and proper indexes. I can portray how stable this application is right now after implementing Pk/FK with cluster index on PK /index on FK.

    Asif

  • Hi Andy,

    I will agree that from a logical perspective, all entities must have a primary key. I will also agree that in the transition to the physical realm, if these entities remain intact then the key should also follow. Depending on how the application (not the user) intends on accessing information a surrogate key may be the best selection in the physical environment.

    We may, however, agree to disagree, on the generalized implementation of primary keys and clustered indexes. This is especially true in heavy transactional environments where relational integrity is secondary to transactional integrity and capacity/performance is critical. In correctly constructed transactional systems the business object and its internal data structure become the guardian of integrity. RI management overhead is as much a performance problem as it is redundant. This is especially true in n-tier transaction environments.

    OLAP environments are different beasts altogether. While the Atomic warehouse is populating data-marts indexes are invaluable. And if you are providing drill down capability into the atomic warehouse you must have a unique identifier. However, when you update it, a clustered index on large tables can be cost prohibitive. I have found that it is necessary to drop all indexes to make large batch inserts to some of our largest tables. You can't do this with a primary key.

    The bottom line here is that the role of the DBA is more than just to execute rote and generalized rules. It is to make considered and pragmatic decisions about the physical deployment of logical models based upon the performance characteristics of the physical platform and the targeted application environment. This is what makes them so valuable and why experience is so critical.

  • McDoneP,

    Cant really argue with any of that. Im here to evangelize that you should stick to the rules until you have the experience to know when its ok to break them, and even then...sometimes things change and you should still reconsider before breaking them! By change, I mean that sometimes things that weren't possible/doable in one version may become possible in another version. Or what you can't do on last years hardware becomes totally plausible on this years.

    I like your comment about the business objects even if I don't agree 100%. I like to put as many constraints in place at the db level as I can, even if I mirror those in the app somewhere. Really depends on how you use the db. If you exclusively use the app, using the bo's makes sense. Where I work we do a LOT of data manipulation external to the app, nice to have a few safeguards in place. Another one of those experienced decision points maybe.

    Thanks for your comments!

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • quote:


    I thought I had a reference that shows where a clustered index is better than a heap in terms of performance, but I can't find it. If I do I'll post it.


    Steve, could it be the issue of forward-pointers (they are used in heaps, but not with clustered indexes) that you are referring to?

    As for the article, in my opinion it is great. AI do agree with most of the comments made here in the discussion about identity and such, but as Andy says, the recommendations in the article is better than nothing at all.

    Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

  • This is an excellent thread -- I learned a number of new things! Thanks, Andy.

    As to the use of Identity columns as Primary Keys verus using "natural" keys -- I've designed tables both ways and I think Identity columns are a very good way to go. For example, when you have several layers of 1:many realtionships, you invariably end up with natural keys that are 3, 4, 5 or more columns long. This means that you will have to often join on 3, 4, 5 or more columns.

    I tend to use an int[eger] Identity column as a PK and then, if there is a natural key in the table (even one composed of several columns), I'll make it a Unique Index.

    This seems to give the best of both worlds -- I can do my joins with my efficient "int" PKs and I have the unique "natural" keys for searching, filtering, etc, as well as the unique constraint that prevents inadvertant duplication of the natural key.

    The one situation that might give me pause in using Identity PKs is when the database might get replicated. Then I'll have to evaluate whether I'll want to create the tables with the NOT FOR REPLICATION option and use different Identity seeding on the subscribers or use some other scheme, such as GUIDs for PKs (which may slow down joins.)

    Best regards,

    SteveR

    Stephen Rosenbach

    Arnold, MD

  • Identity cols and replication are a trial. Merge is set up to work with them to a degree, letting you specify ranges for subscribers. Transactional doesn't have that flexibility. It only really matters when you'll be doing inserts on the subscriber. If you need inserts I think GUID's have no peer.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • quote:


    ...where relational integrity is secondary to transactional integrity and capacity/performance is critical. In correctly constructed transactional systems the business object and its internal data structure become the guardian of integrity.

    ...


    Please clarify what you mean by "transactional integrity."

    Do you propose to enforce data integrity *outside* the database in lieu of within it? (As was done years ago and eventually abandoned because of the many and disparate problems with such an approach.)

    It continues to amuse me how often we in technology re-invent the same flawed wheels year after year everytime they put a new coat of paint on it.

  • One rumor I have heard is that all DBMS's do not implement primary keys in the same fashion and because of that, if you wanted to create a cross platform, cross application database application, you could not use primary keys, but instead would need to put a unique constraint on the table in place of the primary key.

    Is this correct? Is there any validity to this rumor?

  • I think you state the obvious in an "idle" way to workaround relational algebra in creating and defining data models. Databases depend on solid data structures and relationships with a minimum normalisation threshold (BCNF)of level 3. What this means is there should be no contention over choosing a "seed" column to act as a primary key on a table, after all in the long term it's more overhead to INSERT and DELETE as well as fragmentation. Which leads me nicely on to blowing your clustered index theory out of the water as performance enhancements are only gained on mainly read only databases, NOT heavily used production databases. The moral of the story is data modelling. All data structures and all DML statements are considered and implemented at a model level, not on the whim of a DBA or senior developer. After all who administers the database administrator ?

  • Well joined the party a bit late but will add my 2 pennies anyway.

    Rules are great, but should be there as a basis, i.e to be challenged

    FKs really have to have indexes if you are going to delete records from the parent. Ever tried deleting data when the FK table has millions of rows but no index.

    Size is key to performance. If you have a PK on a char(10) and its clustered then its 2.5 times larger than using an integer. And all your other indexes are based on it and thus a knock on affect.

    Using composite keys results in multi column joins, increased index size, and in worse performance. It does however allow you to filter grandchild tables based on the a value from the parent without joining to the child and parent (if the filter column is part of the PK)

    I am a great believer that SQL Server is too easy and has shot itself in the foot, as developers have designed bad databases, bad sps and thus SQL Server has crap performance. It is interesting that at an MS presentation they were very adamant that one of the key components of Yukon will be documents and guidelines of how the CLR code should be used with the server.

    Simon Sabin

    Co-author of SQL Server 2000 XML Distilled

    http://www.amazon.co.uk/exec/obidos/ASIN/1904347088


    Simon Sabin
    SQL Server MVP

    http://sqlblogcasts.com/blogs/simons

  • Hi

    First of all from a comment made:

    >>I am a great believer that SQL Server is too easy and has shot itself in the >>foot, as developers have designed bad databases, bad sps and thus SQL Server >>has crap performance.

    is a complete junk, I have a 20Gb db with over 5800 online users, it performs very well and I am more than happy with the overall performance, its the developers and their transactions you need to watch, coupled with indexing etc etc (the list goes on).

    As for the article:

    a) pks are essential, not only in terms of normalisation and the whole 1st principal of a relational db, but also in terms of the db doing the validation and not your so called "app"

    b) save your clustered index for columns that really benefit from it.

    c) key size shouldnt be your dominating answer, clustered bulk-column index can be very beneficial when design correctly against your app. remember, we are doing key lookups primarily, size is not a major issue in many cases.

    d) index defrag commands, you cluster, you get your data defraged as well (to a degree) unlike heaps!

    e) pkey should always be indexed, clustered or not! small tables may result in a scan but its of little concern.

    Cheers

    Chris K


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • btw.. i didnt even vote yet, where did my 3 stars come from?! sorry, it should be 5 stars! 🙂


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • Andy,

    If Primary keys AND (clustered) indices are so important why MS is not using them on the system tables?

    there is ablosutly no PK on system tables and on some tables there are no index at all.

    BTW I do agree with you.

    I do implement PK on every of my tables even tough I think that on several small tables (wich will remain small) the index is wrong.

    I am not happy to put clustered index on the Identity column (what I do use quite often).

    If I design the database I can define where to put the clustered indexes. Otherwise i'm analysing it vai the Profiler. The clustered index candidate are for me the range selects, the "like" queries end the composit indexes.



    Bye
    Gabor

  • I dont know why they dont do it on system tables. I guess because they enforce it internally and dont want to risk having someone 'tune' it and break something because they removed a constraint. You'd think under the hood they'd be using the same mechanism, but who knows. I'll try to look around, perhaps Inside SQL Server 2000 has something, dont remember seeing it but a good place to start.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

Viewing 15 posts - 61 through 75 (of 184 total)

You must be logged in to reply to this topic. Login to reply