911: developers insist PK/FK constraints impede performance

  • Despite having had the ability to cascade delete for years, I haven't set it up, not one single time. It is too scary and completely out of control. At the same time, all those years also include databases with full DRI and/or triggers to enforce the RI.

    For all those people that don't use DRI, you also don't drive with a seat belt, carry a spare tire, buy a car with airbags, use child safety seats, or helmets when riding bikes do you? Because DRI is just like all that stuff, it's a safety device to protect your data. Just as a seat belt makes getting in & out of the car a bit more of a pain in the tucas, so DRI makes getting data in & out of the database, but  that minor transitional pain is nothing compared to the greater pain of flying through the windshield or looking at thousands of rows of orphaned data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Quick Survey:

    How many people here use DRI and have never worked with a database system that didn't have DRI capability?

    How many people here do not use DRI and started their development or DBA careers on a database system that didn't have DRI capability?

    Now, how many of each still think the way they did when they were introduced to databases?

    In my experience, it has been the older DBAs or the ones who started with something like MySQL (which didn't support DRI) that still decline to use it. Meanwhile, those of us who cut our teeth on SQL2000 prefer DRI because it was there and we learned how it should be used.

    Opinions?

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • JT, maybe you want to write my editorial one day

    I started with DBase/Clipper, no DRI there. You can to manually open each DB file as well as each Index to ensure they were updated.

    However I always set it up now, no cascading ever. These days it's becoming "rare" where you want to actually physically delete data as opposed to marking it as deleted. Too many audit, backtrack requirements cause you to really need the data there.

  • I agree. I've never seen a database with cascading deletes turned on. Nowadays, it is pretty much taboo to really delete anything anyway versus just marking it deleted so it doesn't appear in a GUI somewhere but has to still exist for auditing purposes.

    As to the parent-child discussion, not using DRI would cause major issues and I don't agree with not using it.

    The purpose of DRI is not to link two records together but to constrain records to ensure integrity. In other words, it can prevent the deletion of a parent record if children exist and can prevent the creation of children where no valid parent exists but it shouldn't hinder the deletion of a child of a valid parent. There's where the business rules come in to play.

    Using the code only to enforce this leaves one open to a direct database change where a parent record is deleted, leaving orphaned children.

    I'm sure I'm just restating the previous arguments, but not using DRI especially in the case of a clear parent-child relationship is a bad decision.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • I have a huge parent/child schema which has to be partially flushed out every night so deletes for me are a major issue hence why I don't use DRI.

    On a separate system I use DRI because the design makes it feasible to capitalise on the benefits.

    As Steve mentioned earlier, I have an article coming up where I conducted experiments to see precisely what performance penalty there would be. Not wishing to prejudice my article but my conclusion was that there are few absolutes in SQL Server.

    A couple of years ago someone posted a thread where they managed to get high internet traffic by having two database servers supporting the web site. On odd days server one had virtually all indices and DRI stripped from it to allow a massive transaction load. Server two had the indices and DRI rebuilt to allow processing and reporting. On even days the roles for the servers reversed. Now suggesting a machine with nothing but PKs would normally freak out a DBA but in this particular instance it was a case of what was necessary to run a major implementation on a shoe string budget.

  • I guess the correct answer is "it depends". The right way should be a blend of good practice, data integrity and business need.

    Likewise, no one should ever say "DRI is always good" or "DRI is always bad".

    Makes sense. Thanks, David.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • There's 2 questions that come up with the DRI discussion:

    1) Wthout DRI, is there a guarantee that the data has integrity? How? (This is not to say DRI can enforce all business rules and data relatioships. Not using it is inviting trouble, IMHO.)

    2) If not, does it matter how fast the retrieve is? Because if I'm basing important decisions (think $$$, or worse, decisions affecting lives) on faulty data, I've not been done any favour by having it retrieved quickly.

    Often the problem lies with the use of the DB: some want to use it for OLAP, and the rest use it for OLTP. These are generally mutually exclusive. There are ways to handle this dichotomy without sacrificing data integrity.

  • That post hints at another argument against putting integrity rules into the database... The integrity rules can be complex and cannot all be implemented in the database.

    So, do you put some of the rules in the database and some in the application? That's a maintenance and documentation issue.

    There's a benefit to having all the integrity rules in one place and not scattered about.

    --Peter

  • I think we are blurring the line between business rules and data integrity rules.

    DRI probably should not be used to enforce business rules beyond the very basic ones that double as data protection rules.

    For example,

    • Don't allow an address in an order that has an invalid state
    • Dont allow creation of a child ticket if there is no valid parent ticket
    • Don't allow a parent ticket to be deleted if there are valid children
    • Don't create an order without a valid account number

    etc...

    But then we have the ones that don't fit into DRI:

    • On every third Thursday, increase rebates on new tractors by 2%
    • All orders to Washington state pay sales tax
    • All orders placed by a given customer ID get free expedited shipping

    So, the real-world answer is probably that both already happen on a regular basis. Asking a developer to ensure data integrity and to manage all business rules seems like a good opportunity to miss something. Further, one of the strengths of an RDBMS is maintaining relationships between data. If it isn't used that's kind of like buying a car and only driving it in first gear. Sure, you'll get where you are going, but you aren't using what you've paid for.

    Can you give us an example of an integrity rule that cannot be implemented in a database?

    As for it being a maintenance and documentation issue: It shouldn't lead to that much more documentation. If you have a list of data integrity rules that need to be in place, all you need is a second column indicating if that rule is covered by DRI or code. If DRI, it should be covered in any good database documentation which should exist anyway. If code, then the code should be documented anyway.

    Look at it this way: Properly used DRI frees up the developer to not have to worry about data input issues that would violate integrity. All the developer has to do is trap the SQL errors and handle them. Otherwise, the developer can spend a lot of time reinventing the wheel when SQL already handles this.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • IMHO, developer awareness of referential integrity considerations, and attempts to enforce RI at the application or middle tier level is certainly commendable. However, the RDBMS provides a definitive (declaritive) level of protection in a "buck stops here" manner. Integrity violations are simply not allowed to occur, period (in theory, of course - bugs aside).

    Blanket rejection of these capabilities on the basis of what are often merely unproven misgivings about potential performance degadation seems unwise to me.

    Centralizing RI enforcement in one place not only provides extra insurance against integrity violations, it can also serve as an early warning of faulty application logic. Depending on developers to always enforce RI is placing a lot of faith in human nature. IMHO...

  • Actually, I thought it was pretty clear that as much as possible should be in the DB. Those business rules that cannot be implemented as DRI should be coded into the stored procedures which handle the data management. (We are all using stored procedures for manipulating our data, aren't we????).

    Why? Quite apart from security and performance considerations, because front-ends come and go. C, C++, Java, PowerBuilder, Delphi, any number of direct-to-web, etc... if all the business logic is in the DB, you only have to code it once. If you put it in the app(s), you have to code it many times, and maintain it in many places.

    One more thing: SQL Server handles DRI checking / enforcement ***way*** faster than any application can. So the statement at the outset of this discussion (the developers maintaining that DRI slows down the system) is only true if they're not going to bother with data integrity.

Viewing 11 posts - 31 through 40 (of 40 total)

You must be logged in to reply to this topic. Login to reply