November 19, 2007 at 8:29 am
MelS (11/19/2007)
I've inherited a system in which all the foreign keys are disabled and triggers are being used to enforce referential integrity. The nice part about this setup is that errors can be handled more gracefully in a trigger and returned to the application in a more useful form. Having the foreign keys in place, but disabled, makes it easier to maintain an ERD. Not quite sure about the downside of this setup yet since I haven't been able to profile a performance comparison of triggers vs. foreign keys with this application. Anyone have any thoughts?
I've written plenty of VB code that handles the error elegantly and returns a user-friendly message. All you do is substitute a friendlier message for the returned one, based on the error number and maybe the constraint name that appears in the original error message.
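For what it's worth, the same idea works on the database side too. A minimal sketch, assuming a hypothetical Orders table with an FK to Customers and using SQL 2005 TRY/CATCH (the table, constraint, and message text are all made up for illustration); the point is that FK violations come back as error 547 with the constraint name in the message, which you can map to something friendlier:

BEGIN TRY
    -- CustomerID 999 doesn't exist, so the FK constraint fires.
    INSERT INTO dbo.Orders (OrderID, CustomerID) VALUES (1, 999);
END TRY
BEGIN CATCH
    -- 547 = constraint violation; ERROR_MESSAGE() includes the constraint name
    -- (e.g. FK_Orders_Customers), so you can pick the friendly text per constraint.
    IF ERROR_NUMBER() = 547
        RAISERROR('That customer does not exist. Please check the customer number.', 16, 1);
    ELSE
        RAISERROR('An unexpected error occurred while saving the order.', 16, 1);
END CATCH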
November 19, 2007 at 1:32 pm
And one place it can be helpful is if you have a system where you enter an address. The system automatically looks up similar addresses, which are stored in a different table, using full-text search or some type of fuzzy logic, and can help you zero in on addresses that are given differently but are really the same in practice. You do the referential integrity using a join table between person and address, and here's something else it can suddenly give you... If you have multiple hits against an address from person records that are really, really similar, it allows someone to evaluate and potentially merge the records together, thereby streamlining your database.
BZZZZT! Wrong answer, but we have some lovely parting gifts!! LOL!
In this specific application, if you're getting multiple people giving you the same data, you can be almost 100% certain that it's bogus information. Fundamentally you have to assume that the original input data is at best mistaken, if not deliberately incorrect. Identity management is a huge issue in an environment where one person may give a different answer every time they walk through the door, and the next 5 people may each give you the exact same answers. It's possible that locating similar addresses / other input could be helpful in allowing you to sniff out potential bad data, rather than streamlining data entry or validation. 😀
Steve G.
"It's not so much that truth is stranger than fiction, it's that fiction has no idea what it's up against."
November 19, 2007 at 6:39 pm
aureolin (11/19/2007)
And one place it can be helpful is if you have a system where you enter an address. The system automatically looks up similar addresses, which are stored in a different table, using full-text search or some type of fuzzy logic, and can help you zero in on addresses that are given differently but are really the same in practice. You do the referential integrity using a join table between person and address, and here's something else it can suddenly give you... If you have multiple hits against an address from person records that are really, really similar, it allows someone to evaluate and potentially merge the records together, thereby streamlining your database.
BZZZZT! Wrong answer, but we have some lovely parting gifts!! LOL!
In this specific application, if you're getting multiple people giving you the same data, you can be almost 100% certain that it's bogus information. Fundamentally you have to assume that the original input data is at best mistaken, if not deliberately incorrect. Identity management is a huge issue in an environment where one person may give a different answer every time they walk through the door, and the next 5 people may each give you the exact same answers. It's possible that locating similar addresses / other input could be helpful in allowing you to sniff out potential bad data, rather than streamlining data entry or validation. 😀
Notice the point about person records being really, really similar. For instance, if you get a Robert Jones, then a Rob Jones, then a Bobby Jones, all 5'9", all same race, obviously all male, with the same distinguishing mark of a burn on the right cheek, and all from the same address, chances are you have 3 hits for the same guy.
I'm not disputing the identity management problem. What I'm saying is you have tools to attack one side of it, which is duplicate data that truly is duplicate. If you've got someone entering data and the guy tells that person he's Bobby Jones from 1234 W. 4th St., and you already have a Robert Jones from 1234 W. 4th St., the record pops up, and the rest of the observable characteristics are simple to verify, the data entry person can go, "I think I already have you in the system," and continue on with the processing by asking another couple of confirming questions (open-ended, of course).
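To make that concrete, here's a rough sketch of the kind of lookup I mean. The Person, Address, and PersonAddress tables and their columns are invented for illustration, and SOUNDEX is just a crude stand-in for whatever full-text or fuzzy matching you actually use:

-- Candidate duplicates: people already linked (via the join table) to the same
-- address whose last name sounds like the one just given at the door.
SELECT p.PersonID, p.FirstName, p.LastName
FROM dbo.Person AS p
JOIN dbo.PersonAddress AS pa ON pa.PersonID = p.PersonID
JOIN dbo.Address AS a ON a.AddressID = pa.AddressID
WHERE a.Street = '1234 W. 4th St.'
  AND SOUNDEX(p.LastName) = SOUNDEX('Jones');

Anything this returns gets shown to the data entry person as "possible matches," not merged automatically.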
You're right in that it doesn't solve cases where the people don't know the information (not unusual given the # of mentally ill among the homeless populace) or are intentionally giving false information. Very few things in a database architecture will, and I wholeheartedly agree with you on that.
However, you can use some of the database architecture to attack some of the more solvable problems.
K. Brian Kelley
@kbriankelley
November 19, 2007 at 7:20 pm
In back-to-back tests, I found the following with SS V7 (I know, a while back, but probably still valid): no RI is the fastest (of course); RI enforced by SQL (IF EXISTS (SELECT * FROM ... WHERE ...) BEGIN do something END, for example) had a 10% penalty; RI by explicit PK/FK relationship had a 20% penalty; RI by a trigger had a 50% (yes, fifty percent) penalty. Take your choice, but the 'I' part of RI means something. Don't forget the index ...
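For anyone who wants to try something similar, the three enforcement styles compared look roughly like this. The OrderHeader/OrderDetail tables and names are invented; this shows the shape of the test, not the original code:

-- 1) Declarative RI: an explicit PK/FK relationship.
ALTER TABLE dbo.OrderDetail
    ADD CONSTRAINT FK_OrderDetail_OrderHeader
    FOREIGN KEY (OrderID) REFERENCES dbo.OrderHeader (OrderID);
GO
-- 2) RI enforced "by SQL" inside the insert code.
DECLARE @OrderID int, @ProductID int;
SELECT @OrderID = 1, @ProductID = 42;
IF EXISTS (SELECT * FROM dbo.OrderHeader WHERE OrderID = @OrderID)
BEGIN
    INSERT INTO dbo.OrderDetail (OrderID, ProductID) VALUES (@OrderID, @ProductID);
END
GO
-- 3) RI enforced by a trigger on the child table.
CREATE TRIGGER trg_OrderDetail_CheckParent ON dbo.OrderDetail
AFTER INSERT
AS
IF EXISTS (SELECT * FROM inserted AS i
           WHERE NOT EXISTS (SELECT * FROM dbo.OrderHeader AS h
                             WHERE h.OrderID = i.OrderID))
BEGIN
    RAISERROR('OrderID does not exist in OrderHeader.', 16, 1);
    ROLLBACK TRANSACTION;
END
GO
-- And in every case, the index on the referencing column matters.
CREATE INDEX IX_OrderDetail_OrderID ON dbo.OrderDetail (OrderID);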
November 19, 2007 at 11:35 pm
prattmounfield (11/19/2007)
In back-to-back tests, I found the following with SS V7 (I know, a while back, but probably still valid): no RI is the fastest (of course); RI enforced by SQL (IF EXISTS (SELECT * FROM ... WHERE ...) BEGIN do something END, for example) had a 10% penalty; RI by explicit PK/FK relationship had a 20% penalty; RI by a trigger had a 50% (yes, fifty percent) penalty. Take your choice, but the 'I' part of RI means something. Don't forget the index ...
Interesting. Can you post the code that you used to perform your test? I would like to use it for SS 2000 and compare the differences. If you no longer have the code, please at least detail how you went about performing this test including table schemas and row counts.
November 20, 2007 at 8:40 am
It would also be good to know the "penalty" that NOT having RI imposes on ordinary SELECT statements, which have to be written so that orphan records either are or are not returned, as appropriate. With RI in place there is NO such penalty, because there is no chance of orphans occurring; without RI there is an associated overhead. We can't just be thinking about inserts/deletes.
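In concrete terms (same kind of invented OrderHeader/OrderDetail tables as in the sketch above): with trusted RI the first query is all you need; without it, you end up writing the second one everywhere, just in case orphans exist:

-- With trusted RI: every OrderDetail row is guaranteed to have a parent, so no extra work.
SELECT d.OrderID, d.ProductID
FROM dbo.OrderDetail AS d;

-- Without RI: defensive filtering on every query to exclude possible orphans.
SELECT d.OrderID, d.ProductID
FROM dbo.OrderDetail AS d
WHERE EXISTS (SELECT * FROM dbo.OrderHeader AS h WHERE h.OrderID = d.OrderID);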
November 20, 2007 at 9:30 am
I'm not disputing the identity management problem. What I'm saying is you have tools to attack one side of it, which is duplicate data that truly is duplicate.
Very true - which harks back to my original statement that appropriate PK/FK relationships & constraints can help, but it's a thin line between helpful dup checking and going overboard with constraints and making the system unusable. 😎
Steve G.
November 20, 2007 at 10:58 am
I've only read the first two pages so forgive me if I say something that was already said later on.
Some have questioned FK relationships based on performance. Yes, there is a small performance hit -- on inserts, updates and deletes -- while actually improving performance on selects (a trusted FK, and the index that usually accompanies it, gives the optimizer information it can use to build better plans). However, if your database is like the overwhelming majority of databases, it will be hit with queries a whole lot more often than with inserts, updates and deletes. So you are taking a small hit in an area where you don't spend much time in order to speed up the area where you spend the bulk of your time. Not a bad trade-off.
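One concrete case, using the same invented order tables as earlier in the thread, and assuming the constraint is trusted and the FK column is declared NOT NULL: the optimizer can eliminate a join entirely when the trusted FK already guarantees the parent row exists.

-- Only child columns are selected, and the trusted FK guarantees a matching header row,
-- so the optimizer can drop the join to OrderHeader from the plan altogether.
SELECT d.OrderID, d.ProductID
FROM dbo.OrderDetail AS d
JOIN dbo.OrderHeader AS h ON h.OrderID = d.OrderID;
-- With the constraint disabled or created WITH NOCHECK (untrusted), the join has to stay.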
Another consideration was user response. The user gets an error message and has to deal with that when all he wants to do is enter the %^&! information so he can get his job done! Well, when bad data enters the system, when is the best time to learn about it? If you said "as soon as it can be detected -- before it enters the system if possible!" then go to the head of the class. Constraints serve both functions: they flag the bad data as soon as you try to write it to the database, and they prevent the data from entering the database. Besides, how well is the user actually doing his job if he is entering bad data?
A second word about performance. I explained above how constraints (specifically FK constraints) move slow performance from more often-used areas of execution to lesser-used areas. Another way is to move slow performance from the running application to the application development. Yes, programming with constraints is slower and involves more effort. But the result (assuming other factors were also done correctly) is generally an application that is faster and easier to use.
The application is being developed to make the user's job easier, not the developer's. A general rule of thumb: if you have designed the database correctly, the developer's job has become more difficult. Sorry, but if you don't like challenges you should consider a career change.
In real estate, the three most important qualities of a good investment are location, location and location. In database design, the three most important qualities are data integrity, data integrity and data integrity.
What about performance? Well, there are just too many ways to improve performance without sacrificing data integrity for this ever to be an issue.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
November 20, 2007 at 12:40 pm
The user gets an error message and has to deal with that when all he wants to do is enter the %^&! information so he can get his job done! Well, when bad data enters the system, when is the best time to learn about it? If you said "as soon as it can be detected -- before it enters the system if possible!" then go to the head of the class.
Congratulations! You entirely missed the point. 😛
It is easily possible to create a system that has such stringent data entry requirements that it is nearly impossible to get real-world data into the system. My earlier example was a complex identity management problem. Let's take on something simpler, an order entry system. You could make the order entry completely tight - the salesbot has to know everything about the order to do the data entry. What's the first thing that happens? He (she, it) creates an Excel spreadsheet to track all the orders that he doesn't yet have all the data for.
Of course, you'll say that the system should be designed to allow potential orders to be entered, and at that point you're agreeing with me. The integrity of the system should be protected with appropriate PK/FK relationships and constraints. It is way too easy to give in to the glass house mentality and assume that all data must be both perfect and complete before it can be entered, resulting in a system that is both unusable and generates "hidden" data caches like that Excel spreadsheet mentioned earlier.
Steve G.
November 20, 2007 at 1:33 pm
aureolin (11/16/2007)
More constraints equals better data quality, at the price of system usability. Fewer constraints equals more usability, at the price of system integrity. As DBAs, almost everyone will say, "Well, that's obvious - integrity is so much more important!" And then you find that in the real world the users are busily working around your perfectly architected solution because they can't enter their imperfect, real-world data.
Only in a poorly designed system. You don't have to compromise a whole lot on either if you design your UI and data models well. No, really - I mean it. Yes - it's a bit more work than "one size fits all, no validation, no checking, no nothing, just GIGO", but then again - that's what our job is.
Ultimately, it's a balancing act. You need enough foreign keys, data constraints, etc. to keep the data reasonably clean, and enough flexibility to allow the entry of dirty, incomplete, and occasionally contradictory real-world data. Where that line is drawn differs from company to company and application to application. Trying to draw the line way off to one side or the other leads to the users disliking IT as "not getting it" and "their stuff doesn't do what we need it to," and to the proliferation of all these side "projects" that undermine company-wide data integrity.
It IS a balancing act, but starting with the attitude that FKs are worthless (which is not just what you're portraying - it's like you're screaming it from the rooftops) is the wrong approach. As I've said before - bad data is worse than NO data (notice I said "bad", not incomplete).
I mean, really - what is the purpose of gathering data that you KNOW is garbage (you said so earlier)? Does it really serve ANY purpose, or does it do more damage than good? Really - think about it.
I understand that you're up against a serious challenge with the app you're discussing, but I'm assuming you're aware that that's a bit of an extreme. You actually MIGHT be better off with a spreadsheet application.
Foreign keys are no different than any other tool. Yes, they can be abused, but that's not an excuse NOT to use them.
It's kind of like passwords. The stronger and better and more secure the password is, the more likely it is to be written on a sticky note and pasted on the monitor for everyone to see.
...which will get you fired where I come from.
Tight, Convenient security isn't.
Clean data with no constraints isn't.
Relational integrity from the business layer is a fantasy.
War is Peace
Freedom is Slavery
Ignorance is Strength
Love Big Brother....
All right, I'm stepping down from the soapbox....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 20, 2007 at 3:03 pm
It IS a balancing act, but starting with the attitude that FKs are worthless (which is not just what you're portraying - it's like you're screaming it from the rooftops) is the wrong approach. As I've said before - bad data is worse than NO data (notice I said "bad", not incomplete).
I mean, really - what is the purpose of gathering data that you KNOW is garbage (you said so earlier)? Does it really serve ANY purpose, or does it do more damage than good? Really - think about it.
I understand that you're up against a serious challenge with the app you're discussing, but I'm assuming you're aware that that's a bit of an extreme. You actually MIGHT be better off with a spreadsheet application.
Foreign keys are no different than any other tool. Yes, they can be abused, but that's not an excuse NOT to use them.
Exactly, thanks Matt. Starting off with the attitude that FKs are worthless is poor database design. FKs are not only relevant to database design, they are crucial. Data structures should be designed using normalization and referential integrity; any deviation from this should be the exception, not the rule. The simple fact is that if FKs make a system unusable, your design has major flaws. I think where a lot of folks run into problems is when they confuse data storage with data entry or data retrieval. How you enter data, or how you want to 'view' the data, should have little impact on how you actually design your data structures. If your data structures are not designed to be normalized, relational representations of the data, you should not be designing databases and you should revert back to flat files/spreadsheets. I see so many folks whose idea of a table is a spreadsheet or a file, and because they think of the data in this way, their table design shows it in the lack of normalization and referential integrity.
Anyway, Steve Jones just kicked off a series on design today. His first article clearly shows the 'rough draft' of his design. I look forward to seeing how his design progresses as he applies the rules of normalization and referential integrity to it.
Here's the link to Steve's article:
http://www.sqlservercentral.com/articles/Database+Design/61489/
November 20, 2007 at 3:29 pm
aureolin (11/20/2007)
Congratulations! You entirely missed the point. 😛 It is easily possible to create a system that has such stringent data entry requirements that it is nearly impossible to get real-world data into the system.
[Sigh] It is certainly possible (though maybe not always so easy) to create just about anything you can think of.
OK, if you think that's true, then it should certainly be possible to create a system with tight data integrity constraints AND flexibility and usability.
Imagine a web site for an online retailer. People come to his site and look around without identifying themselves. Just for grins, they pick out some items and add them to a shopping cart. Then they leave. Weeks later, they return, this time they log onto the site and ask "where is the stuff I selected weeks ago before I had an account here?"
In this scenario, we have order items without an order or even a customer to connect them to. This is a data integrity issue. Is the data valid? No. Should it be allowed in the database? No.
Not, that is, as a regular order. However, there is no reason we can't maintain that data in one form or another keyed to, say, a session ID, and then, through the use of cookies or some other extra-DBMS facility, access the incomplete data and allow the user to finish. But until such time, the information is quarantined so it can't infect our good data. It can't, for example, be used as input to the Shipping or A/R applications until it passes all the integrity constraints.
There are several methods of implementing this, none of which involve loosening the constraints.
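A minimal sketch of one such approach, with all names invented: the anonymous selections live in a staging table keyed by session, still constrained where we can be (the product must exist, the quantity must be positive), and only get promoted into the real order tables - through every PK/FK/CHECK constraint - once the visitor registers and checks out.

CREATE TABLE dbo.CartItemStaging (
    SessionID  varchar(50) NOT NULL,
    ProductID  int         NOT NULL REFERENCES dbo.Product (ProductID),
    Quantity   int         NOT NULL CHECK (Quantity > 0),
    AddedAt    datetime    NOT NULL DEFAULT GETDATE()
);
GO
-- At checkout, the quarantined rows are promoted into the real tables, where they must
-- satisfy the full set of constraints like any other order.
DECLARE @OrderID int, @SessionID varchar(50);  -- supplied by the checkout process
INSERT INTO dbo.OrderDetail (OrderID, ProductID, Quantity)
SELECT @OrderID, s.ProductID, s.Quantity
FROM dbo.CartItemStaging AS s
WHERE s.SessionID = @SessionID;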
I have yet to see the problem that could only be solved by loosening constraints. I stop short of declaring there is no such thing -- I'll remain open to the possibility.
Can data be over-protected? I suppose. By hundreds-to-one, however, I see more data that is under-protected than over-protected. So my time is better spent looking for ways to increase the data integrity rather than the other way around.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
November 20, 2007 at 5:19 pm
In the security arena we have a concept called Defense in Depth. It's a rather simple idea: have several layers of protection against a potential compromise.
For instance, having two different types of firewalls protecting DMZ servers means an attacker has to exploit two different layers (one type of firewall doubled up is considered a single layer, because a single exploit could bypass both firewalls). Locking down ports and services on the web server is another layer. Putting a bastion host like Microsoft ISA Server (an application firewall) in front would be another layer, provided it is different from the other firewalls in place. Putting an IPS (Intrusion Prevention System) in-line with respect to the network would be yet another layer. Having HIPS (Host-based Intrusion Prevention System) on the servers is an additional layer. And having the external router allow only web traffic could be considered a final layer.
Put all of that together and we're talking seven layers of protection, so if any one layer fails, you've got additional layers to hopefully prevent compromise. Of course, nothing is 100% proof against compromise, but things got awfully difficult, didn't they?
Let's apply this same concept to data integrity. You want to do integrity checks at the application layer? That's great. You want to have meaningful error messages for the end user? Outstanding, because that's all part of a solid user experience and you should be doing it anyway. But that doesn't rule out the practice of using constraints at the database level wherever possible. Constraints should be seen as a safety net rather than an impediment. What happens if someone checks in bad code and it makes the build? And what happens if that bad code munges the data to the point where some of it is effectively useless? You want to be the developer who has to figure out how to write SQL to fix the data corruption? I sure don't. Have I seen it? Yup. Was it fixable? Nope. Now we have the dreaded data loss.
What would have happened if we had that there foreign key constraint in place? Well, that bad bit of code would have tried to put data in the database that wasn't right... and an error would have promptly bubbled back indicating there was an issue. Now, if we suddenly start getting this error and we know a build went in just before, it doesn't take a rocket scientist to start checking the code that was modified. If you didn't have that foreign key constraint, you may be unlucky enough to see several builds go in before the data problems become great enough that people start really noticing. And how do you recover from that?
Like I said, I've seen that. I've also seen the case where those foreign key constraints started chirping like a felon turned state's witness. Boy were we glad we had those constraints in place because they flagged a serious issue that got missed during testing. And we were able to remediate immediately.
K. Brian Kelley
@kbriankelley
November 21, 2007 at 6:42 pm
Actually, I am finding something interesting about this whole discussion: the apparently large number of people who take my attitude of "appropriate" foreign keys to mean "no" foreign keys. ("screaming from the rooftops" was one description?)
Y'all might want to rethink how you're relating to developers and to your user community if you really do believe "appropriate" = "none". 😛
Steve (the highly amused) G.
November 21, 2007 at 7:01 pm
When I posted the results of my own testing from several years ago, I assumed that since explicit RI (well-defined PK/FK relationships) had such a low 'penalty' with respect to using IF EXISTS (...) in SQL, and even with respect to no check at all, and since triggers had such a high 'penalty', one would choose explicit RI. We are designing databases, after all, not scratch sheets for applications. When your RI is broken, your data may have no value to your business.
By the way, what is more important than RI ... think uniqueness.
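For illustration, on the same kind of invented order tables used earlier in the thread, uniqueness is the other declarative guarantee, and it's just as cheap to state:

-- The primary key makes each order identifiable; the unique constraint stops the same
-- product appearing twice on one order.
ALTER TABLE dbo.OrderHeader
    ADD CONSTRAINT PK_OrderHeader PRIMARY KEY (OrderID);

ALTER TABLE dbo.OrderDetail
    ADD CONSTRAINT UQ_OrderDetail_Order_Product UNIQUE (OrderID, ProductID);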