July 8, 2009 at 9:00 am
The fact is that because of a poor implementation, we resort to using "hacks". Surrogates instead of natural keys. There is too much maintenance work involved in having 3-4 candidate keys as a primary key because those would have to be defined manually in all foreign tables. This is where the current implementation of RDBMS fails.
DRI in SQL Server is extremely weak (cyclic foreign keys) and we have to resort to "hacks". Stored Procedures do not accept sets. why? Because there is no "table types" (table value constructors is a hack) in SQL.
The result of all this is high maintenance cost, low performance, duplicates etc. Blaming the relational model when the implmentation is wrong.
At the moment, there is no true RDBMS (SQL Server, oracle, postgre, mysql are not rdbms, they fake rdbms). There are experimental true RDBMS like dataphor and rel which use relational language instead of SQL.
July 8, 2009 at 9:20 am
I've built plenty of high-performance, extremely functional databases without these two features.
I don't see what you're talking about as "hacks". I see them as ways for me to have better control over how the system works.
Making default joins would make sense. It would be like not having to include index use in my queries. I can add an index hint if I feel it's necessary, but I don't have to if the engine does the right thing on its own. Default joins could work the same way. That would be the advantage to me of your pointers thing. Would hide it in the engine by default. (Just so long as I can override that, I'm cool with it.)
I can see some value to that. But not enough to call it one of the two most critical things to work on in the product.
On the table parameters thing, I don't see a way to implement that without all the drawbacks of table variables, or all the drawbacks of temp tables, or (shudder) all the drawbacks of so-called "permanent temp tables", or (even worse) all the drawbacks of global temp tables. If it's going to behave like one of those, I'd prefer that it behave as a standard temp table. Most advantages, least drawbacks, of those four options.
If they're going to work just like temp tables, then why task Microsoft with a high priority dev request, when we can just simply use temp tables already?
I think it would be nice if the table variable parameters that 2008 introduced could be used as real table variables. Updatable values and all that. I can see asking for that. Again, I don't see it as anywhere near one of the top two priorities.
Perhaps I'm missing a fifth option on those. Something that's not a temp table, not a table variable, etc. But I don't see it.
If I had to pick a top feature for MS to work on with SQL Server, it would probably be allowing me to get some of the Enterprise features into Standard Edition for a portion of the difference in license cost. I would really, really, really like backup compression, for example, but I don't need some of the other Enterprise features. If I could get that, for a license cost similar to RedGate's backup product (that does allow compression), it would be significantly more important to me than either of these two things that you're asking for.
For a second feature, I'd ask for the ability to easily make conditional steps in maintenance plans based on things like current index fragmentation levels. That'd be much more important to me than passing table variables back and forth between procs.
I have workarounds for both of those, but they would be higher priority to me than either of these things.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 8, 2009 at 10:17 am
Here is what Celko has to say about using surrogates:
There is no such magical, universal. Elixir of Life "id"; this
Kabbalah magic where all of God's creations have a number and His
number is 213 digits long, etc.
To be is to be something in particular; to be nothing in particular or
everything in general is to be nothing at all. SQL is based on logic
and not Kabbalah magic.
Here is what Celko had to say about my feature:
Oh, other SQLs handle references with pointer chain (with all the
safety we learned to put into Network DBs) under the covers. The
referenced value is in one physical locations, so CASCADE is cheap.
SQL Server is one of the older RDBMS architectures that repeat the
values in both the referenced and referencing tables.
Just for referencing, I did a cascading test. The databases used were Postgre, Firebird, and SQL Server. When the natural key was changed, the cascades were "instant" on Postgre and Firebird. SQL Server on the other hand cause table locks and took more time.
July 8, 2009 at 10:40 am
I will slither out from under my rock for a sec. If you use pointers instead of actual keys - how do you ever CHANGE a foreign key? Sounds to me you might gain some speed, but only at the expense of giving up control.
I'm not a big fan of natural keys, since many of my apps have suffered cruelly at the hands of users changing key avlues even after swearing they wouldn't ever change, etc., but that's why i often implement surrgate keys instead. I must be missing something - I just am not seeing the upside here.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 8, 2009 at 10:40 am
I will slither out from under my rock for a sec. If you use pointers instead of actual keys - how do you ever CHANGE a foreign key? Sounds to me you might gain some speed, but only at the expense of giving up control.
I'm not a big fan of natural keys, since many of my apps have suffered cruelly at the hands of users changing key avlues even after swearing they wouldn't ever change, etc., but that's why i often implement surrgate keys instead. I must be missing something - I just am not seeing the upside here.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 8, 2009 at 4:37 pm
Matt Miller (7/8/2009)
I will slither out from under my rock for a sec. If you use pointers instead of actual keys - how do you ever CHANGE a foreign key? Sounds to me you might gain some speed, but only at the expense of giving up control.I'm not a big fan of natural keys, since many of my apps have suffered cruelly at the hands of users changing key avlues even after swearing they wouldn't ever change, etc., but that's why i often implement surrgate keys instead. I must be missing something - I just am not seeing the upside here.
You would write an update command that would change the foreign key to whatever you like. You're not suppose to worry about "using pointers instead of actual keys". You're suppose to worry about the database model of the application your designing. The database engine is suppose to take care of all the "pointer" stuff. Infact, in the database products Celko mentions, the foreign keys are not present in the foreign tables. They are "pointer based" instead, and cascades are "instant". SQL Server is based off old rdbms architecture where foreign keys are duplicated in foreign key tables.
Also, natural keys can change, this is fine. What matters is that the natural key must be externally verifiable. In a real RDBMS implementation, the natural key can change without ANY performance impact.
July 8, 2009 at 10:44 pm
The benefits of a natural key is that it can be validated externally, debugged easily, doesn't require extra joins in some cases, free index etc. Surrogates do not belong in the relational model because it cannot be validated externally and it does not port.
Regarding the natural key versus surrogate key argument, take a closer look at the "natural" keys and they all have something in common - they are actually surrogate keys whose value is assigned externally. For example, the European Article Number is described on Wikipedia as composed of:
1) GS1 Prefix, the first two or three digits, usually identifying the national GS1 Member Organization to which the manufacturer is registered (not necessarily where the product is actually made).
2) Company number, consisting of four, five or six digits depending on number of GTIN-13s required by the manufacturer to identify different product lines.
3) Item reference, consisting of two to six digits.
4) Check digit,
So the first two components define who assigned the item reference number and the item reference number is a non-meaningful value, so this "natural" key is actually a surrogate key. This same argument applies to all of the other so called natural keys such as Vehicle Identification Number, Social Security Number or the US Tax Identification number. Even DNA cannot be used as an identifier because it is not unique for monozygotic twins.
There is no such magical, universal. Elixir of Life "id"; this Kabbalah magic where all of God's creations have a number and His number is 213 digits long, etc.
That may be true but is unfortunate as formal set theory assumes that all set elements can be uniquely identified. How else can you determine if two set are equal unless there is a means of determining if two set elements are the same without a globally unique identifier? Hence, Dr Codd specification that all relations must have a unique identifier
Pointer-based foreign keys
The relational model has no place for surrogate keys. When you use natural keys as primary keys, sometimes it may require one or more natural keys. The relational tables must also have the same natural keys. This increases maintenance time and decreases performance.
The solution is to have pointer-based foreign keys. When the DBA creates a PK on Table A, they should be able to tell the DB to relate Table B to Table A. The DB would automatically create a pointer between the two tables (the foreign keys do not have to be carried over manually by the DBA).
As Lynn Pettis pointed out, this request is exactly how the Network and Hierarchical Models are implemented based on my 8 years working with IDMS, a network database, and IMS, a hierarchical database. Under IDMS, these are called "sets" but have no correspondence to set theory. The "pointers" are implemented as chains where the parent table contains a pointer to the first row in the set member and then each member record has a pointer to the next member of the set. Performance is fantastic at the cost of mind boggling complexity that is very prone to bugs - try reading about IDMS and the concept of currency.
As a side note, most companies that use these products have major problems with "Pointer-based foreign keys" not matching the correct value and standardized that there is an additional field in the child record that includes the actual value of the foreign key in order to reconcile the pointers versus the actual value.
(this is already implemented in Sybase according to Celko).
Actually, it is the Sybase product SQLAnywhere, formerly known as Watcom, and I worked with this product for four years and there are major performance problems with the foreign key pointer solution when the number of related rows for each parent grows large or the number of foreign key constraints defined becomes significant. The work-around for this problem is to not only define the primary key constraint but to also define a unique index on the same columns.
Other problems with Pointer-based foreign keys:
1) One of the basic advantages of the relational model is logical physical independence but your suggestion will break this independence. Start with the SQL to join two tables that have a foreign key relationship "FROM A JOIN B on B.(foreign key columns) = A.(primary key columns)"
Now implement your suggestion that the foreign key columns do not appear in table B but pointers are used instead, then a new kind of JOIN is needed such as FROM A KEY JOIN B BY .
But now change the physical implementation to not use the pointers but explicit columns and the statement will no longer function.
2) How do you insert into the child table specifying the foreign key values?
3) How do you update the child table specifying new foreign key values?
SQL = Scarcely Qualifies as a Language
July 9, 2009 at 4:41 am
Carl Federl (7/8/2009)
Regarding the natural key versus surrogate key argument, take a closer look at the "natural" keys and they all have something in common - they are actually surrogate keys whose value is assigned externally. For example, the European Article Number is described on Wikipedia as composed of:
1) GS1 Prefix, the first two or three digits, usually identifying the national GS1 Member Organization to which the manufacturer is registered (not necessarily where the product is actually made).
2) Company number, consisting of four, five or six digits depending on number of GTIN-13s required by the manufacturer to identify different product lines.
3) Item reference, consisting of two to six digits.
4) Check digit,
So the first two components define who assigned the item reference number and the item reference number is a non-meaningful value, so this "natural" key is actually a surrogate key. This same argument applies to all of the other so called natural keys such as Vehicle Identification Number, Social Security Number or the US Tax Identification number. Even DNA cannot be used as an identifier because it is not unique for monozygotic twins.
Wrong. Look up the actual definition of a natural key (celko). Natural key can be verified. The European Article Number can be verified (parts of it can be verified). Can a magic number be verified? What does the number 42 mean? Nothing.
SSN is pretty unique (sure it's recycled) but the thing is that it's approach uniqueness. What is the chance someone has the same DNA + SSN + First Name + Last name?
What is the chance number 42 'Bob' is number 42 'Bill' in another system? 100%. No uniqueness. Not externally verifiable.
That may be true but is unfortunate as formal set theory assumes that all set elements can be uniquely identified. How else can you determine if two set are equal unless there is a means of determining if two set elements are the same without a globally unique identifier? Hence, Dr Codd specification that all relations must have a unique identifier
Unique identifier doesn't mean a surrogate. Again, look up the "real" definition of a natural key (it is a subset of attributes which are in a table that can uniquely identify an entity). You are ignoring the definition and misinterpreting Codd's specification, this is why you are confused.
As Lynn Pettis pointed out, this request is exactly how the Network and Hierarchical Models are implemented based on my 8 years working with IDMS, a network database, and IMS, a hierarchical database. Under IDMS, these are called "sets" but have no correspondence to set theory. The "pointers" are implemented as chains where the parent table contains a pointer to the first row in the set member and then each member record has a pointer to the next member of the set. Performance is fantastic at the cost of mind boggling complexity that is very prone to bugs - try reading about IDMS and the concept of currency.
As a side note, most companies that use these products have major problems with "Pointer-based foreign keys" not matching the correct value and standardized that there is an additional field in the child record that includes the actual value of the foreign key in order to reconcile the pointers versus the actual value.
(this is already implemented in Sybase according to Celko).
Actually, it is the Sybase product SQLAnywhere, formerly known as Watcom, and I worked with this product for four years and there are major performance problems with the foreign key pointer solution when the number of related rows for each parent grows large or the number of foreign key constraints defined becomes significant. The work-around for this problem is to not only define the primary key constraint but to also define a unique index on the same columns.
Other problems with Pointer-based foreign keys:
1) One of the basic advantages of the relational model is logical physical independence but your suggestion will break this independence. Start with the SQL to join two tables that have a foreign key relationship "FROM A JOIN B on B.(foreign key columns) = A.(primary key columns)"
Now implement your suggestion that the foreign key columns do not appear in table B but pointers are used instead, then a new kind of JOIN is needed such as FROM A KEY JOIN B BY .
Again you're mixing up the physical and logical side of it. You are suppose to do the join as if the foreign keys were physically there. It is the DB that is suppose to do the physical part of it. You are trying to do the DB's job.
But now change the physical implementation to not use the pointers but explicit columns and the statement will no longer function.
2) How do you insert into the child table specifying the foreign key values?
3) How do you update the child table specifying new foreign key values?
Sigh, again you are mixing up the physical and logical side of it. Why are you worrying about how to insert foreign keys into the child tables??? That is the DB's job. The current poor flawed implemented of RDBMS make you think about the physical side of things instead of the relational logical side.
July 9, 2009 at 5:14 am
To demonstrate how stupid a surrogate key is:
CREATE TABLE members
(
mem_id INT IDENTITY(1,1) NOT NULL,
mem_username VARCHAR(25)
mem_firstname VARCHAR(25),
mem_lastname VARCHAR(25),
account_activated DATETIME,
PRIMARY KEY (mem_id)
);
INSERT INTO members (mem_firstname, mem_lastname, mem_account_activated)
VALUES ('blob', 'Bob', 'Lob', CURRENT_DATE());
A select yields: 42, 'blob', 'Bob', 'Lob', '7/09/10'
What does 'mem_username' mean in relative to Bob? It's Bob's username blob
What does 'mem_firstname' mean in relative to Bob? It's Bob's firstname Bob
What does 'mem_lastname' mean in relative to Bob? It's Bob's lastname Lob
What does 'account_activated' mean in relative to Bob? It's the date Bob's account was activated '7/09/10'
What does 'mem_id' mean in relative to Bob? It's the physical locator row id 42 where Bob is located in a 'virtual fake filesystem' that is sequentially processed
See how mem_id makes you think about the "physical" side of things instead of the logical?
July 9, 2009 at 6:52 am
Any primary key that relies on human names is inherently flawed.
In my case, my legal first name is "Richard", but I go by an abbreviated form of my middle name, "Augustin", and everyone calls me "Gus". On this site, I'm alternately known as "Gus", or "GSquared", and answer to either equally.
Which one will you use? No matter which you use, you're wrong in at least some circumstances.
Also, primary key values should be non-volatile. Actually, they shouldn't change at all once assigned. Names change. SSNs change.
Also, SSNs and other ID numbers are not actually "externally verifiable". There are literally millions of illegal immigrants in the US using stolen SSNs. There are an unknown number of "identity thieves" who might be using dozens of stolen SSNs each.
You cannot prove that you are the only person using your SSN. Credit reporting agencies have to have whole systems in place to deal with duplicate SSNs, and won't reveal to you if someone else has the same one as you. They aren't legally allowed to.
So, flunk on the SSN and name thing as a PK. Fails your own tests. Back to the drawing board and pick something else.
Also, the probability that 42 "Bob" and 42 "Bill" are the same in any two systems is not 100%. If you really think that, you need to redo 8th grade math. The probability that they are the same approaches zero, but the probability that they could be the same is 100%. There's a big difference in those two concepts. They are worlds apart.
At the same time, the probability that "123-00-1234" Bob and "123-00-1234" Ramone are the same approaches zero, but the possibility that they are the same is 100%. In other words, there's nothing that proves they are different, but they probably are.
There are no "natural keys" for human beings that actually satisfy all criteria. Even DNA doesn't work, because of twins.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2009 at 7:17 am
GSquared (7/9/2009)
Any primary key that relies on human names is inherently flawed.
It was an example.
Also, primary key values should be non-volatile. Actually, they shouldn't change at all once assigned. Names change. SSNs change.
Says who? Microsoft? You choose Microsoft over codd? Microsoft SQL Server can't even handle cyclic foreign keys and you want them to be the authority on "primary keys should be non-volatile"? When you delete and re-create an entity, do you get the same primary key? WAIT, you dont. You get a different magical number that points to the physical location of a row in a sequential virtual fake filesystem. I guess surrogates can change as well!
The point is that you should choose a PK that rarely changes that can guarantee uniqueness to a level BUT is not immune. An example of this is when ISBN-10 transitioned to ISBN-13. In a proper RDBMS, this change is EASILY handled without a performance impact. This is not easily handled in FAKE RDBMS like SQL Server.
PK's can change and the relational model is designed to handle this.
Also, SSNs and other ID numbers are not actually "externally verifiable". There are literally millions of illegal immigrants in the US using stolen SSNs. There are an unknown number of "identity thieves" who might be using dozens of stolen SSNs each.
You cannot prove that you are the only person using your SSN. Credit reporting agencies have to have whole systems in place to deal with duplicate SSNs, and won't reveal to you if someone else has the same one as you. They aren't legally allowed to.
So, flunk on the SSN and name thing as a PK. Fails your own tests. Back to the drawing board and pick something else.
Since when is SSN not externally verifiable?!?!? The fact is, how much do you trust the source?
Are you willing to trust Number 42 Bob more than SSN+Address Bob? See where your argument falls apart and has zero validity?
Anybody who makes SSN the only candidate in a PK should not be designing DBs in the first place.
There are no "natural keys" for human beings that actually satisfy all criteria.
Even DNA doesn't work, because of twins.
Depends on what kind of DB you're modeling. Do you collect DNA for employee records? Can two separate employees have the same Email + SSN?
If you had some sort of DNA system, you would probably go by DNA + Address + First Name + Last Name or some internally valid identifier (not a surrogate). The fact is DNA alone does not guarantee uniqueness so you find other candidate keys to include so you can approach uniqueness.
It's easier to create Number 42 Bobs in sequential virtual fake filesystems instead of using industry standards in RDBMS.
July 9, 2009 at 12:39 pm
Since when is SSN not externally verifiable?!?!? The fact is, how much do you trust the source?
Are you willing to trust Number 42 Bob more than SSN+Address Bob? See where your argument falls apart and has zero validity?
Anybody who makes SSN the only candidate in a PK should not be designing DBs in the first place.
There are no "natural keys" for human beings that actually satisfy all criteria.
Even DNA doesn't work, because of twins.
Depends on what kind of DB you're modeling. Do you collect DNA for employee records? Can two separate employees have the same Email + SSN?
SSNs are not externally verifiable because they were NEVER DESIGNED TO OPERATE AS AN IDENTIFICATION SYSTEM IN THE FIRST PLACE. That data comes from the Social Security Administration, and from a study of the history of the subject.
If you've paid attention to recent news on the subject, it turns out that they became even less valid for people born in 1988 or later, because they follow an even more predictable pattern and can be stolen even more easily.
So, the answer to "since when?" is, "since the moment of their first inception".
This, of course, ignores the fact that your plan of using SSNs as even part of a primary key breaks the moment you have to deal with people who don't have one. If even a single Canadian citizen has to be entered into your database, you suddenly don't have a PK for that person. Same for THE MAJORITY OF THE PEOPLE ON THIS PLANET!
Now, to add to this morass of indefensible data, you're going to add the person's address????? Have you ever worked with address data? Are you aware of the infinity of ways in which address data can be corrupted or made useless? Are you aware of the fact that tying a person to an address violates FIRST normal form? (There are literally millions of people who have more than one address.)
You're seriously going to say that you want your primary key to be subject to flaws like "1001 25th Avenue East" = "1001 25th Ave E" = "1001 Ward St" = "1001 25 E". Those are all the same address, and the post office will deliver mail correctly to all of them. The only one that will satisfy CASS certification and thus be eligible for a postal discount in mass mailings is "1001 25th Ave E", but it'll also need to have the Zip+4 on it for that. And those variations don't even take into account variables in town/city name. I used to live at an address where the mail would get delivered regardless of whether you listed the city as Katy or as Houston. And you want to make something with that much variability in technically valid versions as your PK?
In that case, you're going to end up with duplicate rows, just because of address variations. They'll all list the same person at the same place, but it'll be dozens of potential rows. So, now you have all the disadvantages of a surrogate key, AND all the disadvantages of a natural key, and you're worse off than either of them alone.
I've already pointed out the horrific flaw in using names. I have more than one. My wife has more than one. I'd be willing to bet you have more than one. Again with the violations of 1NF, and you want to use it as a PK!!!!!!
I don't see how you can keep invoking Dr. Codd and then proposing plans that are complete violations of normalization.
On your point of pointers having better cascading performance than standard FKs, my plan of having explicit pointers, as opposed to hidden ones, already accomplishes exactly the same thing. It also has the advantages of being updateable.
If, for example, you have a customer enter some orders into a database, you would have to connect the orders to the customer, right? With "pointers", what do you do if the orders need to be moved to a different customer/account? Happens quite often in real businesses. Basically, since you don't have access to the "pointers", you're going to have to create new orders and delete the existing ones. With explicit connections (ID and FK), you just issue an Update command for that column of the table. So, this has all of your advantages, and even more.
On the point of recovering deleted data with a different ID, that's only valid if you don't have adequate audit trails. Where that kind of thing has been a concern, I've been completely capable of recovering it with the same ID number, if so desired. That's easy.
Yes, if I have a 42 Bob and a 43 Bob, it's possible it's the same person. That's definitely an issue, and has to be handled as such. But, thus far, none of the "natural keys" you've proposed actually prevent exactly the same thing. Nor have any that I've ever seen proposed by anyone else, including Joe Celko. Dr. Codd agrees with me on this one, not with you. Read his material, don't just depend on someone else telling you what it says.
There are no valid natural keys for human beings.
There are also no valid natural keys for addresses.
There are for phone numbers, and other standardized, arbitrary numbers of a similar nature. IP addresses, would be another one.
Even Zip codes aren't a good natural key for themselves, because you have to be able to store multiple city/town/county data for them, if they are to have much practical value, plus there's no enforced rule that different nations couldn't have the same "Postal Codes" as each other. At the very least, you have to include the country in the PK for those.
The chemical elements have a pretty valid natural key in their atomic numbers.
Molecules in their standard notation formulae probably have a valid, albeit complex, natural key, but that one has a variable number of elements (in both senses), which would make it quite difficult to model.
So, no, I don't agree with you that everything that can go into a database can have a valid natural key. Since that premise is the basis of your whole theory, and it's not something you can prove even to a reasonable standard, I say your theory is wrong, and thus your conclusions are wrong.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2009 at 12:56 pm
sqlguru (7/9/2009)
GSquared (7/9/2009)
Any primary key that relies on human names is inherently flawed.It was an example.
Also, primary key values should be non-volatile. Actually, they shouldn't change at all once assigned. Names change. SSNs change.
Says who? Microsoft? You choose Microsoft over codd? Microsoft SQL Server can't even handle cyclic foreign keys and you want them to be the authority on "primary keys should be non-volatile"? When you delete and re-create an entity, do you get the same primary key? WAIT, you dont. You get a different magical number that points to the physical location of a row in a sequential virtual fake filesystem. I guess surrogates can change as well!
The point is that you should choose a PK that rarely changes that can guarantee uniqueness to a level BUT is not immune. An example of this is when ISBN-10 transitioned to ISBN-13. In a proper RDBMS, this change is EASILY handled without a performance impact. This is not easily handled in FAKE RDBMS like SQL Server.
PK's can change and the relational model is designed to handle this.
Also, SSNs and other ID numbers are not actually "externally verifiable". There are literally millions of illegal immigrants in the US using stolen SSNs. There are an unknown number of "identity thieves" who might be using dozens of stolen SSNs each.
You cannot prove that you are the only person using your SSN. Credit reporting agencies have to have whole systems in place to deal with duplicate SSNs, and won't reveal to you if someone else has the same one as you. They aren't legally allowed to.
So, flunk on the SSN and name thing as a PK. Fails your own tests. Back to the drawing board and pick something else.
Since when is SSN not externally verifiable?!?!? The fact is, how much do you trust the source?
Are you willing to trust Number 42 Bob more than SSN+Address Bob? See where your argument falls apart and has zero validity?
Anybody who makes SSN the only candidate in a PK should not be designing DBs in the first place.
There are no "natural keys" for human beings that actually satisfy all criteria.
Even DNA doesn't work, because of twins.
Depends on what kind of DB you're modeling. Do you collect DNA for employee records? Can two separate employees have the same Email + SSN?
If you had some sort of DNA system, you would probably go by DNA + Address + First Name + Last Name or some internally valid identifier (not a surrogate). The fact is DNA alone does not guarantee uniqueness so you find other candidate keys to include so you can approach uniqueness.
It's easier to create Number 42 Bobs in sequential virtual fake filesystems instead of using industry standards in RDBMS.
Okay, since you seem to think MS SQL Server is such a bad product, why don't you go back to using Dataphor? Oh, wait, Dataphor uses MS SQL Server to actually store data, doesn't it?
July 9, 2009 at 1:57 pm
sqlguru (7/8/2009)
Matt Miller (7/8/2009)
I will slither out from under my rock for a sec. If you use pointers instead of actual keys - how do you ever CHANGE a foreign key? Sounds to me you might gain some speed, but only at the expense of giving up control.I'm not a big fan of natural keys, since many of my apps have suffered cruelly at the hands of users changing key avlues even after swearing they wouldn't ever change, etc., but that's why i often implement surrgate keys instead. I must be missing something - I just am not seeing the upside here.
You would write an update command that would change the foreign key to whatever you like. You're not suppose to worry about "using pointers instead of actual keys". You're suppose to worry about the database model of the application your designing. The database engine is suppose to take care of all the "pointer" stuff. Infact, in the database products Celko mentions, the foreign keys are not present in the foreign tables. They are "pointer based" instead, and cascades are "instant". SQL Server is based off old rdbms architecture where foreign keys are duplicated in foreign key tables.
Also, natural keys can change, this is fine. What matters is that the natural key must be externally verifiable. In a real RDBMS implementation, the natural key can change without ANY performance impact.
You're misunderstanding my point. I'm talking about redirecting the child to an entirely new parent. Without the link infomration available inline, how is it you plan on allowing for such things? And I know - that kind of activity is "bad", should never happen, etc... but - reality is - it happens, and in some cases, a LOT.
It's bad enough to have to cascade changes, and having something better to do that might be nice, but it introduces its own set of headaches.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 9, 2009 at 2:00 pm
Whitefang, is that you?
Viewing 15 posts - 16 through 30 (of 41 total)
You must be logged in to reply to this topic. Login to reply