July 7, 2009 at 8:49 am
These are two of the most fundamental features needed for SQL Server in terms of administration, developing, performance.
Pointer-based foreign keys
The relational model has no place for surrogate keys. When you use natural keys as primary keys, sometimes it may require one or more natural keys. The relational tables must also have the same natural keys. This increases maintenance time and decreases performance.
The solution is to have pointer-based foreign keys. When the DBA creates a PK on Table A, they should be able to tell the DB to relate Table B to Table A. The DB would automatically create a pointer between the two tables (the foreign keys do not have to be carried over manually by the DBA).
Cascade updates should also be done pointer based so it doesn't cause massive updates.
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=472957
Set based stored procedures
SQL-92 defined table value constructors and SQL 2008 is the only database to implement it. However, it has implemented it so poorly that it is unusable.
The solution is to have a automatically declare datatype based of a table and it should have the properties of a table such CRUD against. And it should be able to be declare as an OUTPUT parameter in a stored procedure.
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=472960
Other issues to vote for:
ALTER user defined types (domains) - This feature was asked back in 2000 and has not still been implemented! When you use natural keys as PKs that propagate to related tables, if a PK datatype ever changes, you would have to manually update the datatypes of the propagated foreign keys. This is one of the problems that domains fix but the problem is that SQL Server is the ONLY database in the world that does not have ALTER domain type. So you have to manually drop ALL references to the domain and recreate the domain, and manually reassign all references, a very tedious process.
July 7, 2009 at 8:56 am
sqlguru (7/7/2009)
These are two of the most fundamental features needed for SQL Server in terms of administration, developing, performance.Pointer-based foreign keys
The relational model has no place for surrogate keys. When you use natural keys as primary keys, sometimes it may require one or more natural keys. The relational tables must also have the same natural keys. This increases maintenance time and decreases performance.
The solution is to have pointer-based foreign keys. When the DBA creates a PK on Table A, they should be able to tell the DB to relate Table B to Table A. The DB would automatically create a pointer between the two tables (the foreign keys do not have to be carried over manually by the DBA).
Cascade updates should also be done pointer based so it doesn't cause massive updates.
This is called the Network Model (CODASYL).
July 7, 2009 at 9:02 am
Lynn Pettis (7/7/2009)
This is called the Network Model (CODASYL).
Actually, it's not.
The DB automatically creates the foreign keys between the two tables BUT it is not seen by the DBA, it is done 'internally'. Any changes done to the PK (datatype change, datalength change, column added to PK, column removed from PK) is automatically propagated to the foreign keys by the DB using the pointer. Cascade updates would be "instant" because it is based off a pointer (this is already implemented in sybase according to celko).
July 7, 2009 at 9:06 am
I voted for both.
Have to say that they strike me as "nice to have", not as crucial. I'd rate the first one as "nicer to have" than the second one, and voted accordingly.
The second one can be easily worked around by using temp tables, which are generally a better idea anyway for most purposes. Would be nice to be able to define one once and use it as many times as desired, but it's hardly critical to me.
The first one would certainly be nice on multi-column keys, if only to reduce the overhead of redefining them in all affected tables, but I definitely don't see it as critical. If you have to regularly redefine your primary key, then it needs to be reconsidered as the PK in the first place, or the design and architecture are horribly flawed.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 7, 2009 at 9:23 am
GSquared (7/7/2009)
I voted for both.Have to say that they strike me as "nice to have", not as crucial. I'd rate the first one as "nicer to have" than the second one, and voted accordingly.
The second one can be easily worked around by using temp tables, which are generally a better idea anyway for most purposes. Would be nice to be able to define one once and use it as many times as desired, but it's hardly critical to me.
The first one would certainly be nice on multi-column keys, if only to reduce the overhead of redefining them in all affected tables, but I definitely don't see it as critical. If you have to regularly redefine your primary key, then it needs to be reconsidered as the PK in the first place, or the design and architecture are horribly flawed.
First issue...
If you took username as a primary key, what if 3 years down the road, the customer decides to increase the data length? Using a surrogate is not a solution, it is a kludge/hack.
Or you change the data element name? Now you got 2 entries for the same data element with different "names" in your data dictionary.
Natural keys WILL change at some point (UPC, ISBN), the relational features of a relational model should be able to handle this change, SQL Server cannot. Celko mentioned a DB product that implemented "pointer based" cascades, the result was instant cascade updates.
Second issue...
Temp tables causes recompilation of stored procedures. There was a test done using temp tables vs passing table value parameters, the temp table causes massive cpu usage due to recompilations. If you're talking about actual "stage" tables, what about the massive "locking" (it's sort of a "one true lookup table") issues that will take place in a high concurrent environment? These are all kludges/hacks for "fake" sets.
July 7, 2009 at 9:30 am
sqlguru (7/7/2009)
Lynn Pettis (7/7/2009)
This is called the Network Model (CODASYL).
Actually, it's not.
The DB automatically creates the foreign keys between the two tables BUT it is not seen by the DBA, it is done 'internally'. Any changes done to the PK (datatype change, datalength change, column added to PK, column removed from PK) is automatically propagated to the foreign keys by the DB using the pointer. Cascade updates would be "instant" because it is based off a pointer (this is already implemented in sybase according to celko).
Not going to get into an agrument. IDMS is a network model dbms, and what you describe is exactly how IDMS was described to me at a previous employer. The FK from the parent weren't propagated into the child records, it was done with pointers.
July 7, 2009 at 9:45 am
Lynn Pettis (7/7/2009)
sqlguru (7/7/2009)
Lynn Pettis (7/7/2009)
This is called the Network Model (CODASYL).
Actually, it's not.
The DB automatically creates the foreign keys between the two tables BUT it is not seen by the DBA, it is done 'internally'. Any changes done to the PK (datatype change, datalength change, column added to PK, column removed from PK) is automatically propagated to the foreign keys by the DB using the pointer. Cascade updates would be "instant" because it is based off a pointer (this is already implemented in sybase according to celko).
Not going to get into an agrument. IDMS is a network model dbms, and what you describe is exactly how IDMS was described to me at a previous employer. The FK from the parent weren't propagated into the child records, it was done with pointers.
No, the DB is still based off the relational model but the implementation is slightly different. The foreign keys is just non-user manageable since it is based of a pointer to the primary key.
July 7, 2009 at 9:53 am
sqlguru (7/7/2009)
GSquared (7/7/2009)
I voted for both.Have to say that they strike me as "nice to have", not as crucial. I'd rate the first one as "nicer to have" than the second one, and voted accordingly.
The second one can be easily worked around by using temp tables, which are generally a better idea anyway for most purposes. Would be nice to be able to define one once and use it as many times as desired, but it's hardly critical to me.
The first one would certainly be nice on multi-column keys, if only to reduce the overhead of redefining them in all affected tables, but I definitely don't see it as critical. If you have to regularly redefine your primary key, then it needs to be reconsidered as the PK in the first place, or the design and architecture are horribly flawed.
First issue...
If you took username as a primary key, what if 3 years down the road, the customer decides to increase the data length? Using a surrogate is not a solution, it is a kludge/hack.
Or you change the data element name? Now you got 2 entries for the same data element with different "names" in your data dictionary.
Natural keys WILL change at some point (UPC, ISBN), the relational features of a relational model should be able to handle this change, SQL Server cannot. Celko mentioned a DB product that implemented "pointer based" cascades, the result was instant cascade updates.
Second issue...
Temp tables causes recompilation of stored procedures. There was a test done using temp tables vs passing table value parameters, the temp table causes massive cpu usage due to recompilations. If you're talking about actual "stage" tables, what about the massive "locking" (it's sort of a "one true lookup table") issues that will take place in a high concurrent environment? These are all kludges/hacks for "fake" sets.
Please explain to me the difference between using username as your PK, and having an ID column (not the PK, just an ID), and having that ID in the child table for the FK, and having some sort of "pointer".
In either case, you're going to have to have SOMETHING stored on the disk to link the records together. An integer identity is highly efficient for this.
If you use that, and you want to add to the data length of the UserName column, it has zero impact on child tables with FKs to the ID column. And you can certainly define the FK as having cascading deletes.
In fact, I'd be willing to bet that the "pointers" solution is something remarkably similar to that, just hidden from you behind the scenes, when it comes to actual physical implementation in the database engine.
So, you can implement something that has all the characteristics you're looking for, without having a "surrogate key", already. Thus, I don't consider having a way to hide it from the DBA all that critical.
Second, I've done serious load testing on well-designed procs that pass temp tables back and forth. I've generally seen an improvement in performance over other solutions for the same thing.
Yes, they cause some level of recompilation. That's so they can take advantage of statistics (and indexes, if you have any) on the temp tables. If you don't have the recompilation, you can't have stats. If you don't have stats, you have the engine assuming that they always have one row (just like with regular table variables). The cost to the execution plan of assuming it's always going to be one row is quite often higher than the cost of the recompilations. In my experience, it's higher more often than otherwise.
It would be nice if table variable parameters were updatable. Definitely would be nice, for those rare circumstances in which I would consider using a table variable. Like where I can guarantee it will never hold more than a few rows of data, or where I want it to operate outside of transactions. But those circumstances are quite rare, and I don't use table variables very often, because of the statistics issue. It's a performance killer in far too many circumstances.
That is why I consider both issues "nice", but "non-crucial". It's simply a question of looking at them from an actual engine viewpoint, based on some simple assumptions about how they would have to be implemented in an actual physical environment.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 7, 2009 at 10:07 am
The benefits of a natural key is that it can be validated externally, debugged easily, doesn't require extra joins in some cases, free index etc. Surrogates do not belong in the relational model because it cannot be validated externally and it does not port.
Example, you have two clients using your product. You decide to add lookup data to one of your lookup tables and configure your application to use the "magic surrogate value" as an enum. When you deploy the changes, how do you guarantee that both client databases will have the same IDENTITY value that the application enum is expecting?
What if the client has their own inhouse DBA team and all your changes must goto them as scripts. What if there is already an existing "value"? The surrogate solution would gladly duplicate the data, the natural key solution would throw a duplicate error.
The size of the natural key does not matter anymore. Samsung has a 24-SSD raid solution that has 2GB read/write (SSD is still new technology) and multi-core servers are the norm.
July 7, 2009 at 12:00 pm
Natural Keys vs Surrogate Keys
Here's a battle. I like the idea of using Natural Keys, but sometimes thoses are the best choice as they can change which affects historical information that could cause confusion to customers/users. I like using a combination Natural Key (AK) and Surrogate Key (PK) where appropriate. The users see and use the Natural Key and the system uses the Surrogate Key behind the scenes.
Example:
Natural Key: IEC-6001
This document is sold to numerous customers over a period of time, say three years. At this point, they decide to rename IEC-6001 to IEC-16001. The customers who have purchased this document, know it as IEC-6001, not IEC-16001. You start sending the customers notifications that IEC-16001 has been updated, they are going to be confused and think they are getting erronous notifications.
Now, if you used a Surrogate Key to uniquely identify the document you wouldn't have this issue as the historical information is as it was, and the new information has the new name.
As for having hidden pointers for FK's? I don't know. Looks too much like the Network Model to me being bolted on to the relational model.
July 8, 2009 at 7:15 am
No matter how you implement the pointers, there's going to be a need for some sort of physical storage. Most likely, the physical implementation will involve either a numeric key (like having a clustered index on a non-unique set of columns), or a GUID (like default replication settings).
All you're asking is for that to be hidden in the row metadata, as opposed to appearing in a visible column. Either way, it's still a surrogate key of one sort of another.
If you know of a way to implement "pointers" that doesn't involve adding some sort of key value to each row of data that has a pointer, please explain it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 8, 2009 at 7:50 am
GSquared (7/8/2009)
No matter how you implement the pointers, there's going to be a need for some sort of physical storage. Most likely, the physical implementation will involve either a numeric key (like having a clustered index on a non-unique set of columns), or a GUID (like default replication settings).All you're asking is for that to be hidden in the row metadata, as opposed to appearing in a visible column. Either way, it's still a surrogate key of one sort of another.
If you know of a way to implement "pointers" that doesn't involve adding some sort of key value to each row of data that has a pointer, please explain it.
The relational model doesn't care about pointers, surrogates, physical storage....it deals with abstraction. The problem is, SQL and the current "fake" implementations of RDBMS makes you think about low-level stuff bits, surrogates, pointers, physical storage, indexes.
In a "true dr.codd" relational database, a DBA would define Table A, define Table B, define Table A's PK, then "relate" Table B to Table A. The physical side of this relation is what the RDBMS handles, not the DBA. The logical relation is what the DBA handles.
This is why we need to dump all current implementations of RDBMS and SQL and go back to the foundations.
July 8, 2009 at 8:24 am
sqlguru (7/8/2009)
In a "true dr.codd" relational database, a DBA would define Table A, define Table B, define Table A's PK, then "relate" Table B to Table A. The physical side of this relation is what the RDBMS handles, not the DBA. The logical relation is what the DBA handles.This is why we need to dump all current implementations of RDBMS and SQL and go back to the foundations.
You know, I am going to have to disagree here from a pragmatic position. I'm not dumping the current RDBM systems. They may not be true relation database systems but they work and work well. There is a lot of third party and custom systems built out there in the world and anyone trying to replace the current RDBM's with a true RDBM system is going to have a difficult time getting companies to change.
July 8, 2009 at 8:24 am
sqlguru (7/8/2009)
GSquared (7/8/2009)
No matter how you implement the pointers, there's going to be a need for some sort of physical storage. Most likely, the physical implementation will involve either a numeric key (like having a clustered index on a non-unique set of columns), or a GUID (like default replication settings).All you're asking is for that to be hidden in the row metadata, as opposed to appearing in a visible column. Either way, it's still a surrogate key of one sort of another.
If you know of a way to implement "pointers" that doesn't involve adding some sort of key value to each row of data that has a pointer, please explain it.
The relational model doesn't care about pointers, surrogates, physical storage....it deals with abstraction. The problem is, SQL and the current "fake" implementations of RDBMS makes you think about low-level stuff bits, surrogates, pointers, physical storage, indexes.
In a "true dr.codd" relational database, a DBA would define Table A, define Table B, define Table A's PK, then "relate" Table B to Table A. The physical side of this relation is what the RDBMS handles, not the DBA. The logical relation is what the DBA handles.
This is why we need to dump all current implementations of RDBMS and SQL and go back to the foundations.
I don't see how that makes these, "...two of the most fundamental features needed for SQL Server...."
What actual advantage would the businesses and such of the world gain from that, and at what cost?
I don't get a paycheck based on the compliance of the database with a particular philosophy's mathematical abstractions. I get paid because the database delivers functionality with value to a business.
I'm willing to be won over on this one, but it needs to have an actual practical value to me and/or my employer. Explain that practical value, and I'll back you up on this. Or point me to someone who can explain that value, in person, on the web, in a book, whatever.
And please don't include anything in the explanation about, "imagine all the possible...". I can imagine intergalactic civilizations. I can imagine monocells and viruses. I can imagine the collapse of waveform from probability to actuality, and I can imagine a universe where that's just a perception, and waveforms and probabilities never actually collapse. I can imagine matter existing as neither particles nor waves but as forces acting on forces without objective space but merely the perception of immediacy being translated into the concept of dimensions as a matrix for comprehension and evaluation. But I can't currently imagine a value to hiding the pointers from me in a database, just to make it look more like something in Dr. Codd's theory.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 8, 2009 at 8:33 am
Curious, if we were interested in looking at something else that may be more of true RDBMS, what would you suggest?
Viewing 15 posts - 1 through 15 (of 41 total)
You must be logged in to reply to this topic. Login to reply