December 11, 2007 at 9:47 am
By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity.
Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row.
It seems to me that you are also laboring under a confusion. Assignments HAS a PK, each row is uniquely identified by it's data. The fact that you don't need to define it as a database object doesn't mean that the PK doesn't exist.
--
JimFive
December 11, 2007 at 9:51 am
There is a difference between a key and an index. A key is a component of the conceptual data model and does not have a physical representation. It is a concept. A "primary" key is a referential integrity construct that is implemented at the logical data model. There may be other "keys" in the logical data model including unique keys, natural keys, composite keys, candidate keys etc. There aren't ANY indexes in a conceptual or logical data model.
In the physical data model, a primary key is a constraint. An index on that primary key is optional, clustered or otherwise. As someone said earlier, the constraint is to define referential integrity, the index is to provide quick access to the data. There is no rule that says a primary key must also be a clustered index. In fact, in my experience, it rarely is. A clustered index is useful if you find that you tend to want your data sorted in a particular way in the lions share of your queries, it alleviates the overhead associated with an ORDER BY and speeds up performance of range type queries.
I always create a surrogate key as a primary key for every table. I only create a unique index if that key is going to be used as a foreign key elsewhere. If there is a natural unique key (or at least close to unique) then I will, if it makes sense, create a clustered index.
"Beliefs" get in the way of learning.
December 11, 2007 at 9:53 am
James Goodwin (12/11/2007)
By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity.
Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row.
It seems to me that you are also laboring under a confusion. Assignments HAS a PK, each row is uniquely identified by it's data. The fact that you don't need to define it as a database object doesn't mean that the PK doesn't exist.
--
JimFive
What you are describing is a candidate key, not a PK. The fact that you can find a column or set of columns that uniquely identifies a row doesn't make it a PK. PK is a role defined for a specific purpose. In my example there is no purpose for a PK on assignments.
December 11, 2007 at 9:53 am
I have seen a couple people mention that tables without children or certain logging tables may not need PKs.
I actually have a case of this. My table simply contains a log of site hits to a certain page. It contains the page id, an id to some data about why they hit the page, and the date. This table can't have a PK without creating another column specifically for it since it is absolutely possible (even if unlikely with the datetime precision) that I could have duplicate valid rows in it.
The thought of not guarantying uniqueness does cause me to cringe a little, but after sitting down and thinking about it, I can't come up with a good reason to add another column to this table just for that purpose.
December 11, 2007 at 9:54 am
First off, I gotta say that you guys are really spoiled in regards to control over your data. I load into my databases from just about every kind of source. Some are a nightmare of data integrity.
The best example is that I load data from a system that allows alpha characters in the SSN column. 45a343ff3 is a valid SSN in this system. Some of these systems have never heard of Normal forms. It's actually humorous as to the degree in how poorly these databases are designed.
This has led me to have very very very few, if any PKs. I know that I probably simulate a PK in most of my tables by adding a identity column and then putting a clustered index on the table. I will admit that I have done very little research to prove that my way is better than a PK (or if it's even different)
I guess my point to this post is - What am I gaining in putting a PK on a table as opposed to putting a unique clustered index on the table?
I guess I have just came from an Anti-Access background and just haven't been using the PK cause it looked too much like an Access wizard. My databases all seem to perform well and I have little or no deadlocks. Am I missing a chance to improve performance?
December 11, 2007 at 10:10 am
First of all this is a wonderful topic. We have identified logic and concerns from three sides of the data. The logical model, the physical model and the actual interface with the data through the application.
I have worked on all three sides of the data and have been involved in this discussion a number of times. At one time we determined to enforce Ref Integrity through the application. We coupled this with building any index we thought was needed. In this case we often defined a minimum uniqueness set as the clustered index and did not cluster on an identity column. We used this approach for about 4 years and have developed databases of some size. It has worked in more then one system for over a decade.
However, the was some confusion after the data modelers moved on and the implementors followed, leaving the second generation of maintainers in control of the database. It was easier for them to deal with the data using real primary keys and identity fields where needed. Simple reason, they did not have the code available to them and sone have not been able to read that code if it was available. So as we are rehosting the applications we are adding primary keys.
In the systems/databases we have developed since we stopped using the non-pk approach, we have not had to remodel the database and they are better understood by all.
All this said there is a lot of merit in Barry's discussion about associative entities, or tables that resolve many-to-many relationships. These are usually read/used from both sides of the many and there is use of the associative keys but not a generated identify field. Truly this is a candidate for not having a pk. However, if the associative entity has information beyond the keys and stores other data describing the association then it should have a pk. Attributes that would qualify this for having a pk would be date of association, a code value to define what type of association ion it is etc...
Lastly in the application use of the data the applications I have developed are driven by I hope the same business rules and needs that the database was modeled using. If this is the case the code will be reading data using the indices that were developed and the main use of the data should be centered on the minimum uniqueness set, and in most cases that is the primary key of the clustered index. I have never written a piece of code that says read this table by this index. However, I have more often then not defined the database with the indices that I know the optimizer will be using, and in more then one case have written a covered index to insure that the needed speed is there.
Again this is a great topic and has caused interest from many. Thanks for the discussion.
Miles,
"Life is easier when you plow around the stump" words from an old farmer...
Not all gray hairs are Dinosaurs!
December 11, 2007 at 10:16 am
Barry McConnell (12/11/2007)
The reason the PK needs to be unique is so that referential integrity can be maintained, ie the FK must point to exactly one row.By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity. Unfortunately so many DBMS vendors and users have confused the roles of PK/FK, indices, and constraints that it is a jumbled mess. When you declare a PK, you also get automatically an index and a constraint, but that doesn't make them equivalent. Let's take a simple example and look at what we actually want and need.
I guess I am still unclear about the definition of PK here, not the fact of how PKs are implemented (or may be wrongly implemented) in real database systems.
My understanding is that a PK is one or more attributes of an entity that uniquely defines a tuple (I don't like that esoteric term but I am trying to stay theoretical) in that entity. If you have a one-entity model - theoretically possible even if it is almost never done in practice - then there is no other entity that will have FKs referencing the PK. Yet the need to uniquely define a tuple will still exist. Otherwise the entity is not a valid entity in the relational model, from what I have read - although in the real world a real table can be created without a PK.
Two entities: employees and projects, business rule = an employee can be assigned to multiple projects and vice versa
Need: three tables - employees, projects, and assignments. employees and projects need PKs because they will be participating in referential integrity. Assignments will consist of two FKs and needs 3 indices (one on each FK and a combined one), and a unique constraint on the combined index. Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row. The single column indices provide fast access for the typical query/join performed on this table, i.e. given one FK for a column find all the corresponding FKs of the other column. The uniqueness constraint solves the duplicate row problem which is not the same as uniquely identifying a row.
Suppose, however, that for whatever reason, you have just an employees table. Would you not need a PK to uniquely identify each employee?
I know this sounds like splitting hairs, but I just want to understand what the real meaning of a PK is. You seem to be saying that it has nothing to do with uniquely identifying a row, which is different from what I learned.
Thanks,
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
December 11, 2007 at 10:29 am
MattieNH (12/11/2007)
How about a table with only one row?Mattie
This is just about the only one that I can see where you might have a good argument against an index of any kind.
We have a table with state names and abbreviations. There are not likely to be more of them. We also have about a dozen small fact tables like that. Big argument ensued during the design meetings that we did not even need a row ID on these small tables. We have row ID's and primary constraints on all of them. Even the one row table that is database status and version. I also have the black eye going along with it. Vindication came when we handed the table structure to a new developer and he got it in two minutes. Every table has certain features that you can count on. All the tables are optimized for join performance.
There are other considerations beyond RDBMS theory. While putting an index IS overkill on a one row table and a fair percentage overhead, the other considerations (like consistancy) can pay off.
Does it make sense not to have a primary key? In some cases and rare circumstances. You also have to weigh the cost of not doing it. I would put a row ID enven on Steve's report tables. It helps on those dark rainy nights when you get the panic phone call that says there is one bogus row that came from deep space that prevents the report(s) from printing. Find the row and kill it by ID. It's happened enough through the years that I now prepare for things like that.
ATBCharles Kincaid
December 11, 2007 at 10:39 am
We had a table where we just logged data from a series of production lines. Totally unnormalized. We added a row ID and made that the primary key, clustered. Any guesses why? Hmm?
Performance! Since we always inserted at the end this made the inserts faster. Reports could take their sweet time but the inserts had to be instant.
ATBCharles Kincaid
December 11, 2007 at 10:44 am
We have a table with state names and abbreviations. There are not likely to be more of them. We also have about a dozen small fact tables like that. Big argument ensued during the design meetings that we did not even need a row ID on these small tables. We have row ID's and primary constraints on all of them.
Are you seriously saying that you have a table that looks like:
StateID, StateCode, StateName
Instead of:
StateCode, StateName
--
JimFive
December 11, 2007 at 10:50 am
I'm getting slightly lost here, but I wanted to respond to Will a few posts above.
The problems with bad data are a good place TO require a PK, even a surrogate, that will allow you to specifically get to that row if you need to fix or manipulate data. It's a good place to require at least an identity or GUID to let you get to the row.
A PK is a theoretical construct that identifies which column(s) uniquely identify the row. It's a way to get to that particular row if you need it. In SQL Server, it's implemented (as mentioned above) by an index. If you have a unique index, it may or may not be the PK, but often it is. Since by definition it's a way to uniquely identify the row, they're related.
You can have two unique indexes, but that's another debate.
Clustered/nonclustered has nothing to do with the PK. I probably shouldn't have used that as an example, but I was trying to show that there are optimizations based on a table being ordered and not a heap. Indexes are also an optimization to speed recovery of data, and the PK, while an index, it merely an identifier to help you understand your model better.
Putting PKs and FKs in, can (help) ensure your integrity by preventing duplicates and orphaned rows, and it can make it easier for someone besides you (you don't want to manage this db forever, right?), to understand how things are related.
December 11, 2007 at 10:58 am
Hey, could make sense for consistency, all xxID columns as PKs.
I'd agree it's a waste of space, but you'd tinyint the StateID and it's not that much space, 50 bytes.
Or 50-something of you count Puerto Rico, Guam, USVI, and the Canadian Provinces 😛
December 11, 2007 at 11:02 am
James Goodwin (12/11/2007)
Are you seriously saying that you have a table that looks like:StateID, StateCode, StateName
Instead of:
StateCode, StateName
--
JimFive
RowID, StateCode, StateName, Deleted, LastUpdated
Yes indeed. We typically have 5 custom fields on every table (Custom01 through Custom05) but not on the fact tables.
RowID, OrderTypeID, OrderTypeName, Deleted, LastUpdated
How many of those would you like? I think that we have a three so far. We have a CurrencyTypeID field next to each monitary field even though we have yet to make our first sale noth or south of US borders.
Seems to not make any sense. Looks like overkill. But when the time comes that we do take our package to other countries we are, mostly, ready.
ATBCharles Kincaid
December 11, 2007 at 11:19 am
In rereading my post I guess I wasn't too clear on my practices. I don't put a contraint on most of my data because I can't guarantee uniqueness when I don't have control of the data.
If I add a identity column on the table so that I can identify a single row, would it really be helping me any to put a PK on the table if I already have a unique index on it? I can't guarantee parent/child relationiships. If I put a PK on the table without my added identity column, then my loads would fail daily.
Maybe I'm just asking what is the advantage of a PK over just a simple unique index? Is there any potential increase in speed? Cause if there is, I'm all over it!!!
If it's just to help data integrity, then it's not that much of a use to me cause data integrity isn't something I have total control over.
December 11, 2007 at 11:28 am
PK doesn't do either. It doesn't help with speed or with inegrity. It tells you which column identifies data. You can set your unique index column to be the PK and it's visually more obvious which column identifies the row.
Viewing 15 posts - 31 through 45 (of 57 total)
You must be logged in to reply to this topic. Login to reply