This is the third in my series of articles on Worst Practices (see Worst
Practices - Part 1 of a Very Long Series! and Worst
Practices - Objects Not Owned by DBO) that so far has generated quite a bit of response from readers. Not everyone agrees with me so far, but it looks like I'm not the only one that sees a lot of "bad" practices in production environments.
So, let's talk about another worst practice - not putting a primary key on
EVERY table and failing to use a clustered key on EVERY table.
I think 9 times out of 10, this just happens by mistake. If you build your
tables in Enterprise Manager and click the primary key icon on the toolbar, it
makes the column both a primary key and the clustered index. This is not
necessarily the best use of a clustered index, but it's better than nothing. If
you build your tables in Query Analyzer, it's up to you to either execute the
TSQL to create the primary key and clustered index (or switch to EM to do that
part).
It's up to you, the DBA or senior developer, to catch those mistakes and get
them fixed before they make it into production. It's a great opportunity to make
sure that your developers understand how important these issues are - I think
sometimes we fall into the trap of thinking that everyone KNOWS why it's
important. Don't assume, ask. If they knew better and just forgot to do it, a
gentle reminder to stay sharp is worthwhile. If they didn't know better - isn't
it time they learned?
Where it gets interesting (to me anyway!) is when someone tries to convince
me they don't need a primary key. Come on - having a primary key is such a core
concept, why would you not have one? I've heard a couple arguments, let's see if
they have any merit.
The first is that because the table is so small, the index will not be used
in the query plan. Quite possibly true. So what? One, you're making the
assumption the table will always be small, what happens when it grows? Will the
developer (or DBA) remember "Hey, I need to add an index to that
table"? More likely it will get fixed only at the point that a performance
problem arises. To the extent that I can, I like to build it right the first
time. Why have to fix a problem later that I can avoid now? How much time does it take to create a primary key and a clustered index compared to figuring out later why you have performance issues? More important than that though - it's not just about query plans, it's about data integrity. Not having a primary key can provide a fatal failure point. Sure, you know not to add a duplicate to the lookup table, but what would happen if someone did? How long would it take to figure it out?
Another is that it's a temp table, so the rules don't really apply. More than
the first argument, I think there is some validity to this position. Not
a lot, but some! I think the most compelling reason I have in favor of this
argument is that in many places where you use a temp table now you could (and probably should) use a variable of type table (available beginning in SQL 2000) which only allows you to define the structure, not indexes. Just remember this - you'll never wrong go by creating a primary key on a table, temp or not.
Creating a primary key (and/or clustered index) is unnecessary overhead for many situations such as temp tables, lookup tables, even history/audit tables. Bull. Yes, every index, every constraint, every default, every trigger, adds some overhead. Should we seek to minimize that overhead in our design? Always. But never at the expense of data integrity. If your system is loaded to the point that you're worried about the overhead of having a primary key on a table you're in deep trouble. The nice thing about primary keys and clustered indexes is you can only have one of each per table - pretty hard to over use them!
Finally, there is the "I don't need a clustered index". A good
example of this might be a lookup table that contains state name and state
abbreviation. You're going to be doing single record lookups, never a range
query which is where a clustered index excels. You're also not going to have a
lot of indexes, probably one on the name and one on the abbreviation where the index will provide the data, no need to do a seek through the table to find
other bits of information associated with that row. For this example, is it a
fair point? I think so. On the other hand, how much overhead would adding the
clustered index impose? It's not getting updated, the data IS the index, so the
cost is very very low.
My recommendation to junior developers is pretty simple. Every table gets
both a primary key and a clustered index. It may not be a "best"
practice, but a damn good starting point is to just add a identity column called
rowid and make it the primary key and the clustered key. As your skills increase
you can start to look at more advanced alternatives.
I'd like to thank the readers who took the time to add a comment about my
first two articles. Those comments, both the ones who agree as well as disagree, provide a valuable extension to all the articles posted here on
SQLServerCentral.com. Especially for new developers and DBA's, there are a lot of differing opinions on issues like these and it's hard to know which is right,
so the chance to see what other people think about the content is really
helpful. Whether you agree or disagree, how about telling everyone why? Click
the 'Your Opinion' tab below and tell us what you think!