Worst Practices - Not Using Primary Keys and Clustered Indexes

This is the third in my series of articles on Worst Practices (see Worst

Practices - Part 1 of a Very Long Series! and Worst

Practices - Objects Not Owned by DBO) that so far has generated quite a bit of response from readers. Not everyone agrees with me so far, but it looks like I'm not the only one that sees a lot of "bad" practices in production environments.

So, let's talk about another worst practice - not putting a primary key on

EVERY table and failing to use a clustered key on EVERY table.

I think 9 times out of 10, this just happens by mistake. If you build your

tables in Enterprise Manager and click the primary key icon on the toolbar, it

makes the column both a primary key and the clustered index. This is not

necessarily the best use of a clustered index, but it's better than nothing. If

you build your tables in Query Analyzer, it's up to you to either execute the

TSQL to create the primary key and clustered index (or switch to EM to do that

part).

It's up to you, the DBA or senior developer, to catch those mistakes and get

them fixed before they make it into production. It's a great opportunity to make

sure that your developers understand how important these issues are - I think

sometimes we fall into the trap of thinking that everyone KNOWS why it's

important. Don't assume, ask. If they knew better and just forgot to do it, a

gentle reminder to stay sharp is worthwhile. If they didn't know better - isn't

it time they learned?

Where it gets interesting (to me anyway!) is when someone tries to convince

me they don't need a primary key. Come on - having a primary key is such a core

concept, why would you not have one? I've heard a couple arguments, let's see if

they have any merit.

The first is that because the table is so small, the index will not be used

in the query plan. Quite possibly true. So what? One, you're making the

assumption the table will always be small, what happens when it grows? Will the

developer (or DBA) remember "Hey, I need to add an index to that

table"? More likely it will get fixed only at the point that a performance

problem arises. To the extent that I can, I like to build it right the first

time. Why have to fix a problem later that I can avoid now? How much time does it take to create a primary key and a clustered index compared to figuring out later why you have performance issues? More important than that though - it's not just about query plans, it's about data integrity. Not having a primary key can provide a fatal failure point. Sure, you know not to add a duplicate to the lookup table, but what would happen if someone did? How long would it take to figure it out?

Another is that it's a temp table, so the rules don't really apply. More than

the first argument, I think there is some validity to this position. Not

a lot, but some! I think the most compelling reason I have in favor of this

argument is that in many places where you use a temp table now you could (and probably should) use a variable of type table (available beginning in SQL 2000) which only allows you to define the structure, not indexes. Just remember this - you'll never wrong go by creating a primary key on a table, temp or not.

Creating a primary key (and/or clustered index) is unnecessary overhead for many situations such as temp tables, lookup tables, even history/audit tables. Bull. Yes, every index, every constraint, every default, every trigger, adds some overhead. Should we seek to minimize that overhead in our design? Always. But never at the expense of data integrity. If your system is loaded to the point that you're worried about the overhead of having a primary key on a table you're in deep trouble. The nice thing about primary keys and clustered indexes is you can only have one of each per table - pretty hard to over use them!

Finally, there is the "I don't need a clustered index". A good

example of this might be a lookup table that contains state name and state

abbreviation. You're going to be doing single record lookups, never a range

query which is where a clustered index excels. You're also not going to have a

lot of indexes, probably one on the name and one on the abbreviation where the index will provide the data, no need to do a seek through the table to find

other bits of information associated with that row. For this example, is it a

fair point? I think so. On the other hand, how much overhead would adding the

clustered index impose? It's not getting updated, the data IS the index, so the

cost is very very low.

My recommendation to junior developers is pretty simple. Every table gets

both a primary key and a clustered index. It may not be a "best"

practice, but a damn good starting point is to just add a identity column called

rowid and make it the primary key and the clustered key. As your skills increase

you can start to look at more advanced alternatives.

I'd like to thank the readers who took the time to add a comment about my

first two articles. Those comments, both the ones who agree as well as disagree, provide a valuable extension to all the articles posted here on

SQLServerCentral.com. Especially for new developers and DBA's, there are a lot of differing opinions on issues like these and it's hard to know which is right,

so the chance to see what other people think about the content is really

helpful. Whether you agree or disagree, how about telling everyone why? Click

the 'Your Opinion' tab below and tell us what you think!

Worst Practices - Not Using Primary Keys and Clustered Indexes

Rate

Share

Categories

Share

Rate

Worst Practices - Not Using Primary Keys and Clustered Indexes

Rate

Share

Categories

Share

Rate

Related content

Discuss content posted by MAB

Indexed Views

What is Denormalization?

Full-Text Indexing Tips

Normalizing Dates