Strict Database Standards and Conventions
I recently read Brian Knight's “Database
Standards and Conventions”. I found
that I agreed with his recommendations, but I feel that he doesn’t go far
enough. He writes “I receive more hate mail about this article than anything
I've ever written”, so I guess I’m setting myself up for a torrent of criticism
in publishing this…
Background
I learned this standard in 1996 when I started work on a
project for a large insurance company here in Sweden. Since then, I have since
used these conventions on just about every project I have worked on, and I have
learned to love them. Feel free to contact me with any thoughts or comments on
mattias@cambrianlabs.com. I work as a senior applications developer for
Cambrian Labs AB in Stockholm, Sweden.
Modules
First off, when your conceptual database design is more or
less done, divide it into modules. What modules you use totally depend on what
kind of database you’re designing. For instance; you could have one module for
contacts (addresses to companies and individuals), one module for products, one
module for contracts, one module for agents and so on. As a simple rule, tables
should more often reference tables within it’s module than tables outside it. If
they don’t, they probably belong in another module. Keep the modules fairly
small (perhaps 3 to 25 tables), otherwise the advantages are lost.
This also helps to conceptualize the database design,
learning how the system works is easier one module at a time, than the entire
system all at once. They can also be used when several people are working on the
same project - work can easily be divided along module boundaries.
Number the modules 01, 02, 03 and so on. These module
numbers will be used when naming tables.
Tables
Tables are also numbered, but their number is unique within
the module, not across modules. That means that table 01 can exist in many
modules - but they will still be unique in combination with the module numbers.
The name of the table should be T + module number + table number + “_” +
descriptive name. For instance
·
T0101_CONTACT = the first table in the first module
·
T0102_CONTACT_TYPE = the second table in first module
·
T0201_ARTICLE = the first table in the second module
·
T0202_ARTICLE_AUTHOR = the second table in the second module
This has many advantages;
·
the tables will be sorted in the module order in the Enterprise
Manager
·
finding a particular table in a long list of tables is easy when
you just have to match the prefix
·
it’s easy to tell to what module a table belongs, and therefore
what it does
Don’t worry, you’ll memorize the prefix of any frequently
used table very fast.
I find that I sometimes forget the name of tables I haven’t
used in a while, and listing all tables (when there are many) doesn’t really
help finding that one table I need. But I always remember what module the table
belongs to, so I just write sp_tables “T05%” (or actually, I write tables_like
T05, which is a script much like sp_tables, but it lists the table names first,
so I don’t have to scroll a lot ;)). Out pops something like 10 to 20 tables,
and picking out the one I need is a breeze.
Two tables should never have the same prefix, that would
make things much more difficult in the future. If you find that you have, by
mistake, created two tables with the same prefix, correct the mistake as soon as
possible, also correcting the column names!
Column names
When creating columns, their names are prefixed with the
table prefix. The only exception is foreign keys, but more on that later. So,
the name would be table prefix + “_” + descriptive name. The table
T0101_CONTACT would contain columns like T0101_NAME, T0101_ADRESS_1 and so on.
When you join two tables, there’s never any question to which table a certain
column belongs. It’s not even an issue – it’s obvious from the column name.
If there is a simple primary key (simple as in a single
column), then it is treaded as a special case. If the key is an identity column,
it’s named table name + “_” + id. It would be T0101_CONTACT_ID in this
case. If it’s a code of some kind, it’s named table name + “_” + code.
This is because the primary key is often referenced in foreign keys, and
therefore the full name is very useful to have, as I’ll show below.
As Brian Knight suggests in his paper, abbreviations should
be used sparingly, and using underscore between names is good practice. Oh, if
you forget to add an underscore somewhere in a column name and don’t go back and
change it, then it will come back and bite you. Remembering that one
exception, where there wasn’t an underscore, is very difficult.
Foreign keys
When adding a foreign key (a column that references the
primary key on another table) to a table, the column should have the name of the
original column. If my T0101_CONTACT table refers to a T0102_CONTACT_TYPE, then
the column should be named T0102_CONTACT_TYPE_ID (assuming it is the primary key
of T0102_CONTACT_TYPE). This has huge advantages;
·
It’s always obvious what table a foreign key refers to
·
It’s always obvious what columns actually are foreign keys
(as they have a different prefix)
·
When joining two tables, you have most (if not all) of the
information you’ll be needing to write the query
·
When creating the relational integrity diagram in the enterprise
manager, you’re always connecting two keys of identical names, so no searching
is required - it’s the default selection in the editor.
For instance, running the query “select * from
T0101_CONTACT”, it is immediately obvious that it contains a foreign key called
“ T0102_CONTACT_TYPE_ID”. It’s “obviously” a foreign key because it has a
different prefix. It’s also obvious that it refers to a table called
T0102_CONTACT_TYPE. Therefore it would be possible to write a join between these
two tables without doing any more research;
select
*
from
T0101_CONTACT,
T0102_CONTACT_TYPE
where
T0101_CONTACT.T0102_CONTACT_TYPE_ID =
T0102_CONTACT_TYPE.T0102_CONTACT_TYPE_ID
When one table contains two foreign keys to the same table
they’re named with the original column name plus a descriptive text – as always,
separated with an underscore.
Naming views
The naming convention of views is very similar to that of
tables, except the prefix is V instead of T. This makes it “V” + module
number + view number + “_” + descriptive name
However, it’s rather convenient to have views named like
the table they’re looking at (with the V instead of the T) and maybe a
description of what they do. This way they’re easy to remember, and they’re easy
to find. You remember the name and the number of the table, just look for views
with the same number prefixes and you’re home. This would violate the numbering
scheme in case there was more than one view per table. Also, if the view looks
at many tables (as the most often do), which of these tables should supply the
prefix?
Naming stored procedures
Again, the naming convention of stored procedures is very
similar to that of tables, but we prefix them with a P, as in “P” + module
number + stored procedure number + “_” + descriptive name.
In closing
If you were with me this far, I would like to thank you for
taking the time to read this. I hope you at least found some small piece of
wisdom among all these many words.