Design Oversight - Preliminary Review
We all know what the ideal application design environment is for building a
database back-end: an experienced DBA takes inputs from end users and developers
and creates the database design in order to support the application being
developed. Stored procedures, tables, views, indexes, the works, are all
developed by DBAs working closely with the programmers writing the
application. Database best practices are the rule, not the
exception.
But in reality, we don't get the opportunity to do application design like
this very often. The ideal is quickly shoved aside and often developers
are doing their own design and building their own stored procedures and making
decisions on constraints and indexing in order to get an application out the
door the day before yesterday. Meanwhile, the DBA is busy maintaining the
last few applications that went in and ensuring that the database servers are
performing as they should. Every so often, the DBA has the time to take a
close look at what the developers are doing. Usually, this is right before
some milestone when the next piece of the application is supposed to be done and
a demo to management is expected. The DBA looks at the number of tables
and realizes, "I don't even know where to begin!" Sound
familiar?
The problem of where to begin is a common one. If we were to try and
run an sp_help against every table, we might be at it forever. And
that's not even looking at the stored procedures. So we need to do some
sort of preliminary review. Basically, we need some tools to be our
"smoke detector" to spot a potential fire. Here are some
things we can look at:
- Tables without indexing
- Tables without primary keys (a subset of the previous one)
- Tables without foreign keys
- Stored procedures using cursors
Keep in mind, this list is just a smoke detector type of test. Its
purpose is to give some warning if something is suspicious, just like the smoke
detector in the hall goes off when we forget to turn on the fan over the
stove. There isn't necessarily a problem, but we are alerted that there
might be. And our smoke detector isn't going to be able to alert us
when the banks of the river overflow and our house floods. So it's not
going to flag everything. But it does give us that crucial "where to
start."
Here's a warning: the queries I'll present use the system tables (SQL Server
versions 7.0 and 2000).
Microsoft warns that it has the right to make changes to the system tables
without notice. Microsoft recommends using INFORMATION_SCHEMA views
instead, but they don't provide all the information we need or they are unwieldy
for our particular purposes. As a result, we need to
go to the system tables. Should Microsoft make a change in a
Service Pack or next release, the queries may have to be modified accordingly.
Tables Without Indexing
Tables without indexing are a concern for obvious reasons. One of the
things to remember is that primary keys are instituted with indexes, so if a
table doesn't have an index, it doesn't have a primary key. What we're
concerned with here, however, is more than just tables without primary
keys. Tables without any indexes at all are generally warning signs that
the developers don't understand the purpose of indexes and thus have not
implemented them to aid performance. There are exceptions to this
rule. For example, if a table exists as a heap table simply for the
purpose of reporting or the like, we may choose not to place an index on the
table (including a primary key, simply because an index is created) in order to
allow writes to that table to be as quick as possible.
Now in the script to follow, there is a derived table from the sysindexes
table. Basically, we want to create a derived table that doesn't include
an indid of either 0 or 255. For indid = 0 refers to the table
itself. An indid = 255 refers to a text, ntext, or image column for a
particular table. We're looking for clustered and non-clustered indexes,
so we need to look at indid between 1 and 254, 1 being a clustered index and
> 1 being a non-clustered index. I've taken the liberty of adding an
ORDER BY simply for aesthetic reasons:
/* Script to check for tables with no indexes. */
SELECT USER_NAME(so.uid) Owner, so.name [Table]
FROM sysobjects so
LEFT JOIN (SELECT id FROM sysindexes
WHERE indid BETWEEN 1 AND 254) si
ON so.id = si.id
WHERE so.type = 'U'
AND si.id IS NULL
ORDER BY Owner, [Table]
Tables Without Primary Keys
Since primary keys are established using indexes, if we look for tables
without primary keys, we're actually looking at a subset of tables without
indexes. However, it is often beneficial to lock in and point out
specifically what tables don't have primary keys, especially when talking with
developers. Chances are if we say, "These are the tables without
primary keys" we'll get a better understanding and response than if we say,
"These are the tables without indexes," simply because a developer may
not realize the connection between primary keys and indexes. So while the
previous query returns the same information and more as the following query, I
provide it for the sake of isolating those tables without primary keys.
We'll be using the sysconstraints table here, because that lets us look
at the status column. Now while the status column is int, we'll need to
perform a bitwise operation to get at the primary key (we'll do the same for
foreign keys in a bit). If we take a look at Books Online for sysconstraints, we'll
see the following information for the status field:
Value | Meaning |
1 | Primary Key |
2 | Unique Key |
3 | Foreign Key |
4 | CHECK constraint |
5 | DEFAULT constraint |
16 | Column-level constraint |
32 | Table-level constraint |
We'll have to use the bitwise and (&) operator, performing an & 7 which will only return
1 for a
given status field if the first bit is populated and the second and third bits
aren't, indicating we've got a primary
key on that table. The reason we need to do an & 7 is pretty straight
forward if we look at the numbers in binary notation: 001 = 1 and that's a
primary key, 011 = 3 and that's a foreign key, 101 = 5 and that's a
DEFAULT constraint. In all three cases we're seeing the first bit
populated. If we do an & 7, and the second or third bits are
populated, we'll either get back 3 or 5, as the case may be. If only the
first bit is populated, then we get back a 1. And that's what we want.
For this particular query, I'm ensuring dtproperties
does not appear, which is considered a user table, even though it's for storing
database diagrams. If you have replication already established (though
this isn't likely in development), you can filter out those tables as well,
because they are marked as user tables.
/* Script to check for tables with no primary keys. */
SELECT USER_NAME(so.uid) Owner, so.name [Table]
FROM sysobjects so
LEFT JOIN (SELECT id FROM sysconstraints
WHERE status & 7 = 1) sc
ON so.id = sc.id
WHERE so.type = 'U'
AND sc.id IS NULL
AND so.name <> 'dtproperties'
ORDER BY Owner, [Table]
Tables Without Foreign Keys
Tables without foreign keys can be an indicator that the developers aren't
taking a close look at referential integrity. In most any database there
will be tables without foreign keys (for example, lookup tables), but these
would normally be very few in number unless because of business rules we're
having to resort to other methods (such as triggers) to enforce integrity. If we're talking lookup tables, we
can have some sort of naming convention which identifies a lookup table clearly
as such and we can filter out those. In any case, we'll be hitting the
sysconstraints table again, except this time we'll be looking for the value 3, which means we'll be doing an & operation using
3. Since 3 in binary is 011 and 5 is 101, we can simply & 3 here,
since after 5 we jump to 16 (10000) and then 32 (100000). The & 7 will
work, too, but & 3 is sufficient.
/* Script to check for tables with no foreign keys. */
SELECT USER_NAME(so.uid) Owner, so.name [Table]
FROM sysobjects so
LEFT JOIN (SELECT id FROM sysconstraints
WHERE status & 3 = 3) sc
ON so.id = sc.id
WHERE so.type = 'U'
AND sc.id IS NULL
AND so.name <> 'dtproperties'
ORDER BY Owner, [Table]
Stored Procedures Using Cursors
Programmers tend to like cursors, because cursors fit with the procedural
paradigm of most development languages. It can often be a great learning
curve to get out of thinking along procedural lines and into a set-based
paradigm. We know that SQL Server has support for cursors, but these are
best regarded as a last resort. There will be cases where cursors are the
only option. But if we pull a list of stored procedures using cursors, we
can look through them to verify that what the developer is trying to do can't be
done any other way. For this effort, we'll have to hit against the
syscomments table, and we'll have to use the LIKE operator. So this
particular query can consume some resources. It should be noted that
the way this query is written, we'll also capture any triggers using cursors as
well, in case our developers are using them.
If developers have used the
WITH ENCRYPTION, it won't do any good looking at the syscomments table for that
particular stored procedure. So we'll filter out stored procedures that
use such. Also, since there could be multiple rows in the syscomments for
a given stored procedure definition, we'll use a select distinct since we're
interested in the owner and the name the stored procedure and only need to see
that once. So of all the
"tools" presented, this query is certainly the most resource intensive.
/* script to check for stored procedures using cursors */
SELECT DISTINCT USER_NAME(so.uid) Owner, so.name [Stored Procedure]
FROM sysobjects so JOIN
(SELECT id FROM syscomments
WHERE encrypted = 0
AND [text] LIKE '%cursor%') sc ON sc.id = so.id
ORDER BY Owner, [Stored Procedure]
Concluding Remarks
As I stated in the beginning, these queries point out potential problem areas
within the database portion of an application design. They won't spot
everything. They may flag things that are all right. However, they do
give us a starting point if we've not had the optimal amount of oversight for an
application development project. By the way, they are also good if we're
looking at how a third party app implements a database. We may not have
any control over the app (and if they've chosen the encrypt their stored
procedures the last tool won't do us any good), but at least we'll be
forewarned. Hopefully these queries will prove useful in your set of
tools.