Toward Integrity
Modeling Constraints in the Database
Article 2: Tackling some simple constraints
Note: This is part 2 of a multi-part series. You can find the other articles here:Part 1.
This is the second article of a series in which we explore a variety of business rules. We'll take a look at their expression in a conceptual model, and then drill into the details of how they can be modeled at the logical level, and ultimately implemented using T-SQL. In the first article, I introduced the overall conceptual model and explained some of the business rules that apply. In this article, we focus on the "Species", "Vet", and "Pet" entities.
Note: I remind the reader here that the our bias for the logical model is toward correctness and simplicity rather than, e.g., flexibility. Also, the physical implementation will reflect the logical model more closely than might be prudent in practice.
For reference, figure 1 shows the complete conceptual model diagram for our Pets and Vets solution:
Figure 1: Pets and Vets conceptual model
Figure 2 shows the entities of interest for this article:
Figure 2: Conceptual sub-model; Species, Vet, and Pet
The business rules of interest for today's article are:
- A species may be seen by zero, one, or more vets / A vet is licensed to see one and only one species of animal.
- A species may be a classification for zero, one, or more pets / A pet must be an example of one particular species.
- A vet is assigned an "Experience Rating". For this example, the valid values are "Junior" and "Senior".
- A pet is identified by a "License Number" (which is unique within our example jurisdiction, regardless of the species of pet).
- A pet is assigned a "Disposition Rating", which may take either "EasyGoing" or "Difficult" as a value.
To start with, we make a rather straightforward translation to the logical model (figure 3):
Figure 3: Logical sub-model
Species
The "Species" entity from the conceptual model translates rather easily to the table "Species". A particular Species is uniquely identified by its species name (column "SpeciesName"), which is modeled using a character string at the logical level. Each species name should be stored using the singular noun.
The astute reader will notice a bit of sleight-of-hand going on here. Logically, we have a domain of "Species Names" -- a valid set of strings that name a set of species that take singular nouns. We are, however, using a more generic domain -- namely "string" -- to model them. Nothing is preventing the acceptance of certain types of erroneous data (e.g., recording both "Dog" and "Dogs" as distinct species).
This is where we see the first example of poor support for true relational domains. Lacking a facility to easily enforce the "singular noun" rule, we must rely on someone performing the role of data steward to ensure that the business rule is met. Some sort of automated dictionary facility could also prevent problems due to misspellings and plurality.
Modeling "SpeciesName" as the primary key of the table is an example of a table constraint. In fact, all primary key constraints are table constraints, as described in the first article.
Pets
Our "Pets" entity is identified by the value in its "LicenseNbr" column. This fact is noted on the conceptual model diagram by the inclusion of an asterisk next to the "License Number" attribute, and on the logical model diagram by modeling it as the primary key for the table. The domain given in the logical model is "PetLicense", which specifies the acceptable values that LicenseNbr can take. For simplicity of the example, we will consider any positive, non-zero integer to be a valid license number. A more detailed example might include some form of checksum algorithm to help validate license numbers.
The pet's "Name" column is simply a character string value. We're going to allow pretty much anything in here, including symbols. That way, we will be able to store information on "the dog formerly known as Fido", provided we have a symbol for said dog.
The DispositionRating column is of the domain DRating. Logically, DRating is allowed to take the values "EasyGoing" or "Difficult". Again, since we don't have true support for domains in T-SQL, this constraint will be implemented using a CHECK constraint on the column. Another common approach that experienced data modelers take is to create a so-called "lookup table" of valid values for the column to take.
Vets
Moving on to the "Vets", we see that we identify a vet via their TaxID number. It is assumed in the example jurisdiction that every Vet will have a Tax Identification Number assigned as specified on the IRS website:
"An Individual Taxpayer Identification Number (ITIN) is a tax processing number issued by the Internal Revenue Service. It is a nine-digit number that always begins with the number 9 and has a 7 or 8 in the fourth digit, example 9XX-7X-XXXX." This will be a good rule to implement via a CHECK constraint at the physical level.
A vet has a name, which, like pets, is modeled using a simple string datatype. The "ExperienceRating" is similar in structure to the pet's "DispositionRating" - it takes one of two possible values ("Junior" or "Senior"), and will be implemented in the same way (using a CHECK constraint on the column's definition).
Relationships
The association "A species may be a classification for zero, one, or more pets / A pet must be an example of one particular species" is modeled using a foreign key constraint wherein the Pets.SpeciesName column references the Species.SpeciesName column. A foreign key is an example of a database constraint which, stated simply, says that all values in the referencing table's column (or columns) must match a value in the referenced table's corresponding column(s). In SQL, the referenced table's columns must have a table constraint that enforces the uniqueness of the values in those columns for each row.
Another way to think of a foreign key constraint is that the column(s) in the referenced table provide a "pool" of legal values that the referencing table's column(s) can take. Although a rather informal way to view foreign keys, it will serve us well in subsequent articles when we deal with some of the more complex multi-table constraints.
We see, then, that it is fairly straightforward to apply the relational model in order to translate the conceptual model into the logical model for this subset of entities.
To move from the logical model to the physical implementation, we translate the various specialized domain and column constraints to T-SQL column definitions along with their CHECK constraints. This translation is quite simple for today's installment, so I'll skip the picture of the physical model and go straight to T-SQL code. Listing 1 shows the T-SQL code to create the Species, Pets, and Vets tables.
CREATE TABLE dbo.Species(
SpeciesName varchar(24) NOT NULL
CONSTRAINT pk_Species
PRIMARY KEY
)
GO
CREATE TABLE dbo.Pets(
LicenseNbr int NOT NULL
CONSTRAINT pk_Pets
PRIMARY KEY
CONSTRAINT ck_Pets_LicenseNbr
CHECK (LicenseNbr > 0),
PetName varchar(16) NOT NULL,
DispositionRating varchar(12) NOT NULL
CONSTRAINT ck_Pets_DispositionRating
CHECK (DispositionRating IN ('EasyGoing', 'Difficult')),
SpeciesName varchar(24) NOT NULL
CONSTRAINT fk_Pets_to_Species
FOREIGN KEY REFERENCES dbo.Species(SpeciesName)
)
GO
CREATE TABLE dbo.Vets(
TaxID int NOT NULL
CONSTRAINT pk_Vets
PRIMARY KEY
/* An Individual Taxpayer Identification Number (ITIN)
is a tax processing number issued by the Internal
Revenue Service. It is a nine-digit number that always
begins with the number 9 and has a 7 or 8 in the
fourth digit, example 9XX-7X-XXXX. */
CONSTRAINT ck_Vets_TaxID
CHECK (TaxID LIKE ('9__[7,8]_____')),
VetName varchar(16) NOT NULL,
ExperienceRating varchar(12) NOT NULL
CONSTRAINT ck_Vets_ExperienceRating
CHECK (ExperienceRating IN ('Junior', 'Senior')),
SpeciesName varchar(24) NOT NULL
CONSTRAINT fk_Vets_to_Species
FOREIGN KEY REFERENCES dbo.Species(SpeciesName)
)
GO
Listing 1: Species, Pets, and Vets
It's a good idea to get in the habit of verifying that the intended constraints are working correctly. This can be accomplished by attempting to insert various combinations of valid and invalid data into the test database. Listing 2, which follows, is a simple script to do just this. Checking both the success and failure conditions in this way gives us a sort of "unit test" that can be used to validate the implementation. For this and all subsequent articles, a short unit test script will be included so that we can be sure the constraints are working as expected.
PRINT '
Valid: Species'
INSERT INTO dbo.Species(SpeciesName)
SELECT 'Cat' UNION ALL
SELECT 'Dog' UNION ALL
SELECT 'Fish'
GO
PRINT '
Valid: Vets'
INSERT INTO dbo.Vets(TaxID, VetName, ExperienceRating, SpeciesName)
SELECT 900700001, 'Dr. Smith', 'Senior', 'Cat' UNION ALL
SELECT 900700002, 'Dr. Jones', 'Junior', 'Dog' UNION ALL
SELECT 900700003, 'Dr. Star', 'Senior', 'Dog'
GO
PRINT '
Invalid: Vets - bad SpeciesName'
INSERT INTO dbo.Vets(TaxID, VetName, ExperienceRating, SpeciesName)
SELECT 900700004, 'Dr. Error', 'Senior', 'Horse'
GO
PRINT '
Invalid: Vets - bad ExperienceRating'
INSERT INTO dbo.Vets(TaxID, VetName, ExperienceRating, SpeciesName)
SELECT 900700004, 'Dr. Error', 'Masterful', 'Cat'
GO
PRINT '
Invalid: Vets - bad TaxID'
INSERT INTO dbo.Vets(TaxID, VetName, ExperienceRating, SpeciesName)
SELECT 900600001, 'Dr. Error', 'Senior', 'Cat'
GO
PRINT '
Valid: Pets'
INSERT INTO dbo.Pets(LicenseNbr, PetName, DispositionRating, SpeciesName)
SELECT 1001, 'Fido', 'EasyGoing', 'Dog' UNION ALL
SELECT 1002, 'Rover', 'Difficult', 'Dog' UNION ALL
SELECT 1003, 'Fluffy', 'EasyGoing', 'Cat' UNION ALL
SELECT 1004, 'Simba', 'Difficult', 'Cat'
GO
PRINT '
Invalid: Pets - bad SpeciesName'
INSERT INTO dbo.Pets(LicenseNbr, PetName, DispositionRating, SpeciesName)
SELECT 2001, 'ErrorPet', 'EasyGoing', 'Horse'
GO
PRINT '
Invalid: Pets - bad DispositionRating'
INSERT INTO dbo.Pets(LicenseNbr, PetName, DispositionRating, SpeciesName)
SELECT 2001, 'ErrorPet', 'Frantic', 'Cat'
GO
Listing 2: "Unit tests" against constraints
Conclusion
In this article, we took a look at some of the simple constraints present in our models. By now, the reader should have a good understanding of the terminology to be used in this series of articles, thus providing a basis for exploring more complex constraints. We have also introduced a means of verifying the correct behavior of the database using simple T-SQL "unit test" scripting.
In the next installment of this series, we will take a more detailed look at the Pets - Vets relationship
About the author: Troy Ketsdever is a database specialist with over 15 years of commercial softwaredevelopment experience. His main objective and vision is "making the right information available to theright people at the right time".
The author would like to thank Joseph Kelly for his help with reviewing the article drafts for this series, and Logan Schoonmaker for the "pets and vets" sample problem.