Toward Integrity
Modeling Constraints in the Database
Article 5: Entity supertype/subtype
Note: This is part 3 of a multi-part series. You can find the other articles here:Part 1, Part 2,Part 3, and part 4.
This is the fifth and final article of a series in which we explore a variety of business rules. We'll take a look at their expression in a conceptual model, and then drill into the details of how they can be modeled at the logical level, and ultimately implemented using T-SQL. In the first article, I introduced the overall conceptual model and explained some of the business rules that apply. In this article, we focus on the "Pet", "Cat" and "Dog" entities and their relationship to each other.
Note: I remind the reader here that the our bias for the logical model is toward correctness and simplicity rather than, e.g., flexibility . Also, the physical implementation will reflect the logical model more closely than might be prudent in practice.
For reference, figure 1 shows the complete conceptual model diagram for our Pets and Vets solution:
Figure 1: Pets and Vets conceptual model
The conceptual model (figure 2) for today's entities of intereset represents an entity supertype/subtype relationship between the generalized entity "Pet" and the specialization entities "Dog" and "Cat" -- in OO-speak, an "is-a" relationship.
Figure 2: Conceptual sub-model
The business rule of interest for this installment is:
- For dogs, we track whether or not they are "AKC Registered". For cats, our concern is whether or not it's been "Declawed".
But the following rule also comes into play:
- A pet is identified by a "License Number" (which is unique within our example jurisdiction, regardless of the species of pet).
At this point, I would like to bring your attention an ambiguity present in the conceptual model. Note that we can interpret the diagram in at least four ways:
- "Dog" and "Cat" are the only species of "Pet" allowed, and a "Pet" must express the additional information present in the extension.
- "Dog" and "Cat" are the only species of "Pet" allowed, and a "Pet" may express the additional information present in the extension (e.g., a "Pet" that happens to be a "Dog" may or may not store information about AKC registration).
- "Dog" and "Cat" are a subset of all species of "Pet", but are the only ones about which we will store additional information. Additionally, if a "Pet" is either a "Dog" or a "Cat", the extension information must be present
- "Dog" and "Cat" are a subset of all species of "Pet", but are the only ones about which we will store additional information. Additionally, if a "Pet" is either a "Dog" or a "Cat", the extension information may or may not be present
It is the last interpretation that we will aim for in the implementation of this database constraint. This is partly due to the weak support in SQL for expressing certain classes of constraint. More on that later.
In the logical model (figure 3), the three entities are converted to tables with a foreign key "from" each subtype table (Dogs, Cats) "to" the supertype table (Pets). The Dogs and Cats tables "inherit" the candidate key {LicenseNbr} from the Pets table. By sharing a candidate key in this way, the cardinality on the relationship from supertype to subtype is 1:[0..1]. The exclusivity constraint is shown on the logical model diagram by drawing an arc across the relationship lines for supertype and subtypes.
The AKCRegistered and Declawed attributes in each table are of a hypothetical domain "Y/N" which represents a simple "Yes or No" value. The domain and column constraints will be implemented via the familiar datatype + CHECK constraint in T-SQL.
Figure 3: Logical model of the generalization/specialization design
The next step in the refinement of the logical model is to establish a mechanism for enforcing the exclusivity part of the supertype/subtype relationship. To start with, we will propagate the SpeciesName attribute in a foreign key relationship from the Species table to each of the subtype tables, Dogs and Cats (figure 4).
Figure 4: Logical model extending {SpeciesName} attribute to the specialization tables
For the Cats table, SpeciesName will only be allowed to take the value "Cat", and for the Dogs table, it can only be "Dog". These constraints represented on the drawing with annotations will be implemented using simple CHECK constraints.
The supertype entity "Pet", modeled by the table "Pets" can now leverage the SpeciesName attribute to determine which (if any) of the subtype tables are applicable. The attribute SpeciesName, when used this way, is called an inspection attribute [DE HANN, KOPPELAARS, 2007].
Reviewing the physical model diagram (figure 5), we see how we can again leverage a reducible superkey (see earlier articles) in the Pets table, {LicenseNbr, SpeciesName}, to enforce that not only will all table values for Dogs and Cats match a Pets row on LicenseNbr, but will match on SpeciesName as well. The exclusivity between the two tables is enforced by the CHECK constraints on SpeciesName in each. Therefore, we can never have, for example, a particular Pets row that is related to both Dogs and Cats.
Figure 5: Physical model showing the reducible superkey in Pets
So we now have exclusivity between the supertype and subtype tables. At the same time, we notice that there is nothing to ensure that a particular Pets row that happens to model a Dog or a Cat must have a corresponding row in the appropriate subtype table. This gets back to the interpretation we chose for the conceptual model earlier. Let's diverge for a moment and think about what it would take to enforce a rule that if a specialization exists, the generalization table must use it when the subtype matches.
The most straightforward way would be if we could declare a constraint comparing the cardinality of the 'Dog' and 'Cat' rows in Pets with the cardinality of the union of the 'Dogs' and 'Cats' tables. If they're equal, the constraint is satisfied. The first problem we would run into, though, is actually expressing the constraint. The second problem would be evaluating the constraint in the face of multiple table updates - a "chicken-and-egg" problem.
The SQL standard specifies the CREATE ASSERTION syntax for expressing this type of constraint, but it is not supported in SQL Server. We might be able to emulate this behavior with user-defined functions in T-SQL. Assuming we could do that in a way that yields decent performance, we are left with the "chicken-and-egg" problem.
Oracle solves the "chicken-and-egg" issue by allowing one to defer constraint checking to transaction boundaries. This is a workaround for the proper solution, namely to support a syntax for multiple assignment [DATE, 2005]. In T-SQL, we don't have a similar deferred checking feature, though, so we would need to go through some type of "ignore constraint -> perform updates -> reinstate constraints" procedural logic to accomplish the desired results. Either that, or go with a trigger-based enforcement mechanism (which is probably even more undesirable, see article 3 in the series).
Digression aside, let's now take a look at the T-SQL to create the new subtype tables (listing 1). Remember from article 3 that we already have the {LicenseNbr, SpeciesName} superkey in place, so no additional changes are needed for table Pets.
CREATE TABLE dbo.Dogs(
LicenseNbr int NOT NULL,
SpeciesName varchar(24) NOT NULL
CONSTRAINT fk_Dogs_to_Species
FOREIGN KEY REFERENCES dbo.Species(SpeciesName)
CONSTRAINT ck_Dogs_SpeciesName
CHECK (SpeciesName = 'Dog'),
CONSTRAINT fk_Dogs_to_Pets
FOREIGN KEY (LicenseNbr, SpeciesName)
REFERENCES dbo.Pets(LicenseNbr, SpeciesName),
AKCRegistered char(1) NOT NULL
CONSTRAINT ck_Dogs_AKCRegistered
CHECK (AKCRegistered IN ('Y', 'N'))
)
GO
CREATE TABLE dbo.Cats(
LicenseNbr int NOT NULL,
SpeciesName varchar(24) NOT NULL
CONSTRAINT fk_Cats_to_Species
FOREIGN KEY REFERENCES dbo.Species(SpeciesName)
CONSTRAINT ck_Cats_SpeciesName
CHECK (SpeciesName = 'Cat'),
CONSTRAINT fk_Cats_to_Pets
FOREIGN KEY (LicenseNbr, SpeciesName)
REFERENCES dbo.Pets(LicenseNbr, SpeciesName),
Declawed char(1) NOT NULL
CONSTRAINT ck_Cats_Declawed
CHECK (Declawed IN ('Y', 'N'))
)
GO
Listing 1: Dogs and Cats tables including foreign key constraints
And we finish off with the "unit test" for our supertype/subtype design (listing 2).
PRINT '
Valid: Dogs'
INSERT INTO dbo.Dogs(LicenseNbr, SpeciesName, AKCRegistered)
SELECT 1001, 'Dog', 'Y' UNION ALL
SELECT 1002, 'Dog', 'N'
GO
PRINT '
Invalid: Dogs - bad LicenseNbr'
INSERT INTO dbo.Dogs(LicenseNbr, SpeciesName, AKCRegistered)
SELECT 2001, 'Dog', 'Y'
GO
PRINT '
Invalid: Dogs - bad SpeciesName, check'
INSERT INTO dbo.Dogs(LicenseNbr, SpeciesName, AKCRegistered)
SELECT 1003, 'Cat', 'Y'
GO
PRINT '
Invalid: Dogs - bad SpeciesName, FK reference'
INSERT INTO dbo.Dogs(LicenseNbr, SpeciesName, AKCRegistered)
SELECT 1003, 'Dog', 'Y'
GO
PRINT '
Invalid: Dogs - bad AKCRegistered'
UPDATE dbo.Dogs
SET AKCRegistered = 'X'
WHERE LicenseNbr = 1001
GO
PRINT '
Valid: Cats'
INSERT INTO dbo.Cats(LicenseNbr, SpeciesName, Declawed)
SELECT 1003, 'Cat', 'Y' UNION ALL
SELECT 1004, 'Cat', 'N'
GO
PRINT '
Invalid: Cats - bad LicenseNbr'
INSERT INTO dbo.Cats(LicenseNbr, SpeciesName, Declawed)
SELECT 2001, 'Cat', 'Y'
GO
PRINT '
Invalid: Cats - bad SpeciesName, check'
INSERT INTO dbo.Cats(LicenseNbr, SpeciesName, Declawed)
SELECT 1003, 'Dog', 'Y'
GO
PRINT '
Invalid: Cats - bad SpeciesName, FK reference'
INSERT INTO dbo.Cats(LicenseNbr, SpeciesName, Declawed)
SELECT 1001, 'Cat', 'Y'
GO
PRINT '
Invalid: Cats - bad Declawed'
UPDATE dbo.Cats
SET Declawed = 'X'
WHERE LicenseNbr = 1003
GO
Listing 2: Unit test for specialization tables
Conclusion
For this article, we studied a design and implementation for entity supertype/subtype relationships. We saw how the conceptual (business) model must be disambiguated at analysis time, and explored the implications at the logical and physical level given differing interpretations. And, again, we finished with a reference implementation in the form of T-SQL scripts.
Series Wrap-up
As Chris Date and Fabian Pascal have pointed out in various writings, database design is nothing more (nor less) than constraint specification. My hope for the series is that the reader was exposed to design (constraint declaration) ideas they may not have considered before. Approaching constraint specification in alignment with the classification scheme outlined in the first article leads to a methodical approach to design. I think that data quality starts and ends with thoughtful analysis, careful logical modeling and physical implementation, fully cognizant of tradeoffs being made each step of the way.
References
de Haan, Lex; Koppelaars, Toon, Applied Mathematics for Database Professionals (Apress, 2007)
Date, C.J., Database In Depth - Relational Theory for Practitioners (O'Reilly 2005)
About the author: Troy Ketsdever is a database specialist with over 15 years of commercial softwaredevelopment experience. His main objective and vision is "making the right information available to theright people at the right time".
The author would like to thank Joseph Kelly for his help with reviewing the article drafts for this series, and Logan Schoonmaker for the "pets and vets" sample problem.