Lookup Table Design Overview
When I begin to design a new database for a client, one of the dilemmas facing me is how to construct lookup tables.
When we say that a table is a lookup table for another main table, we mean that the lookup table contains values that will be used as the content of a column located in the main table. Usually the lookup table has at least two fields – CD (Code) and Description. The main table stores only the values from the CD field, but since there is a relationship between the lookup and the main table, the value of the associated Description field is also associated with the record in the main table. On the GUI’s level , lookup tables are often implemented as list boxes or combo boxes. For example: state code and state description.
There are 3 main design concepts for the lookup tables. Lets review each concept and discuss some real life scenarios.
Design 1 - Traditional
Each code has its own table with at least two fields – Code and Description (see diagram 1)
This design is simple, understandable for every developer and business person, and self explanatory. And this is why this type of the lookup table design is implemented in most databases. About 25 – 40 % of all lookup tables in a typical database are implemented using this design.
Design 2 – Combine Codes into a few tables
Usually one or two tables are created to combine all codes altogether. (See diagram 2)
Based on 2 tables the codes can be stored in database. A separation between various types of codes is done by a column “code_type”, so that all codes of the same type can be grouped together. For example:
Select lookup_cd, lookup_desc, code_type From code_lookup Where code_type = ‘ADR’
will return all address codes.
The design looks simple, but less self explanatory than the one table per code. It practically always requires the use of ‘where’ clause while join or viewing the specified type of code. The size of the code_lookup table can reach thousands of rows. Still, there are only 2 tables. This is simplistic approach for design 2. It is possible to make the design using one recursive table (see diagram 4). LOOKUP_CD_1 refer to LOOKUP_CD. And the top level of a code keeps LOOKUP_CD_1 with NULL value (or the same value as LOOKUP_CD). It is not a really practical solution because of the difficulties for the data selection with unlimited recursion depth. Even if the required depth of recursion is limited, design will be impractical. Further in this article I will show a more sophisticated and practical sample of this type of design.
Design 3 – Mixed Code Design
This one is a combination of the first and the second designs together. In a lot of cases, there are some industry standard lookup tables and some company specific ones. It may be more practical to separate some codes and combine the others.
For example: if you are using postal zip codes ( or phone area codes) and update them quarterly from USPS - ZIP Code® Directories , it is a good idea to separate them from the other codes. In some other cases, codes may be formatted differently.. For example: country (ISO 3166) or language (ISO 639) codes. A company may use the standard ISO char(2) and/or custom char(3) format. In those situations the mixed solution become more practical.
Real world example of design 2
Let me try outlining a practical implementation for design 2. To be able to satisfy a real business, different conditions should be met:
a)Ability to store various standard codes (For example: ISO, SIC, etc), as well as company or vendor specific codes
b)Ability to separate codes more granularly by type and sub-type
c)Be able to setup a sub-code of a code. For example: there are certain languages spoken in a country. Country and language codes should be entered and then linked to each other.
d)Ability to setup a corporation specific description and/or name versus the standard one.
Keep active and retired codes for audit purposes
Diagram 3 shows a sample design with all the business conditions from above. There are 4 tables which can keep all codes for the database.
Table CODE_TYPE captures the various code types. A code type can refer to an object or a vendor or a transmission type or a
language or an error code etc… For example:
CODE_TYPE_CD | CODE_TYPE_DESC |
GEO | GEOGRAPHIC CODES |
ERR | ERROR CODES |
Table CODE_TYPE_GROUP captures the various groups within code types. A group within a error code type can refer to a various means of errors: application, FTP, email, SQL Server, custom errors.
CODE_TYPE_GROUP_CD | CODE_TYPE_CD | CODE_TYPE_GROUP_DESC |
EFTP | ERR | FTP errors |
EAPP | ERR | Application errors |
COUN | GEO | Country codes |
STAT | GEO | State codes |
Table CODE captures the various codes.
Columns:
CODE_ID - A unique identifier
CODE_NO – Keep specific numbers for the standard codes. For example: FTP has its own error numbers for any specific error.
CODE_TYPE_GROUP – define the group type for the code
CODE_DESC - Provides a description associated with code
CODE_CONSTANT: Used primarily for the standard codes to keep the code value. For example: 3 characters for ISO standard country code.
STATUS – shows the status of the code. For example: ‘A’ – active, ‘D’ – deleted (non active), ‘N’ – no standards
STANDARD_FLAG – shows which standards code belongs to. For example: ‘I’ - ISO, ‘S’ - SIC
CORP_DESC – special (non standard) corporate description, if necessary.
CODE_ID | CODE_NO | CODE_TYPE_GROUP_CD | CODE_DESC | CODE_CONSTANT | STATUS | STANDARD_FLAG | CORP_DESC |
1 | 23 | EFTP | FTP ERR 1 | ERROR_FTP_WRITEFILE | A | N | |
2 | COUN | USA | A | I | US | ||
3 | LANG | ENG | A | I |
And finally, CODE_TO_CODE table keep links between codes.
CODE_ID_1 | CODE_ID_2 | CODE_TO_CODE_DESC | STATUS |
2 | 3 | USA English | A |
2 | 4 | USA Spanish | A |
As you can see, those 4 tables allows handle any code types including government or industry standard codes. Sometimes additional columns may be required to satisfy additional criteria’s. I am using ‘codename’, ‘codetypegroupname’, ‘codetocodename’ and other columns if additional information is required to be stored.
In real world, there is almost impossible to use design 2 in it’s pure form. Usually, the architect chooses design 1 or (less often) design 3.
Lookup Tables Location
Many companies work with completely separated databases and do not want to mix the tables between databases or schemas.
But there are situations where it is reasonable to combine lookup tables (and not only lookup as well) into one administrative database. For example, companies specializing in clinical trial research or telemarketing are working with hundreds of databases for the various clients but using the same non client specific codes such as languages, states, countries, application specific errors, transmissions, etc. Those codes can be centralized in one database. It will provide a greater security, encapsulation, reusability, data integrity, and maintainability. It allows for keeping lesser number of tables in each production database and makes database structure simpler. Plus, and that very important, lookup data is always synchronized between databases and can be maintained by one person. However, it is almost impossible to implement such scenario with design 1 because of sheer number of tables and references to the administrative database. I implemented lookup table solution using design 3 for some of my clients with 100+ clinical trials. Non-client specific codes become centralized in one database, while client specific data is stored in each trial database. Each trial database is created with some predefined views (select permissions only) that allow usage of reference codes while residing in the trial database.
Create view VW_CODE as Select C.CODE_ID, C.CODE_NO, C.CODE_TYPE_GROUP, C.CODE_CONSTANT, C.STATUS, C.STANDARD_FLAG, C.CORP_DESC, CTG.CODE_TYPE_GROUP, CT.CODE_TYPE From DBNM.DBO.CODE C INNER JOIN DBNM.DBO.CODE_TYPE_GROUP CTG ON CTG.CODE_TYPE_GROUP = C.CODE_TYPE_GROUP INNER JOIN DBNM.DBO.CODE_TYPE CT ON CT.CODE_TYPE = CTG.CODE_TYPE Create view VW_CODE_TO_CODE as Select CTC.CODE_TO_CODE_DESC, CTC.CODE_ID_1 , C.CODE_NO AS CODE_NO_1, C.CODE_DESC AS CODE_DESC_1 , C.CODE_CONSTANT AS CODE_CONSTANT_1 , C.CODE_TYPE_GROUP AS CODE_TYPE_GROUP_1 , CTC.CODE_ID_2 , C1.CODE_NO AS CODE_NO_2 , C1.CODE_DESC AS CODE_DESC_2 , C1.CODE_CONSTANT AS CODE_CONSTANT_2, C1.CODE_TYPE_GROUP AS CODE_TYPE_GROUP_2 FROM DBNM.DBO.CODE_TO_CODE CTC INNER JOIN DBNM.DBO.CODE C ON CTC.CODE_ID_1 = C.CODE_ID INNER JOIN DBNM.DBO.CODE C1 ON CTC.CODE_ID_2 = C1.CODE_ID
The disadvantage for this implementation is that the referential integrity can’t be controlled by a foreign key constraint.
Referential integrity was implemented by using triggers.
Conclusion
This article shows only the logical way of thinking for the lookup table design. Lookup table design implementation may have some variations from the described above and should be implemented very carefully based on the specific business conditions.