SQL Server Lookups and Enums
We are told in database design class that normalisation is a good thing. We
shouldn't repeat values, for many reasons including performance and
maintainability. This means most of us have reference/lookup tables to hold
lists of values. These tables will generally have 2 columns, a surrogate primary
key normally the smallest integer data type as possible and a description column
normally varchar(x).
create table PaymentType (PaymentTypeId tinyint not null ,PaymentTypeDescription varchar(255) not null ,constraint PK_PaymentType primary key (PaymentTypeId) )
What this means is that the related tables contain lots of foreign key columns that one there own have no meaning, they just contain a bunch of
numbers.
create table Order (OrderId int not null ,OrderDate datetime not null ... ,PaymentTypeId tinyint not null )
If you want to retrieve data from the related tables based on the lookup value you have two options. You can either hard code the surrogate keys in your code, or join to the lookup table and use the description.
select Order.OrderId, Order.OrderDate, Order.PaymentTypeId from Order where PaymentTypeId = 3 --Invoice type
or
select Order.OrderId, Order.OrderDate, Order.PaymentTypeId from Order join PaymentType on Order.PaymentTypeId = PaymentType.PaymentTypeId where PaymentTypeDescription = 'Invoice'
Neither of these is ideal. The first makes your code un-readable (unless you use comments) and will only work if you have inserted the data into the lookup table with the same surrogate keys. If you have used an identity column this becomes much more difficult, especially when you have multiple environments.
The second option adds a level of complexity to all of your queries and if you have multiple lookups can affect performance.
In SQL Server 2000 and SQL Server 2005 where my application needs to perform lookups like the examples above I change my lookup table to use a meaningful surrogate key. I do this by changing the surrogate key to a 4 character column. This size is a provides data storage the same as an integer and should allow for meaningful values.
create table PaymentType (PaymentTypeCode char(4) not null ,PaymentTypeDescription varchar(255) not null
,constraint PK_PaymentType primary key (PaymentTypeCode) ) create table Order (OrderId int not null ,OrderDate datetime not null ...
,PaymentTypeCode char(4) not null )
Your code can then look like this,
select Order.OrderId, Order.OrderDate, Order.PaymentTypeCode from Order where PaymentTypeCode = 'INVC'
This is still not ideal because even if you only have 10 lookup values you are still using 4 bytes to store the surrogate key, when you could have used a tinyint and only used 1 byte
Proposed Solution
What would be great would be able to have the best of both worlds, be able to
use meaningful in code whilst maintaining the performance of small integer
values.
If you have programmed outside of TSQL i.e. C# or VB then you should now be
muttering something along the lines of "what we need is enums". Spot on, that is
what I am suggesting "Allow literals and read only table data to be represented as enums"
select Order.OrderId, Order.OrderDate, Order.PaymentTypeCode from Order where PaymentTypeId = PaymentType.Invoice
If you think it would be good to be able to do the following then please vote
for the suggestion on connect, "Allow literals and read only table data to be represented as enums( https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254293)"
About Simon Sabin
Simon is a Independent Consultant focusing on database architecture.
He specializes in performance SQL Server systems and recently on search technologies.
To keep up on Simons thoughts on SQL
Server and other database related topics, read his blog
You can also read about Simon in a
SQLServerCentral article by Steve Jones