Over the years I have come to see that every database has what I call data type drift. Simply put, data type drift is when you have columns with the same name but different data types or length. I’d say about 97% of databases I’ve reviewed have some form of drift. So why is that number so high?
Have you ever gone out to eat at your favorite restaurant and noticed that something in your favorite meal was a bit off? This happened to me. My favorite Chinese restaurant had the same cook for 17 years and one day he decided to be a driver for DHL. I may have never noticed because he wasn’t visible to the front of the house but I knew the food was quite different. I must have had the same Kung Pow 100 or more times and this was not the same. I knew the owner well and he told me about the cook.
This same scenario happens over time with an application. As developers come and go, the database can drift. One developer has always used varchar(20) for FirstName and the next developer has always used nvarchar(50).
This may happen all at once as well. Consider an application that has different modules, each with it’s own developer or development team. When a data architect isn’t present and the data model is not restricted through an ERD, you get drift.
Why is this important?
When columns that don’t have the same data type are used in a join or union an implicit conversion is performed to convert one of the columns to equal the other. This results in higher consumption of I/O, processor, and memory; resulting in longer processing time for the affected queries. In short things are slower and it’s somewhat easy to fix.
How do I identify drift?
I wrote this around eleven years ago and really haven’t updated it since.
Get the script here: Microsoft TechNet Gallery
If any column is used in a union or join then it should probably be corrected.
Here’s an example of the output:
Researching each column is easy, just use the following query:
I’ve got some drift. Should I fix it?
I always recommend only updating the column if and only if it’s frequently used in a join or union with another column that does not match the data type. If you’re never going to query them both together then don’t bother. A good example of this is the [description] column. This column name is used across various tables that have nothing to do with each other. These columns are not used in joins and never will be so they won’t be matched.
When creating a new database it is recommended that you use a data modeling tool, which will enforce “domains” on the column names and any time that column name is used again it will automatically use the same data type and length.