Did you know you can take a database to the sixth normal form? Have you ever seen a database in sixth normal form? First, before you answer, I think it is important to note that I do believe in normalization, and I think it is very important in relational databases. Second, I have recognized over the years that sometimes you can have too much of a good thing and databases can be over normalized.
Clearly, normalization helps reduce redundant data. This makes sense, I don’t know many people that want to copy the same data over and over again. Also, data integrity is important. Having foreign keys can protect your systems from deleting needed data and can also improve performance. Normalization can also help with lookups and populating dropdowns with the only valid data for columns.
The issue I sometimes see with over normalization is when the number of tables explodes since there are only one or two columns on each table. The queries of that highly normalized system end up having ten or more joins for every query. It also seems like old data is nearly impossible to delete since all the foreign keys make it extremely difficult to actually delete any data. Systems do things like soft delete, but the data is still there and the database size keeps growing. In short, highly normalized systems can become difficult to work with.
Clearly, de-normalized databases also have major issues. I have seen tables that have hundreds of columns. Basically, everything is thrown into one table. In this situation there tends to be data issues and a lot of repeated data in the rows.
So is there some happy medium? I don’t want a completely de-normalized system, but I also don’t want one so highly normalized that I have too many joins and have trouble figuring out what table has the data I need.
I think there can be a more middle ground, but it depends on the system. Smaller systems that you can exert more control over are a good example. With these systems you can only allow data access via stored procedures. In this case some of the data validation can exist not via normalization and foreign keys, but by controlling how and when the data gets inserted and updated into the system.
I have a feeling this may be an unpopular view, but I have built a number of these smaller systems now with great success. The tables are still normalized, just not to the degree where they become difficult to work with. I carefully pick and choose which tables to use normalization to force data integrity and which ones to control via stored procs.
So what do you think? Do databases and tables always need to be as normalized as possible? Have you noticed issues in highly normalized databases? Have you found ways to mitigate these issues?