SQL Server as an EAV Database -Good Idea?
It seems like I am getting more and more inquires from potential clients asking for help solving performance related issues with what, after some investigation, is using SQL Server as an EAV database.
First, for those unfamiliar with the concept, an Entity-Attribute-Value (EAV) database seems disarmingly simple and seductive to many developers. In its simplest form, an EAV database requires only one table, and that table has only four columns: an Identifier, the Entity, the Attribute, and the Value of the Attribute. The entity might be a Customer, the Attribute ‘Customer Name’, and the Value would be the actual customer's name. The same table could also contain Products, or Invoices, or Employees, etc.
The beauty of the design is that the developer is free from ever having to consider a table schema. Need another column, that’s just another Attribute/Value pair. Need another table -just a different type of entity. Add as many new entities, or new columns as is desired. One never has to negotiate with a DBA again. There is never a need to have to write that messy and sometimes problematic ALTER TABLE code again. Wonderful concept if you have to distribute changes to the data model in situations where the user (or application) does not have permissions to muck around with the database, the schema, and all of that DBA stuff. There is only one data model, and it is only one table that has four columns. What could go wrong with that?
And, of course, it is possible to select the ATTRIBUTE/VALUE pairs for an ENTITY (or group of entities) and PIVOT the data to simulate a table. It seems simple enough –doesn’t it? (If you want more information, just search for Common Table, EAV Database, Table of Tables, etc.)
In fact, the beauty is in the elegance of the model. Frankly, before the advent of SPARSE columns, you could save a lot of storage when there was a long list of potential attributes (columns), and most entities would only use a small portion of the potential attributes. And it works wonderfully for many situations: obviously, property ‘bags’, somewhat small and static tables –you know, the types of data that will be used to populate various UI controls, or be used for data validation, or internal processes.
I’m not going to address the issues of the increasing complex code required to pivot the faux tables –I’ll leave that to later discussions. Today, I’m interested in performance issues. One client has an EAV database that is approaching two billion rows in that single table –and the performance is starting to become quite troublesome. They really didn’t like my even broaching the subject of a data model redesign –too much ego involved in this elegant design.
I’m interested in your experiences, cautions, and concerns surrounding performance issues of using SQL Server as an EAV database.
So tell me, when does something that is elegant from one perspective become burdensome and inelegant from another?