Entity-Attribute-Value model (EAV) is used interchangeably with name for name value pair.
Generally name value pair data modelling is not a term widely used in relational database management systems. Name value pair is widely used in programming context. But this article will focus on whether name value pair in relational database design environment is useful or not. Name value pair or EAV is one of the design anomalies when it comes to modeling a data. This type of approach towards data modeling are mainly comes from application programmers. The argument is that, it is a widely known ways of programming and application programmers think that it can also be applied to data modeling. I am not saying here that application programmers don't understand data modeling but they are trying to apply the programming context to data modeling . Wouldn't it be good if we don't need to change the schema of the database when we add additional name value pair ? I do agree with some of the points that they are trying to make. But, in real world scenario things are different..
In my experience, I have come across a company trying to build a generic data model to maximize flexibility. In my opinion, data models that are based on maximum flexibility tend to suffer in performance. It is not easy to generalise my opinion here but a well tailored data model would serve its purpose in most cases.
Name value pair violates relational database rules. Is this statement true? I would like to keep you thinking what the answer to the above question would be until you come to the end of this article.
Name value pair in most cases can be stored in single table or for those who are a bit not happy with the idea and want to introduce some relational database design concepts, a bodged normalisation can be done. For instance see the following named value pairs. I purposely added the first column so that a unique key identifies every row in this table.
UniqueId | NameValueId | Name | Value |
---|---|---|---|
1 | 1 | FirstName | Barry |
2 | 1 | LastName | Jon |
3 | 2 | FirstName | Andrew |
4 | 2 | LastName | Nick |
In the above case, one can argue that there is a repetitive value on Name column. So, some people tend to create a lookup table for all attribute names. The Id of the created lookup table will then replace the name column in above table. This just solves the problem with storage and better indexing but the overall design still remains the same. One could argue that the above table is in third normal form, as it doesn't actually violate the rules of normalisation. At the same time one could argue that this is not in any normal forms as one of the columns will store the attribute names rather than the data. It is only the third column that stores a data. Both arguments are right but is it a good idea to design your database in this way? I have come across a database designed in this way but the end result was rather different.
Before going deep into examples of this type of approach, let us look at the cons and pros of this approach towards data modelling. The lists are not confined to the one I have pointed but there could be more that you could add. I have listed here the most obvious once.
1. Advantages
- Very flexible to add an attribute that you don't know in advance without redesigning your database structure
- Can be a good idea to collect the attributes of unknown data. For example, it is difficult to know the attributes of some complex research data and using this methodology would help to assemble this attributes that can feed into proper data modelling.
- Database level complexity is simplified when inserting data into the table. The number of procedures to insert into the above table is only one or at most two if you are going to partition your table to increase performance.
2. Disadvantages
Difficult to query and transform the data into meaningful information. To display a data in
tabular format of meaningful way, your query may involve many case statements, sub-queries, self joins and etc. This will impact on performance of your query specially when the number of rows are getting bigger
- Unless the table is partitioned it can grow fast and querying the table would take longer time.
- It is not possible to enforce business rule constraints and default values for an attribute as attributes are modelled as being data.
- Difficult to the integrity of data as again as attributes are modelled as being data.
- Could result in changing the value column to text data type, as the value of name value pair can be significantly different ranging from bit to text data type. This will result in ineffective storage design
- Effective index design strategy is difficult. Indexing of the value column is could be very difficult as it can result in wide index and can impact inserting and updating.
So, looking at the above cons and pro's of using EAV towards data modeling, I would like to conclude that there might be occasions that you probably be better using this approach but for data model where most of the attributes are known and is tailored to do specific thing which most databases are for, it is better not even to consider this approach.
In part II of my article, I will try to address some of the design suggestions by different people and experiences that I had in managing similar database with examples.
References
1. http://boilerbay.com/infinitydb/forum/?postid=5
2. http://www.sqlmag.com/Articles/Inde...ArticleID=38656
3. http://oracle-wtf.blogspot.com/2006/02/eav-returns-concrete-el