Name value pair (EAV) model

  • @tymberwyld

    When you need to display different descriptions for Products based on culture, you're doing an EAV even though you've defined the columns very specifically to Products, and in most minds it's a 1-to-Many design, it's underlying concept is still EAV.

    No, that's 1 to Many. EAV abstracts out the type of the data. If you have a table like: StringID, LanguageID, DisplayText to handle internationalization, that is not EAV.

    If you have a table like: EntityID, AttributeID, AttributeValue where the various attributes addressed by AttributeID are unrelated and/or AttributeValue can contain multiple types of data then you have EAV.

    When you get on a roll normalizing your database and abstracting everything, it becomes tempting to keep abstracting until you end up having abstracted the database out of your data, that is where you find EAV.

    Having a few untyped fields for the customer to use for reporting is acceptable, but building your database that way isn't.

    @jj B

    As has been mentioned, your survey design isn't EAV (under most circumstances). It is normalized. My survey databases look very much like that described by tymberwyld with a couple of caveats:

    There are two multiple choice types: Choose One and Choose Many

    True/False questions are multiple choice questions (Not special). This allows for No Answer and Not Applicable responses.

    I have two value columns, one for choices and one for free texts.

    To handle Choose Many I set the "value" associated with a response as a power of two and then store the sum of the selected values. This is a little more work on the front end but allows for easily finding surveys with the exact same answers. It does make it more difficult to find "all surveys that selected B", but not that much.

    Also note that my survey sets are very small so I don't notice any performance problems that may exist.

    --

    JimFive

  • Good input. You are right, it is 1-to-many, I probably didn't phrase that correctly. I meant it is somewhat of an EAV, but instead of being so arbitrary (as EAV is), it's become specific to it's purpose. My example was bad but I wanted to note that you didn't need a table per Culture, simply one table that stored the various descriptions and maybe a Culture column on it.

    As for your design, I can't believe I didn't think of storing the T/F as multiple choice and making the multiple choice store as a bit mask, this makes great sense!

    So if I had:

    How much wood can a woodchuck chuck in an hour (multiple choice)?

    1 - five (bitmask = 1)

    2 - four (bitmask = 2)

    3 - ten (bitmask = 4)

    and if the user selected all then the value would be stored as: 7

    Select Power(2, @Answer1 - 1) | Power(2, @Answer2 - 1) | Power(2, @Answer3 - 1)

    Or simply

    Select Power(2, 0) | Power(2, 1) | Power(2, 2)

    Or even

    Select 1 | 2 | 4

    Then, to check which answers the user selected:

    Select

    @Answer & Power(2, 1 - 1) = Power(2, 1 - 1) As DidYouSelectAns1,

    @Answer & Power(2, 2 - 1) = Power(2, 2 - 1) As DidYouSelectAns2,

    @Answer & Power(2, 3 - 1) = Power(2, 3 - 1) As DidYouSelectAns3

    Or more simply

    Select

    @Answer & 1 = 1 As DidYouSelectAns1,

    @Answer & 2 = 2 As DidYouSelectAns2,

    @Answer & 4 = 4 As DidYouSelectAns3

    And if you want to validate against the correct answer

    @CorrectValue = Power(2, 2 - 1)

    If ((@Answer & @CorrectValue) = @CorrectValue)

    Print 'You are correct'

    I hope my examples aren't too confusing, this isn't a discussion on bitmasking, but I want other readers to understand the power of it for storing answers to T/F or multiple choice questions. In a T/F question, the bitmask values are simply 1(T) or 0(F).

  • Surveys are Frequently collected for statistics.

    Bit mask seems like a poor choice. You'd force the query plan to scan the bitmask field to look for all surveys that answered question 2 false. (think about any bitmask fields sort order in its index)

    If there were multiple clients querying for the current total of people that answered Q2 false .....

  • David,

    You are, of course, correct. The best way to store a choose-many would be just to record each answer in a table like: SurveyID, QuestionId, AnswerID [AnswerText]. Allowing each SurveyID, QuestionID to have multiple answers. You then need to do some checking on the choose-ones to make sure that you don't accept multiple responses for those questions.

    I found that the bitmask method made it easier to create the data entry form at the cost of more complicated reporting.

    --

    JimFive

  • I remember building a COM object based system where the com objects represented entities which corresponded to tables in the database. To avoid having to change the interface of the com objects everytime we wanted a new data item we implemented a EAV model in the database table and exposed this through generic methods. In the end performance and the programming model were compromised.

    At the end of the day what you are doing is replicating what the database engine is doing under the hood at a higher level. Databases are good at referring column names and tables and constructing queries based upon them. Its also not generally a problem to alter tables to reflect user requirements (although the application model cant be rigid as in above), or to have specific table constructs to handle user data etc.

    One additional aspect I have not seen covered is reporting. Every tried to stick a third party reporting tool onto an EAV model? Not pretty. You end up having to build views to construct virtual tables based on the EAV tables and performance takes a further dive.

    As has been pointed out where EAVs do work well is in auditing and event reporting. Auditing especially works well with a table, column, oldvalue, newvalue type of EAV, but this is a specific case. In general I would stay well clear

  • I do understand that an Index on this theoretical "Bitmask" column would be expensive to maintain, but how is Indexing a Bitmask column any different than say a "LastName" column? Is SQL any smarter when indexing millions of Patients' last names? In other words does SQL interpret your query and go 'Oh, you're looking for smith, ok, find all names that start with "S" and then break that down into "mith"'? Or does it just look at the name in it's entirety. Isn't SQL supposed to be smarter when looking for numbers?

    Mathematical calculations have always been superb in SQL, so I can't see how "Select ID From TestAnswers Where Anwser & Power(2, 2 - 1) = Power(2, 2 - 1)" (2 = correct answer) is any different than when we are doing summations on Monetary columns? True, you might not Index all columns you want to sum on, but when I want a query that is like "Give me all Sales where the Total Price was >= $500". How is this going to be any better?

    Also, as far as reporting goes, yes, you do have to build views to give a better representation about the data in an EAV. However, that IS the main point of an EAV! EAV's give you the capability to report on the same data in a myriad of ways without the need for extra columns, changes in schema, or redesigning an application. I never report from tables anyway so I've never been limited in this factor and I have been using Reporting Tools for over 10 years (Crystal Reports 6 - 11.5, Reporting Services, MS Access, DevExpress XtraReports, etc, etc).

    I'm not trying to be snoddy, I truly want to understand. 😀

  • Not specifically commenting on the bitmask approach but with the more general mechanismdont your views end up as massive nested statements. ie: select (select value from EAV where id=? and name='Title') as Title,(select value from EAV where id=? and name='Forenames') as Forenames,(select value from EAV where id=? and name='Surname') as Surname

    Where performance is always going to be n * worse than a standard view where n is the number of columns retrieved. Or am I missing something?

    I still maintain that database engines are very going at doing this kind of operation under the hood but poor at surfacing it via SQL

  • Somewhat, but not that bad. Even so, because the EAV tables are usually very narrow (meaning, only a few columns), there's usually 1 or 2 indexes that do the job very efficiently. So, LEFT JOIN-ing the same table multiple times to essentially "PIVOT" the information horizontally has never been a huge performance hit for us because it always hits the Indexes in the execution plan as opposed to doing full table scnas for every JOIN. Now, you might get away with newer techniques in SQL 2005 being more effecient like the use of CTE's and the new PIVOT function (although I've never had huge success with the PIVOT function).

    I have since been using an Xml column for most of my EAV models now that SQL 2005 gives awesome support for breaking out Xml. But in a sense it's pretty much the same except there are no multiple JOINs.

  • I do understand that an Index on this theoretical "Bitmask" column would be expensive to maintain, but how is Indexing a Bitmask column any different than say a "LastName" column? Is SQL any smarter when indexing millions of Patients' last names? In other words does SQL interpret your query and go 'Oh, you're looking for smith, ok, find all names that start with "S" and then break that down into "mith"'?

    Basically, yes. The problem with a Bitmask in general, and with indexing a bitmask is that you rarely want the whole bitmask, you want to know whether some bit is set, but that isn't how indexing works. (Now, if there was a way to mark a bitmask as such and SQL would then index it differently that would change things).

    Mathematical calculations have always been superb in SQL, so I can't see how "Select ID From TestAnswers Where Anwser & Power(2, 2 - 1) = Power(2, 2 - 1)" (2 = correct answer) is any different than when we are doing summations on Monetary columns?

    It is different in the same way that using SUBSTRING is different. It is easier for the engine to find all words that start with "A" than to find all words that have "A" as the 3rd letter. In the same way it is easier for the engine to find all numbers greater than 50 than to find all odd numbers.

    Also, as far as reporting goes, yes, you do have to build views to give a better representation about the data in an EAV. However, that IS the main point of an EAV! EAV's give you the capability to report on the same data in a myriad of ways without the need for extra columns, changes in schema, or redesigning an application.

    Er.. No. The point of an EAV is to allow the customer/user to define their own data structure. You can then build a cumbersome reporting assembly over top of this to allow for ad-hoc reporting if you desire.

    You can always report on the same data in many ways using a database, that is the point of the database. Data Structure helps you to determine whether the reporting you want actually makes sense with the data you have.

    --

    JimFive

  • From someone that is asked to support add hoc reports generated on an EAV it is a difficult exercise to convince end users why certain types of reports are fine and others aren't. Its a way too technical distinction.

    Most non-technical are either happy with the concept of a linear performance degradation as per flat file scanning. Or expect a logarithmic performance degradation as per effective b-tree indexed relational database.

    Having a EAV limits indexing possibilities.

    SQL2005 pivot requires the column names to be defined in the query which makes it unsuitable for EAV. (any usefully implementation has dynamic Attribute set not determinable until query execution)

    If you know the Attributes of the entity prior to query execution then persist them into the database schema as columns. Sure an index only sought and returned a record in 2005 is efficient. But it is not as trivial returning an additional extra couple of hundred bytes of an additional column on the same table.

  • Great discussion here.

    I'm having a similar dilemma. We are building a social component to our website and the requirements are that customers and or external application would be able to create a profile in our system with user define fields.

    I've created a profile table with the basic info. firstname, lastname, photo etc...

    Now I'm trying to figure out the best way to include the additional user defined fields and is where the EAV model came up. I've read in numerous blogs/sites to not do it unless absolutely necessary.

    From these readings, I have not found that anyone has a good alternative. There is always some major deficiency in it.

    Would it make sense to store the user defined fields in an xml column? or in my case would EAV work efficiently where i would only store a subset of the data in an EAV model?

    The other option I have is making the middle layer do the column manipulation. So if I went with the EAV model, my select would only be ...

    select ProfileID, Entity, value

    from EAV_Profile

    where Profileid = 1

    an not the horrible select with numerous joins and case statements.

    The rest of the formatting and logic would happen in the mid layer. Does this seem like

    Any additional advise would be greatly appreciated.

  • Heh... c'mon.... EAV's and NVP's are cool. You can put the entire database in one table and you only have to worry about maintenance on one index. :-P:-P:-P

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply