EAV tables Vs. SQL_VARIANT Vs XML

  • this is a shunt of a thread from here.

    http://www.sqlservercentral.com/Forums/Topic1384989-391-2.aspx

    The topic was hijacked (by me..) and a discussion about EAV tables ensued.

    To Recap. I suggested that Entity-Attribute-Variable tables are generally a bad idea and Jeff Moden proposed using SQL_Variant but had provided litte detail on how or why. I think this is an important discussion and so started this new thread so that others can find it and contribute.

  • today i first time heard about EAV. can you post any simple example so that i can have a idea about it , i havent got any good example over search

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I'll start by replying to my own post.

    When making the leap from RDBMS to OOP it is generally a good place to start to think of a row of data in a results set as an instance of an object. Simple example A Sales order.

    You would have two tables Order Header which holds the order number, the customer ID, the delivery date, the total order value and the total weight of the order. You would also have order lines which would be linked to the order header by the order number and would have fields for line number, itemID, qty etc.

    In an OOP environment this would be an order object which would have an orderlines collection as one of its properties. Each order object follows the same structure and each order line object would do the same.

    OOP has a concept of inheritance - you define the common characteristics in an object model called its CLASS - the RDBMS equivalent is the CREATE TABLE () DDL statement, but OOP also has the concept of polymorphism which allows a specific object to have changed or additional properties which do not exist in the standard class.

    Lets say that my order is for customised golf items. All items share some common properties (a SKU, description, stock location, price etc..) and these are stored directly in the standard object (RDBMS equivalent is the Items table). A club can be customised with a different grip or shaft length, a golf ball can have a corporate logo added. These two customisations are completely different. In OOP each of these can have their own class (Club class and Ball class) which inherit their common properties from the Item class and then extend that class with their specific properties and methods. There is no equivent concept in RDBMS so what choices do we have?

    The First is to simply add more columns to the items class and then use them when necessary. This is generally not a good idea as you end up with very wide tables with sparsely populated columns which can't be indexed. Think about how many attribute columns Ebay or Amazon would need.

    The second is to use an EAV table. This holds the Item ID (Entity), the Extended property (Attribute) and its Value. In terms of storage it is very efficient as the table is 100% populated but getting the data out in any meaningful form for general reporting is a bit of a nightmare and even in the front end OOP application it can be tricky.

    So what is the compromise. I would propose that a single field with XML to hold the additional properties is a good route to go down, especially in SQL because XQuery will do most analysis that you would need, but I would be interested to learn more about SQL_Variant and any other suggestions, eith theoretical or actual practical ones

  • @Aaron,

    Great explanation and write-up. Thanks for takinng the time.

    The idea of storing XML is definitely a good one for this type of thing. Since I haven't ventured into the world of front end code sinnce 2002 (I gave it up after by boss insisted on a display field with pale Yellow lettering on a White background for a "Caution" indication and a particular color of Pink on a Fusia background for a "Danger" indication and haven't looked back since), I have to rely a bit on what you say.

    With that thought in mind, I do know how to parse such data on the backend but have found that XML is nearly as much of a pain as what a lot of people claim that an EAV audit table is. Do the tools available on the front end allow you to shred XML without necessarily knowing what the XML contains? Yes, I realize that in order to actually use the data, you need to know what the post-shredding element names are but do you have to modify the shredder or the result from the shredder for the object class to recognize the new element names?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • EAV vs XML is a complex subject. I prefer XML because it can be treated as an atomic item in a single row of a table.

    Here's an example of each:

    USE ProofOfConcept;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'tempdb..#EAV') IS NOT NULL

    DROP TABLE #EAV;

    IF OBJECT_ID(N'tempdb..#XML') IS NOT NULL

    DROP TABLE #XML;

    CREATE TABLE #EAV

    (ItemClassID INT NOT NULL, -- Determines which query to use

    EntityID INT NOT NULL, -- Ties rows together to indicate a specific object, Entity

    AttributeName VARCHAR(100) NOT NULL, -- Attribute portion of data

    CONSTRAINT PK_EAV PRIMARY KEY (EntityID, AttributeName), -- Attribute must be unique per entity

    AttributeValue SQL_VARIANT);

    -- Value can be any data type SQL Variant can hold: Int, Binary, Char, Money, Float, Decimal/Numeric, Date/DateTime/Time, etc.

    CREATE TABLE #XML

    (ItemID INT PRIMARY KEY,

    Attributes XML NOT NULL);

    /*

    Describe two simple item in both tables:

    Pen Attributes:

    Nib Type

    Nib Size

    Ink Color

    Ink Volume

    Price

    Pencil Attibutes:

    Hardness

    Has Eraser

    Price

    */

    INSERT INTO #EAV

    (ItemClassID, EntityID, AttributeName, AttributeValue)

    VALUES (1, 1, 'Nib Type', 'Ball'),

    (1, 1, 'Nib Size', '6mm'),

    (1, 1, 'Ink Color', 'Black'),

    (1, 1, 'Ink Volume', '.27 ml'),

    (1, 1, 'Price', '.5'),

    (2, 2, 'Hardness', '2'),

    (2, 2, 'Has Eraser', '1'),

    (2, 2, 'Price', '.5');

    INSERT INTO #XML

    (ItemID, Attributes)

    VALUES (1, '<row Entity="Pen" NibType="Ball" NibSize="6mm" InkColor="Black" InkVolume=".27 ml" Price="$0.50" />'),

    (2, '<row Entity="Pencil" Hardness="2" HasEraser="1" Price="$0.05" />');

    -- Reconstitute the data:

    /*

    EAV version requires a different query for each class of item.

    Can be generated with dynamic SQL or hand-coded.

    */

    -- EAV Pen

    WITH Entities

    AS (SELECT DISTINCT

    EntityID

    FROM #EAV

    WHERE ItemClassID = 1)

    SELECT EntityID,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Nib Type') AS NibType,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Nib Size') AS NibSize,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Ink Color') AS InkColor,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Ink Volume') AS InkVolume,

    CAST((SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Price') AS MONEY) AS Price

    FROM Entities;

    -- EAV Pencil

    WITH Entities

    AS (SELECT DISTINCT

    EntityID

    FROM #EAV

    WHERE ItemClassID = 2)

    SELECT EntityID,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Hardness') AS Hardness,

    CAST((SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Has Eraser') AS BIT) AS HasEraser,

    CAST((SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Price') AS MONEY) AS Price

    FROM Entities;

    --XML Pen

    SELECT Attributes.value('(/row/@NibType)[1]', 'varchar(100)') AS NibType,

    Attributes.value('(/row/@NibSize)[1]', 'varchar(100)') AS NibSize,

    Attributes.value('(/row/@InkColor)[1]', 'varchar(100)') AS InkColor,

    Attributes.value('(/row/@InkVolume)[1]', 'varchar(100)') AS InkVolme,

    Attributes.value('(/row/@Price)[1]', 'money') AS Price

    FROM #XML

    WHERE Attributes.value('(/row/@Entity)[1]', 'varchar(100)') = 'Pen';

    SELECT Attributes.value('(/row/@Hardness)[1]', 'varchar(100)') AS Hardness,

    Attributes.value('(/row/@HasEraser)[1]', 'bit') AS HasEraser,

    Attributes.value('(/row/@Price)[1]', 'money') AS Price

    FROM #XML

    WHERE Attributes.value('(/row/@Entity)[1]', 'varchar(100)') = 'Pencil';

    Personally, I find the XQuery version of the data reconstitution much easier to read. It lists the name of the attribute, the expected datatype, and the name of the resulting column, in a very readable format.

    The EAV version could, of course, be done in different ways. You could use Outer Apply instead of Inline Subs in the Select clause. You just move the difficulty in reading it from the Select clause to the From clause, and it's still complex.

    With either one, it's fairly easy to add new attributes. If we want to know if the ink is erasable in pens, we just add a new row to the EAV or a new attribute to the XML:

    INSERT INTO #EAV

    (ItemClassID,

    EntityID,

    AttributeName,

    AttributeValue)

    VALUES (1, 3, 'Nib Type', 'Ball'),

    (1, 3, 'Nib Size', '6mm'),

    (1, 3, 'Ink Color', 'Black'),

    (1, 3, 'Ink Volume', '.27 ml'),

    (1, 3, 'Erasable', '1'),

    (1, 3, 'Price', '.5');

    INSERT INTO #XML

    (ItemID, Attributes)

    VALUES (3, '<row Entity="Pen" NibType="Ball" NibSize="6mm" InkColor="Black" InkVolume=".27 ml" Price="$0.50" Erasable="1" />');

    Modify the query accordingly:

    -- EAV Pen

    WITH Entities

    AS (SELECT DISTINCT

    EntityID

    FROM #EAV

    WHERE ItemClassID = 1)

    SELECT EntityID,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Nib Type') AS NibType,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Nib Size') AS NibSize,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Ink Color') AS InkColor,

    (SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Ink Volume') AS InkVolume,

    CAST((SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Erasable') AS BIT) AS Erasable,

    CAST((SELECT AttributeValue

    FROM #EAV AS EAV

    WHERE EntityID = Entities.EntityID

    AND EAV.AttributeName = 'Price') AS MONEY) AS Price

    FROM Entities;

    SELECT Attributes.value('(/row/@NibType)[1]', 'varchar(100)') AS NibType,

    Attributes.value('(/row/@NibSize)[1]', 'varchar(100)') AS NibSize,

    Attributes.value('(/row/@InkColor)[1]', 'varchar(100)') AS InkColor,

    Attributes.value('(/row/@InkVolume)[1]', 'varchar(100)') AS InkVolme,

    Attributes.value('(/row/@Erasable)[1]', 'bit') AS InkVolme,

    Attributes.value('(/row/@Price)[1]', 'money') AS Price

    FROM #XML

    WHERE Attributes.value('(/row/@Entity)[1]', 'varchar(100)') = 'Pen';

    Simple enough. But if you use Apply or Join instead of Inline for the EAV version, make sure you use Outer on those, otherwise you'll miss out on rows that don't have the Erasable attribute, like pen #1. No such query modification needed for the XML. If it doesn't find a column in a dataset, it just returns it as NULL.

    Now, how do we enforce data typing in the EAV table? We can't easily do that. If the value "1 dollar" goes into the Price attribute, the query will now throw a conversion error. It takes complex triggers and constraints, or we have to move away from a single AttributeValue column of SQL_Variant datatype and start adding one column per datatype we want to enforce. And then we really need a separate table that defines the value Attributes per entity class, and defines which attribute column is supposed to be used. Add another column for which datatype is in use, FK that to the AttributesRules table, and add a constraint that checks "DataType=1 and ValueString is not null or DataType=2 and AttributeNumeric is not null or DataType=3 and AttributeDateTime is not null" and so on.

    Adding those complexities to the table also adds complexity to the inserts (right data in right column) and to the queries (can't just copy-and-paste subqueries or Outer Apply uses).

    How about the XML table? Easy, set up a schema (XSD) and define it in the XML. Then SQL Server will enforce data typing in your XML columns.

    How about performance once you have more than just a tiny table?

    If you're using SQL 2008 or later, you can add filtered indexes to the EAV table, so that each "column" (Attribute) has an index, which can make the subqueries pretty fast. Complex to maintain, since you need to add a new index every time you add a new Attribute. Since the whole idea of this kind of table is that it's easy to add/modify custom data (attributes), adding complexity to that step actually defeats the purpose of the table. Pre-2008, your stuck with indexed views as your best option, which adds even more maintenance overhead to the whole thing.

    I've seen a relatively simple EAV table, pulling 10-12 attributes in final queries, take minutes to resolve queries as they get up to the 100k rows magnitude. Indexing can help a lot with that, but you still end up with very complex query plans because of all the joining needed (regardless of whether you do the queries as Joins, Applies, or In-Line Queries, the engine still needs to join datasets together to rebuild the data).

    How about performance on the XML version? With a simple XML primary index, and possible a values or path index, you can get sub-second query times on very simple execution plans, even with very large tables.

    I've seen complex XML, properly defined (XSD) and indexed (primary only in these cases), get results in under half a second, on tables with as many as 5-million rows.

    The XML drawabacks? Data size. XML indexes can be pretty large, and the XML itself can easily bloat. SQL Server will do its best to compress the XML in the storage, which makes the tags take less space, but it can still end up being measurably larger than comparable 3NF tables. XML is slower than 3NF tables in most cases as well, though it's generally much, much faster than EAV.

    For all of those reasons, I much prefer XML over EAV.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff Moden (11/16/2012)


    @aaron,

    Great explanation and write-up. Thanks for takinng the time.

    The idea of storing XML is definitely a good one for this type of thing. Since I haven't ventured into the world of front end code sinnce 2002 (I gave it up after by boss insisted on a display field with pale Yellow lettering on a White background for a "Caution" indication and a particular color of Pink on a Fusia background for a "Danger" indication and haven't looked back since), I have to rely a bit on what you say.

    With that thought in mind, I do know how to parse such data on the backend but have found that XML is nearly as much of a pain as what a lot of people claim that an EAV audit table is. Do the tools available on the front end allow you to shred XML without necessarily knowing what the XML contains? Yes, I realize that in order to actually use the data, you need to know what the post-shredding element names are but do you have to modify the shredder or the result from the shredder for the object class to recognize the new element names?

    In VB.NET, you can use a pretty simple bit of code to call an XMLNodeReader, which will parse the node names and values for you, dynamically, so long as the input is well-formed XML. It's pretty much in the "trivial effort with trivial simplicity" class of work for anyone familiar with VB.NET and XML. I can do it, and I'm not even a dev.

    Pretty safe to assume JAVA, C#.NET, et al, all have comparable XML shredding capabilities. It's pretty basic these days.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply