EAV Design Questions

Question

EAV Design Questions

Garadin

One Orange Chip

Points: 29613
More actions
August 14, 2009 at 12:27 pm

#385189

[Edit 3] I've decided not to use this system due to the excessive setup required for it to actually function. [/Edit 3]
Greetings all,
I am designing a new EAV system to hold information about site assets. I have a good idea of how I want it to work, but I wanted to get some input from others who have done these in the past, and what their recommendations would be for a couple things I’m not sure on.
Here is my basic design concept(SQL 2000 Database):
------------------- TABLES --------------------------------
CREATE TABLE Assets(
RowPointer-- Unique Identifier Primary Key
-- A bunch of fields(unimportant for this thread) and common attributes that almost all assets have (model/serial/manufacturer),
-- and give me enough information that I don't need to pivot the Asset_Info table for general information queries.
)
-- Holds the type of asset like 'Generator','AC','Roof' etc.
CREATE TABLE Asset_Types(
AssetTypeIDint,
[Name] nvarchar(30),
)
-- Holds a list of information types IE: 'Size', 'Square Footage', 'Number of Lights'
CREATE TABLE Asset_Info_Types(
AssetInfoTypeID int,
[Name] nvarchar(30),
ParentIDINT NULL,-- For SizeType (BTU,TON, etc.). Size would be the parent.
Datatype nvarchar(30), -- What should go in this field (decimal,varchar, etc.)
Validationnvarchar(500)NULL,-- Possible extra validation criteria.
DropDownList nvarchar(4000)NULL, -- Dropdown options available (CSL)
IsDDReq TINYINTNULL-- Force data to be in the dropdown? (Yes/No)
)
-- What info types should apply to what assets(AC could have '# of Compressors','Size','SizeType','EER' etc., but not square footage or # of bulbs.)
CREATE TABLE Asset_Info_Template(
AssetTypeIDint,
AssetInfoTypeID int,
IsRequiredtinyint NULL-- Whether or not to require this field.
)
-- EAV Table (Don’t need AssetTypeID because that will be stored with the Asset.)
CREATE TABLE Asset_Info(
Asset_RPuniqueidentifier, -- (UID of the Asset from the asset table)
AssetInfoTypeIDid,
Valuenvarchar(1000)
)
-----------------------------------------------------------
(Oh nice, the prettifier is default on posts now =))
My questions about this:
1.For the actual EAV table, what are the advantages/disadvantages of using a single large varchar field to store all types of data vs. creating separate value fields for each datatype and populating one of them based on the type of data being stored.
2.The ParentID field in the Asset_Info_Types Table is something I’m not at all sure on. Basically this is to handle situations where the attribute is Size, which would hold a decimal number, but you also need to know what unit of measurement that size is in. (HVAC equipment could be Tons or BTU’s). Is there a better way to do something like this?
3.Is there anything else that could be improved with this design?
Thanks in advance,
Seth
[Edit1]
The number of different asset types is limitless(new ones can and will be added as they are requested). We'll probably start with around 25-40 different types. Every one of these Asset Types can track a limitless number of different attributes (Asset Info Types), but in reality there will likely be an average of 6-10 per type, and I can't imagine anything going over 25. There will be a limit of a single entry for each asset of each info type. (Don't want to open that door) ParentID will never go past 1 level.
The data validation will be handled by both the front end app and the stored procedures that insert the data.
[Edit2]
The concept of having more than 1 type of asset is an extension to a system that already exists. Everything in our system is already designed to pull this information from 1 table. To integrate this into our existing system without completely changing everything in the system, I only saw 3 options.
1. Add 100+ fields to the existing asset table, the vast majority of which will be null for any given row.
2. Add the base information only to that table and create a separate system for the extended attributes.
3. Create additional table(s) for these new types, and replace the current table(that everything uses) with a complex unioned view with multiple triggers on it(using the name of the original table to attempt to trick all of our applications into thinking nothing had changed.)
Of the three, 2 sounds better than 1 and 3 is just scary, which is where this system came in.
Seth Phelabaum
Consistency is only a virtue if you're not a screwup. 😉
Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

EAV is...awful. Besides the data quality aspect you've just run into, it also prevents you from using just about any of the other integrity rules as well (e.g. foreign keys), bloats your table (since the column needs to be wide enough to hold ALL possible values), and stores your data in character form, forcing unnecessary data conversions (which by the way are now risky since you can't make sure that the data even matches the type you want to cast to.)

And then there's the typical performance issue. The continuous pivot/unpivot back and forth just tends to send EAV's into a tailspin.

I wince each and every single time I run into one, because without fail, these kinds of issues start popping up. It's purely a matter of when, not whether, they will arise.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 2

Hi Seth...I think the answer is "it all depends"...apologies for the cliche

How many individual assets are you going to have?

Having recorded all the info against each asset, how do you/users want to query the info?

If you can clarify what the questions are going to be then you can start to plan how/what you need to record.

Kind regards gah

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Garadin One Orange Chip Points: 29613 More actions · Answer 3

It always does =).

I updated the first question in the actual post.

As to the second, the information mainly be queried in two different ways.

The first would be just a list of all the assets of a given type with all of their attributes. Pretty basic crosstab. If they wanted a list of all their assets, it would either be just the general common information for all of them or would be separated out visibly on the web page with different attribute headers for each type.

The second type of query(and the reason we're even bothering to track it it at all) is going to be statistical/cost analysis for budgeting/projection/etc. Different types of assets will have different calculations applied which will use different attributes. Not all attributes will be used in these calculations, some are purely for reference or categorization. These all have to be written differently for different asset types regardless of the table structure used, because what you're calculating and what you're using in those calculations differs with each type. We don't yet know all of the calculations that we'll do on each type, but the system needs to support any relevant data field for any given type being used in a calculation.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 4

Updated the original topic again with additional information about some constraints on this project.

If anyone feels that there is a better solution to this 'opportunity' than my EAV design, I'm more than happy to research it. Just give me a name to search on or a link to go to =)

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

I would have recommended using XML structures (which would allow you to maintain SOME structure and access various indexing strategies) - but you're using a SQL Server 2000 example, which leads me to think those options won't do you any good.

That said - your data model seems a bit too fluid. I can understand having some amount of unique characterestics, but the primary data model should be able to natively capture 90+ % of the circumstances with NO customiization. If not - you're working off of a losing battle for specs.

EAV can work only if it "stays small" and is managed appropriately for its real purpose: to help you find the ACTUAL data structure you should have. In other words, it needs to be iteratively analyzed to find elements that should have existed in the "permanent" structure to begin with if you had had full specs to work from. You need a process to look over and move EAV items to their own structures (once appropriately vetted by someone who understands how to structure data).

Meaning - it's important to think of EAV as a temporary holding area while you get to understand what you're actually dealing with. Once you identify an entity you should have had - then it needs to become "permanent" very quickly.

Your end goal should involve getting the data back to a true normalized state; build generic stubs of objects/entities in the common table, and a series of extensions more specific to the sub-types with their custom info. Not having all of the info to begin with may be fine, but just putting something in to "absorb everything else" leaves the data choices to the end-users, leading to inconsistencies, etc...

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

(and lest it sounds like I was reversing myself)...

The main reason I fight them so strongly is that no matter how they start out - sooner or later, you get pulled away from keeping it under control. By the time you get back to it - the development liability created tends to be pretty bad, and you end up with a lot of work to put it back on track.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Garadin One Orange Chip Points: 29613 More actions · Answer 7

Thanks for the input Matt. Unfortunately, I don't believe we'll ever get past the point where this is a temporary answer. My circumstances require this level of fluidity.

I understand that an EAV system is not a great design in general. I completely understand inheriting something that gives you so many headaches that you cringe at the very mention of its name. I have no desire to create that system for myself or any who come after me down the road. Unfortunately, given the amount of fluidity required along with the constraints of making it work in our current environment, this system is the only one I see that makes sense.

A fully normalized structure(as much as I would prefer it) is not a realistic option in this situation. I would however like to make the best I can of this bad situation with the first two questions related to the different data types, as these *are* things that I can change. Next week I'll likely generate both structures and populate them with test data and attempt to write some sample queries/reports off of each structure and see what the performance difference is/what issues I run into.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

You might also consider using the sql_variant data type. This is after all what it was intended for.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Garadin One Orange Chip Points: 29613 More actions · Answer 9

RBarryYoung (8/15/2009)
You might also consider using the sql_variant data type. This is after all what it was intended for.

Sounds like a plan. Thanks Barry.

Any comments from anyone on the ParentID concept? It definitely adds another layer of complexity onto this system, but I feel like I need something to link a child attribute (such as a unit of measurement) to its parent, primarily for screen layout (so that they can be put next to each other on the same line).

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

Garadin (8/17/2009)
RBarryYoung (8/15/2009)
You might also consider using the sql_variant data type. This is after all what it was intended for.
Sounds like a plan. Thanks Barry.
Any comments from anyone on the ParentID concept? It definitely adds another layer of complexity onto this system, but I feel like I need something to link a child attribute (such as a unit of measurement) to its parent, primarily for screen layout (so that they can be put next to each other on the same line).

Have any more concrete examples? I can't tell if you are trying to capture Object-type inheritance with this or or metric categories or something else.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Garadin One Orange Chip Points: 29613 More actions · Answer 11

In sitting here trying to justify the existence of this field with an example, I think I've decided that it's something that really shouldn't be done. I was primarily doing it for web page generation, but it creates another entire tier of management that I have to account for individually among types that I'd really rather not deal with.

Even in the previous example of tons vs. btu's, it's pointless to store both, as tons = btu's / 12000. Instead of requiring people to enter in a value as one type all the time, I was going to allow them to enter it in as either one, with a dropdown for which type it was.

There are 3 different situations applicable to this.

1. The field is going to be used in a calculation, and the options for the child can be directly converted. In this case I might as well just store it as 1 type to simplify future calculations.

2. The field is going to be used in a calculation, and the options for the child cannot be directly converted. In this case, there's not much point in me allowing them to put in data that can't be used by a formula/converted from what they are supplying.

3. The field is not going to be used in a calculation. In this case it only a reference field for their use and might as well be a varchar.

Instead of giving them this option, I think I'll add a field for help information that can be displayed by the page to tell them what needs to go in a field.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 12

After I:

º Restructured the tables multiple times to avoid dynamic SQL

º Wrote some of the queries

º Planned out a system to dynamically generate forms via the website for this variable data

º Designed holding tables for the website to submit return data to (Both because the information needs to be approved before being entered into the real tables, and because of the way the EAV system requires mutli-line inserts)

º Conceptualized the extremely complex stored procedures required to deal with this in multiple different systems

º Conceptualized the creation of multiple management tools to approve/alter the data in both the holding tables and the real tables

º Realized how long building all of this would actually take

º Considered how hard it would be to explain to the people actually using it

º etc.

I'm scrapping the entire concept and going back to a normalized method of tracking *some* of this data. The rest of it will either not be tracked, have fields added into the base table for it(this will be limited), or be tracked in a way that does not need to support data analysis (IE. they can put in whatever information they want, but we won't do anything with it).

Thanks for all of the input from everyone who posted. As many people had already said(and as I had read elsewhere), the system was simply more trouble than it was worth.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 13

Heh, glad we could help, Garadin! 🙂

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 14

[º Considered how hard it would be to explain to the people actually using it.

Seth, this comment maybe sums it up....no matter how clever you are in designing a system with complex procs and no matter how many lines of optimised code you have written to deliver the 'new wonder' system ...if the users are not able to use it easily and intuitively... then I think you fail.

I think you have reached the correct decision and I wish you well in your project.

regards Graham

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day