EAV design or Normalized

Question

EAV design or Normalized

S t e f

SSC Eights!

Points: 977
More actions
November 26, 2013 at 8:50 am

#286861

Hi designing a financial budget database, expenditure items require the amount, frequency and comment, there can be over 100 expenditure items, this would create a table with over 300 columns for example
FoodAmount
FoodFrequency
FoodComment
FuelAmount
FuelFrequency
FuelComment
etc...
Would splitting this into 3 tables like the following be normalised or is it just an EAV?
---------------------
ExpenditureItemID
ExpenditureItemName
---------------------
---------------------
FrequencyID
FrequencyName
---------------------
---------------------
ExpenditureID
ExpenditureItemID
FrequencyID
Amount
Comment
---------------------
Thanks for looking, would welcome any other ideas people have.
Thanks.
S t e f

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

TomThomson SSC Guru Points: 104773 More actions · Answer 1

Presumably each of these things turns up more than once, otherwise there are 300+columns in a table which only has one row. But you haven't yet defined what causes there to be more than one row, and your break into several tables only caters for one row in the original table. So first identify what there is in that very wide table that distinguishes one row from another and call that set of columns the primary key for that table.

After that start by looking at what you get from a table which has the

(Expenditure type varchar(some number) not null - this together with the key of teh original wide table forms the key of this table - and of course this Expenditure should be the last column in the primary key for this narrower table,

Frequency varchar(some number) not null defalt('never'),

Amount decimal(some precision and scale) not null default(0),

Comment varchar(some number) not null default ('no comment') )

or something like that.

If it's going to be worth using a tinyint for expenditure type id instead of a varchar for expenditure type and having an extra join on many queries to get the id from the name, then do that - but don't start by doing it without working out whether it's pointless or not. Presumably Frequency is things like "daily", "monthly", "quarterly" and so on? Again, if it looks as if it would be worth adding the extra join to many queries, have an extra table to define a tinyint surrogate key, don't just assume it's the right thing to do without thinking about it. and what about the material that was the original wide table's primary key? - it might be useful to have a surrogate for that, and maybe the defining table for that surrogate will contain other columns from the original wide table that don't fit into the expenditure type/frequency/amount/comment set of columns, so that would be another table (and another join in many queries).

This isn't remotely like an EAV model, and I don't understand why you thought it was. It's an ordinary relational model, potentially with surrogate keys introduced to save storage space and disc IO - but only introduce them if they give a worthwhile saving.

Tom

S t e f SSC Eights! Points: 977 More actions · Answer 2

Hi thanks for the reply.

I think it looks like an EAV as I've effectively take an attribute and changed it to a row:

Before:

Expenditure

BudgetID

FoodAmount

FoodFrequency

FoodComment

FuelAmount

FuelFrequency

FuelComment

etc...

After:

Expenditure

Expenditure

BudgetID

ExpenditureItemID

Amount

Frequency

Comment

ExpenditureItem

ExpenditureItemID

ExpenditureItemName

-------------

Has anyone come across a similar issue for instance a parts table which have common attributes such as weigh, length etc.

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 3

S t e f (12/12/2013)
Hi thanks for the reply.
I think it looks like an EAV as I've effectively take an attribute and changed it to a row:
Before:
Expenditure
BudgetID
FoodAmount
FoodFrequency
FoodComment
FuelAmount
FuelFrequency
FuelComment
etc...
After:
Expenditure
Expenditure
BudgetID
ExpenditureItemID
Amount
Frequency
Comment
ExpenditureItem
ExpenditureItemID
ExpenditureItemName
-------------
Has anyone come across a similar issue for instance a parts table which have common attributes such as weigh, length etc.

Your proposed solution is a valid relational model called a "star schema". L'Eomot Inverse gave you good advice on how to decide whether this works for you.

An EAV schema can usually be recognized by a main table with three columns defining an entity, an attribute, and a value. In your case, that might look like this:

ExpenditureTable

ExpenditureID ExpenditureAttribute ExpenditureValue

1234 Item Food

1234 Amount 34.56

1234 Frequency Daily

5678 Item Fuel

5678 Amount 137.56

5678 Frequency Weekly

You can see that a single entity, an expenditure, has a row for each attribute. To put together a complete picture of that expenditure, you have to select several rows and probably pivot or cross-tab them into a single row to make the data more easily understood by human beings. Sometimes, this is made even harder by the fact that the EAV table holds codes, which means that lookup tables must be joined to get names. People often think that EAV structures will be great because you can add attributes to an entity at any time without requiring a schema change, but you pay for that flexibility with huge difficulties of querying the data.

Taken to an extreme, you could do this:

ItemTable

ItemID ItemAttribute ItemValue

1234 ItemType Expenditure

1234 Category Food

1234 Amount 34.56

1234 Frequency Daily

5678 ItemType Customer

5678 CustomerName John Smith

5678 TelephoneNumber 555-555-1234

which you might call the "one table to rule them all" approach. In addition to the usual difficulties with EAV schema, you now have to maintain lookup tables that define the valid attributes for each entity and such.

This is different from a star schema, which just substitutes label-type values (like an item name) for a code (usually an integer) that corresponds to a row in a lookup table. This can be useful for a number of reasons. Saving storage space and maintaining standardization of values is a common reason - instead of storing the strings "Purple", "Orange", and "Yellow" in a Color column of an Item table, which takes 6 bytes for every row, you can use tinyints 1, 2 and 3 in a ColorID column, which takes only one byte per row. When it's time to retrieve the data, you can join the Colors table to the Item table on ColorID and include Colors.ColorName in the SELECT list. When the marketing department decides that the color "Purple" should be called "Aubergine" instead, you only have to update one row in the Colors table rather than many rows in the Item table. The requirements of a particular project will determine whether this approach is worth the cost.

The star schema is also the basic data structure used in dimensional modeling, usually to support aggregrate analysis of data.

Jason Wolfkill