SPARSE vs XML vs EAV

Question

Post reply

SPARSE vs XML vs EAV

Toby Harman

SSCarpal Tunnel

Points: 4168
More actions
December 22, 2010 at 5:33 pm

#229088

I need to be able to extend tables in a database and add more columns without changing the table structure. The reasons for this are long and boring, so lets not rehash them here but accept that this is a limitation.
Yes, I would much rather create a normalised design and work with that.
So we have "base tables" like Customer and Product that contain the standard normalised information that you would expect to see (Codes, names, etc)
I need to be able to add a number of attributes to these base tables without changing the table structure and I am considering the 3 variants that spring to mind when doing this:
1. Define a number of SPARSE columns as SQL_VARIANT on the base tables
Pro:
Nearest to a Relational design and simplest to code with
Con:
Not as flexible
Multiple clients may use the same column for different data
2. Define a single XML column and store the data as a set of named attributes in that
Pro:
Almost limitless extensibility
Maintains Relational Design
Con:
Complex to work with
Slow compared to other solutions
3. Use an EAV table as an "Extension" table to store only the attributes that have no home in the base tables.
Pro:
Extensible, and reasonably fast
Con:
Non-relational
I've recently been working with an EAV design db used to store the Name-Value pairs from an XML document which was over 100GB, and that was running fine.
Does anyone out there have any other designs, or comments on the existing options? I know that they are all suboptimal, but that's the difference between bespoke software and generic software!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

I'd go with the XML approach but shred it into relational data behind the scenes (noone needs to know 😉 ).

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Toby Harman SSCarpal Tunnel Points: 4168 More actions · Answer 2

Hmm - But the problem is that I need to keep the db structure unchanged.

If it was about presenting data to the Business Logic layer then no problem but my constraint is in the actual db...

LutzM SSC Guru Points: 107049 More actions · Answer 3

In that case, I'd still go with the XML column. Index it, so it won't be that much of a pain to query the data. And make yourself familiar with efficient XQuery strategies (if that's not a contradiction by itself...:-)).

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Toby Harman SSCarpal Tunnel Points: 4168 More actions · Answer 4

Any particular experience that leads you to this choice?

Of the 3, its my least favourite because

it offers no type security

its slow

Comments / Suggestions?!

LutzM SSC Guru Points: 107049 More actions · Answer 5

I don't like option 1 because you can almost always have a scenario where the requirement exceeds the sparse columns defined (unless you define a large number of columns which I'd consider being a silly approach).

I also vote against option 3 because I don't consider an EAV model as being valid for any kind of a relational table structure.

Regarding your concerns:

How does any of the choices available provide a type security?

What do you mean with "type security"?

Regarding performance I suggested to use an indexed xml column. Behind the scenes SQL Server will shred the data into a table-like structure. Then it comes down to how the data are queried.

Any chance you can provide some sample data so we can see the data structure?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Toby Harman SSCarpal Tunnel Points: 4168 More actions · Answer 6

Type security - Using an SQL_VARIANT column means we get data and type.

I agree with your thoughts on sparse columns - you always need 1 more!

I'll take another look at XML data. What is the performance like on writes if I define an XML index on this?

Happy Holidays!