SPARSE vs XML vs EAV

  • I need to be able to extend tables in a database and add more columns without changing the table structure. The reasons for this are long and boring, so lets not rehash them here but accept that this is a limitation.

    Yes, I would much rather create a normalised design and work with that.

    So we have "base tables" like Customer and Product that contain the standard normalised information that you would expect to see (Codes, names, etc)

    I need to be able to add a number of attributes to these base tables without changing the table structure and I am considering the 3 variants that spring to mind when doing this:

    1. Define a number of SPARSE columns as SQL_VARIANT on the base tables

    Pro:

    Nearest to a Relational design and simplest to code with

    Con:

    Not as flexible

    Multiple clients may use the same column for different data

    2. Define a single XML column and store the data as a set of named attributes in that

    Pro:

    Almost limitless extensibility

    Maintains Relational Design

    Con:

    Complex to work with

    Slow compared to other solutions

    3. Use an EAV table as an "Extension" table to store only the attributes that have no home in the base tables.

    Pro:

    Extensible, and reasonably fast

    Con:

    Non-relational

    I've recently been working with an EAV design db used to store the Name-Value pairs from an XML document which was over 100GB, and that was running fine.

    Does anyone out there have any other designs, or comments on the existing options? I know that they are all suboptimal, but that's the difference between bespoke software and generic software!

  • I'd go with the XML approach but shred it into relational data behind the scenes (noone needs to know 😉 ).



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hmm - But the problem is that I need to keep the db structure unchanged.

    If it was about presenting data to the Business Logic layer then no problem but my constraint is in the actual db...

  • In that case, I'd still go with the XML column. Index it, so it won't be that much of a pain to query the data. And make yourself familiar with efficient XQuery strategies (if that's not a contradiction by itself...:-)).



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Any particular experience that leads you to this choice?

    Of the 3, its my least favourite because

    it offers no type security

    its slow

    Comments / Suggestions?!

  • I don't like option 1 because you can almost always have a scenario where the requirement exceeds the sparse columns defined (unless you define a large number of columns which I'd consider being a silly approach).

    I also vote against option 3 because I don't consider an EAV model as being valid for any kind of a relational table structure.

    Regarding your concerns:

    How does any of the choices available provide a type security?

    What do you mean with "type security"?

    Regarding performance I suggested to use an indexed xml column. Behind the scenes SQL Server will shred the data into a table-like structure. Then it comes down to how the data are queried.

    Any chance you can provide some sample data so we can see the data structure?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Type security - Using an SQL_VARIANT column means we get data and type.

    I agree with your thoughts on sparse columns - you always need 1 more!

    I'll take another look at XML data. What is the performance like on writes if I define an XML index on this?

    Happy Holidays!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply