Best storage strategy XML typed or varchar(max)

  • Hi All,

    I have a very simple question on the storage of XML type vs. varchar(max).

    On SQL 2005 we are planning to store some xml data but need to save it in as small as disk space foot print as possible. Aside from the fact that XML is stored as a unicode (2 bytes) vs. 1 byte what are some of the factors to consider here.

    I will be storing about 4 million Docs a month with about 14K of data each (in varchar format), so with varchar max it will be about 56GB a month (14000*4000000) where as xml will be twice that amount, 112GB per month. I also need to keep this data for 5 years so storage will be quite large, about 3.5TB vs 7TB. Of course there is other ancillary data as well so the storage requirements will be double, 7TB vs. 14TB. The storage will be on a SAN, HP EVAs,

    Has anyone else gone through this consideration what have you discovered? XML indexing vs Full Text indexing for accessing the data, Fine grained access of XML sections vs. Full Doc updates (and it associated I/O) , .Net access to the data vs. SQL access etc.

    Thanks,

  • Most common answer: "it depends" 😀

    The major question is: What are you going to do with the data (= Why do you have to store it in XML format)? Wouldn't it be much more efficient to shred the data and store it in relational tables? I'm sure you'll save quite a few Gb over time...

    Depending on the business case I even might consider storing the files just as files (maybe storing just the file name and path in SQL Server) and load it temporary "on demand".

    There are several options depending on the requirements.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Dang, How did I know that "Depends" was coming. The main reason to store on SQL is so our clients can do some data mining looking for key phases. I do have other XML columns that are shredded but that will not work in this case as the mining criterion necessitates one big blob of data to search through.

    This is not a data warehouse app but I believe it will eventually morph into one, which at that point will make my life a bit easier. I was just interested in if any one had done some analysis on storage requirements vs. query performance between xml and varchar max types and why they went with their chosen schema model.

  • Did you consider using SSAS? It might be helpful for the "blob" scenario while still allowing you to shred the data.

    The other issue I'd be worried about: when dealing with such a large data volume it most probably will become a performance nightmare if someone's trying to query against a rather large amount of unindexed xml data.

    And maybe there is a chance to come up with a good performing T-SQL solution even for those (complicated?) mining criterion...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • dbatodd (6/14/2010)


    Dang, How did I know that "Depends" was coming. The main reason to store on SQL is so our clients can do some data mining looking for key phases. I do have other XML columns that are shredded but that will not work in this case as the mining criterion necessitates one big blob of data to search through.

    This is not a data warehouse app but I believe it will eventually morph into one, which at that point will make my life a bit easier. I was just interested in if any one had done some analysis on storage requirements vs. query performance between xml and varchar max types and why they went with their chosen schema model.

    If it's purely for pulling some random data, then I would look at XML data type with some indexing, but be aware that the XML data type might reorganize the contents of the file.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply