November 27, 2012 at 10:08 am
I'm experimenting with the xml data type in SQL Server 2008. I just want to confirm that, when using typed XML, the document you store in an xml column doesn't need to contain every node described in the XSD. In other words, as long as the xml document conforms to the XSD, you can have as many or as few nodes as necessary.
A somewhat related question: with typed XML, does SQL Server store all the tags, or does it have a more efficient internal mechanism for storing the document?
Thanks in advance for any help.
Chris
November 27, 2012 at 11:28 am
You don't need all the nodes, as long as the schema allows them to be left out. XML normally doesn't include nodes for NULL values, so a missing optional node simply comes back as NULL when you query it.
Untyped XML can actually take more storage than the exact same data as varchar(max).
I recently imported a fairly large XML file into a table, so I tested storage size with it by keeping it as raw text (varchar(max)) in one table and as XML in another, then checking the Disk Usage By Table report for the database. The raw text took 680 KB allocated and 664 KB data, while the XML took 1,128 KB allocated and 1,088 KB data.
I don't have time right now to set up an XSD for that table, so can't test typed XML right now. Sounds like you have what you need to test that. Try text, untyped XML (no XSD, but stored in an XML datatype column), and typed XML (XML datatype with a declared XSD). See what sizes you get with each of those.
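For anyone wanting to try that three-way comparison, a rough sketch of the setup might look like the following. The schema collection, table names, and element names (DemoSchema, Row, Id, Note) are all made up for illustration, and a single tiny row won't tell you much, so load a realistic batch of documents before reading anything into the numbers.

```sql
-- Hypothetical schema collection; the element names are illustrative only.
CREATE XML SCHEMA COLLECTION dbo.DemoSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Row">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Id"   type="xs:int" />
          <xs:element name="Note" type="xs:string" minOccurs="0" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

-- One table per storage option being compared.
CREATE TABLE dbo.DemoText    (Doc varchar(max));          -- raw text
CREATE TABLE dbo.DemoUntyped (Doc xml);                   -- untyped XML
CREATE TABLE dbo.DemoTyped   (Doc xml(dbo.DemoSchema));   -- typed XML
GO

-- Load the same document into all three tables.
DECLARE @doc varchar(max) = '<Row><Id>1</Id><Note>hello</Note></Row>';
INSERT dbo.DemoText    (Doc) VALUES (@doc);
INSERT dbo.DemoUntyped (Doc) VALUES (CONVERT(xml, @doc));
INSERT dbo.DemoTyped   (Doc) VALUES (CONVERT(xml, @doc)); -- validated against DemoSchema

-- Per-value size in bytes of the stored representation.
SELECT 'varchar(max)' AS Storage, SUM(DATALENGTH(Doc)) AS Bytes FROM dbo.DemoText
UNION ALL
SELECT 'untyped xml', SUM(DATALENGTH(Doc)) FROM dbo.DemoUntyped
UNION ALL
SELECT 'typed xml', SUM(DATALENGTH(Doc)) FROM dbo.DemoTyped;

-- Table-level allocation, comparable to the Disk Usage By Table report.
EXEC sp_spaceused N'dbo.DemoText';
EXEC sp_spaceused N'dbo.DemoUntyped';
EXEC sp_spaceused N'dbo.DemoTyped';
```

DATALENGTH gives the size of each stored value, while sp_spaceused reports whole pages, so the two views can disagree on small tables.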
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
November 27, 2012 at 12:33 pm
Thanks for confirming what I already suspected re the nodes question.
Regarding the storage question, a client has an old Access app I've been asked to refactor. The app has a table with almost two hundred columns, most of them sparsely populated if populated at all. I've been weighing whether to keep the table, as ugly as it is, or switch over to storing the data as XML. I know SQL Server lets you flag columns as sparse, but I still object on principle to having such a wide table.
XML would let me store just the nodes/fields that are actually used, but it seems like there's a lot more overhead in storing and extracting the data. In many cases the tags would take up more room than the data they surround. I like the flexibility XML offers, but it's so verbose.
I'll try your suggestion to see if typed xml is stored more efficiently.
Thanks again.
November 27, 2012 at 12:55 pm
"In other words, as long as the xml document conforms to the XSD, you can have as many or as few nodes as necessary."
That is the crucial thing: the XML document has to comply with the schema. Bear in mind that if the schema defines nodes as mandatory, via the minOccurs attribute (or by omitting it entirely, in which case it defaults to 1), then you need to ensure those nodes exist in the XML, and that they appear at least as many times as the schema requires if minOccurs > 1. Likewise, the maxOccurs attribute caps the number of repetitions of a node, and it also defaults to 1.
So yes, you can have as many or as few nodes as you like, but only within the limits the schema itself sets. You cannot omit a node unless its minOccurs attribute is set to allow it.
XML is quite expensive to work with, especially if you need to query the contents of the XML on a regular basis. XML indexes are, in my opinion, very expensive in terms of storage because of the way they are constructed internally, so they need to be considered carefully. Have you considered a hybrid approach? If you are going to query any of the columns in that wide table on a regular basis, I would recommend keeping them as ordinary columns, while the columns that are rarely queried can go into an XML instance.
I help look after terabytes of XML where I work and I do like working with it, but it isn't cheap in SQL Server. The XML type is great for storing structured data that middle-tier apps consume, where SQL Server does little more than a simple SELECT of the whole XML blob. If you intend SQL Server to query the contents of the XML, or shred it entirely, on a regular basis, then you will incur high performance costs compared to having a nice set of tables.
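As a small illustration of those occurrence rules, here is a sketch using a made-up schema collection (OccursDemo, Customer, Name, and Phone are hypothetical names, not anything from the original poster's app). Name relies on the defaults, so it must appear exactly once, while Phone is optional and may appear at most twice. Each test is in its own batch so that a validation failure doesn't stop the others from running.

```sql
-- Hypothetical schema: Name is required (minOccurs and maxOccurs default to 1),
-- Phone is optional and limited to two occurrences.
CREATE XML SCHEMA COLLECTION dbo.OccursDemo AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Customer">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Name"  type="xs:string" />
          <xs:element name="Phone" type="xs:string" minOccurs="0" maxOccurs="2" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

-- Valid: the optional Phone node is simply left out.
DECLARE @ok xml(dbo.OccursDemo) =
    N'<Customer><Name>Ann</Name></Customer>';
GO

-- Expected to fail validation: the required Name node is missing.
DECLARE @missing xml(dbo.OccursDemo) =
    N'<Customer><Phone>555-0100</Phone></Customer>';
GO

-- Expected to fail validation: three Phone nodes exceed maxOccurs="2".
DECLARE @tooMany xml(dbo.OccursDemo) =
    N'<Customer><Name>Ann</Name>
        <Phone>555-0100</Phone><Phone>555-0101</Phone><Phone>555-0102</Phone>
      </Customer>';
GO
```

For a table with many sparsely populated fields, that would mean marking nearly every element minOccurs="0" in the XSD.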
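A minimal sketch of that hybrid layout, assuming an invented table and invented field names (ClientRecord, LastName, Status, Extras, FaxNumber), could look like this. The frequently searched fields stay as ordinary columns, and the rarely used ones go into a single untyped xml column.

```sql
-- Hypothetical hybrid table: searched fields as columns, rarely used fields as XML.
CREATE TABLE dbo.ClientRecord (
    ClientRecordId int IDENTITY(1,1) PRIMARY KEY,
    LastName       varchar(50) NOT NULL,   -- queried often, so kept relational
    Status         varchar(20) NOT NULL,   -- queried often, so kept relational
    Extras         xml NULL                -- sparse, rarely queried fields
);
GO

INSERT dbo.ClientRecord (LastName, Status, Extras)
VALUES ('Smith', 'Active', N'<Extras><FaxNumber>555-0100</FaxNumber></Extras>'),
       ('Jones', 'Active', NULL);          -- no sparse fields populated at all

-- Filter on the relational column; reach into the XML only for the odd field.
-- A missing node, or a NULL Extras value, comes back as NULL.
SELECT LastName,
       Extras.value('(/Extras/FaxNumber/text())[1]', 'varchar(20)') AS FaxNumber
FROM dbo.ClientRecord
WHERE Status = 'Active';
```

The WHERE clause never touches the XML here, which is the point of the split: the expensive XML work happens only on rows that have already been narrowed down by ordinary columns.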
November 27, 2012 at 1:08 pm
Thanks for your reply.
You're absolutely correct about required columns (minOccurs).
I've been leaning toward the hybrid approach, i.e., storing frequently used/searched fields as columns, then using an xml column to persist the sparsely used fields.