March 12, 2007 at 6:36 pm
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/smunro/2920.asp
March 26, 2007 at 2:42 am
The use of individual columns (type bit) to indicate the sub-type (Person or Organisation) is really awful. It does not cater for sub-sub-types (e.g. different types of Person). It is not extensible without adding a new column (type bit) for each new sub-type. It requires table constraints to guarantee that each row has exactly one type flag set.
I suggest that the type column be a computed column using an expression involving the schema of the xml (ideally) or the elements of the xml (if no schema). The standard help pages for xml have examples of doing this.
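For instance, something along these lines - just a sketch with made-up names, and wrapping the xml test in a scalar UDF because, as far as I recall, SQL Server will not accept xml methods directly in a computed-column expression (the help-page workaround is the UDF):

CREATE FUNCTION dbo.udf_PartyType (@data xml)
RETURNS varchar(20)
AS
BEGIN
    -- Derive the sub-type from the xml itself rather than from bit flags.
    RETURN CASE
        WHEN @data.exist('/Person') = 1       THEN 'Person'
        WHEN @data.exist('/Organisation') = 1 THEN 'Organisation'
        ELSE 'Unknown'
    END;
END;
GO

CREATE TABLE dbo.Party
(
    PartyID   int IDENTITY PRIMARY KEY,
    PartyData xml NOT NULL,
    PartyType AS dbo.udf_PartyType(PartyData)  -- computed from the xml content
);

Adding a new sub-type then means changing one function, not adding another bit column plus a table constraint.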
March 26, 2007 at 5:32 am
Simon, interesting article.
Our client has used XML in certain areas of their designs, with varying degrees of success. For static data it can be relatively effective, but, as you mention, as soon as the data starts getting updated, performance goes to a dark, smelly place. I would note that you mention XML usage seems alright for smaller tables/areas - this does assume that the DB design has decent, accurate volumetrics, or else a robust archival strategy that ensures that what is a small/medium sized table today will still be one in a year or two. Failing that, today's 'acceptable performance with XML' could become tomorrow's performance pain.
In playing around with XML, have you done any work using XML schemas to enforce some structure on the XML? I heard a concerning argument for XML from a DB developer, of all people. The gist of it was: 'by using XML to describe the entity, a middle-tier programmer can easily enhance the structure without having to battle to get it changed in the database schema and then get changes made in the data access layer. This will mean we can develop a lot faster, not be constrained by difficult-to-resource DB developers, and be more agile'. Yeah - and it also means that suddenly we find an enterprising middle-tier developer has felt the need to add OrderNumber to the Client XML column so he doesn't need the joins to the Order table structure - and before we know it, XML bloat and completely 'hidden' data that no one understands, because the schema defining it is 'somewhere in the middle tier'.
So, if I have to have XML in the database, I'd like to try and use schemas to make sure that the data in the XML is still properly designed, defined and understood. Your thoughts/experiences?
March 26, 2007 at 6:25 am
By and large I've stayed away from XML in the database for all the reasons that most people here can list. I have found that XML can play a part in feeding data to the system. We've found that, depending on the number of columns and their size, using XML to send in larger amounts of data works better than multiple stored procedure calls. So, for example, we want to insert into table X. We can write a traditional insert proc and then have the developers call it. For one row, this works flawlessly. It also works great for 3, 5, 10 and even 30 rows. But after a point - and that point varies, so testing is required - the number of calls gets nasty. So instead, we pump the data into XML and then use OPENXML (with all its memory bloat) to shred the data into a table with a single insert statement, and it performs better than the multiple calls from the middle tier or client.
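The general shape of it is something like this (a rough sketch only - the table, column and element names are made up for illustration, not our actual proc):

DECLARE @doc int;
EXEC sp_xml_preparedocument @doc OUTPUT, @xml;   -- @xml holds the whole batch of rows

INSERT INTO dbo.TableX (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM OPENXML(@doc, '/Rows/Row', 1)               -- flag 1 = attribute-centric mapping
     WITH (Col1 int, Col2 varchar(50), Col3 datetime);

-- Always release the document handle, or the memory bloat gets worse.
EXEC sp_xml_removedocument @doc;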
The only thing I'd say about implementing this elsewhere is testing, testing, testing. Verify the results in your environment before you bet the farm on any given solution.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 26, 2007 at 7:07 am
Grant, I can confirm that your experience matches ours. We are now looking at standardizing on XML whenever we need to pass a repeating set of data to an SP - it has proven more efficient than multiple calls, and better than building an artificial mechanism (comma-delimited strings etc.) for passing a set of data to an SP.
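As a sketch of what I mean (names invented for illustration), the SP takes a single xml parameter and shreds the repeating set with the 2005 nodes()/value() methods instead of parsing a delimited string:

CREATE PROCEDURE dbo.AddOrderLines
    @Lines xml
AS
BEGIN
    INSERT INTO dbo.OrderLine (OrderID, ProductID, Quantity)
    SELECT  L.value('@OrderID',   'int'),
            L.value('@ProductID', 'int'),
            L.value('@Quantity',  'int')
    FROM    @Lines.nodes('/Lines/Line') AS T(L);   -- one row per <Line> element
END;

-- Called once, however many lines there are:
-- EXEC dbo.AddOrderLines @Lines = '<Lines><Line OrderID="1" ProductID="7" Quantity="2"/></Lines>';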
March 26, 2007 at 7:16 am
I'm the developer of a system that creates security documentation (DoD-mandated). These documents are around 40+ pages and the system has a "wizard" for leading the user section-by-section to complete the document. When done, we output the doc in RTF format. Each section has many data fields and the fields are different from section to section. I create a huge XML document to store all these data fields (a child element beneath the root for each section) and then store this XML document in the DB. I can't imagine the number of tables and columns I'd need in a database that would represent these documents. The DB-stored XML approach makes these wizards a snap.
March 26, 2007 at 7:28 am
Interested to know how you handle accessing the data after you've stored it. Fetching the entire XML means far more IO, memory and network usage than might be required if only a small piece of data, like who completed the document and when, needs to be extracted. If you are querying into the XML data, what performance are you seeing? It's been pretty poor in our high-volume cases, especially if you need to use nodes and elements as part of your criteria, so I'm interested to know how you've handled that - I'm sure I can learn from it...
March 26, 2007 at 7:40 am
I'm currently using "classic ASP" for this system as it's web-based, so I extract the XML using an ADO recordset, load an MSXML IXMLDOMDocument object using the LoadXML method, then extract the XML element which pertains to the section of the wizard the user is currently on. The XML documents are in the 200K range and we haven't had any performance issues.
March 26, 2007 at 8:02 am
If that meets your user requirements, then that's what matters. How many users, and how many documents, do you have, if I might ask?
Our XML columns, where used - and especially where we had issues (we 'normalized out' the temporal data, which made a huge difference) - were on a table that had 2 million rows and was used in real time by an online retail site that takes, at peak, around 5000 orders an hour, or just under 1.5 orders per second. On average, the basket (which was the XML) was being accessed about 20 times per order, so you can see that during peak trade our XML column was being read, and potentially changed, in the order of 100 000 times an hour, so the best possible performance is kind of critical 🙂
March 26, 2007 at 8:08 am
I currently have 2579 XML documents stored in the DB (about 44meg) and we have 579 users. Yeah, our system is pretty small in comparison to what you were trying to do!
March 26, 2007 at 10:50 am
Did you experiment with SQL 2005's xml indexes? I wonder if they could be used in the sample last name query of the view...
March 26, 2007 at 11:36 am
Ken,
I accept your criticism of the use of boolean discriminator attributes. There are other mechanisms (such as an IsInClass function using bitwise operators) that would allow for the implementation of richer classification mechanisms – but I thought that it would distract from the intention of the article.
The use of a separate field, as opposed to an attribute within the xml fragment, is redundant, but it is intended to give some performance gain: you don't have to query the xml directly in order to retrieve the initial subset that you may be interested in. For that reason I would avoid having classification structures contained solely within the xml itself.
Simon
March 26, 2007 at 11:48 am
Wanderer,
When it comes to accessing the database I am fairly anal about how developers get at it, and I tend to create an abstraction layer for developers that uses simpler datatypes, regardless of how the data is physically stored. I don't like the approach of developers arbitrarily creating their own database access, for reasons that go beyond datatypes; the best way to ensure that developers do not bring your database to its knees is to be aware of how the database is being accessed, although I understand that this is not possible in all environments. I have experimented with xml schemas and find that SQL Server implements them well, but I haven't had a need to put them into production - yet. Perhaps once the schemas are cast in stone it is something that I would be obliged to do. I would think that there would be a performance knock on inserts when using schemas, but queries and indexes may be faster; some investigation on large data volumes would be required to really understand the performance considerations.
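If it helps, what I experimented with was along these lines (the schema content and names here are simplified and hypothetical, not a production design):

CREATE XML SCHEMA COLLECTION dbo.ClientSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Client">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="LastName"  type="xs:string"/>
        <xs:element name="FirstName" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- A typed xml column: inserts and updates are validated against the collection,
-- which is where I would expect the insert overhead to show up.
CREATE TABLE dbo.Client
(
    ClientID   int IDENTITY PRIMARY KEY,
    ClientData xml(dbo.ClientSchema) NOT NULL
);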
Simon
March 26, 2007 at 12:03 pm
Mike,
I like the idea of indexes on xml datatypes, particularly when combined with schemas. There was a comment in one of Steve's editorial posts a few weeks ago that said something like "let databases do what they are good at - rows and columns". This makes sense, but a small feature such as indexes on xml raises the question "well, what if SQL databases were good at xml too?" I don't know much about the internals of the database engine, but I imagine that if xml indexes are handled in the same way as 'normal' indexes you could get similar performance, regardless of whether your index is on a field or an xml attribute. So, if the query plan for a field looks the same as the query plan for xml, using index seeks, then surely we could argue that SQL databases are becoming good at handling not just rows and columns, but xml too?
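For reference, the indexes in question look something like this (table and column names are just placeholders):

-- A primary xml index is required first; secondary indexes build on it.
CREATE PRIMARY XML INDEX PXML_Person_Data
    ON dbo.Person (PersonData);

-- A secondary PATH index can help a query like the last-name example in the view,
-- e.g. a predicate such as PersonData.exist('/Person/LastName[. = "Smith"]') = 1.
CREATE XML INDEX IXML_Person_Data_Path
    ON dbo.Person (PersonData)
    USING XML INDEX PXML_Person_Data
    FOR PATH;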
My current implementations are tentative seductions by the xml temptress and although it appears that indexes on xml data do work, I haven’t taken the time yet to perform some real tests.
Simon
March 31, 2007 at 1:12 am
Very interesting article Simon; to be honest I am still in the position you were in a year ago: never even thought about it 🙂 .
Here is a quick comment on Grant's strategy for feeding in larger amounts of data.
But after a point - and that point varies, so testing is required - the number of calls gets nasty. So instead, we pump the data into XML and then use OPENXML (with all its memory bloat) to shred the data into a table with a single insert statement, and it performs better than the multiple calls from the middle tier or client.
The assumption is correct: row-by-row processing is much slower than a set-based approach. That's exactly why the bcp tool exists: bulk-copying data from a flat file into the database. If you can pass the parameters to an SP, I guess you can just as easily write them to a flat file, load that into the database and apply your stored proc in a set-based way. I bet you'll come back to what the database server is best at.
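Roughly like this (the file path and table names are invented for the example):

-- Bulk-load the flat file into a staging table in one set-based operation...
BULK INSERT dbo.OrderStaging
FROM 'C:\feeds\orders.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- ...then run the set-based stored procedure against the staged rows.
EXEC dbo.ProcessStagedOrders;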
Fred