July 1, 2020 at 11:39 pm
Hello, not sure if this is the right subform to post this in, but anyhow...
I have a few tables that all has a column with an XML datatype in it, with lots of data this column. Each of these tables has about 10-20 million rows.
The XML data structure is pretty simple but has about 100 nodes, i.e:
<profile>
<entity name="ADDRESS">
<attr name="NAME1">WHITE HOUSE</attr>
<attr name="STREET">1600 Pennsylvania</attr>
<attr name="CITY1">Washington</attr>
<attr name="REGION">DC</attr>
<attr name="POST_CODE1">01234</attr>
.
.
</entity>
</profile>
Anyhow, I would like to take this XML and move this into a new table that is extracted out into proper columns. However, using conventional XQuery methodologies to extract out (querying using XMLColumnName.value), I'm averaging a scant 10 rows a second to write rows to a temp table, using read batches of 1000 records. I would need to average about 200 rows a second at a bare minimum for this conversion to be able to run on time.
Questions -
July 2, 2020 at 6:45 pm
can you provide a little bigger sample of the XML and some DDL would help. Also, can you provide the query you are running? What does the execution plan say? Are you exporting all to varchar(max) columns or are they sized properly?
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 2, 2020 at 10:22 pm
The XML is really simple - it would probably be another 100 nodes more than what's on there.
Query:
SELECT
A.SOURCE_PKEY,
XmlColumn.value('(profile/entity[@name="ADDRESS"]/attr[@name="NAME1"])[1]', 'varchar(255)') NAME1,
XmlColumn.value('(profile/entity[@name="ADDRESS"]/attr[@name="STREET"])[1]', 'varchar(255)') STREET,
XmlColumn.value('(profile/entity[@name="ADDRESS"]/attr[@name="CITY1"])[1]', 'varchar(255)') CITY1,
XmlColumn.value('(profile/entity[@name="ADDRESS"]/attr[@name="REGION"])[1]', 'varchar(255)') REGION,
XmlColumn.value('(profile/entity[@name="ADDRESS"]/attr[@name="POST_CODE1"])[1]', 'varchar(255)') POST_CODE1
.
FROM Table
where XmlColumn is the name of the XML column in the table.
The execution plan is fine - all the time involved is via the shredding itself.
I'm about to go in a different direction now - just exporting the data to a flat file and using C# to shred it that way. Very inconvenient.
July 4, 2020 at 6:00 pm
Is there more than one "Address" element per XML or is there always just 1 (regardless of how many attributes that node contains)?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2020 at 4:52 pm
in this case, there's only one Address element per XML column row.
The issue is that this is unstandard XML, it should have been all defined as all elements in this case, so I've more or less given up and just going to use write something custom in C# which I didn't particularly want to do. Unfortunately, even after a few improvements to the SQL script, I was able to only insert 850K rows for a 24 hour period, running my extract out the columns from XML and load into a table using XQuery, and this table has almost 20 million rows with this XML column, and literally 99% of the execution plan time is on the shredding.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply