Flattening XML problem

Question

Flattening XML problem

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

Sure. Would be happy to review it. Send me what you have whenever you're ready.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 2

Will do and thanks. I appreciate it very much.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 3

Matt Miller (#4) (7/11/2012)
Sure. Would be happy to review it. Send me what you have whenever you're ready.

Thanks again for your offer, Matt. I sent you the code via PM. I didn't want to post it here because 1) I know comparatively very little about XML and didn't want to post something here that someone might make the mistake of using if it's bad code and 2) I hate XML 😛 enough to write an article about how to get around all of the stuff most folks have to go through to write XML shredders to meet their needs. The code I sent you actually interrogates the XML and writes the necessary T-SQL to shred virtually any well-formed XML even if attributes or "data nodes" are missing from "entities" much like you've done on this very thread. It even "right sizes" the data columns it returns in the result set and can easily be copied and tweaked to change the datatypes of the result set. Heh... it even prints instructions for how to easily modify the code so that it works on a whole column of XML from a table. :w00t:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 4

rice.tx (7/9/2012)
Of course Matt gets credit for the hard part.

You said a mouthful there. Matt's example has changed my life. I don't actually hate XML anymore... I only loath it now. 😀

In all seriousness, if the code I sent Matt passes his muster or I can tweak any corrections/enhancements he may have, even "XML Dummies" like me :unsure: will have no problems with shredding XML to a flattened structure.

Yep... I realize that a lot of people (I've been Googling the subject like crazy) have written code to flatten XML but it's either been horribly complex or actually doesn't do anything automatically in a "point'n'shoot" fashion without knowing (literally) anything about the actual structure of the XML.

BTW... do you mind if I use the example XML data you posted for the article (presuming the code passed Matt's muster, of course)? It's a great example because, although it's nice and short, it has a couple of "levels" of "entities and attributes" built into it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 5

Crud. I might not be done with the code, yet. Is the following considered to be a "normal well formed" type of XML?

SET @pXML = '

<items>

<item id="0001" type="donut">

<name>Cake</name>

<ppu>0.55</ppu>

<batters>

<batter id="1001">Regular</batter>

<batter id="1002">Chocolate</batter>

<batter id="1003">Blueberry</batter>

</batters>

<topping id="5001">None</topping>

<topping id="5002">Glazed</topping>

<topping id="5005">Sugar</topping>

<topping id="5006">Sprinkles</topping>

<topping id="5003">Chocolate</topping>

<topping id="5004">Maple</topping>

</item>

</items>

'

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 6

Ah... I see. It's a legal "hybrid" of both attribute and element based XML. Back to the drawing board.

Matt, if you're reading this, forget about the code I PMd you. It won't handle the hybrid stuff.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 7

Sometimes, I get lucky. I've now got code that will automatically create a flattened result set from any of the following XML. I'll send it to Matt for review before I post it because I could certainly be wrong like I was before. I'm not quite back to hating XML but I am loathing it a bit more. 😀

--=============================================================================

-- Build some test data for this code review. This is NOT a part of the

-- solution. The @pXML variable could be a parameter in a stored proc.

--=============================================================================

--===== This will contain the actual XML document to be shredded and flattened.

DECLARE @pXML XML;

--===== This is an "Attribute Based" example.

-- "FirstName1" and "Zip2" attributes and the "Phone" entity for

-- "PhoneNumber6" have intentionally been left out to demonstrate that the

-- XML DOESN'T have to be "perfect" meaning that not every "entity" needs

-- to have all the attributes. It still needs to be "well formed", though.

SET @pXML = '

<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">

<PlayerInfo PlayerID="1" LastName="LastName1">

<AddressList>

<PlayerAddress>

<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>

<FutureUse />

<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

</PlayerAddress>

<PlayerAddress>

<Address AddressType="billing" State="State2" City="City2" />

<FutureUse />

<Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />

</PlayerAddress>

</AddressList>

</PlayerInfo>

<PlayerInfo PlayerID="2" FirstName="FirstName2" LastName="LastName2">

<AddressList>

<PlayerAddress>

<Address AddressType="primary" City="City3" State="State3" Zip="Zip3"/>

<FutureUse />

<Phone PhoneNumber="PhoneNumber4" PhoneType="Type4" />

<Phone PhoneNumber="PhoneNumber5" PhoneType="Type5" />

</PlayerAddress>

<PlayerAddress>

<Address AddressType="billing" Zip="Zip4" State="State4" City="City4" />

<FutureUse />

</PlayerAddress>

</AddressList>

</PlayerInfo>

</AccountDetailsRsp>'

--===== This is an "Element Based" example. Uncomment it to see the code work

-- with it instead of the "Attribute Based" example.

-- "Skills" have been left out for "Simon" and "Age" has been left out

-- for "Sally" to demonstrate that the XML DOESN'T have to be perfect

-- meaning that not every "entity" needs to have all the attributes.

-- It still needs to be "well formed", though.

--SET @pXML = '

--<Root>

-- <Person>

-- <Name>Simon</Name>

-- <Age>20</Age>

-- </Person>

-- <Person>

-- <Name>Peter</Name>

-- <Age>21</Age>

-- <Skills>

-- <Skill>Cooking</Skill>

-- <Skill>Carpentry</Skill>

-- <Skill>Hunting</Skill>

-- </Skills>

-- </Person>

-- <Person>

-- <Name>Sally</Name>

-- <Skills>

-- <Skill>Cooking</Skill>

-- <Skill>Carpentry</Skill>

-- </Skills>

-- </Person>

--</Root>

--'

;

--===== This is an "Hybrid Based" example. Uncomment it to see the code work

-- with it instead of the other examples.

--SET @pXML = '

--<items>

--<item id="0001" type="donut">

--<name>Cake</name>

--<ppu>0.55</ppu>

--<batters>

--<batter id="1001">Regular</batter>

--<batter id="1002">Chocolate</batter>

--<batter id="1003">Blueberry</batter>

--</batters>

--<topping id="5001">None</topping>

--<topping id="5002">Glazed</topping>

--<topping id="5005">Sugar</topping>

--<topping id="5006">Sprinkles</topping>

--<topping id="5003">Chocolate</topping>

--<topping id="5004">Maple</topping>

--</item>

--</items>

--'

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 8

Had to start over because of the nuances of "hybrid" XML but I believe I may have nailed down the code to automatically flatten the 3 different types (Attribute Only, Element Only, and Hybrid of the two) of NON-TYPED, NON-XSD XML (the type of stuff most people work with, properly formed, of course).

First, it was a bit unfair of me to try to latch onto Matt for specific help. It's also a bit unfair of me to hold the code back until I can get an article out on the subject.

With that in mind, here's the code I ended up with. To see it work, just replace the XML example in the code with your favorite well formed XML and let it rip (or you can use the simple Hybrid example I included). As always, the details are in the comments in the code.

As an overview, the following code build the "Edge" table (an Adjacency List Hierarchy, really) that SQL Server uses behind the scenes, adds some information to it, and builds the SQL to return the flattened XML as a single derived table result set. It also prints the XML so you can tweak it for production use.

BWAAA-HAAAA!!! I now know more about XML than I ever wanted to. Hopefully, I won't ever have to get into the XSD stuff. 😛 Thanks again to Rice.tx for the original problem and to Matt for posting a reasonable solution for me to automate.

--===== Put your XML document between the single quotes below like I did.

SET NOCOUNT ON;

DECLARE @pXML XML;

SET @pXML =

'

<items>

<item id="0001" type="Donut">

<name>Cake</name>

<ppu>0.55</ppu>

<batter id="1001">Regular</batter>

<batter id="1002">Chocolate</batter>

<batter id="1003">Blueberry</batter>

<topping id="5001">None</topping>

<topping id="5002">Glazed</topping>

<topping id="5005">Sugar</topping>

<topping id="5006">Sprinkles</topping>

<topping id="5003">Chocolate</topping>

<topping id="5004">Maple</topping>

</item>

</items>

'

;

--=============================================================================

-- Copy the "Edge" table including some necessary extra columns to TempDB

-- where we can work on it.

--=============================================================================

--===== Create the edge table that we'll need

IF OBJECT_ID('tempdb..#Edge','U') IS NOT NULL

DROP TABLE #Edge

;

--===== This creates the "Edge" table and gives us a "handle" to refer to it.

DECLARE @DocHandle INT;

EXEC sp_xml_preparedocument @DocHandle OUTPUT, @pXML;

--===== This copies the "Edge" table to a new table on-the-fly so we can work

-- on it. We also create a couple of extra columns that we'll need to

-- build the SQL to flatten the XML with.

SELECT ID = ISNULL(ID,0), --ISNULL makes a NOT NULL column.

ParentID,

NodeType, --1="Entity/Table", 2="Attribute/Column", 3="Data Node"

--4= a new node type to handle "Element Based" data.

ParentLocalName = CAST(NULL AS NVARCHAR(4000)),

LocalName,

Text = CAST(Text AS NVARCHAR(MAX)),

HLevel = CAST(NULL AS INT),

Width = CAST(NULL AS INT),

DeDupeName = CAST(NULL AS NVARCHAR(4000)),

Alias = CAST(NULL AS NVARCHAR(10))

INTO #Edge

FROM OPENXML (@DocHandle, '/') --This means "Prepare EVERYTHING" in the XML

;

--===== We no longer need the real "Edge" table. Drop it to conserve resources.

EXEC sp_xml_removedocument @DocHandle;

--===== Create an index to make things run a bit faster.

-- May have to add to this for XML with a lot of nodes/rows.

CREATE UNIQUE CLUSTERED INDEX IX_#Edge_ID_ParentID

ON #Edge (ID,ParentID)

;

--=============================================================================

-- We need to know the Hierarchical Level and parent LocalName of each

-- node to build the SELECT list. While we're at it, we need to know the

-- MAXimum width of the data for each data node so we can "right size" the

-- columns for the result "table".

--=============================================================================

WITH

cteBuildLevel AS

( --=== This is the "anchor" part of the recursive CTE

SELECT anchor.ID,

HLevel = 1,

LocalName,

ParentLocalName

FROM #Edge AS anchor

WHERE ParentID IS NULL

UNION ALL -------------------------------------------------------------------

--==== This is the "recursive" part of the CTE that adds 1 for each level

-- and carries the previous LocalName forward as the ParentLocalName for

-- the next level of rows.

SELECT recur.ID,

HLevel = cte.HLevel+1,

LocalName = recur.LocalName,

ParentLocalName = cte.LocalName

FROM #Edge AS recur

INNER JOIN cteBuildLevel AS cte

ON cte.ID = recur.ParentID

)--==== This does the actual update of the "edge" table from above.

-- It also calculates the column width of the data so we can modify

-- the resulting T-SQL later to make it so not every column has to

-- be a TEXT or NVARCHAR(MAX) column.

UPDATE tgt

SET tgt.HLevel = bp.HLevel,

tgt.ParentLocalName = bp.ParentLocalName,

tgt.Width = DATALENGTH(tgt.Text)

FROM #Edge tgt

INNER JOIN cteBuildLevel bp

ON tgt.ID = bp.ID

;

--=============================================================================

-- We need to create table aliases for both the SELECT list and the FROM

-- clause. Type 1 and Type 4 (not yet created) nodes will appear in the

-- FROM clause. Type 2 and Type 4 nodes will appear in the SELECT list.

--=============================================================================

--===== Mark Type 1 Nodes with an alias

WITH

cteUniqueLocalNames AS

( --=== Get the unique LocalNames for Type 1 nodes

SELECT DISTINCT

HLevel,LocalName

FROM #Edge

WHERE NodeType = 1

),

cteCreateAliases AS

( --=== Number the unique LocalNames in order by their hierarchical level.

SELECT HLevel,

LocalName,

Alias = 'x'

+ RIGHT('0000' + CAST(

ROW_NUMBER() OVER (ORDER BY HLevel, LocalName)

AS NVARCHAR(10)),4)

FROM cteUniqueLocalNames

) --=== Update all Type 1 (Entity) nodes with the Alias from above

UPDATE edge

SET edge.Alias = alias.Alias

FROM #Edge edge

INNER JOIN cteCreateAliases alias

ON edge.LocalName = alias.LocalName

AND edge.HLevel = alias.HLevel

AND edge.NodeType = 1

;

--===== Mark Type 2 Nodes with the Alias of their parent nodes.

-- For "Attribute Based" nodes, Type 2 Nodes will have data associated

-- with it so the node needs to appear in the SELECT list. This marks the

-- Type 2 Nodes with the alias of their parents so that we can use the

-- correct alias for the nodes especially when there are duplicate named

-- nodes.

UPDATE type2node

SET type2node.Alias = parent.Alias

FROM #Edge type2node

INNER JOIN #Edge parent

ON type2node.ParentID = parent.ID

AND type2node.NodeType = 2

;

--=============================================================================

-- "Attribute Based" nodes will have a Type 2 node that needs to appear in

-- the SELECT list. "Element Based" nodes don't have Type 2 nodes. They

-- appear, instead, as Type 1 nodes with a direct attached Type 3 data

-- node. Well mark those nodes as Type 4 (Element Based) to make building

-- the SELECT list a whole lot easier later on.

--=============================================================================

--===== Mark the Type 1 parents of Type 3 as Type 4 "Element" nodes.

UPDATE parent

SET parent.NodeType = 4 --Element Node (formally a Type 1 Node)

FROM #Edge type3node

INNER JOIN #Edge parent

ON type3node.ParentID = parent.ID

AND type3node.NodeType = 3 --Data Node

AND parent.NodeType = 1 --Entity Node

;

--=============================================================================

-- We need to update the "Attribute" and "Element" nodes with the width

-- of their respective data nodes so we can more easily "right size" the

-- columns in the SELECT list.

--=============================================================================

--===== Copy the column Width from the data nodes to the column nodes.

-- 2="Attribute Based", 4="Element Based" are column nodes.

UPDATE tgt

SET tgt.Width = src.Width

FROM #Edge tgt

INNER JOIN #Edge src

ON tgt.LocalName = src.ParentLocalName

WHERE tgt.NodeType IN (2,4) --Attribute and Element nodes respectively

AND src.NodeType = 3 --Data node

;

--=============================================================================

-- Last but not least, the possibility of duplicate column names exists.

-- In order to prevent that type of duplication, we create a new set of

-- column names by prepending the ParentLocal name and appending the Alias

-- to the LocalName. We'll use these names as the column names in the

-- SELECT list if a dupe exists.

--=============================================================================

--===== Create unique column names to use in the SELECT list.

WITH

cteLocalNames AS

( --=== Find unique combinations of LocalName and Alias so we get a count of

-- column names to find duplicates

SELECT DISTINCT LocalName, Alias

FROM #Edge

WHERE NodeType IN (2,4)

),

cteDuplicateNames AS

(

SELECT LocalName

FROM cteLocalNames

GROUP BY LocalName

HAVING COUNT(*) > 1

)

UPDATE edge

SET DeDupeName = edge.ParentLocalName + '_' + edge.LocalName + '_' + edge.Alias

FROM #Edge edge

INNER JOIN cteDuplicateNames dupe

ON edge.LocalName = dupe.LocalName

;

--=============================================================================

-- Create the FROM clause from the Type 1 (Entity) and Type 4 (Element)

-- Nodes. Each OUTER APPLY is a subset of its parent making a list of

-- "cascading OUTER APPLYs" or (to coin an acronym) "cOAs".

--=============================================================================

--===== Create a place to concatenate the cOAs to build the entire FROM clause.

DECLARE @From VARCHAR(MAX);

SELECT @From = '';

--===== Create the FROM clause using cOAs

WITH

cteFromClauseNodes AS

( --=== Gather the node info that will show up in the FROM clause.

SELECT DISTINCT

ParentLocalName, LocalName, Alias

FROM #Edge

WHERE NodeType IN (1,4)

)

SELECT @From = @From

+ ISNULL(' OUTER APPLY ' + ParentLocalName, ' FROM @pXML')

+ '.nodes (' + QUOTENAME(LocalName,'''') + ') '

+ Alias + ' '

+ '(' + QUOTENAME(LocalName) + ')' + CHAR(10)

FROM cteFromClauseNodes

ORDER BY Alias

;

--=============================================================================

-- Create the SELECT LIST using the unique column names from the Type 2

-- (Attribute) and Type 4 (Element) Nodes.

-- Note how the Type 2 and Type 4 nodes need to be handled differently.

-- Also note that this is where we handle the possibility of needing a MAX

-- datatype column.

--=============================================================================

--===== Create a place to concatenate the columns to build the entire

-- SELECT LIST.

DECLARE @Select VARCHAR(MAX);

--===== Create the SELECT LIST.

WITH

cteFromClauseNodes AS

( --=== Gather the unique node info that will show up in the SELECT LIST.

SELECT NodeType,

ParentLocalName,

LocalName,

ColumnName = ISNULL(DeDupeName, LocalName),

Alias,

Width = CASE --= Change to MAX datatype if bigger than NVARCHAR(4000)

WHEN MAX(Width) <= 4000

THEN CAST(MAX(Width) AS VARCHAR(10))

ELSE 'MAX'

END

FROM #Edge

WHERE NodeType IN (2,4)

GROUP BY NodeType, ParentLocalName, LocalName, Alias, DeDupeName

)

SELECT @Select = ISNULL(@Select + ',' + CHAR(10)+ SPACE(8), '')

+ '[' + ColumnName + ']' + ' = '

+ CASE

WHEN NodeType = 2 --Node Type 4 implied for the ELSE by WHERE

THEN Alias + '.[' + ParentLocalName + '].value '

ELSE Alias + '.[' + LocalName + '].value '

END

+ CASE

WHEN NodeType = 2 --Node Type 4 implied for the ELSE by WHERE

THEN '(''@' + LocalName + '''' +

+ ', ''NVARCHAR(' + Width + ')'')'

ELSE '(''(text())[1]'', ''NVARCHAR(' + Width + ')'')'

END

FROM cteFromClauseNodes

WHERE NodeType IN (2,4)

ORDER BY Alias --Simplifies troubleshooting if needed

;

--=============================================================================

-- Put the final SQL together along with some helpful hints for use.

-- Note that we add a ROW_NUMBER column to ensure uniqueness if the

-- de-duplication effort somehow managed to fail.

--=============================================================================

--===== Declare a variable to hold the final SQL and hints

DECLARE @sql NVARCHAR(MAX);

--===== Put the final SQL all together along with some helpful hints for use.

SELECT @sql = '

/*=============================================================================

The following T-SQL is what shredded the XML you gave it.

If desired, please modify the datatypes and/or column names to suit your

needs. You could also add "INTO #SomeTableName" just before the FROM clause to

"auto-magically" build and store the results into a temporary or other table

as a "staging" table on-the-fly to work from.

--Jeff Moden

=============================================================================*/

'

+ ' SELECT RowNum = ROW_NUMBER() ' --Ensure unique rows

+ 'OVER(ORDER BY (SELECT NULL)),' + CHAR(10) + SPACE(8)

+ @Select + CHAR(10)

+ @From

+ ';' + '

--=============================================================================

-- Hint on how to convert the T-SQL to work on a table column.

--=============================================================================

/*

Change the FROM clause in the SQL above in a similar manner to below to shred

an entire column of similar (ie: same nodes/structure) XML.

Change from: (Note that "... etc x" means

"the rest of the line of existing code.")

FROM @pXML.nodes (... etc 1

OUTER APPLY ... etc 2

OUTER APPLY ... etc 3

... etc x

Change to:

FROM dbo.YourTableName x0

CROSS APPLY YourXMLDeDupeName.nodes( ... etc 1

OUTER APPLY ... etc 2

OUTER APPLY ... etc 3

... etc x

*/

'

;

--=============================================================================

-- We're ready to rock. Display a message to the user as to where to find

-- things, print the SQL for possible copy and modification, and execute

-- the SQL to show that it actually works without error.

--=============================================================================

--===== Tell the operator where to find the final SQL.

SELECT [NOTICE!!!] = 'Please see "Messages" tab for the rendered T-SQL'

UNION ALL

SELECT [NOTICE!!!] = 'that created the "table" below from the XML.'

;

--===== Print the SQL before we execute it to make troubleshooting easier.

PRINT @sql;

--===== Show that the SQL works as advertised and so that we can examine the

-- results to tweak the generated SQL for datatypes, etc, later.

EXECUTE sp_executesql @sql,

N'@pXML XML',

@pXML

;

Here's the result from the "Messages" tab...

/*=============================================================================

The following T-SQL is what shredded the XML you gave it.

If desired, please modify the datatypes and/or column names to suit your

needs. You could also add "INTO #SomeTableName" just before the FROM clause to

"auto-magically" build and store the results into a temporary or other table

as a "staging" table on-the-fly to work from.

--Jeff Moden

=============================================================================*/

SELECT RowNum = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),

[item_id_x0002] = x0002.[item].value ('@id', 'NVARCHAR(8)'),

[type] = x0002.[item].value ('@type', 'NVARCHAR(10)'),

[batter_id_x0003] = x0003.[batter].value ('@id', 'NVARCHAR(8)'),

[batter] = x0003.[batter].value ('(text())[1]', 'NVARCHAR(14)'),

[name] = x0004.[name].value ('(text())[1]', 'NVARCHAR(8)'),

[ppu] = x0005.[ppu].value ('(text())[1]', 'NVARCHAR(8)'),

[topping_id_x0006] = x0006.[topping].value ('@id', 'NVARCHAR(8)'),

[topping] = x0006.[topping].value ('(text())[1]', 'NVARCHAR(8)')

FROM @pXML.nodes ('items') x0001 ([items])

OUTER APPLY items.nodes ('item') x0002 ([item])

OUTER APPLY item.nodes ('batter') x0003 ([batter])

OUTER APPLY item.nodes ('name') x0004 ([name])

OUTER APPLY item.nodes ('ppu') x0005 ([ppu])

OUTER APPLY item.nodes ('topping') x0006 ([topping])

;

--=============================================================================

-- Hint on how to convert the T-SQL to work on a table column.

--=============================================================================

/*

Change the FROM clause in the SQL above in a similar manner to below to shred

an entire column of similar (ie: same nodes/structure) XML.

Change from: (Note that "... etc x" means

"the rest of the line of existing code.")

FROM @pXML.nodes (... etc 1

OUTER APPLY ... etc 2

OUTER APPLY ... etc 3

... etc x

Change to:

FROM dbo.YourTableName x0

CROSS APPLY YourXMLDeDupeName.nodes( ... etc 1

OUTER APPLY ... etc 2

OUTER APPLY ... etc 3

... etc x

*/

Last but not least, here's the flattened result set from the XML sample in the code including "right sized" columns.

RowNum item_id_x0002 type batter_id_x0003 batter name ppu topping_id_x0006 topping

-------------------- ------------- ---------- --------------- -------------- -------- -------- ---------------- --------

1 0001 Donut 1001 Regular Cake 0.55 5001 None

2 0001 Donut 1001 Regular Cake 0.55 5002 Glazed

3 0001 Donut 1001 Regular Cake 0.55 5005 Sugar

4 0001 Donut 1001 Regular Cake 0.55 5006 Sprinkle

5 0001 Donut 1001 Regular Cake 0.55 5003 Chocolat

6 0001 Donut 1001 Regular Cake 0.55 5004 Maple

7 0001 Donut 1002 Chocolate Cake 0.55 5001 None

8 0001 Donut 1002 Chocolate Cake 0.55 5002 Glazed

9 0001 Donut 1002 Chocolate Cake 0.55 5005 Sugar

10 0001 Donut 1002 Chocolate Cake 0.55 5006 Sprinkle

11 0001 Donut 1002 Chocolate Cake 0.55 5003 Chocolat

12 0001 Donut 1002 Chocolate Cake 0.55 5004 Maple

13 0001 Donut 1003 Blueberry Cake 0.55 5001 None

14 0001 Donut 1003 Blueberry Cake 0.55 5002 Glazed

15 0001 Donut 1003 Blueberry Cake 0.55 5005 Sugar

16 0001 Donut 1003 Blueberry Cake 0.55 5006 Sprinkle

17 0001 Donut 1003 Blueberry Cake 0.55 5003 Chocolat

18 0001 Donut 1003 Blueberry Cake 0.55 5004 Maple

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kenambrose SSCrazy Points: 2513 More actions · Answer 9

Jeff your work is amazing. I am learning a _lot_ from it. Thank you!

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 10

kenambrose (10/17/2012)
Jeff your work is amazing. I am learning a _lot_ from it. Thank you!

Thanks for the feedback, Ken. I really appreciate it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 11

Another article I'm looking forward to reading.

Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 12

dwain.c (10/19/2012)
Another article I'm looking forward to reading.
Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P

It is interesting. I just ventured across the 2nd iteration just yesterday. (Sorry Jeff - I don't think I ever saw a followup to the initial thread).

Jeff - a few things: it's a definitely good start. It is VERY good at identifying and sniffing out the internal structure: the #edge table is a very interesting artifact, and I am very curious to keep looking through it to see how we could improve upon it. This script also fairly good at flattening certain kinds of XML and can be a good starting point to create a staging table..

There's a big issue to look at however. There seems to be a presumption that the internal structure is very loose and doesn't in fact neatly fit into tables already. This has an interesting side effect: when the XML IS somewhat based on "tables", the flattening process tends to turn the output into something reasonably horrible. There's a distinct triangle join effect: try running the script against this kind of XML (a lot of the XML message I get look a lot like this).

declare @x varchar(max)

select @x='<root>

<locs>

<location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>

<location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>

<location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>

</locs>

<Mascots>

<Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

</Mascots>

<Leagues>

<League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

<League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

<League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

</Mascots>

<roster>

<teams>

<team name="blue team" id="1"

Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">

the

<player name="Bob"/>

cow

<player name="Joe">

<age>21</age>

</player>

jumped over

<player name="Gary"/>

the

<player name="Bruce"/>

moon

</team>

<team name="red team" id="2"

Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"

locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">

<player name="Mary"/>

<player name="Susie">

<age>45</age>

</player>

<player name="Gary"/>

<player name="Shelia"/>

<player name="Bruce"/>

</team>

<team name="grey team" id="3"

Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">

<player name="Bob"/>

<player name="Joe">

<age>21</age>

</player>

<player name="Gary"/>

<player name="Bruce"/>

</team>

</teams>

</roster>

</root>';

The fly in the ointment is that you basically are getting some "reference" tables outside of the main structure. By blowing out the internal structure (which isn't half bad if you are trying to relate the entities to each other), it's now going to be VERY hard to determine how many of "each thing" needs to be inserted/imported when you actually go to use the flattened output.

Specifically, you might not pick up on the fact that you actually have 2 players called "Bruce" on several of the teams, since everything ELSE is duplicated as well.Or - there' no good way to know that some of those "reference values" aren't actually referenced elsewhere (with the cartesian product here - it kind of looks like they are).

Now the example also happens to be using an XML construct called id/idref (essentially PK/FK in XML-speak). I just don't see how you could reasonably sniff that out without a schema.

Again - great building blocks on discovering the internals of the file without a preexisting schema. I am just not quite sure they're leveraged correctly at this point (or properly put boundaries on what this actually does).I need to keep stewing on this a bit more, but just wanted to post a little feedback on it.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 13

Jeff seems to have beat me to it, but here's a similar XML parsing procedure I've used extensively.

CREATE PROCEDURE dbo.ParseXML

@strXML AS XML

,@rootnode NVARCHAR(255)

AS

BEGIN

/*

EXEC dbo.ParseXML

'<items>

<item id="0001" type="Donut">

<name>Cake</name>

<ppu>0.55</ppu>

<batter id="1001">Regular</batter>

<batter id="1002">Chocolate</batter>

<batter id="1003">Blueberry</batter>

<topping id="5001">None</topping>

<topping id="5002">Glazed</topping>

<topping id="5005">Sugar</topping>

<topping id="5006">Sprinkles</topping>

<topping id="5003">Chocolate</topping>

<topping id="5004">Maple</topping>

</item>

</items>'

,'items'

EXEC dbo.ParseXML

'<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">

<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">

<AddressList>

<PlayerAddress>

<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>

<FutureUse>Example Text1</FutureUse>

<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

</PlayerAddress>

<PlayerAddress>

<Address AddressType="billing" City="City1" State="State1" Zip="Zip1"/>

<FutureUse>Example Text2</FutureUse>

<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

</PlayerAddress>

</AddressList>

</PlayerInfo>

</AccountDetailsRsp>'

,'AccountDetailsRsp'

*/

SET NOCOUNT ON

DECLARE

@strText AS NVARCHAR(MAX)

,@idoc INT

,@id INT

,@parentid INT

IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL

DROP TABLE #ChildList

CREATE TABLE #ChildList (

[RowNum] INT IDENTITY(1,1) NOT NULL,

[parentid] INT NULL,

[id] INT NULL,

PRIMARY KEY (RowNum),

UNIQUE (RowNum))

IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL

DROP TABLE #NodeList

CREATE TABLE #NodeList (

[RowNum] INT NOT NULL,

[id] INT NULL,

[parentid] INT NULL,

[nodetype] INT NULL,

[localname] NVARCHAR(MAX) NULL,

[text] NVARCHAR(MAX) NULL,

PRIMARY KEY (RowNum),

UNIQUE (RowNum))

SET @id = 1

SET @parentid = NULL

/* Get rid of tabs and extra spaces */

SET @strText = CAST(@strXML AS NVARCHAR(MAX))

SET @strText =

REPLACE(

@strText

,' ',' '+CHAR(7))

,CHAR(7)+' ','')

,CHAR(7),'')

,CHAR(9),' ')

SET @strXML = CONVERT(XML,@strText)

/* Validate the XML */

EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML

/* Parse the XML data */

;WITH cte

AS (

SELECT

CAST(p1.parentid AS INT) AS parentid

,CAST(p1.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p1

UNION ALL

SELECT

CAST(p2.parentid AS INT) AS parentid

,CAST(p2.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p2

JOIN

cte

ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)

WHERE

CAST(p2.parentid AS INT) = @parentid

)

INSERT INTO #ChildList

SELECT *

FROM cte

INSERT INTO #NodeList

SELECT

#ChildList.RowNum

,xmllist.id

,xmllist.parentid

,xmllist.nodetype

,xmllist.localname

,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]

FROM #ChildList

INNER JOIN

OPENXML (@idoc,@rootnode,2) AS xmllist

ON #ChildList.id = xmllist.id

WHERE

#ChildList.RowNum > 0

/* Display the results */

;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)

AS (

SELECT

#NodeList.RowNum

,#NodeList.id

,#NodeList.parentid

,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

,#NodeList.localname

,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]

,0 AS nodetype

FROM #ChildList

INNER JOIN

#NodeList

ON #ChildList.id = #NodeList.id

WHERE

#NodeList.parentid IS NULL

AND #ChildList.RowNum > 0

AND #NodeList.RowNum > 0

UNION ALL

SELECT

n.RowNum

,n.id

,n.parentid

,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

,n.localname

,n.[text]

,n.nodetype

FROM #NodeList AS n

INNER JOIN

RecursiveNodes AS r

ON n.parentid = r.id

WHERE

n.RowNum > 0

AND r.RowNum > 0

AND n.parentid >= 0

)

SELECT

ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum

,Result.id

,Result.parentid

,Result.nodepath

,Result.nodetype

,Result.nodename

,Result.property

,Result.value

,Result.nodecontents

FROM

(

SELECT

rn.RowNum

,rn.id

,rn.parentid

,rn.nodepath

,(CASE

WHEN rn.nodetype = 0 THEN 'Root'

WHEN rn.nodetype = 1 THEN 'Node'

WHEN rn.nodetype = 2 THEN 'Property'

ELSE 'Data'

END) AS nodetype

,(CASE

WHEN rn.nodetype = 0 THEN rn.localname

WHEN rn.nodetype = 1 THEN rn.localname

WHEN rn.nodetype = 2 THEN (SELECT TOP(1) localname FROM RecursiveNodes WHERE id = rn.parentid)

ELSE NULL

END) AS nodename

,(CASE

WHEN rn.nodetype = 2 THEN rn.localname

ELSE NULL

END) AS property

,(CASE

WHEN rn.nodetype = 2 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

ELSE NULL

END) AS value

,(CASE

WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

WHEN rn.nodetype = 2 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.parentid and [text] is not null)

ELSE NULL

END) AS nodecontents

FROM

RecursiveNodes AS rn

WHERE

rn.localname <> '#text'

) AS Result

WHERE

Result.id >= 0

AND (Result.id = 0

OR property IS NOT NULL

OR value IS NOT NULL

OR nodecontents IS NOT NULL)

END

Output example:

RowNumidparentidnodepathnodetypenodenamepropertyvaluenodecontents

10NULL/itemsRootitemsNULLNULLNULL

232/items/item/idPropertyitemid0001NULL

342/items/item/typePropertyitemtypeDonut NULL

452/items/item/nameNodenameNULLNULLCake

562/items/item/ppuNodeppuNULLNULL0.55

687/items/item/batter/idPropertybatterid1001Regular

71110/items/item/batter/idPropertybatterid1002Chocolate

81413/items/item/batter/idPropertybatterid1003Blueberry

91716/items/item/topping/idPropertytoppingid5001None

102019/items/item/topping/idPropertytoppingid5002Glazed

112322/items/item/topping/idPropertytoppingid5005Sugar

122625/items/item/topping/idPropertytoppingid5006Sprinkles

132928/items/item/topping/idPropertytoppingid5003Chocolate

143231/items/item/topping/idPropertytoppingid5004Maple

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 14

Matt Miller (#4) (10/19/2012)
dwain.c (10/19/2012)
Another article I'm looking forward to reading.
Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P
It is interesting. I just ventured across the 2nd iteration just yesterday. (Sorry Jeff - I don't think I ever saw a followup to the initial thread).
Jeff - a few things: it's a definitely good start. It is VERY good at identifying and sniffing out the internal structure: the #edge table is a very interesting artifact, and I am very curious to keep looking through it to see how we could improve upon it. This script also fairly good at flattening certain kinds of XML and can be a good starting point to create a staging table..
There's a big issue to look at however. There seems to be a presumption that the internal structure is very loose and doesn't in fact neatly fit into tables already. This has an interesting side effect: when the XML IS somewhat based on "tables", the flattening process tends to turn the output into something reasonably horrible. There's a distinct triangle join effect: try running the script against this kind of XML (a lot of the XML message I get look a lot like this).
declare @x varchar(max)
select @x='<root>
<locs>
<location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>
<location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>
<location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>
</locs>
<Mascots>
<Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>
<Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>
<Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>
<Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>
<Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>
<Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>
</Mascots>
<Leagues>
<League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>
<League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>
<League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>
</Mascots>
<roster>
<teams>
<team name="blue team" id="1"
Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"
locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"
mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">
the
<player name="Bob"/>
cow
<player name="Joe">
<age>21</age>
</player>
jumped over
<player name="Gary"/>
the
<player name="Bruce"/>
<player name="Bruce"/>
moon
</team>
<team name="red team" id="2"
Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"
locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"
mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">
<player name="Mary"/>
<player name="Susie">
<age>45</age>
</player>
<player name="Gary"/>
<player name="Shelia"/>
<player name="Bruce"/>
</team>
<team name="grey team" id="3"
Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"
locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"
mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">
<player name="Bob"/>
<player name="Joe">
<age>21</age>
</player>
<player name="Gary"/>
<player name="Bruce"/>
<player name="Bruce"/>
</team>
</teams>
</roster>
</root>';
The fly in the ointment is that you basically are getting some "reference" tables outside of the main structure. By blowing out the internal structure (which isn't half bad if you are trying to relate the entities to each other), it's now going to be VERY hard to determine how many of "each thing" needs to be inserted/imported when you actually go to use the flattened output.
Specifically, you might not pick up on the fact that you actually have 2 players called "Bruce" on several of the teams, since everything ELSE is duplicated as well.Or - there' no good way to know that some of those "reference values" aren't actually referenced elsewhere (with the cartesian product here - it kind of looks like they are).
Now the example also happens to be using an XML construct called id/idref (essentially PK/FK in XML-speak). I just don't see how you could reasonably sniff that out without a schema.
Again - great building blocks on discovering the internals of the file without a preexisting schema. I am just not quite sure they're leveraged correctly at this point (or properly put boundaries on what this actually does).I need to keep stewing on this a bit more, but just wanted to post a little feedback on it.

Haven't done a deep dive on everything above, yet, but wouldn't a big part of the problem be the malformed XML that's included? A snippet from the XML you included from above follows...

<Mascots>

<Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

</Mascots>

<Leagues>

<League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

<League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

<League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

</Mascots>

Notice that the Leagues tag isn't properly closed.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 15

Sorry about that, I was reformatting the XML after I pasted it in, and screwed up the stuff I posted as a result.

the corrected XML is

declare @x varchar(max)

select @x='<root>

<locs>

<location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>

<location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>

<location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>

</locs>

<Mascots>

<Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

<Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

</Mascots>

<Leagues>

<League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

<League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

<League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

</Leagues>

<roster>

<teams>

<team name="blue team" id="1"

Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">

the

<player name="Bob"/>

cow

<player name="Joe">

<age>21</age>

</player>

jumped over

<player name="Gary"/>

the

<player name="Bruce"/>

moon

</team>

<team name="red team" id="2"

Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"

locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">

<player name="Mary"/>

<player name="Susie">

<age>45</age>

</player>

<player name="Gary"/>

<player name="Shelia"/>

<player name="Bruce"/>

</team>

<team name="grey team" id="3"

Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">

<player name="Bob"/>

<player name="Joe">

<age>21</age>

</player>

<player name="Gary"/>

<player name="Bruce"/>

</team>

</teams>

</roster>

</root>';

The closing tag used to be there when I tested (the output is too long to post).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?